mersenneforum.org  

Old 2017-07-27, 21:48   #23
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

26156₈ Posts

Quote:
Originally Posted by Mark Rose View Post
3. Bursty hardware requirements.
Exactly!

Madpoo, how else could you spin up exactly the amount of kit you needed, use it for exactly what you needed, and then shut it down?

I really like that I can outsource the cold times in the machine room....
Old 2017-07-27, 23:00   #24
GP2
 
Sep 2003

101000011110₂ Posts

Quote:
Originally Posted by Madpoo View Post
I don't think people will be swapping out their home machines (or work desktops, school computers, etc) for any typical kind of cloud solution.
https://aws.amazon.com/workspaces/

"Amazon WorkSpaces is a fully managed, secure Desktop-as-a-Service (DaaS) solution which runs on AWS. With Amazon WorkSpaces, you can easily provision virtual, cloud-based Microsoft Windows desktops for your users, ..."

If you can imagine it, there's an -as-a-Service version of it.
Old 2017-07-28, 18:32   #25
Madpoo
Serpentine Vermin Jar
 
Jul 2014

6516₈ Posts

Quote:
Originally Posted by chalsall View Post
Exactly!

Madpoo, how else could you spin up exactly the amount of kit you needed, use it for exactly what you needed, and then shut it down?

I really like that I can outsource the cold times in the machine room....
Yeah, okay, I realized after my mini-rant that infrequent requirements for more juice are a good use case. I was only thinking of long-term, stable-trend usage.

In fact, where I work is a very good example... not my own department, but a different one has extremely seasonal traffic patterns: for a few months of the year they really need to ramp up the number of web servers. In their case a cloud (or hybrid cloud, if you ask me) makes good sense, where you can add another 20-30 servers for just a short period of time.

But yeah, to go back to my analogy of people outsourcing other stuff like yard or car maintenance... you can pay someone to mow your lawn rather than buy a lawnmower, or pay to get your oil changed rather than invest in a few tools. Even I usually take my car in rather than do it myself; when the cost difference is only a few bucks each time, it doesn't seem so bad.

But when the cost differential is potentially millions of dollars per year whether you "roll your own" or use Azure/AWS/whatever, then it begins to matter.

In the case of our own beloved Primenet, when we were looking at new hardware, I did look at the cloud offerings at the time (AWS and Azure as well as a few smaller companies). To match the storage, memory and core count of the server we ended up with would have been in excess of $1000-$1200 a month, and even then I think that was some lowball hardware.

As it is, Primenet currently has dual 4-core Xeons (with hyperthreading) and 54 GB of RAM, and it uses somewhere in the neighborhood of 120 GB of storage, although that grows depending on tempdb sizes and the frequency of the backups (transaction log growth between full backups); that figure also includes the web files and misc data.

Conservatively I would have spec'd something with 250 GB+... with some re-architecting it could even work as an Azure DB instance plus an app server for the web, rather than as one big chunky VM, and maybe cost about the same as the current hosting does, but for a smaller DB instance, only a couple of cores on the app server, and limited RAM/storage.

At some point I could see hosting costs falling enough that your 2-3 year costs aren't terribly different from the cloud, plus you'd enjoy the benefit of cloud resources getting more powerful for the same cost over those same years (more cores/RAM/storage at the same price points). It's not there yet, but it could be in time.
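
Just to put rough numbers on that gap (the monthly figure is the one I quoted above; the one-time server cost here is purely hypothetical, for illustration):

Code:
cloud_monthly = 1100       # midpoint of the ~$1,000-$1,200/month quotes mentioned above
months = 36                # a 3-year hardware refresh cycle
server_once = 8000         # hypothetical one-time price for comparable hardware

print(cloud_monthly * months)   # ~$39,600 of cloud spend over three years
print(server_once)              # vs. one up-front purchase (plus power, hosting, admin)
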

Anyway, back to the point of this, I don't really know if home users are going to adopt the cloud in any significant way... I don't see someone saying "nah, I don't need to pay $600 for a new computer I'll use for 3-4 years... I'd rather pay $20 a month and still have to buy some kind of basic machine anyway to connect to the internet" LOL

Businesses might; home users, doubtful. Businesses are already using things like Dropbox or Google Drive for storage, so you really can get away with smaller drives on the desktop. Or a cheaper CPU makes sense if you're using Visual Studio Online for development and cloud systems for testing code. I don't even care that my work desktop is (relatively) slow, because I spend most of my time connected to servers to run SQL queries or other intensive tasks. A cheap system works fine for me, even though they keep saying I can order a fancy, expensive laptop...
Old 2017-07-28, 20:20   #26
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2⁴×199 Posts

Quote:
Originally Posted by Madpoo View Post
adopt the cloud in any significant way...
I saw the perfect migration strategy posted on Twitter recently:

Quote:
How to make a monolithic app cloud native:
1) run it in a docker
2) change the url from .com to .io
https://twitter.com/pczarkowski/stat...62299643641856
Old 2017-07-28, 23:39   #27
GP2
 
Sep 2003

2·5·7·37 Posts

Quote:
Originally Posted by Mark Rose View Post
How to make a monolithic app cloud native:
1) run it in a docker
An alternative to the "How-to guide for running LL tests on the Amazon EC2 cloud", with its number of preliminary setup steps, would be to run mprime using AWS Batch, which is built around Docker containers.

When AWS Batch was first introduced around the beginning of the year, it was only available in us-east-1 (Northern Virginia), where the spot instance prices were more expensive, so I didn't really explore that option. But as of a month ago, it's available in a bunch of regions including us-east-2 (Ohio) where spot instances are cheaper.

Are any of you out there familiar with Docker and how to set things up to run mprime on a single exponent and then terminate, within a Docker container? Should be very simple in principle, but I've never gotten around to playing with containers.
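
To make the question concrete, here is roughly what I picture the container entrypoint doing -- an untested Python sketch; the mprime install path, the minimal prime.txt contents (UsePrimenet=0), and the plain "Test=<exponent>" worktodo format are all assumptions that would need checking:

Code:
#!/usr/bin/env python3
"""Rough container entrypoint: run mprime on a single exponent, then exit."""
import os, subprocess, sys, time

WORKDIR = "/work"
MPRIME = "/opt/mprime/mprime"      # hypothetical install path inside the image

def main(exponent):
    os.makedirs(WORKDIR, exist_ok=True)
    with open(os.path.join(WORKDIR, "prime.txt"), "w") as f:
        f.write("UsePrimenet=0\n")              # stay offline; report results via logs
    with open(os.path.join(WORKDIR, "worktodo.txt"), "w") as f:
        f.write("Test=%d\n" % exponent)         # one first-time LL test

    # -d prints progress to stdout, -w points mprime at the working directory
    proc = subprocess.Popen([MPRIME, "-d", "-w" + WORKDIR])
    results = os.path.join(WORKDIR, "results.txt")
    try:
        while proc.poll() is None:
            time.sleep(60)
            if os.path.exists(results) and str(exponent) in open(results).read():
                break                           # result line is in; we're done
    finally:
        proc.terminate()
        proc.wait()

    if os.path.exists(results):
        print(open(results).read())             # ends up in the container/CloudWatch logs

if __name__ == "__main__":
    main(int(sys.argv[1]))

The Docker image would then just bundle mprime plus this script as the entrypoint, and the AWS Batch job definition would pass the exponent as the command argument.
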
Old 2017-08-02, 01:18   #28
vsuite
 
Jan 2010

1110010₂ Posts

Quote:
Originally Posted by preda View Post
The classical way to "validate" an LL result is the double check. If two independent LLs produce the same result, it is extremely unlikely that the result is wrong (because the space of LL results is huge, even the space of 64-bit residues is huge, and, assuming a mostly uniform distribution of wrong results over this space, the probability of two erroneous LLs matching "by chance" is very small).

But what if my GPU, for some big exponent range, displays a reliability of 20%? Then most of the results would be wrong. Even if they were later disproved by double-checks, I would call the work of this GPU useless or even negative.

The situation changes radically if the GPU itself applies iterative double-checking. For example, it would double-check every iteration along the way.

The probability of an individual iteration being correct is extremely high (e.g. 0.99999998 for the previous example of 20% reliability at an 80M exponent). If the results of running the iteration twice [with different offsets] match, then we are "sure" the iteration result is correct.

Thus from a "bad" GPU we get extremely reliable LL results. I would argue such a result, let's call it "iteratively self-double-checked", is almost as strong as an independent double-check. It does take twice the work -- though in this respect it's no different from a double-checked LL (twice the work as well).
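
(As a quick sanity check on the 0.99999998 figure quoted above: if a full run of roughly 80M iterations comes out right only 20% of the time, the implied per-iteration success rate is the 80-millionth root of 0.2.)

Code:
iterations = 80_000_000      # roughly one LL test at an 80M exponent
run_reliability = 0.20       # the hypothetical "bad GPU" from the quote

print(run_reliability ** (1 / iterations))   # ~0.99999998, matching the quoted figure
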
Can this iterative double checking be programmed into CudaLucas or would that be prohibitively complex?
Old 2017-08-02, 05:55   #29
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts

Quote:
Originally Posted by vsuite View Post
Can this iterative double checking be programmed into CudaLucas or would that be prohibitively complex?
It probably could be done, but it doesn't strike me as a great allocation of programming talent. A quick look at my LL results from the past year that were double checks or have since been double-checked indicates GPU reliability in LL tests is comparable to prime95. One GPU was identified as having memory errors. It was demoted to P-1 and then became unreliable at that also; it seems to be OK doing trial factoring. The memory errors were found to occur reproducibly in the same blocks from one memtest to another, at a rate that was almost unaffected by clock changes. (Test all the memory you can!)

A doubled run by the same hardware and software is not the same as a double check. Presumably what has been proposed involves repeating the same iteration set with the same offset. An algorithmic error that only shows up in special cases could be duplicated, with both runs wrong.

Remember the Pentium FDIV bug? It only showed up for a small subset of the possible operands, but it was highly reproducible: you could run the calculations a hundred times on a buggy Pentium and get the same wrong results. It was discovered by someone doing number theory calculations. https://en.wikipedia.org/wiki/Pentium_FDIV_bug

Standard CPU or GPU LL tests in prime95 or CUDALucas are done with pseudorandom offsets, almost certainly different, applied to the first-time test and the double-check.
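
To illustrate what the shift offset buys, here is a toy Python sketch (plain bignum arithmetic, no FFT, so only sensible for tiny exponents) in which the residue is carried shifted by k bits. Two runs with different random shifts work on different bit patterns in memory, yet unwind to the same final residue:

Code:
import random

def ll_shifted(p, shift):
    """Toy Lucas-Lehmer test of Mp = 2^p - 1, carrying the residue with a shift offset.

    The stored value is s * 2^k mod Mp; each squaring doubles the shift, so the
    bit pattern in memory differs from run to run, but the unshifted result doesn't.
    """
    Mp = (1 << p) - 1
    k = shift % p
    S = (4 << k) % Mp                             # s0 = 4, shifted left by k bits
    for _ in range(p - 2):
        S = (S * S - (2 << ((2 * k) % p))) % Mp   # (s^2 - 2), still carrying the shift
        k = (2 * k) % p                           # squaring doubles the shift (mod p)
    return (S << ((p - k) % p)) % Mp              # 2^(p-k) = 2^(-k) mod Mp undoes the shift

p = 101                                           # M101 is composite, so the residue is nonzero
r1 = ll_shifted(p, random.randrange(p))
r2 = ll_shifted(p, random.randrange(p))
assert r1 == r2 != 0                              # different shifts, same unshifted residue
print(hex(r1 & 0xFFFFFFFFFFFFFFFF))               # the familiar 64-bit residue
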
Old 2017-08-07, 17:50   #30
error
 
Sep 2014

23 Posts

Quote:
Originally Posted by preda View Post
It appears one of my GPUs recently became less reliable than before -- once in a while (about every 12 hours) I get "Error is too large; retrying", with the retry producing a different, plausible-looking result, and it keeps going from there.
It is possible to make a very simple check at any iteration by computing the Jacobi symbol of the residue minus 2 (Res-2/Mp).

All valid LL-residues for Mp will have (Res-2/Mp) = -1

If you make a random error at some iteration, you have a 50% chance of getting all subsequent residues with (Res-2/Mp) = +1.

This condition can be checked at any suitable interval, since it won't change back again if no further errors occur.

Checking at every iteration would catch 75% of random errors but this would take more computing power.

(This may already be a standard check in all LL testing software - I have no idea.)
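
For concreteness, a minimal Python sketch of this check, using the textbook binary Jacobi algorithm. Note it needs the full p-bit residue from a savefile, not the 64-bit residue printed in results, and for GIMPS-sized exponents you would want something like gmpy2.jacobi(), which is far faster than this pure-Python loop:

Code:
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via the standard binary algorithm."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:              # strip factors of 2 using the (2/n) rule
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # quadratic reciprocity for the odd parts
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def ll_residue_plausible(res, p):
    """True if a full LL residue for Mp = 2^p - 1 passes the (Res-2 / Mp) = -1 check."""
    Mp = (1 << p) - 1
    return jacobi((res - 2) % Mp, Mp) == -1

# e.g. for p = 11 the LL sequence runs 4, 14, 194, 788, ... and each term
# after the first passes the check: ll_residue_plausible(194, 11) -> True
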
Old 2017-08-07, 23:48   #31
preda
 
"Mihai Preda"
Apr 2015

2²×3×11² Posts

Quote:
Originally Posted by error View Post
It is possible to make a very simple check at any iteration by computing the Jacobi symbol of the residue minus 2 (Res-2/Mp).

All valid LL-residues for Mp will have (Res-2/Mp) = -1
How do you compute the Jacobi symbol in this context?
Old 2017-08-08, 05:33   #32
error
 
Sep 2014

27₈ Posts

Quote:
Originally Posted by preda View Post
How do you compute the Jacobi symbol in this context?
You mean it is too tedious?

You could do it every 10^6 iterations or whatever, with marginal overall cost, and still potentially save a lot of time. If the symbol flips anywhere in between, you have a chance of spotting that.
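
For what it's worth, computing it doesn't require factoring anything. The usual algorithm just applies these standard rules over and over, much like a Euclidean GCD, which is also what library routines such as gmpy2.jacobi implement:

\begin{aligned}
\left(\tfrac{a}{n}\right) &= \left(\tfrac{a \bmod n}{n}\right), &
\left(\tfrac{ab}{n}\right) &= \left(\tfrac{a}{n}\right)\left(\tfrac{b}{n}\right),\\
\left(\tfrac{2}{n}\right) &= (-1)^{(n^2-1)/8}, &
\left(\tfrac{m}{n}\right) &= (-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}}\left(\tfrac{n}{m}\right)
\quad (m, n \text{ odd and coprime}).
\end{aligned}

Strip out factors of two with the (2/n) rule, flip with reciprocity, reduce modulo, and repeat; the accumulated sign is the symbol, and the whole thing costs about as much as one big GCD of Res-2 and Mp.
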
Old 2017-08-08, 05:48   #33
preda
 
"Mihai Preda"
Apr 2015

2654₈ Posts

Quote:
Originally Posted by error View Post
You mean it is too tedious?
No, I simply don't know how to do it -- I'm lacking the math background. If you could describe how to do it, that'd help me understand.