mersenneforum.org  

2021-03-18, 13:38   #1
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

FA3₁₆ Posts
Gerbicz Error Checking - Originally: Would a "rack server" run the same as a desktop server

Title Note: Only the first three posts are concerned with the original title.

I have several desktop Ubuntu machines running headless, accessed remotely via ssh and VNC. Would a rack server, such as a Dell PowerEdge R720XD, work the same way? Would a rack server take the same server RAM, or might it accept only low-profile modules?

Anything else to consider?

Thanks!

Last fiddled with by EdH on 2021-03-23 at 13:40
2021-03-18, 16:06   #2
VBCurtis
"Curtis"
Feb 2005
Riverside, CA

1384₁₆ Posts

The main thing to consider is that rack machines use high-volume, high-speed, very, very noisy fans.
I was gifted a Dell C6100 years ago (4 nodes, dual 4-core DDR2-era Xeons in each node), and besides the ~kilowatt it drew, I found I could not run the thing at home because the fan noise from two rooms away was louder than the 5-fan desktop three feet away. Think closer to a quiet vacuum cleaner than a desktop.

It's possible a 1U server might want low-profile memory, but most used servers you are likely to find are 2U and highly likely to take normal ECC RAM. There are a couple of flavors of ECC, though: I believe registered and unregistered sticks cannot be used in the same system, so wait until you have physical possession of the machine, check the memory type/label, and order upgrades to match.
2021-03-18, 20:24   #3
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

FA3₁₆ Posts

Excellent point! Two routers I picked up are way too loud to use in the house. A loud server might fit that profile.

I need to study the RAM issue a bit. My latest build has a motherboard that accepts only 24GB of non-ECC or unbuffered ECC, but 96GB of registered. Something I was looking at seemed to say that registered RAM performs quite a bit slower than the others.
2021-03-18, 20:54   #4
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

2⁴·3²·53 Posts

And with Gerbicz error checking, higher priced ECC ram no longer provides benefits to the PRP tester.
2021-03-19, 02:43   #5
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5783₁₀ Posts

Quote:
Originally Posted by Prime95
And with Gerbicz error checking, higher priced ECC ram no longer provides benefits to the PRP tester.
Except that errors reduce net throughput. The run may rewind by 200k iterations per error detected in gpuowl. Or 50k. Mprime etc. will retreat and retry on cpu too.
Code:
2021-03-18 21:26:38 roa/radeonvii 340000607 OK 96450000  28.37%; 4139 us/it; ETA 11d 16:00; 2560ad1b65fd94e8 (check 2.36s) 26 errors
2021-03-18 21:30:07 roa/radeonvii 340000607 EE 96500000  28.38%; 4140 us/it; ETA 11d 16:01; 3a397a709767f8b4 (check 2.24s) 26 errors
2021-03-18 21:30:09 roa/radeonvii 340000607 OK 96450000 loaded: blockSize 400, 2560ad1b65fd94e8
2021-03-18 21:33:39 roa/radeonvii 340000607 OK 96500000  28.38%; 4140 us/it; ETA 11d 16:00; 614a86697c2e4515 (check 2.41s) 27 errors
2021-03-18 21:37:08 roa/radeonvii 340000607 OK 96550000  28.40%; 4140 us/it; ETA 11d 15:59; 38b95efc08eff002 (check 2.46s) 27 errors
There goes 3.5 minutes at 50k iterations. This is on a system with ECC cpu-side ram, but not ECC on the gpu.
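As a rough cross-check of the 3.5 minutes, here is a minimal Python sketch. It assumes the log line layout seen in the excerpt above; the regular expressions and the name `rollback_cost` are illustrative choices and not part of gpuowl. It pairs each EE line with the following "loaded" line and estimates the time spent on the discarded iterations from the reported us/it.

```python
import re

LOG = """\
2021-03-18 21:26:38 roa/radeonvii 340000607 OK 96450000  28.37%; 4139 us/it; ETA 11d 16:00; 2560ad1b65fd94e8 (check 2.36s) 26 errors
2021-03-18 21:30:07 roa/radeonvii 340000607 EE 96500000  28.38%; 4140 us/it; ETA 11d 16:01; 3a397a709767f8b4 (check 2.24s) 26 errors
2021-03-18 21:30:09 roa/radeonvii 340000607 OK 96450000 loaded: blockSize 400, 2560ad1b65fd94e8
"""

# Assumed layout: "<date> <time> <host> <exponent> <OK|EE> <iteration> <rest>"
LINE = re.compile(r"^\S+ \S+ \S+ \d+ (OK|EE)\s+(\d+)\s+(.*)$")
US_PER_IT = re.compile(r"(\d+) us/it")

def rollback_cost(log_text):
    """Return one (lost_iterations, lost_seconds) pair per EE line followed by a reload."""
    costs = []
    pending = None  # (iteration, us_per_it) of the most recent EE line
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        status, iteration, rest = m.group(1), int(m.group(2)), m.group(3)
        if status == "EE":
            rate = US_PER_IT.search(rest)
            pending = (iteration, int(rate.group(1))) if rate else None
        elif pending and rest.startswith("loaded"):
            lost = pending[0] - iteration          # iterations that must be redone
            costs.append((lost, lost * pending[1] / 1e6))
            pending = None
    return costs

for lost, secs in rollback_cost(LOG):
    print(f"{lost} iterations redone, about {secs / 60:.2f} minutes")
```

This counts only the recomputed iterations; the few seconds for the check and the reload are ignored.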

Last fiddled with by kriesel on 2021-03-19 at 02:50
2021-03-19, 03:25   #6
CRGreathouse
Aug 2006

3×1,993 Posts

Quote:
Originally Posted by Prime95
And with Gerbicz error checking, higher priced ECC ram no longer provides benefits to the PRP tester.


Benefits of ECC to the project are dramatically reduced thanks to GEC. It's great if you want it for some other purpose.
2021-03-19, 06:46   #7
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

2×3²×349 Posts

Quote:
Originally Posted by Prime95
And with Gerbicz error checking, higher priced ECC ram no longer provides benefits to the PRP tester.
I disagree.

ECC protects all data flows, not just the FFT data. So the RAM data for code, the OS, the drivers, etc. are also less likely to be corrupted, leading to fewer crashes, more uptime, and more throughput. This is in addition to kriesel's point about fewer rewinds during the test.
2021-03-19, 21:07   #8
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

2·3²·83 Posts

Quote:
Originally Posted by kriesel
Except that errors reduce net throughput. The run may rewind by 200k iterations per error detected in gpuowl. Or 50k. Mprime etc. will retreat and retry on cpu too.
Code:
2021-03-18 21:26:38 roa/radeonvii 340000607 OK 96450000  28.37%; 4139 us/it; ETA 11d 16:00; 2560ad1b65fd94e8 (check 2.36s) 26 errors
2021-03-18 21:30:07 roa/radeonvii 340000607 EE 96500000  28.38%; 4140 us/it; ETA 11d 16:01; 3a397a709767f8b4 (check 2.24s) 26 errors
2021-03-18 21:30:09 roa/radeonvii 340000607 OK 96450000 loaded: blockSize 400, 2560ad1b65fd94e8
2021-03-18 21:33:39 roa/radeonvii 340000607 OK 96500000  28.38%; 4140 us/it; ETA 11d 16:00; 614a86697c2e4515 (check 2.41s) 27 errors
2021-03-18 21:37:08 roa/radeonvii 340000607 OK 96550000  28.40%; 4140 us/it; ETA 11d 15:59; 38b95efc08eff002 (check 2.46s) 27 errors
There goes 3.5 minutes at 50k iterations. This is on a system with ECC cpu-side ram, but not ECC on the gpu.
That is a suboptimal run. With that many errors you can get a very good approximation of the error rate, and with it you/we could optimize the block size to lower the expected number of iterations.

Check me: the expected(!) cost of error checking+potential rollback(s) over the fixed p iterations is:
Code:
F(p,e,B)=q=(1-e)^B;return(p*2/sqrt(B)/q+p*(1/q-1))
where p is the tested prime, e is the error rate for the given p/gpu [and FFT size!], and B is the block size (B<p).
This uses a slightly simplified model in which the probability of 2 or more errors in a given block is much smaller than the probability of a single error.
That holds in all real situations, unless you have a very faulty gpu/cpu whose error rate is too large (say e>1/4).

In your case p=340000607; e=27./(96550000+50000*27); B=50000 [was it fixed?].
(Notice that more than 96550000 iterations were actually run because of the rollbacks, assuming that 27 errors is the total error count.
Does it save this count, or start with count=0 after a restart?)

Code:
? F(p,e,50000)
%113 = 7804224.9927642080311375201439985915790
? F(p,e,23400)
%114 = 6675385.4735534531717799947002467584233
So you would save more than 1M iterations using B~23400 on your given gpu, p, FFT. (B=23400 is close to the optimal; of course there is some uncertainty in e.)
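For anyone who wants to repeat this without PARI/GP, here is a minimal Python sketch of the same formula with a coarse grid search over B. The name `expected_overhead`, the search range, and the step of 100 are illustrative choices only; double precision is assumed to be adequate for these magnitudes.

```python
import math

def expected_overhead(p, e, B):
    """Expected extra iterations (checks plus rollbacks) from the posted formula."""
    q = (1.0 - e) ** B          # probability that a block of B iterations is error-free
    return p * 2 / math.sqrt(B) / q + p * (1 / q - 1)

p = 340000607
e = 27.0 / (96550000 + 50000 * 27)   # error-rate estimate used in the post

# Coarse grid search; range and step are arbitrary choices for illustration.
best_B = min(range(1000, 200001, 100), key=lambda B: expected_overhead(p, e, B))

print(expected_overhead(p, e, 50000))
print(expected_overhead(p, e, 23400))
print(best_B)
```

The curve is quite flat near its minimum, so any B within a few thousand of the best grid value costs almost the same.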
2021-03-20, 22:29   #9
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,783 Posts

Thanks for the suggestion on optimizing GEC check interval ("blocksize") according to error rate.
I'm working on a detailed case study report, but here is an initial response.

Note that a shorter gpuowl GEC interval, which computes the check on the cpu more frequently in order to reduce the gpu throughput lost to large blocks with errors, may cost some mprime/prime95 throughput on the cpu. That is not yet accounted for in the optimization calculation. A Radeon VII gpu is far more powerful than any cpu I have, so that is probably a rather small effect.

Also, gpuowl allows the check interval B and the individual GEC block size b to be set somewhat independently, so I think the sqrt(B) does not quite match gpuowl's behavior.
If I understand your math, e is per iteration, so (1-e)^B gets closer to 1 (reliable) with smaller B, which helps avoid 3-strikes-and-program-exit. That is an issue on one of my RX550 gpus, sometimes, at B=50000, and lowering B is helping there.
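To put rough numbers on that, a small Python sketch. The error rate below is hypothetical (the RX550's actual rate is not given here), errors are assumed independent per iteration, and the "3 strikes" exit is modeled simply as three failed checks in a row, which is an assumption about the program's behavior.

```python
def clean_block_probability(e, B):
    """Probability that all B iterations of a check interval are error-free."""
    return (1.0 - e) ** B

# Hypothetical per-iteration error rate, for illustration only.
e = 1e-5
for B in (50000, 20000, 10000):
    q = clean_block_probability(e, B)
    three_fails = (1.0 - q) ** 3   # three failed checks in a row, assuming independence
    print(f"B={B}: P(clean)={q:.3f}, P(3 consecutive failures)={three_fails:.5f}")
```

Under these assumptions the chance of three failures in a row drops steeply as B shrinks, since it scales with the cube of the per-interval failure probability.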

GpuOwl GEC error detection count per exponent primality test is stored in the header of each exponent's PRP save file. Mostly. For sufficiently low error rate, there is one error per detection; at high error rate, there might be more than one error per detection, as you mentioned.
The error counting is also modal; isolated EE occurrences are all normally counted, but 3 in rapid back to back succession that cause program exit are counted as 1 in the file header, and the known-bad residues are not saved to the file. This 340M exponent/gpu combo is having only isolated occurrences, so the rapid-fire EE situation is not an issue in this case. An error detection, or I think a set of up to 3 such could go entirely uncounted, if the user does CTRL-C twice soon enough to cause the program to terminate immediately without saving its most recent progress. (First CTRL-C goes to the program for orderly termination, second terminates it immediately without saving, third terminates the batch script running the program.) So the counts we would obtain directly from logs or save files are lower bounds for how many GEC error detections occurred in the course of an exponent's PRP test. A similar undercount occurs with iterations.

GpuOwl can, to some extent, automatically adjust the GEC check/log interval depending on error occurrence.
There is greater adjustability with the -log option on the command line or in gpuowl's config.txt, but that requires multiples of 10,000 iterations, per the program's help output. The check interval was initially 800, then immediately 200k until errors appeared; it stepped down to 100k after the first error, to 50k after the second, and has stayed at 50k since. It has logged up to 9 error detections per day. In the past 24 hours there were zero. I think it may recover to a higher blocksize if enough time passes after the last error detection. I have been periodically lowering the gpu ram clock frequency to try to lower the error rate. (The intent is to vary the error rate, downward.) The maximum gpu clock frequency used in this exponent's run was at least 990 MHz. Since 2021-03-19 20:09 it has been 900 MHz. (All gpuowl log file times are local; in this case, US CDT UTC-0500 after 2am 2021-03-14, US CST UTC-0600 before. The exception is the date/time stamps embedded in result lines, which are UTC.)

Some supporting detail follows.

Gpuowl PRP file header fields are
Code:
>more 340000607.owl
OWL PRP 10 340000607 115850000 400 ec48bc26a538efcb 29
That's gpuowl-file-signature, worktype, file-version, exponent, iteration count, GECblocksize, res64, errorcount-since-iteration0
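A minimal Python sketch of reading that header line, assuming the eight whitespace-separated fields in the order just listed (as in this file-version 10 example; other versions may differ). The class and field names are made up for illustration.

```python
from typing import NamedTuple

class OwlHeader(NamedTuple):
    signature: str
    worktype: str
    file_version: int
    exponent: int
    iteration: int
    gec_block_size: int
    res64: str
    error_count: int

def parse_owl_header(line: str) -> OwlHeader:
    """Split the header line of a .owl save file into the eight fields listed above."""
    parts = line.split()
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    sig, work, ver, exp, it, blk, res, errs = parts
    return OwlHeader(sig, work, int(ver), int(exp), int(it), int(blk), res, int(errs))

h = parse_owl_header("OWL PRP 10 340000607 115850000 400 ec48bc26a538efcb 29")
print(h)
```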

Gpuowl help output says in part
Code:
2020-09-07 09:43:38 gpuowl v6.11-380-g79ea0cc
...
-block <value>     : PRP GEC block size, or LL iteration-block size. Must divide 10'000.
-log <step>        : log every <step> iterations. Multiple of 10'000.
-jacobi <step>     : (LL-only): do Jacobi check every <step> iterations. Default 1'000'000.
or
Code:
2021-02-23 13:06:07 GpuOwl VERSION v7.2-53-ge27846f
...
-block <value>     : PRP error-check block size. Must divide 10'000.
-log <step>        : log every <step> iterations. Multiple of 10'000.
(Blocksize default in recent versions is 400, and has been 500 instead of 400 in some versions. Also 1000 IIRC.) So normal error-free running is log interval 200k iterations, block 400, each interval is 500 blocks, but may step down from there based on error experience to 250 or 125.
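The blocks-per-interval arithmetic can be checked with a short Python sketch. It encodes only the two constraints quoted from the help output; the function name `check_settings` is hypothetical and nothing here calls gpuowl itself.

```python
def check_settings(block: int, log_step: int) -> int:
    """Validate the two constraints quoted from the help text; return blocks per log interval."""
    if 10_000 % block != 0:
        raise ValueError("block must divide 10'000")
    if log_step % 10_000 != 0:
        raise ValueError("log step must be a multiple of 10'000")
    return log_step // block

for log_step in (200_000, 100_000, 50_000):
    print(log_step, check_settings(400, log_step))
```

With block 400 this gives 500, 250, and 125 blocks per interval, matching the step-down sequence described above.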
Observed GEC interval in the p=340000607 PRP run:
Code:
2021-03-14 06:08:00 roa/radeonvii Expected maximum carry32: 89CA0000
2021-03-14 06:08:04 roa/radeonvii OpenCL args "-DEXP=340000607u -DWIDTH=1024u -DSMALL_HEIGHT=1024u -DMIDDLE=9u -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DMM_CHAIN=2u -DMM2_CHAIN=3u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xf.b18fc6dd93bcp-4 -DIWEIGHT_STEP_MINUS_1=-0xf.d866d332c56p-5 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-03-14 06:08:13 roa/radeonvii OpenCL compilation in 8.93 s
2021-03-14 06:08:16 roa/radeonvii 340000607 OK    18800 loaded: blockSize 400, 05170f8523ce46a4
2021-03-14 06:08:16 roa/radeonvii validating proof residues for power 9
2021-03-14 06:08:16 roa/radeonvii Proof using power 9
2021-03-14 06:08:21 roa/radeonvii 340000607 OK    19600   0.01%; 3939 us/it; ETA 15d 12:02; a41eedf5cb826649 (check 2.33s)
2021-03-14 06:20:21 roa/radeonvii 340000607 OK   200000   0.06%; 3975 us/it; ETA 15d 15:09; 73ddddae5b6e44d8 (check 2.28s)
2021-03-14 06:33:40 roa/radeonvii 340000607 OK   400000   0.12%; 3983 us/it; ETA 15d 15:43; bcae6edea14aff1e (check 2.44s)
2021-03-14 06:46:59 roa/radeonvii 340000607 OK   600000   0.18%; 3984 us/it; ETA 15d 15:34; 294ad9a594748b84 (check 2.47s)
2021-03-14 07:00:19 roa/radeonvii 340000607 OK   800000   0.24%; 3991 us/it; ETA 15d 16:04; 35097de005cc0e41 (check 2.42s)
2021-03-14 07:13:40 roa/radeonvii 340000607 OK  1000000   0.29%; 3990 us/it; ETA 15d 15:41; f9cc27618d78064d (check 2.28s)
2021-03-14 07:27:00 roa/radeonvii 340000607 OK  1200000   0.35%; 3989 us/it; ETA 15d 15:23; 06be4e94037e3e1f (check 2.36s)
2021-03-14 07:40:23 roa/radeonvii 340000607 OK  1400000   0.41%; 4002 us/it; ETA 15d 16:24; ae27bb42be2d5452 (check 2.30s)
2021-03-14 07:53:43 roa/radeonvii 340000607 OK  1600000   0.47%; 3990 us/it; ETA 15d 15:03; 3af49b33d3719483 (check 2.31s)
2021-03-14 08:07:03 roa/radeonvii 340000607 OK  1800000   0.53%; 3987 us/it; ETA 15d 14:31; a92994ab8a700441 (check 2.36s)
2021-03-14 08:20:25 roa/radeonvii 340000607 OK  2000000   0.59%; 4000 us/it; ETA 15d 15:31; 8a517da67fc6dc32 (check 2.34s)
2021-03-14 08:33:45 roa/radeonvii 340000607 EE  2200000   0.65%; 3991 us/it; ETA 15d 14:31; 40593e3281ef501f (check 2.31s)
2021-03-14 08:33:48 roa/radeonvii 340000607 OK  2000000 loaded: blockSize 400, 8a517da67fc6dc32
2021-03-14 08:40:29 roa/radeonvii 340000607 OK  2100000   0.62%; 3984 us/it; ETA 15d 13:58; ce1ebf29fc1bd481 (check 2.30s) 1 errors
2021-03-14 08:47:12 roa/radeonvii 340000607 OK  2200000   0.65%; 4014 us/it; ETA 15d 16:39; d287dbffb45552c5 (check 2.28s) 1 errors
...
2021-03-14 18:51:17 roa/radeonvii 340000607 OK 11200000   3.29%; 4010 us/it; ETA 15d 06:14; a9adcf39425a4344 (check 2.26s) 1 errors
2021-03-14 18:58:00 roa/radeonvii 340000607 EE 11300000   3.32%; 4018 us/it; ETA 15d 06:50; 6170aa84438debf5 (check 2.16s) 1 errors
2021-03-14 18:58:03 roa/radeonvii 340000607 OK 11200000 loaded: blockSize 400, a9adcf39425a4344
2021-03-14 19:01:26 roa/radeonvii 340000607 OK 11250000   3.31%; 4010 us/it; ETA 15d 06:10; aa4dd62179fe524a (check 2.26s) 2 errors
2021-03-14 19:04:49 roa/radeonvii 340000607 OK 11300000   3.32%; 4025 us/it; ETA 15d 07:28; 37f520680880635f (check 2.55s) 2 errors
...
2021-03-19 13:41:09 roa/radeonvii 340000607 OK 110250000  32.43%; 4175 us/it; ETA 11d 02:26; 5cf5fafe08768177 (check 2.37s) 28 errors
2021-03-19 13:44:39 roa/radeonvii 340000607 EE 110300000  32.44%; 4156 us/it; ETA 11d 01:10; bed66bfc267390c1 (check 2.26s) 28 errors
2021-03-19 13:44:42 roa/radeonvii 340000607 OK 110250000 loaded: blockSize 400, 5cf5fafe08768177
2021-03-19 13:48:12 roa/radeonvii 340000607 OK 110300000  32.44%; 4160 us/it; ETA 11d 01:26; 8c75b681d85f1c84 (check 2.46s) 29 errors
2021-03-19 13:51:42 roa/radeonvii 340000607 OK 110350000  32.46%; 4157 us/it; ETA 11d 01:09; 81b74b9d2bb8c032 (check 2.44s) 29 errors
...
2021-03-20 15:28:06 roa/radeonvii 340000607 OK 131350000  38.63%; 4165 us/it; ETA 10d 01:23; 964dae11549f69da (check 2.39s) 29 errors
2021-03-20 15:31:36 roa/radeonvii 340000607 OK 131400000  38.65%; 4166 us/it; ETA 10d 01:24; 0342098bcec3a40d (check 2.38s) 29 errors
2021-03-21, 12:13   #10
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

2×3²×83 Posts

Quote:
Originally Posted by R. Gerbicz
Check me: the expected(!) cost of error checking+potential rollback(s) over the fixed p iterations is:
Code:
F(p,e,B)=q=(1-e)^B;return(p*2/sqrt(B)/q+p*(1/q-1))
where p is the tested prime, e is the error rate for the given p/gpu [and FFT size!], and B is the block size (B<p).
This uses a slightly simplified model in which the probability of 2 or more errors in a given block is much smaller than the probability of a single error.
That holds in all real situations, unless you have a very faulty gpu/cpu whose error rate is too large (say e>1/4).
It looks like my formula above is good. My reasoning:
q=(1-e)^B is the probability that all B iterations in a single block are good. The check costs 2*sqrt(B) iterations per block,
and there are p/B blocks in the p iterations. If an event passes with probability pr=q, then on average you need 1/pr=1/q trials to see it
[for the maths, see the mean in https://en.wikipedia.org/wiki/Geometric_distribution].
So on average there will be p/B*2*sqrt(B)*1/q=p*2/sqrt(B)/q iterations spent in the checks.

1/q-1 is the expected number of rollbacks for a single B block. Hence the expected total number of rollbacks is
p/B*(1/q-1), and since each rollback repeats B iterations, the total number of iterations in rollbacks=p*(1/q-1).

The sum of these two terms is p*2/sqrt(B)/q+p*(1/q-1), which is what we needed.
There is one little issue here: an error could also occur in the error-checking iterations, but for "large" B the probability
of this is much smaller than in the "main" iterations, because per block there are only 2*sqrt(B) check iterations, which is much smaller than B.
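The reasoning can also be sanity-checked numerically. Below is a small Monte Carlo sketch in Python under the same simplified model: each attempt at a block passes with probability q, every attempt costs a 2*sqrt(B) check, and every failed attempt wastes B iterations. The toy parameters, the seed, and the function names are arbitrary choices for illustration and not taken from any real run.

```python
import math
import random

def formula(p, e, B):
    q = (1.0 - e) ** B
    return p * 2 / math.sqrt(B) / q + p * (1 / q - 1)

def simulate(p, e, B, runs, seed=1):
    """Average extra iterations per run under the simplified model."""
    rng = random.Random(seed)
    q = (1.0 - e) ** B
    check_cost = 2 * math.sqrt(B)
    total = 0.0
    for _ in range(runs):
        for _ in range(p // B):
            # Repeat the block until its check passes (a geometric number of attempts).
            while True:
                total += check_cost      # every attempt pays for a check
                if rng.random() < q:
                    break
                total += B               # a failed attempt wastes B iterations
    return total / runs

# Toy parameters, chosen only so the simulation runs quickly.
p, e, B = 1_000_000, 2e-5, 10_000
print(formula(p, e, B), simulate(p, e, B, runs=2000))
```

The two printed values should agree to within sampling noise. Errors inside the check iterations themselves are ignored here, as in the derivation.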

Actually the number of rollbacks=p/B*(1/q-1)
is a monotone increasing function of B, so by itself it would favor the smallest possible B. But that is not a problem: the task is not to minimize
the number of rollbacks, but to minimize the expected number of iterations.

What I don't understand about block=400 and (to give it a name) superblock=50000: in the optimal setup,
shouldn't B=superblock=block^2? Of course, handle the last few of the p iterations separately, or go
past the p iterations.
2021-03-21, 13:52   #11
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,783 Posts

I've seen it stated elsewhere, by Preda I think, that GEC overhead is 0.2% with b=1000 but 0.5% with b=400, not counting the effect of detected errors. For smaller b, as in ~sqrt(50000), sqrt(20000), sqrt(10000), the overhead would get substantial. Also, there are probably programming advantages to keeping b fixed during a run. B needs to be an integral multiple of b, and b needs to be constant during a B-long interval, doesn't it? Having log entries at multiples of round numbers is more convenient for the end user than having seemingly random iteration counts appear as the sum of various round-number B values plus the occasional 2b at start and stop, if b is sometimes 224, 141, or 100. Maybe it's partly esthetics.
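The two quoted figures are consistent with an overhead of roughly 2/b. That is my inference from the numbers, not a statement about gpuowl's actual cost model. Under that assumption, a quick Python sketch of what b=sqrt(B) would cost:

```python
import math

def gec_overhead(b: float) -> float:
    """Approximate check overhead as a fraction, assuming overhead ~ 2/b."""
    return 2.0 / b

for B in (1_000_000, 160_000, 50_000, 20_000, 10_000):
    b = math.sqrt(B)
    print(f"B={B}: b=sqrt(B)={b:.0f}, overhead about {100 * gec_overhead(b):.2f}%")
```

B=1,000,000 and B=160,000 are included only because their square roots are the b=1000 and b=400 cases quoted above.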

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.