mersenneforum.org  

Great Internet Mersenne Prime Search > Software
Old 2019-02-14, 04:21   #364
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

Quote:
Originally Posted by Flammo View Post
On the topic of beta testing, and just as a little bit of feedback of the good intent variety. I've always found it interesting that when you type 'download prime95' into a search engine, every woman and her dog seems to be fileserving the latest prime95 beta versions, but not the stable version available from mersenne.org. As a long-term user I'd formed the impression that these fileservers were being used to save the mersenne server from download activity, and that mersenne.org just hadn't been updated with the latest p95 version, since no one gets paid to run GIMPS. I'm learning that these fileservers really are serving quite beta versions, though. I'm proud to be bleeding on the edge, to be part of the evolution process, and to tell my grand-kids where I was in 2019. But I fear others might not realise that unless you download from mersenne.org, it shouldn't be assumed to be stable. So my thought is: have you considered labelling the beta versions out in the wild on these file servers as beta, and serving the beta on mersenne.org to keep the masses searching for the latest beta drawn to the site? (No need to actually respond; just sharing a perspective that might not be obvious to people more actively involved in GIMPS than me.)
Um, no. 29.5 is not released yet. I would not trust a download from "everyone and his/her dog" sites. Serving up the various test builds to the small fraction of the ~7000 GIMPS participants who would download them is not a particularly heavy server load.
Old 2019-02-14, 04:58   #365
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair

Quote:
Originally Posted by Flammo View Post
On the topic of beta testing, and just as a little bit of feedback of the good intent variety. I've always found it interesting that when you type 'download prime95' into a search engine, every woman and her dog seems to be fileserving the latest prime95 beta versions but not the stable version available from mersenne.org.
They are probably laced with viruses and bitcoin miners.

But also, in the Internet era everyone has been trained to always download the latest (and therefore, surely, the greatest) version of software. So of course they would offer only the latest version; all other versions are clearly inferior.

So those folks getting their code from other places without doing due diligence will just have to accept the consequences.
Old 2019-02-14, 05:11   #366
Madpoo
Serpentine Vermin Jar
Jul 2014

Quote:
Originally Posted by Prime95 View Post
...
Long answer: When you erroneously got the Bad FFT Data errors prime95 updated the error count in the save file (the error counter it updated is the same one as the illegal sumout error counter which maxes out at 15).
...
In my deep dive of analyzing error codes, assuming I'm getting the right bits for the illegal sumout count, the results are less encouraging, on average.

For an illegal sumout count of 15+, the ratio of bad:good results is 1398:2144

Results with just one illegal sumout have a bad:good of 3519:5995

FYI, if I'm reading the source code comments correctly, the illegal sumout count in the error code is:
version 29.3+ => (error code & 0x000F0000) / 0x10000
version < 29.3 => (error code & 0x00FF0000) / 0x10000

(versions before 29.3 would hold a value up to 255 while 29.3 and above are limited to 15... the other nibble was repurposed to store PRP error info)

EDIT: I should point out, I'm not analyzing the illegal sumouts in a vacuum... those error codes may very well have other errors going on that skew things. Glancing at results where the ONLY errors were illegal sumouts (of any quantity), the ratio bad:good is 2108:12009 (nearly 15% bad... well above the 3-4% average for results with no errors at all).
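If I've read those comments right, extracting the count is just a mask and a shift. A quick sketch in Python, with the version split and masks taken from the comments above (this is my reading of them, not prime95's actual source):

```python
def illegal_sumout_count(error_code, version):
    """Pull the illegal-sumout count out of a packed 32-bit error code.

    Sketch only, based on the source-code comments quoted above:
    29.3+ keeps the count in a single nibble (the other nibble was
    repurposed for PRP error info); earlier versions used a full byte.
    """
    if version >= (29, 3):
        return (error_code & 0x000F0000) >> 16
    return (error_code & 0x00FF0000) >> 16
```

For example, an error code of 0x00250000 would read as 37 sumouts under the pre-29.3 byte mask, but only 5 under the 29.3+ nibble mask.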

Last fiddled with by Madpoo on 2019-02-14 at 05:18
Old 2019-02-14, 15:13   #367
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

Quote:
Originally Posted by Madpoo View Post
For an illegal sumout count of 15+, the ratio of bad:good results is 1398:2144

Results with just one illegal sumout have a bad:good of 3519:5995
...
A glance at results where the ONLY errors were illegal sumouts (of any quantity): the ratio bad:good is 2108:12009 (nearly 15% bad... well above the average of 3-4% for results with no errors at all)
Thanks for this data. It's always interesting to see what close access to the server can reveal.
15+ illegal sumout: 1398/2144=~0.652
single illegal sumout: 3519/5995=~0.587

I feel it's time to transition from one word for all the error counts to more than one. A few bits just don't cover the truly unreliable hardware cases. I'd give each counted error type about 16 bits, maybe 32. Exponents are up to 30 bits on mersenne.org. Some errors might occur nearly every iteration in pathological cases, or even more often with retries from earlier save files or from restarts.
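For instance, four 16-bit saturating counters would fit in the space of two 32-bit words. A purely illustrative sketch (these field names are hypothetical, not prime95's actual error types or layout):

```python
import struct

# Illustrative sketch: one 16-bit saturating counter per error type,
# instead of packing all counts into nibbles of a single 32-bit word.
# The type names here are hypothetical, not prime95's actual fields.
ERROR_TYPES = ("illegal_sumout", "roundoff", "gerbicz", "bad_fft_data")

def pack_counts(counts):
    # Four little-endian 16-bit counters, each saturating at 65535.
    capped = [min(counts.get(t, 0), 0xFFFF) for t in ERROR_TYPES]
    return struct.pack("<4H", *capped)

def unpack_counts(blob):
    return dict(zip(ERROR_TYPES, struct.unpack("<4H", blob)))
```

Even 16 bits saturates for a truly pathological run that errors nearly every iteration of a 30-bit exponent, which is the argument for going to 32.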

The 3-4% seems high to me. I've seen other sources indicating half that.

One is a graph of error rate over time, and the other is George's statement about overall error rates. Maybe it's a difference in how the error rate is expressed, such as whether the 3-4% counts residues for exponents where double checks disagree. In 50 exponents with two results each, do we count two exponents with mismatching residues as 2 errors or 4 errors? I think after a triple check matches one result and not the other for an exponent, that exponent is counted as one error and 2 good results. But until then, are that exponent's results counted as one error or two? That's the numerator. A factor of two could also appear in the denominator, if someone calculates error rate as bad-residues/exponents and someone else calculates it as bad-residues/primality-tests. I think the way to go is bad-residues/primality-tests; it handles the cases of triple and quadruple tests etc. more fairly.
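The factor of two is easy to see with toy numbers (made up, purely to show the two denominator conventions):

```python
# Made-up numbers illustrating the two error-rate conventions above.
exponents = 50            # each exponent double-checked
tests = exponents * 2     # so 100 primality tests total
mismatched = 2            # exponents whose two residues disagree

# Once a triple check settles each mismatch, one residue per
# mismatched exponent is known bad:
bad_residues = mismatched

rate_per_exponent = bad_residues / exponents  # 2/50  = 4%
rate_per_test = bad_residues / tests          # 2/100 = 2%
```

Same data, and one convention reports exactly twice the other.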

Last fiddled with by kriesel on 2019-02-14 at 15:19
Old 2019-02-15, 18:34   #368
GP2
Sep 2003

I got a hang during benchmarking, with build 9.

Hyperthreading is turned on because with previous builds I found that it helps for Wagstaff exponents in small exponent ranges.

Perhaps this is old news and already fixed in build 10, but here's the config information and the trace from gcore:

Code:
model name      : Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
stepping        : 4
microcode       : 0x2000043
cpu MHz         : 3403.873
cache size      : 25344 KB

cpu cores       : 2

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
Code:
WorkerThreads=1
CoresPerTest=2
HyperthreadLL=1
Code:
PRPBase=3
PRPResidueType=5
Code:
PRP=1,2,8953393,1,"3"
Code:
[Main thread Feb 6 20:11] Mersenne number primality test program version 29.5
[Main thread Feb 6 20:11] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 2x1 MB, L3 cache size: 25344 KB

[Work thread Feb 15 03:51] Setting affinity to run helper thread 2 on CPU core #2
[Work thread Feb 15 03:51] Setting affinity to run helper thread 3 on CPU core #2
[Work thread Feb 15 03:51] Setting affinity to run helper thread 1 on CPU core #1
[Work thread Feb 15 03:51] Starting Gerbicz error-checking PRP test of (2^8953393+1)/3 using all-complex AVX-512 FFT length 480K, Pass1=128, Pass2=3840, clm=2, 4 threads

[Work thread Feb 15 05:27] Gerbicz error check passed at iteration 8000000.
[Work thread Feb 15 05:27] Iteration: 8010000 / 8953393 [89.46%], ms/iter:  0.717, ETA: 00:11:16
[Work thread Feb 15 05:27] Iteration: 8020000 / 8953393 [89.57%], ms/iter:  0.714, ETA: 00:11:06
[Work thread Feb 15 05:27] Iteration: 8030000 / 8953393 [89.68%], ms/iter:  0.715, ETA: 00:11:00
[Main thread Feb 15 05:27] Benchmarking multiple workers to tune FFT selection.
[Work thread Feb 15 05:27] Stopping PRP test of (2^8953393+1)/3 at iteration 8036656 [89.76%]
[Work thread Feb 15 05:27] Worker stopped while running needed benchmarks.
[Main thread Feb 15 05:27] Timing 480K FFT, 2 cores, 1 worker.  Average times:  0.75 ms.  Total throughput: 1336.15 iter/sec.
[Main thread Feb 15 05:28] Timing 480K FFT, 2 cores hyperthreaded, 1 worker.  Average times:  0.74 ms.  Total throughput: 1344.48 iter/sec.
[Main thread Feb 15 05:28] Timing 480K FFT, 2 cores, 1 worker.  Average times:  0.75 ms.  Total throughput: 1338.40 iter/sec.
[Main thread Feb 15 05:28] Timing 480K FFT, 2 cores hyperthreaded, 1 worker.  Average times:  0.71 ms.  Total throughput: 1411.24 iter/sec.
[Main thread Feb 15 05:28] Timing 480K FFT, 2 cores, 1 worker.  Average times:  0.76 ms.  Total throughput: 1319.07 iter/sec.
[Main thread Feb 15 05:28] Timing 480K FFT, 2 cores hyperthreaded, 1 worker.  Average times:  0.73 ms.  Total throughput: 1369.07 iter/sec.
[Main thread Feb 15 05:29] Timing 480K FFT, 2 cores, 1 worker.  Average times:  0.77 ms.  Total throughput: 1294.44 iter/sec.
[Main thread Feb 15 05:29] Timing 480K FFT, 2 cores hyperthreaded, 1 worker.  Average times:  0.76 ms.  Total throughput: 1323.86 iter/sec.
[Main thread Feb 15 05:29] Timing 480K FFT, 2 cores, 1 worker.  Average times:  0.75 ms.  Total throughput: 1340.28 iter/sec.
[Main thread Feb 15 05:29] Timing 480K FFT, 2 cores hyperthreaded, 1 worker.  Average times:  0.72 ms.  Total throughput: 1396.03 iter/sec.
[Main thread Feb 15 05:29] Timing 480K FFT, 2 cores, 1 worker.  Average times:  0.76 ms.  Total throughput: 1314.52 iter/sec.
[Main thread Feb 15 05:30] Timing 480K FFT, 2 cores hyperthreaded, 1 worker.  Average times:  0.72 ms.  Total throughput: 1396.34 iter/sec.
[Main thread Feb 15 05:30] Timing 480K FFT, 2 cores, 1 worker.  Average times:  0.85 ms.  Total throughput: 1179.50 iter/sec.
[Main thread Feb 15 05:30] Timing 480K FFT, 2 cores hyperthreaded, 1 worker.
Code:
[Fri Feb 15 05:28:01 2019]
FFTlen=480K, Type=3, Arch=8, Pass1=128, Pass2=3840, clm=4 (2 cores, 1 worker):  0.75 ms.  Throughput: 1336.15 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=128, Pass2=3840, clm=4 (2 cores hyperthreaded, 1 worker):  0.74 ms.  Throughput: 1344.48 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=128, Pass2=3840, clm=2 (2 cores, 1 worker):  0.75 ms.  Throughput: 1338.40 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=128, Pass2=3840, clm=2 (2 cores hyperthreaded, 1 worker):  0.71 ms.  Throughput: 1411.24 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=128, Pass2=3840, clm=1 (2 cores, 1 worker):  0.76 ms.  Throughput: 1319.07 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=128, Pass2=3840, clm=1 (2 cores hyperthreaded, 1 worker):  0.73 ms.  Throughput: 1369.07 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=192, Pass2=2560, clm=4 (2 cores, 1 worker):  0.77 ms.  Throughput: 1294.44 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=192, Pass2=2560, clm=4 (2 cores hyperthreaded, 1 worker):  0.76 ms.  Throughput: 1323.86 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=192, Pass2=2560, clm=2 (2 cores, 1 worker):  0.75 ms.  Throughput: 1340.28 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=192, Pass2=2560, clm=2 (2 cores hyperthreaded, 1 worker):  0.72 ms.  Throughput: 1396.03 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=192, Pass2=2560, clm=1 (2 cores, 1 worker):  0.76 ms.  Throughput: 1314.52 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=192, Pass2=2560, clm=1 (2 cores hyperthreaded, 1 worker):  0.72 ms.  Throughput: 1396.34 iter/sec.
FFTlen=480K, Type=3, Arch=8, Pass1=640, Pass2=768, clm=4 (2 cores, 1 worker):  0.85 ms.  Throughput: 1179.50 iter/sec.
Code:
(gdb) bt
#0  0x00007f3cb40c68ed in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000047fabf in gwthread_wait_for_exit ()
#2  0x000000000041a791 in LaunchWorkerThreads ()
#3  0x000000000044292c in linuxContinue ()
#4  0x000000000040818b in main ()
Code:
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f3cb483e740 (LWP 2782) 0x00007f3cb40c68ed in pthread_join () from /lib64/libpthread.so.0
  2    Thread 0x7f3cb3683700 (LWP 2790) 0x00007f3cb40c68ed in pthread_join () from /lib64/libpthread.so.0
  3    Thread 0x7f3cb2e82700 (LWP 2797) 0x00007f3cb40cb86d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7f3cb167f700 (LWP 18531) 0x00007f3cb40c68ed in pthread_join () from /lib64/libpthread.so.0
  5    Thread 0x7f3cb2681700 (LWP 18533) 0x00007f3cb40cb86d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
Code:
(gdb) thread apply all bt

Thread 5 (Thread 0x7f3cb2681700 (LWP 18533)):
#0  0x00007f3cb40cb86d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000047f811 in gwevent_wait ()
#2  0x000000000046d637 in auxiliary_thread ()
#3  0x000000000047f6ba in ThreadStarter ()
#4  0x00007f3cb40c554b in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3cb37792ff in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f3cb167f700 (LWP 18531)):
#0  0x00007f3cb40c68ed in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000047fabf in gwthread_wait_for_exit ()
#2  0x000000000046087f in multithread_term ()
#3  0x000000000046094e in gwdone ()
#4  0x0000000000445cb6 in primeBenchOneWorker ()
#5  0x000000000047f6ba in ThreadStarter ()
#6  0x00007f3cb40c554b in start_thread () from /lib64/libpthread.so.0
#7  0x00007f3cb37792ff in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f3cb2e82700 (LWP 2797)):
#0  0x00007f3cb40cb86d in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000047f811 in gwevent_wait ()
#2  0x00000000004291f9 in implement_stop_autobench ()
#3  0x00000000004447d4 in primeContinue ()
#4  0x0000000000446a4b in LauncherDispatch ()
#5  0x000000000044ad04 in Launcher ()
#6  0x000000000047f6ba in ThreadStarter ()
#7  0x00007f3cb40c554b in start_thread () from /lib64/libpthread.so.0
#8  0x00007f3cb37792ff in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f3cb3683700 (LWP 2790)):
#0  0x00007f3cb40c68ed in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000047fabf in gwthread_wait_for_exit ()
#2  0x0000000000441adc in primeBenchMultipleWorkersInternal ()
#3  0x0000000000443217 in autoBench ()
#4  0x000000000044ba3e in timed_events_scheduler ()
#5  0x000000000047f6ba in ThreadStarter ()
#6  0x00007f3cb40c554b in start_thread () from /lib64/libpthread.so.0
#7  0x00007f3cb37792ff in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f3cb483e740 (LWP 2782)):
#0  0x00007f3cb40c68ed in pthread_join () from /lib64/libpthread.so.0
#1  0x000000000047fabf in gwthread_wait_for_exit ()
#2  0x000000000041a791 in LaunchWorkerThreads ()
#3  0x000000000044292c in linuxContinue ()
#4  0x000000000040818b in main ()

Last fiddled with by GP2 on 2019-02-15 at 18:37
Old 2019-02-15, 18:38   #369
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

Quote:
Originally Posted by GP2 View Post
I got a hang during benchmarking, with build 9.
Yes, fixed in (the deleted) build 10. I'll try to put together a new build today or tomorrow.
Old 2019-02-16, 04:27   #370
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

Quote:
Originally Posted by Prime95 View Post
Yes, fixed in (the deleted) build 10. I'll try to put together a new build today or tomorrow.
There may be a delay. I upgraded the assembler to the latest version, but it has a bug.
Old 2019-02-17, 18:32   #371
vsuite
Jan 2010

Quote:
Originally Posted by Prime95 View Post
May be a delay. I upgraded the assembler to the latest version but it has a bug.

Which assembler is this, please?
Old 2019-02-17, 20:24   #372
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

UASM --- http://www.terraspace.co.uk/uasm.html

Problem is already fixed. Running tests now.
Old 2019-02-18, 04:15   #373
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

Please upgrade to version 29.6 and report bugs in that thread. Thanks all.
Old 2019-02-18, 08:16   #374
GP2
Sep 2003

It's a bit late to be reporting this, but I didn't notice it until now...

I have a couple of one-core virtual machines doing PRP-CF double checks. They were using 29.5 with various builds until I upgraded them to 29.6 build 1 just now.

Some of the JSON lines (in results.txt for earlier builds and results.json.txt for later builds), maybe one percent or fewer, have no "errors" field.

That is, the JSON data outputs the "timestamp" field and then the "user" field immediately after, without the usual "errors":{"gerbicz":n} in between, or anywhere else in the line. (The n is always 0 except in extremely rare cases.)

These lines with no "errors" field are seemingly randomly interspersed among all the others which do have an "errors" field.

Examples for build 9 include the exponents 6934727, 6939487, 6955763, 6968809, on 2019-01-31, 2019-02-01, 2019-02-09, 2019-02-14. But it also happened for some cases with builds 3, 5, and 6, as far back as October.

This only happened with PRP-CF. I have a huge number of PRP tests of Wagstaff exponents below 10M, and none of their JSON lines are ever missing the Gerbicz "errors" field.

I wonder under what circumstances this field doesn't get printed out, and whether this can still happen in 29.6?
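For anyone wanting to check their own files, a scan along these lines picks out the affected results. The filename and the "errors" field name are as described above; everything else is an illustrative sketch:

```python
import json

def lines_missing_errors(path="results.json.txt"):
    """Return the exponents of JSON result lines with no "errors" field.

    Sketch based on the file layout described above; lines that are
    not valid JSON objects are skipped.
    """
    missing = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line.startswith("{"):
                continue  # skip any non-JSON lines in the file
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if "errors" not in record:
                missing.append(record.get("exponent"))
    return missing
```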

Last fiddled with by GP2 on 2019-02-18 at 08:17