I think there's a bug in prime95 on the latest Windows 10, build 17763.
Just stress testing: min FFT 51200, max FFT 51200, "FFT in place" unchecked, RAM to test 13926 MB, time for each test 5 min. Always within a couple of minutes prime95 will just quit to the desktop, sometimes flickering the screen and corrupting the Windows 10 color theme around the edges of windows, which requires me to click any color in Settings, then pick my (or auto) color again to clear it. Every time, on every system: overclocked, stock, underclocked. Back on the previous Windows 10 version, it doesn't happen. It also appears to load an abnormal amount of low RAM when more is clearly available. Using FMA3 on an 8700K.
Maybe it's a bug in Windows 10 17763. Which piece of software changed, after all?
[QUOTE=Yuno;497165]...Always within a couple minutes prime95 will just quit to desktop, sometimes flickering the screen, bugging Windows 10 color theme around the edge of windows, requiring me to click any color in settings, then pick my (or auto) color again to clear it. Every time, every system, overclocked, stock, underclocked. Back to previous Windows 10 version, doesnt happen..[/QUOTE]
It sounds to me like some type of video driver issue. There is an option in there somewhere that prevents Automatic Updates from downloading hardware drivers. It did that on me once and everything went wacko. Even so, this is still very strange behavior for [I]Prime95[/I]. [QUOTE=kladner]Maybe it's a bug in Windows 10 17763. Which piece of software changed, after all?[/QUOTE] An excellent point. If ever there was a piece of software in a constant state of flux, it is Windows 10. This is a subject which might find better detail in the [U]Hardware[/U] forum. |
feature request or not :)
As you all know I really like Prime95, and do all my PRP testing with it. Since Mark made the "new" type of twinsieve I now have an interesting situation.
I have two workers, each with two cores (on a quad-core CPU). One worker does PRP=1574,3,1778899,1 and the second worker does PRP=1574,3,1778899,-1. Since both workers work on their candidates at the same time, I got this error: [QUOTE][Worker #2 Oct 5 11:51:06] Error reading intermediate file: p1574_1778899 [Worker #2 Oct 5 11:51:06] Renaming p1574_1778899 to p1574_1778899.bad1 [Worker #2 Oct 5 11:51:06] All intermediate files bad. Temporarily abandoning work unit.[/QUOTE] Searching on Google suggests it is a hardware problem, but in my case it is not: the real problem is that the intermediate files for both candidates have the same name. So, question/request: could the intermediate file for +1 have a P in its name, and the intermediate file for -1 an N, so that prime95 knows which intermediate file belongs to which candidate?
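A minimal sketch (in Python, with a purely hypothetical naming scheme; this is not how Prime95 actually builds its file names) of the disambiguation being requested:

```python
def savefile_name(k: int, base: int, n: int, c: int) -> str:
    """Build a distinct intermediate-file name for a k*base^n+c candidate.

    Hypothetical scheme (not Prime95's real code): suffix the usual
    'p<k>_<n>' name with P for c=+1 and N for c=-1, so the +1 and -1
    twin candidates never share a save file.
    """
    side = "P" if c > 0 else "N"
    return f"p{k}_{n}{side}"

# The two workers from the worktodo lines above would then use:
plus_file = savefile_name(1574, 3, 1778899, +1)    # 'p1574_1778899P'
minus_file = savefile_name(1574, 3, 1778899, -1)   # 'p1574_1778899N'
assert plus_file != minus_file
```

With distinct names, both workers can run concurrently in the same directory without clobbering each other's checkpoints.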
[QUOTE=pepi37;497451]As all you know I extreme like Prime95, and do all PRP testing with him. Since Mark make "new" type of twinsieve I have now interesting situation.
I have two workers , both have two cores ( on quad CPU) One worker do PRP=1574,3,1778899,1 and second worker do PRP=1574,3,1778899,-1 [/QUOTE] My free (almost) 2x speedup tip: do only the k*3^n+1 prime tests, and if that is prime then later on or immediately run the k*3^n-1 prp test (which you can also verify with a true primality test, because the factorization of q+1 is trivial). ps. but why are you using base=3? Quite unusual; it also blocks my fast error checking.
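To illustrate the workflow Gerbicz describes, here is a toy Python sketch with a plain Fermat PRP check standing in for a real Prime95 PRP test (tiny numbers only, for demonstration):

```python
def is_fermat_prp(n: int, a: int = 3) -> bool:
    """Fermat probable-prime test to base a (the check a PRP test performs)."""
    return n > 1 and pow(a, n - 1, n) == 1

def twin_candidates(k: int, n: int):
    """Gerbicz's tip as a workflow: test k*3^n+1 first, and only spend
    time on k*3^n-1 when the +1 side survives. Since random candidates
    almost never survive, the -1 test is almost always skipped,
    giving a near-2x speedup."""
    plus = k * 3**n + 1
    if not is_fermat_prp(plus):
        return None                  # skip the -1 test entirely
    minus = k * 3**n - 1
    return (plus, is_fermat_prp(minus))

# Toy example: k=4, n=2 gives 37 (prime) and 35 = 5*7 (composite).
print(twin_candidates(4, 2))   # (37, False)
```

The "factorization of q+1 is trivial" remark is what makes the -1 side provable: for q = k*3^n-1, q+1 = k*3^n is fully factored, so a classical N+1 primality proof applies.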
[QUOTE=pepi37;497451]
One worker do PRP=1574,3,1778899,1 and second worker do PRP=1574,3,1778899,-1 Since both worker do candidates i same time, I got this error [/QUOTE] I think the two tests are using the same save file name. Not good. If you must do these two tests at the same time, run two different instances of prime95 in two different directories. One instance does the +1 candidates and the other instance does the -1 candidates. |
[QUOTE=R. Gerbicz;497455]My free (almost) 2x speedup tip: do only the k*3^n+1 prime tests, and if that is prime then later on or immediately run the k*3^n-1 prp test (which you can also verify it with a true primality test because the factorization of q+1 is trivial).
ps. but why are you using base=3? Quite unusual, also blocks my fast error checking.[/QUOTE] The default behavior of twinsieve is that if a factor is found for the -1 term, both the -1 and +1 terms are removed. I am not searching for twin primes, and this is not an "ordinary" twin sieve. So I asked Rogue to add a switch that skips removing the +1 term when a factor for -1 is found (or the reverse). So in total: when the sieve process is done I get a sieve file for both the +/- sides, but I only sieved once, so the speedup is already 50% :) Second: why base 3? This is my little project: I already found a "pair" of primes with the same exponent in base 2, and now I have switched to base 3 :) P.S. When I did the base-2 search your fast error checking was flawless, Mr. Gerbicz
[QUOTE=Prime95;497457]I think the two tests are using the same save file name. Not good.
If you must do these two tests at the same time, run two different instances of prime95 in two different directories. One instance does the +1 candidates and the other instance does the -1 candidates.[/QUOTE] Ok, then a little XLS sheet will find those pairs, so I will test them at different times, and all my problems are solved :)
ECM finds an improper factor
1 Attachment(s)
Here's an interesting bug:
Create [c]worktodo.txt[/c] with the following line: [CODE] [Worker #1] ECM2=1,3,34961,-1,50000,5000000,279,"2" [/CODE] This is looking for factors of (3[SUP]34961[/SUP] − 1) /2 . It has known factors 162235122061 and 71915094739482479, but we don't include those in the factor string in the worktodo line, just "2". Create a suitable [c]local.txt[/c], and [c]prime.txt[/c] with [c]UsePrimenet=0[/c], because Primenet doesn't want this result. Use the save file [c]e0034961[/c] from the zip file attached to this post. This will run the ECM curve with s=4898604651034280. Run [c]mprime -d[/c] The result is: [CODE] [Main thread Oct 10 22:48] Starting worker. [Work thread Oct 10 22:48] Worker starting [Work thread Oct 10 22:48] Using FMA3 FFT length 3K [Work thread Oct 10 22:48] ECM on 3^34961-1: curve #243 with s=4898604651034280, B1=50000, B2=5000000 [Work thread Oct 10 22:48] Using 65MB of memory in stage 2. [Work thread Oct 10 22:48] Stage 2 init complete. 18626 transforms, 2 modular inverses. Time: 0.253 sec. [Work thread Oct 10 22:48] 3^34961-1 has a factor: 216349680636949431... ... ... ... ...1985595899541669816801 (ECM curve 243, B1=50000, B2=5000000) Segmentation fault [/CODE] That is, it prints out a 16681-digit factor, which turns out to be precisely (3[SUP]34961[/SUP] − 1) /2 itself, and then crashes. When I try running the same parameters with GMP-ECM, nothing special happens. No factor is found: [CODE] echo "(3^34961-1)/2" | ./ecm -sigma 4898604651034280 50000 5000000 GMP-ECM 7.0.4 [configured with GMP 6.1.1, --enable-asm-redc] [ECM] Input number is (3^34961-1)/2 (16681 digits) Using B1=50000, B2=5000000, polynomial x^1, sigma=0:4898604651034280 Step 1 took ...ms Step 2 took ...ms [/CODE] However, the documentation suggests that sigma is not always defined in the same way by mprime as by GMP-ECM, so maybe it's not doing the same calculations. |
[QUOTE=GP2;497821]...That is, it prints out a 16681-digit factor, which turns out to be precisely (3[SUP]34961[/SUP] − 1) /2 itself, and then crashes.[/QUOTE] Looks like the print buffer was defined statically at 16K?
[QUOTE=GP2;497821]Here's an interesting bug:
Create [c]worktodo.txt[/c] with the following line: [CODE] [Worker #1] ECM2=1,3,34961,-1,50000,5000000,279,"2" [/CODE]This is looking for factors of (3[SUP]34961[/SUP] − 1) /2 . It has known factors 162235122061 and 71915094739482479, but we don't include those in the factor string in the worktodo line, just "2". [/QUOTE] There is something funny with your save file. This job runs just fine: [CODE] [Worker #1] ECM2=1,3,34961,-1,50000,5000000,1,4898604651034280,"2" [/CODE]resulting in [CODE] [Oct 15 14:14] ECM on 3^34961-1: curve #1 with s=4898604651034280, B1=50000, B2=5000000 [Oct 15 14:14] Stage 1 complete. 1286025 transforms, 1 modular inverses. Time: 10.312 sec. [Oct 15 14:14] Using 65MB of memory in stage 2. [Oct 15 14:14] Stage 2 init complete. 18313 transforms, 1 modular inverses. Time: 0.215 sec. [Oct 15 14:14] Stage 2 complete. 592895 transforms, 1 modular inverses. Time: 5.134 sec. [Oct 15 14:14] Stage 2 GCD complete. Time: 0.051 sec. [Oct 15 14:14] 3^34961-1 completed 1 ECM curve, B1=50000, B2=5000000, We4: 01420FF3 [Oct 15 14:14] No work to do at the present time. Waiting. [/CODE] |
[QUOTE=error;498073]There is something funny with your save file.
This job runs just fine: [CODE] [Worker #1] ECM2=1,3,34961,-1,50000,5000000,1,4898604651034280,"2" [/CODE][/QUOTE] No, it crashes in the same way. [QUOTE] [CODE] [Oct 15 14:14] 3^34961-1 completed 1 ECM curve, B1=50000, B2=5000000, We4: 01420FF3 [/CODE][/QUOTE] The "We4" maybe indicates that you are using version 27 of mprime? For my run it is "Wg8". The letter indicates the version and the number indicates the platform. I am using the latest version 29 on Linux. [QUOTE] [CODE] [Oct 15 14:14] Stage 2 init complete. 18313 transforms, 1 modular inverses. Time: 0.215 sec. [/CODE] [/QUOTE] It's interesting that there is a different number of transforms. You get 18313, but in my runs (both times) it was 18626. Maybe this is a clue to the problem. Interestingly, 626 = 2 × 313; is that just a coincidence?
[QUOTE=GP2;498074]No, it crashes in the same way.
The "We4" indicates maybe that you are using version 27 of mprime? For my run it is "Wg8". The letter indicates the version and the number indicates the platform. I am using the latest version 29 on Linux. It's interesting that there is a different number of transforms. You get 18313, but in my runs (both times) it was 18626. Maybe this is a clue to the problem. Interestingly, 626 = 2 times 313, is that just a coincidence?[/QUOTE] Do you still have the savefile there? If you do, it will cause the crash. That happens to me too. I use 28.5 on Win7. |
[QUOTE=error;498077]Do you still have the savefile there? If you do, it will cause the crash. That happens to me too. I use 28.5 on Win7.[/QUOTE]
Yes, that's the problem. I used the same working directory with the savefile in it. It worked the second time, 18313 transforms and no factor found, like yours. But the savefile was written by mprime itself, so it's some kind of rare bug one way or another. And the original crash happened in the middle of a run, so resuming from the savefile merely reproduced that crash. In other words, I don't think it's because of corruption introduced by the routines that write the savefile. |
P-1 bounds selected are falling short of primenet values
I had thought that if I gave prime95 v29.4b8 plenty of RAM allocation, it would use bounds that satisfy primenet targets. But it seems it falls well short. See for example [URL]https://www.mersenne.ca/exponent/89200591[/URL] which was run with a memory limit of 16GB for the 4-core worker. [CODE]
            Limit   GHz-Days  Probability        B1          B2
Actual       2^76   171.5683     84.2649%   720,000  14,220,000
PrimeNet     2^71     5.3596     80.9970%   970,000  24,250,000
GPU72        2^75    85.7832     83.6597%   970,000  24,250,000
Difference    +1    +85.7851     +0.6052%  -250,000 -10,030,000
[/CODE] Reading through readme.txt and undoc.txt and searching for strings like primenet, bound, P-1 did not reveal a way to set it to use primenet bounds for P-1. Manually modifying the worktodo entries would be tedious and impractical since it's doing one each 7 hours or so. I'm running now with 32GB for the P-1 memory limit, since it looks like it was fully utilizing 16GB in stage 2. Is there a way to cause v29.4 to automatically go to the primenet bounds? Is satisfying primenet P-1 bounds something that will or could be included in v29.5 as an option? Is there some reason matching primenet bounds is at least sometimes a bad idea?
[QUOTE=kriesel;498476]Is there a way to cause v29.4 to automatically go to the primenet bounds?
Is there some reason matching primenet bounds is at least sometimes a bad idea?[/QUOTE]The bounds chosen for P-1 are to optimize throughput, where the amount of work spent doing P-1 is less than the amount of work saved by any factors found. What that means for bounds is that as TF is done to higher bounds the chance of finding a P-1 factor is slightly lower, therefore less time should be spent doing P-1. Since the advent of GPU-TF all exponents are TF'd 3-4 bitlevels higher than the old CPU-TF limits of Prime95 (which are largely irrelevant now).

The important thing to note in the table you copy-pasted is the "Combined Probability" column, which is the chance of finding a factor with both TF and P-1: [FONT="Courier New"]Prime95 CPU limits: TF:71, B1:970000, B2: 24250000 = 86.8439%
Actual work done: TF:76, B1:720000, B2: 14220000 = 87.3853%[/FONT]
Much more TF, somewhat less P-1, but overall a higher chance of finding a factor.

If you really truly want to get Prime95 to pick the higher bounds you can manually enter the bounds in worktodo (see undoc.txt for details), or you can adjust the bitlevel in your existing worktodo lines to a lower level, but just be aware that Prime95 automatically selects the optimal bounds and any changes you make will likely lower GIMPS overall throughput (even if you do find more factors).

edit: also note two things about the bounds on mersenne.ca:
1) the P-1 bounds in the GPU72 line are wrong, they are not adjusted as described above, I need to figure out how to do so.
2) the P-1 bounds in general are close to magic numbers, they're currently plucked off an Excel graph I did in 2008 based on observed bounds for exponents 1M-500M, so while they might be reasonably close to reality, they'll never match exactly.
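For what it's worth, the "more TF beats more P-1" effect can be sketched with the usual independence approximation (Python; the per-method probabilities below are made up for illustration, and the real accounting is more careful about factors findable by both methods):

```python
def combined_factor_probability(p_tf: float, p_pm1: float) -> float:
    """Chance of finding a factor by either TF or P-1, treating the two
    methods as (approximately) independent: 1 - (1-p_tf)*(1-p_pm1).
    A simplification of the real model, which avoids double-counting
    factors that both methods could find."""
    return 1 - (1 - p_tf) * (1 - p_pm1)

# Illustrative (invented) per-method probabilities: more TF with
# slightly less P-1 can still win overall.
deep_tf = combined_factor_probability(p_tf=0.83, p_pm1=0.03)
more_pm1 = combined_factor_probability(p_tf=0.79, p_pm1=0.05)
print(deep_tf > more_pm1)   # True
```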
[QUOTE=kriesel;498476]...
Is satisfying primenet P-1 bounds something that will or could be included in v29.5 as an option? Is there some reason matching primenet bounds is at least sometimes a bad idea?[/QUOTE]James beat me to it, but I'll leave my answer. As far as I know "PrimeNet" bounds are a concept used by mersenne.ca only. The Prime95 program calculates the bounds for each work-unit based on how deeply the exponent is factored, the number of LL tests saved and the available memory. The bounds are based on the assumption that the same machine will do the LL tests. Also I have always had a better success rate than that calculated by Prime95. Once you allow the program to use a reasonable quantity of memory it doesn't affect the bounds very much, though it will influence the Brent-Suyama extensions. In other words, at the moment, except by modifying the work-units in the worktodo.txt file there is nothing you can do. But you can ask George if he would consider introducing a fiddling factor for P-1 factoring. There is another way you can drastically change the bounds used by Prime95: change the number of LL tests saved, which by default is set at two. Jacob
[QUOTE=James Heinrich;498477]edit: also note two things about the bounds on mersenne.ca:
1) the P-1 bounds in the GPU72 line are wrong, they are not adjusted as described above, I need to figure out how to do so. 2) the P-1 bounds in general are close to magic numbers, they're currently plucked off an Excel graph I did in 2008 based on observed bounds for exponents 1M-500M, so while they might be reasonably close to reality, they'll never match exactly.[/QUOTE] Thanks for the explanation. I had thought that the red font used for negative numbers in the difference row meant that they were considered somehow insufficient. Similar questions had arisen in regard to CUDAPm1, where adjusting worktodo entries would be a little easier, since it is all manually assigned and manually reported, rather than sometimes being assigned and completed quicker than I would have time to intervene as on primenet-connected prime95. I've seen some manual P-1 work done in CUDAPm1 be marked as expired when the result was reported, and that seemed to occur when one or both of the bounds reported were lower than the primenet values at mersenne.ca. A recent example is [URL]https://www.mersenne.org/report_exponent/?exp_lo=200000797&full=1[/URL] [URL]https://www.mersenne.ca/exponent/200000797[/URL] A particularly interesting case is [URL]https://www.mersenne.ca/exponent/200000551[/URL], where I pushed the manual P-1 to runs saved = 3, and the TF and P-1 bounds all exceed the primenet values, yet the computed probability fell a bit short.
[QUOTE=James Heinrich;498477]1) the P-1 bounds in the GPU72 line are wrong, they are not adjusted as described above, I need to figure out how to do so.[/QUOTE]I transcribed the Prime95 P-1 bound estimator to PHP so I should now be able to get the same numbers as Prime95 (I'm going on the assumption that plenty of RAM is available, as in sufficient to do the whole P-1 in a single pass, for simplicity, so bounds may be [i]slightly[/i] higher than what you may get in real life). However the code is quite ponderous due to the (large) number of iterations it goes through to find the "optimal" solution... 66k iterations at 2[sup]76[/sup] and 116k iterations at 2[sup]71[/sup], which has a non-trivial runtime (several seconds) such that I wouldn't want to use it on a live webpage. Now you know why it takes forever to get an estimated completion date if you have many P-1 assignments in Prime95. I'll have to see if I can optimize it somewhat while remaining relatively accurate...
The numbers I got for [url=https://www.mersenne.ca/exponent/89200591]M89200591[/url]:[code]TF=71
    [b1] => 950000
    [b2] => 26600000
    [success_rate] => 0.059250926679056
TF=76
    [b1] => 725000
    [b2] => 14318750
    [success_rate] => 0.031280274205779[/code]
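For the curious, the shape of that brute-force bounds search looks roughly like this (a Python sketch; the probability and cost models here are invented stand-ins, not the real Prime95/PHP formulas, which use Dickman's function and measured transform costs):

```python
import math

def pm1_success_estimate(b1: int, b2: int) -> float:
    """Stand-in for the real P-1 probability model: just a crude
    increasing function of the bounds so the loop has something
    to optimize. The real model uses Dickman's function."""
    return 0.01 * math.log(b1) + 0.003 * math.log(b2 / b1)

def pm1_cost(b1: int, b2: int) -> float:
    """Stand-in cost model: stage 1 work scales with B1, stage 2 with B2."""
    return 1.44 * b1 + 0.06 * (b2 - b1)

def pick_bounds(b1_range, b2_mults):
    """Brute-force search for the bounds maximizing probability per
    unit cost: the shape of the optimizer described above. Coarser
    stepping in the two ranges is exactly what cuts the iteration
    count (e.g. from ~65k to ~1.5k)."""
    best, best_score = None, -1.0
    for b1 in b1_range:
        for m in b2_mults:           # B2 expressed as a multiple of B1
            b2 = m * b1
            score = pm1_success_estimate(b1, b2) / pm1_cost(b1, b2)
            if score > best_score:
                best, best_score = (b1, b2), score
    return best

b1, b2 = pick_bounds(range(100_000, 1_000_001, 25_000), range(10, 41, 2))
print(b1, b2)
```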
So I just realized two of my "higher end" machines (Haswell or so) are 32 bit windows 7 (don't laugh too much, I have some old stuff). Is that going to impact LL/PRP testing or should I redo them into 64 bit?
I saw a thread from 12 years ago that said it wouldn't affect it, so I thought I'd ask now.
1 Attachment(s)
[QUOTE=James Heinrich;498516]I'll have to see if I can optimize it somewhat while remaining relatively accurate...[/QUOTE]Indeed. By picking some more sensible starting, ending and step points in the loops I cut the iterations from 65000 to 1500, giving a much more sane runtime, while still getting more-or-less the same results.
Looking back at [url]https://www.mersenne.ca/exponent/89200591[/url] the table at the top should now show better values for P-1 on the GPU72 line, and more sensibly show that despite the lower P-1 bounds the overall factor probability was higher due to more TF.
[QUOTE=irowiki;498543]are 32 bit windows 7 ....Is that going to impact LL/PRP testing or should I redo them into 64 bit?
[/QUOTE] Redo. 10-20% gain
[QUOTE=Prime95;498574]Redo. 10-20% gain[/QUOTE]
Okay, for science I figured I'd post my results here. I redid the machine, and flipped it from Win7 32 bit to Win10 64 bit. It is an i3-3220 with 4 GB (2x2). It's doing about 9% better. Score!
1 Attachment(s)
Using latest version 29.4.b8 (Win64) on Skylake X platforms, I have noticed that in main thread after launching Prime95 there is:
[code]Optimizing for CPU architecture: Corei3/i5/i7, L2 cache size: [b]256 KB[/b], L3 cache size: 14080 KB[/code] Whereas L2 cache should be reported as 1MB :smile: In undoc.txt there is a section concerning L2 cache size, but it's said to relate to Pentium 4 architecture: [code]CpuL2CacheSize=128 or 256 or 512 CpuL2CacheLineSize=32 or 64 or 128 CpuL2SetAssociative=4 or 8[/code] After setting it to: [code]CpuL2CacheSize=1024 CpuL2CacheLineSize=64 CpuL2SetAssociative=16[/code] Program reports it properly: [code]Optimizing for CPU architecture: Corei3/i5/i7, L2 cache size: [b]1 MB[/b], L3 cache size: 14080 KB[/code] Question: should detection of L2 cache parameters affect Skylake X performance? After setting it to proper values I haven't noticed improvement in iteration times, at least for FMA3 FFT length of 864K. Another minor issue I have noticed is the representation of CPU information - the line below L1 cache information is unreadable (see attachment) :smile: |
Sorry. This is about the latest version. No thread, yet.
I switched to 295B4 in the past two days. I just noticed that Manual Communication and Unreserve Exponent menu items are greyed out. I tried commenting the GPU72 proxy out of prime.txt, with no change. Did I do something wrong? I don't think I've ever seen this, before. This persisted after program and system restarts. EDIT: Found the cause. In prime.txt I found 'UsePrimenet=0'. I guess this was a misstep in transferring the prime.txt information to the new version. Changed to '=1' and all is well. Maybe not. On contacting the server, I was told that the assignment key for a 60% done DC (482955xx) 'belongs to another user.' However, this did not remove the assignment from P95. What I see is that GPU72 is still aware of this assignment, but Primenet does not update progress. However, Primenet Exponent Status still lists me as the owner. |
[QUOTE=Cruelty;500586]Another minor issue I have noticed is the representation of CPU information - the line below L1 cache information is unreadable (see attachment) :smile:[/QUOTE]
Mine also cuts off the last line; is it because the AVX512F wraps to a new line?
I will widen the dialog box.
The L2 cache mis-detection is a known bug; I do plan to fix it. The bug does not impact performance.
[QUOTE=kladner;500591]
Maybe not. On contacting the server, I was told that the assignment key for a 60% done DC (482955xx) 'belongs to another user.' However, this did not remove the assignment from P95. What I see is that GPU72 is still aware of this assignment, but Primenet does not update progress. However, Primenet Exponent Status still lists me as the owner.[/QUOTE] Are you still listed with your ID in File -> Primenet? I've had some of my clients randomly switch to anon and I had to switch it back. |
[QUOTE=irowiki;500619]Are you still listed with your ID in File -> Primenet? I've had some of my clients randomly switch to anon and I had to switch it back.[/QUOTE]
Thank you! That was it. :smile: I filled in the info and everyone was happy. This has to have happened as one of a couple of my fumbles while making the upgrade to 295b4. Being sick and "medicated" is not the best time to juggle files.
Well, I give up
I have tried every combination of the parameters that control hyperthreading & cores to get 1 worker using all cores with hyperthreading, to no avail.
I have tried 3 different machines. Yes, I scrub the directory clean between attempts (except local.txt) so it looks like a new install. Can some kind soul please clue me in? TIA
[QUOTE=tServo;500853]I have tried every combination of the params used to control hyperthreading & cores to get 1 worker using all cores with hyperthreading to no avail.
I have tried 3 different machines. Can some kind soul please clue me in? TIA[/QUOTE] Windows? Linux?
[QUOTE=tServo;500853]Can some kind soul please clue me in?[/QUOTE]
Memory bandwidth limited? The toy may actually [U]be[/U] using all cores, but each has its own idle time because it can't get/send the data in time. What does Task Manager look like: are all cores busy at less than 100 percent, or are some cores busy at 100% and the rest at 0%?
[QUOTE=tServo;500853]I have tried every combination of the params used to control hyperthreading & cores to get 1 worker using all cores with hyperthreading to no avail.
I have tried 3 different machines. Yes, I scrub the directory clean between attempts ( except local.txt ) so it looks like a new install. Can some kind soul please clue me in? TIA[/QUOTE] There are at least 2 options in local.txt, assuming you have 4 physical cores and 8 threads with hyperthreading: [CODE]WorkerThreads=1
CoresPerTest=4
HyperthreadLL=1[/CODE] or [CODE]WorkerThreads=1
ThreadsPerTest=8[/CODE]
I was curious about this, so I set up an 8 core, 1 worker on the test version I keep. I mostly use this instance for torture testing. It is v295b4, like the working copy.
After creating the setup, I was too lazy to set up a worktodo.txt, so I ran a few seconds of Blend TT. When I checked back in Worker Windows, it had reverted to the 4 core settings, even though 'Use multi-threading' was still checked. I haven't tried anything more, but the result sounded similar to yours: with 8 hyper-threaded cores, one-worker settings revert to 4 cores, 1 worker.
Prime95 v29.4 b8
I see binaries for version b8 (e.g., [URL="http://www.mersenne.org/ftp_root/gimps/p95v294b8.win64.zip"]Windows: 64-bit 29.4b8 - Released on: 2018-02-09[/URL]), but there is only source code available up to b7. How can I get the source for b8?
[QUOTE=mikegold10;500918]I see binaries for version b8 (e.g., [URL="http://www.mersenne.org/ftp_root/gimps/p95v294b8.win64.zip"]Windows: 64-bit 29.4b8 - Released on: 2018-02-09[/URL]), but there is only source code available up to b7. How can I get the source for b8?[/QUOTE]
You can't. It's been a long while; I suspect I did not upload new source because the difference was very minor.
[QUOTE=ATH;500873]There are at least 2 options, in local.txt assuming you have 4 physical cores and 8 threads with hyperthreading:
WorkerThreads=1 CoresPerTest=4 HyperthreadLL=1 OR WorkerThreads=1 ThreadsPerTest=8[/QUOTE] Thanks ATH! The first suggestion worked, so I did not try the second. To the others (irowiki & LaurV): running Windoze. I am running memory bandwidth tests, LaurV; I'll report on these later.
[QUOTE=kladner;500917]I was curious about this, so I set up an 8 core, 1 worker on the test version I keep. I mostly use this instance for torture testing. It is v295b4, like the working copy.
After creating the setup, I was too lazy to set up a worktodo.txt, so I ran a few seconds of Blend TT. When I checked back in Worker Windows, it had reverted to the 4 core settings, even though 'Use multi-threading' was still checked. I haven't tried anything more, but the result sounded similar to yours. 8 hyper-threaded cores, one worker settings revert to 4 cores, 1 worker.[/QUOTE] I don't remember when (it was a recent version), prime95 Worker Windows dialog box changed from a "thread-centric" view to a "physical core-centric" view. Take your 4 core hyperthreaded machine. The old prime95 let you allocate up to 8 threads among your workers. If you were smart enough to only assign 4 threads, then prime95 was smart enough to use different physical cores for each thread. The new prime95 interface lets you assign up to 4 physical cores to your workers with a separate checkbox for "use hyperthreading". I think the new scheme is easier to understand, but the transition period may be confusing. |
[QUOTE=Prime95;501012]I think the new scheme is easier to understand, but the transition period may be confusing.[/QUOTE]
+1
PRP Roundoff
I am getting the following in each output line:
[CODE]Possible hardware errors have occurred during the test! 1 ROUNDOFF > 0.4
Confidence in final result is excellent.[/CODE] When this happens, the worktodo file is modified for the next assignment. "FFT2=336K" is placed directly behind the AID number. Only the next assignment, not all. This is happening on my older HP Z220 workstation: i5-3570 with 4 GB of RAM and Windows 7 Ultimate. I suppose I could go into the worktodo file and replicate the FFT size for each assignment since they are all relatively the same size. Is there a workaround for this, or is it something I need not be concerned with? I searched the documentation and did not see anything relevant.
[QUOTE=storm5510;501974]I am getting the following in each output line: ... Is there a workaround for this, or is it something I need not be concerned with?[/QUOTE] No, this is all routine. The program is making an adjustment on the fly for one exponent. Don't edit your worktodo file to add [c]FFT2=[/c] fields; the program will do that automatically if warranted, on a case-by-case basis.
You might want to de-dust your case interior if it has been a while. A "possible hardware error" makes me consider possible remedies in case there was an actual computation error.
[QUOTE=VBCurtis;502001]You might want to de-dust your case interior if it has been a while. A "possible hardware error" makes me consider possible remedies in case there was an actual computation error.[/QUOTE]
With beta version 29.5 I am getting more frequent and routine "possible hardware error" messages for PRP testing. Say, one or two ROUNDOFF > 0.4 over the course of a PRP test. This is on AWS servers with ECC memory that have a spotless track record, so it can't be an actual hardware issue. We are more confident about Gerbicz error checking now, so it doesn't matter so much anymore if a lower FFT produces a very occasional higher roundoff error in a PRP test, since it's easily caught and corrected. Perhaps the FFT limits are calculated differently now, for greater speed by default, since there is no longer as much need to be on the safe side.
prime95 P-1 and memory limits
On one system (16-core e5-2670 CPU with 128GB), prime95 v29.4b8, 4 workers, I had no problem setting P-1/ECM memory limits to 32000MB.
On another (2-core i7-7500U CPU with 16GB), prime95 v29.4b8, 2 workers, the program limited the P-1/ECM memory setting to 7281MB max. Why was a limit hit in one case but not the other? undoc.txt says "Since P-1 stage 2 runs faster with more memory available you can have the program only run stage 2 at night when more memory is available." On the i7 system above, I found going from 1600MB to 7200MB [B]increased[/B] all candidate large exponents' estimated P-1 times significantly (44 to 103%). Maybe at these large exponents it's starved for RAM at the lower settings and uses much lower bounds, with a lower probability of finding a factor, but goes further at the larger memory setting?
[QUOTE=kriesel;502007]I found going from 1600MB to 7200MB [B]increased[/B] ... estimated P-1 times significantly.[quote=undoc.txt]"... P-1 stage 2 runs faster with more memory available..." [color=red][b]*[/b][/color][/quote][/QUOTE][color=red][b]*[/b][/color] True, up to a point, assuming the bounds stay constant. If possible, Prime95 will do the entire test in one pass, but that takes a fair bit of RAM (~20GB or so these days). If there isn't that much RAM available then it will do it in multiple passes, at the expense of a small amount of overhead for each pass; the fewer passes you do, the less overhead and the greater the throughput. You can see approximate RAM requirements for a P-1 assignment on my [url=https://www.mersenne.ca/prob.php]P-1 probability page[/url].
However, the magic of picking appropriate bounds takes many things into account, including factor probability vs runtime, compounded with the available RAM (which influences the runtime at given bounds). So if you increase the available RAM it could do one of two things: a) run at the same bounds but faster, or b) run at higher bounds, with greater factor probability, but the same or a longer runtime. It's an unpleasantly iterative task to calculate the appropriate bounds for the available RAM on the current assignment, but you can be assured that Prime95 tries to select bounds that provide the highest project throughput. |
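A toy cost model of the multi-pass tradeoff described above, at fixed bounds (all figures are hypothetical and made up for illustration; the real planner also varies the bounds themselves, the buffer sizes depend on FFT length, and the constants here are invented):

```python
import math

# Hypothetical illustration of "fewer passes -> less overhead" at FIXED bounds.
# buffers_needed, buffer_mb, and both cost constants are invented numbers.
def stage2_cost(ram_mb, buffers_needed=480, buffer_mb=40,
                cost_per_buffer=1.0, overhead_per_pass=25.0):
    buffers_per_pass = max(1, ram_mb // buffer_mb)        # how many fit in RAM
    passes = math.ceil(buffers_needed / buffers_per_pass)
    return buffers_needed * cost_per_buffer + passes * overhead_per_pass

print(stage2_cost(1600))    # 780.0 -- 12 passes
print(stage2_cost(7200))    # 555.0 -- 3 passes
print(stage2_cost(20000))   # 505.0 -- everything fits in one pass
```

Under this model the total work term is constant and only the per-pass overhead shrinks with more RAM, which matches the "up to a point" caveat: beyond a single pass, extra RAM buys nothing unless the bounds are raised.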
[QUOTE=GP2;502006]Perhaps the FFT limits are calculated differently now, for greater speed by default, [/QUOTE]
Yes, in the last 29.5 beta FFT crossovers are calculated differently for AVX2 FFTs (a.k.a. FMA FFTs). FMA has slightly better roundoff behavior than plain-old AVX FFTs. So, I made a slight adjustment higher in FFT crossovers for FMA FFTs. |
[QUOTE=kriesel;502007]On one system (16-core e5-2670 CPU with 128GB), prime95 v29.4b8, 4 workers, I had no problem setting P-1/ECM memory limits to 32000MB.
On another (2-core i7-7500U cpu with 16GB), prime95 v29.4b8, 2 workers, limited P-1/ECM memory limits to 7281MB max. Why a limit hit in one case, that was not an issue in the other?[/QUOTE] But a 16GB computer cannot allocate 32000MB of memory. |
[QUOTE=GP2;501986]No, this is all routine.
The program is making an adjustment on the fly for one exponent. Don't edit your worktodo file to add [c]FFT2=[/c] fields, the program will do that automatically if warranted on a case-by-case basis.[/QUOTE] This is all I needed to know. Thank you! :smile: [QUOTE=VBCurtis]You might want to de-dust your case interior if it has been a while. A "possible hardware error" makes me consider possible remedies in case there was an actual computation error [/QUOTE] I was in there just recently and I didn't notice anything. However, I will look again in more detail. :smile: |
I found some added material in the [I]results.txt[/I] file:
[CODE]Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.[/CODE] |
[QUOTE=GP2;502022]But a 16GB computer cannot allocate 32000MB of memory.[/QUOTE]
Well, I knew that it could not allocate nearly double its physical RAM, so never considered asking it to. It's unclear why it would not allow the 8000MB I attempted, which would have been the same fraction allocated, if applied to all workers. (But both instances are only running P-1 on one worker, no ECM.) 128GB RAM on one Windows 7 system, with 4 workers, allowed 32000MB, very close to 1/4 of all RAM, i.e. 1/(number of workers): 32000MB*4/(128*1024 MB) =~ 0.9766. 16GB RAM on a Windows 10 system, with 2 workers, refused 8000MB and said 7281 max: 7281MB*2/(16*1024) =~ 0.8888, while 8000MB*2/(16*1024) =~ 0.9766. The question remains: why the odd 7281 MB figure? |
Probably Windows needs ~2-3GB for itself, regardless of how much RAM the system has. So a system with more RAM has a larger proportion available for applications to use.
And Windows 10 probably needs more memory than Windows 7. Chris |
[QUOTE=kriesel;502210]
16GB ram on a Windows 10 system, with 2 workers, refused 8000MB, said 7281 max. 7281MB*2/(16*1024) =~0.8888, while 8000MB*2/(16*1024)=~0.9766. The question remains, why the odd 7281 MB figure?[/QUOTE] My bad, I remembered the Win10 system wrong: it's 16GB-[U][I]CAPABLE[/I][/U], with 8GB [U]installed[/U] currently. Win10 and the rest of prime95 v29.4b8, including a primality-test worker, are getting by in the remaining 8192 - 7281 = 911 MB minimum. It's currently using 6531 MB of the 7281 MB in a P-1 run. |
CPU usage, Intel i9-7900X @3.3GHz
I'm on Windows 7 Pro. My CPU has 10 physical cores, 20 logical cores and 64GB of ram.
When I run single-threaded programs I can run 20 of them at 'full speed' which is reflected by the CPU usage being shown as 100% in task manager. When I run Prime95 and set everything as it recommends (and using 32GB of ram) it decides on 6 workers and task manager shows 65% usage, ie 13 threads. I tried setting number of CPU cores etc in local.txt: NumCPUs=10 CpuNumHyperthreads=2 CpuSpeed=3300 ...makes no difference. If you try to change to 7 workers it grumbles then if you force it, it drops to around 30% CPU usage. Is there any way to get it to fully utilise the 20 threads? Also the completion estimates are way, way out. Every minute or so it gives a new estimate for each worker, and that estimate drops by between 7 and 20 minutes each time. Will this be fixed after the program runs a benchmark over the next day or so? |
Try with:
WorkerThreads=5
CoresPerTest=2
HyperthreadLL=1
or if you want fewer workers:
WorkerThreads=2
CoresPerTest=5
HyperthreadLL=1
But it might be faster to disable hyperthreading in the BIOS; it is very rare that HT is faster in Prime95, because everything is capped by the RAM speed. HT is designed for "normal" tasks like browsing, word processing etc., where each thread does not utilize the core fully, so another thread can use it some of the time. |
[QUOTE=theonetruepath;503651]I'm on Windows 7 Pro. My CPU has 10 physical cores, 20 logical cores and 64GB of ram.
When I run single-threaded programs I can run 20 of them at 'full speed' which is reflected by the CPU usage being shown as 100% in task manager. When I run Prime95 and set everything as it recommends (and using 32GB of ram) it decides on 6 workers and task manager shows 65% usage, ie 13 threads. I tried setting number of CPU cores etc in local.txt: NumCPUs=10 CpuNumHyperthreads=2 CpuSpeed=3300 ...makes no difference. If you try to change to 7 workers it grumbles then if you force it, it drops to around 30% CPU usage. Is there any way to get it to fully utilise the 20 threads? Also the completion estimates are way, way out. Every minute or so it gives a new estimate for each worker, and that estimate drops by between 7 and 20 minutes each time. Will this be fixed after the program runs a benchmark over the next day or so?[/QUOTE] There are a number of things to consider: 32 vs 64 bit OS, 32 vs 64 bit CPU registers, \(2^{32}\) vs \(2^{64}\) addressable memory locations, memory speed, memory channels, memory bus width, memory rank, instructions per memory clock, and more. All these go into formulae for the theoretical maximum operation bandwidth of the CPU or memory. If memory burst-rate bandwidth is less than CPU output, then you are memory bound. At 1 byte per address, a 32-bit register can only address 4 GiB of memory; a 64-bit register can address 16 EiB of memory in theory. If each CPU core (with 1 register for addressing) is working at 3.3 GHz, then in theory a 32-bit core can put 13.2 GB into memory per second and a 64-bit core 26.4 GB per second. This all assumes 1 operation per clock, etc. So without proper information on your setup, people can only guess at possible best-throughput scenarios. |
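The per-core figures quoted above follow directly from the stated assumption of one address-width transfer per clock (note that at these values the unit is decimal GB rather than GiB):

```python
# Back-of-envelope per-core store bandwidth: one register-width write per clock.
clock_hz = 3.3e9                                    # 3.3 GHz core clock
rates = {bits: clock_hz * (bits // 8) / 1e9         # bytes/s -> GB/s
         for bits in (32, 64)}
print(rates)   # 32-bit -> 13.2 GB/s, 64-bit -> 26.4 GB/s
```

Real sustained bandwidth is of course set by the memory controller and DRAM, not the core clock, which is exactly why the post concludes you end up memory bound.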
As ATH said, your system is probably memory bandwidth bound trying to run 10 cores, unless you have DDR4-3200 or higher. If you have stock speed memory at 2666, it's probably no slower to run only 8 cores.
With regards to hyperthreading, it offers almost no benefit and is often a detriment to Prime95. Prime95 is finely tuned assembly capable of saturating a CPU core: it doesn't leave execution gaps that another thread could easily fill. If it's a desktop system, I would still leave hyperthreading enabled in the BIOS. Prime95 is an excellent way to fill the execution gaps left by desktop apps, and the hyperthreading makes those apps more responsive when running Prime95. |
[QUOTE=theonetruepath;503651]I'm on Windows 7 Pro. My CPU has 10 physical cores, 20 logical cores and 64GB of ram.
When I run single-threaded programs I can run 20 of them at 'full speed' which is reflected by the CPU usage being shown as 100% in task manager. [/QUOTE] Run version 29.5 -- it makes use of the AVX-512 instructions (about 15% faster). You'll have to dig through the 29.5 thread to find the link to the Windows executable. Your goal is to get to 50% CPU usage -- hyperthreading will slow down prime95. You can leave hyperthreading on in the BIOS. |
another thing to point out is there is a difference between multithreading and hyperthreading
[YOUTUBE]7ENFeb-J75k[/YOUTUBE] |
If you leave Hyperthreading on like George and Mark suggest then remove the HyperthreadLL=1 line from local.txt or use HyperthreadLL=0.
Then use a combination of WorkerThreads= and CoresPerTest= that multiplies to 10, like 1/10, 2/5, 5/2 or 10/1. Then you have to contend with the fact that Prime95's CPU usage will show as 50%, but it will be running as fast as it can. The other 50% is the 10 "virtual" threads, 1 for each core, that are not currently working, while the other 10 threads are using the 10 cores fully. Yes, it is a stupid system imo as well; CPU usage should show the actual usage of the physical cores, not some virtual threads that do not matter... which is why I have disabled HT in my BIOS. |
[QUOTE=ATH;503662]
But it might be faster to disable hyperthreading in the BIOS, it is very rare that HT is faster in Prime95 because everything is capped by the RAM speed. [/QUOTE] If the system is dedicated to Prime95 I would agree. For a system that sees other uses, HT still helps. When I have HT disabled and play a game or encode videos, my ms/iter time in Prime95 goes up by 500-600%. With HT enabled, Prime is only about 10-15% slower on my system when running those tasks. |
[QUOTE=NookieN;503712]When I have HT disabled and play a game or encode videos, my ms/iter time in Prime95 goes up by 500-600%.[/QUOTE]
That's exactly by design. Every other program on your computer gets priority over Prime95. [QUOTE=NookieN;503712]With HT enabled, Prime is only about 10-15% slower on my system when running those tasks.[/QUOTE] OK, that suggests your games and your video encoders are not terribly optimized. But an important thing to take into consideration is Prime95 /can/ slow down the responsiveness of your other apps. Every "use case" is different. Do "what makes sense" for you. |
[QUOTE=chalsall;503718]
OK, that suggests your games and your video encoders are not terribly optimized. But an important thing to take into consideration is Prime95 /can/ slow down the responsiveness of your other apps. Every "use case" is different. Do "what makes sense" for you.[/QUOTE] Yeah that's a good point, the encoding likely is slower when Prime is running. I agree that it really depends on your use cases, and that was my point. I would not encourage someone to just turn off HT unless they know that it is in fact detrimental to their intended workload. |
[QUOTE=NookieN;503729]I would not encourage someone to just turn off HT unless they know that it is in fact detrimental to their intended workload.[/QUOTE]
Sigh... That is the exact opposite of what I intended to communicate. |
[QUOTE=Prime95;503669]Run version 29.5 -- it makes use of the AVX-512 instructions (about 15% faster). You'll have to dig through the 29.5 thread to find the link to the Windows executable.
Your goal is to get to 50% CPU usage -- hyperthreading will slow down prime95. You can leave hyperthreading on in the BIOS.[/QUOTE] OK from all the answers it sounds like Prime95 is probably doing just fine at configuring itself on the defaults, and 65% CPU is not low at all. Sounds like a good idea to shift to 29.5 if it goes 15% faster. ms/iter for workers 1 through 6 is averaging 8.3 8.2 5.8 23 23 23 My RAM is rated at 3000MHz, who knows what it's clocked at though. ETAs appear to be the output of a high quality random number generator. |
[QUOTE=theonetruepath;503732]My RAM is rated at 3000MHz, who knows what it's clocked at though.[/QUOTE]Tools like [url=https://www.cpuid.com/softwares/cpu-z.html]CPU-Z[/url] can tell you. Note that some tools, CPU-Z included, report the clockspeed, not the "double-data-rate" (hence the term DDR) transfers-per-second, so if your RAM was running at its rated speed it would report as 1500MHz in CPU-Z.
|
[QUOTE=James Heinrich;503733]Tools like [url=https://www.cpuid.com/softwares/cpu-z.html]CPU-Z[/url] can tell you. Note that some tools, CPU-Z included, report the clockspeed, not the "double-data-rate" (hence the term DDR) transfers-per-second, so if your RAM was running at its rated speed it would report as 1500MHz in CPU-Z.[/QUOTE]
Looks like my RAM is underclocked: CPU-Z says 1067 instead of 1500. Might have to dig into the BIOS settings. I tried Prime95 29.5 build 5 and the subsequent "special build for someone"; both crash immediately. I'll wait for the final version there. |
[QUOTE=theonetruepath;503738]Looks like my RAM is underclocked, CPU-Z says 1067 instead of 1500. Might have to dig into the BIOS settings.
[/QUOTE] RAM now running at 1500MHz. Prime95 29.4 now reports, for workers 1 through 6 (cores / ms per iter): 3/6.24, 3/6.6, 4/5.1, 1/19.35, 1/19.4, 1/19.4. For some reason I'm trying to use 13 cores, which is why task manager says I'm at 65%. So should I reduce this to 10 for better throughput? |
[QUOTE=theonetruepath;503738]Looks like my RAM is underclocked, CPU-Z says 1067 instead of 1500. Might have to dig into the BIOS settings.[/QUOTE]Assuming your CPU is the [url=https://ark.intel.com/products/123613]i9-7900X[/url], note that the max (officially) supported RAM speed is DDR4-2666. Having higher speed-rated RAM isn't a bad thing though, as you may be able to run it at lower timings and that could have as much (or more) benefit as the higher clock speed. On a supported motherboard, the best available RAM settings are likely listed under an "[URL="https://en.wikipedia.org/wiki/Serial_presence_detect#XMP"]XMP[/URL]" profile -- again consult CPU-Z under the "SPD" tab to see available timing specs for your specific RAM.
|
2 Attachment(s)
[QUOTE=James Heinrich;503743]Assuming your CPU is the [url=https://ark.intel.com/products/123613]i9-7900X[/url], note that the max (officially) supported RAM speed is DDR4-2666. Having higher speed-rated RAM isn't a bad thing though, as you may be able to run it at lower timings and that could have as much (or more) benefit as the higher clock speed. On a supported motherboard, the best available RAM settings are likely listed under an "[URL="https://en.wikipedia.org/wiki/Serial_presence_detect#XMP"]XMP[/URL]" profile -- again consult CPU-Z under the "SPD" tab to see available timing specs for your specific RAM.[/QUOTE]
Yup I turned on the XMP profile. Seems stable so far. Mobo is a Gigabyte Aorus Gaming 9 v1. |
[QUOTE=theonetruepath;503742]For some reason I'm trying to use 13 cores, which is why task manager says I'm at 65%
So should I reduce this to 10 for better throughput?[/QUOTE] Absolutely. |
[QUOTE=Prime95;503747]Absolutely.[/QUOTE]
OK, I reduced to three workers. Timings for the three workers are now (cores / ms per iter): 3/4.8, 3/4.8, 4/3.5. So the overall average time per iteration drops from around 12.2 (13 cores) to around 4.4 (10 cores). I'm assuming I can just average these since they are all "10000 iteration" times. You aren't kidding when you say it slows down if using extra cores... Any idea why version 29.5 crashes on my system - is there some diagnostic info I can scrape together for you? |
[QUOTE=theonetruepath;503749]
I'm assuming I can just average these since they are all "10000 iteration" times. [/QUOTE] Heh, of course you can't. So I took an arbitrary time period, worked out how many iterations each worker contributed in that time, then worked out the net msecs per iter for the system:
13 cores is 1.62 ms/iter
10 cores is 1.42 ms/iter
Maybe the program could work this out from time to time... |
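For anyone wanting to do this without a stopwatch: since the workers run concurrently, per-worker iteration times combine like parallel resistors; sum the iteration rates and invert. Using the per-worker figures quoted earlier in the thread:

```python
# Aggregate throughput of concurrent workers: add iterations/ms, then invert.
def net_ms_per_iter(worker_times_ms):
    return 1.0 / sum(1.0 / t for t in worker_times_ms)

print(round(net_ms_per_iter([4.8, 4.8, 3.5]), 2))
# 1.42 -- matches the measured 10-core figure
print(round(net_ms_per_iter([6.24, 6.6, 5.1, 19.35, 19.4, 19.4]), 2))
# ~1.51 -- in the ballpark of the measured 1.62 for the 13-core config
```

The small gap on the 13-core config is expected: the stopwatch measurement includes wall-clock pauses that per-10000-iteration timings don't.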
[QUOTE=theonetruepath;503751]Heh of course you can't. So I took an arbitrary time period and worked out how many iterations each worker contributed in that time, then worked out net msecs per iter for the system
13 cores is 1.62 ms/iter 10 cores is 1.42 ms/iter Maybe the program could work this out from time to time...[/QUOTE] It does. Consider it a homework assignment to find the benchmark tools, run such a benchmark, and choose whichever combination of workers and threads per worker maximizes iterations per second. It may be that the first time you ran it, you had something else running that confused the testing process and left P95 with that 13-core choice. |
[QUOTE=VBCurtis;503756]It does. Consider it a homework assignment to find the benchmark tools, run such a benchmark, and choose whichever combination of workers and threads per worker maximizes iterations per second.
It may be that the first time you ran it, you had something else running that confused the testing process and left P95 with that 13-core choice.[/QUOTE] I had a few tiling programs flogging 4 threads yes. So apparently 1 worker with 10 cores is quickest... Down to 1.28 ms/iter now |
[QUOTE=theonetruepath;503749]
Any idea why version 29.5 crashes on my system - is there some diagnostic info I can scrape together for you?[/QUOTE] See [url]https://www.mersenneforum.org/showpost.php?p=502140&postcount=99[/url] |
[QUOTE=Prime95;503887]See [url]https://www.mersenneforum.org/showpost.php?p=502140&postcount=99[/url][/QUOTE]
Yup I tried that one as well. Both versions ask the initial questions if firing up in a 'clean' directory, but crash immediately when finished with those, and immediately if running subsequently but without those initial questions. |
[QUOTE=theonetruepath;503901]Yup I tried that one as well. Both versions ask the initial questions if firing up in a 'clean' directory, but crash immediately when finished with those, and immediately if running subsequently but without those initial questions.[/QUOTE]
If you add "CPUSupportsAVX512F=0" to local.txt, does it still crash? |
[QUOTE=Prime95;503908]If you add "CPUSupportsAVX512F=0" to local.txt, does it still crash?[/QUOTE]
That stops it crashing. Seems to run slower than the older version though. |
[QUOTE=theonetruepath;503913]That stops it crashing. Seems to run slower than the older version though.[/QUOTE]
That should make it essentially equivalent to 29.4. The question is why does your machine crash trying to use AVX-512 instructions. |
[QUOTE=Prime95;503917]That should make it essentially equivalent to 29.4. The question is why does your machine crash trying to use AVX-512 instructions.[/QUOTE]
Might be something to do with this. (I'm running Windows 7 Pro) ============================ Note that an operating system (OS) needs to support the new extended CPU state (OS-XSAVE) to allow software to use new instruction sets like AVX, AVX2/FMA, AVX512*, etc. Windows 7, 8.x, 10.x or later / Server 2008/R2, 2012/R2, 2016 is required. * AVX512 and later are/will only supported by Windows 10.x / Server 2016. Older operating systems like Windows XP, Windows Vista and earlier have not been updated and are not likely to be. |
Here is what y-cruncher says:
====================== Performance Warning: This processor supports AVX512 instructions. However, it cannot be used because the operating system has not enabled it. This could be due to either of the following reasons: - The operating system is too old and does not support AVX512. - AVX512 has been explicitly disabled. To achieve maximum performance, AVX512 must be enabled in the OS. ====================== So I'm hoping Microsoft will come to the party with a hotfix to enable AVX-512 in Windows 7... |
Yeah. I think P95 could be improved by testing that the instructions can be executed, instead of assuming they can be executed by only reading the feature bits. Indeed there is no need to read the feature bits at all, just test the instructions and catch any exceptions. That way you test the entire path: CPU, OS, VM, etc.; all the things that might not have support for recent instructions.
|
[QUOTE=retina;504002]Yeah. I think P95 could be improved by testing that the instructions can be executed, instead of assuming they can be executed by only reading the feature bits. .[/QUOTE]
Will do |
[QUOTE=theonetruepath;504000]So I'm hoping Microsoft will come to the party with a hotfix to enable AVX-512 in Windows 7...[/QUOTE]
I think that is a forlorn hope. "Mainstream support" for Windows 7 ended in January 2015, and it is now on "extended support" until January 2020. "Extended support", in Microsoft-speak, means they aren't adding any new features. |
In case it hasn't already been reported: I noticed that while performing Fermat ECM work, the program will stop the worker to perform needed benchmarks, but then not perform any, and immediately restart the worker.
|
[QUOTE=GP2;504014]I think that is a forlorn hope.
"Mainstream support" for Windows 7 ended in January 2015, and it is now on "extended support" until January 2020. "Extended support", in Microsoft-speak, means they aren't adding any new features.[/QUOTE] When Microsoft proclaimed Windows X forever, they erred by 3. |
Hi, I just noticed that the PRP test in prime95/mprime seems to be broken since version 29.4 (at least on Linux 64 Bit).
Test cases for the worktodo.txt (all should be prime):
[CODE]PRP=1,10,19,-1,"9"
PRP=1,10,23,-1,"9"
PRP=1,10,317,-1,"9"
PRP=1,10,1031,-1,"9"
PRP=1,10,49081,-1,"9"
PRP=1,10,86453,-1,"9"
PRP=1,10,109297,-1,"9"
PRP=1,10,270343,-1,"9"[/CODE]
The latest 29.3 works fine; 29.4b8 and 29.5b5 give composites as results, with the following example output:
[CODE]10^270343-1/9 is not prime. Type-5 RES64: 8E38E38E38E38E39[/CODE]
My guess is an off-by-one error in the iteration count. |
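The smaller worktodo cases above can be cross-checked in a few lines with a plain base-3 Fermat PRP test (pure Python; fine up to p=1031, hopeless for p=270343):

```python
# PRP=1,10,p,-1,"9" asks prime95 to PRP-test the repunit (10^p - 1)/9.
def repunit_is_prp(p, a=3):
    N = (10**p - 1) // 9
    return pow(a, N - 1, N) == 1   # Fermat test, base a

for p in (19, 23, 317, 1031):
    print(p, repunit_is_prp(p))    # all True -- these repunits are prime
```

Since the listed exponents correspond to known repunit primes, any build that reports them composite is in error, as the post concludes.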
[QUOTE=MrRepunit;504151]Hi, I just noticed that the PRP test in prime95/mprime seems to be broken since version 29.4 (at least on Linux 64 Bit).
[/QUOTE] Type 5 residues don't work in 29.4, except for Wagstaff (and for Mersenne, type 1 and type 5 are the same thing). I reported the problem a while ago. Type 1 residues do work, and since there is no Gerbicz error checking for repunits other than Mersenne (b=2) and Wagstaff (b=−2), there is no advantage to using type 5 for any bases other than Mersenne and Wagstaff. You can get type 1 residues with PRP base = 3 by specifying, for example:
[CODE]PRP=1,10,19,-1,"9",99,0,3,1[/CODE]
For type 5 you can change the [c],99,0,3,1[/c] to [c],99,0,3,5[/c] or just omit it entirely. The drawback is that the numerical value of a non-zero type-1 residue will be entirely different from the non-zero type-5 residue for the same composite Mersenne number. So you can't mix and match results for the two residue types. In 29.4, for the exponent 19 I got the type 5 residue 0x7B5BAD595E238E39, which is 8888888888888888889 in decimal. By reverse engineering, I figured out what mprime 29.4 is actually calculating when it calculates residues:
[CODE]zero = mpz(0)
one = mpz(1)
minus_two = mpz(-2)
b = mpz(args.repunit)    # 10
a = mpz(args.prp_base)   # 3
t = args.bit_length      # 64
pow2_t = mpz(1 << t)

if b > zero:
    mp_numer = b**p - one
    mp_denom = b - one
    mp_ratio = mp_numer // mp_denom
    if args.residue_type == 1:
        res = pow(a, mp_ratio - one, mp_ratio) % pow2_t
    elif args.residue_type == 5:
        res = pow(a, mp_ratio - one, mp_numer) % pow2_t
else:
    mp_numer = (-b)**p + one
    mp_denom = -b + one
    mp_ratio = mp_numer // mp_denom
    if args.residue_type == 1:
        res = pow(a, mp_ratio - one, mp_ratio) % pow2_t
    elif args.residue_type == 5 and b != minus_two:
        res = pow(a, mp_ratio - one, mp_numer) % pow2_t
    elif args.residue_type == 5 and b == minus_two:
        res = pow(a, mp_numer - one, mp_numer) % pow2_t[/CODE] |
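The reverse-engineered type-5 formula can be checked against the reported p=19 residue, translated to plain Python (no gmpy2 needed at this size):

```python
# Type-5 residue per the formula above: a^(((b^p-1)/(b-1)) - 1) mod (b^p - 1),
# truncated to 64 bits. For b=10, p=19, a=3 this reproduces the reported
# 0x7B5BAD595E238E39 = 8888888888888888889.
b, p, a = 10, 19, 3
mp_numer = b**p - 1                       # 10^19 - 1
mp_ratio = mp_numer // (b - 1)            # the repunit R19, which is prime
res = pow(a, mp_ratio - 1, mp_numer) % (1 << 64)
print(res, hex(res))                      # 8888888888888888889 0x7b5bad595e238e39
```

Sanity check: since R19 is prime, Fermat gives 3^(R19−1) ≡ 1 (mod R19), and working mod 9·R19 forces the residue to 8·R19 + 1, which is exactly the decimal value above.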
Hmmm. While doing a 50M double check, I just got a Jacobi error. After it was finished checking through backup files, it reported the chances of a good test as "fair".
The test is 50% done. After I investigate and decide the hardware is fixed, should I start the test over and lose 5 days for sure, or should I let it finish and lose 10 days, but only if the test turns out to be bad? |
[QUOTE=PhilF;504168]Hmmm. While doing a 50M double check, I just got a Jacobi error. After it was finished checking through backup files, it reported the chances of a good test as "fair".
The test is 50% done. After I investigate and decide the hardware is fixed, should I start the test over and lose 5 days for sure, or should I let it finish and lose 10 days, but only if the test turns out to be bad?[/QUOTE] With one error, I'd let it run. If you see multiple errors in a run, I'd switch the machine to PRP work. |
That is what DCs are for: checking on machines.
|
[QUOTE=Mark Rose;504170] I'd switch the machine to PRP work.[/QUOTE]
Switch the machine to PRP testing for all future tests. |
[QUOTE=PhilF;504168]The test is 50% done.<...> should I start the test over and lose 5 days for sure, or should I let it finish and lose 10 days, but only if the test turns out to be bad?[/QUOTE]
The "confidence" of the check is 50%, so your chances are equal either way: a 100% chance of losing 5 days, or a 50% chance of losing 10 days, hehe. You don't know what happens if you start again; it may repeat some error (the chances are not 100% to be successful, so you may lose the 5 days plus some more in the future; but of course, that was only a joke, because it sounded funny). I would let it finish. And afterwards, switch to PRP testing (where the error check is more robust), at least for the next few exponents, to be sure the hardware is really fixed. |
I can't remember if I asked this before, but...
We can't do PRP base 2 for Mersenne or Wagstaff testing, so by default we choose PRP base 3. However, for other choices of [c]b[/c] in [c]k*b^n+c[/c], we could use PRP base 2. For instance, for repunits using b=10, we could use either PRP base 2 or base 3 (or others). Would there be any speed advantage to using PRP base 2 over base 3? |
[QUOTE=GP2;504243]Would there be any speed advantage to using PRP base 2 over base 3?[/QUOTE]
No. |
[QUOTE=MrRepunit;504151]Hi, I just noticed that the PRP test in prime95/mprime seems to be broken since version 29.4 (at least on Linux 64 Bit).[/QUOTE]
Fixed in next 29.5 build. The bug affected type-5 PRP tests with base != 2 and with known factors. |
benchmark crash
1 Attachment(s)
The same i7-8750H Win10 x64 system that crashes or stalls in the prime95 v29.5b5 benchmark also does so in the prime95 x64 v29.4b8 benchmark, after several rounds at 2304K. This system is 2 weeks old.
|