Prime95 version 28.6 / 28.7 (28.7 now available!)
 2015-05-24, 11:39 #78 harlee     Sep 2006 Odenton, MD, USA 22·41 Posts I just installed the 28.6 Windows 32-bit software onto my older P4 system. I'm doing P1 testing and noticed that the B1 and B2 bounds are now lower. Just wondering if this is correct as I didn't see anything about the bounds changing in the whatsnew.txt file.
 2015-05-25, 03:06 #79 Madpoo Serpentine Vermin Jar     Jul 2014 6335₈ Posts Version 28.6 and "ScaleOutputFrequency=1" I just updated my triple-checking systems to version 28.6 and I'm noticing a difference in how the "ScaleOutputFrequency=1" option is working. It doesn't seem to scale the update frequency of vastly different workers like it did in 28.5. The most extreme example is one system where I'm testing M383838383 on one socket, and a little 30M exponent on the other. I previously had the iterations between screen outputs set to 30000 and that worked fairly well. I could see progress on the big 383M and the 30M exponents moving along. Now it seems to ignore that option entirely and only updates either one at the actual specified rate, no scaling. I peeked at the source code changes between 28.5 and 28.6 and I do see some changes that happened in there, so I'm guessing that's the reason, but I didn't see anyone else mention this yet.
 Originally Posted by Madpoo I just updated my triple-checking systems to version 28.6 and I'm noticing a difference in how the "ScaleOutputFrequency=1" option is working. It doesn't seem to scale the update frequency of vastly different workers like it did in 28.5. The most extreme example is one system where I'm testing M383838383 on one socket, and a little 30M exponent on the other.
A bug was reported and it should be fixed in 28.7 --- please try that version if and when it becomes available.

 2015-05-25, 11:27 #81 preda     "Mihai Preda" Apr 2015 10101010000₂ Posts 128GB RAM, E=12 in P-1. I run mprime on a system with 128GB of free memory, and I run a single thread of P-1. It uses about 30GB in stage 2 and the status is always E=12. The B2 bound is about 15M (B1 about 700K). My questions are:
- What's the meaning of E=12?
- Would P-1 benefit from using more memory in stage 2? If yes, why doesn't it use it?
In local.txt I have: Memory=400000 during 7:30-1:00 else 400000
 2015-05-25, 11:57 #82 TheJudger     "Oliver" Mar 2005 Germany 11·101 Posts AFAIK the current version of Prime95/mprime won't use more than ~30GB of memory in P-1 stage 2 for current P-1 wavefront assignments. I've tried 2.5TiB for a single instance of P-1; it simply uses ~30GB at most. E=12 refers to the Brent-Suyama extension (12 is the exponent it uses). Oliver
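To illustrate what the extension buys, here is a toy sketch assuming the textbook form of Brent-Suyama, where stage 2 compares values of f(t) = t^E rather than plain t (whether Prime95 uses exactly this f is an assumption here). For even E, x^E - y^E is always divisible by x - y and x + y, which correspond to the primes m*D ± r that stage 2 targets anyway; the leftover cofactor gives extra chances to catch primes above B2. The helper names and the values x=30, y=7 are hypothetical stand-ins chosen only for illustration.

```python
def brent_suyama_cofactor(x: int, y: int, e: int = 12) -> int:
    """Return (x**e - y**e) // ((x - y) * (x + y)) for even e.

    With f(t) = t**e, f(x) - f(y) always contains the factors x - y and
    x + y; the cofactor left over is the "bonus" part, which can contain
    primes larger than x + y.
    """
    if e % 2 != 0:
        raise ValueError("x + y divides x**e - y**e only for even e")
    if x <= y or y <= 0:
        raise ValueError("expected x > y > 0")
    diff = x**e - y**e
    base = (x - y) * (x + y)
    assert diff % base == 0  # algebraic identity for even e
    return diff // base


def small_prime_factors(n: int, limit: int) -> list[int]:
    """Distinct primes p <= limit dividing n (plain trial division, toy sizes)."""
    found = []
    p = 2
    while p <= limit:
        if n % p == 0:
            found.append(p)
            while n % p == 0:
                n //= p
        p += 1
    return found


# Toy stand-ins for m*D and r; real stage 2 values are far larger.
x, y = 30, 7
print("targeted by stage 2 anyway:", x - y, x + y)  # 23 37
print("extra small primes:", small_prime_factors(brent_suyama_cofactor(x, y), 1000))
```

For E=12 the cofactor splits algebraically as (x²+y²)(x²+xy+y²)(x²-xy+y²)(x⁴-x²y²+y⁴), so here primes such as 73 and 739, both larger than x + y = 37, divide it.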
 Originally Posted by Prime95 A bug was reported and it should be fixed in 28.7 --- please try that version if and when it becomes available.
Cool, thanks. I don't have any Haswell systems, but how does the AVX2 stuff look in 28.6/28.7? I think I saw some changes in the source related to that, which would be cool. I'm speccing out some new servers which will have Haswell-E chips, so at some point I'll be able to do some burn-in with those if you need some feedback down the road.

 Originally Posted by preda I run mprime on a system with 128GB of free memory, and I run a single thread of P-1. It uses about 30GB in stage 2 and the status is always E=12. The B2 bound is about 15M (B1 about 700K). My questions are: - what's the meaning of E=12? - would P-1 benefit from using more memory in stage 2? If yes, why doesn't it use it? In local.txt I have: Memory=400000 during 7:30-1:00 else 400000
The status output during stage 2 should periodically include a message of the form "Processing x out of y relative primes", where x and y are numbers. Each relative prime takes up a fair amount of memory. If you had 15GB available, Prime95 would use those 15GB and report a value for x that's approximately half the value of y (processing the first half of the relative primes in 15GB, then the second half in the same 15GB, as opposed to all at once in 30GB). It just so happens that for a B2 of ~15M, appropriate for the current LL wavefront, 30GB is enough to process all relative primes at once, so memory beyond that brings no further benefit.
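The memory/pass trade-off described above can be sketched with a simplified model, assuming each relative prime needs a fixed amount of memory and nothing else uses memory (real stage 2 code also has fixed overhead). The function name and the figures (480 relative primes at 64 MiB each, i.e. 30 GiB in total) are hypothetical, picked only so the totals line up with the ~30GB in the thread.

```python
import math


def stage2_passes(num_relprimes: int, mib_per_relprime: int, available_mib: int) -> tuple[int, int]:
    """Return (relative primes processed per pass, number of passes).

    Simplified model: memory is spent only on relative primes, and a pass
    never holds more relative primes than actually exist.
    """
    per_pass = min(num_relprimes, available_mib // mib_per_relprime)
    if per_pass == 0:
        raise ValueError("not enough memory for even one relative prime")
    return per_pass, math.ceil(num_relprimes / per_pass)


# Hypothetical figures: 480 relative primes at 64 MiB each is 30 GiB in total.
for gib in (15, 30, 128):
    per_pass, passes = stage2_passes(480, 64, gib * 1024)
    print(f"{gib:>3} GiB: {per_pass} of 480 relative primes per pass, {passes} pass(es)")
```

With 15 GiB this gives 240 of 480 per pass over two passes; at 30 GiB and at 128 GiB it is a single pass with all 480, which is why the extra memory goes unused.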

Continuing with the previous GWNUM library modification

 Originally Posted by Prime95 Serge, try the attached gwnum.c
George,

You've already improved the special case k=1; how about k=2? Strictly k=2 only: there is no interest in higher k values, but b is relatively large.

I've prepared a debug case:
Can a PRP test for "PRP=2,67607,371171,1" use something other than the generic reduction AVX FFT length 640K that the library chooses:
Code:
[Work thread Jun 2 18:08] Resuming PRP test of 2*67607^371171+1 using generic reduction AVX FFT length 640K, Pass1=640, Pass2=1K
[Work thread Jun 2 18:08] Iteration: 2 / 5955397 [0.00%].
[Work thread Jun 2 18:08] Iteration: 500 / 5955397 [0.00%], roundoff: 0.047, ms/iter: 18.580, ETA: 30:44:04
[Work thread Jun 2 18:08] Iteration: 1000 / 5955397 [0.01%], roundoff: 0.047, ms/iter: 17.244, ETA: 28:31:18
[Work thread Jun 2 18:08] Iteration: 1500 / 5955397 [0.02%], roundoff: 0.047, ms/iter: 17.248, ETA: 28:31:33
[Work thread Jun 2 18:09] Iteration: 2000 / 5955397 [0.03%], roundoff: 0.047, ms/iter: 17.241, ETA: 28:30:41
[Work thread Jun 2 18:09] Iteration: 2500 / 5955397 [0.04%], roundoff: 0.047, ms/iter: 17.246, ETA: 28:31:03
Code:
[Work thread Jun 2 18:05] Starting PRP test of 2*67607^371171+1 using all-complex AVX FFT length 1M, Pass1=256, Pass2=4K
[Work thread Jun 2 18:05] Iteration: 500 / 5955397 [0.00%], roundoff: 0.395, ms/iter: 31.972, ETA: 52:53:07
[Work thread Jun 2 18:05] Iteration: 1000 / 5955397 [0.01%], roundoff: 0.395, ms/iter:  7.315, ETA: 12:05:54
[Work thread Jun 2 18:06] Iteration: 1500 / 5955397 [0.02%], roundoff: 0.395, ms/iter:  7.326, ETA: 12:06:56
[Work thread Jun 2 18:06] Iteration: 2000 / 5955397 [0.03%], roundoff: 0.395, ms/iter:  7.319, ETA: 12:06:12
[Work thread Jun 2 18:06] Iteration: 2500 / 5955397 [0.04%], roundoff: 0.395, ms/iter:  7.309, ETA: 12:05:12
[Work thread Jun 2 18:06] Iteration: 3000 / 5955397 [0.05%], roundoff: 0.395, ms/iter:  7.318, ETA: 12:05:57
and instead use the all-complex AVX FFT length 1M shown above, which I can force with PRP=FFT2=1M,2,67607,371171,1? At about 7.3 ms/iter versus about 17.2 ms/iter, it is more than twice as fast.
Can we generalize/automate a similar all-complex FFT2 choice for any PRP=2,67607,n,1 up to n of, say, 1M?
Or perhaps pick an even better special FFT, if some light optimization in the library is needed for that?
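As a sanity check on the logs, the iteration count tracks the size of the number: log2(2*67607^371171) comes out at about 5955397, essentially the 5955397 iterations shown. A minimal sketch with hypothetical helper names (not part of Prime95) that estimates this size for any n and builds a forced worktodo line in the PRP=FFT2=... form quoted above. Which FFT length is safe for a given n is exactly the open question, so the length is passed in by hand rather than chosen automatically.

```python
import math


def prp_bits(k: int, b: int, n: int) -> float:
    """Approximate bit size of k*b^n + 1, i.e. log2(k) + n*log2(b).

    Working with logarithms avoids building the multi-million-bit integer.
    """
    return math.log2(k) + n * math.log2(b)


def forced_prp_line(fft: str, k: int, b: int, n: int, c: int) -> str:
    """Worktodo line in the PRP=FFT2=<fft>,k,b,n,c form used in the post."""
    return f"PRP=FFT2={fft},{k},{b},{n},{c}"


bits = prp_bits(2, 67607, 371171)
print(f"{bits:.1f} bits")  # about 5955397, in line with the iteration count in the logs
print(forced_prp_line("1M", 2, 67607, 371171, 1))  # PRP=FFT2=1M,2,67607,371171,1
```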

This is a special base for which no primes 2*b^n+1 are known. A prime of this form would divide some Phi(M,2) where M can be determined post hoc (with what is essentially a modified base 2 PRP test); for a large b, M is very likely to be b^n or b^{n-1}.
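The post hoc step rests on a standard fact: for a prime p that does not divide M, p divides Phi(M,2) exactly when the multiplicative order of 2 modulo p is M. Since p - 1 = 2*b^n, for prime b every candidate order has the form 2^e*b^k with e in {0,1}, so testing those candidates in increasing order finds it. This toy sketch assumes b is an odd prime and p is prime, and uses Python's built-in three-argument pow; at real sizes each pow is a full PRP-sized exponentiation, so a practical tool would narrow the search rather than loop over every k.

```python
def order_of_two(b: int, n: int) -> int:
    """Order of 2 modulo p = 2*b**n + 1, assuming b is an odd prime and p is prime.

    Every divisor of p - 1 = 2*b**n is 2**e * b**k with e in {0, 1} and
    0 <= k <= n. Trying them in increasing order (1, 2, b, 2b, b^2, ...)
    means the first m with 2**m == 1 (mod p) is the order, i.e. the M for
    which p divides Phi(M, 2).
    """
    p = 2 * b**n + 1
    for k in range(n + 1):
        for e in (0, 1):
            m = 2**e * b**k
            if pow(2, m, p) == 1:
                return m
    raise ValueError("2**(p-1) != 1 (mod p), so p is not prime")


print(order_of_two(11, 1))  # 11: 23 divides Phi(11, 2) = 2**11 - 1 = 2047 = 23 * 89
print(order_of_two(3, 2))   # 18: 19 divides Phi(18, 2) = 2**6 - 2**3 + 1 = 57 = 3 * 19
```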

 2015-06-03, 13:41 #86 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 16504₈ Posts Did you look at the roundoff error on your 1M all-complex FFT? The gwnum code isn't using the 1M FFT because it is afraid of a fatal roundoff error during the PRP test. gwnum does support some options to be a little less conservative in choosing FFT lengths; I'll look and see if prime95 exposes any of those features.
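One quick way to compare the two runs is to pull the roundoff values out of the worker logs. A minimal sketch, assuming the "roundoff: X" format seen in the logs earlier in the thread; the function name and the 0.40 warning level are hypothetical, chosen only for illustration and not taken from gwnum.

```python
import re

ROUNDOFF_RE = re.compile(r"roundoff: ([0-9.]+)")

# Hypothetical warning level, chosen only for illustration.
WARN_LEVEL = 0.40


def max_roundoff(log_text: str) -> float:
    """Largest 'roundoff: X' value found in a block of worker log lines."""
    values = [float(v) for v in ROUNDOFF_RE.findall(log_text)]
    if not values:
        raise ValueError("no roundoff values found")
    return max(values)


log_640k = "Iteration: 2500 / 5955397 [0.04%], roundoff: 0.047, ms/iter: 17.246, ETA: 28:31:03"
log_1m = "Iteration: 3000 / 5955397 [0.05%], roundoff: 0.395, ms/iter:  7.318, ETA: 12:05:57"

for name, text in (("640K generic", log_640k), ("1M all-complex", log_1m)):
    worst = max_roundoff(text)
    print(f"{name}: max roundoff {worst:.3f}, margin to {WARN_LEVEL} is {WARN_LEVEL - worst:.3f}")
```

The 1M all-complex run sits at 0.395, much closer to trouble than the 0.047 of the 640K generic-reduction run, which is consistent with gwnum being cautious about that length.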
 2015-06-03, 15:33 #87 Batalov     "Serge" Mar 2008 Phi(4,2^7658614+1)/2 22331₈ Posts There is some blanket rule for FFT choice that kicks in at n>10000, after which only the generic reduction AVX FFT is used. (For n<10000, the all-complex AVX FFT is used most of the time; I can send you a lightly sieved set of n, or for debugging tests you can use any n.) If you use n=351111, for example, the roundoff error will be well controlled with an all-complex AVX FFT, yet that FFT will not be chosen. I think an appropriately sized all-complex AVX FFT can always be used for this form, but I cannot force it even with FFT2=NNN, because for some ranges of n the forced FFT2 selects a zero-padded FFT instead of an all-complex one.
 2015-06-07, 15:00 #88 ramshanker     "Ram Shanker" May 2015 Delhi 2×19 Posts Official release of 28.7? When will version 28.7 be released on the official download page?

