mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 version 28.6 / 28.7 (28.7 now available!) (https://www.mersenneforum.org/showthread.php?t=20156)

harlee 2015-05-24 11:39

I just installed the 28.6 Windows 32-bit software on my older P4 system. I'm doing P-1 testing and noticed that the B1 and B2 bounds are now lower. Is this expected? I didn't see anything about the bounds changing in the whatsnew.txt file.

Madpoo 2015-05-25 03:06

Version 28.6 and "ScaleOutputFrequency=1"
 
I just updated my triple-checking systems to version 28.6 and I'm noticing a difference in how the "ScaleOutputFrequency=1" option is working.

It doesn't seem to scale the update frequency of vastly different workers like it did in 28.5. The most extreme example is one system where I'm testing M383838383 on one socket, and a little 30M exponent on the other.

I previously had the iterations between screen outputs set to 30000 and that worked fairly well. I could see progress on the big 383M and the 30M exponents moving along.

Now it seems to ignore that option entirely: both workers update at the specified iteration count, with no scaling.

I peeked at the source code changes between 28.5 and 28.6 and I do see some changes that happened in there, so I'm guessing that's the reason, but I didn't see anyone else mention this yet.

Prime95 2015-05-25 05:03

[QUOTE=Madpoo;402930]I just updated my triple-checking systems to version 28.6 and I'm noticing a difference in how the "ScaleOutputFrequency=1" option is working.

It doesn't seem to scale the update frequency of vastly different workers like it did in 28.5. The most extreme example is one system where I'm testing M383838383 on one socket, and a little 30M exponent on the other.[/QUOTE]

A bug was reported and it should be fixed in 28.7 --- please try that version if and when it becomes available.

preda 2015-05-25 11:27

128GB RAM, E=12 in P-1.
 
I run mprime on a system with 128GB of free memory, and I run a single thread of P-1. It uses about 30GB in stage 2 and the status is always E=12. The B2 bound is about 15M (B1 about 700K).

My questions are:
- what's the meaning of E=12?
- would P-1 benefit from using more memory in stage 2? If so, why doesn't it use it?

In local.txt I have:
Memory=400000 during 7:30-1:00 else 400000

TheJudger 2015-05-25 11:57

AFAIK the current version of Prime95/mprime won't use more than ~30GB of memory in P-1 stage 2 for current P-1 wavefront assignments.
I've tried 2.5 TiB for a single instance of P-1; it still uses ~30GB at most.

E=12 refers to the [URL="http://www.mersennewiki.org/index.php/Brent-Suyama_extension"]Brent-Suyama extension[/URL].

Oliver

Madpoo 2015-05-25 15:54

[QUOTE=Prime95;402933]A bug was reported and it should be fixed in 28.7 --- please try that version if and when it becomes available.[/QUOTE]

Cool, thanks. I don't have any Haswell systems, but how does the AVX2 stuff look in 28.6/28.7? I think I saw some changes in the source related to that, which would be cool. I'm speccing out some new servers which will have Haswell-E chips, so at some point I'll be able to do some burn-in with those if you need some feedback down the road.

Dubslow 2015-05-25 17:58

[QUOTE=preda;402952]I run mprime on a system with 128GB of free memory, and I run a single thread of P-1. It uses about 30GB in stage 2 and the status is always E=12. The B2 bound is about 15M (B1 about 700K).

My questions are:
- what's the meaning of E=12?
- would P-1 benefit from using more memory in stage 2? If so, why doesn't it use it?

In local.txt I have:
Memory=400000 during 7:30-1:00 else 400000[/QUOTE]

The status outputs every so often should have a message during stage 2 of the form "Processing x out of y relative primes" where x and y are numbers. Each relative prime takes up a fair amount of memory... if you have 15GB available, Prime95 would use those 15GB and give a value for x that's approximately half the value of y (processing the first half in 15GB, then the second half in 15GB, as opposed to all at once in 30GB). It just so happens that for a B2 of ~15M, appropriate for the current LL wavefront, 30GB is enough to process all relative primes at once.
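To make the batching concrete, here is a minimal Python sketch of the arithmetic described above. It is only a model, not Prime95's actual allocation logic: the function name `stage2_batches`, the count of 480 relative primes, and the 64 MB per relative prime are hypothetical figures chosen so that the total comes to 30GB.

```python
import math

def stage2_batches(total_relprimes, mb_per_relprime, available_mb):
    """Return (relative primes per pass, number of passes).

    Hypothetical model: every relative prime needs the same amount of
    memory, and each pass processes as many as fit in available memory.
    """
    per_pass = min(total_relprimes, int(available_mb // mb_per_relprime))
    if per_pass < 1:
        raise ValueError("not enough memory for even one relative prime")
    passes = math.ceil(total_relprimes / per_pass)
    return per_pass, passes

# Assumed figures: 480 relative primes at 64 MB each = 30 GB in total.
for gb in (15, 30, 128):
    per_pass, passes = stage2_batches(480, 64, gb * 1024)
    print(f"{gb:>3} GB available: {per_pass} of 480 per pass, {passes} pass(es)")
```

Under this model, 15GB gives two passes of half the relative primes each, and anything from 30GB upward gives a single pass, so extra memory beyond that point changes nothing.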

Batalov 2015-06-03 01:23

Continuing on the previous GWNUM library modification
 
[QUOTE=Prime95;402317]Serge, try the attached gwnum.c[/QUOTE]
George,

You've already improved the special case k=1; how about k=2? Just strictly k=2; there is no interest in higher k values but b is relatively large.

A [U]debug case[/U] is prepared:
Can a PRP test for "PRP=2,67607,371171,1" [B]not[/B] use the generic reduction AVX FFT length 640K (which the library chooses):
[CODE][Work thread Jun 2 18:08] Resuming PRP test of 2*67607^371171+1 using generic reduction AVX FFT length 640K, Pass1=640, Pass2=1K
[Work thread Jun 2 18:08] Iteration: 2 / 5955397 [0.00%].
[Work thread Jun 2 18:08] Iteration: 500 / 5955397 [0.00%], roundoff: 0.047, ms/iter: 18.580, ETA: 30:44:04
[Work thread Jun 2 18:08] Iteration: 1000 / 5955397 [0.01%], roundoff: 0.047, ms/iter: 17.244, ETA: 28:31:18
[Work thread Jun 2 18:08] Iteration: 1500 / 5955397 [0.02%], roundoff: 0.047, ms/iter: 17.248, ETA: 28:31:33
[Work thread Jun 2 18:09] Iteration: 2000 / 5955397 [0.03%], roundoff: 0.047, ms/iter: 17.241, ETA: 28:30:41
[Work thread Jun 2 18:09] Iteration: 2500 / 5955397 [0.04%], roundoff: 0.047, ms/iter: 17.246, ETA: 28:31:03
[/CODE]
but instead use:
[CODE][Work thread Jun 2 18:05] Starting PRP test of 2*67607^371171+1 using all-complex AVX FFT length 1M, Pass1=256, Pass2=4K
[Work thread Jun 2 18:05] Iteration: 500 / 5955397 [0.00%], roundoff: 0.395, ms/iter: 31.972, ETA: 52:53:07
[Work thread Jun 2 18:05] Iteration: 1000 / 5955397 [0.01%], roundoff: 0.395, ms/iter: 7.315, ETA: 12:05:54
[Work thread Jun 2 18:06] Iteration: 1500 / 5955397 [0.02%], roundoff: 0.395, ms/iter: 7.326, ETA: 12:06:56
[Work thread Jun 2 18:06] Iteration: 2000 / 5955397 [0.03%], roundoff: 0.395, ms/iter: 7.319, ETA: 12:06:12
[Work thread Jun 2 18:06] Iteration: 2500 / 5955397 [0.04%], roundoff: 0.395, ms/iter: 7.309, ETA: 12:05:12
[Work thread Jun 2 18:06] Iteration: 3000 / 5955397 [0.05%], roundoff: 0.395, ms/iter: 7.318, ETA: 12:05:57
[/CODE]
(which I can actually force it to choose with PRP=FFT2=1M,2,67607,371171,1).
Can we generalize/automate a similar all-complex FFT2 choice for any PRP=2,67607,n,1 up to n of say 1M?
Or maybe even pick a better special FFT, if that only needs some light optimization in the library?
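For reference, the ETAs in the two logs above follow directly from the ms/iter figures. The short Python sketch below redoes that arithmetic and computes the speed ratio between the two FFT choices; the helper name `eta_hms` is hypothetical, and the inputs are the iteration count and the ms/iter values at iteration 1000 copied from the logs.

```python
def eta_hms(total_iters, done_iters, ms_per_iter):
    """Estimate remaining time as H:MM:SS from a per-iteration cost in ms."""
    seconds = round((total_iters - done_iters) * ms_per_iter / 1000)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

TOTAL = 5955397  # iteration count taken from the logs above

# ms/iter values at iteration 1000 in each log
print("640K generic reduction:", eta_hms(TOTAL, 1000, 17.244))
print("1M all-complex:        ", eta_hms(TOTAL, 1000, 7.315))
print(f"speedup: {17.244 / 7.315:.2f}x")
```

The computed ETAs land within a couple of seconds of the logged values (the displayed ms/iter is rounded), and the all-complex FFT comes out roughly 2.4 times faster per iteration.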

This is a special base for which no primes 2*b^n+1 are known. A prime of this form would divide some Phi(M,2) where M can be determined post hoc (with what is essentially a modified base 2 PRP test); for a large b, M is very likely to be b^n or b^{n-1}.
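As a rough check on the size of this candidate, the Python sketch below estimates its bit length. The result is close to the 5955397 iterations shown in the logs above, which suggests the iteration count tracks the bit length of the number being tested. The helper name `approx_bits` is hypothetical, and the floating-point logarithm makes this an estimate only.

```python
import math

def approx_bits(k, b, n):
    """Approximate log2 of k*b^n + 1 (the +1 is negligible for large n)."""
    return math.log2(k) + n * math.log2(b)

bits = approx_bits(2, 67607, 371171)
print(f"2*67607^371171+1 has about {bits:,.0f} bits")
```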

Prime95 2015-06-03 13:41

Did you look at the roundoff error on your 1M all-complex FFT? The gwnum code isn't using the 1M FFT because it is afraid of a fatal roundoff error during the PRP test.

gwnum does support some options to be a little less conservative in choosing FFT lengths. I'll look and see if prime95 exposes any of those features.

Batalov 2015-06-03 15:33

There is some blanket rule that kicks in at n>10000 for FFT choice -- and only generic reduction AVX FFT is used for all n's.
(For n<10000, all-complex AVX FFT is used most of the time; I can send you a lightly sieved set of n, or else for debugging tests you can use any n.)
If you use n=351111 for example, the error will be well-controlled for all-complex AVX FFT, yet it will not be chosen.

I think an appropriately sized all-complex AVX FFT can always be used for this form, but I cannot always force it: for some ranges of n, even a forced FFT2=NNN selects a zero-padded FFT instead of an all-complex one.

ramshanker 2015-06-07 15:00

Official release of 28.7?
 
When will version 28.7 be released on the official download page?


All times are UTC.