2021-07-21, 11:58   #210
axn

Jun 2003

19×271 Posts

Quote:
Originally Posted by paulunderwood
4 workers takes about 10 days, whereas 16 would have taken 25-26 days. I'll try 32 workers when these 4 have finished. If it's less than 50 days (after tuning) I think I can forego P-1 stage 2 and still expect maximum throughput -- this should fit inside the 16GB MCDRAM (and I will not have 64 threads running a legacy cronjob).
Is there any reason why you're not trying 8 workers?!
2021-07-21, 13:36   #211
paulunderwood

Sep 2002
Database er0rr

3,863 Posts

Quote:
Originally Posted by axn
Is there any reason why you're not trying 8 workers?!
Well, the Phi seems to favour using more workers with fewer cores each, and as you pointed out there is not much difference between 32 workers and 64. I am currently testing 32 workers -- it will be some hours before I know if the ETA will be less than 50 days. Also, Linux top is showing 6400% now.

Last fiddled with by paulunderwood on 2021-07-21 at 13:43
2021-07-21, 14:52   #212
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13227₈ Posts

Quote:
Originally Posted by paulunderwood
Code:
Prime95 64-bit version 30.3, RdtscTiming=1
Timings for 6048K FFT length (64 cores, 1 worker):   Throughput: 218.85 iter/sec.
Timings for 6048K FFT length (64 cores, 2 workers):  Throughput: 432.09 iter/sec.
Timings for 6048K FFT length (64 cores, 4 workers):  Throughput: 602.49 iter/sec.
Timings for 6048K FFT length (64 cores, 8 workers):  Throughput: 654.40 iter/sec.
Timings for 6048K FFT length (64 cores, 16 workers): Throughput: 666.12 iter/sec.
Timings for 6048K FFT length (64 cores, 32 workers): Throughput: 693.20 iter/sec.
Timings for 6048K FFT length (64 cores, 64 workers): Throughput: 717.88 iter/sec.
So, Xeon Phi 7210?
Why not mprime V30.6b4?
What Linux distro & version was that benchmark run on?
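
Side note (my addition, not from the original post): per-test wall-clock time scales as worker count divided by total throughput, since each worker gets an equal share of the aggregate iter/sec. A minimal Python sketch turning the benchmark above into per-test estimates, taking the observed ~10 days for 4 workers as the baseline:

Code:
# Estimate days per PRP test for each (workers, throughput) configuration
# in the benchmark above. Per-test time ~ workers / total_throughput.
bench = {1: 218.85, 2: 432.09, 4: 602.49, 8: 654.40,
         16: 666.12, 32: 693.20, 64: 717.88}

BASE_WORKERS, BASE_DAYS = 4, 10.0  # observed: 4 workers take ~10 days/test
base_cost = BASE_WORKERS / bench[BASE_WORKERS]

for workers, throughput in sorted(bench.items()):
    days = BASE_DAYS * (workers / throughput) / base_cost
    print(f"{workers:2d} workers: ~{days:5.1f} days per test")

This predicts roughly 18 days per test for 8 workers, 36 for 16, and 69 for 32.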
2021-07-21, 15:06   #213
paulunderwood

Sep 2002
Database er0rr

3,863 Posts

Quote:
Originally Posted by kriesel
So, Xeon Phi 7210?
Why not mprime V30.6b4?
What Linux distro & version was that benchmark run on?
Yes.
Good question -- I'll upgrade mprime soon.
Debian 10 -- I plan to distro-upgrade the replacement disk (Debian 8) fairly soon.

Last fiddled with by paulunderwood on 2021-07-21 at 15:06
2021-07-21, 20:47   #214
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

2·3²·83 Posts

Quote:
Originally Posted by paulunderwood
I tried 64 workers instead of 16 and tests went from 26 days to 170 days, plus I got 48 unwanted double checks. I scrubbed all that work.

Now I am running 4 workers, each with 16 cores and 3072MB of RAM for stage 2 P-1 work.

Edit: Doh, those timings were heavily skewed by an old @reboot cronjob left on the "new" disk I just fitted. Anyway, I am going to stick with 4 workers.
Quote:
Originally Posted by paulunderwood
4 workers takes about 10 days, whereas 16 would have taken 25-26 days. I'll try 32 workers when these 4 have finished. If it's less than 50 days (after tuning) I think I can forego P-1 stage 2 and still expect maximum throughput -- this should fit inside the 16GB MCDRAM (and I will not have 64 threads running a legacy cronjob).
That is suboptimal. Start the tests with some delay, so that at most one worker is in the P-1 phase at any time.
Starting exactly 4 is problematic: say 3 of them are in the PRP test (with the P-1 phase already done) and one is in the P-1 test.
If P-1 finds a factor AND one PRP test finishes, then you lose, because you can't start two P-1 tests at once.
The solution: increase the pool -- always have (at least) 10 PRP assignments, with at most one in the P-1 phase. With high probability there will be no such gap in the schedule [you will always have some suspended PRP tests, because at any one time you run only the 3 or 4 PRP tests that have already completed the P-1 phase]. See the sketch below.
PS: of course even with this schedule you lose some time, but only at the beginning, until you have 3 completed P-1 jobs; that loss is a fixed cost.
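
A minimal sketch of this policy (my illustration, not code from this post; the function name and pool contents are hypothetical). A newly idle worker first keeps the bank of P-1-complete assignments stocked, but only if no other P-1 is running; otherwise it starts or resumes a PRP whose P-1 is done:

Code:
# Hypothetical sketch of the "at most one P-1 in flight" scheduling policy.
from collections import deque

def next_task(pm1_pending, prp_ready, pm1_in_flight):
    """Pick work for a newly idle worker.

    pm1_pending   -- deque of exponents still needing their P-1 phase
    prp_ready     -- deque of exponents with P-1 done (incl. suspended PRPs)
    pm1_in_flight -- True if another worker is already running a P-1
    """
    # Stock up P-1-complete assignments, but never run two P-1 phases at
    # once -- that is exactly the losing case described above.
    if pm1_pending and not pm1_in_flight:
        return ("P-1", pm1_pending.popleft())
    # Otherwise start or resume a PRP test whose P-1 is already done.
    if prp_ready:
        return ("PRP", prp_ready.popleft())
    return ("idle", None)  # only happens during the initial warm-up

# Warm-up example: a pool of 10 assignments, none with P-1 done yet.
pm1_pending, prp_ready = deque(range(10)), deque()
print(next_task(pm1_pending, prp_ready, pm1_in_flight=False))  # ('P-1', 0)
print(next_task(pm1_pending, prp_ready, pm1_in_flight=True))   # ('idle', None)

The second call shows the fixed start-up cost mentioned above: until a few P-1 phases have completed, some workers have nothing to run.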
2021-07-21, 21:05   #215
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,783 Posts

Stagger is easily created, and it can help move your system toward trusted status and clear the DC backlog.

[Worker 1]
Wavefront PRP

[Worker 2]
Low Cat DC as PRP with proof (takes ~1/3 as long as a wavefront PRP)
Wavefront PRP

[Worker 3]
Mid Cat 4 DC as PRP with proof (selected to take ~1/2 as long as 1 wavefront PRP)
Wavefront PRP

[Worker 4]
High Cat 4 DC as PRP with proof (selected to take ~3/4 as long as 1 wavefront PRP)
Wavefront PRP

Pick exponents from the lists in https://mersenneforum.org/showthread.php?t=24148.
Or just set some workers to DC and others to first-time tests until the stagger is approximately sufficient, then switch one worker at a time to wavefront PRP.
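
To see the offsets this plan produces (my arithmetic, not from the post; assume every wavefront PRP takes the same time T and each worker runs tests back-to-back): the four workers begin their first wavefront test at 0, T/3, T/2 and 3T/4, so the short P-1 stages at the start of each test are spread out rather than simultaneous.

Code:
# Start times of each worker's wavefront PRP tests, in units of one
# wavefront-PRP duration T, given the warm-up DC lengths above.
warmup = {1: 0.0, 2: 1/3, 3: 1/2, 4: 3/4}

for worker, offset in warmup.items():
    starts = [round(offset + k, 2) for k in range(3)]  # first 3 test starts
    print(f"Worker {worker}: wavefront PRPs begin at t = {starts}")

# The smallest gap between any two start times is T/6, so as long as a
# P-1 stage takes less than ~T/6, no two P-1 stages ever overlap.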
2021-07-23, 10:35   #216
paulunderwood

Sep 2002
Database er0rr

3863₁₀ Posts

Recap: 4 workers would take 10 days. I tried 16 workers again at a comparable size and it was 40 days. However, 32 workers is about 70 days and uses 55% of RAM. What's more, I got lucky and found a stage 1 factor in this stage-2-less testing.
2021-07-23, 12:25   #217
axn

Jun 2003

19×271 Posts

Quote:
Originally Posted by paulunderwood
Recap: 4 workers would take 10 days. I tried 16 workers again at a comparable size and it was 40 days.
Hmmm... So the benchmarks lied? 16 workers were supposed to give ~10% more throughput than 4, so a test should have completed in about 10 × (16/666.12)/(4/602.49) ≈ 36 days?

Quote:
Originally Posted by paulunderwood
However, 32 workers is about 70 days and uses 55% of RAM
If 8 workers can complete in 18 days or less, it would very nearly match the throughput of 32 workers and give about 1.5GB/worker for a semi-decent stage 2. But can it do it in 18? Or will it be more like 20-21?
2021-07-23, 17:18   #218
paulunderwood

Sep 2002
Database er0rr

111100010111₂ Posts

Quote:
Originally Posted by axn
But can it do it in 18? Or will it be more like 20-21?
I will try 8 workers at some stage.

I am currently upgrading from Deb8 to Deb10 via Deb9 -- the latest Debian kernel is now running, although I expect this to make very little difference in timings. I did a reboot when I got Deb9 installed and it came back in "powersave" mode. I poked the necessary files to get it back to "performance" -- that took it down from over 90 days to under 70 days. When I previously installed the "new" (bigger) HDD I had to clear CMOS at one point -- so I will hook up a monitor and see what can be done in the BIOS to speed things up. The settings therein might explain a few things, like why I previously got 16 workers at 25-26 days each and now 40 days. I'll turn off HyperThreading. All my timings so far are a pickle! The current state of play is ~69 days for 32 workers.
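
(For context, on Linux the "necessary files" are typically the cpufreq sysfs entries. A minimal Python sketch, assuming the standard sysfs layout and root privileges -- not necessarily the exact method used here:)

Code:
# Hypothetical sketch: set every CPU's cpufreq governor to "performance"
# via the standard Linux sysfs interface. Must be run as root.
import glob

for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
    with open(path) as f:
        current = f.read().strip()
    if current != "performance":
        with open(path, "w") as f:
            f.write("performance")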


Last fiddled with by paulunderwood on 2021-07-23 at 17:47
2021-07-23, 21:07   #219
paulunderwood

Sep 2002
Database er0rr

F17₁₆ Posts

Debian 10 (Buster) installed. The chip is running at 1.4GHz even though it says powersave. HyperThreading is off. 32 workers in 56 days for 113M-bit candidates. With some runtime tuning I expect this to come down.

Edit: After some runtime tuning by mprime, at 2% done the ETA is now 55 days.

Last fiddled with by paulunderwood on 2021-07-24 at 05:59
2021-07-24, 20:34   #220
Cheetahgod

May 2020

2×13 Posts

Wow, that's pretty neat.