mersenneforum.org Intel Xeon PHI?

2021-07-21, 11:58   #210
axn

Jun 2003

19×271 Posts

Quote:
 Originally Posted by paulunderwood 4 workers takes about 10 days whereas 16 would have taken 25-26 days. I'll try 32 workers when these 4 have finished. If it's less than 50 days (after tuning) I think I can forego P-1 stage 2 and still expect maximum throughput -- this should fit inside the 16GB MCDRAM (and I will not have 64 threads running a legacy cronjob).
Is there any reason why you're not trying 8 workers?!

2021-07-21, 13:36   #211
paulunderwood

Sep 2002
Database er0rr

3,863 Posts

Quote:
 Originally Posted by axn Is there any reason why you're not trying 8 workers?!
Well, the Phi seems to favour using fewer cores per worker but more workers, and as you pointed out there is not much difference between using 32 workers and 64. I am currently testing 32 workers -- it will be some hours before I know whether the ETA will be less than 50 days. Also, Linux top is showing 6400% now.

Last fiddled with by paulunderwood on 2021-07-21 at 13:43

2021-07-21, 14:52   #212
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13227₈ Posts

Quote:
 Originally Posted by paulunderwood
Code:
Prime95 64-bit version 30.3, RdtscTiming=1
Timings for 6048K FFT length (64 cores, 1 worker):   Throughput: 218.85 iter/sec.
Timings for 6048K FFT length (64 cores, 2 workers):  Throughput: 432.09 iter/sec.
Timings for 6048K FFT length (64 cores, 4 workers):  Throughput: 602.49 iter/sec.
Timings for 6048K FFT length (64 cores, 8 workers):  Throughput: 654.40 iter/sec.
Timings for 6048K FFT length (64 cores, 16 workers): Throughput: 666.12 iter/sec.
Timings for 6048K FFT length (64 cores, 32 workers): Throughput: 693.20 iter/sec.
Timings for 6048K FFT length (64 cores, 64 workers): Throughput: 717.88 iter/sec.
So, Xeon Phi 7210?
Why not mprime V30.6b4?
What linux distro & version was that benchmark run upon?
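Those benchmark figures can be converted into rough per-test ETAs. A minimal sketch, assuming a PRP test needs ~113 million iterations (the candidate size mentioned later in the thread) and that the total throughput is shared evenly among the workers; the throughput dict is copied from the quoted benchmark:

```python
# Rough ETA per worker count from the quoted Prime95 benchmark.
# Assumptions: ~113e6 iterations per PRP test, and each worker
# receives an equal share of the total throughput.
ITERS = 113_000_000
SECS_PER_DAY = 86_400

throughput = {  # workers -> total iter/sec, from the quoted benchmark
    1: 218.85, 2: 432.09, 4: 602.49, 8: 654.40,
    16: 666.12, 32: 693.20, 64: 717.88,
}

def days_per_test(workers: int) -> float:
    """Days for one worker to finish one PRP test of ITERS iterations."""
    per_worker_iters_per_sec = throughput[workers] / workers
    return ITERS / per_worker_iters_per_sec / SECS_PER_DAY

for w in sorted(throughput):
    print(f"{w:2d} workers: {days_per_test(w):6.1f} days per test")
```

This gives roughly 8.7 days for 4 workers and 60 days for 32, the same ballpark as the 10 and 56-70 days reported later in the thread; the gap between benchmark and reality is consistent with the memory-pressure and tuning issues discussed below.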

2021-07-21, 15:06   #213
paulunderwood

Sep 2002
Database er0rr

3,863 Posts

Quote:
 Originally Posted by kriesel So, Xeon Phi 7210? Why not mprime V30.6b4? What linux distro & version was that benchmark run upon?
Yes.
Good question -- I'll upgrade mprime soon.
Debian 10 -- I plan to distro-upgrade the replacement disk (Debian 8) fairly soon.

Last fiddled with by paulunderwood on 2021-07-21 at 15:06

2021-07-21, 20:47   #214
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

2·32·83 Posts

Quote:
 Originally Posted by paulunderwood I tried 64 workers instead of 16 and tests went from 26 days to 170 days, plus I got 48 unwanted double checks. I scrubbed all that work. Now I am running 4 workers, each with 16 cores and 3072MB of RAM for stage 2 P-1 work. Edit: Doh, those timings were heavily skewed by an old @reboot cronjob left on the "new" disk I just fitted. Anyway, I am going to stick with 4 workers.
Quote:
 Originally Posted by paulunderwood 4 workers takes about 10 days whereas 16 would have taken 25-26 days. I'll try 32 workers when these 4 have finished. If it's less than 50 days (after tuning) I think I can forego P-1 stage 2 and still expect maximum throughput -- this should fit inside the 16GB MCDRAM (and I will not have 64 threads running a legacy cronjob).
That is suboptimal. Start the tests with some delay, so that at most one is in the P-1 phase at any time.
Starting exactly 4 is problematic: say 3 of them are in the PRP test (with the P-1 phase already done), and one is in the P-1 test.
If the P-1 finds a factor AND one PRP test finishes, then you lose, because you can't start two P-1 tests at once.
The solution: increase the pool -- always have (at least) 10 PRP tests queued, and at most one in the P-1 phase. With high probability there will then be no such gap in the schedule [you will always have some suspended PRP tests, because at any one time you are running 3 or 4 PRP tests that have already completed the P-1 phase].
ps. Of course even with this schedule you lose some time, but only at the beginning, until you have 3 completed P-1 jobs; that loss is a fixed cost.
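Gerbicz's pooling policy can be sketched as a toy event-driven simulation. Everything concrete here is an illustrative assumption (the durations, the 5% factor-found probability, and the `simulate()` helper are all hypothetical), but the policy is his: a freed worker prefers a candidate whose P-1 is already done, and a new P-1 is only started when none is running:

```python
import heapq
import random

def simulate(workers=4, p1_time=1.0, prp_time=9.0,
             factor_prob=0.05, n_tests=200, seed=1):
    """Toy scheduler: at most one P-1 job at a time; a freed worker
    takes a P-1-complete candidate if one is ready, else starts the
    next P-1 itself, else sits idle (the gap to avoid)."""
    random.seed(seed)
    ready = 0            # candidates with P-1 done, PRP not started
    p1_running = False
    done = 0
    events = []          # min-heap of (finish_time, job_kind)
    t = 0.0
    heapq.heappush(events, (p1_time, "p1"))  # bootstrap with one P-1
    p1_running = True
    busy = 1
    while done < n_tests:
        t, kind = heapq.heappop(events)
        busy -= 1
        if kind == "p1":
            p1_running = False
            if random.random() > factor_prob:
                ready += 1        # survived P-1: eligible for PRP
            # else a factor was found and the candidate is discarded
        else:
            done += 1
        while busy < workers:     # refill every free worker slot
            if ready > 0:
                ready -= 1
                heapq.heappush(events, (t + prp_time, "prp"))
            elif not p1_running:
                p1_running = True
                heapq.heappush(events, (t + p1_time, "p1"))
            else:
                break             # nothing runnable: an idle slot
            busy += 1
    return t
```

With these toy numbers the total time approaches n_tests × (p1_time + prp_time) / workers once the pipeline warms up, i.e. the single P-1 channel is not a bottleneck as long as P-1 is much shorter than the PRP test.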

2021-07-21, 21:05   #215
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,783 Posts

Stagger is easily created, and can help move your system toward trusted status and clear the DC backlog.

[Worker 1] Wavefront PRP
[Worker 2] Low Cat DC as PRP with proof (takes ~1/3 as long as wavefront PRP), then wavefront PRP
[Worker 3] Mid Cat 4 DC as PRP with proof (selected to take ~1/2 as long as 1 wavefront PRP), then wavefront PRP
[Worker 4] High Cat 4 DC as PRP with proof (selected to take ~3/4 as long as 1 wavefront PRP), then wavefront PRP

Pick exponents from the lists in https://mersenneforum.org/showthread.php?t=24148. Or just set some workers to DC and others to first-time check until the stagger is approximately sufficient, switching one worker at a time to wavefront PRP.
2021-07-23, 10:35   #216
paulunderwood

Sep 2002
Database er0rr

3863₁₀ Posts

Recap: 4 workers would take 10 days. I tried 16 workers again at a comparable size and it was 40 days. However, 32 workers is about 70 days and uses 55% of RAM. What's more, I got lucky and found a stage 1 factor during this stage-2-less testing.
2021-07-23, 12:25   #217
axn

Jun 2003

19×271 Posts

Quote:
 Originally Posted by paulunderwood Recap: 4 workers would take 10 days. I tried 16 workers again at a comparable size and it was 40 days.
Hmmm... So the benchmarks lied? 16 workers were supposed to give ~10% more throughput than 4, so shouldn't it have completed in about 36 days?

Quote:
 Originally Posted by paulunderwood However, 32 workers is about 70 days and uses 55% of RAM
If 8 workers can complete in 18 days or less, it would very nearly match the throughput of 32 workers and give about 1.5GB/worker for a semi-decent stage 2. But can it do it in 18? Or will it be more like 20-21?

2021-07-23, 17:18   #218
paulunderwood

Sep 2002
Database er0rr

111100010111₂ Posts

Quote:
 Originally Posted by axn But can it do it in 18? Or will it be more like 20-21?
I will try 8 workers at some stage.

I am currently upgrading from Deb8 to Deb10 via Deb9 -- the latest Debian kernel is now running, although I expect this to make very little difference to timings. I did a reboot when I got Deb9 installed and it came back in "powersave" mode. I poked the necessary files to get it back to "performance" -- that took it down from over 90 days to under 70 days. When I previously installed the "new" (bigger) HDD, at one point I had to clear CMOS -- so I will hook up a monitor and see what can be done in the BIOS to speed things up. The settings therein might explain a few things, like why I previously got 16 workers at 25-26 days each and now 40 days. I'll turn off HyperThreading. All my timings so far are a pickle! The current state of play is ~69 days for 32 workers.

Last fiddled with by paulunderwood on 2021-07-23 at 17:47

2021-07-23, 21:07   #219
paulunderwood

Sep 2002
Database er0rr

F17₁₆ Posts

Debian 10 (Buster) installed. The chip is running at 1.4GHz even though it says powersave. HyperThreading is off. 32 workers in 56 days for 113M-bit candidates. With some runtime tuning I expect this to come down.

Edit: After some runtime tuning by mprime, at 2% done the ETA is now 55 days.

Last fiddled with by paulunderwood on 2021-07-24 at 05:59
2021-07-24, 20:34   #220
Cheetahgod

May 2020

2×13 Posts

Wow, that's pretty neat.

