mersenneforum.org


longjing 2019-04-10 10:46

Dual Intel Xeon E5-2680: How best to use?
 
Hi,

I am new to this, so apologies for the noob questions. I would just like a little guidance on how to set things up correctly.

I have a system with a dual-CPU motherboard and two Intel Xeon E5-2680 v2 chips. Each has 10 cores and 20 threads. When I run htop I can see 40 threads in total. I would like to run mprime during what would otherwise be downtime for the system.

If I remember correctly, when I ran prime95 in the past on my Core i7 Windows machine, I set workers to 1 and could then set cores equal to the number of threads (1-7). Using Ubuntu on the dual-CPU machine, when I run mprime and go through the setup, I have set the number of workers to 1 and the number of cores to 18 (I can only select between 1 and 20). This seemed odd to me, as I expected to be able to choose between 1 and 40, the total thread count (maybe this is not how it works).

Monitoring the CPU activity with htop shows that it is running at around half of maximum load, and I'm concerned that it is only recognising one CPU.

Would someone be able to advise on the best way to set this up?

Thanks

paulunderwood 2019-04-10 11:29

Hyperthreading contributes little, or even negatively, to Prime95/mprime performance. You could run [c]man taskset[/c] and read up on how to pin each of two mprime instances to its own CPU, although there may be a way of setting affinities from within mprime.
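
A rough sketch of what the taskset approach could look like, written as a small Python helper that only prints the command lines. The CPU numbering here (logical CPUs 0-9 on the first socket, 10-19 on the second) is an assumption for illustration and varies between systems, so check it with lscpu first. The [c]./mprime[/c] path is also hypothetical, and two instances would presumably each need their own working directory.

```python
# Sketch: build taskset command lines that pin one mprime instance per socket.
# ASSUMPTION: logical CPUs 0-9 belong to socket 0 and 10-19 to socket 1.
# Real numbering differs between machines; verify with `lscpu` before use.
CORES_PER_SOCKET = 10
SOCKETS = 2


def taskset_command(socket, program="./mprime"):
    """Return a taskset command restricting `program` to one socket's cores."""
    first = socket * CORES_PER_SOCKET
    last = first + CORES_PER_SOCKET - 1
    # `taskset -c LIST CMD` launches CMD with the given CPU affinity list.
    return f"taskset -c {first}-{last} {program}"


for s in range(SOCKETS):
    print(taskset_command(s))
```

The helper only formats strings and does not launch anything, so it is safe to run while checking the core ranges.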

axn 2019-04-10 11:30

HT cores are not relevant for P95 / mprime. It is recognising both CPUs but only using the "real" cores. Of course the load would show as "half", but performance will be full. There is a setting that asks it to use the HT cores for testing as well, but that will just generate more heat for no performance gain.

BTW, your best throughput would come from running 2 workers, each with 10 threads. That way cross-CPU communication inefficiencies can be avoided.
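
The arithmetic behind the "half" load can be sketched as follows. It takes the figures from this thread (2 sockets, 10 cores each, 2 hardware threads per core) and assumes htop counts every hardware thread as one CPU.

```python
# Sketch: why mprime offers 1-20 cores and htop shows roughly half load.
SOCKETS = 2
CORES_PER_SOCKET = 10
THREADS_PER_CORE = 2  # hyperthreading exposes two logical CPUs per core

physical_cores = SOCKETS * CORES_PER_SOCKET        # what mprime lets you pick from
logical_cpus = physical_cores * THREADS_PER_CORE   # what htop displays

# With one busy thread per physical core, this share of logical CPUs is loaded.
busy_fraction = physical_cores / logical_cpus

print(f"physical cores: {physical_cores}")
print(f"logical CPUs:   {logical_cpus}")
print(f"expected htop load with all physical cores busy: {busy_fraction:.0%}")
```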

longjing 2019-04-10 12:07

Brilliant, thank you both for your help.

kriesel 2019-04-10 15:02

To confirm the previous advice, or to gauge the variation in performance, run mprime throughput benchmarks for 1, 2, 4, 5, 10, and 20 workers, with and without hyperthreading, over the range of FFT lengths you anticipate using. I suggest at least 2560K to 8192K.
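
As a way of keeping track of those runs, here is a small sketch that enumerates the suggested combinations. The worker counts are the ones listed above (they happen to be the divisors of 20, so the cores split evenly); the dictionary layout is just an illustrative choice and has nothing to do with mprime's own configuration format.

```python
# Sketch: enumerate the benchmark configurations suggested above.
from itertools import product

PHYSICAL_CORES = 20
WORKER_COUNTS = [1, 2, 4, 5, 10, 20]   # each divides 20 evenly
HYPERTHREADING = [False, True]


def benchmark_grid():
    """Return one entry per (worker count, hyperthreading) combination."""
    return [
        {
            "workers": w,
            "cores_per_worker": PHYSICAL_CORES // w,
            "hyperthreading": ht,
        }
        for w, ht in product(WORKER_COUNTS, HYPERTHREADING)
    ]


for cfg in benchmark_grid():
    print(cfg)
```

Each of these would then be repeated for every FFT length of interest.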

aurashift 2019-04-11 17:37

As someone who's running dual/quad Xeons exclusively...

1 - running 1 worker per physical core is best (so in your case 20 workers, if you were to max out)

2 - running 1 worker per 2 cores is close to best

3 - anything above that and you start losing a little bit of overall performance but you'll get the exponent done sooner if that's a concern.

4 - do NOT run one worker across two sockets. The UPI/QPI is not fast enough to keep up.

kriesel 2019-04-11 19:48

In the detailed benchmarking analysis posted at [url]https://www.mersenneforum.org/showthread.php?t=23900[/url], it appears to me that the optimal number of workers for total throughput depends on the Xeon model and, to some degree, on the FFT length. Of the 4 models I checked, some fit your guidance better than others. One worker per core often, but not always, gives maximum throughput. Exponent expiration can also become a consideration with the older, slower Xeons.

longjing 2019-05-02 19:52

Hello again,

When I run 2 workers with 10 cores each, I get an estimated completion time of approximately 14 days, so I would complete one exponent for a first-time LL test per week on average.

I then set the number of workers to 20 with 1 core each, as mentioned above, but now get an ETA of 230 days, which works out to one exponent every 11.5 days. That seemed like quite a large difference, so I thought I would post here in case it is indicative of some other issue.
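
For reference, the per-exponent figures above come from the following arithmetic, sketched under the assumption that all workers run concurrently and each finishes one exponent per ETA period.

```python
# Sketch: average days between completed exponents, given the ETA of one worker.
def days_per_exponent(workers, eta_days):
    """All workers finish one exponent every `eta_days`, so divide by the count."""
    return eta_days / workers


print(days_per_exponent(workers=2, eta_days=14))    # 2 workers x 10 cores
print(days_per_exponent(workers=20, eta_days=230))  # 20 workers x 1 core
```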

Prime95 2019-05-02 20:44

Test / Status gives a *very* rough estimate of completion dates.

Use Options / Benchmark to get accurate throughput numbers OR run a few hundred thousand iterations with one and two workers and do your own throughput calculations.

longjing 2019-05-03 19:32

Thanks, I ran the Options / Benchmark to get a better reading:


[C]Timings for 8192K FFT length (20 cores, 1 worker): 24.24 ms. Throughput: 41.25 iter/sec.
Timings for 8192K FFT length (20 cores, 2 workers): 34.06, 32.05 ms. Throughput: 60.56 iter/sec.
[Fri May 3 19:09:17 2019]
Timings for 8192K FFT length (20 cores, 20 workers): 354.96, 348.68, 349.28, 343.11, 349.11, 384.59, 431.22, 343.90, 341.67, 348.70, 328.79, 328.96, 334.84, 326.79, 327.68, 348.19, 334.99, 332.70, 337.88, 332.53 ms. Throughput: 57.96 iter/sec.[/C]


So it seems that the option of 2 workers with 10 cores each is still the best performer on this setup.
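
The throughput figures in the benchmark output appear to be the sum over workers of 1000 divided by the per-iteration time in milliseconds. That formula is inferred from the numbers rather than taken from any mprime documentation, but a quick sketch using the timings above lands close to the reported values.

```python
# Sketch: recompute total throughput (iterations/sec) from per-worker timings.
# ASSUMPTION: throughput = sum over workers of (1000 ms / time per iteration).
def throughput(timings_ms):
    return sum(1000.0 / t for t in timings_ms)


one_worker = [24.24]
two_workers = [34.06, 32.05]
twenty_workers = [
    354.96, 348.68, 349.28, 343.11, 349.11, 384.59, 431.22, 343.90, 341.67, 348.70,
    328.79, 328.96, 334.84, 326.79, 327.68, 348.19, 334.99, 332.70, 337.88, 332.53,
]

for label, timings in [
    ("1 worker", one_worker),
    ("2 workers", two_workers),
    ("20 workers", twenty_workers),
]:
    print(f"{label}: {throughput(timings):.2f} iter/sec")
```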


Thanks again for the help!

kriesel 2019-05-03 21:42

4 workers?
10 workers?

