#1 |
|
Apr 2019
5 Posts |
Hi,
I am new to this, so apologies for the noob questions; I would just like a little guidance on how to set things up correctly. I have a system with a dual-CPU motherboard and two Intel Xeon E5-2680 v2 chips. Each has 10 cores and 20 threads, and when I run htop I can see 40 threads in total. I would like to run mprime in what would otherwise be 'down-time' for the system.

If I remember correctly, when I ran prime95 in the past on my Core i7 Windows machine I set workers to 1 and could then set cores equal to the number of threads (1-7). Using Ubuntu on the dual-CPU machine, when I run mprime and go through the setup I set the number of workers to 1 and the number of cores to 18 (I can only select between 1 and 20). This seemed odd to me, as I expected to be able to choose between 1 and 40, the total thread count (maybe this is not how it works). Monitoring CPU activity with htop shows the system running at around half of maximum load, and I'm concerned that mprime is only recognising one CPU.

Would someone be able to advise on the best way to set this up? Thanks |
|
|
|
|
|
#2 | |
|
Sep 2002
Database er0rr
5×937 Posts |
Quote:
Last fiddled with by paulunderwood on 2019-04-10 at 11:30 |
|
|
|
|
|
|
#3 |
|
Jun 2003
2³×683 Posts |
HT cores are not relevant for P95 / mprime. It is recognising both CPUs but only using the "real" cores. Of course the load will show as "half", but performance will be full. There is a setting that asks it to use the HT cores for testing as well, but that will just generate more heat for no performance gain.
BTW, your best throughput would be with 2 workers, each with 10 threads. That way cross-CPU communication inefficiencies are avoided. |
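To check the "real" versus HT core counts on a Linux box, one option is to count distinct (socket, core) pairs in the output of `lscpu -p=CPU,CORE,SOCKET`. The sketch below is only an illustration: the function names `count_topology` and `read_lscpu` and the `SAMPLE` text (a made-up 2-socket, 4-core, 8-thread machine) are assumptions, not anything mprime itself provides.

```python
import subprocess


def count_topology(lscpu_text):
    """Return (logical_cpus, physical_cores, sockets) from `lscpu -p=CPU,CORE,SOCKET` text."""
    logical = 0
    cores = set()
    sockets = set()
    for line in lscpu_text.splitlines():
        line = line.strip()
        # lscpu -p prefixes its header lines with '#'
        if not line or line.startswith("#"):
            continue
        cpu, core, socket = line.split(",")[:3]
        logical += 1
        cores.add((socket, core))  # hyperthread siblings share one (socket, core) pair
        sockets.add(socket)
    return logical, len(cores), len(sockets)


def read_lscpu():
    """Run lscpu on the local machine (Linux only) and return its text output."""
    result = subprocess.run(
        ["lscpu", "-p=CPU,CORE,SOCKET"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample: 2 sockets, 2 cores per socket, 2 threads per core.
SAMPLE = """# CPU,Core,Socket
0,0,0
1,1,0
2,2,1
3,3,1
4,0,0
5,1,0
6,2,1
7,3,1
"""

print(count_topology(SAMPLE))  # (logical CPUs, physical cores, sockets)
```

On the machine described in this thread, `count_topology(read_lscpu())` would be expected to report 40 logical CPUs, 20 physical cores and 2 sockets, which matches htop showing 40 threads while mprime offers 1-20 cores.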
|
|
|
|
|
#4 |
|
Apr 2019
5 Posts |
Brilliant, thank you both for your help.
|
|
|
|
|
|
#5 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17220₈ Posts |
To confirm the previous advice, or to gauge the variation in performance, run mprime throughput benchmarking for 1, 2, 4, 5, 10, and 20 workers, with and without hyperthreading, over the range of FFT lengths you anticipate using. I suggest at least 2560K to 8192K.
|
|
|
|
|
|
#6 |
|
Jan 2015
254₁₀ Posts |
As someone who's running dual/quad Xeons exclusively...
1 - running 1 worker per physical core is best (so in your case 20 workers, if you were to max out)
2 - running 1 worker per 2 cores is close to best
3 - anything above that and you start losing a little bit of overall performance, but you'll get the exponent done sooner if that's a concern
4 - do NOT run one worker across two sockets. The UPI/QPI is not fast enough to keep up. |
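The "never span two sockets" rule narrows the choice of layouts considerably. As a rough sketch, assuming every worker gets the same number of cores and all physical cores are used, the candidate layouts can be listed like this (the function name `socket_aligned_layouts` is hypothetical, and mprime's actual core assignment is not modelled):

```python
def socket_aligned_layouts(sockets, cores_per_socket):
    """List (workers, cores_per_worker) pairs where no worker spans two sockets.

    Assumes equal-sized workers and that every physical core is in use.
    """
    layouts = []
    for cores_per_worker in range(1, cores_per_socket + 1):
        # A worker fits inside one socket only if its size divides the socket evenly.
        if cores_per_socket % cores_per_worker == 0:
            workers = sockets * (cores_per_socket // cores_per_worker)
            layouts.append((workers, cores_per_worker))
    return layouts


# Dual E5-2680 v2: 2 sockets with 10 physical cores each.
for workers, cores in socket_aligned_layouts(2, 10):
    print(f"{workers} workers x {cores} cores each")
```

For this machine that leaves 20x1, 10x2, 4x5 and 2x10; a single 20-core worker is excluded because it would have to straddle both sockets.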
|
|
|
|
|
#7 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
Quote:
|
|
|
|
|
|
|
#8 |
|
Apr 2019
5₁₀ Posts |
Hello again,
When I run 2 workers with 10 cores each, I get an expected time of completion of approximately 14 days per exponent, so I would complete one first-time LL test per week on average. I then set the number of workers to 20 with 1 core each, as mentioned above, but now get an ETA of 230 days, which works out to one exponent every 11.5 days. That seemed like quite a large difference, so I thought I would post here in case it is indicative of some other issue. |
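The comparison above is just the per-exponent ETA divided by the number of workers running in parallel. A minimal sketch of that arithmetic, using the ETAs quoted in this post (the function name `days_per_exponent` is made up for illustration):

```python
def days_per_exponent(eta_days, workers):
    """Average days between finished exponents when `workers` tests run side by side."""
    return eta_days / workers


two_by_ten = days_per_exponent(14, 2)      # 2 workers x 10 cores, ETA ~14 days each
twenty_by_one = days_per_exponent(230, 20)  # 20 workers x 1 core, ETA ~230 days each

print(two_by_ten, twenty_by_one)
```

This only compares the status-screen estimates with each other; it says nothing about how accurate those estimates are.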
|
|
|
|
|
#9 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts |
Test / Status gives a *very* rough estimate of completion dates.
Use Options / Benchmark to get accurate throughput numbers OR run a few hundred thousand iterations with one and two workers and do your own throughput calculations. |
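For the do-it-yourself route, total throughput is the number of iterations all workers completed divided by the wall-clock time of the run. A small sketch, where `throughput_from_run` is a hypothetical helper and the iteration counts and times are invented purely to show the calculation:

```python
def throughput_from_run(iterations_per_worker, elapsed_seconds):
    """Total iterations per second across all workers over one timed run."""
    return sum(iterations_per_worker) / elapsed_seconds


# Invented numbers, for illustration only:
one_worker = throughput_from_run([200_000], 5_000)            # 1 worker on all cores
two_workers = throughput_from_run([150_000, 150_000], 5_000)  # 2 workers, half the cores each

print(f"1 worker:  {one_worker:.2f} iter/sec")
print(f"2 workers: {two_workers:.2f} iter/sec")
```

Running each configuration for the same wall-clock time on the same FFT length keeps the comparison fair.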
|
|
|
|
|
#10 |
|
Apr 2019
5₁₆ Posts |
Thanks, I ran the Options / Benchmark to get a better reading:
Timings for 8192K FFT length (20 cores, 1 worker): 24.24 ms. Throughput: 41.25 iter/sec.
Timings for 8192K FFT length (20 cores, 2 workers): 34.06, 32.05 ms. Throughput: 60.56 iter/sec.
[Fri May 3 19:09:17 2019]
Timings for 8192K FFT length (20 cores, 20 workers): 354.96, 348.68, 349.28, 343.11, 349.11, 384.59, 431.22, 343.90, 341.67, 348.70, 328.79, 328.96, 334.84, 326.79, 327.68, 348.19, 334.99, 332.70, 337.88, 332.53 ms. Throughput: 57.96 iter/sec.

So it seems that the 2-worker, 10-cores-each option is still the best performer on this setup. Thanks again for the help! |
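The throughput figures in the benchmark output can be cross-checked from the per-worker timings: each worker contributes 1000 / (ms per iteration) iterations per second, and the total is the sum over workers. The sketch below parses lines in the format pasted above; the regular expression and the names `LINE_RE` and `parse_benchmark` are assumptions based only on these three lines, not a documented mprime format.

```python
import re

# Matches lines such as:
# "Timings for 8192K FFT length (20 cores, 2 workers): 34.06, 32.05 ms."
LINE_RE = re.compile(
    r"Timings for (\d+)K FFT length \((\d+) cores?, (\d+) workers?\): "
    r"([\d.,\s]+?) ms\."
)


def parse_benchmark(text):
    """Return one dict per benchmark line with the recomputed total throughput."""
    results = []
    for match in LINE_RE.finditer(text):
        timings_ms = [float(t) for t in match.group(4).split(",")]
        # Each worker completes 1000 / ms iterations per second.
        throughput = sum(1000.0 / t for t in timings_ms)
        results.append({
            "fft_k": int(match.group(1)),
            "cores": int(match.group(2)),
            "workers": int(match.group(3)),
            "throughput": throughput,
        })
    return results


LOG = (
    "Timings for 8192K FFT length (20 cores, 1 worker): 24.24 ms. "
    "Throughput: 41.25 iter/sec.\n"
    "Timings for 8192K FFT length (20 cores, 2 workers): 34.06, 32.05 ms. "
    "Throughput: 60.56 iter/sec.\n"
    "Timings for 8192K FFT length (20 cores, 20 workers): 354.96, 348.68, "
    "349.28, 343.11, 349.11, 384.59, 431.22, 343.90, 341.67, 348.70, 328.79, "
    "328.96, 334.84, 326.79, 327.68, 348.19, 334.99, 332.70, 337.88, 332.53 ms. "
    "Throughput: 57.96 iter/sec.\n"
)

results = parse_benchmark(LOG)
for r in results:
    print(f"{r['workers']:>2} workers: {r['throughput']:.2f} iter/sec")

best = max(results, key=lambda r: r["throughput"])
print("best:", best["workers"], "workers")
```

Recomputing from the timings reproduces the reported totals to within rounding, and the 2-worker layout comes out ahead of both the 1-worker and 20-worker runs.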
|
|
|
|
|
#11 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts |
4 workers?
10 workers? |
|
|
|