mersenneforum.org  

Old 2019-04-10, 10:46   #1
longjing
 
Apr 2019

5 Posts
Dual Intel Xeon E5-2680: How best to use?

Hi,

I am new to this, so apologies for the noob questions; I would just like a little guidance on how to set things up correctly.

I have a system with a dual-CPU motherboard and two Intel Xeon E5-2680 v2 chips. Each has 10 cores and 20 threads, so htop shows 40 threads in total. I would like to run mprime in what would otherwise be 'down-time' for the system.

If I remember correctly, when I ran prime95 in the past on my Core i7 Windows machine I set workers to 1 and could then set cores equal to the number of threads (1-7). On the dual-CPU machine, which runs Ubuntu, I went through the mprime setup and set the number of workers to 1 and the number of cores to 18 (I can only select between 1 and 20). This seemed odd to me, as I expected to be able to choose between 1 and 40, the total thread count (maybe this is not how it works).

Monitoring the CPU activity with htop shows that it is running at around half maximum load and I'm concerned that it is only recognising one cpu.

Would someone be able to advise on the best way to set this up?

Thanks
Old 2019-04-10, 11:29   #2
paulunderwood
 
Sep 2002
Database er0rr

5×937 Posts

Hyperthreading makes little or even a negative contribution to Prime95/mprime. You could run man taskset and read up on how to tie each of two mprime instances to its own CPU, although there may be a way of setting affinities from within mprime.
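
A minimal sketch of the taskset idea, in Python: it groups logical CPUs by socket and builds one `taskset -c` command line per socket. The CPU numbering used here (0-9 on socket 0, 10-19 on socket 1) and the `./mprime` path are assumptions for illustration only; the real layout varies between machines and should be checked with `lscpu` first.

```python
from collections import defaultdict


def cpus_by_socket(package_of):
    """Group logical CPU ids by physical package (socket) id.

    package_of maps a logical CPU id to its socket id.
    """
    groups = defaultdict(list)
    for cpu, pkg in sorted(package_of.items()):
        groups[pkg].append(cpu)
    return dict(groups)


def taskset_command(cpus, program="./mprime"):
    """Build a `taskset -c <cpu-list> <program>` argument list."""
    cpu_list = ",".join(str(c) for c in cpus)
    return ["taskset", "-c", cpu_list, program]


# Hypothetical layout: 20 physical cores, the first 10 on socket 0.
example_layout = {cpu: (0 if cpu < 10 else 1) for cpu in range(20)}

for socket, cpus in cpus_by_socket(example_layout).items():
    print(socket, " ".join(taskset_command(cpus)))
```

The commands are only printed, not executed, so the sketch can be tried safely before anything is actually pinned.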

Last fiddled with by paulunderwood on 2019-04-10 at 11:30
Old 2019-04-10, 11:30   #3
axn
 
Jun 2003

23×683 Posts

HT cores are not relevant for P95 / mprime. It is recognising both CPUs but only using the "real" cores. Of course the load would show as "half", but performance will be full. There is a setting whereby you can ask it to use the HT cores for testing as well, but it will just generate more heat for no performance gain.

BTW, your best throughput would be if you run 2 workers, each with 10 threads. That way cross-CPU communication inefficiencies can be avoided.
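
To illustrate why htop reports roughly half load in this situation, here is a small Python sketch. It assumes htop's overall load is simply busy threads divided by logical CPUs, and that each mprime thread fully occupies one logical CPU; the function name is made up for this example.

```python
def expected_load_fraction(sockets, cores_per_socket, threads_per_core, busy_threads):
    """Fraction of logical CPUs that are busy, as a monitor like htop would show it."""
    logical_cpus = sockets * cores_per_socket * threads_per_core
    return busy_threads / logical_cpus


# Dual E5-2680 v2: 2 sockets x 10 cores x 2 hyperthreads = 40 logical CPUs.
print(expected_load_fraction(2, 10, 2, 20))  # all 20 physical cores busy -> 0.5
print(expected_load_fraction(2, 10, 2, 18))  # the 18 cores chosen in post #1
```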
Old 2019-04-10, 12:07   #4
longjing
 
Apr 2019

5 Posts

Brilliant, thank you both for your help.
Old 2019-04-10, 15:02   #5
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

17220₈ Posts

To confirm the previous advice, or to gauge the variation in performance, run mprime throughput benchmarks for 1, 2, 4, 5, 10 and 20 workers, with and without hyperthreading, over the range of FFT lengths you anticipate using. I suggest at least 2560K to 8192K.
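
The worker counts listed above are exactly the divisors of 20, i.e. the ways of splitting 20 physical cores evenly between workers. A short Python sketch of the resulting benchmark grid follows; the dictionary layout is purely illustrative and is not an mprime configuration format.

```python
from itertools import product

PHYSICAL_CORES = 20  # two 10-core Xeon E5-2680 v2 chips


def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


# One entry per (worker count, hyperthreading on/off) combination.
grid = [
    {"workers": w, "cores_per_worker": PHYSICAL_CORES // w, "hyperthreading": ht}
    for w, ht in product(divisors(PHYSICAL_CORES), (False, True))
]

for config in grid:
    print(config)
```

Each of these 12 combinations would then be benchmarked at every FFT length of interest.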
Old 2019-04-11, 17:37   #6
aurashift
 
Jan 2015

254₁₀ Posts

as someone who's running dual/quad xeons exclusively...

1 - running 1 worker per physical core is best (so in your case 20 workers, if you were to max out)
2 - running 1 worker per 2 cores is close to best
3 - anything above that and you start losing a little bit of overall performance but you'll get the exponent done sooner if that's a concern.
4 - do NOT run one worker across two sockets. The UPI/QPI is not fast enough to keep up.
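
Point 4 can be checked for a given worker count with a rough Python sketch. It assumes cores are handed to workers in contiguous blocks (worker 0 gets the first block, and so on) and that cores 0-9 sit on socket 0; mprime's actual affinity assignment may differ, so treat this as an illustration rather than a description of what mprime does.

```python
def workers_spanning_sockets(workers, sockets=2, cores_per_socket=10):
    """Return the indices of workers whose core block crosses a socket boundary.

    Assumes an even split of cores and contiguous assignment per worker.
    """
    total_cores = sockets * cores_per_socket
    if total_cores % workers != 0:
        raise ValueError("worker count must divide the number of cores")
    cores_per_worker = total_cores // workers
    spanning = []
    for index in range(workers):
        first = index * cores_per_worker
        last = first + cores_per_worker - 1
        # Different socket for the first and last core means the block straddles.
        if first // cores_per_socket != last // cores_per_socket:
            spanning.append(index)
    return spanning


for count in (1, 2, 4, 5, 10, 20):
    print(count, workers_spanning_sockets(count))
```

Under these assumptions, 1 worker necessarily spans both sockets, and 5 workers of 4 cores each leave the middle worker straddling the boundary, while 2, 4, 10 and 20 workers stay within a socket.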
Old 2019-04-11, 19:48   #7
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴·3·163 Posts

In the detailed benchmarking analysis posted at https://www.mersenneforum.org/showthread.php?t=23900, it appears to me that the optimal number of workers for total throughput depends on the Xeon model and, to some degree, on the FFT length. Of the 4 models I checked, some fit your guidance better than others. One worker per core often, but not always, gives maximum throughput. Exponent expiration can also become a consideration with the older, slower Xeons.
Old 2019-05-02, 19:52   #8
longjing
 
Apr 2019

5₁₀ Posts

Hello again,

When I run 2 workers with 10 cores each, I get an estimated completion time of approximately 14 days per exponent, so on average I would complete one first-time LL test per week.

I then set the number of workers to 20 with 1 core each, as mentioned above, but now get an ETA of 230 days, which works out to one exponent every 11.5 days. That seemed like quite a large difference, so I thought I would post here in case it is indicative of some other issue.
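
The arithmetic behind those two figures, as a small Python sketch. It assumes all workers run concurrently at the same rate and that each one finishes its exponent after the quoted ETA; the function name is made up for illustration.

```python
def days_per_exponent(workers, eta_days):
    """Average number of days between completed exponents across all workers."""
    return eta_days / workers


print(days_per_exponent(2, 14))    # 2 workers x 10 cores -> 7.0
print(days_per_exponent(20, 230))  # 20 workers x 1 core  -> 11.5
```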
Old 2019-05-02, 20:44   #9
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

17·487 Posts

Test / Status gives a *very* rough estimate of completion dates.

Use Options / Benchmark to get accurate throughput numbers OR run a few hundred thousand iterations with one and two workers and do your own throughput calculations.
Old 2019-05-03, 19:32   #10
longjing
 
Apr 2019

5₁₆ Posts

Thanks, I ran the Options / Benchmark to get a better reading:


Timings for 8192K FFT length (20 cores, 1 worker): 24.24 ms. Throughput: 41.25 iter/sec.
Timings for 8192K FFT length (20 cores, 2 workers): 34.06, 32.05 ms. Throughput: 60.56 iter/sec.
[Fri May 3 19:09:17 2019]
Timings for 8192K FFT length (20 cores, 20 workers): 354.96, 348.68, 349.28, 343.11, 349.11, 384.59, 431.22, 343.90, 341.67, 348.70, 328.79, 328.96, 334.84, 326.79, 327.68, 348.19, 334.99, 332.70, 337.88, 332.53 ms. Throughput: 57.96 iter/sec.



So it seems that 2 workers with 10 cores each is still the better performer on this setup.


Thanks again for the help!
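
The throughput figures in the benchmark output above can be reproduced from the per-worker timings: each worker contributes 1000 / (ms per iteration) iterations per second, and the total is the sum over workers. A short Python sketch, with the timings copied from the post and a function name chosen for this example:

```python
def throughput_iter_per_sec(timings_ms):
    """Total iterations per second for workers with the given ms-per-iteration timings."""
    return sum(1000.0 / t for t in timings_ms)


one_worker = [24.24]
two_workers = [34.06, 32.05]
twenty_workers = [
    354.96, 348.68, 349.28, 343.11, 349.11, 384.59, 431.22, 343.90, 341.67, 348.70,
    328.79, 328.96, 334.84, 326.79, 327.68, 348.19, 334.99, 332.70, 337.88, 332.53,
]

for label, timings in [("1 worker", one_worker),
                       ("2 workers", two_workers),
                       ("20 workers", twenty_workers)]:
    print(f"{label}: {throughput_iter_per_sec(timings):.2f} iter/sec")
```

The results match the reported 41.25, 60.56 and 57.96 iter/sec to within rounding, which confirms that 2 workers gave the highest total of the three configurations tested.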
Old 2019-05-03, 21:42   #11
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁴×3×163 Posts

4 workers?
10 workers?
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
