mersenneforum.org  

Old 2015-12-31, 13:17   #1
bgbeuning
 
Dec 2014

3×5×17 Posts
Opteron is Hyperthreaded?

My Opteron uses the AMD Bulldozer architecture.
An article I was reading said that Bulldozer
shares one FPU between each pair of cores.
(AMD was sued for false advertising over this.)

My baseline results using 24 workers on 24 cores look like this:

[Dec 29 16:08] Setting affinity to run worker on logical CPU #23
[Dec 29 16:08] Resuming primality test of M41338133 using AMD K10 FFT length 2240K, Pass1=448, Pass2=5K
[Dec 29 16:08] Iteration: 37709936 / 41338133 [91.22%].
[Dec 29 16:08] Iteration: 37710000 / 41338133 [91.22%], ms/iter: 55.080, ETA: 55:30:36
[Dec 29 16:23] Iteration: 37720000 / 41338133 [91.24%], ms/iter: 92.282, ETA: 3d 20:44

About 90 ms/iter is the typical value.
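To compare runs like these without eyeballing the log, the ms/iter readings can be pulled out with a small script. This is only a sketch: the regular expression assumes the line format shown in the excerpt above, and the function name `ms_per_iter_values` is made up for illustration. Note that the first reading in the excerpt (55.080) covers only 64 iterations after the resume, so a median over many lines is a safer summary than any single line.

```python
import re
from statistics import median

# Matches the "ms/iter: 92.282" field as it appears in the log excerpt above.
MS_PER_ITER = re.compile(r"ms/iter:\s*([0-9]+(?:\.[0-9]+)?)")


def ms_per_iter_values(log_text):
    """Return every ms/iter reading found in the log text, in order."""
    return [float(m.group(1)) for m in MS_PER_ITER.finditer(log_text)]


sample_log = """\
[Dec 29 16:08] Iteration: 37709936 / 41338133 [91.22%].
[Dec 29 16:08] Iteration: 37710000 / 41338133 [91.22%], ms/iter: 55.080, ETA: 55:30:36
[Dec 29 16:23] Iteration: 37720000 / 41338133 [91.24%], ms/iter: 92.282, ETA: 3d 20:44
"""

readings = ms_per_iter_values(sample_log)
print(readings)          # one value per progress line that reports ms/iter
print(median(readings))  # less sensitive to a short first interval than a mean
```

With a full log file, the same function can be fed the whole file contents and the median compared between configurations.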

I told prime95 to use 23 workers instead of 24 to see if
the 23rd worker got better results. Instead, things got about 2x slower (roughly 195 ms/iter).

[Dec 30 13:49] Setting affinity to run worker on any logical CPU.
[Dec 30 13:49] Resuming primality test of M41338133 using AMD K10 FFT length 2240K, Pass1=448, Pass2=5K
[Dec 30 13:49] Iteration: 38555632 / 41338133 [93.26%].
[Dec 30 14:03] Iteration: 38560000 / 41338133 [93.27%], ms/iter: 195.083, ETA: 6d 06:32
[Dec 30 14:36] Iteration: 38570000 / 41338133 [93.30%], ms/iter: 192.809, ETA: 6d 04:15

I noticed that "Smart Affinity Assignment" did not set affinity in this case.
On a NUMA machine, letting threads move between nodes is VERY bad,
so I manually assigned the workers to cores.
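For readers unfamiliar with what "setting affinity" means at the operating-system level, here is a minimal sketch of pinning a process to one logical CPU on Linux. It is not how Prime95 itself is configured (that is done through its own settings); it only illustrates the underlying mechanism, using Python's `os.sched_setaffinity`, which is available on Linux only. The helper name `pin_to_cpu` is hypothetical.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one logical CPU (Linux only).

    Returns the resulting affinity set so the caller can verify it.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)  # CPUs we are currently allowed to use
    target = min(allowed)              # pick one that is guaranteed to be valid
    print(pin_to_cpu(target))          # the process now stays on that CPU
    os.sched_setaffinity(0, allowed)   # restore the original affinity
```

Once pinned, the scheduler will not migrate the process to another core, which on a NUMA machine also keeps it on the same node.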

[Dec 30 14:58] Setting affinity to run worker on logical CPU #23
[Dec 30 14:58] Resuming primality test of M41338133 using AMD K10 FFT length 2240K, Pass1=448, Pass2=5K
[Dec 30 14:58] Iteration: 38575931 / 41338133 [93.31%].
[Dec 30 15:01] Iteration: 38580000 / 41338133 [93.32%], ms/iter: 54.967, ETA: 42:06:46
[Dec 30 15:11] Iteration: 38590000 / 41338133 [93.35%], ms/iter: 54.693, ETA: 41:45:03

At about 55 ms/iter versus the original 90 ms/iter, each worker is roughly 1.6x faster than in the original run.
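The ETA column and the speedup can both be checked with simple arithmetic: remaining iterations times ms/iter gives the time left. The sketch below uses only numbers from the log lines above; the function names `eta_seconds` and `format_eta` are made up for illustration, and the formatting assumes the `H:MM:SS` style seen in the log for ETAs under a few days.

```python
def eta_seconds(current_iter, total_iters, ms_per_iter):
    """Estimated seconds remaining at the current iteration rate."""
    return (total_iters - current_iter) * ms_per_iter / 1000.0


def format_eta(seconds):
    """Format a duration as H:MM:SS, truncating fractional seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Values from the [Dec 30 15:01] line: 38580000 / 41338133 at 54.967 ms/iter.
print(format_eta(eta_seconds(38580000, 41338133, 54.967)))  # 42:06:46, as logged

# Per-worker speedup of the pinned run over the typical 90 ms/iter baseline.
print(round(90.0 / 54.967, 2))
```

The same calculation with the [Dec 30 15:11] line reproduces its logged ETA as well, which suggests the ETA is derived from the most recent ms/iter figure.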

I can tell prime95 to use 12 workers and manually assign workers to cores.
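A 12-worker layout that puts one worker on each FPU-sharing pair could be planned like this. The sketch assumes that adjacent logical CPUs (0 and 1, 2 and 3, and so on) form a pair; that numbering is an assumption, not something confirmed in this thread, so it is worth checking the real topology with a tool such as `lscpu` or hwloc's `lstopo` first. The function name `one_core_per_module` is hypothetical.

```python
def one_core_per_module(num_cores, cores_per_module=2):
    """Pick the first logical CPU of each module.

    ASSUMPTION: logical CPUs are numbered so that consecutive CPUs
    share a module (0-1, 2-3, ...). Verify this on the real machine.
    """
    if num_cores % cores_per_module != 0:
        raise ValueError("num_cores must be a multiple of cores_per_module")
    return list(range(0, num_cores, cores_per_module))


# Map worker numbers (1-based) to the logical CPU each would be pinned to.
assignment = dict(enumerate(one_core_per_module(24), start=1))
for worker, cpu in assignment.items():
    print(f"worker {worker:2d} -> logical CPU #{cpu}")
```

Whether 12 workers with a whole FPU each beat 24 workers sharing FPUs is something only a benchmark on the actual machine can answer.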
Old 2015-12-31, 16:33   #2
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

10011110101110₂ Posts

If I understand correctly, you have arrived at the same setup which I have for my FX-8350. I treat each integer pair, with the associated FPU, as a "core". It has been a while since I set it up this way, and I may not still have the test results which got me here.

I believe I found that running P95 on a single integer unit plus the FPU, with the other integer unit not running P95, gave better total results than running P95 with different LL assignments on the two integer units. Results were similar with LL on one integer "core" and P-1 on the other of the pair.

I now run 4 worker windows, with LL/DC assignments on the odd numbered cores, and the even numbered cores as helper threads. My rationale is that this way each FPU, with its associated caches, is only doing one job, fed by two integer units, thus avoiding conflict over resources. Results were similar when I was running P-1 with the same allocation scheme, with the exception that Stage 2 only used ~1-2/3 'cores' once all the RAM was allocated.

I have been curious whether running 2 worker windows with 4 integer "cores" and 2 FPUs each might perform any better by reducing memory contention, but some experiments made me think that this assignment scheme would not utilize the integer units fully. I don't know of any way to see how hard a shared FPU is working.
Old 2016-01-10, 07:19   #3
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by kladner
If I understand correctly, you have arrived at the same setup which I have for my FX-8350......
Apologies. On rereading, I see that my post was barely tangential to yours. I have the nagging feeling that this response belonged in another thread.
Old 2016-01-10, 08:26   #4
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by kladner
Apologies. On rereading, I see that my post was barely tangential to yours. I have the nagging feeling that this response belonged in another thread.
Indeed, I found the other thread.
http://www.mersenneforum.org/showthread.php?t=20819
I am not sure how I managed to dump this response here. Sorry for the confusion.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.