#1

Dec 2014
3×5×17 Posts
My Opteron uses the AMD Bulldozer architecture.
I was reading an article about it, and it said that Bulldozer has one FPU shared by each pair of cores. (AMD was sued for false advertising over this.) My baseline results using 24 workers on 24 cores look like this:

[Dec 29 16:08] Setting affinity to run worker on logical CPU #23
[Dec 29 16:08] Resuming primality test of M41338133 using AMD K10 FFT length 2240K, Pass1=448, Pass2=5K
[Dec 29 16:08] Iteration: 37709936 / 41338133 [91.22%].
[Dec 29 16:08] Iteration: 37710000 / 41338133 [91.22%], ms/iter: 55.080, ETA: 55:30:36
[Dec 29 16:23] Iteration: 37720000 / 41338133 [91.24%], ms/iter: 92.282, ETA: 3d 20:44

About 90 ms/iter is the typical value. I told prime95 to use 23 workers instead of 24 to see whether the 23rd worker would get better results. Things got about 2x slower:

[Dec 30 13:49] Setting affinity to run worker on any logical CPU.
[Dec 30 13:49] Resuming primality test of M41338133 using AMD K10 FFT length 2240K, Pass1=448, Pass2=5K
[Dec 30 13:49] Iteration: 38555632 / 41338133 [93.26%].
[Dec 30 14:03] Iteration: 38560000 / 41338133 [93.27%], ms/iter: 195.083, ETA: 6d 06:32
[Dec 30 14:36] Iteration: 38570000 / 41338133 [93.30%], ms/iter: 192.809, ETA: 6d 04:15

I noticed that "Smart Affinity Assignment" did not set affinity in this case. On a NUMA machine, letting threads move between nodes is VERY bad, so I manually assigned the workers to cores:

[Dec 30 14:58] Setting affinity to run worker on logical CPU #23
[Dec 30 14:58] Resuming primality test of M41338133 using AMD K10 FFT length 2240K, Pass1=448, Pass2=5K
[Dec 30 14:58] Iteration: 38575931 / 41338133 [93.31%].
[Dec 30 15:01] Iteration: 38580000 / 41338133 [93.32%], ms/iter: 54.967, ETA: 42:06:46
[Dec 30 15:11] Iteration: 38590000 / 41338133 [93.35%], ms/iter: 54.693, ETA: 41:45:03

The performance is almost twice as good as the original (about 55 ms/iter instead of about 92). Next, I can tell prime95 to use 12 workers, one per shared FPU, and manually assign the workers to cores.
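The ETA figures in these logs follow from simple arithmetic: remaining iterations (the log's denominator minus the current iteration) times the current ms/iter. The Python sketch below is only an illustration, not prime95 code; the helper name `eta` is hypothetical, and it assumes the ms/iter stays constant for the rest of the test. It reproduces the logged ETAs to within about a minute and puts the per-iteration speed ratios into numbers.

```python
from datetime import timedelta


def eta(total_iters: int, current_iter: int, ms_per_iter: float) -> timedelta:
    """Remaining time, assuming the current ms/iter holds until the end."""
    remaining = total_iters - current_iter
    return timedelta(milliseconds=remaining * ms_per_iter)


TOTAL = 41338133  # denominator shown in the "Iteration: x / 41338133" log lines

# (iteration, ms/iter) taken from the last log line of each run above
runs = {
    "24 workers, affinity set": (37720000, 92.282),
    "23 workers, no affinity": (38570000, 192.809),
    "23 workers, pinned by hand": (38590000, 54.693),
}

for label, (it, ms) in runs.items():
    print(f"{label:28s} ETA {eta(TOTAL, it, ms)}")

# Per-worker comparison only: ms/iter of one worker, not total machine throughput
base_ms = runs["24 workers, affinity set"][1]
print(f"no affinity: {runs['23 workers, no affinity'][1] / base_ms:.2f}x slower per iteration")
print(f"pinned:      {base_ms / runs['23 workers, pinned by hand'][1]:.2f}x faster per iteration")
```

The ratios come out at roughly 2.1x slower without affinity and roughly 1.7x faster when pinned, in line with "about 2x slower" and "almost twice as good".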

#2
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts
If I understand correctly, you have arrived at the same setup which I have for my FX-8350. I treat each integer pair, with the associated FPU, as a "core". It has been a while since I set it up this way, and I may not still have the test results which got me here.
As I recall, I found that a single integer unit plus the FPU, with the other integer unit not running P95, got better total results than running P95 with different LL assignments on the two integer units. Results were similar with LL on one integer "core" and P-1 on the other of the pair. I now run 4 worker windows, with LL/DC assignments on the odd-numbered cores and the even-numbered cores as helper threads. My rationale is that this way each FPU and its associated caches are doing only one job, with two integer units, which avoids conflicts over resources. Results were similar when I was running P-1 with the same allocation scheme, except that Stage 2 only used ~1-2/3 'cores' once all the RAM was allocated. I have been curious whether running 2 worker windows, each with 4 integer "cores" and 2 FPUs, might perform better by reducing memory contention, but some experiments made me think that this scheme would not use the integer units fully. I don't know of any way to see how hard a shared FPU is working.
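The one-job-per-FPU layout described here (and the 12-worker option from the first post) amounts to a mapping from workers to core pairs. The sketch below is a hypothetical illustration in Python, not prime95's configuration format, and all function names are made up for the example. It assumes logical CPUs 2k and 2k+1 are the two integer cores of one module sharing an FPU, which should be checked against the topology the OS reports, and it numbers CPUs from 0, so which core counts as "odd" may differ from the numbering used above. `pin_current_process` relies on Linux's `os.sched_setaffinity` and does nothing on other platforms.

```python
import os


def module_pairs(n_logical: int) -> list[tuple[int, int]]:
    """Group logical CPUs into sibling pairs that share one FPU.

    ASSUMPTION: CPUs 2k and 2k+1 belong to the same Bulldozer module.
    """
    if n_logical % 2:
        raise ValueError("expected an even number of logical CPUs")
    return [(2 * k, 2 * k + 1) for k in range(n_logical // 2)]


def one_worker_per_module(n_logical: int) -> list[dict[str, int]]:
    """Worker k gets module k: its main thread on one core and a helper
    thread on the sibling core, so each shared FPU serves a single job."""
    return [{"main": a, "helper": b} for a, b in module_pairs(n_logical)]


def pin_current_process(cpus: set[int]) -> None:
    """Restrict the calling process to the given logical CPUs (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    # 8 logical CPUs, e.g. an FX-8350 with 4 modules
    for worker, cores in enumerate(one_worker_per_module(8)):
        print(f"worker {worker + 1}: main on CPU {cores['main']}, helper on CPU {cores['helper']}")
```

Called with 24, `one_worker_per_module` yields 12 workers, one per shared FPU on a 24-core Bulldozer machine like the Opteron in the first post.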

#3
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts

#4
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts
Quote:
http://www.mersenneforum.org/showthread.php?t=20819

I am not sure how I managed to dump this response here. Sorry for the confusion.

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| unable to detect some of the hyperthreaded logical cpus | owned139 | Hardware | 5 | 2015-01-11 21:47 |
| unable to detect some of the hyperthreaded logical cpus? | jarablue | Hardware | 3 | 2013-09-16 01:58 |
| Let's buy GIMPS an Opteron! | Xyzzy | Lounge | 264 | 2006-08-17 12:39 |
| Hyperthreaded Machines & V24.15 | Prime95 | Software | 29 | 2005-11-14 18:05 |
| AMD Opteron | naclosagc | Software | 27 | 2003-08-10 19:14 |