Multi-threading on AMD FX CPUs
I'm not sure if this is the best place. Mods: relocation welcome if appropriate.
This is an account of P95 doing DC LL work with cores and worker threads arranged in different ways.

FX-8350, stock 4 GHz, 32 GB DDR3-1600, 9-9-9-24, dual channel, unganged.

To reduce heat and improve computer responsiveness, I reduced the DC workers from 8 cores to 6 cores. Whilst distributing the leftover assignments, I realized that I could experiment with multi-thread performance. I remember an offhand comment which raised the question of whether FX-type chips, with FPUs shared between pairs of integer units, would behave the same as chips with a 1:1 integer-to-FPU ratio when running multi-threaded workers.

I started by setting the number of workers and arranging worktodo.txt for 3 workers. I assigned the three workers to cores 1, 3, and 5, and set threads to 2. This causes the helper threads to go to the even-numbered cores, so that the primary thread is sharing the FPU with its helper.

After running like this for a week or so, I changed the worker setting to single thread, but left the threads assigned to cores 1-3-5. This gives a single integer unit full use of the FPU.

Finally, I changed the thread assignments to cores 1-2-3. Now two different assignments are sharing the FPU of cores 1-2.

I have screenshots of P95 under these various conditions. In all cases, I waited until several screen outputs were in the captures. The results are so consistent that I have boiled them down to the rough numbers below.

32.5M range
One helper thread: ~19 ms/it
Single thread, unshared FPU: ~24.4 ms/it
Single thread, shared FPU: ~36.5 ms/it
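The three core arrangements above can be sketched in a few lines of Python to make the FPU sharing explicit. This is only an illustration under stated assumptions: cores are numbered 1 to 8 as in the post, cores (1,2), (3,4), (5,6) and (7,8) are assumed to be the pairs that share an FPU, and the names `module_of`, `classify` and the worker labels are hypothetical, not anything Prime95 provides.

```python
from collections import defaultdict

def module_of(core):
    # Assumption: cores 1-2 share FPU 0, cores 3-4 share FPU 1, and so on.
    return (core - 1) // 2

def classify(assignments):
    """Map each used FPU module to how it is being shared.

    assignments: dict of worker label -> list of core numbers (1-based).
    """
    by_module = defaultdict(list)
    for worker, cores in assignments.items():
        for core in cores:
            by_module[module_of(core)].append(worker)

    result = {}
    for module, workers in sorted(by_module.items()):
        if len(workers) == 1:
            result[module] = "unshared"
        elif len(set(workers)) == 1:
            result[module] = "shared within one worker"
        else:
            result[module] = "shared between workers"
    return result

# The three arrangements described in the post.
two_threads = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}   # primary + helper
single_unshared = {"A": [1], "B": [3], "C": [5]}
single_shared = {"A": [1], "B": [2], "C": [3]}

for name, cfg in [("two threads", two_threads),
                  ("single, 1-3-5", single_unshared),
                  ("single, 1-2-3", single_shared)]:
    print(name, classify(cfg))
```

In the 1-2-3 arrangement only the first two workers land on the same FPU under this numbering; the worker on core 3 has its module to itself.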
Interesting.
It doesn't surprise me that the two-threads-one-assignment is twice as fast as two-threads-two-assignments. What IS rather neat is the fact that the helper thread which doesn't run on its own core is actually helping.

I'd be interested in seeing this attempted on an Intel CPU.

So, did you go down from 8 DC workers to 6 or from 4 to 3?
[QUOTE=TheMawn;380197]Interesting.
It doesn't surprise me that the two-threads-one-assignment is twice as fast as two-threads-two-assignments. What IS rather neat is the fact that the helper thread which doesn't run on its own core is actually helping. I'd be interested in seeing this attempted on an Intel CPU. So, did you go down from 8 DC workers to 6 or from 4 to 3?[/QUOTE]
I had been running 8 single-threaded workers. I'm currently running 3 double-threaded workers. It does seem that there is still a slight advantage to running single-threaded workers, though, so I may go to 6 workers.
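The "slight advantage" can be checked with rough arithmetic on the per-iteration times quoted earlier in the thread (~19, ~24.4 and ~36.5 ms/it). The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes that six single-threaded workers on cores 1-6 would each run at the shared-FPU figure of ~36.5 ms/it, which the thread only reports for the 1-2-3 arrangement. The function name `throughput` is hypothetical.

```python
def throughput(workers, ms_per_iter):
    """Aggregate iterations per second across all workers."""
    return workers * 1000.0 / ms_per_iter

# (number of workers, ms per iteration) using the rough figures from the thread.
configs = {
    "3 workers x 2 threads (6 cores)": (3, 19.0),
    "3 workers x 1 thread, unshared FPU (3 cores)": (3, 24.4),
    # Assumption: every worker sees the shared-FPU time when all 6 cores are busy.
    "6 workers x 1 thread, shared FPU (6 cores)": (6, 36.5),
}

for name, (n, ms) in configs.items():
    print(f"{name}: {throughput(n, ms):.1f} it/s")

# Per-assignment speedup from adding one helper thread on the shared FPU.
helper_speedup = 24.4 / 19.0
print(f"helper thread speedup: {helper_speedup:.2f}x")
```

Under that assumption, six single-threaded workers come out roughly 4% ahead of three double-threaded workers in total throughput, while each individual assignment finishes almost twice as fast in the double-threaded setup.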