mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   AMD Zen speculations (https://www.mersenneforum.org/showthread.php?t=20992)

ldesnogu 2017-03-11 14:15

[QUOTE=Prime95;454597]I'm wondering if Ryzen's hardware prefetcher is being too active, wasting memory bandwidth prefetching data that won't be needed.[/QUOTE]
I don't know if Prime95 uses explicit prefetch instructions, so this might be pointless, but I've seen slides where AMD advises removing all prefetch instructions and letting the HW prefetcher do its job.

ewmayer 2017-03-11 23:35

[QUOTE=ldesnogu;454679]I don't know if Prime95 uses explicit prefetch instructions, so this might be pointless, but I've seen slides where AMD advises removing all prefetch instructions and letting the HW prefetcher do its job.[/QUOTE]

Prime95 makes extensive use of explicit prefetch - George has cited something like a 30% reduction in runtime from doing so (i.e. the code runs roughly 40-50% faster).

Prime95 2017-03-12 01:11

[QUOTE=ewmayer;454711]Prime95 makes extensive use of explicit prefetch - George has cited something like a 30% reduction in runtime from doing so (i.e. the code runs roughly 40-50% faster).[/QUOTE]

Note that any numbers I've given in the past about speedup due to prefetch was for single core FFTs. I've not determined the advantage, if any, of prefetching on today's memory bandwidth limited CPUs.

Prime95 2017-03-12 01:42

Just did the tests for a 4M FFT on Skylake.

Single core: 17.7 ms w/ prefetch, 19.9 ms w/o prefetch (11% faster)

All core throughput: 178 iter/s w/ prefetch, 171 iter/s w/o prefetch (4% faster)

ewmayer 2017-03-12 01:57

[QUOTE=Prime95;454719]Just did the tests for a 4M FFT on Skylake.

Single core: 17.7 ms w/ prefetch, 19.9 ms w/o prefetch (11% faster)

All core throughput: 178 iter/s w/ prefetch, 171 iter/s w/o prefetch (4% faster)[/QUOTE]

Can you shoot the sans-prefetch code - in whatever form is most convenient for the purpose - to someone with a Zen system?

Prime95 2017-03-12 04:17

64-bit linux version: [url]https://www.dropbox.com/s/9n3cfkumuqykbbp/mprime?dl=0[/url]

airsquirrels 2017-03-12 15:15

[QUOTE=Prime95;454723]64-bit linux version: [url]https://www.dropbox.com/s/9n3cfkumuqykbbp/mprime?dl=0[/url][/QUOTE]

For fun I tried running this on a Dual E5-2698 v3 (16 core) system.

It updated local.txt to use CorePerTest=16, and the logs indicated my 2 workers were assigned CPUs 0-15 and 16-31 respectively. However, htop shows work running on arbitrary logical CPUs, whereas 28.9 is nice and orderly: 100% on the first 32 listed cores. I was not able to find a configuration that used the proper cores, so I could not evaluate any changes from the lack of prefetch instructions.

Prime95 2017-03-12 16:46

[QUOTE=airsquirrels;454750]It updated local.txt to use CorePerTest=16, and the logs indicated my 2 workers were assigned CPUs 0-15 and 16-31 respectively. However, htop shows work running on arbitrary logical CPUs, whereas 28.9 is nice and orderly: 100% on the first 32 listed cores. I was not able to find a configuration that used the proper cores, so I could not evaluate any changes from the lack of prefetch instructions.[/QUOTE]

I'm not sure what you are saying here. I take it this is for doing real work, not benchmarking. You expected 2 workers using 16 cores each and all 16 cores from the same Xeon chip.

What is the hwloc output? This is output to results.txt when you do a benchmark.

What is prime95's output about assigning helper threads to CPUs?

What is HTOP's output that makes you think something is wrong?

Mark Rose 2017-03-12 17:31

It sounds like his htop output is showing that processor affinity isn't working.

airsquirrels 2017-03-13 02:45

[QUOTE=Prime95;454756]I'm not sure what you are saying here. I take it this is for doing real work, not benchmarking. You expected 2 workers using 16 cores each and all 16 cores from the same Xeon chip.

What is the hwloc output? This is output to results.txt when you do a benchmark.

What is prime95's output about assigning helper threads to CPUs?

What is HTOP's output that makes you think something is wrong?[/QUOTE]

I will run the benchmark tomorrow. In htop the first 32 entries represent the 16 physical cores on each of the two Xeons, with their corresponding hyperthread siblings numbered 32-47 and 48-63. With the pre-hwloc build, using AffinityScramble2 set all in natural sequence and Affinity set to 0 for worker 1 and 16 for worker 2, the correct physical cores sit consistently at 100% utilization.

When I run this build it shows the helper threads being assigned to the correct cores (0-15 for the first worker, 16-31 for the second), just the same as the live mprime. The assignments don't seem to be sticking, however: the actual htop output is now all over the place, with no apparent pinning of helpers to specific cores. Performance is also lower (14-15 ms/iter on 100M digit exponents vs. 13).

Prime95 2017-03-13 05:05

29.1 assigns affinity to both hyperthreads on a core. That is, if prime95 decides to assign to core #0, it assigns it to logical cores #0 and #32. My thought is that assigning to either logical CPU should result in the same performance.

There is a way to override prime95's affinity setting algorithm in local.txt -- much better than the ugly AffinityScramble settings. You can be the first to test it! Create two sections in local.txt: [Worker #1] and [Worker #2]. Then use this syntax, copied from comments in the code:
[CODE]/* Parse affinity settings specified in the INI file. */
/* We accept several syntaxes in an INI file: */
/* 3,6,9 Run main worker thread on logical CPU #3, run two aux threads on logical CPUs #6 & #9 */
/* 3-4,5-6 Run main worker thread on logical CPUs #3 & #4, run aux thread on logical CPUs #5 & #6 */
/* {3,5,7},{4,6} Run main worker thread on logical CPUs #3, #5, & #7, run aux thread on logical CPUs #4 & #6 */
/* (3,5,7),(4,6) Run main worker thread on logical CPUs #3, #5, & #7, run aux thread on logical CPUs #4 & #6 */
/* [3,5-7],(4,6) Run main worker thread on logical CPUs #3, #5, #6, & #7, run aux thread on logical CPUs #4 & #6 */
[/CODE]
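For instance, on the dual-Xeon box described above (physical cores on logical CPUs 0-15 and 16-31), a hypothetical local.txt fragment giving each 16-core worker its own socket might look like the following -- note the Affinity= key name is an assumption based on older versions, not confirmed for this build:
[CODE][Worker #1]
Affinity=0-15

[Worker #2]
Affinity=16-31
[/CODE]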
