[QUOTE=ixfd64;522529]Is this issue 100% reproducible?[/QUOTE]
On two systems I can do it on demand, 100%.

[QUOTE=Prime95;522536]Does it happen under Linux? Any chance either Evil Genius or mackerel could load Linux in a VM and try mprime?[/QUOTE]
Not something I can do any time soon.

[QUOTE=Prime95;522547]Do not run the all-complex FFTs. The crashes were with that checkbox off.[/QUOTE]
If it helps, I went the other way: it still crashes if I check all-complex FFTs on Windows.
[CODE]
Prime95 64-bit version 29.8, RdtscTiming=1
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=4 (4 cores, 4 workers): 24.32, 24.22, 24.05, 24.21 ms. Throughput: 165.30 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=2 (4 cores, 4 workers): 24.94, 25.10, 25.14, 24.90 ms. Throughput: 159.88 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=1 (4 cores, 4 workers): 25.35, 25.56, 25.31, 25.15 ms. Throughput: 157.83 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=4 (4 cores, 4 workers): 24.61, 23.98, 24.37, 24.19 ms. Throughput: 164.73 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=2 (4 cores, 4 workers): 24.73, 24.94, 24.87, 24.69 ms. Throughput: 161.26 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=1 (4 cores, 4 workers): 25.46, 25.24, 25.24, 25.18 ms. Throughput: 158.23 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=4 (4 cores, 4 workers): 24.59, 24.63, 24.31, 24.28 ms. Throughput: 163.59 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=2 (4 cores, 4 workers): 24.67, 24.60, 24.68, 24.58 ms. Throughput: 162.39 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=1 (4 cores, 4 workers): 25.49, 25.45, 25.26, 25.43 ms. Throughput: 157.43 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=4 (4 cores, 4 workers): 24.80, 24.51, 24.64, 24.70 ms. Throughput: 162.19 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=2 (4 cores, 4 workers): 25.17, 25.14, 25.03, 25.22 ms. Throughput: 159.11 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=1 (4 cores, 4 workers): 25.72, 25.79, 25.55, 25.77 ms. Throughput: 155.61 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=4 (4 cores, 4 workers): 24.10, 24.14, 24.41, 24.57 ms. Throughput: 164.58 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=2 (4 cores, 4 workers): 24.15, 23.99, 24.02, 24.23 ms. Throughput: 165.98 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=1 (4 cores, 4 workers): 24.88, 25.04, 25.09, 25.07 ms. Throughput: 159.89 iter/sec.
[/CODE]
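The benchmark log above is dense enough that eyeballing the best configuration is error-prone. As a sketch (the `rank_configs` helper and regex are mine, not part of Prime95; only a few of the lines above are embedded for brevity), this is one way to rank the Pass1/Pass2/clm combinations by throughput:

```python
import re

# A few of the benchmark lines from above, pasted verbatim.
BENCH = """\
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=4 (4 cores, 4 workers): 24.32, 24.22, 24.05, 24.21 ms. Throughput: 165.30 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=2 (4 cores, 4 workers): 24.15, 23.99, 24.02, 24.23 ms. Throughput: 165.98 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=1 (4 cores, 4 workers): 24.88, 25.04, 25.09, 25.07 ms. Throughput: 159.89 iter/sec.
"""

LINE = re.compile(
    r"Pass1=(\d+), Pass2=(\d+), clm=(\d+).*?Throughput: ([\d.]+) iter/sec"
)

def rank_configs(text):
    """Return (pass1, pass2, clm, throughput) tuples, best throughput first."""
    rows = [(int(p1), int(p2), int(clm), float(tp))
            for p1, p2, clm, tp in LINE.findall(text)]
    return sorted(rows, key=lambda r: -r[3])

best = rank_configs(BENCH)[0]
print(best)  # (1280, 3840, 2, 165.98)
```

On the full log this picks Pass1=1280, Pass2=3840, clm=2 as the best combination at 165.98 iter/sec.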
So what we know:
1) Problem is Windows only.
2) Problem is only 4 cores / 4 workers (on 6-core and 8-core CPUs).
3) Problem occurs on the 4800K FFT. Are other FFT sizes a problem?

Perhaps clue #2 is the key. Maybe the bug can be reproduced on an Intel Windows machine when benchmarking fewer cores than are available.
With 29.8b5 on my old 8-core Haswell-E, the 4800K benchmark with 4 cores / 4 workers works fine, both with and without all-complex FFTs.
[QUOTE=Prime95;522568]2) Problem is only 4 cores / 4 workers (on 6-core and 8-core CPUs).[/QUOTE]
This was not reproduced on an 8086K (Intel, 6 cores), so it isn't universal. I'm wondering if it is a CCX thing with Ryzen; it would be interesting to try this with an earlier generation.

Edit: I've been thinking about changing the cooling on my Ryzen systems, so I could use that opportunity to drop in one of the older CPUs again for a test. Might not be before the weekend.

Edit 2: I've asked on another forum whether anyone else can do the testing; I might get results faster that way.
When I select 4096k FFT, 2 cores, 2 workers, Prime95 goes haywire.
Error setting affinity to core #xyz. There are 8 cores.
Error setting affinity to core #xyz. There are 8 cores.
Error setting affinity to core #xyz. There are 8 cores.
...

mprime is fine with it.
Stack corruption?
Please set "AffinityVerbosityBench=3" in prime.txt and try again. It will print out a smidge more information.
I get a lot of 'Affinity set to cpuset 0x00000003' (or a similarly credible hexadecimal mask), but occasionally 'Error setting affinity to core #208. There are 8 cores.' or something similar.
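To make the two log lines concrete: a minimal sketch of how a cpuset mask like 0x00000003 relates to core indices, and why a core number such as 208 on an 8-core box has to be a corrupted value, not a plausible mask bit. The `cpuset_mask` helper is hypothetical (not Prime95's actual code); it just mirrors the observed messages:

```python
def cpuset_mask(cores, ncores=8):
    """Build a cpuset bitmask in the style of the log above (e.g. 0x00000003
    for cores {0, 1}), rejecting any core index outside the topology."""
    mask = 0
    for c in cores:
        if not 0 <= c < ncores:
            # Mirrors the observed error line for a corrupted core number.
            raise ValueError(
                f"Error setting affinity to core #{c}. There are {ncores} cores.")
        mask |= 1 << c
    return f"0x{mask:08X}"

print(cpuset_mask([0, 1]))  # 0x00000003
```

Core #208 would need bit 208 of the mask, which is far outside any 8-core cpuset, so the value must already be garbage by the time the bind is attempted.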
Thinking out loud:
The bad core number is stored on the stack. There are a number of hwloc calls prior to attempting to set the affinity. The problem happens only under Windows and Zen 2. Possible causes:

1) Bug in prime95. Seems unlikely, in that prime95 executes the same code for both Intel and Zen 2.
2) Bug in Zen 2. Seems unlikely; AMD has a big QA budget.
3) Bug in hwloc. Possible. Note that the Linux hwloc output differs from the Windows hwloc output, and the hwloc developers may not have tested on Zen 2 yet.
4) Bug in Windows. Hwloc gets much of its info from the OS, and we've seen several cases where hwloc returns bad cache information because the OS isn't detecting caches properly.

Not sure how to proceed from here. The hwloc bug repository had no relevant posts as of 2 days ago.
I think a good question would be: why is the core number the affinity is being set to >= 8 in the first place?
Although it will prove nothing if it succeeds, try running hwloc's stand-alone program lstopo (or lstopo-no-graphics) from [url]https://www.open-mpi.org/software/hwloc/v2.0/[/url]
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.