
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Crashes in Prime95 with Zen 2 (https://www.mersenneforum.org/showthread.php?t=24620)

mackerel 2019-07-29 21:57

[QUOTE=ixfd64;522529]Is this issue 100% reproducible?[/QUOTE]
On two systems I can do it on demand 100%.

[QUOTE=Prime95;522536]Does it happen under Linux? Any chance either Evil Genius or mackerel could load Linux in a VM and try mprime?[/QUOTE]
Not something I can do any time soon.

[QUOTE=Prime95;522547]Do not run the all-complex FFTs. The crashes were with that checkbox off.[/QUOTE]
If it helps, I went the other way: it still crashes with all-complex FFTs checked on Windows.

Evil Genius 2019-07-30 02:16

[CODE]
Prime95 64-bit version 29.8, RdtscTiming=1
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=4 (4 cores, 4 workers): 24.32, 24.22, 24.05, 24.21 ms. Throughput: 165.30 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=2 (4 cores, 4 workers): 24.94, 25.10, 25.14, 24.90 ms. Throughput: 159.88 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=1 (4 cores, 4 workers): 25.35, 25.56, 25.31, 25.15 ms. Throughput: 157.83 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=4 (4 cores, 4 workers): 24.61, 23.98, 24.37, 24.19 ms. Throughput: 164.73 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=2 (4 cores, 4 workers): 24.73, 24.94, 24.87, 24.69 ms. Throughput: 161.26 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=1 (4 cores, 4 workers): 25.46, 25.24, 25.24, 25.18 ms. Throughput: 158.23 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=4 (4 cores, 4 workers): 24.59, 24.63, 24.31, 24.28 ms. Throughput: 163.59 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=2 (4 cores, 4 workers): 24.67, 24.60, 24.68, 24.58 ms. Throughput: 162.39 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=1 (4 cores, 4 workers): 25.49, 25.45, 25.26, 25.43 ms. Throughput: 157.43 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=4 (4 cores, 4 workers): 24.80, 24.51, 24.64, 24.70 ms. Throughput: 162.19 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=2 (4 cores, 4 workers): 25.17, 25.14, 25.03, 25.22 ms. Throughput: 159.11 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=1 (4 cores, 4 workers): 25.72, 25.79, 25.55, 25.77 ms. Throughput: 155.61 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=4 (4 cores, 4 workers): 24.10, 24.14, 24.41, 24.57 ms. Throughput: 164.58 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=2 (4 cores, 4 workers): 24.15, 23.99, 24.02, 24.23 ms. Throughput: 165.98 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=1 (4 cores, 4 workers): 24.88, 25.04, 25.09, 25.07 ms. Throughput: 159.89 iter/sec.
[/CODE]

Prime95 2019-07-30 02:58

So what we know:

1) Problem is Windows only.
2) Problem is only 4 cores / 4 workers (on 6-core and 8-core CPUs).
3) Problem occurs on 4800K FFT.

Are other FFT sizes a problem?

Perhaps clue #2 is the key. Maybe the bug can be reproduced on an Intel Windows machine when benchmarking fewer cores than are available.

ATH 2019-07-30 06:31

With 29.8b5 on my old 8-core Haswell-E, the 4800K benchmark on 4 cores / 4 workers works fine, both with and without all-complex FFTs.

mackerel 2019-07-30 08:32

[QUOTE=Prime95;522568]2) Problem is only 4 cores / 4 workers (on 6-core and 8-core CPUs).[/QUOTE]
This was not reproduced on the 8086K (Intel, 6 cores), so it isn't universal. I'm wondering if it is a CCX thing with Ryzen; it would be interesting to try this on an earlier generation.

Edit: I've been thinking about changing the cooling on my Ryzen systems, so could use that opportunity to drop in one of the older CPUs again for a test. Might not be before the weekend.
Edit 2: I've asked on another forum if anyone else can do the testing, might get results faster that way.

Evil Genius 2019-07-30 18:56

When I select 4096k FFT, 2 cores, 2 workers, Prime95 goes haywire.


[CODE]
Error setting affinity to core #xyz. There are 8 cores.
Error setting affinity to core #xyz. There are 8 cores.
Error setting affinity to core #xyz. There are 8 cores.
...
[/CODE]

mprime is fine with it.

Prime95 2019-07-30 19:44

Stack corruption?

Please set "AffinityVerbosityBench=3" in prime.txt and try again. It will print out a smidge more information.

Evil Genius 2019-07-30 19:54

I get a lot of lines like 'Affinity set to cpuset 0x00000003' with plausible-looking hexadecimal masks, but occasionally something like 'Error setting affinity to core #208. There are 8 cores.'

Prime95 2019-07-30 20:44

Thinking out loud:

The bad core number is stored on the stack.
There are a number of hwloc calls prior to attempting to set the affinity.
The problem happens only under Windows and Zen 2.

Possible causes:
1) Bug in prime95. Seems unlikely, since prime95 executes the same code on both Intel and Zen 2.
2) Bug in Zen 2 itself. Seems unlikely; AMD has a big QA budget.
3) Bug in hwloc. Possible. Note that the Linux hwloc output differs from the Windows hwloc output, and the hwloc developers may not have tested on Zen 2 yet.
4) Bug in Windows. Hwloc gets much of its info from the OS, and we've seen several cases where hwloc returns bad cache information because the OS isn't detecting caches properly.

Not sure how to proceed from here. Hwloc bug repository had no relevant posts as of 2 days ago.

Evil Genius 2019-07-30 20:54

I think a good question is: why is the core the affinity is being set to >= 8 in the first place?

Prime95 2019-07-30 21:05

Although it will prove nothing if it succeeds, try running hwloc's stand-alone program called lstopo or lstopo-no-graphics from [url]https://www.open-mpi.org/software/hwloc/v2.0/[/url]


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.