
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Crashes in Prime95 with Zen 2 (https://www.mersenneforum.org/showthread.php?t=24620)

Evil Genius 2019-07-25 22:27

Crashes in Prime95 with Zen 2
 
I've been running Prime95 for some 10 years. Recently I replaced my Ryzen 7 1700, which had been working flawlessly, with a brand new Ryzen 7 3700x, the rest being kept the same. Since then, every night around 6 am for some reason, Prime95 crashes with an access violation.


Hardware:


AMD Ryzen 7 3700x (stock settings)

Asrock X370 Killer SLI with BIOS 5.40 (latest)

4x Kingston 4 GB DDR4-2133 ECC (memtest approved, of course)


Software:


Prime95 v29.8 build 3


Fault addresses (relative to image base):


0x1bc4f03
0x1bc50b9
0x1bc4f03


Normally I'd suspect the hardware, but the fault addresses all occur in the same subroutine and twice on the same address. Also the memory has been tested well and is ECC protected. So I hope the author is willing to take a look.
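(For illustration only: a "fault address relative to image base" is just the crashing instruction's address minus the module's load base. The base and absolute address below are hypothetical, chosen to match the first offset reported above.)

```python
# Hypothetical numbers: 0x140000000 is a common default image base for
# 64-bit Windows executables; the absolute fault address is made up.
image_base = 0x140000000
fault_address = image_base + 0x1BC4F03  # pretend this came from the crash dump

# Subtracting the base gives the module-relative offset reported above.
print(hex(fault_address - image_base))  # 0x1bc4f03
```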

Prime95 2019-07-27 00:09

I suspect it is the automated benchmark crashing.

Try doing a throughput benchmark on a 4096K FFT. Select "Benchmark all implementations...".

If it crashes, post results.bench.txt. Then we'll get a Zen 1 user to do the same thing to see which FFT implementation is crashing.

Evil Genius 2019-07-27 02:14

It didn't crash at all with a 4096K FFT, so I decided to test what I'm currently working on: 4800K FFTs, 4 cores, 4 workers. It crashed immediately, so results.bench.txt only contains topology information, I'm afraid. You still want it?

Prime95 2019-07-27 04:38

No need.

Add "Autobench=0" to prime.txt while I investigate.

Prime95 2019-07-27 05:12

On second thought, do send the output.

My code review turns up nothing suspicious. For grins, please try adding "CpuSupports3DNow=0" in local.txt. I don't think that will make a difference.

One other possibility is a bug in the hwloc library.

Debugging may require remote access to a Zen 2 machine, preferably running Linux.

Evil Genius 2019-07-27 10:26

[CODE]AMD Ryzen 7 3700X 8-Core Processor
CPU speed: 4165.85 MHz, 8 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 8x32 KB, L2 cache size: 8x512 KB, L3 cache size: 2x16 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
Machine#0 (total=11269920KB, Backend=Windows, hwlocVersion=2.0.3, ProcessName=prime95.exe)
Package (total=11269920KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=23, CPUModelNumber=113, CPUModel="AMD Ryzen 7 3700X 8-Core Processor ", CPUStepping=0)
L3 (size=16384KB, linesize=64, ways=16, Inclusive=0)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000003)
PU#0 (cpuset: 0x00000001)
PU#1 (cpuset: 0x00000002)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x0000000c)
PU#2 (cpuset: 0x00000004)
PU#3 (cpuset: 0x00000008)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000030)
PU#4 (cpuset: 0x00000010)
PU#5 (cpuset: 0x00000020)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x000000c0)
PU#6 (cpuset: 0x00000040)
PU#7 (cpuset: 0x00000080)
L3 (size=16384KB, linesize=64, ways=16, Inclusive=0)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000300)
PU#8 (cpuset: 0x00000100)
PU#9 (cpuset: 0x00000200)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00000c00)
PU#10 (cpuset: 0x00000400)
PU#11 (cpuset: 0x00000800)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x00003000)
PU#12 (cpuset: 0x00001000)
PU#13 (cpuset: 0x00002000)
L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
Core (cpuset: 0x0000c000)
PU#14 (cpuset: 0x00004000)
PU#15 (cpuset: 0x00008000)
Prime95 64-bit version 29.8, RdtscTiming=1[/CODE]
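A quick way to read those cpuset values (a sketch, not part of prime95): each hex mask is a bitmap over logical processors, so a core that owns PUs 14 and 15 reports 0xc000.

```python
def cpuset_to_pus(mask: int) -> list[int]:
    """Decode a cpuset bitmask into the PU (logical CPU) indices it covers."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

print(cpuset_to_pus(0x00000003))  # [0, 1]   -> Core 0 (two SMT threads)
print(cpuset_to_pus(0x0000c000))  # [14, 15] -> Core 7
```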

Evil Genius 2019-07-27 13:20

You most probably already know this, but Zen 2 has double the AVX bandwidth compared to Zen 1, so it can process 256-bit AVX at full speed. My own FFT implementation benchmark (determining the order of 18782*(2^32-1)^4096+1) went from 2m19s to 1m49s, a 27.5% speed increase.
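(The quoted figure checks out if "speed increase" means throughput gain: 139 s down to 109 s is 139/109 - 1 ≈ 27.5%.)

```python
old_s, new_s = 2 * 60 + 19, 1 * 60 + 49  # 139 s on Zen 1, 109 s on Zen 2
speedup = old_s / new_s - 1              # throughput gain, not time reduction
print(f"{speedup:.1%}")  # 27.5%
```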


Also, why is there no exception reporting with a full register dump in Prime95? That helps enormously with fault finding. I could send you some source if needed.

Evil Genius 2019-07-28 10:18

"AutoBench=0" did the trick for now!

Prime95 2019-07-28 22:14

Any other Zen 2 users out there? Do they also crash on a 4800K all-implementations FFT benchmark?

mackerel 2019-07-28 23:08

Bed time now, but if you don't get more reports before tomorrow I can try it too.

mackerel 2019-07-29 08:13

A quick run before I go into work: it crashed repeatably.

Settings:
CPU 3700X
P95 29.8b5
Windows 10, 64 bit
min/max FFT: 4800k
Unselected "benchmark HT"
4 cores
4 workers
Happens with "benchmark all implementations" checked and unchecked!

Looking at the output window, it is starting to do a test and then crashes almost immediately. There's a second or so of running before it does so, and there's nothing in the output other than hwloc stuff. When crashing the application closes without any further notice. No errors displayed in Windows.

If I leave it on default of 8 cores, 1, 2, 8 workers, that runs normally. So it seems limited to 4 cores/4 worker setting.

ATH 2019-07-29 08:25

Sounds like it is a problem with hwloc?

I think you can disable hwloc with this line in prime.txt, maybe you can check if that helps?
EnableSetAffinity=0

mackerel 2019-07-29 08:52

I'm at work now, so it will be some time before I can do any follow-up testing. It seems to get past hwloc OK, and the crash happens when running the FFT. I guess the question now is: what is different about running 4c/4w versus 8c with 1, 2, or 8 workers?

Forgot to say, I quickly tried same on a 6700k at 4c4w, ran normally without problem.

Edit: didn't think at the time, wonder if 8 cores, 4 workers would work...

Prime95 2019-07-29 13:49

Does not sound like an hwloc problem.

mackerel 2019-07-29 17:43

To recap and cover new testing:

Prime95 29.8b5
Windows 10 64-bit (probably all on 1903)
4800k FFT throughput benchmark

3700X (Zen 2, 8 cores)
8 cores 1, 2, 4, 8 workers: ok
4 cores, 4 workers: crashes

3600 (Zen 2, 6 cores)
6 cores, 1, 2, 6 workers: ok
4 cores, 4 workers: crashes

6700k (Skylake 4 cores)
4 cores, 4 workers: ok

8086k (Coffee Lake 6 cores)
6 cores, 1, 6 workers: ok
4 cores, 4 workers: ok

I can't easily test older Ryzen generations as I dropped the new CPUs into the systems that had them.

ixfd64 2019-07-29 18:44

Is this issue 100% reproducible?

Prime95 2019-07-29 19:29

[QUOTE=ixfd64;522529]Is this issue 100% reproducible?[/QUOTE]

Probably. It has affected two different users, both on Windows.

Does it happen under Linux? Any chance either Evil Genius or mackerel could load Linux in a VM and try mprime?

Evil Genius 2019-07-29 19:53

I have it running in the Linux subsystem. What parameters do you want me to use with mprime?

Evil Genius 2019-07-29 20:03

[QUOTE=ixfd64;522529]Is this issue 100% reproducible?[/QUOTE]


Yes. I'm just one of the early adopters.

Prime95 2019-07-29 20:27

[QUOTE=Evil Genius;522537]I have it running in the Linux subsystem. What parameters do you want me to use with mprime?[/QUOTE]

./mprime -m

Then choose Benchmark and the same options you used under Windows.

Evil Genius 2019-07-29 20:52

[CODE][Mon Jul 29 22:46:46 2019]
Compare your results to other computers at http://www.mersenne.org/report_benchmarks
AMD Ryzen 7 3700X 8-Core Processor
CPU speed: 4281.77 MHz, 8 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 8x32 KB, L2 cache size: 512 KB, L3 cache size: 32 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
Machine#0 (total=16706384KB, Backend=Linux, OSName=Linux, OSRelease=4.4.0-18362-Microsoft, OSVersion="#1-Microsoft Mon Mar 18 12:02:00 PST 2019", HostName=zenstation, Architecture=x86_64, hwlocVersion=2.0.3, ProcessName=mprime)
Package#0 (total=16706384KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=23, CPUModelNumber=113, CPUModel="AMD Ryzen 7 3700X 8-Core Processor ", CPUStepping=0)
Core#0 (cpuset: 0x00000003)
PU#0 (cpuset: 0x00000001)
PU#1 (cpuset: 0x00000002)
Core#1 (cpuset: 0x0000000c)
PU#2 (cpuset: 0x00000004)
PU#3 (cpuset: 0x00000008)
Core#2 (cpuset: 0x00000030)
PU#4 (cpuset: 0x00000010)
PU#5 (cpuset: 0x00000020)
Core#3 (cpuset: 0x000000c0)
PU#6 (cpuset: 0x00000040)
PU#7 (cpuset: 0x00000080)
Core#4 (cpuset: 0x00000300)
PU#8 (cpuset: 0x00000100)
PU#9 (cpuset: 0x00000200)
Core#5 (cpuset: 0x00000c00)
PU#10 (cpuset: 0x00000400)
PU#11 (cpuset: 0x00000800)
Core#6 (cpuset: 0x00003000)
PU#12 (cpuset: 0x00001000)
PU#13 (cpuset: 0x00002000)
Core#7 (cpuset: 0x0000c000)
PU#14 (cpuset: 0x00004000)
PU#15 (cpuset: 0x00008000)
Prime95 64-bit version 29.8, RdtscTiming=1
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=4 (4 cores, 4 workers): 23.54, 23.94, 23.82, 23.91 ms. Throughput: 168.08 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=2 (4 cores, 4 workers): 24.53, 24.65, 24.51, 24.55 ms. Throughput: 162.87 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=1 (4 cores, 4 workers): 25.53, 25.51, 25.21, 25.63 ms. Throughput: 157.05 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=4 (4 cores, 4 workers): 23.93, 23.73, 23.92, 23.51 ms. Throughput: 168.28 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=2 (4 cores, 4 workers): 24.26, 24.22, 24.41, 24.42 ms. Throughput: 164.43 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=1 (4 cores, 4 workers): 25.18, 25.28, 25.16, 25.14 ms. Throughput: 158.79 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=4 (4 cores, 4 workers): 24.01, 23.44, 23.92, 23.89 ms. Throughput: 167.98 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=2 (4 cores, 4 workers): 24.36, 24.35, 24.40, 24.46 ms. Throughput: 163.99 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=1 (4 cores, 4 workers): 25.19, 24.74, 24.78, 25.21 ms. Throughput: 160.14 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=4 (4 cores, 4 workers): 24.02, 23.44, 23.37, 23.29 ms. Throughput: 170.01 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=2 (4 cores, 4 workers): 24.11, 24.07, 24.09, 24.25 ms. Throughput: 165.77 iter/sec.
FFTlen=4800K all-complex, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=1 (4 cores, 4 workers): 24.56, 24.70, 24.68, 24.63 ms. Throughput: 162.31 iter/sec.[/CODE]


mprime did not crash
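As an aside, the reported throughput appears to be (to within rounding of the printed times) the sum of each worker's iteration rate, i.e. 1000 ms divided by its per-iteration time. Checking the first result line above:

```python
# Per-worker ms/iter from the first result line of the benchmark output.
times_ms = [23.54, 23.94, 23.82, 23.91]

# Each worker contributes 1000/t iterations per second; total throughput
# is the sum across workers.
throughput = sum(1000.0 / t for t in times_ms)
print(f"{throughput:.2f} iter/sec")  # ~168.06, matching the reported 168.08
```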

Prime95 2019-07-29 21:01

Do not run the all-complex FFTs. The crashes were with that checkbox off.

mackerel 2019-07-29 21:57

[QUOTE=ixfd64;522529]Is this issue 100% reproducible?[/QUOTE]
On two systems I can do it on demand 100%.

[QUOTE=Prime95;522536]Does it happen under Linux? Any chance either Evil Genius or mackerel could load Linux in a VM and try mprime?[/QUOTE]
Not something I can do any time soon.

[QUOTE=Prime95;522547]Do not run the all-complex FFTs. The crashes were with that checkbox off.[/QUOTE]
If it helps, I went the other way: on Windows it still crashes with the all-complex FFTs box checked.

Evil Genius 2019-07-30 02:16

[CODE]
Prime95 64-bit version 29.8, RdtscTiming=1
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=4 (4 cores, 4 workers): 24.32, 24.22, 24.05, 24.21 ms. Throughput: 165.30 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=2 (4 cores, 4 workers): 24.94, 25.10, 25.14, 24.90 ms. Throughput: 159.88 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=320, Pass2=15360, clm=1 (4 cores, 4 workers): 25.35, 25.56, 25.31, 25.15 ms. Throughput: 157.83 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=4 (4 cores, 4 workers): 24.61, 23.98, 24.37, 24.19 ms. Throughput: 164.73 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=2 (4 cores, 4 workers): 24.73, 24.94, 24.87, 24.69 ms. Throughput: 161.26 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=384, Pass2=12800, clm=1 (4 cores, 4 workers): 25.46, 25.24, 25.24, 25.18 ms. Throughput: 158.23 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=4 (4 cores, 4 workers): 24.59, 24.63, 24.31, 24.28 ms. Throughput: 163.59 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=2 (4 cores, 4 workers): 24.67, 24.60, 24.68, 24.58 ms. Throughput: 162.39 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=640, Pass2=7680, clm=1 (4 cores, 4 workers): 25.49, 25.45, 25.26, 25.43 ms. Throughput: 157.43 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=4 (4 cores, 4 workers): 24.80, 24.51, 24.64, 24.70 ms. Throughput: 162.19 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=2 (4 cores, 4 workers): 25.17, 25.14, 25.03, 25.22 ms. Throughput: 159.11 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=768, Pass2=6400, clm=1 (4 cores, 4 workers): 25.72, 25.79, 25.55, 25.77 ms. Throughput: 155.61 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=4 (4 cores, 4 workers): 24.10, 24.14, 24.41, 24.57 ms. Throughput: 164.58 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=2 (4 cores, 4 workers): 24.15, 23.99, 24.02, 24.23 ms. Throughput: 165.98 iter/sec.
FFTlen=4800K, Type=3, Arch=4, Pass1=1280, Pass2=3840, clm=1 (4 cores, 4 workers): 24.88, 25.04, 25.09, 25.07 ms. Throughput: 159.89 iter/sec.
[/CODE]

Prime95 2019-07-30 02:58

So what we know:

1) Problem is Windows only.
2) Problem is only 4 cores / 4 workers (on 6-core and 8-core CPUs).
3) Problem occurs on 4800K FFT.

Are other FFT sizes a problem?

Perhaps clue #2 is the key. Maybe the bug can be reproduced on an Intel Windows machine when benchmarking fewer cores than are available.

ATH 2019-07-30 06:31

With 29.8b5 on my old 8-core Haswell-E, the 4800K benchmark on 4 cores / 4 workers works fine, with and without all-complex FFTs.

mackerel 2019-07-30 08:32

[QUOTE=Prime95;522568]2) Problem is only 4 cores / 4 workers (on 6-core and 8-core CPUs).[/QUOTE]
This was not reproduced on the 8086k (Intel, 6 cores), so it isn't universal. I'm wondering if it's a CCX thing with Ryzen; it would be interesting to try this with an earlier generation.

Edit: I've been thinking about changing the cooling on my Ryzen systems, so could use that opportunity to drop in one of the older CPUs again for a test. Might not be before the weekend.
Edit 2: I've asked on another forum if anyone else can do the testing, might get results faster that way.

Evil Genius 2019-07-30 18:56

When I select 4096k FFT, 2 cores, 2 workers, Prime95 goes haywire.


Error setting affinity to core #xyz. There are 8 cores.
Error setting affinity to core #xyz. There are 8 cores.
Error setting affinity to core #xyz. There are 8 cores.
...



mprime is fine with it.

Prime95 2019-07-30 19:44

Stack corruption?

Please set "AffinityVerbosityBench=3" in prime.txt and try again. It will print out a smidge more information.

Evil Genius 2019-07-30 19:54

I get a lot of 'Affinity set to cpuset 0x00000003' (or a similarly credible hexadecimal number), but occasionally 'Error setting affinity to core #208. There are 8 cores.' or something similar.
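That error message suggests a bounds check on the requested core index before the OS affinity call. A hypothetical reconstruction (not prime95's actual code) of such a check:

```python
def try_set_affinity(core: int, num_cores: int = 8) -> bool:
    """Hypothetical sketch of the sanity check behind the error message.

    A corrupted core index (like the #208 seen above) fails the range
    check and is reported instead of being passed to the OS.
    """
    if not 0 <= core < num_cores:
        print(f"Error setting affinity to core #{core}. There are {num_cores} cores.")
        return False
    # ... the OS-specific thread-affinity call would go here ...
    return True

try_set_affinity(3)    # in range, succeeds silently
try_set_affinity(208)  # prints the error seen above
```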

Prime95 2019-07-30 20:44

Thinking out loud:

The bad core number is stored on the stack.
There are a number of hwloc calls prior to attempting to set the affinity.
The problem happens only under Windows and Zen 2.

Possible causes:
1) Bug in prime95. Seems unlikely in that prime95 is executing the same code for both Intel and Zen2.
2) Bug in Zen 2. Seems unlikely. AMD has a big QA budget.
3) Bug in hwloc. Possible. Note the Linux hwloc output is different than the Windows hwloc output. Hwloc developers may not have tested on Zen 2 yet.
4) Bug in Windows. Hwloc gets much of its info from the OS. We've seen several cases where hwloc returns bad cache information because the OS isn't detecting caches properly.

Not sure how to proceed from here. Hwloc bug repository had no relevant posts as of 2 days ago.

Evil Genius 2019-07-30 20:54

I think a good question would be: why is the CPU core the affinity is being set to >= 8?

Prime95 2019-07-30 21:05

Although it will prove nothing if it succeeds, try running hwloc's stand-alone program called lstopo or lstopo-no-graphics from [url]https://www.open-mpi.org/software/hwloc/v2.0/[/url]

Evil Genius 2019-07-31 18:02

[CODE]Machine (11GB total) + Package L#0
NUMANode L#0 (P#0 11GB)
L3 L#0 (16MB)
L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#1)
L2 L#1 (512KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
PU L#2 (P#2)
PU L#3 (P#3)
L2 L#2 (512KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2
PU L#4 (P#4)
PU L#5 (P#5)
L2 L#3 (512KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
PU L#6 (P#6)
PU L#7 (P#7)
L3 L#1 (16MB)
L2 L#4 (512KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4
PU L#8 (P#8)
PU L#9 (P#9)
L2 L#5 (512KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5
PU L#10 (P#10)
PU L#11 (P#11)
L2 L#6 (512KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
PU L#12 (P#12)
PU L#13 (P#13)
L2 L#7 (512KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
PU L#14 (P#14)
PU L#15 (P#15)[/CODE]
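Reading that topology: the two L3 instances split the eight cores 4+4, one per CCX. A tiny sketch (not hwloc API code) of checking which L3 domain a PU belongs to, with masks transcribed from the output above:

```python
# L3 cpusets transcribed from the lstopo output above:
# L3 #0 covers PUs 0-7 (0x00ff), L3 #1 covers PUs 8-15 (0xff00).
L3_CPUSETS = [0x00FF, 0xFF00]

def l3_domain(pu: int) -> int:
    """Return the index of the L3 cache whose cpuset contains this PU."""
    for idx, mask in enumerate(L3_CPUSETS):
        if (mask >> pu) & 1:
            return idx
    raise ValueError(f"PU {pu} is in no L3 cpuset")

print(l3_domain(7), l3_domain(8))  # 0 1 -- PUs 7 and 8 sit in different CCXs
```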

ewmayer 2019-07-31 21:31

[QUOTE=Prime95;522641]Although it will prove nothing if it succeeds, try running hwloc's stand-alone program called lstopo or lstopo-no-graphics from [url]https://www.open-mpi.org/software/hwloc/v2.0/[/url][/QUOTE]

Is there a way to compare the hwloc-reported topology on Windows with that given by hwloc (or simply in /proc/cpuinfo) on a same-CPU Linux system?

mackerel 2019-08-02 19:35

Since I'm tinkering with the boxes now, I just reproduced the problem with Windows 7, which, strictly speaking, neither MS nor AMD supports with Zen 2 CPUs.

The interesting thing is, the first time I tried running 4 cores, 4 workers with a 3600, I got a load of "Error setting affinity to core #xyz. There are 6 cores." messages on screen, before Windows reported an application error. Nothing in log after hwloc. Subsequent runs just went to the application error without those affinity messages, presumably due to something written in config files after 1st run.

Might take some time but I'm going to drop in a 2600 shortly to see if that is also affected. I didn't get any testers on the other forum I posted on.

mackerel 2019-08-02 20:13

1 Attachment(s)
2600 temporarily installed. I tried 4 cores, 4 workers, and it crashed just like it did on 3600 and 3700X. Since it was mentioned earlier in the thread, I also tried 2 cores 2 workers, and it also gave a load of affinity errors but completed without crashing. The errors didn't appear in the log, but as it didn't crash I was able to copy and save it in attached text file.

So new information right now is:
It also happens in Windows 7, not just Windows 10.
It also happens with Zen+ CPU, not limited to Zen 2.

I have a crazy idea to try out, back shortly :)

Edit: and the results are in. I went into the BIOS and disabled half the cores, so it is running in a 3+0 configuration: one CCX. Tried a bench with 2 cores / 2 workers, ran fine, no errors. Same for 3c/3w. Is there something about splitting work across CCXs that is causing the problem?

PhilF 2019-08-02 22:38

After it crashes without an error message, have you checked the Windows Event Viewer to see the code reported by the application crash?

hansl 2019-08-19 13:19

I don't have a Zen 2 to test on, but would the recent fix in 29.8b6 apply to the issues in this thread?

Prime95 2019-08-19 20:10

[QUOTE=hansl;523934]I don't have a Zen 2 to test on, but would the recent fix in 29.8b6 apply to the issues in this thread?[/QUOTE]

Yes. The bug was in prime95 running on a CPU with multiple L3 caches. The benchmark code that makes sure a worker's threads are all running in the same L3 cache was flawed.
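In other words, the benchmark has to partition the selected cores by L3 domain before pinning worker threads. A hypothetical sketch of that grouping (not prime95's actual code), using the 3700X cpusets posted earlier in the thread:

```python
# Hypothetical sketch, not prime95's code. On the 3700X, core c owns the
# two SMT PUs 2c and 2c+1 (as in the hwloc dump), and the two L3 caches
# cover PUs 0-7 and 8-15 respectively.
L3_CPUSETS = [0x00FF, 0xFF00]
CORE_CPUSETS = {c: 0x3 << (2 * c) for c in range(8)}

def group_cores_by_l3(core_cpusets, l3_cpusets):
    """Bucket cores into the L3 domain that fully contains their PUs."""
    groups = [[] for _ in l3_cpusets]
    for core, mask in sorted(core_cpusets.items()):
        for idx, l3 in enumerate(l3_cpusets):
            if mask & l3 == mask:  # all of this core's PUs live under this L3
                groups[idx].append(core)
                break
    return groups

print(group_cores_by_l3(CORE_CPUSETS, L3_CPUSETS))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

A worker confined to one of those groups never has threads straddling the CCX boundary, which is the invariant the flawed benchmark code was meant to enforce.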

ewmayer 2019-08-20 19:24

[QUOTE=Prime95;523965]Yes. The bug was in prime95 running on a CPU with multiple L3 caches. The benchmark code that makes sure a worker's threads are all running in the same L3 cache was flawed.[/QUOTE]

Is this an extension to the core-affinity considerations? I.e. do various cores statically map to a given L3 cache, or is that mapping something the OS can fiddle at runtime?

Prime95 2019-08-20 20:12

[QUOTE=ewmayer;524051]Is this an extension to the core-affinity considerations? I.e. do various cores statically map to a given L3 cache, or is that mapping something the OS can fiddle at runtime?[/QUOTE]

Maybe the OS is smart enough to group different threads from the same process into the same L3 cache -- or maybe not. Hwloc libraries give you enough control to ensure this happens.

