mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 version 29.6/29.7/29.8 (https://www.mersenneforum.org/showthread.php?t=24094)

hansl 2019-08-12 17:48

I am testing a bunch more combinations of core count and worker counts and found some other situations causing errors or crashes.

First, I found that benchmarking all core counts in [1,24] is fine if I limit to just 1 worker.

Then I found a couple more weird combinations which crashed:
[code]
[Worker #1 Aug 12 12:16] Timing 2560K FFT, 16 cores, 10 workers. Floating point exception (core dumped)
...
[Worker #1 Aug 12 12:20] Timing 2560K FFT, 16 cores, 12 workers. Floating point exception (core dumped)
[/code]
Should these combinations (where the worker count does not evenly divide the core count) even be allowed/attempted?
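A hedged sketch of what can go wrong (a hypothetical illustration, not Prime95's actual code): when workers doesn't divide cores, either some workers get unequal shares, or, if the code assumes an even split, a derived per-worker (or per-node) count can round down to zero and later be used as a divisor. On Linux an integer division by zero raises SIGFPE, which the shell reports as exactly "Floating point exception (core dumped)".

```python
def split_cores(cores, workers):
    """Distribute cores across workers, handing the remainder to the
    first few workers so every core is used when the split is uneven."""
    base, extra = divmod(cores, workers)
    return [base + 1 if i < extra else base for i in range(workers)]

# 16 cores / 10 workers: no even split exists, so shares differ.
print(split_cores(16, 10))   # -> [2, 2, 2, 2, 2, 2, 1, 1, 1, 1]

# A scheme that instead computed cores // workers somewhere and later
# divided by that count would divide by zero whenever the share rounds
# down to 0 -- in C that integer division is SIGFPE, reported as
# "Floating point exception".
```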

I then tried a more comprehensive test of all even core counts combined with all even worker counts, which produced a flood of errors trying to set CPU affinity for cores #25 up through #524:
[code]
[Worker #1 Aug 12 12:24] Timing 2560K FFT, 2 cores, 2 workers. [Aug 12 12:24] Error setting affinity to core #25. There are 24 cores.
[Worker #1 Aug 12 12:24] Error setting affinity to core #26. There are 24 cores.
[Worker #1 Aug 12 12:24] Error setting affinity to core #27. There are 24 cores.
...
(errors repeat for every core from #25 through #524)
...
[Worker #1 Aug 12 12:24] Error setting affinity to core #523. There are 24 cores.
[Worker #1 Aug 12 12:24] Error setting affinity to core #524. There are 24 cores.


[Worker #1 Aug 12 12:24] Timing 2560K FFT, 4 cores, 2 workers. [Aug 12 12:24] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:26] Timing 2560K FFT, 4 cores, 4 workers. [Aug 12 12:26] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:26] Timing 2560K FFT, 6 cores, 2 workers. [Aug 12 12:26] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:27] Timing 2560K FFT, 6 cores, 4 workers. [Aug 12 12:27] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:27] Timing 2560K FFT, 6 cores, 6 workers. [Aug 12 12:27] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:28] Timing 2560K FFT, 8 cores, 2 workers. [Aug 12 12:28] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:28] Timing 2560K FFT, 8 cores, 4 workers. [Aug 12 12:28] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:28] Timing 2560K FFT, 8 cores, 6 workers. [Aug 12 12:28] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 8 cores, 8 workers. [Aug 12 12:29] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 10 cores, 2 workers. [Aug 12 12:29] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 10 cores, 4 workers. [Aug 12 12:29] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 10 cores, 6 workers. Floating point exception (core dumped)
[/code]
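A minimal illustration of why those affinity calls fail (my sketch using the Linux scheduler API via Python's os module, not the hwloc calls Prime95 actually uses): pinning to core #25 on a 24-core machine gives the kernel an affinity mask containing no usable CPU, which it rejects with EINVAL. Linux-only.

```python
import os

def try_pin(cpu):
    """Try to pin this process to a single CPU id; return True on
    success, False if the kernel rejects the mask."""
    old = os.sched_getaffinity(0)          # remember current affinity
    try:
        os.sched_setaffinity(0, {cpu})
        return True
    except OSError:                        # EINVAL: no usable CPU in mask
        return False
    finally:
        try:
            os.sched_setaffinity(0, old)   # always restore original mask
        except OSError:
            pass

print(try_pin(min(os.sched_getaffinity(0))))  # a core we may run on -> True
print(try_pin(100000))                        # nonexistent core -> False
```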

I tried to keep the output small and manageable, so I didn't have "AffinityVerbosityBench=3" set as in my previous crash report.

Please let me know if there's any specific configuration you'd like me to provide more detailed logs for.

Prime95 2019-08-18 03:55

[QUOTE=hansl;523605]I am testing a bunch more combinations of core count and worker counts and found some other situations causing errors or crashes. [/QUOTE]

I've finally found the source of this bug! I'll work on a fix.

For now, put "NumThreadingNodes=1" in local.txt and the problem should go away.

hansl 2019-08-18 17:14

[QUOTE=Prime95;523848]I've finally found the source of this bug! I'll work on a fix.

For now, put "NumThreadingNodes=1" in local.txt and the problem should go away.[/QUOTE]
Great! With this set I was able to re-test all even core and worker counts, and none failed.

One more question: I noticed that a higher worker count seems to add to the time of the individual benchmark stages, even though each is supposedly timed at 5 seconds. Just watching the clock and counting seconds during a run, 24 cores/1 worker took around 5-6 seconds, but 24 cores/24 workers took about 36 seconds.
Is that expected, and would the throughput numbers still be correct if it's expecting 5 seconds?

Prime95 2019-08-18 18:00

[QUOTE=hansl;523876] I noticed that higher worker count seems to add to the time of the individual benchmark stages, even though I am testing at supposedly 5 seconds each. I was just manually looking at the clock and counting seconds during a run, and it seemed that 24 cores/1 worker was around 5-6s but 24 cores/24 workers took about 36 seconds.
Is that expected, and would the throughput numbers still be correct if it's expecting 5 seconds?[/QUOTE]

The process is to launch 24 threads, init them all, wait for all to complete initialization, then do the 5 seconds of counting iterations. I hope the increased run time comes from doing 24 initializations vs. just one. I'll add an option to print a message when all workers have finished initialization, so that you can see whether the wall clock time for the single-worker and 24-worker cases is about 5 seconds once initialization completes.
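The flow described above can be sketched like this (a hypothetical illustration, not Prime95's code): workers initialize at different speeds, everyone meets at a barrier, and only then does the fixed-length counting begin. Because throughput is iterations divided by the measured interval, it stays meaningful even when total wall time exceeds the interval due to slow initialization.

```python
import threading, time

def bench(num_workers, interval=0.2):
    """Launch workers, wait for all to 'initialize', then let each count
    iterations for `interval` seconds."""
    barrier = threading.Barrier(num_workers + 1)   # workers + main thread
    counts = [0] * num_workers

    def worker(i):
        time.sleep(0.05 * (i + 1))     # stand-in for per-worker FFT init
        barrier.wait()                 # block until *every* worker is ready
        deadline = time.monotonic() + interval
        while time.monotonic() < deadline:
            counts[i] += 1             # stand-in for one benchmark iteration

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    barrier.wait()                     # the "init complete" point in the log
    start = time.monotonic()
    for t in threads:
        t.join()
    return counts, time.monotonic() - start

counts, elapsed = bench(4)
print(all(c > 0 for c in counts))      # every worker got counting time
```

With more workers the slowest initialization dominates the wall clock, but the timed section itself is still roughly `interval` seconds.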

Prime95 2019-08-18 18:10

In 29.8 build 6 add "BenchInitCompleteMessage=1" to prime.txt.

hansl 2019-08-18 20:17

[QUOTE=Prime95;523883]In 29.8 build 6 add "BenchInitCompleteMessage=1" to prime.txt.[/QUOTE]

OK, thank you! Does this mean the build is out already? If so can you link to the linux 64bit version?

Prime95 2019-08-18 21:17

29.8 build 6 is now ready. See first post.

GP2 2019-08-19 01:06

[QUOTE=Prime95;523898]29.8 build 6 is now ready. See first post.[/QUOTE]

Chrome will soon be dropping support for the FTP protocol, starting with version 82. It's currently at version 76.

Can we change the links in the first post to:
[c]https://www.mersenne.org/ftp_root/gimps/p95v298b6.linux64.tar.gz[/c]
and so forth?

That will have the added benefit of being more secure than downloading executables over an unencrypted connection.

S485122 2019-08-19 05:47

[QUOTE=Prime95;523898]29.8 build 6 is now ready. See first post.[/QUOTE]Trying to run the win64 version I get the following error message: "prime95.exe - Application error / The application was unable to start correctly (0xc000007b). Click OK to close the application." It seems the file libhwloc-15.dll is the culprit: using the version from 29.8b3 does not give the problem. (Could it be that the dll is the 32-bit version? It is smaller than the version shipped with 29.8b3...)

A cosmetic correction is also needed: in the Windows 64 version, the File Version and the Product Version are stuck at 28.8.1.0 and 29.8.0.0.

Jacob

nomead 2019-08-19 11:43

Was running 29.6b3... now updated to 29.8b6, Linux 64-bit.

I changed the CPU on one machine, from Ryzen 3 2200G to Ryzen 5 3600. It seems to be running fine with the memory at 3600 MHz, even on a B350 motherboard (too early to tell for sure, but a few hours of torture tests were OK). Just a few questions, though.

Is there anything that I need to do because the hardware has now basically changed quite a bit? I already changed the number of cores used from 4 to 6.

And on Zen 2, which FFT is supposed to be used, FMA3 or AVX2? For some reason, the program selects FMA3 (FFT length 2688K). On the 2200G, I could force AVX2 through options in local.txt, but of course it ran slower than FMA3, as expected. On the 3600, it gives this error when trying to continue from a savefile:
[CODE][Work thread Aug 19 14:37] Cannot initialize FFT code, errcode=1002
[Work thread Aug 19 14:37] Number sent to gwsetup is too large for the FFTs to handle.
[/CODE]
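The 1002 error is easier to picture with some back-of-the-envelope arithmetic (the bits-per-word density below is my assumption for illustration; the real limits come from gwnum's FFT tables): each FFT word is a double carrying roughly 18-19 bits of the number being tested, so each FFT length has a maximum exponent, and forcing a different FFT set can leave no table large enough for the exponent in the savefile.

```python
# BITS_PER_WORD is an assumed average density, not an exact gwnum figure.
BITS_PER_WORD = 18.8

def approx_max_exponent(fft_len_k):
    """Rough upper bound on the exponent a given FFT length (in K) can test."""
    words = fft_len_k * 1024           # FFT length in double-precision words
    return int(words * BITS_PER_WORD)

print(approx_max_exponent(2688))       # roughly 51.7 million
```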

CPU info from results.bench.txt (I'm not including the cache lines; probably not relevant info anyway):
[CODE]AMD Ryzen 5 3600 6-Core Processor
CPU speed: 4184.81 MHz, 6 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2
L1 cache size: 6x32 KB, L2 cache size: 6x512 KB, L3 cache size: 2x16 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
Machine#0 (total=16351064KB, DMIProductName="To Be Filled By O.E.M.", DMIProductVersion="To Be Filled By O.E.M.", DMIBoardVendor=ASRock, DMIBoardName="AB350M Pro4", DMIBoardVersion=, DMIBoardAssetTag=, DMIChassisVendor="To Be Filled By O.E.M.", DMIChassisType=3, DMIChassisVersion="To Be Filled By O.E.M.", DMIChassisAssetTag="To Be Filled By O.E.M.", DMIBIOSVendor="American Megatrends Inc.", DMIBIOSVersion=P6.00, DMIBIOSDate=08/02/2019, DMISysVendor="To Be Filled By O.E.M.", Backend=Linux, LinuxCgroup=/, OSName=Linux, OSRelease=4.19.0-2-amd64, OSVersion="#1 SMP Debian 4.19.16-1 (2019-01-17)", HostName=palvi2, Architecture=x86_64, hwlocVersion=2.0.4, ProcessName=mprime)
Package#0 (total=16351064KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=23, CPUModelNumber=113, CPUModel="AMD Ryzen 5 3600 6-Core Processor ", CPUStepping=0)
[/CODE]

Random 2019-08-19 18:50

[QUOTE=S485122;523925]Trying to run the win64 version I get the following error message: "prime95.exe - Application error / The application was unable to start correctly (0xc000007b). Click OK to close the application." It seems the file libhwloc-15.dll is the culprit: using the version from 29.8b3 does not give the problem. (Could it be that the dll is the 32-bit version? It is smaller than the version shipped with 29.8b3...)

A cosmetic correction is also needed: in the Windows 64 version, the File Version and the Product Version are stuck at 28.8.1.0 and 29.8.0.0.

Jacob[/QUOTE]

I can confirm the error: 8700K CPU on Win10 x64 1903. Tested in a clean VM as well and received the same error.

