I am testing a bunch more combinations of core counts and worker counts and found some other situations causing errors or crashes.

First, I found that benchmarking every core count in [1,24] is fine if I limit it to just 1 worker. Then I found a couple more weird combinations which crashed:

[code]
[Worker #1 Aug 12 12:16] Timing 2560K FFT, 16 cores, 10 workers.
Floating point exception (core dumped)
...
[Worker #1 Aug 12 12:20] Timing 2560K FFT, 16 cores, 12 workers.
Floating point exception (core dumped)
[/code]

Should these combinations (where the worker count does not divide the core count) even be allowed/attempted?

I then tried a more comprehensive test of all even core counts with all even worker counts, which ended up with a bunch of errors trying to set CPU affinities from core #25 up to core #524:

[code]
[Worker #1 Aug 12 12:24] Timing 2560K FFT, 2 cores, 2 workers.
[Aug 12 12:24] Error setting affinity to core #25. There are 24 cores.
[Worker #1 Aug 12 12:24] Error setting affinity to core #26. There are 24 cores.
[Worker #1 Aug 12 12:24] Error setting affinity to core #27. There are 24 cores.
... (errors for all core numbers #25 through #524) ...
[Worker #1 Aug 12 12:24] Error setting affinity to core #523. There are 24 cores.
[Worker #1 Aug 12 12:24] Error setting affinity to core #524. There are 24 cores.
[Worker #1 Aug 12 12:24] Timing 2560K FFT, 4 cores, 2 workers.
[Aug 12 12:24] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:26] Timing 2560K FFT, 4 cores, 4 workers.
[Aug 12 12:26] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:26] Timing 2560K FFT, 6 cores, 2 workers.
[Aug 12 12:26] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:27] Timing 2560K FFT, 6 cores, 4 workers.
[Aug 12 12:27] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:27] Timing 2560K FFT, 6 cores, 6 workers.
[Aug 12 12:27] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:28] Timing 2560K FFT, 8 cores, 2 workers.
[Aug 12 12:28] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:28] Timing 2560K FFT, 8 cores, 4 workers.
[Aug 12 12:28] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:28] Timing 2560K FFT, 8 cores, 6 workers.
[Aug 12 12:28] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 8 cores, 8 workers.
[Aug 12 12:29] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 10 cores, 2 workers.
[Aug 12 12:29] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 10 cores, 4 workers.
[Aug 12 12:29] Error setting affinity to core #25. There are 24 cores.
... 500 errors again ...
[Worker #1 Aug 12 12:29] Timing 2560K FFT, 10 cores, 6 workers.
Floating point exception (core dumped)
[/code]

I tried to keep the output small and manageable, so I didn't have "AffinityVerbosityBench=3" set as in my previous crash report. Please let me know if there's any specific configuration you'd like more detailed logs for.
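To illustrate the question about whether these combinations should be attempted at all: a benchmark driver could simply skip any core/worker pair that doesn't split evenly. This is only a hypothetical sketch of such a pre-check (the function name and policy are my own, not Prime95's actual code):

```python
def should_attempt(cores: int, workers: int) -> bool:
    """Hypothetical pre-check: only benchmark combinations where each
    worker can be assigned the same whole number of cores."""
    return 1 <= workers <= cores and cores % workers == 0

# 16 cores with 10 or 12 workers don't divide evenly -> skipped
combos = [(16, 10), (16, 12), (16, 8), (24, 24)]
allowed = [c for c in combos if should_attempt(*c)]
```

With a guard like this, the crashing combinations above (16/10, 16/12, 10/6) would never be timed in the first place.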
[QUOTE=hansl;523605]I am testing a bunch more combinations of core count and worker counts and found some other situations causing errors or crashes. [/QUOTE]
I've finally found the source of this bug! I'll work on a fix. For now, put "NumThreadingNodes=1" in local.txt and the problem should go away.
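For context on the flood of "Error setting affinity to core #25 ... #524" lines above, here is a hedged sketch of the kind of bounds check that would reject out-of-range core indices up front. The names and the 0-based indexing are illustrative assumptions, not Prime95's actual code:

```python
def checked_core_list(requested, num_cores):
    """Illustrative guard (not Prime95's actual code): split requested
    core indices into in-range and out-of-range sets, so the caller can
    log one summary line instead of one error line per bad index."""
    requested = list(requested)
    valid = [c for c in requested if 0 <= c < num_cores]
    invalid = [c for c in requested if not (0 <= c < num_cores)]
    return valid, invalid

# The log showed errors for cores #25..#524 on a 24-core machine; with
# 0-based valid indices 0..23, every one of those 500 requests is rejected.
valid, invalid = checked_core_list(range(25, 525), 24)
```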
[QUOTE=Prime95;523848]I've finally found the source of this bug! I'll work on a fix.
For now, put "NumThreadingNodes=1" in local.txt and the problem should go away.[/QUOTE] Great! With this I was able to test all even core and worker counts again, and none failed.

One more question: I noticed that a higher worker count seems to add to the time of the individual benchmark stages, even though I am testing at supposedly 5 seconds each. Manually watching the clock during a run, 24 cores/1 worker took around 5-6 seconds but 24 cores/24 workers took about 36 seconds. Is that expected, and would the throughput numbers still be correct if it's expecting 5 seconds?
[QUOTE=hansl;523876] I noticed that higher worker count seems to add to the time of the individual benchmark stages, even though I am testing at supposedly 5 seconds each. I was just manually looking at the clock and counting seconds during a run, and it seemed that 24 cores/1 worker was around 5-6s but 24 cores/24 workers took about 36 seconds.
Is that expected, and would the throughput numbers still be correct if it's expecting 5 seconds?[/QUOTE] The process is to launch 24 threads, init them all, wait for all to complete initialization, then do the 5 seconds of counting iterations. I hope the increased run time is just the cost of doing 24 initializations vs. one.

I'll add an option to print a message when all workers have finished initialization, so that you can see whether the wall clock time for the single-worker and 24-worker cases is about 5 seconds once initialization completes.
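The flow described above (initialize every worker, wait for all of them, then count iterations for a fixed window) can be sketched with a barrier. This is a simplified illustration under my own assumptions, not Prime95's actual code; a 0.2-second window stands in for the 5-second one:

```python
import threading
import time

def bench(num_workers, duration=0.2):
    """Sketch of the described flow: every worker initializes, all meet
    at a barrier, and only then does the timed counting phase begin."""
    barrier = threading.Barrier(num_workers + 1)  # workers + main thread
    counts = [0] * num_workers

    def worker(i):
        # ... per-worker setup (FFT init in the real program) goes here ...
        barrier.wait()                      # signal "initialization complete"
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            counts[i] += 1                  # count iterations in the window

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    barrier.wait()   # main thread: all workers have finished initialization
    for t in threads:
        t.join()
    return sum(counts)
```

Only the timed window after the barrier should be the same length regardless of worker count; the extra wall clock time with many workers would come from the initialization phase before the barrier.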
In 29.8 build 6 add "BenchInitCompleteMessage=1" to prime.txt.
[QUOTE=Prime95;523883]In 29.8 build 6 add "BenchInitCompleteMessage=1" to prime.txt.[/QUOTE]
OK, thank you! Does this mean the build is out already? If so, can you link to the Linux 64-bit version?
29.8 build 6 is now ready. See first post.
[QUOTE=Prime95;523898]29.8 build 6 is now ready. See first post.[/QUOTE]
Chrome will soon drop support for the FTP protocol, starting with version 82; it's currently at version 76. Can we change the links in the first post to [c]https://www.mersenne.org/ftp_root/gimps/p95v298b6.linux64.tar.gz[/c] and so forth? That will have the added benefit of being more secure than downloading executables over an unencrypted connection.
[QUOTE=Prime95;523898]29.8 build 6 is now ready. See first post.[/QUOTE]
Trying to run the win64 version, I get the following error message: "prime95.exe - Application error / The application was unable to start correctly (0xc000007b). Click OK to close the application." It seems the file libhwloc-15.dll is the culprit: using the version from 29.8b3 does not give the problem. (Could it be that the DLL is the 32-bit version? It is smaller than the version shipped with 29.8b3...)

A cosmetic correction is also needed: in the Windows 64 version, the File Version and the Product Version are stuck at 28.8.1.0 and 29.8.0.0.

Jacob
Was running 29.6b3... now updated to 29.8b6, Linux 64-bit.
I changed the CPU on one machine, from a Ryzen 3 2200G to a Ryzen 5 3600. It seems to be running fine with the memory at 3600 MHz, even on a B350 motherboard (too early to tell for sure, but a few hours of torture tests were OK). Just a few questions, though.

Is there anything that I need to do because the hardware has now changed quite a bit? I already changed the number of cores used from 4 to 6.

And on Zen 2, which FFT is supposed to be used, FMA3 or AVX2? For some reason, the program selects FMA3 (FFT length 2688K). On the 2200G, I could force AVX2 through options in local.txt, but of course it ran slower than FMA3, as expected. On the 3600, it gives this error when trying to continue from a savefile:

[CODE]
[Work thread Aug 19 14:37] Cannot initialize FFT code, errcode=1002
[Work thread Aug 19 14:37] Number sent to gwsetup is too large for the FFTs to handle.
[/CODE]

CPU info from results.bench.txt (I'm not including the cache lines, probably not relevant info anyway):

[CODE]
AMD Ryzen 5 3600 6-Core Processor
CPU speed: 4184.81 MHz, 6 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2
L1 cache size: 6x32 KB, L2 cache size: 6x512 KB, L3 cache size: 2x16 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
Machine#0 (total=16351064KB, DMIProductName="To Be Filled By O.E.M.", DMIProductVersion="To Be Filled By O.E.M.", DMIBoardVendor=ASRock, DMIBoardName="AB350M Pro4", DMIBoardVersion=, DMIBoardAssetTag=, DMIChassisVendor="To Be Filled By O.E.M.", DMIChassisType=3, DMIChassisVersion="To Be Filled By O.E.M.", DMIChassisAssetTag="To Be Filled By O.E.M.", DMIBIOSVendor="American Megatrends Inc.", DMIBIOSVersion=P6.00, DMIBIOSDate=08/02/2019, DMISysVendor="To Be Filled By O.E.M.", Backend=Linux, LinuxCgroup=/, OSName=Linux, OSRelease=4.19.0-2-amd64, OSVersion="#1 SMP Debian 4.19.16-1 (2019-01-17)", HostName=palvi2, Architecture=x86_64, hwlocVersion=2.0.4, ProcessName=mprime)
  Package#0 (total=16351064KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=23, CPUModelNumber=113, CPUModel="AMD Ryzen 5 3600 6-Core Processor ", CPUStepping=0)
[/CODE]
[QUOTE=S485122;523925]Trying to run the win64 version, I get the following error message: "prime95.exe - Application error / The application was unable to start correctly (0xc000007b). Click OK to close the application." It seems the file libhwloc-15.dll is the culprit: using the version from 29.8b3 does not give the problem.[/QUOTE] I can confirm the error: 8700K CPU on Win10 x64 1903. Tested in a clean VM as well and received the same error.