#23 | |
|
"Ben"
Feb 2007
3513₁₀ Posts |
Quote:
Code:
-np 2 1x2 -t 20: 3 hrs 9 min
-np 4 1x4 -t 10: 2 hrs 48 min
-np 5 1x5 -t 8: 3 hrs 49 min
-np 8 1x8 -t 5: 2 hrs 50 min
|
|
|
|
|
|
#24 |
|
"Mike"
Aug 2002
2⁵·257 Posts |
Here are binaries for 64-bit Linux with various "VBITS" flags set.
|
|
|
|
|
|
#25 |
|
"Mike"
Aug 2002
8224₁₀ Posts |
CPU = i5-10600K
RAM = 2×8GB DDR4-3200
CMD = ./msieve -v -nc -t 6
LA = 21988s
|
|
|
|
|
|
#26 |
|
"Mike"
Aug 2002
2⁵×257 Posts |
Given that the 1920X and 3950X are pretty serious CPUs, does the result for the i5 seem abnormally fast?
CPU = 1920X
RAM = 4×16GB DDR4-2666
CMD = ./msieve -v -nc -t 24
LA = 7h 58m 53s

CPU = 3950X
RAM = 2×8GB DDR4-3666
CMD = ./msieve -v -nc -t 16
LA = 7h 33m 00s

CPU = i5-10600K
RAM = 2×8GB DDR4-3200
CMD = ./msieve -v -nc -t 6
LA = 6h 06m 28s
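To compare LA figures that are posted sometimes in seconds and sometimes as h/m/s, it can help to normalise them first. The following is only a minimal sketch: the helper names `parse_la` and `format_la` are made up for illustration and are not part of msieve, and the times are simply the ones quoted in this thread.

```python
import re


def parse_la(text: str) -> int:
    """Convert an LA time such as '7h 58m 53s' or '21988s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised LA time: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def format_la(seconds: int) -> str:
    """Format a number of seconds in the 'Xh YYm ZZs' style used in the thread."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# LA times as posted in this thread (labels are illustrative only).
la_times = {
    "1920X (-t 24)": "7h 58m 53s",
    "3950X (-t 16)": "7h 33m 00s",
    "i5-10600K (-t 6)": "21988s",
}

baseline = parse_la(la_times["i5-10600K (-t 6)"])
for label, raw in la_times.items():
    secs = parse_la(raw)
    # Ratio > 1 means the run took longer than the i5-10600K run.
    print(f"{label:18s} {secs:6d}s  {format_la(secs)}  x{secs / baseline:.2f} vs i5")
```

This only restates the posted timings on a common scale; it says nothing about why the i5 result is faster.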
|
|
|
|
|
#27 | |
|
Apr 2010
2·83 Posts |
Quote:
-nc1: ~0h 43m 18s
-nc2: ~0h 5m 15s until the multithreaded LA starts

Timings for the multithreaded part:
-nc2: estimated 3h 24m msieve compiled with gcc-9.3
-nc2: estimated 3h 25m msieve compiled with gcc-10.0
-nc2: estimated 3h 22m msieve compiled with clang-9
-nc2: estimated 3h 24m msieve compiled with clang-10

Fastest total without -nc3: ~4h 21m

All runs with VBITS=256 and 32 threads. All other versions were slower. I tried the objects for each compiler twice, to ensure that the clang-9 one is indeed the fastest.

Last fiddled with by Gimarel on 2020-10-30 at 12:39
|
|
|
|
|
|
#28 |
|
"Mike"
Aug 2002
2⁵·257 Posts |
CPU = 5600X
RAM = 2×16GB DDR4-3200
CMD = ./msieve -v -nc -t 12
LA = 14805s
|
|
|
|
|
|
#29 |
|
"Mike"
Aug 2002
2⁵×257 Posts |
CPU = 1920X
RAM = 4×16GB DDR4-2666
CMD = ./msieve -v -nc -t 24
LA = 7h 58m 53s

CPU = 3950X
RAM = 2×8GB DDR4-3666
CMD = ./msieve -v -nc -t 16
LA = 7h 33m 00s

CPU = 5600X
RAM = 2×16GB DDR4-3200
CMD = ./msieve -v -nc -t 12
LA = 4h 6m 45s

We have used the same binary and the same setup/method for every benchmark we have posted. This 5600X result just doesn't seem right unless we had the 1920X and 3950X set up wrong or something.
|
|
|
|
|
|
#30 | |
|
Jun 2003
5,051 Posts |
Quote:
|
|
|
|
|
|
|
#31 | |
|
"Mike"
Aug 2002
2020₁₆ Posts |
Quote:
We don't have the 3950X anymore so we can't retest it.
|
|
|
|
|
|
|
#32 |
|
"Mike"
Aug 2002
2⁵·257 Posts |
CPU = 10980XE (165W)
RAM = 8×32GB DDR4-3200
CMD = ./msieve -v -nc -t 18
LA = 16343s

CPU = 10980XE (165W)
RAM = 8×32GB DDR4-3200
CMD = ./msieve -v -nc -t 36
LA = 14709s
|
|
|
|
|
|
#33 |
|
Jul 2003
So Cal
2·3⁴·13 Posts |
Each node has a Fujitsu A64FX 64-bit ARM processor with 48 cores and 32 GB HBM memory divided into 4 NUMA regions.

VBITS = 128
1 node 3h 30m
2 nodes 1h 58m
4 nodes 1h 10m
8 nodes 0h 41m

VBITS makes a big difference for this processor:

1 node
VBITS = 64 4h 5m
VBITS = 128 3h 30m
VBITS = 256 5h 40m

Two notes about compiling: the cache size must be set in the source, since msieve doesn't detect it for ARM processors and the default is quite small. Also, removing the manual loop unrolling in the files in common/lanczos/cpu/ gives a small but consistent 1.5-2% improvement on this processor.

Last fiddled with by frmky on 2021-05-09 at 09:26
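The multi-node timings above can be turned into speedup and parallel efficiency figures. This is just a rough sketch using the VBITS = 128 times quoted in this post; the function name `scaling_table` is hypothetical, and efficiency is taken here as speedup divided by the node count relative to the 1-node run.

```python
# Wall-clock LA times in minutes for the VBITS = 128 runs quoted above.
la_minutes = {
    1: 3 * 60 + 30,  # 3h 30m
    2: 1 * 60 + 58,  # 1h 58m
    4: 1 * 60 + 10,  # 1h 10m
    8: 0 * 60 + 41,  # 0h 41m
}


def scaling_table(times):
    """Return (nodes, speedup, efficiency) rows relative to the smallest run."""
    base_nodes = min(times)
    base_time = times[base_nodes]
    rows = []
    for nodes in sorted(times):
        speedup = base_time / times[nodes]
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


for nodes, speedup, efficiency in scaling_table(la_minutes):
    print(f"{nodes} node(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

On these numbers the 4-node run is exactly 3x faster than one node (75% efficiency), and the 8-node run is roughly 5.1x faster (about 64% efficiency).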
|
|
|