mersenneforum.org  

2021-05-09, 19:09   #34
Xyzzy
Aug 2002
8,311 Posts

Quote:
Originally Posted by Xyzzy
CPU = 10980XE (165W)
RAM = 8×32GB DDR4-3200
CMD = ./msieve -v -nc -t 36
LA = 14709s

CPU = 10980XE (999W)
RAM = 8×32GB DDR4-3200
CMD = ./msieve -v -nc -t 36
LA = 12758s


2021-05-09, 19:12   #35
Xyzzy
Aug 2002
8311₁₀ Posts

Quote:
Originally Posted by Xyzzy
LA = 12758s

3h32m38s is a new record for us!
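
The conversion from the logged LA time in seconds to the h/m/s figure is easy to check, as is the gain from raising the power limit. A minimal Python sketch, where the helper name `format_hms` and the variable names are ours and not part of msieve:

```python
def format_hms(total_seconds: int) -> str:
    """Format a whole number of seconds as e.g. '3h32m38s'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h{minutes:02d}m{seconds:02d}s"


# LA times reported for the 10980XE at the two power settings.
la_165w = 14709
la_999w = 12758

print(format_hms(la_999w))                    # 3h32m38s
print(f"{la_165w / la_999w:.2f}x speedup")    # 1.15x speedup
```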

But look at this weird message:

Msieve v. 1.54 (SVN 1030)
...
commencing linear algebra
...
commencing Lanczos iteration (32 threads)
...


We specified 36 threads but msieve only used 32 threads.

2021-05-09, 20:16   #36
frmky
Jul 2003
So Cal
4221₈ Posts

Quote:
Originally Posted by Xyzzy
We specified 36 threads but msieve only used 32 threads.

common/lanczos/cpu/lanczos_cpu.h:#define MAX_THREADS 32
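
That header line sets a compile-time ceiling, which would explain the log: a request for 36 threads ends up as 32. How msieve applies the limit internally is not shown in this thread, so the Python sketch below only illustrates the general clamp-to-maximum pattern. `effective_threads` is a hypothetical helper; only the value 32 comes from the source.

```python
MAX_THREADS = 32  # value taken from the #define quoted in this post


def effective_threads(requested: int, max_threads: int = MAX_THREADS) -> int:
    """Clamp a requested thread count to the range [1, max_threads].

    Illustration only: this is an assumed clamping rule, not msieve's code.
    """
    return max(1, min(requested, max_threads))


print(effective_threads(36))  # 32, matching the "(32 threads)" log line
print(effective_threads(16))  # 16, requests under the cap pass through
```

Using more threads would presumably mean changing that #define and rebuilding; whether larger values are safe is not stated here.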
2021-05-09, 22:24   #37
frmky
Jul 2003
So Cal
3×17×43 Posts

One node with 2× Cavium ThunderX2 CN9980 32-core 64-bit ARM CPUs and DDR4 memory.

VBITS = 64 2h 57m
VBITS = 128 2h 1m
VBITS = 256 2h 2m
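
To put the three VBITS timings on a common footing, one can convert them to minutes and take ratios against the VBITS = 64 run. A short Python sketch; the `to_minutes` helper and the dictionary are just for illustration, and the timings are the ones posted here:

```python
import re


def to_minutes(duration: str) -> int:
    """Parse a duration like '2h 57m' into whole minutes."""
    match = re.fullmatch(r"\s*(\d+)h\s*(\d+)m\s*", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


timings = {64: "2h 57m", 128: "2h 1m", 256: "2h 2m"}
baseline = to_minutes(timings[64])

for vbits, text in timings.items():
    minutes = to_minutes(text)
    print(f"VBITS={vbits:<3} {minutes:>3} min  {baseline / minutes:.2f}x vs VBITS=64")
```

On these numbers VBITS = 128 comes out roughly 1.46× faster than VBITS = 64, with VBITS = 256 essentially tied with 128.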
2021-05-10, 04:38   #38
frmky
Jul 2003
So Cal
3×17×43 Posts

NVIDIA Tesla V100 with the now-old CUDA code that only supports 64-bit vectors: 1h 26m
53 minutes after many code changes.

Last fiddled with by frmky on 2021-06-22 at 06:07
2021-05-12, 12:06   #39
jasonp
Tribal Bullet
Oct 2004
3·1,181 Posts

Quote:
Originally Posted by frmky
One node with 2× Cavium ThunderX2 CN9980 32-core 64-bit ARM CPUs and DDR4 memory.

VBITS = 64 2h 57m
VBITS = 128 2h 1m
VBITS = 256 2h 2m

[batman]Where does he get those wonderful toys??[/batman]

How difficult was the porting effort needed to run on ARM?
2021-05-12, 18:47   #40
frmky
Jul 2003
So Cal
3×17×43 Posts

Quote:
Originally Posted by jasonp
How difficult was the porting effort needed to run on ARM?

Set the cache size in the source, optionally remove the loop unrolling, set the optimization flags for the machine in the Makefile (really just -Ofast -mcpu=native is usually fine), and compile. In the end I just used OpenMPI and GCC 10. I also tried the Arm and Cray compilers, but GCC 10 was just as fast.

This exercise shattered my presumption that ARM CPUs were efficient but slow.

Last fiddled with by frmky on 2021-05-12 at 18:55
2021-05-12, 19:03   #41
drkirkby
"David Kirkby"
Jan 2021
Althorne, Essex, UK
3·149 Posts

Quote:
Originally Posted by VBCurtis
This also helps others see if perhaps their msieve copy isn't as fast as it could be (e.g. compiling it oneself can prove *much* faster if the binary one finds online isn't compiled for the same architecture).

That's a bit of a problem with open-source benchmarks. The performance depends on the compiler and the computer. One really needs to compare hardware using the same binary. It would be worth having the program report the compiler version used. I believe GCC has some predefined macros that indicate the compiler version. I guess the benchmark could check its own md5 checksum, and report that when it runs. Then at least one would know whether the exact same binary is being reported each time.
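
The checksum idea is straightforward to prototype outside the benchmark itself. The sketch below is Python rather than msieve code: it hashes whatever file path it is given (for example the msieve binary) so the digest can be posted next to a timing. The function name `file_md5` and the `./msieve` path are assumptions for illustration.

```python
import hashlib
import os


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    binary = "./msieve"  # hypothetical location of the benchmarked binary
    if os.path.exists(binary):
        print(f"{binary}: md5 {file_md5(binary)}")
    else:
        print(f"{binary} not found; edit the path to point at your own build")
```

Two posters reporting the same digest would know they ran byte-identical binaries; different digests say nothing about which build is faster, only that they differ.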
2021-05-12, 21:32   #42
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
2³×5⁴ Posts

Well, no: in this case, that's the *benefit*, not the problem. If my compiler is vastly faster than yours, these benchmarks can show you that maybe you could try compiling it yourself, or with my compiler, to get more speed.

We aren't benchmarking to compare hardware nearly as much as we're trying to share info on how to make msieve run faster.

We'd much rather compare various compilations of msieve than have one standardized binary that might not be fastest just for the sake of comparing hardware.

That said, there's a place for directly comparing hardware without software variations as you suggest, but in the context of this thread it's a secondary priority. There are just too many instructions available on some chips but not others; if we used a binary that runs on v2-era DDR3 Xeons, it would leave modern CPUs with more advanced instruction sets crippled compared to their potential speed. That's not a helpful comparison.
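
One way to see which instruction-set extensions a machine offers before picking or building a binary is to read the feature list the kernel reports. The sketch below assumes Linux, where `/proc/cpuinfo` lists them on a `flags` line (x86) or a `Features` line (ARM). The helper name and the three flag names checked are illustrative choices, not anything msieve requires.

```python
def cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect feature names from /proc/cpuinfo-style text.

    Looks at 'flags' (x86) and 'Features' (ARM) lines.
    """
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features"):
            features.update(value.split())
    return features


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            found = cpu_features(handle.read())
    except OSError:
        found = set()  # not Linux, or /proc unavailable
    for name in ("sse2", "avx2", "avx512f"):
        print(f"{name}: {'yes' if name in found else 'no'}")
```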
2021-07-30, 21:56   #43
frmky
Jul 2003
So Cal
3×17×43 Posts

Quote:
Originally Posted by frmky
NVIDIA Tesla V100 with the now-old CUDA code that only supports 64-bit vectors: 1h 26m
53 minutes after many code changes.

After reimplementing the CUDA SpMV with CUB, the Tesla V100 now takes 36 minutes.
2021-08-02, 15:20   #44
Xyzzy
Aug 2002
20167₈ Posts
GPU = Quadro RTX 8000
LA = 3331s


Now that we can run msieve on a GPU we have no reason to ever run it on a CPU again. This result is 3.8× faster (!) than the best CPU time we ever recorded!
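
The 3.8× figure follows directly from the LA times in this thread, taking 12758s as the best CPU result mentioned earlier. A quick Python check (variable names are ours):

```python
best_cpu_la = 12758  # seconds, 10980XE result quoted earlier in the thread
gpu_la = 3331        # seconds, Quadro RTX 8000

speedup = best_cpu_la / gpu_la
print(f"GPU speedup over best CPU run: {speedup:.1f}x")   # 3.8x
print(f"GPU LA time: {gpu_la // 60}m{gpu_la % 60:02d}s")  # 55m31s
```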



Thanks to frmky for the instructions to get it working. We had to do a few extra steps, but if we were able to figure it out, anybody can!
Code:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 8000     Off  | 00000000:17:00.0  On |                    0 |
|  0%   54C    P2   260W / 260W |   4888MiB / 45550MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2257      G   /usr/libexec/Xorg                 166MiB |
|    0   N/A  N/A      2632      G   /usr/bin/gnome-shell               73MiB |
|    0   N/A  N/A      3313      G   /usr/lib64/firefox/firefox          3MiB |
|    0   N/A  N/A     19361      G   /usr/lib64/firefox/firefox         53MiB |
|    0   N/A  N/A     73091      G   /usr/lib64/firefox/firefox         53MiB |
|    0   N/A  N/A     73137      G   /usr/lib64/firefox/firefox          3MiB |
|    0   N/A  N/A    215174      C   ./msieve                         4529MiB |
+-----------------------------------------------------------------------------+
Attached Files
File Type: log msieve.log (14.0 KB, 40 views)
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.