mersenneforum.org  

Old 2021-08-06, 20:02   #23
ryanp
 

Quote:
Originally Posted by frmky
A 27.7M matrix in 21 hours. The A100 is a nice card!

Indeed!

Quote:
Do you have access to an A40 to try it? I'm curious if the slower global memory significantly increases the runtime.

No, sadly, "just" A100's and V100's.
Old 2021-08-06, 21:20   #24
ryanp
 

On that note, though, are there any plans to support multiple GPUs? If a single A100 is this fast, 16x A100's with a fully interconnected fabric could probably tear through big matrices?
Old 2021-08-06, 21:41   #25
frmky
 

The current version supports multiple GPUs using MPI (compile with CUDA=1 MPI=1 CUDAAWARE=1) but relies on a good MPI implementation. OpenMPI's collectives transfer the data from the card and do the reduction on the CPU. I think MVAPICH2-GDR keeps the reductions on the card, but SDSC doesn't have that working on Expanse GPU yet, so I haven't been able to test it. I hope to have time on NCSA Delta later this fall to try it out.

Edit: My backup plan if that doesn't work out is to use a non-CUDA-aware MPI to pass IPC handles between processes and do the reduction on the GPU myself.

Edit 2: I've got a draft version working just now that passes vectors between GPUs using MPI CUDA-aware point-to-point comms (which use NVLink or GPUDirect when available), then does the reduction on the GPU manually. In a quick test on a 43M matrix using two V100's connected with NVLink, this reduces LA time from nearly 90 hours when passing vectors through CPU memory to 56 hours when transferring directly between GPUs.

Edit 3: It's now in GitHub. Just compile with a CUDA-Aware MPI like OpenMPI using CUDA=XX MPI=1 CUDAAWARE=1 where XX is replaced by the compute capability of your GPU.

Last fiddled with by frmky on 2021-08-12 at 08:20
Old 2021-08-07, 06:44   #26
frmky
 

Code:
linear algebra completed 45452 of 42101088 dimensions (0.1%, ETA 21h 4m)

Using four V100's, I'm getting about 21 hours to solve a 42.1M matrix.
Old 2021-08-07, 11:10   #27
fivemack

Interesting! That's about a p3.8xlarge instance, for which the spot price is $4/hr, so that's $84 = £60 to solve the matrix.

I'm paying 19p/kWh here, and my Skylake machine uses about 250W and takes 820 hours for a 44M matrix, so that's £40 of electricity (but probably £60 in depreciation, assuming the £3360 machine will last five years); on another hand it's taking a month rather than a day, on a third hand that's still keeping up with my sieving resources.
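
For anyone wanting to redo this arithmetic with their own numbers, here is a minimal Python sketch using only the figures quoted in this post. The variable names are made up for illustration, and the depreciation line assumes straight-line depreciation with the machine running around the clock for five years, which is an assumption rather than something stated above. Note that the two runs are for slightly different matrices (42.1M on the GPUs, 44M on the Skylake machine), so this is a ballpark comparison only.

```python
# Back-of-envelope cost comparison using the figures quoted in the post.

# Cloud: four V100s (roughly a p3.8xlarge) at the quoted spot price.
gpu_hours = 21             # reported ETA for the 42.1M matrix
spot_usd_per_hour = 4.0    # quoted spot price
cloud_usd = gpu_hours * spot_usd_per_hour

# Home machine: electricity for the 44M matrix.
power_kw = 0.250           # "about 250W"
cpu_hours = 820
gbp_per_kwh = 0.19         # 19p/kWh
energy_kwh = power_kw * cpu_hours
electricity_gbp = energy_kwh * gbp_per_kwh

# Home machine: straight-line depreciation over five years.
# ASSUMPTION: the machine is in use 24/7 for its whole lifetime.
machine_gbp = 3360
lifetime_hours = 5 * 365 * 24
depreciation_gbp = machine_gbp * cpu_hours / lifetime_hours

print(f"cloud:        ${cloud_usd:.0f}")
print(f"electricity:  GBP {electricity_gbp:.2f} ({energy_kwh:.0f} kWh)")
print(f"depreciation: GBP {depreciation_gbp:.2f}")
print(f"home total:   GBP {electricity_gbp + depreciation_gbp:.2f}")
```

This gives roughly £39 of electricity and £63 of depreciation, in line with the rounded £40 and £60 above.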
Old 2021-08-07, 16:13   #28
frmky
 

Code:
linear algebra completed 49005 of 84248506 dimensions (0.1%, ETA 94h30m)

And scaling well. The 84.2M matrix for 2,2162M should take about 4 days on four NVLink-connected V100's. It's using about 26GB on each card.
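
To put a number on "scaling well", here is a rough Python sketch that fits a power law through the two ETAs reported in this thread for four V100's (42.1M in 21h 4m and 84.2M in 94h30m). Both ETAs were taken at 0.1% completion and the two matrices may differ in density, so the resulting exponent is only indicative. The power-law model itself is an assumption made for illustration.

```python
import math

# (dimensions, ETA in hours) as reported in the thread for four V100s.
small = (42_101_088, 21 + 4 / 60)    # ETA 21h 4m
large = (84_248_506, 94 + 30 / 60)   # ETA 94h30m

size_ratio = large[0] / small[0]
time_ratio = large[1] / small[1]

# ASSUMPTION: runtime follows time ~ n**k; solve for k from the two points.
exponent = math.log(time_ratio) / math.log(size_ratio)

print(f"size ratio: {size_ratio:.2f}x")
print(f"time ratio: {time_ratio:.2f}x")
print(f"implied exponent k: {exponent:.2f}")
```

An exponent a little above 2 is about what one would expect if both the number of iterations and the cost of each iteration grow roughly linearly with the dimension.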
Old 2021-08-07, 16:20   #29
ryanp
 

Quote:
Originally Posted by frmky
Code:
linear algebra completed 49005 of 84248506 dimensions (0.1%, ETA 94h30m)
And scaling well. The 84.2M matrix for 2,2162M should take about 4 days on four NVLink-connected V100's. It's using about 26GB on each card.

That's quite impressive. I dug this up, which I believe was your MPI run of a 109.4M matrix from a few months back?

Code:
linear algebra completed 20216008 of 109441779 dimensions (18.5%, ETA 854h19m)
Old 2021-08-07, 16:40   #30
frmky
 

Yes, that would have been on 6 Sandy Bridge nodes with 2x 10-core CPUs each.

Here's the companion 2,2162L matrix, also 84.2M, running on 8 Fujitsu A64FX nodes.

Code:
Fri Jul  2 01:59:19 2021  linear algebra at 0.0%, ETA 337h 2m
Old 2021-08-08, 00:00   #31
wombatman

Would something like this work on my 3090? It has 24GB of RAM on it, though I would have to get some help with compilation as I use WSL2, which doesn't support CUDA applications (yet).
Old 2021-08-08, 00:57   #32
frmky
 

Quote:
Originally Posted by wombatman
Would something like this work on my 3090? It has 24GB of RAM on it, though I would have to get some help with compilation as I use WSL2, which doesn't support CUDA applications (yet).

Yes, you could solve a matrix up to about 15M or so on the card. If you have at least 32 GB of system memory, you could go a bit larger by transferring the matrix from system memory as needed using CUDA managed memory. But I have no experience compiling msieve for Windows.
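
As a rough cross-check of the 15M figure, the sketch below scales the memory use reported earlier in the thread (about 26GB on each of four V100's for the 84.2M matrix) down to smaller matrices. It assumes memory grows linearly with the matrix dimension, which is a simplification: matrix density, vector buffers, and multi-GPU overheads are all ignored. The helper name `estimate_gb` is hypothetical and not part of msieve.

```python
# Reference point from the thread: an 84.2M matrix used about 26 GB
# on each of four V100s.
ref_dims = 84_248_506
ref_total_gb = 4 * 26

# ASSUMPTION: GPU memory use grows linearly with the matrix dimension.
gb_per_million_dims = ref_total_gb / (ref_dims / 1e6)


def estimate_gb(dims_millions: float) -> float:
    """Rough total GPU memory estimate (GB) for a matrix of the given size."""
    return dims_millions * gb_per_million_dims


for size in (15, 27.7):
    print(f"{size:>5}M matrix: ~{estimate_gb(size):.1f} GB")
```

Under that assumption a 15M matrix comes out somewhat under 24GB, which leaves some headroom on a 3090 and fits with the "about 15M or so" estimate.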
Old 2021-08-11, 22:09   #33
frmky
 

The LA for 2,2162M, an 84.2M matrix, successfully completed on four NVLink-connected V100's in a total of 95.5 hours of runtime. There was a restart due to the 48-hour queue time limit on SDSC Expanse GPU. This run used just over 26GB of GPU memory on each of the four V100's.

Attached is a snapshot of the timeline for two block Lanczos iterations on three of the four GPUs. Per the time scale at the top, it takes just over 1 second/iteration. Over 80% of the time is spent in the SpMV routine. The transfer of vectors directly between GPUs takes relatively little time when NVLink is used.
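
The figures above can be combined into a quick estimate of the iteration count and the time spent in SpMV. This Python sketch treats "just over 1 second" and "over 80%" as bounds and assumes the whole 95.5 hours went into iterations (startup and checkpointing are ignored), so the results are limits rather than measurements.

```python
total_hours = 95.5           # reported total runtime for the 84.2M matrix
seconds_per_iteration = 1.0  # "just over 1 second": used as a lower bound
spmv_fraction = 0.80         # "over 80%": used as a lower bound

total_seconds = total_hours * 3600

# ASSUMPTION: all of the runtime is spent in block Lanczos iterations.
max_iterations = total_seconds / seconds_per_iteration
min_spmv_hours = total_hours * spmv_fraction

print(f"total runtime:  {total_seconds:.0f} s")
print(f"iterations:     at most ~{max_iterations:.0f}")
print(f"time in SpMV:   at least ~{min_spmv_hours:.1f} h")
```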

[Attached image: 2_2162M_timeline.png]
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
