mersenneforum.org > Factoring Projects > Msieve
2023-06-04, 11:56   #144
swellman

Quote:
Originally Posted by Jarod
Out of curiosity, is there a compiled GPU version for Windows 11? If not, are there any plans for one to be made?
Try here. The site has most of the popular factoring tools compiled on various CPU microarchitectures, with some GPU versions as well.
2023-06-14, 17:22   #145
frmky

Lambda Labs now has H100s available for $2.40/hr. One will solve a 29M matrix for about $30, without the complexity of OpenMPI or of shutdowns.
2023-06-16, 00:10   #146
RichD

Quote:
Originally Posted by frmky
Lambda Labs now has H100's available for $2.40/hr. It will also solve a 29M matrix for about $30 without the complexity of OpenMPI or shutdowns.
Visited the above website and it now says:
On-Demand Cloud: Spin up on-demand GPUs billed by the hour. H100 instances starting at $1.99/hr.
Cloud Clusters: Reserve thousands of NVIDIA H100s with 3200 Gbps Infiniband. Starting at $1.89/hr.
2023-06-16, 02:13   #147
frmky

I see that too, but when I logged in and tried to launch an H100 instance, none seemed to be available right now. However, they did have a 40 GB PCIe A100 for $1.10/hr. That's a much better deal for GPU LA; it should solve a 29M matrix for a bit under $20.

Edit: Wow, they even have 8x A100 SXM4 40GB for $8.80/hr. That'll solve a ~160M matrix (SNFS difficulty around 335 digits) for about $750.

2023-08-19, 07:35   #148
Batalov

I decided to try Lambda; they have very limited user options. (And you have to wait hours, if not days, for any A100 or H100 instance to become available.)
You get a system of any color as long as it is black
(in their case "Driver Version: 525.85.12 CUDA Version: 12.0").
The msieve-lacuda-nfsathome-cuda11.5 branch doesn't build in cub/ --
Code:
cd cub && make WIN=0 WIN64=0 VBITS=256 sm=800 && cd ..
make[1]: Entering directory '/home/ubuntu/G/msieve_nfsathome/cub'
"/usr/bin/nvcc" -gencode=arch=compute_80,code=\"sm_80,compute_80\" -DSM800 -o sort_engine.so sort_engine.cu -Xptxas -v -Xcudafe -# -shared -Xcompiler -ffloat-store -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -I. -I"/usr/bin/..//include" -O3 -DTHRUST_IGNORE_CUB_VERSION_CHECK
/usr/include/cub/detail/device_synchronize.cuh(27): error: expected a ";"

/usr/include/cub/detail/device_synchronize.cuh(34): error: this pragma must immediately precede a declaration

/usr/include/cub/detail/device_synchronize.cuh(66): error: expected a declaration

/usr/include/thrust/system/cuda/detail/util.h(61): error: expected a declaration

/usr/include/thrust/system/cuda/detail/util.h(149): error: expected a ";"

/usr/include/thrust/system/cuda/detail/util.h(151): error: expected a declaration

/usr/include/thrust/system/cuda/detail/util.h(182): error: variable "cuda_cub" has already been defined
Did they change cub semantics again in 12.0?

...downgrading to 11.5...
...compiles.
2023-08-19, 16:14   #149
frmky

Quote:
Originally Posted by Batalov
msieve-lacuda-nfsathome-cuda11.5 branch doesn't build in cub/ --
When I tried Lambda a couple of months ago there was no wait for A100s, but they have been aggressively marketing their H100s since. There's probably some spillover onto the A100s.

It should compile on CUDA 12. Is that a fresh clone of msieve-lacuda-nfsathome from github? If so, try changing line 91 of cub/Makefile from
Code:
INC = -I"$(CUDA_ROOT)/include" -I.
to
Code:
INC = -I. -I"$(CUDA_ROOT)/include"
Edit: And yes, they keep mucking around with CUB lately.

2023-08-19, 20:31   #150
Batalov

I applied that cub/Makefile line 91 change, but on that system it didn't change the outcome (similar errors to the above, which suggests some macros were refactored so that somewhere the generated code loses a ';' and then everything barfs out).
Perhaps it has to do with their OS choice: they use an "Ubuntu 20"-based Linux. Maybe kernel code is not easily pluggable on Ubuntu, or maybe I needed the equivalent of what on RHEL you get with "sudo yum -y install kernel-devel-`uname -r`".
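(For the record, and assuming Lambda's Ubuntu images behave like stock Ubuntu: the counterpart of that RHEL incantation would be the matching linux-headers package.)

```shell
# RHEL flavor (as above): kernel headers matching the running kernel
sudo yum -y install "kernel-devel-$(uname -r)"

# Assumed Ubuntu/Debian counterpart:
sudo apt-get update
sudo apt-get install -y "linux-headers-$(uname -r)"
```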

On AWS I always choose Amazon's AMIs (RHEL-based, with all the dialects that go with it). Previously I also did everything from scratch (default AMI, then install CUDA following NVIDIA's procedures). Before that I built everything on SLES-based instances (because I had worked in SLES-flavored environments for decades), but by now I've juggled SLES, RHEL, and (for Lambda's use) Ubuntu (zypper vs. yum vs. apt-get, each with its memorized package lists and names). Doable. On AWS, I now grab a "Deep Learning"-flavored AMI so as not to waste time on that low-level stuff.

This time it was too late into the night, so I tried to install cuda-11.5, but it conflicted with the system driver (and I had to cleanly nuke it). The cuda-11.5 install had swapped out kernel modules, so with a bit of hesitation I ran 'sudo reboot', and happily Lambda doesn't take away the node while it reboots. Then everything compiled. Some next time I will try some more to hack on a node with cuda-12.0.

Lambda has even less attached storage available than it has nodes, so I'm crossing my fingers and expecting the node to stay up for ~3.5 days (that's what I ended up needing for the 45M matrix). On AWS I use a mounted drive to keep state; on Lambda, nothing. I could scp the .chk files out somewhere, but for now I've decided to trust them to keep the node mine.
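For anyone who does want the safety net, a trivial loop would do; "backuphost" and the destination path below are placeholders, and the .chk filename is whatever your msieve run actually writes:

```shell
# Hourly off-node backup of msieve LA checkpoint files.
# "backuphost" and "la-checkpoints/" are placeholders.
while true; do
  scp ./*.chk backuphost:la-checkpoints/ || true   # keep looping even if one copy fails
  sleep 3600
done
```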
The node is decent: an AMD EPYC 7J13 64-core processor (lscpu shows 30 cores, so probably virtualized), tons of RAM (200 GB), and an A100-SXM with 40 GB. For $1.10/hr.

P.S. A tropical storm is coming, so both the gas & electric company and the internet provider have already robo-texted me that they might have outages. But I nohup'd and disown'd the process, so maybe it will fly solo even if my shell disconnects for a few hours. We'll see!
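(The detach recipe itself, for anyone following along; the msieve arguments are elided since they're whatever your LA run already uses.)

```shell
# Start the LA run immune to SSH hangups, then detach it from this shell.
nohup ./msieve -nc2 ... >> la.log 2>&1 &
disown

# Watch progress; Ctrl-C here does not touch the solver.
tail -f la.log
```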
2023-08-19, 20:40   #151
Batalov

On the separate topic of linear algebra (not to lump it in with the Lambda stuff), maybe this will be useful for someone:

I ran a few tests on the 45M matrix that I have for 6,505- c202 (a good size for this project; no need to oversieve), and it just barely doesn't fit into the 40 GB A100 card. So I grokked a simple mnemonic rule:
"If your matrix doesn't fit (even at VBITS=64) and you have to resort to use_managed=1, then go as high as possible in VBITS."
My ETAs were
  • 124 hrs for VBITS=256 & use_managed=1 and
  • 93 hrs for VBITS=512 & use_managed=1 <- now running.
Code:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:06:00.0 Off |                    0 |
| N/A   50C    P0   273W / 400W |  40534MiB / 40536MiB |     97%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1293      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      5297      C   ./msieve                        21815MiB |
+-----------------------------------------------------------------------------+
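As a sanity check on the dollar figures in this thread, cost is just ETA times the hourly rate; a throwaway helper of my own (not part of msieve):

```python
def la_cost(eta_hours: float, rate_per_hour: float) -> float:
    """Estimated cloud cost of a linear-algebra run: runtime x hourly rate."""
    return eta_hours * rate_per_hour

# The VBITS=512 run above: 93 hrs on the $1.10/hr A100 node.
print(round(la_cost(93, 1.10), 2))  # -> 102.3
```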
2023-08-29, 08:29   #152
kruoli

A suggestion for those who would like to tinker with their GPUs for power efficiency: do not use nvidia-smi -pl xxx if your card supports nvidia-smi -lgc 0,xxxx --mode=1 instead! The former lowers both core and memory clocks, the latter only the core clocks. With this, I was able to reduce power consumption to 55-60% while increasing the LA time by less than 5%.

This of course makes less sense when using the cloud…
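Concretely, the two approaches look like this. The clock values are examples only; check what your card actually supports with nvidia-smi -q -d SUPPORTED_CLOCKS, and note that --mode requires a recent driver:

```shell
# Blunt approach: cap board power; lowers core AND memory clocks.
sudo nvidia-smi -pl 250

# Preferred: pin only the core-clock range (here 0..1410 MHz).
sudo nvidia-smi -lgc 0,1410 --mode=1

# Undo the core-clock pin when done.
sudo nvidia-smi -rgc
```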
2023-09-13, 17:12   #153
kruoli

Since I started doing GPU-LA, I always get
Code:
The call to cuIpcCloseMemHandle failed. This is a warning and the program
will continue to run.
  cuIpcCloseMemHandle return value:   201
  address: 0x7feab4000000
at the end, when the LA is basically done. The square root then runs normally. A reboot did not help.

Obviously this is not a high-priority issue, but I thought I should mention it nonetheless.
2023-09-13, 17:39   #154
frmky

Quote:
Originally Posted by kruoli
Code:
The call to cuIpcCloseMemHandle failed. This is a warning and the program
will continue to run.
That's an old OpenMPI bug in MPI_Finalize, but you can ignore it. The calculation is done at that point.