mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Msieve (https://www.mersenneforum.org/forumdisplay.php?f=83)
-   -   Msieve GPU Linear Algebra (https://www.mersenneforum.org/showthread.php?t=27042)

VBCurtis 2021-09-17 19:44

I'm gonna hand-wave here, since only a few people have bothered taking data:

When a relation set is right at the cusp of building a matrix, a few more hours of sieving will save more than that many hours of matrix solving on the same machine (CPU in both cases).

At the relation counts most e-small and 15e jobs are processed at, 20 more core-hours of sieving might save 5 or 10 core-hours of matrix work (again, both measured on a CPU). I've done a few experiments at home, and I have yet to find a job where the sieving required to build a matrix at TD=120 saved more CPU time than it cost. I believe this could/would be the case on really big jobs, say with matrices at 50M+ in size.
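To put rough numbers on that trade-off, here is the arithmetic with the post's own illustrative figures (round numbers from the text, not new measurements):

```python
# Round figures from the post: oversieving by ~20 core-hours of CPU
# sieving saves only ~5-10 core-hours of CPU linear algebra.
extra_sieving = 20                   # extra core-hours spent sieving
la_saved_low, la_saved_high = 5, 10  # core-hours of matrix work saved

net_best = extra_sieving - la_saved_high   # best case
net_worst = extra_sieving - la_saved_low   # worst case
print(f"net CPU cost of oversieving: {net_best}-{net_worst} core-hours")
# -> 10-15 core-hours lost, so oversieving doesn't pay on a CPU
```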

We have historically sieved more than needed because BOINC computation is cheap, while matrix solving time was in short supply. So, now that GPU matrix solving makes matrices not in short supply, we should sieve less. Something like 5-10% fewer relations, which means 5-10% more jobs done per calendar month.

frmky 2021-09-17 19:47

[QUOTE=Xyzzy;588059]Is the bottleneck server storage space?[/QUOTE]
No. The server is currently using 467G of 3.6T.

frmky 2021-09-18 03:58

For 2,2174L, 1355M relations yielded 734M uniques. With nearly 50% duplicates, we have clearly reached the limit for 16e. Anyway, filtering yielded
[CODE]matrix is 102063424 x 102063602 (51045.3 MB) with weight 14484270868 (141.91/col)[/CODE]
Normally I'd try to bring this down, but testing on a quad V100 system with NVLink gives
[CODE]linear algebra completed 2200905 of 102060161 dimensions (2.2%, ETA 129h 5m)[/CODE]
So more sieving would only save a day or so in LA. I have the cluster time, so I'll let it run.
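A quick check of the figures in this post (numbers copied from the filtering output above):

```python
# Duplicate rate for 2,2174L: 1355M relations gave 734M uniques.
relations = 1_355_000_000
uniques = 734_000_000
dup_rate = 1 - uniques / relations
print(f"duplicate rate: {dup_rate:.1%}")          # ~45.8%, "nearly 50%"

# Average matrix weight per column from the msieve log line.
cols, weight = 102063602, 14484270868
print(f"weight per column: {weight / cols:.2f}")  # 141.91, as reported
```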

pinhodecarlos 2021-09-18 07:02

[QUOTE=VBCurtis;588061]

We have historically sieved more than needed because BOINC computation is cheap, while matrix solving time was in short supply. So, now that GPU matrix solving makes matrices not in short supply, we should sieve less. Something like 5-10% fewer relations, which means 5-10% more jobs done per calendar month.[/QUOTE]

Totally agree with you now. What's more, when a number goes into LA I would recommend (I know, Greg!…lol) cancelling all of its queued WUs; this will also speed up sieving on the next number. In my experience, sievers waste a few days processing unnecessary work (I just manually abort those WUs so my machines move on to something else). Just be careful not to do this during a challenge, since it will interfere with strategic bunkering.

Xyzzy 2021-09-18 13:47

1 Attachment(s)
[QUOTE=Xyzzy;584798]If you are using RHEL 8 (8.4) you can install the proprietary Nvidia driver easily via these directions:

[URL]https://developer.nvidia.com/blog/streamlining-nvidia-driver-deployment-on-rhel-8-with-modularity-streams/[/URL]

Then you will need these packages installed:

[C]gcc
make
cuda-nvcc-10-2
cuda-cudart-dev-10-2-10.2.89-1[/C]

And possibly:

[C]gmp-devel
zlib-devel[/C]

You also have to manually adjust your path variable in [C]~/.bashrc[/C]:

[C]export PATH="/usr/local/cuda-10.2/bin:$PATH"[/C]

:mike:[/QUOTE]Here are simpler instructions.

[CODE]sudo subscription-manager repos --enable=rhel-8-for-x86_64-appstream-rpms
sudo subscription-manager repos --enable=rhel-8-for-x86_64-baseos-rpms
sudo subscription-manager repos --enable=codeready-builder-for-rhel-8-x86_64-rpms
sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
sudo dnf module install nvidia-driver:latest
sudo reboot
sudo dnf install cuda-11-4
echo 'export PATH=/usr/local/cuda-11.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64/:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc[/CODE]Then just use the attached archive to set up your work.

:mike:

charybdis 2021-09-18 14:37

[QUOTE=frmky;588086]For 2,2174L, 1355M relations yielded 734M uniques. With nearly 50% duplicates, we have clearly reached the limit for 16e.[/QUOTE]

Or is this just the limit for 16e with 33-bit large primes? I know you've avoided going higher because of the difficulty of the LA and the msieve filtering bug, but now that the bug is fixed and GPUs make the LA much easier, might it be worth going up to 34-bit?

frmky 2021-09-18 15:59

[QUOTE=charybdis;588104]Or is this just the limit for 16e with 33-bit large primes?[/QUOTE]
Does the lasieve5 code work correctly with 34-bit large primes? I know the check is commented out, but I haven't tested it.

charybdis 2021-09-19 00:30

I tested the binary from [URL="https://www.mersenneforum.org/showpost.php?p=470249&postcount=10"]here[/URL] on 2,2174L with 34-bit large primes and it seemed to work fine. Yield was more than double that at 33-bit so definitely looks worth it, as one would expect. There were no issues with setting mfba=99 either.

henryzz 2021-09-19 08:02

I looked through the code a few years ago and found no issues. Lasieve4 is also fine, although it is limited to a 96-bit mfba/r.

wreck 2021-09-23 11:46

I gave fetching NFS@Home WUs a try and received an lpbr/lpba 34 assignment for 2,2174M.
Here are the contents of the polynomial file S2M2174b.poly.

[CODE]
n: 470349924831928271476705309712184283829671891500377511256458133476241008159328553358384317181001385841345904968378352588310952651779460262173005355061503024245423661736289481941107679294474063050602745740433565487767078338816787736757703231764661986524341166060777900926495463269979500293362217153953866146837
skew: 1.22341
c6: 2
c5: 0
c4: 0
c3: 2
c2: 0
c1: 0
c0: 1
Y1: 1
Y0: -3064991081731777716716694054300618367237478244367204352
type: snfs
rlim: 250000000
alim: 250000000
lpbr: 34
lpba: 34
mfbr: 99
mfba: 69
rlambda: 3.6
alambda: 2.6
[/CODE]
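As a sanity check on the job file above, the two polynomials must share a root mod n: the rational root is m = -Y0/Y1 = 2^181, and f(m) = 2·(2^181)^6 + 2·(2^181)^3 + 1 = 2^1087 + 2^544 + 1, the Aurifeuillian M-part of 2^2174+1, of which n is the remaining cofactor:

```python
# Verify the SNFS polynomial pair from S2M2174b.poly above.
n = 470349924831928271476705309712184283829671891500377511256458133476241008159328553358384317181001385841345904968378352588310952651779460262173005355061503024245423661736289481941107679294474063050602745740433565487767078338816787736757703231764661986524341166060777900926495463269979500293362217153953866146837
Y0 = -3064991081731777716716694054300618367237478244367204352
assert Y0 == -2**181                  # rational root m = -Y0/Y1 = 2^181

m = -Y0
f_m = 2*m**6 + 2*m**3 + 1             # algebraic side: f(x) = 2x^6 + 2x^3 + 1
assert f_m == 2**1087 + 2**544 + 1    # the Aurifeuillian M-part of 2^2174+1
assert f_m % n == 0                   # n divides f(m): the pair is consistent
print("polynomial pair verified")
```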

With q near 784M, the siever uses 743MB of memory.

charybdis 2021-09-23 12:46

[QUOTE=wreck;588461][CODE]
lpbr: 34
lpba: 34
mfbr: 99
mfba: 69
[/CODE][/QUOTE]

@frmky, for future reference, when I tested this I found that rational side sieving with *algebraic* 3LP was fastest. This shouldn't be too much of a surprise: the rational norms are larger, but not so much larger that 6 large primes across the two sides should split 4/2 rather than 3/3 (don't forget the special-q is a "free" large prime).
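The bookkeeping behind that split can be sketched as follows (the 2LP/3LP counts are the conventional reading of mfb ≈ k·lpb, and the labels are my shorthand, not tested parameter sets):

```python
# Sketch of how six large primes split across the two sides when the
# special-q is on the rational side. With lpb = 34, mfb = 99 ~ 3*33
# admits three large primes on a side; mfb = 69 ~ 2*34.5 admits two.
configs = {
    "rational 3LP (mfbr=99, mfba=69)":  {"lp_r": 3, "lp_a": 2},
    "algebraic 3LP (mfba=99, mfbr=69)": {"lp_r": 2, "lp_a": 3},
}
for name, c in configs.items():
    # The special-q divides the rational norm "for free", adding one
    # more rational large prime on top of the cofactorisation bound.
    split = (c["lp_r"] + 1, c["lp_a"])
    print(f"{name}: {split[0]}/{split[1]} split")
# rational 3LP gives the 4/2 split; algebraic 3LP gives the 3/3 split
```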

