mersenneforum.org

Msieve GPU Linear Algebra (https://www.mersenneforum.org/showthread.php?t=27042)

charybdis 2021-10-26 09:35

What was your general impression of 34-bit vs 33-bit? Will the extra bit allow slightly larger jobs to be run as I'd hoped?

VBCurtis 2021-10-26 15:47

[QUOTE=frmky;591636]2,2174M is in LA, so here's one more data point. Running on [B]eight[/B] NVLink-connected V100's,
It'll take a bit longer due to queue logistics, but hopefully it'll be done within the week.[/QUOTE]

How many relations did you collect? Was the unique ratio better than 2,2174L's? The matrices came out pretty similar in size, so a comparison of relation counts (raw and unique) gives a nice 33-bit vs 34-bit data point.

frmky 2021-10-26 18:06

For 2,2174L we sieved q from 20M to 6B and collected 1.36B relations. This gave 734M uniques, so about 46% duplicates.

For 2,2174M we sieved q from 20M to 4B and collected 2.19B relations. This gave 1.29B uniques, so about 41% duplicates. However, we sieved a considerably narrower range of q, and the job was much faster overall.
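The duplicate percentages above follow from 1 - unique/raw. A quick Python check using the quoted counts (in millions of relations):

```python
def duplicate_rate(raw_relations: float, unique_relations: float) -> float:
    """Fraction of the raw relations that turned out to be duplicates."""
    return 1.0 - unique_relations / raw_relations


# Counts quoted above, in millions of relations: (raw, unique).
runs = {
    "2,2174L (q 20M-6B)": (1360, 734),
    "2,2174M (q 20M-4B)": (2190, 1290),
}

for name, (raw, unique) in runs.items():
    print(f"{name}: {duplicate_rate(raw, unique):.1%} duplicates")
# 2,2174L (q 20M-6B): 46.0% duplicates
# 2,2174M (q 20M-4B): 41.1% duplicates
```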

LaurV 2021-10-27 03:14

[offtopic] I changed the thread title. The old one made me [URL="https://en.wikipedia.org/wiki/Romanian_profanity"]nostalgic[/URL] every time someone posted in it... The new title is easier to search too, as the thread contains a lot of useful info... [/offtopic]

frmky 2021-10-31 18:58

[QUOTE=frmky;591636]2,2174M is in LA, so here's one more data point.[/QUOTE]
It's done.
[PASTEBIN]5RGLguge[/PASTEBIN]

EdH 2022-02-20 15:14

I'm contemplating playing with Colab to see whether it could be used for LA on smaller matrices. But I wonder whether it would really be worth it.

If I do everything but LA locally and only upload the necessary files for the matrix work, I'm still looking at a pretty large relations file for anything of value. But, I'm currently looking at more than a day of local CPU LA for ~c170 candidates. If I could knock that down to a few hours, maybe it would be "fun" to try.

The assigned GPUs vary widely as well. My last two experiments (sessions with GPU ECM) yielded a P100 and a K80. I do normally get some longer session times, but they aren't guaranteed. Also, I may have only been getting half the card. (I'm still confused about shaders/cores/SMs/etc.)

If my source is correct, the K80 is only compute capability 3.7. Is this current enough to work?

Would downloading the checkpoint file at regular intervals be enough to restart a timed-out session later?

What else would I need to consider?

Sorry for the questions. Thanks for any help.

An extra question: since the K80 is only a compute capability 3.7 card, would it even be worth obtaining one? It seems the current minimum is 3.5, and I'd hate to have another obsolete card right after getting one.

frmky 2022-02-21 02:38

Yes, it will work on a K80. My updated version requires CC 3.5 or greater.
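For reference, the compute capabilities of the cards named in this thread come from NVIDIA's published tables. A tiny Python check against the CC 3.5 minimum stated above might look like this (the K10 entry is added only as a below-minimum contrast and is not from the thread):

```python
# Compute capability (major, minor) of the cards named in this thread,
# per NVIDIA's published tables. The K10 entry is an added contrast case.
COMPUTE_CAPABILITY = {
    "K10": (3, 0),
    "K20": (3, 5),
    "K80": (3, 7),
    "P100": (6, 0),
    "V100": (7, 0),
    "T4": (7, 5),
}

# Minimum stated above for the updated GPU LA code.
MIN_CC = (3, 5)


def meets_minimum(card: str) -> bool:
    """True if the card's compute capability is at least MIN_CC."""
    # Tuples compare element by element, so (3, 7) >= (3, 5) is True.
    return COMPUTE_CAPABILITY[card] >= MIN_CC


for card, cc in COMPUTE_CAPABILITY.items():
    status = "ok" if meets_minimum(card) else "too old"
    print(f"{card}: CC {cc[0]}.{cc[1]} -> {status}")
```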

You don't need to transfer the large relations file. Do this:
1. Complete the filtering and build the matrix locally. You can stop it manually once you see "commencing Lanczos iteration".
2. Transfer the ini, fb, and mat files (and mat.idx if using multiple GPUs with MPI, not covered here) to the GPU node.
3. On the GPU node, start the LA with options like ./msieve -nc2 skip_matbuild=1 -g 0 -v
4. You can interrupt it and restart it with "-ncr -g 0".
5. Once it's complete, transfer the dep file to the local node and run sqrt with -nc3 as usual.
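The five steps above can be sketched as a small Python helper that only assembles the file lists and command lines; it does not run anything. The flags are exactly the ones given above. The file names (worktodo.ini, msieve.fb, and msieve.dat with its .mat/.chk/.dep suffixes) are assumed msieve defaults, so adjust them if your run used other names:

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class GpuLaPlan:
    upload: List[str]         # step 2: copy from the local node to the GPU node
    restart_extra: List[str]  # also needed on the GPU node to resume a run
    start_cmd: List[str]      # step 3: start LA on the GPU node
    restart_cmd: List[str]    # step 4: resume from the checkpoint
    download: List[str]       # step 5: copy back to the local node
    sqrt_cmd: List[str]       # step 5: square root on the local node


def plan_gpu_la(savefile: str = "msieve.dat", fbfile: str = "msieve.fb",
                inifile: str = "worktodo.ini", use_mpi: bool = False,
                gpu: int = 0) -> GpuLaPlan:
    """Assemble transfers and msieve command lines for a GPU-only LA run.

    File names are assumed msieve defaults; only the flags come from the post.
    """
    mat = savefile + ".mat"
    upload = [inifile, fbfile, mat]
    if use_mpi:
        upload.append(mat + ".idx")  # only for multi-GPU MPI runs
    gpu_args = ["-g", str(gpu)]
    return GpuLaPlan(
        upload=upload,
        restart_extra=[savefile + ".chk"],
        start_cmd=["./msieve", "-nc2", "skip_matbuild=1", *gpu_args, "-v"],
        restart_cmd=["./msieve", "-ncr", *gpu_args],
        download=[savefile + ".dep"],
        sqrt_cmd=["./msieve", "-nc3"],
    )


plan = plan_gpu_la()
print("upload:", plan.upload)
print("start: ", shlex.join(plan.start_cmd))
print("resume:", shlex.join(plan.restart_cmd))
```

Keeping the commands as argument lists (rather than one string) sidesteps quoting problems if they are later passed to something like subprocess.run from a notebook.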

The local and GPU msieve binaries can be compiled with different values for VBITS since the LA is run entirely using the GPU binary. And yes, you just need the chk file in addition to the other files above to restart.
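Since the chk file plus the files above is all a restart needs, a Colab notebook could copy the checkpoint to persistent storage every so often. A minimal Python sketch, assuming you poll from the notebook rather than rely on any msieve option. One caveat: if msieve happens to be rewriting the checkpoint while it is being copied, the backup could be incomplete.

```python
import os
import shutil
from typing import Optional


def backup_checkpoint(chk_path: str, backup_dir: str,
                      last_mtime: Optional[float] = None) -> Optional[float]:
    """Copy chk_path into backup_dir if it changed since last_mtime.

    Returns the modification time of the copied file, or last_mtime if
    nothing was copied (file missing or unchanged).
    """
    if not os.path.exists(chk_path):
        return last_mtime
    mtime = os.path.getmtime(chk_path)
    if last_mtime is not None and mtime <= last_mtime:
        return last_mtime
    os.makedirs(backup_dir, exist_ok=True)
    dest = os.path.join(backup_dir, os.path.basename(chk_path))
    # Copy to a temporary name, then rename over the old backup, so an
    # interrupted copy does not clobber the previous good backup.
    tmp = dest + ".partial"
    shutil.copy2(chk_path, tmp)
    os.replace(tmp, dest)
    return mtime
```

For example, a notebook cell could loop over `last = backup_checkpoint("msieve.dat.chk", BACKUP_DIR, last)` followed by `time.sleep(600)`, where BACKUP_DIR is a hypothetical folder on a mounted Google Drive.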

A K80 is a dual-GPU card, so without using MPI you will only be using half the card. And each half is only a little bit faster than a K20. It will be slower than a P100 as you would expect.

EdH 2022-02-21 03:29

Thanks frmky! This helps a bunch. I will pursue the Colab session. [strike]I also have a 3.5 card to play with, but it only has 2GB. Not sure if that's enough to even get a small matrix into.[/strike]

I'm off to study. . .

EdH 2022-02-22 01:05

I'm saddened to report that even had I been successful with my Colab experiments, it would still be impractical.

I was able to compile Msieve for two different GPUs, a K80 (3.7) and a T4 (7.5). However, Msieve refused to understand the options although I tried all the variations I could think of in both Python and BASH scripts, with and without single/double quotes around various portions, and in a variety of orders. In all cases, Msieve simply displayed all the available options.

In any case, the impracticality is that for a c160, the msieve.dat.mat file is just short of 2GB. I tested two ways of getting the file into a Colab session: via SSH and via Google Drive. Each took just under two hours. The SSH transfer held the session open without using the GPU, which Colab complained about, while the Google Drive route let the session start rather quickly (after the two-hour upload to Google Drive). Since a c160 produced a 2GB file, I expect larger matrices will take much longer to load into a Colab session.
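To put the transfer cost in numbers: about 2GB in about two hours works out to roughly 0.28 MB/s (about 2.2 Mbit/s). Assuming that rate holds and transfer time scales linearly with file size, a quick Python estimate for bigger matrix files (the sizes beyond 2GB are illustrative, not measured):

```python
def upload_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours to move size_bytes at a constant rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s / 3600.0


# Rate implied by the c160 case above: ~2 GB (decimal) in ~2 hours.
OBSERVED_RATE = 2e9 / (2 * 3600)  # about 278,000 bytes/s

for gb in (2, 4, 8):
    print(f"{gb} GB -> {upload_hours(gb * 1e9, OBSERVED_RATE):.1f} h")
```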

I may try again later to get Msieve to process the test case, since at this point I have the needed files in Google Drive, but the practicality is in doubt.

Thank you for the assistance. I will surely put this to use when I finally acquire a usable CUDA GPU. (I'm even eying some K20s ATM.)

EdH 2022-02-22 23:40

[QUOTE=EdH;600475]. . .
I may try again later to get Msieve to process the test case, since at this point I have the needed files in Google Drive, but the practicality is in doubt.

Thank you for the assistance. I will surely put this to use when I finally acquire a usable CUDA GPU. (I'm even eying some K20s ATM.)[/QUOTE]
I'm going to claim success!

I got a Colab session to run Msieve LA on a Tesla T4! I didn't let it complete, but the log claims:
[code]
Tue Feb 22 22:48:53 2022 linear algebra at 0.0%, ETA 3h44m
[/code]
The best time I could get with a 40-thread Xeon was about twice that long.
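Because the progress line above ends up in the log file, a notebook could also read progress from the log instead of relying on screen output. A small Python sketch that only understands the exact line format shown above; other progress formats, and the log file name you pass in, are assumptions to check against your own log:

```python
import re
from typing import Optional, Tuple

# Matches the format shown above, e.g. "linear algebra at 0.0%, ETA 3h44m".
PROGRESS_RE = re.compile(r"linear algebra at ([\d.]+)%, ETA (\d+)h\s*(\d+)m")


def parse_la_progress(line: str) -> Optional[Tuple[float, int]]:
    """Return (percent done, ETA in minutes), or None if the line doesn't match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2)) * 60 + int(m.group(3))


def latest_la_progress(log_path: str) -> Optional[Tuple[float, int]]:
    """Scan a log file and return the last matching progress line, parsed."""
    last = None
    with open(log_path, errors="replace") as f:
        for line in f:
            parsed = parse_la_progress(line)
            if parsed is not None:
                last = parsed
    return last


print(parse_la_progress("Tue Feb 22 22:48:53 2022 linear algebra at 0.0%, ETA 3h44m"))
# (0.0, 224)
```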

I was able to compress the .mat file to almost half the size, but it still takes an hour to upload it to Google Drive and a little bit of time to decompress it. (Others may be able to upload a lot faster.)
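The post doesn't say which tool was used to compress the .mat file. As one possibility (an assumption, not necessarily what was used here), Python's standard gzip module can stream a multi-GB file in fixed-size chunks so it never has to fit in memory; other compressors may give a different ratio and speed:

```python
import gzip
import shutil

CHUNK = 16 * 1024 * 1024  # copy 16 MiB at a time


def compress_file(src: str, dst: str, level: int = 6) -> None:
    """Stream-compress src into the gzip file dst."""
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout, CHUNK)


def decompress_file(src: str, dst: str) -> None:
    """Stream-decompress the gzip file src back into dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, CHUNK)
```

On the Colab side, decompress_file("msieve.dat.mat.gz", "msieve.dat.mat") would restore the matrix before starting LA (file names illustrative).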

The details are much more complicated than for my other sessions, so I need to work on them quite a bit before I can publish them. As for my earlier comments on practicality, I will have to study this further for my use. On one hand, it takes a lot of manual intervention, and timely success is not guaranteed. On the other hand, having Colab do all this work frees the local machines for other work. Perhaps the value can be realized for larger jobs.

I don't seem to be getting the screen output I expected from the [C]-v[/C] option.

Is there a way to redirect the checkpoint file? I thought such an option existed, but I couldn't find it.

Thanks again for all the help.

EdH 2022-02-24 01:24

Sorry if you're tired of these reports, but here's another:

I have a full-fledged Colab session that works through completion of LA. Today I let a c157 finish that I had recently run on my 20c/40t Xeon. The times were nearly identical:
[code]
Xeon 04:17:41 elapsed time
Colab 04:19:08 elapsed time
[/code]
I hope to do the same test with a different GPU, to compare.
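For the record, the gap between those two runs is under one percent. A quick Python check of the arithmetic:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' elapsed time to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


xeon = to_seconds("04:17:41")   # 15461 s
colab = to_seconds("04:19:08")  # 15548 s
diff = colab - xeon             # 87 s
print(f"Colab took {diff} s longer, {diff / xeon:.1%} more than the Xeon")
# Colab took 87 s longer, 0.6% more than the Xeon
```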


All times are UTC.