[QUOTE=bayanne;529689]Give me simple instructions to use them in P-1 or PRP, then I will use them.
It was not me that picked the model of Tesla to use :)[/QUOTE]I recommend gpuowl's entry in [URL]https://www.mersenneforum.org/showthread.php?t=24839[/URL]. I haven't gotten around to CUDALucas yet, or to figuring out what went wrong and how to address the CUDAPm1 selftest failure. I won't claim the instructions are simple, nor irreducible, but I think they are sufficient for gpuowl or very close. |
[QUOTE=axn;529690]For LL test:
1) Build cudalucas from source or use someone's prebuilt executable. Source available at [url]https://sourceforge.net/p/cudalucas/code/HEAD/tree/trunk/[/url] Change makefile to use [C]--generate-code arch=compute_60,code=sm_60[/C] (instead of 35)
2) Run cufftbench and threadbench
3) Create a worktodo with a manual assignment from mersenne.org
4) ????
5) Profit
I'm assuming you know how to use your google drive to host the files?[/QUOTE] I won't recommend running CUDALucas on these powerful GPUs, as it is significantly slower than gpuowl. For some reason CUDALucas needs more bandwidth per iteration, so despite the OpenCL overhead gpuowl is a lot faster.
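The makefile change in step 1 can be scripted rather than done by hand. A minimal sketch, assuming the stock makefile contains the usual [C]arch=compute_35,code=sm_35[/C] flag pair (check your checkout if it differs; the [C]NVCCFLAGS[/C] sample line below is only illustrative):

```python
# Sketch: retarget CUDALucas's makefile from sm_35 to sm_60 (Pascal, e.g. P100).
def retarget(text, old="35", new="60"):
    """Swap the --generate-code arch/code pair to another compute capability."""
    return text.replace(f"arch=compute_{old},code=sm_{old}",
                        f"arch=compute_{new},code=sm_{new}")

# Demo on a sample flags line; on a real checkout you would read and
# rewrite the Makefile itself with the same replacement.
sample = "NVCCFLAGS = --generate-code arch=compute_35,code=sm_35"
fixed = retarget(sample)
print(fixed)  # NVCCFLAGS = --generate-code arch=compute_60,code=sm_60
```

For a V100 (Volta) you would pass [C]new="70"[/C] instead.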
Tried running CUDALucas on a P100 and got the message "*** buffer overflow detected ***: /content/drive/My Drive/cudalucas/c.exe terminated"
Is gpuowl really faster? I'm surprised, since I've always assumed Nvidia's OpenCL was pretty "meh". EDIT: Finished a P-1 assignment in 32 min with the P100 - 92M exponent, 5M FFT.
Except you cannot do LL DC with gpuowl.
I wish he would open up LL again, perhaps with a limit of 85M on the exponent? But maybe the LL code is long gone from gpuowl. Feel free to use the CUDALucas I compiled on Kaggle: [URL="http://hoegge.dk/mersenne/kaggle/cudalucas.tar.gz"]cudalucas.tar.gz[/URL] Here is one compiled on Google Colab: [url]https://mersenneforum.org/showpost.php?p=527191&postcount=178[/url]
1 Attachment(s)
[QUOTE=bayanne;529689]Give me simple instructions to use them in P-1 or PRP, then I will use them.[/QUOTE]
Drop this gpuowl executable I compiled into your Google Drive (I am sure this will work straight away, since I use this executable on Kaggle): create a folder called gpuowl and put the executable in that folder. Then create a worktodo.txt file on your computer and dump some PRP work in there (preferably more than 3 assignments, so it can run for a while). Upload that worktodo.txt file to the same gpuowl folder in Google Drive. Now head over to Colab and create a new notebook. Run !nvidia-smi to check what GPU you have, then put in the following code block to mount your Google Drive: [CODE]from google.colab import drive
drive.mount('/content/drive')[/CODE] Follow the steps it prompts and your Google Drive will be mounted. Here's the configuration I use to run gpuowl: [CODE]!chmod 777 '/content/drive/My Drive/gpuowl'
!cd '/content/drive/My Drive/gpuowl' && LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl && chmod 777 worktodo.txt && ./gpuowl -use ORIG_X2 -block 400 -log 160000[/CODE] You can change the log frequency or the GEC block size by adjusting the -log and -block values to your liking. Have fun!
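If you want to assemble the worktodo.txt programmatically before uploading it, a minimal sketch (the two PRP= lines and their 32-hex-digit assignment IDs are placeholders - paste the exact lines the mersenne.org manual assignment page gives you):

```python
# Sketch: write a worktodo.txt for gpuowl from manual-assignment lines.
# The PRP= entries below are hypothetical placeholders, NOT real assignments;
# copy the exact lines from mersenne.org's manual GPU assignment page.
assignments = [
    "PRP=0123456789ABCDEF0123456789ABCDEF,1,2,87654321,-1,76,0",  # hypothetical
    "PRP=FEDCBA9876543210FEDCBA9876543210,1,2,87654337,-1,76,0",  # hypothetical
]

with open("worktodo.txt", "w") as f:
    f.write("\n".join(assignments) + "\n")

print(open("worktodo.txt").read())
```

Then upload the resulting file to the gpuowl folder in your Google Drive as described above.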
[QUOTE=ATH;529724]Except you cannot do LL DC with gpuowl.[/QUOTE]Yes you can, but it needs to be v0.6 or earlier. V0.6 has the Jacobi check, which even the latest version of CUDALucas does not. This may only run on AMD, not NVIDIA, though. The 4M FFT length is adequate for LL DC up to ~77M. [URL]https://www.mersenneforum.org/showpost.php?p=489083&postcount=7[/URL]
Varying timings
In mprime 29.8 on Colab, in successive 12-hour runs on the same exponent in progress (87092557, first PRP test), I see different ms/iter timings, presumably from running on different CPU models in different sessions. This of course makes the ETA fluctuate.
In order of first appearance, approximate (eye-averaged) ms/iter:
33 (FMA3), 34 (FMA3), 24 (AVX512), 29 (FMA3), 31 (FMA3), 30 (FMA3), 55 (FMA3), 54 (FMA3), 51 (FMA3), 56 (FMA3)
There is also fluctuation of up to 10%+ within a single session. The jump to 50+ ms/iter has the unfortunate effect of the ETA being more days away now than it was 3 weeks ago.
If you see any 5x timings, then kill the session and reconnect. Hopefully you'll get a better one.
Also, for the FMA3 runs, you might get better timings by enabling Hyperthreaded LL.
Just got assigned a P100 :big grin: seems like a 2080 or even a 2070 can beat that ... am I missing something?
[CODE]Beginning GPU Trial Factoring Environment Bootstrapping...
Please see https://www.gpu72.com/ for additional details.
20191107_031234: GPU72 TF V0.32 Bootstrap starting...
20191107_031234: Working as "ef52b79ffb10661e4ecc7da049088e55"...
20191107_031234: Installing needed packages (1/3)
20191107_031243: Installing needed packages (2/3)
20191107_031253: Installing needed packages (3/3)
20191107_031324: Fetching initial work...
20191107_031325: Running GPU type Tesla P100-PCIE-16GB
20191107_031325: running a simple selftest...
20191107_031336: Selftest statistics
20191107_031336: number of tests 107
20191107_031336: successfull tests 107
20191107_031336: selftest PASSED!
20191107_031336: Starting trial factoring M95411807 from 2^75 to 2^76 (80.20 GHz-days)
20191107_031336: Exponent  TF Level  % Done  ETA    GHzD/D   Itr Time | Class #, Seq #  | #FCs   | SieveRate | SieveP | Uptime
20191107_031350: 95411807  75 to 76  0.1%    1h43m  1116.31  6.466s   | 0/4620,  1/960  | 42.85G | 6627.4M/s | 82485  | 0:02
20191107_031454: 95411807  75 to 76  1.4%    1h41m  1121.51  6.436s   | 60/4620, 13/960 | 42.85G | 6658.2M/s | 82485  | 0:03[/CODE]
[QUOTE=dcheuk;529861]Just got assigned a P100 :big grin: seems like a 2080 or even a 2070 can beat that ... am I missing something?[/QUOTE]
Nope. See [url]https://www.mersenne.ca/mfaktc.php?sort=ghdpd&noA=1[/url] That's why the last 10 posts say to run LL instead of TF on these puppies. :wink:
[QUOTE=dcheuk;529861]Just got assigned a P100 :big grin: seems like a 2080 or even a 2070 can beat that ... am I missing something?
[CODE]Beginning GPU Trial Factoring Environment Bootstrapping...
Please see https://www.gpu72.com/ for additional details.
20191107_031234: GPU72 TF V0.32 Bootstrap starting...
20191107_031234: Working as "ef52b79ffb10661e4ecc7da049088e55"...
20191107_031234: Installing needed packages (1/3)
20191107_031243: Installing needed packages (2/3)
20191107_031253: Installing needed packages (3/3)
20191107_031324: Fetching initial work...
20191107_031325: Running GPU type Tesla P100-PCIE-16GB
20191107_031325: running a simple selftest...
20191107_031336: Selftest statistics
20191107_031336: number of tests 107
20191107_031336: successfull tests 107
20191107_031336: selftest PASSED!
20191107_031336: Starting trial factoring M95411807 from 2^75 to 2^76 (80.20 GHz-days)
20191107_031336: Exponent  TF Level  % Done  ETA    GHzD/D   Itr Time | Class #, Seq #  | #FCs   | SieveRate | SieveP | Uptime
20191107_031350: 95411807  75 to 76  0.1%    1h43m  1116.31  6.466s   | 0/4620,  1/960  | 42.85G | 6627.4M/s | 82485  | 0:02
20191107_031454: 95411807  75 to 76  1.4%    1h41m  1121.51  6.436s   | 60/4620, 13/960 | 42.85G | 6658.2M/s | 82485  | 0:03[/CODE][/QUOTE] Flipping back the pages, I just saw this: [QUOTE=axn;529686]P100s and K80s are wasted in TF. They are much better suited to LL. :two cents: Incidentally, a P100 can complete a 50M DC in about 12 hrs![/QUOTE] :groan: