mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > Cloud Computing
Old 2019-11-05, 15:40   #507
kriesel
 

Quote:
Originally Posted by bayanne View Post
Give me simple instructions to use them in P-1 or PRP, then I will use them.

It was not me that picked the model of Tesla to use :)
I recommend gpuowl's entry in https://www.mersenneforum.org/showthread.php?t=24839. I haven't gotten around to CUDALucas yet, or to figuring out what went wrong and how to address the CUDAPm1 selftest failure. I won't claim the instructions are simple, nor irreducible, but I think they are sufficient for gpuowl or very close.

Last fiddled with by kriesel on 2019-11-05 at 15:40
Old 2019-11-05, 16:06   #508
xx005fs
 

Quote:
Originally Posted by axn View Post
For LL test:

1) Build cudalucas from source or use someone's prebuilt executable.
Source available at https://sourceforge.net/p/cudalucas/...AD/tree/trunk/
Change makefile to use --generate-code arch=compute_60,code=sm_60 (instead of 35)
2) Run cufftbench and threadbench
3) Create a worktodo with a manual assignment from mersenne.org
4) ????
5) Profit

I'm assuming you know how to use your google drive to host the files?
I wouldn't recommend running CUDALucas on these powerful GPUs, as it is significantly slower than gpuowl. For some reason CUDALucas needs more memory bandwidth per iteration, so despite the OpenCL overhead gpuowl is a lot faster.
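As a sketch, the quoted build steps might look like this as Colab cells. This assumes a GPU runtime is attached; the checkout URL is abbreviated in the quote, so a placeholder stands in for it, and the benchmark arguments are left to the CUDALucas documentation rather than guessed here.

```shell
# Hypothetical Colab cells; <trunk-url> is the SourceForge trunk
# linked in the quoted post.
!svn checkout <trunk-url> cudalucas
# Edit the Makefile for the P100 before building:
#   --generate-code arch=compute_60,code=sm_60   (instead of 35)
!cd cudalucas && make
# Then run its cufftbench and threadbench modes to pick the fastest
# FFT length and thread count, put a worktodo entry from mersenne.org
# in the working directory, and start the test.
```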
Old 2019-11-05, 16:15   #509
kracker
 

Tried running CUDALucas on the P100 I got, but it terminated with the message "*** buffer overflow detected ***: /content/drive/My Drive/cudalucas/c.exe terminated"

Is gpuowl really faster? I'm surprised, since I've always assumed Nvidia's OpenCL was pretty "meh".

EDIT: Finished a P-1 assignment in 32 min with the P100: a 92M exponent at 5M FFT.

Last fiddled with by kracker on 2019-11-05 at 16:38
Old 2019-11-05, 16:29   #510
ATH

Except you cannot do LL DC with gpuowl.

I wish he would open up LL again, perhaps with a limit of 85M on the exponent?
But maybe the LL code is long gone from gpuowl.

Feel free to use the CUDALucas I compiled on Kaggle:
cudalucas.tar.gz

Here is one compiled on Google Colab:
https://mersenneforum.org/showpost.p...&postcount=178

Last fiddled with by ATH on 2019-11-05 at 16:33
Old 2019-11-05, 16:49   #511
xx005fs
 

Quote:
Originally Posted by bayanne View Post
Give me simple instructions to use them in P-1 or PRP, then I will use them.
Drop this gpuowl executable I compiled into your Google Drive (I am sure it will work straight away, since I use this executable on Kaggle): create a folder called gpuowl and put the executable in it. Then create a worktodo.txt file on your computer and dump some PRP work into it (preferably more than 3 assignments so it can run for a while). Upload that worktodo.txt to the same gpuowl folder in Google Drive.

Now, head over to Colab and create a new notebook. Run !nvidia-smi to check which GPU you have, then put in the following code block to mount your Google Drive.
Code:
from google.colab import drive
drive.mount('/content/drive')
Follow the prompted steps and your Google Drive will be mounted.

Here's the configurations I use to run gpuowl:
Code:
!chmod 777 '/content/drive/My Drive/gpuowl'
!cd '/content/drive/My Drive/gpuowl' && export LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl && chmod 777 worktodo.txt && ./gpuowl -use ORIG_X2 -block 400 -log 160000
You can change the log frequency or the GEC block size by adjusting the -block and -log values to your liking. Have fun!
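For the worktodo.txt contents: a PRP entry from a mersenne.org manual assignment looks roughly like the line below. The field layout is my recollection of the manual-assignment format rather than anything from this thread, and both values are placeholders.

```
PRP=<assignment-ID>,1,2,<exponent>,-1
```

One line per assignment; gpuowl works down the list top to bottom, which is why stacking several keeps a session busy.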
Attached Files
File Type: zip gpuowl.zip (243.0 KB, 57 views)

Last fiddled with by xx005fs on 2019-11-05 at 16:50
Old 2019-11-05, 21:52   #512
kriesel
 

Quote:
Originally Posted by ATH View Post
Except you cannot do LL DC with gpuowl.
Yes you can, but it needs to be v0.6 or earlier. v0.6 has the Jacobi check, which even the latest version of CUDALucas does not. But that version may only run on AMD, not NVIDIA. Its 4M FFT length is adequate for LL DC up to ~77M. https://www.mersenneforum.org/showpo...83&postcount=7
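A rough way to see where a ~77M limit comes from: the testable exponent scales with FFT length at roughly 18.3 bits per FFT word. That figure is my own approximation, not a number from this thread; real limits depend on the program and its rounding-error thresholds.

```python
# Rule of thumb: an FFT of length N can carry roughly 18.3 bits per word,
# so the largest testable exponent is about 18.3 * N.
BITS_PER_WORD = 18.3  # assumed average; varies by program and FFT length

def approx_max_exponent(fft_len):
    """Approximate largest Mersenne exponent testable at this FFT length."""
    return int(fft_len * BITS_PER_WORD)

print(approx_max_exponent(4 * 1024 * 1024))  # about 76.8M, close to the ~77M quoted
```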
Old 2019-11-06, 14:43   #513
kriesel
 
Varying timings

In mprime 29.8 on Colab, in successive 12 hour runs, on the same exponent in progress (87092557 first PRP test), I see different ms/iter timings, presumably from running on different cpu models in different sessions. This of course makes the ETA fluctuate.
In order of first appearance, approx (eye-averaged) ms/iter:
33 (FMA3)
34 (FMA3)
24 (AVX512)
29 (FMA3)
31 (FMA3)
30 (FMA3)
55 (FMA3)
54 (FMA3)
51 (FMA3)
56 (FMA3)
There is also fluctuation of up to 10%+ within a single session.
The jump to 50+ ms/iter has the unfortunate effect of the ETA being more days away now than it was 3 weeks ago.
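The effect on runtime is easy to quantify: a PRP test of M(p) takes about p squarings, so total time scales linearly with ms/iter. A quick sketch using the exponent above:

```python
# How ms/iter translates into total runtime for a PRP test of M(p),
# which takes about p iterations.
p = 87_092_557  # the exponent from this post

def days_for_test(ms_per_iter, exponent=p):
    """Approximate test length in days at a given iteration time."""
    return exponent * ms_per_iter / 1000 / 86400

print(round(days_for_test(30), 1))  # 30.2 days at 30 ms/iter
print(round(days_for_test(55), 1))  # 55.4 days at 55 ms/iter
```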
Old 2019-11-07, 02:33   #514
axn
 

If you see any 5x timings, then kill the session and reconnect. Hopefully you'll get a better one.
Also, for the FMA3 runs, you might get better timings by enabling Hyperthreaded LL.
Old 2019-11-07, 03:16   #515
dcheuk
 

Just got assigned a P100. Seems like a 2080 or even a 2070 can beat it ... am I missing something?

Code:
Beginning GPU Trial Factoring Environment Bootstrapping...
Please see https://www.gpu72.com/ for additional details.

20191107_031234: GPU72 TF V0.32 Bootstrap starting...
20191107_031234: Working as "ef52b79ffb10661e4ecc7da049088e55"...

20191107_031234: Installing needed packages (1/3)
20191107_031243: Installing needed packages (2/3)
20191107_031253: Installing needed packages (3/3)
20191107_031324: Fetching initial work...
20191107_031325: Running GPU type Tesla P100-PCIE-16GB

20191107_031325: running a simple selftest...
20191107_031336: Selftest statistics
20191107_031336:   number of tests           107
20191107_031336:   successfull tests         107
20191107_031336: selftest PASSED!
20191107_031336: Starting trial factoring M95411807 from 2^75 to 2^76 (80.20 GHz-days)

20191107_031336: Exponent  TF Level  % Done     ETA   GHzD/D  Itr Time |   Class #,   Seq # |    #FCs | SieveRate |  SieveP | Uptime
20191107_031350: 95411807  75 to 76    0.1%   1h43m  1116.31    6.466s |    0/4620,   1/960 |  42.85G | 6627.4M/s |   82485 |   0:02
20191107_031454: 95411807  75 to 76    1.4%   1h41m  1121.51    6.436s |   60/4620,  13/960 |  42.85G | 6658.2M/s |   82485 |   0:03
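The log above is internally consistent, which is a handy sanity check: the ETA should equal the assignment's GHz-days credit divided by the reported throughput.

```python
# Sanity check of the mfaktc log above.
credit_ghzd = 80.20          # "2^75 to 2^76 (80.20 GHz-days)" from the log
rate_ghzd_per_day = 1116.31  # "GHzD/D" column from the log

hours = credit_ghzd / rate_ghzd_per_day * 24
print(f"{int(hours)}h{int(hours % 1 * 60)}m")  # → 1h43m, matching the ETA column
```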
Old 2019-11-07, 03:19   #516
axn
 

Quote:
Originally Posted by dcheuk View Post
Just got assigned a P100. Seems like a 2080 or even a 2070 can beat it ... am I missing something?
Nope. See https://www.mersenne.ca/mfaktc.php?sort=ghdpd&noA=1

That's why the last 10 posts say to run LL instead of TF on these puppies.
Old 2019-11-07, 03:19   #517
dcheuk
 

Quote:
Originally Posted by dcheuk View Post
Just got assigned a P100. Seems like a 2080 or even a 2070 can beat it ... am I missing something?

Flipping back through the pages, I just saw this:

Quote:
Originally Posted by axn View Post
P100s and K80s are wasted in TF. They are much better suited to LL.

Incidentally, a P100 can complete a 50m DC in about 12 hrs!
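For what it's worth, that 12-hour figure implies an iteration speed you can back out directly (a quick sketch; an LL test of M(p) takes about p iterations):

```python
# Implied iteration speed for "a 50M DC in about 12 hrs".
p = 50_000_000  # exponent of the double-check
hours = 12

ms_per_iter = hours * 3600 * 1000 / p
print(round(ms_per_iter, 3))  # 0.864 ms/iter
```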