#716 | |
|
Feb 2005
Colorado
29216 Posts |
Quote:
63 is a Haswell Xeon. 79 is a Broadwell Xeon. 85 is a Skylake Xeon, which supports AVX-512. |
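For anyone wanting to check this from a notebook cell, here is a minimal sketch that reads the "Model:" and "Flags:" fields from lscpu-style text. The lookup table only contains the three model numbers named above, and the sample text is a hypothetical, abbreviated stand-in for real !lscpu output.

```python
# Model numbers as listed in the post above; not an exhaustive table.
XEON_MODELS = {63: "Haswell", 79: "Broadwell", 85: "Skylake"}

def describe_cpu(lscpu_text):
    """Return (microarchitecture, has_avx512f) parsed from lscpu-style text."""
    model = None
    flags = set()
    for line in lscpu_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "Model":
            model = int(value.strip())
        elif key == "Flags":
            flags = set(value.split())
    arch = XEON_MODELS.get(model, "unknown")
    # avx512f is the foundation flag Linux reports for AVX-512 capable cpus.
    return arch, "avx512f" in flags

# Hypothetical, abbreviated lscpu output for illustration only.
sample = """Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
Model: 85
Flags: fpu sse2 avx avx2 avx512f"""
print(describe_cpu(sample))
```

Checking the flag as well as the model number is a deliberate choice here, since the flag list is what actually tells you whether AVX-512 code paths can be used.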
|
|
|
|
|
|
#717 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
Quote:
I routinely run mprime (for Mersenne prime PRP testing) and either gpuowl or mfaktc in the same Colab session, with mprime in the background. I'd expect a cpu-only "NONE" session on the same cpu type to give slightly better mprime performance, since the cpu would not also be lightly serving the gpu-centric application as it does when a gpu is in use. That doesn't make a gpu a cpu or vice versa.
!lscpu shows cpu characteristics; !nvidia-smi shows gpu characteristics, including model. If they are ok, I log on to Google Drive for the session. Each application is then started; if it's an mfaktc run, it's also put in the background, and top -d 120 is run in the foreground to show signs of life and, later, how long the session lasted. https://www.mersenneforum.org/showpo...73&postcount=8
Last fiddled with by kriesel on 2019-12-17 at 23:39 |
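The "put it in the background" step can be sketched in Python roughly as follows. This is an assumption about one way to do it, not the script from the linked post: launch_background is a hypothetical helper, and the command shown is a stand-in for the real mprime or mfaktc binary.

```python
import subprocess
import sys

def launch_background(cmd, log_path):
    """Start cmd detached from the notebook cell, appending its output to log_path."""
    log = open(log_path, "ab")
    proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT,
                            start_new_session=True)
    log.close()  # the child keeps its own copy of the file descriptor
    return proc

if __name__ == "__main__":
    # Stand-in command; a real session would pass the mprime or mfaktc command line.
    worker = launch_background([sys.executable, "-c", "print('alive')"], "worker.log")
    worker.wait()
    print(open("worker.log").read())
```

Sending output to a log file keeps the foreground free for something like top -d 120, as described above.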
|
|
|
|
|
|
#718 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5437₁₀ Posts |
Next time someone gets a T4 on Google Colab, please run and submit benchmarks for TF and LL.
https://www.mersenne.ca/mfaktc.php https://www.mersenne.ca/cudalucas.php |
|
|
|
|
|
#719 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
10011000110110₂ Posts |
Quote:
Also, while I'm typing... Based on Wayne's comment, I spun up a Kaggle TF instance again (on a "disposable" account). After 19 hours across three runs, still working. Weird! |
|
|
|
|
|
|
#720 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
23066₈ Posts |
|
|
|
|
|
|
#721 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts |
|
|
|
|
|
|
#722 |
|
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
3×5×313 Posts |
|
|
|
|
|
|
#723 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5437₁₀ Posts |
Speaking of luck, the availability of K80 that I'm seeing in Colab is low again.
So I'm switching paradigms: make a folder for each gpu model that might come my way, launch a session, see what gpu I get, then launch the matching script for the relevant folder and benchmarking work.
Last fiddled with by kriesel on 2019-12-19 at 22:00 |
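A minimal sketch of that folder-per-gpu dispatch, under stated assumptions: the folder names are hypothetical, the model list only covers gpus mentioned in this thread, and detect_gpu relies on nvidia-smi's --query-gpu=name --format=csv,noheader output.

```python
import subprocess

# Hypothetical folder names, one per gpu model mentioned in this thread.
GPU_FOLDERS = {"K80": "k80", "T4": "t4", "P100": "p100"}

def folder_for_gpu(gpu_name):
    """Map an nvidia-smi product name such as 'Tesla T4' to a work folder."""
    for token in gpu_name.split():
        for model, folder in GPU_FOLDERS.items():
            # Match on word prefixes so 'P100-PCIE-16GB' resolves to P100.
            if token.startswith(model):
                return folder
    return None

def detect_gpu():
    """Return the first gpu name reported by nvidia-smi, or None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None

if __name__ == "__main__":
    name = detect_gpu()
    print(name, "->", folder_for_gpu(name) if name else "no gpu")
```

The notebook would then change into the returned folder and start whatever script lives there; an unknown model returns None so nothing is launched by accident.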
|
|
|
|
|
#724 |
|
"Dylan"
Mar 2017
1001000100₂ Posts |
I was working on doing some P-1 on some Mersennes with no stage 2 done in the hopes of finding some factors. The script starts up ok, and it starts working on the exponent, but after a while it is unable to write to the Drive:
Code:
Best time for fft = 1568K, time: 0.0818, t1 = 256, t2 = 32, t3 = 64
Using threads: norm1 256, mult 128, norm2 128.
Using up to 15912M GPU memory.
Selected B1=525000, B2=12731250, 4.7% chance of finding a factor
Using B1 = 525000 from savefile.
Continuing stage 1 from a partial result of M28222361 fft length = 1568K, iteration = 60001
Iteration 70000 M28222361, 0x98b803a08faa200c, n = 1568K, CUDAPm1 v0.22 err = 0.07520 (0:06 real, 0.6628 ms/iter, ETA 7:35)
Iteration 80000 M28222361, 0x1f50201cb4e89065, n = 1568K, CUDAPm1 v0.22 err = 0.07031 (0:07 real, 0.6641 ms/iter, ETA 7:30)
Iteration 90000 M28222361, 0x306bf7766242d8b8, n = 1568K, CUDAPm1 v0.22 err = 0.07422 (0:07 real, 0.6585 ms/iter, ETA 7:19)
Couldn't write checkpoint.
Iteration 100000 M28222361, 0xed07659e0434e0e2, n = 1568K, CUDAPm1 v0.22 err = 0.07227 (0:06 real, 0.6621 ms/iter, ETA 7:15)
Couldn't write checkpoint.
Iteration 110000 M28222361, 0x440db9dc0edc5f08, n = 1568K, CUDAPm1 v0.22 err = 0.07422 (0:07 real, 0.6748 ms/iter, ETA 7:17)
Couldn't write checkpoint.
Iteration 120000 M28222361, 0x4e29dd3852cc20cc, n = 1568K, CUDAPm1 v0.22 err = 0.07031 (0:07 real, 0.6743 ms/iter, ETA 7:09)
SIGINT caught, writing checkpoint.
SIGINT caught, writing checkpoint.
Couldn't write checkpoint.
Estimated time spent so far: 1:22
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
cat: results.txt: Transport endpoint is not connected
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
rm: cannot remove 'results.txt': Transport endpoint is not connected
PS: I'm using the script I provided on post 158, with the addition of '!apt-get update' and '!apt-get install cuda-cudart-10-0 cuda-cufft-dev-10-0'. |
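The "Transport endpoint is not connected" lines suggest the Drive mount dropped while CUDAPm1 was still running. A small check like the one below could be polled from the notebook to notice that early. It is only a sketch: mount_is_alive is a hypothetical helper, and the path you pass in is whatever folder your own script works from.

```python
import errno
import os

def mount_is_alive(path):
    """Return True if path can still be listed, False if the mount has dropped."""
    try:
        os.listdir(path)
        return True
    except OSError as exc:
        # ENOTCONN is the errno behind "Transport endpoint is not connected".
        if exc.errno in (errno.ENOTCONN, errno.ENOENT, errno.EIO):
            return False
        raise

if __name__ == "__main__":
    # Replace "." with the mounted work folder used by your own script.
    print(mount_is_alive("."))
```

If this returns False, checkpoints written after that point are not reaching the Drive, so stopping the run and remounting is probably safer than letting it continue.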
|
|
|
|
|
#725 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5437₁₀ Posts |
Quote:
I had a different issue in Colab with cudapm1; all zero res64s in a selftest that failed to find a known factor. See https://www.mersenneforum.org/showth...928#post527928 |
|
|
|
|
|
|
#726 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101₂ Posts |
Quote:
Code:
2019-12-20 23:29:54 colab/TeslaP100 Exception NSt12experimental10filesystem2v17__cxx1116filesystem_errorE: filesystem error: cannot get current path: Transport endpoint is not connected
2019-12-20 23:29:54 colab/TeslaP100 waiting for background GCDs..
2019-12-20 23:29:54 colab/TeslaP100 Bye
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
Last fiddled with by kriesel on 2019-12-21 at 01:31 |
|
|
|
|