mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > Cloud Computing
Old 2020-01-18, 19:05   #837
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts
Good news, bad news

Bad news first:
Personal new record low (CPU-only) session duration: 8-10 minutes. It probably didn't even save any mprime progress for that session, it was so short.

Good news next:
A Tesla P100 GPU was available on the restart, so I'm running dual CPU and GPU tasks on the resume.
Old 2020-01-19, 00:35   #838
storm5510
Random Account
 
 
Aug 2009

2²×3×163 Posts

I ran an instance after my last post. It was assigned a T4 GPU, started at 17:27 UTC, and stopped at 19:11 UTC. I have six results to submit. This particular notebook had not been run for three days. Things change, I suppose...
Old 2020-01-19, 06:54   #839
bayanne
 
 
"Tony Gott"
Aug 2002
Yell, Shetland, UK

2²·83 Posts

Just started the notebook up again, and got a Tesla T4.
Let's see how long it lasts ...

It lasted for 44 minutes, sigh.

Last fiddled with by bayanne on 2020-01-19 at 07:34
Old 2020-01-22, 03:42   #840
Fan Ming
 
Oct 2019

1011111₂ Posts

Today all my instances got a T4 instead of a P100, no matter how many times I ended and restarted the session... I don't know what happened...
BTW, the attached file is mfaktc for Linux compiled against CUDA 10.1; it may be useful on Colab.
Attached Files
File Type: zip mfaktc-linux-cuda10.1.zip (398.9 KB, 57 views)
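For anyone wanting to drive a build like the attached one from a Colab cell, here is a minimal sketch. The binary name ("mfaktc"), the Factor= worktodo format, and the chmod step are my assumptions, not something taken from this zip; check the README that ships with mfaktc before relying on any of it.

```python
# Sketch of a Colab cell driving an mfaktc build (names and formats assumed).
import subprocess

def factor_line(exponent, bit_lo, bit_hi, aid="N/A"):
    """One worktodo.txt entry: trial-factor `exponent` from 2^bit_lo to 2^bit_hi."""
    return f"Factor={aid},{exponent},{bit_lo},{bit_hi}"

def write_worktodo(assignments, path="worktodo.txt"):
    """assignments: iterable of (exponent, bit_lo, bit_hi) tuples."""
    with open(path, "w") as f:
        for exponent, lo, hi in assignments:
            f.write(factor_line(exponent, lo, hi) + "\n")

def run_mfaktc(binary="./mfaktc"):
    """Launch the binary; it picks up worktodo.txt from the current directory."""
    subprocess.run(["chmod", "+x", binary], check=True)
    subprocess.run([binary], check=True)
```

In a notebook one would unzip the attachment, call write_worktodo() with assignments fetched from GPU72 or mersenne.org, then run_mfaktc().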

Last fiddled with by Fan Ming on 2020-01-22 at 03:43
Old 2020-01-22, 04:45   #841
xx005fs
 
"Eric"
Jan 2018
USA

22·53 Posts

Quote:
Originally Posted by Fan Ming
Today all my instances got a T4 instead of a P100, no matter how many times I ended and restarted the session... I don't know what happened...
BTW, the attached file is mfaktc for Linux compiled against CUDA 10.1; it may be useful on Colab.
Similar situation here initially. However, after several instance resets (IIRC 3 per account) they all eventually got a P100. We'll see how long they run.
Old 2020-01-22, 05:39   #842
Fan Ming
 
Oct 2019

5×19 Posts

Quote:
Originally Posted by xx005fs
Similar situation here initially. However, after several instance resets (IIRC 3 per account) they all eventually got a P100. We'll see how long they run.
Thanks for the clue. After 10+ resets I finally got a P100...
Old 2020-01-22, 06:02   #843
xx005fs
 
"Eric"
Jan 2018
USA

22·53 Posts

It's been running for 90 minutes with zero issues, and none of my accounts have disconnected. The tip to keep them running: while the cell containing the code that runs GPUOwl or mfaktX is executing, the top right shows two status bars, RAM and Disk. Click the little triangular tab next to them to expand that small menu, then click "Connect to a hosted runtime." After a successful connection it should show three green dots followed by "busy", which means you won't need a manual reconnection after a certain time has passed.
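A lower-tech trick some people script instead of (or alongside) the reconnect steps above, and one I'm adding as an illustration rather than something from this thread: keep a cell itself visibly busy with a heartbeat loop, so the runtime stays in the "busy" state rather than idling. The 300-second interval is an arbitrary choice.

```python
import time
from datetime import datetime, timezone

def heartbeat(interval_s=300, iterations=None):
    """Print a UTC timestamp every interval_s seconds so the cell stays 'busy'.

    iterations=None loops forever (the normal notebook use); pass a number
    to stop after that many beats, e.g. for testing.
    """
    n = 0
    while iterations is None or n < iterations:
        print(datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC heartbeat"))
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)
```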
Old 2020-01-23, 16:06   #844
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts
Recent experiences with Colab and Kaggle.

OK, just a quick update from my personal POV...

I've been running four Colab instances from my main workstation/browser for the last several days. I've been getting CPU instances consistently, almost always lasting 12 hours each, with no problem relaunching another instance immediately after. In other words, I've effectively been getting 4 × 24 hours of CPU compute per day (a single hyperthreaded core each).

Approximately once a day I'm offered a T4 GPU running at ~1,700 GHzD/D. These survive between 30 and 90 minutes.

Kaggle, on the other hand...

Since several other people have reported that they're doing OK with Kaggle CPUs (and in one GPU72_TF case, a full 30 hours of GPU), I tried launching an mprime CPU instance.

The session was terminated and my account deleted within 30 seconds... Clearly automated, and apparently targeted at the combination of my account and the CPU-consuming process.

I'll have to try to beg for my account to be unlocked again; I had hoped to run some CFD jobs on them, but I just couldn't resist running the CPU experiment...
Old 2020-01-23, 21:34   #845
Uncwilly
6809 > 6502
 
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2·4,909 Posts

Any chance that your publicly available notebook (the one that I have run) could use CPUs if no GPU is available? ECM, P-1, or DC would be good work types.
Old 2020-01-23, 21:49   #846
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

Quote:
Originally Posted by Uncwilly
Any chance that your publicly available notebook (the one that I have run) could use CPUs if no GPU is available? ECM, P-1, or DC would be good work types.
Yup... That's "mapped". Originally it was planned as a parallel payload for the GPU code, but now it will be ~98% of the throughput.

It will be at least a week or so before I can even begin to tackle this. The biggest issue is the need to handle much larger checkpoint files. Keep in mind also that these things are not that powerful, though they do have a reasonable amount of RAM.
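Since the checkpoint-file problem bites anyone scripting these notebooks, here is one pattern people use for it: periodically mirror checkpoint files to persistent storage (e.g. a mounted Google Drive), so a session's work survives the teardown. This is a generic sketch, not chalsall's notebook; the "*.owl" file-name pattern and the directory layout are my assumptions.

```python
import shutil
from pathlib import Path

def sync_checkpoints(work_dir, backup_dir, patterns=("*.owl", "worktodo.txt")):
    """Copy new or newer checkpoint files from work_dir into backup_dir.

    Intended to be called from a loop in a notebook cell, with backup_dir on
    mounted Google Drive. A file is copied only when the backup is missing or
    older, to keep Drive traffic small. Returns the list of copied names.
    """
    work, backup = Path(work_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for src in work.glob(pattern):
            dst = backup / src.name
            if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
                shutil.copy2(src, dst)  # copy2 preserves mtime for the next comparison
                copied.append(src.name)
    return copied
```

On a session restart the same function, called with the arguments swapped, restores the latest checkpoints into the fresh runtime.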
Old 2020-01-24, 00:48   #847
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5419₁₀ Posts
T4 trouble

Just lately I've started getting T4s in Colab. While the T4 is much better at TF, I figured that since I'm already set up for gpuowl P-1 run-time scaling on the other GPU types, I'd have a go at the T4 too. Well. The first exponent I tried ran into a repeatable issue.
Code:
2020-01-23 05:12:53 colab/TeslaT4 200001247 P1  2828122 100.00%; 13725 us/sq; ETA 0d 00:00; 12d841c25867b11e 
2020-01-23 05:12:54 colab/TeslaT4 P-1 (B1=1960000, B2=54880000, D=30030): primes 3129263, expanded 3307512, doubles 511497 (left 2156833), singles 2106269, total 2617766 (84%) 
2020-01-23 05:12:54 colab/TeslaT4 200001247 P2 using blocks [65 - 1828] to cover 2617766 primes 
2020-01-23 05:12:55 colab/TeslaT4 200001247 P2 using 165 buffers of 88.0 MB each 
2020-01-23 05:15:26 colab/TeslaT4 Exception gpu_error: MEM_OBJECT_ALLOCATION_FAILURE clEnqueueCopyBuffer(queue, src, dst, 0, 0, size, 0, NULL, NULL) at clwrap.cpp:330 copyBuf 
2020-01-23 05:15:26 colab/TeslaT4 Bye 
2020-01-23 08:19:18 config.txt: -user kriesel -cpu colab/TeslaT4 -yield -maxAlloc 16000 
2020-01-23 08:19:18 colab/TeslaT4 config: -use NO_ASM  
2020-01-23 08:19:18 colab/TeslaT4 200001247 FFT 11264K: Width 256x4, Height 64x8, Middle 11; 17.34 bits/word 
2020-01-23 08:19:18 colab/TeslaT4 OpenCL args "-DEXP=200001247u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.949a0e5c425d6p+0 -DIWEIGHT_STEP=0x1.43f3fe219ac11p-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0" 
2020-01-23 08:19:18 colab/TeslaT4   
2020-01-23 08:19:18 colab/TeslaT4 OpenCL compilation in 5 ms 
2020-01-23 08:19:20 colab/TeslaT4 200001247 P1 B1=1960000, B2=54880000; 2828122 bits; starting at 2828121 
2020-01-23 08:19:21 colab/TeslaT4 200001247 P1  2828122 100.00%; 19136 us/sq; ETA 0d 00:00; 12d841c25867b11e 
2020-01-23 08:19:22 colab/TeslaT4 P-1 (B1=1960000, B2=54880000, D=30030): primes 3129263, expanded 3307512, doubles 511497 (left 2156833), singles 2106269, total 2617766 (84%) 
2020-01-23 08:19:22 colab/TeslaT4 200001247 P2 using blocks [65 - 1828] to cover 2617766 primes 
2020-01-23 08:19:23 colab/TeslaT4 200001247 P2 using 165 buffers of 88.0 MB each 
2020-01-23 08:21:35 colab/TeslaT4 Exception gpu_error: MEM_OBJECT_ALLOCATION_FAILURE clEnqueueCopyBuffer(queue, src, dst, 0, 0, size, 0, NULL, NULL) at clwrap.cpp:330 copyBuf 
2020-01-23 08:21:35 colab/TeslaT4 Bye
I finally got it unstuck by dropping to -maxAlloc 15000, even though 16000 works fine on other 16 GB GPUs.
Note that it would crash a GPU session in about 2 minutes.
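The arithmetic in the log makes the failure plausible: 165 buffers of 88.0 MB is 14,520 MB before counting the FFT state and the driver's own reservations, and a T4 exposes slightly less than 16 GiB of usable VRAM, so a 16,000 MB cap can let stage 2 plan more buffers than the card can actually deliver. A small sketch of that budget calculation follows; the 500 MB overhead figure is an illustrative assumption, not gpuowl's real accounting.

```python
def stage2_buffers_that_fit(max_alloc_mb, buffer_mb, overhead_mb):
    """How many P-1 stage 2 buffers fit under a -maxAlloc budget once a
    fixed overhead (FFT state, misc allocations) is set aside."""
    return max(0, int((max_alloc_mb - overhead_mb) // buffer_mb))

# Figures from the log above: 165 buffers of 88.0 MB each were planned,
# i.e. 165 * 88.0 = 14,520 MB before any other allocations.
```

With the cap at 15,000 MB the planner evidently settled on few enough buffers to fit; the other 16 GB cards presumably expose just enough extra usable VRAM that 16,000 still works.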

Last fiddled with by kriesel on 2020-01-24 at 00:48