#4830 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2627₁₆ Posts |
#4831 |
Romulan Interpreter
Jun 2011
Thailand
2·3·1,609 Posts |
Hey Chris, can you spin up a "colab toy" that launches two copies of mfaktc when a K80 is detected? I am pretty sure we are only using half of it on colab. Of course, this has yet to be tested, but it seems that for P4 and P100 we get about 95%-110% of the theoretical performance (probably explainable by the fact that their clocks are not standard), while for T4 and K80 we get less. One colab T4 only gives us about 65% of the theoretical performance, while K80 is capped at about 45%. This also matches James' tables (well... somehow).

I don't know what the issue is with the T4 (it may indeed be running underclocked on colab's servers, or something else may be taking place which we don't know about), but for the K80 one explanation may be the "dual chip". So I assume we only use half of it (or only half is made available by colab?). Could you try to play with it? Two folders, "-d 0", "-d 1", whatever (I don't know how that goes under linux). Two instances to try would be interesting. In the worst case, we get half of the speed in each instance, and we learn that more is not possible... But in the best case we may get a few percent more GHzDays/Day (up to 100%).
Last fiddled with by LaurV on 2020-03-25 at 10:29 |
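The two-instance experiment proposed above could be sketched roughly as follows. This is only an illustration under assumptions: the binary path `./mfaktc.exe`, the folder names `mfaktc_0`/`mfaktc_1`, and the helper names are hypothetical, and the only mfaktc option used is the `-d` device selector mentioned in the post. If just one K80 device is visible, the sketch runs both instances on that device, which corresponds to the "worst case" test.

```python
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'index, name' lines into (index, name) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, name = line.split(",", 1)
        gpus.append((int(index), name.strip()))
    return gpus


def list_gpus():
    """Ask nvidia-smi for the visible devices (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_list(out)


def plan_mfaktc_launches(gpus, binary="./mfaktc.exe"):
    """Return (workdir, argv) pairs; two instances whenever a K80 is present.

    binary and the workdir names are assumptions made for this sketch.
    """
    k80 = [idx for idx, name in gpus if "K80" in name]
    if len(k80) >= 2:
        devices = k80[:2]            # one instance per visible K80 half
    elif len(k80) == 1:
        devices = [k80[0], k80[0]]   # only one half visible: share it
    else:
        devices = [gpus[0][0]] if gpus else []  # other GPUs: single instance
    return [(f"mfaktc_{n}", [binary, "-d", str(dev)])
            for n, dev in enumerate(devices)]


def launch(plan):
    """Start every planned instance in its own folder (folders must exist)."""
    return [subprocess.Popen(argv, cwd=workdir) for workdir, argv in plan]
```

Comparing the summed GHzDays/Day of the two instances against a single-instance run on the same card would show whether anything is gained.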
#4832 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts |
Quote:
K80 before background process launch, during Colab script startup:
Code:
Mon Feb 17 14:56:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   68C    P8    33W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Code:
Sun Mar 15 19:04:07 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P0    69W / 149W |     69MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Code:
Fri Mar 13 08:51:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    39W /  75W |   1111MiB /  7611MiB |     67%      Default |
+-------------------------------+----------------------+----------------------+
Code:
Mon Mar 23 16:57:09 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P0   148W / 250W |  16183MiB / 16280MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
Code:
Sat Feb 29 19:47:48 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   48C    P0    64W /  70W |   2517MiB / 15079MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
edit: on a different account, that had already been running with sleep 18 seconds, found one instance of this for gpuowl P-1 at ~97M. Judging by the memory usage that is P-1 stage 2:
Code:
Tue Mar 10 13:43:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0   147W / 149W |  11406MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
Last fiddled with by kriesel on 2020-03-25 at 14:20 |
#4833 |
Jan 2020
3910 Posts |
I haven't gotten a GPU in a few days. I've had to use a TPU instead. That connects pretty much right away every time.
#4834 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts |
Quote:
I've found that Colab seems to settle on this kind of allotment within a day or two. Interestingly, each front end is given a GPU at approximately the same time of the day for each individual (Gmail) account. Further, I've found that when I'm given a GPU, if I get a K80 or a P4 I can do a "Factory Reset" and after two to five attempts, I will be given a T4 or a P100. |
#4835 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2·4,909 Posts |
For the GPU72 implementation I keep getting sessions that want to do P-1 on the same exponent at the same time. I have noticed this several times. Today two sessions were working on 100982867 at the same time.
#4836 |
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts |
Hmmm... This should really be over on the GPU72 Status thread, but...
I see this candidate was assigned to you at 19:15 and then again at 21:06 (UTC). The checkpoint file issued was the same for both. Did the first run actually run for more than ten minutes? I see the second run only lasted for about 22 minutes. Please let me know. If it did actually run, the only explanation would be that the "apt install cron" didn't "take", and thus the "cpoints.pl" script wasn't being launched by the crontab entry. And, of course, I've never seen this before myself... Has anyone else noticed this kind of behavior? The code hasn't changed for a couple of weeks (not that that necessarily means it's entirely sane). |
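One way to test the "apt install cron didn't take" theory from inside a session is sketched below. It is a guess at a diagnostic, not the GPU72 code: the helper names are hypothetical, the script name `cpoints.pl` is taken from the post, and the actual crontab line used by the notebook is not known here.

```python
import shutil
import subprocess


def crontab_mentions(crontab_text, script_name):
    """True if any non-comment crontab line references script_name."""
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if script_name in stripped:
            return True
    return False


def cron_diagnostics(script_name="cpoints.pl"):
    """Collect basic facts about the cron setup of the current user."""
    report = {
        # Assumes the daemon binary is named "cron", as on Debian/Ubuntu.
        "cron_binary": shutil.which("cron"),
        "crontab_binary": shutil.which("crontab"),
        "entry_present": False,
    }
    if report["crontab_binary"]:
        # "crontab -l" exits non-zero when the user has no crontab at all.
        result = subprocess.run(["crontab", "-l"],
                                capture_output=True, text=True)
        report["entry_present"] = (
            result.returncode == 0
            and crontab_mentions(result.stdout, script_name)
        )
    return report
```

If `cron_binary` comes back as `None`, the install most likely failed; if the binary exists but `entry_present` is false, the crontab entry itself was never written.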
#4837 |
"James Heinrich"
May 2004
ex-Northern Ontario
65358 Posts |
I haven't, but LaurV reported something similar 10 days ago:
#4838 | ||
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2·4,909 Posts |
Quote:
Quote:
Code:
20200327_221830 ( 3:11): 100982867 P-1 77 65.02% Stage: 2 complete. Time: 411.603 sec. |
#4839 |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
22·3·17·23 Posts |
#4840 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts |
Quote:
I'll look at making this more resilient. Perhaps have the Checkpointer script also be launched from the CPU Payload script, as well as collect some debugging information as to whether the apt install actually works. Interesting... This is new(ish) behaviour. And/or an extremely rare edge-case. |
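The two ideas above (record whether the install step worked, and have the payload start the checkpointer itself) might look something like this in outline. Everything here is a hypothetical sketch: `run_and_record` and `start_periodic` are invented names, and the real Checkpointer and Payload scripts are Perl, so this only illustrates the pattern.

```python
import subprocess
import threading


def run_and_record(cmd, tail_lines=20):
    """Run a command and keep its exit code plus the tail of its output."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # The command itself is missing, which is also worth logging.
        return {"cmd": cmd, "returncode": None, "output_tail": "not found"}
    lines = (result.stdout + result.stderr).splitlines()
    return {
        "cmd": cmd,
        "returncode": result.returncode,
        "output_tail": "\n".join(lines[-tail_lines:]),
    }


def start_periodic(task, interval_s):
    """Call task() every interval_s seconds in a background thread.

    Returns (stop_event, thread); set the event to end the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            task()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread
```

For example, the payload could log `run_and_record(["apt", "install", "-y", "cron"])` and, if the return code is not 0, fall back to `start_periodic` with a task that runs the checkpoint upload directly.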