#45 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2×7²×113 Posts |
It never fails...
On the other hand, we can hardly complain about them ***giving*** each of us 1,500 GHzD of free compute every week!!! Also, in some ways, this is comforting. It means they're OK with us doing what we're doing.
#46 |
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2·13·257 Posts |
#47 |
"Dylan"
Mar 2017
252₁₆ Posts |
I have been running the Colab script for the GPU72 project, and while it runs well, there should be a way to keep it from requesting more assignments than the remaining session time allows. I was thinking of something along these lines (in pseudocode):
Code:
* upon running the script, get the current timestamp (in Unix time), call it start, and detect what platform we are on (Colab or Kaggle)
* while the timestamp < start + 12 hours (Colab) or 9 hours (Kaggle), fetch an assignment
* estimate the amount of time it would take to run the assignment, and add it to the current timestamp
* if the estimated time of completion would be after the deadline, drop the assignment and break
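A minimal Python sketch of the loop outlined above, not the actual GPU72 notebook code. The callbacks `fetch_assignment`, `estimate_seconds`, `run` and `drop_assignment` are hypothetical placeholders, and the platform check assumes the Colab runtime has the `google.colab` package loaded; only the 12-hour and 9-hour limits come from the pseudocode.

```python
import sys
import time

# Session limits taken from the pseudocode above (hours converted to seconds).
SESSION_LIMIT = {"colab": 12 * 3600, "kaggle": 9 * 3600}


def detect_platform():
    # Assumption: the Colab runtime pre-loads the google.colab package;
    # anything else is treated as Kaggle in this sketch.
    return "colab" if "google.colab" in sys.modules else "kaggle"


def run_session(fetch_assignment, estimate_seconds, run, drop_assignment,
                platform=None, now=time.time):
    """Fetch and run assignments until the next one would overrun the session."""
    start = now()
    deadline = start + SESSION_LIMIT[platform or detect_platform()]
    completed = []
    while now() < deadline:
        assignment = fetch_assignment()
        if assignment is None:           # nothing left to hand out
            break
        if now() + estimate_seconds(assignment) > deadline:
            drop_assignment(assignment)  # would not finish before the deadline
            break
        run(assignment)
        completed.append(assignment)
    return completed
```

The `now` parameter exists only so the loop can be exercised with a simulated clock; in a notebook it would be left at its default.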
#48 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
2·7²·113 Posts |
Quote:
The problem with your suggestion is that with mfaktc you can't know when a factor will be found, so it's important to keep a bit of a buffer of work queued. This is more of an issue with Colab than Kaggle: with the former, if you run out of work and mfaktc stops, the GPU's availability is wasted.
My current methodology is to keep three candidates in the worktodo file. The checkpoint file for the candidate currently being worked is uploaded to the server every two minutes. If a candidate assigned to a Colab / Kaggle instance is more than 12 hours old, it is recycled if no checkpoint file was returned.
What I'm currently working on is reissuing unfinished candidates back to the same GPU72 worker with the checkpoint file so they can continue and finish off the work.
By the end of the weekend I'll have the UI stuff built on GPU72, to let anyone participate. However, if anyone else would like to give this a whirl, please PM me with your GPU72 account details (UN, Display Name or email).
And thanks to the current beta-testers. Lots of great feedback (and factors found!).
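The buffering and recycling rules described in this post could be sketched roughly as follows. This is an illustration under stated assumptions, not the GPU72 server or notebook code: the `Assignment` record and both function names are hypothetical, and every non-blank line of the worktodo file is counted as one queued candidate.

```python
from dataclasses import dataclass
from typing import Optional

BUFFER_SIZE = 3                # candidates kept in worktodo (from the post)
CHECKPOINT_INTERVAL = 2 * 60   # seconds between checkpoint uploads (from the post)
RECYCLE_AGE = 12 * 3600        # seconds before an assignment may be recycled


@dataclass
class Assignment:
    # Hypothetical record; the real server schema is not described in the thread.
    exponent: int
    assigned_at: float                          # Unix time of assignment
    last_checkpoint_at: Optional[float] = None  # None if no checkpoint was returned


def candidates_to_fetch(worktodo_lines):
    """How many new candidates the client should request to refill the buffer."""
    queued = sum(1 for line in worktodo_lines if line.strip())
    return max(0, BUFFER_SIZE - queued)


def should_recycle(assignment, now):
    """Recycle only if older than 12 hours and no checkpoint was ever returned."""
    too_old = now - assignment.assigned_at > RECYCLE_AGE
    return too_old and assignment.last_checkpoint_at is None
```

Keeping the buffer rule on the client and the recycle rule on the server means a stalled instance never blocks a candidate for more than 12 hours, while an instance that did return a checkpoint keeps its claim on the work.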
#49 |
May 2011
Orange Park, FL
906₁₀ Posts |
I was assigned a Tesla T4 (1720 GHzD/D) for my first two 12-hour sessions (it disconnects automatically after that time), but this morning I am running on a much slower K80 (410 GHzD/D).
The checkpoint capability is much more important with this much slower GPU. The estimated time to complete a 69M 74->75 TF is a little over three hours, so there is potential for a loss of three hours of computing time. Maybe I should consider stopping the run and reconnecting after three exponents have been processed (assuming no factor found) until the checkpoint code is in place.
Last fiddled with by Chuck on 2019-09-13 at 14:22 Reason: Restart
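As a rough check of the figures in this post, the following Python sketch converts the K80 estimate into the work implied per assignment and then into the equivalent T4 time. The throughput numbers are the ones quoted above; treating "a little over three hours" as exactly 3.0 hours is an assumption, so the results are approximate lower bounds.

```python
T4_RATE = 1720.0   # GHz-days per day, as quoted in the post
K80_RATE = 410.0   # GHz-days per day, as quoted in the post
K80_HOURS = 3.0    # assumption: "a little over three hours" rounded down


def work_ghzd(rate, hours):
    """GHz-days of work done at `rate` (GHz-days per day) in `hours`."""
    return rate * hours / 24.0


def hours_needed(rate, work):
    """Hours needed to finish `work` GHz-days at `rate` GHz-days per day."""
    return work / rate * 24.0


work = work_ghzd(K80_RATE, K80_HOURS)   # work implied by one 69M 74->75 assignment
t4_hours = hours_needed(T4_RATE, work)

print(f"work per assignment: roughly {work:.2f} GHz-days")
print(f"T4 time per assignment: roughly {t4_hours * 60:.0f} minutes")
```

Without checkpoints, the worst-case loss on a disconnect is one full assignment, so about three hours on the K80 against well under an hour on the T4.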
#50 | ||
If I May
"Chris Halsall"
Sep 2002
Barbados
10101101000010₂ Posts |
Quote:
Quote:
By EOD today I'll have the code implemented to be able to give back the assignment to complete. Importantly, the previous worker will be given back the assignment.
Edit: BTW, you can see the checkpoint file status by looking at your Assignments page. The percentage completed is calculated from the submitted checkpoint files.
Last fiddled with by chalsall on 2019-09-13 at 16:47
#51 |
May 2011
Orange Park, FL
2×3×151 Posts |
This morning my notebook disconnected in the middle of a run, and when I attempted to reconnect I got the message:
Code:
Failed to assign a backend
No backend with GPU available. Would you like to use a runtime with no accelerator?
#52 |
May 2011
Orange Park, FL
906₁₀ Posts |
An hour later I tried reloading the notebook and it is running again with a T4.
Last fiddled with by Chuck on 2019-09-14 at 14:17 Reason: 1 hour
#53 |
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
23×223 Posts |
Can I use two accounts from the same IP address?
#54 |
"Yves"
Jul 2017
Belgium
83 Posts |
#55 |
If I May
"Chris Halsall"
Sep 2002
Barbados
10101101000010₂ Posts |
Yup. I'm currently running two different accounts concurrently, from the same IP. And, in fact, from the same browser (different tabs).
I've found the GPU backend availability can vary considerably. Sometimes one account can get a GPU, while the other can't. Sometimes neither can, and sometimes both can. A small sample set suggests the GPUs are in high demand during "working hours" Eastern time, and then open up at around 1800 Eastern (2200 UTC).