#1 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2×4,703 Posts |
As some of you might know, several of us have been having great fun over in the Google Colaboratory Notebook? thread -- learning about Notebook instances...
Since the GPU72 integration is just a small "proof-of-concept" part of the bigger picture, I wanted to fork out further discussions specific to GPU72 here.

An update:

1. Everyone is now running version 0.32 of the Bootstrap payload.
1.1. This fixes a SPE with the regular expression (regex) which was (very rarely) extracting the GHzD/D and IntrT values incorrectly. (Always remember: regex patterns are "greedy"!)
2. I have reduced the recycling period from twelve (12) to three (3) hours.
2.1. So long as no-one sees anything odd, I should be able to bring this down to perhaps as low as 30 minutes.
2.2. The temporal delta value is calculated from Last Updated or Assigned, whichever is youngest.
3. I /might/ have an odd race condition going on in the TF'ing instances. I have seen extremely rare cases on some of my own machines where an assignment is given, but then not reported on. Further drill-down shows it doesn't appear in the worktodo files either.
3.1. The system will automatically recover from this if and whenever it happens, but I don't like things happening which I don't understand. It means I've made a mistake somewhere...

As always, observations welcomed...
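To illustrate the "greedy" remark in item 1.1, here is a minimal Python sketch of how a greedy `.*` can pull in too much text, and how a stricter character class avoids it. The status line format, the sample values, and both function names are assumptions made up for illustration; the real Bootstrap payload and its regex are not shown in this thread.

```python
import re

# Hypothetical status text containing two reports on one line.
# The real output format is not shown in the thread; this is an assumption.
LINE = "GHzD/D: 512.25, IntrT: 0.85 | GHzD/D: 498.10, IntrT: 0.91"


def extract_greedy(text):
    """Greedy '.*' runs to the LAST ', IntrT: ' it can find."""
    m = re.search(r"GHzD/D: (.*), IntrT: (.*)", text)
    return m.groups() if m else None


def extract_strict(text):
    """A digits-and-dot character class cannot run past the number."""
    m = re.search(r"GHzD/D: ([\d.]+), IntrT: ([\d.]+)", text)
    return m.groups() if m else None


print(extract_greedy(LINE))
print(extract_strict(LINE))
```

On a line with a single report both patterns agree, which is one way such a bug could stay hidden until an unusual line turns up.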
#2 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2²·7²·47 Posts |
#3 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2·4,703 Posts |
Quick update:
The race condition has finally been solved. Man that was stupid! A good example of how difficult debugging can be when you start introducing unusual but not unreasonable temporal harmonics.

I have now reduced the recycling period down to one (1) hour. So long as no one reports anything weird, I'll be able to bring this down to twenty (20) minutes.

Edit: Oh... And could I please ask that people try to post here for GPU72_TF Notebook stuff. I feel like I've dominated the Google Notebook thread with this; I'd like that to get back to general usage case-studies and examples.

Edit2: Perhaps a kind Super Mod could create a new sub-forum? Something like "Notebook Instances"? This subject space is just begging to fork into many, many threads!!!

Last fiddled with by chalsall on 2019-10-10 at 17:07
#4 |
May 2011
Orange Park, FL
2²·7·31 Posts |
After a day of Kaggle/Colab sessions, I often have a bunch of leftover checkpoint files from restarts etc.
I have manually transferred these to my local worktodo file and completed and reported the work with mfaktc. However, the Kaggle/Colab checkpoints still show up on my GPUto72 report even after they have been reported to Primenet and are not on my GPUto72 assignments list. This leads to the checkpoint files being picked up by a later Colab run. It restarts work that I have already reported. If there were checkboxes to delete checkpoint files this could be eliminated. |
#5 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
21774₈ Posts |
Why not just let the system finish them next time you get going? That is what I do.
#6 |
Romulan Interpreter
Jun 2011
Thailand
2·19·241 Posts |
Chris, please do NOT expire/recycle assignments that have already been started for at least 4 or 5 days! One hour is too short.
We have limited account time per week, which we use up in the first 2-3 days. We then have to wait until the next week to resume an interrupted assignment; if it has been recycled by then, the work is lost.
#7 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
10010010111110₂ Posts |
Could you please PM me a couple of candidate examples where this happened? |
#8 |
If I May
"Chris Halsall"
Sep 2002
Barbados
24BE₁₆ Posts |
To be clear... candidates which have had work done will *not* be expired. They are held until another one of *your* instances asks for more work -- however long that may be.
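The recycling rule as described in this thread (a delta measured from Last Updated or Assigned, whichever is more recent, with started candidates never expired) could be sketched as below. This is only an illustration under stated assumptions: the `Assignment` class, its field names, and `is_recyclable` are hypothetical, and the one-hour default is simply the period mentioned earlier in the thread, not the server's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# One hour is the period mentioned in the thread; treat it as a tunable.
RECYCLE_PERIOD = timedelta(hours=1)


@dataclass
class Assignment:
    """Hypothetical record; field names are assumptions."""
    assigned: datetime
    last_updated: Optional[datetime]  # None if never reported on
    work_done: bool                   # True once any progress was reported


def is_recyclable(a: Assignment, now: datetime,
                  period: timedelta = RECYCLE_PERIOD) -> bool:
    # Candidates with work done are held for the same user, never expired.
    if a.work_done:
        return False
    # Measure the delta from whichever timestamp is more recent.
    latest = a.assigned
    if a.last_updated is not None and a.last_updated > latest:
        latest = a.last_updated
    return now - latest > period
```

Under this sketch an interrupted assignment with a checkpoint stays reserved indefinitely, while one that was handed out but never touched goes back into the pool after the period elapses.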
#9 |
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
9212₁₀ Posts |
I just had 2 instances of trying to spin up a session and it goes through the bootstrap and then exits.
I will be AFT for ~6 hours, sorry. |
#10 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
2×4,703 Posts |
What is happening is the Comms spider is getting work from the server, but the connection is timing out as far as the spider is concerned. I don't understand why, but it's happening on all my instances (including two simulated ones which run 24/7). I'm in the process of applying updates and will be rebooting in about ten minutes. If things go well it should be back about two minutes after that. If things don't go well, it will be a bit longer... |
#11 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
10010010111110₂ Posts |
I was still seeing the timeout error, so I dug down on the assignment code. Turns out there was a somewhat expensive SQL statement that was pushing the execution time *just* over the spider's time-out period. Assignments should be working again for everyone. I'll drill down on optimizing that query, and/or just throw it into the background to process in parallel. As always, please let me know if anyone sees anything else strange.
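The "throw it into the background" option might look roughly like the Python sketch below: the reply goes back to the client right away, and the slow step runs on a worker thread. Everything here is an assumption for illustration. `slow_bookkeeping` stands in for the expensive SQL statement using a sleep, and `hand_out_assignment` is a hypothetical handler, not the actual GPU72 server code.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Tuple

# Single worker so background jobs run one at a time, in submission order.
_background = ThreadPoolExecutor(max_workers=1)


def slow_bookkeeping(candidate_id: int, delay: float = 0.5) -> int:
    """Stand-in for the expensive query; sleeps instead of hitting a DB."""
    time.sleep(delay)
    return candidate_id


def hand_out_assignment(candidate_id: int) -> Tuple[str, Future]:
    """Reply immediately; defer the slow work to the background pool."""
    future = _background.submit(slow_bookkeeping, candidate_id)
    return f"assigned {candidate_id}", future


reply, pending = hand_out_assignment(42)
print(reply)             # available before the slow step has finished
print(pending.result())  # blocks until the background job completes
```

This only helps if the client's reply does not depend on the result of the slow statement; if it does, the query itself has to be made faster.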