mersenneforum.org GPU to 72 status...

2020-03-04, 12:05   #4698
bayanne

"Tony Gott"
Aug 2002
Yell, Shetland, UK

477₈ Posts

Opting to work on CPU tasks as well has meant that I am now getting GPU instances less frequently. I am wondering whether this is being somewhat counterproductive ...

Last fiddled with by bayanne on 2020-03-04 at 12:06
2020-03-04, 13:06   #4699
chalsall
If I May

"Chris Halsall"
Sep 2002

10010100000010₂ Posts

Quote:
 Originally Posted by bayanne Opting to work on CPU tasks as well has meant that I am now getting GPU instances less frequently. I am wondering whether this is being somewhat counterproductive ...
While it is impossible to guess at what Google's algorithms are weighting, I don't /think/ so. More likely what we're observing is the ebb-and-flow of demand vs. availability.

But to test your theory, change your CPU Worktype to "Disabled" and the CPU won't be used (the CPU payload provided is just a sleep(forever) call in such cases).
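
For illustration, a minimal Python sketch of what such a "Disabled" worktype path could look like; the function names and dispatch are assumptions, not GPU72's actual code:
Code:
import time

def start_cpu_work(worktype):
    # Stub standing in for the real CPU work launcher.
    print(f"Starting CPU work of type {worktype}...")

def run_cpu_payload(worktype):
    if worktype == "Disabled":
        # No CPU work requested: park the thread forever, so the
        # session stays alive without burning CPU cycles.
        while True:
            time.sleep(3600)
    start_cpu_work(worktype)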

2020-03-04, 14:56   #4700
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

5·653 Posts

Does this make sense?
Code:
Beginning GPU Trial Factoring Environment Bootstrapping...

Please see https://www.gpu72.com/ for additional details.

20200304_145207: GPU72 TF V0.42 Bootstrap starting (now with CPU support!)...
20200304_145207: Working as "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...
20200304_145207: Installing needed packages
20200304_145213: Fetching initial work...
20200304_145213: Running GPU type Tesla K80
20200304_145214: running a simple selftest...
20200304_145218: Selftest statistics
20200304_145218:   number of tests           107
20200304_145219:   successfull tests         107
20200304_145219: selftest PASSED!
20200304_145219: Bootstrap finished. Exiting.
It has a GPU, but it's not doing any TF... not sure if it's doing P-1 but I don't see any comment about that either.

My other instance I started at the same time is also borked:
Code:
Beginning GPU Trial Factoring Environment Bootstrapping...

Please see https://www.gpu72.com/ for additional details.

20200304_145330: GPU72 TF V0.42 Bootstrap starting (now with CPU support!)...
20200304_145330: Working as "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...
20200304_145330: Installing needed packages
20200304_145335: Fetching initial work...
20200304_145336: Running GPU type Tesla T4
20200304_145336: running a simple selftest...
20200304_145340: Selftest statistics
20200304_145340:   number of tests           107
20200304_145340:   successfull tests         107
20200304_145340: selftest PASSED!
20200304_145340: Bootstrap finished. Exiting.

Last fiddled with by James Heinrich on 2020-03-04 at 14:57
2020-03-04, 15:07   #4701
chalsall
If I May

"Chris Halsall"
Sep 2002

2·3·1,579 Posts

Quote:
 Originally Posted by James Heinrich Does this make sense?
No!!! Grrr...

Please try rerunning the Sections. According to the DB you /were/ issued work.

Edit: Actually, one of your three instances was issued TF work; the other two were only issued P-1 work. This shouldn't happen.

Working theory: a DNS lookup failure could explain this. I'll add a check to the Comms script module to retry if it doesn't successfully get the first batch of work. But simply rerunning your failed sections should fix the issue right now.
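
A sketch of the kind of retry wrapper described, assuming a plain HTTP fetch; the helper name and backoff values are made up for illustration:
Code:
import time
import urllib.request

def fetch_first_batch(url, attempts=5):
    """Retry the initial work fetch, backing off between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read()
        except OSError:  # includes DNS failures (socket.gaierror)
            if attempt == attempts:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying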

Last fiddled with by chalsall on 2020-03-04 at 15:15

2020-03-04, 15:38   #4702
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

5·653 Posts

Quote:
 Originally Posted by chalsall But simply rerunning your failed sections should fix the issue right now.
I restarted both. The one seems normal:
Code:
20200304_153523:  Exponent  TF Level  % Done     ETA   GHzD/D  Itr Time |   Class #,   Seq # |    #FCs | SieveRate |  SieveP | Uptime
20200304_153538: 100180889 75 to 76    0.1%   1h39m  1102.38    6.236s |    0/4620,   1/960 |  40.81G | 6544.7M/s |   82485 |   0:02
20200304_153538: 100969277 P-1    77   0.00%  Stage: 1
20200304_153643: 100180889 75 to 76    1.5%   1h37m  1110.57    6.190s |   52/4620,  14/960 |  40.81G | 6593.3M/s |   82485 |   0:04
The other seems to have a lot more P-1 lines than I expect right on init:
Code:
20200304_153537: Installing needed packages
20200304_153554: Fetching initial work...
20200304_153556: Running GPU type Tesla T4

20200304_153556: running a simple selftest...
20200304_153605: Selftest statistics
20200304_153605:   number of tests           107
20200304_153605:   successfull tests         107
20200304_153605: selftest PASSED!
20200304_153605: Starting trial factoring M106899509 from 2^75 to 2^76 (71.58 GHz-days)

20200304_153605:  Exponent  TF Level  % Done     ETA   GHzD/D  Itr Time |   Class #,   Seq # |    #FCs | SieveRate |  SieveP | Uptime
20200304_153620: 106899509 75 to 76    0.1%  55m42s  1848.60    3.485s |    0/4620,   1/960 |  38.25G | 10974.9M/s |   82485 |   0:45
20200304_153620: 100968493 P-1    77   0.00%  Stage: 1
20200304_153620: 100968493 P-1    77   2.64%  Stage: 1
20200304_153620: 100968493 P-1    77   5.29%  Stage: 1
20200304_153620: 100968493 P-1    77   7.94%  Stage: 1
20200304_153620: 100968493 P-1    77   10.59%  Stage: 1
20200304_153620: 100969361 P-1    77   0.00%  Stage: 1
20200304_153722: 106899509 75 to 76    2.4%  55m52s  1801.06    3.577s |  111/4620,  23/960 |  38.25G | 10692.6M/s |   82485 |   0:47

2020-03-04, 16:12   #4703
chalsall
If I May

"Chris Halsall"
Sep 2002

2×3×1,579 Posts

Quote:
 Originally Posted by James Heinrich I restarted both. The one seems normal: ... The other seems to have a lot more P-1 lines than I expect right on init:
Thanks for the data...

What is happening here is you now have two P-1 jobs running in parallel. An interesting edge case. There's nothing we can do about this, but it shows me some deltas I need to make to the payloads to handle these kinds of rare (but not impossible) edge cases.
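
One way a payload could refuse to start a second P-1 worker on the same instance is an exclusive lock file; a Linux-specific sketch with invented names, not the actual fix being planned:
Code:
import fcntl
import sys

def acquire_p1_lock(path="/tmp/gpu72_p1.lock"):
    """Exit if another P-1 job already holds the lock on this instance."""
    lock = open(path, "w")
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("A P-1 job is already running here; exiting.")
    return lock  # keep the handle open for the lifetime of the job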

Somewhat amusingly, yesterday Chuck had a similar situation. Even though he did a Factory Reset, two of his instances continued uploading debugging information for 24 hours; ~0.6 GB/minute... My /var/ was not amused...

2020-03-04, 16:46   #4704
PhilF

Feb 2005

2·5·59 Posts

Quote:
 Originally Posted by chalsall two of his instances continued uploading debugging information for 24 hours; ~0.6 GB/minute... My /var/ was not amused...
Maybe that's a new tactic they're employing to try to drive us away, lol

2020-03-04, 17:36   #4705
Chuck

May 2011
Orange Park, FL

3²×97 Posts

Quote:
 Originally Posted by chalsall Thanks for the data... What is happening here is you now have two P-1 jobs running in parallel. An interesting edge case. There's nothing we can do about this, but it shows me some deltas I need to make to the payloads to handle these kinds of rare (but not impossible) edge cases. Somewhat amusingly, yesterday Chuck had a similar situation. Even though he did a Factory Reset, two of his instances continued uploading debugging information for 24 hours; ~0.6 GB/minute... My /var/ was not amused...
Did I do something wrong that enabled this debug uploading?

2020-03-04, 17:43   #4706
chalsall
If I May

"Chris Halsall"
Sep 2002

10010100000010₂ Posts

Quote:
 Originally Posted by Chuck Did I do something wrong that enabled this debug uploading?
No... I did.

All you did was stop and restart your Sections. Perfectly reasonable.

But then I had made an assumption that was incorrect, and my code started misbehaving. Then I had the code send back debugging information in such situations, not realizing just how large the data would be nor how often it would be sent...
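
For what it's worth, a guard like the following, capping both the size and the frequency of reports, would have contained the flood; the values and the upload helper are illustrative only:
Code:
import time

MAX_BYTES_PER_REPORT = 1 << 20   # cap each report at 1 MiB
MIN_SECONDS_BETWEEN = 600        # at most one report every 10 minutes
_last_sent = 0.0

def upload(blob):
    # Stub standing in for the real debug uploader.
    print(f"Uploading {len(blob)} bytes of debug data...")

def maybe_send_debug(blob):
    """Send debug data only if enough time has passed, and never too much."""
    global _last_sent
    now = time.time()
    if now - _last_sent < MIN_SECONDS_BETWEEN:
        return  # too soon since the last report; drop it
    _last_sent = now
    upload(blob[:MAX_BYTES_PER_REPORT])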

A classic SPE, working in an environment where it's a "you'd better get this correct, because you have no control over it once it's running" situation.

And then, of course, not getting it correct... DWIM!!!

2020-03-04, 17:52   #4707
Chuck

May 2011
Orange Park, FL

1101101001₂ Posts

Quote:
 Originally Posted by chalsall No... I did. All you did was stop and restart your Sections. Perfectly reasonable. But then I had made an assumption that was incorrect, and my code started misbehaving. Then I had the code send back debugging information in such situations, not realizing just how large the data would be nor how often it would be sent... A classic SPE, working in an environment where it's a "you'd better get this correct, because you have no control over it once it's running" situation. And then, of course, not getting it correct... DWIM!!!
Sometimes when I restart a session, the little spinning indicator in the upper-left corner scrolls up out of sight when I scroll the window to the bottom of the screen. When this happens, I've found that scrolling back to the top of the window and re-clicking "Default" on the logging level corrects the problem, and the indicator no longer scrolls off the top.

I thought I might have accidentally selected "Verbose".

2020-03-04, 21:04   #4708
Chuck

May 2011
Orange Park, FL

3²×97 Posts

Colab restarts

My sessions expired after 24 hours as usual. There were three sessions and as I restarted each, it went through the bootstrap process and exited immediately. I restarted the three sessions again and they then ran normally. Evidently I picked up an extra P-1 assignment as each session is displaying two different P-1 progress lines starting at 0.00%.
