mersenneforum.org  

Great Internet Mersenne Prime Search > PrimeNet > GPU to 72

2020-03-04, 12:05   #4698
bayanne
"Tony Gott"
Aug 2002
Yell, Shetland, UK
4778 Posts

Opting to work on CPU tasks as well has meant that I am now getting GPU instances less frequently.
I am wondering whether this is somewhat counterproductive ...

Last fiddled with by bayanne on 2020-03-04 at 12:06
bayanne is offline   Reply With Quote
2020-03-04, 13:06   #4699
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
10010100000010₂ Posts

Quote:
Originally Posted by bayanne
Opting to work on CPU tasks as well has meant that I am now getting GPU instances less frequently. I am wondering whether this is somewhat counterproductive ...
While it is impossible to guess at what Google's algorithms are weighting, I don't /think/ so. More likely what we're observing is the ebb-and-flow of demand vs. availability.

But to test your theory, change your CPU Worktype to "Disabled" and the CPU won't be used (the CPU payload provided is just a sleep(forever) call in such cases).
2020-03-04, 14:56   #4700
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
5·653 Posts

Does this make sense?
Code:
Beginning GPU Trial Factoring Environment Bootstrapping...
Please see https://www.gpu72.com/ for additional details.

20200304_145207: GPU72 TF V0.42 Bootstrap starting (now with CPU support!)...
20200304_145207: Working as "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...

20200304_145207: Installing needed packages
20200304_145213: Fetching initial work...
20200304_145213: Running GPU type Tesla K80

20200304_145214: running a simple selftest...
20200304_145218: Selftest statistics
20200304_145218:   number of tests           107
20200304_145219:   successfull tests         107
20200304_145219: selftest PASSED!
20200304_145219: Bootstrap finished.  Exiting.
It has a GPU, but it's not doing any TF... not sure if it's doing P-1 but I don't see any comment about that either.

My other instance, which I started at the same time, is also borked:
Code:
Beginning GPU Trial Factoring Environment Bootstrapping...
Please see https://www.gpu72.com/ for additional details.

20200304_145330: GPU72 TF V0.42 Bootstrap starting (now with CPU support!)...
20200304_145330: Working as "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"...

20200304_145330: Installing needed packages
20200304_145335: Fetching initial work...
20200304_145336: Running GPU type Tesla T4

20200304_145336: running a simple selftest...
20200304_145340: Selftest statistics
20200304_145340:   number of tests           107
20200304_145340:   successfull tests         107
20200304_145340: selftest PASSED!
20200304_145340: Bootstrap finished.  Exiting.

Last fiddled with by James Heinrich on 2020-03-04 at 14:57
2020-03-04, 15:07   #4701
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2·3·1,579 Posts

Quote:
Originally Posted by James Heinrich
Does this make sense?
No!!! Grrr...

Please try rerunning the Sections. According to the DB you /were/ issued work.

Edit: Actually, one of your three instances was issued TF work; the other two were only issued P-1 work. This shouldn't happen.

Working theory: a DNS lookup failure could explain this. I'll add a check to the Comms script module to retry if it doesn't successfully get the first batch of work. But simply rerunning your failed sections should fix the issue right now.
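A minimal Python sketch of the retry described above, assuming the first-batch request can be wrapped in a callable. `fetch_initial_work` and the demo fetcher are hypothetical names, not the actual GPU72 Comms module; the idea is only that a DNS failure (`socket.gaierror`) or an empty batch leads to a backoff-and-retry instead of the bootstrap exiting with no work.

```python
import socket
import time
from typing import Callable, List, Optional


def fetch_initial_work(
    fetch: Callable[[], List[str]],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch() until it returns a non-empty batch of assignments.

    A DNS failure (socket.gaierror, a subclass of OSError), any other
    OSError, or an empty batch is treated as transient and retried with
    exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
    Raises RuntimeError if every attempt fails.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            work = fetch()
            if work:
                return work
            last_error = RuntimeError("server returned an empty batch")
        except OSError as exc:
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"no work after {attempts} attempts") from last_error


if __name__ == "__main__":
    failures_left = [2]

    def flaky_fetch() -> List[str]:
        # Hypothetical fetcher: fails DNS resolution twice, then succeeds.
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise socket.gaierror("Temporary failure in name resolution")
        return ["hypothetical TF assignment"]

    print(fetch_initial_work(flaky_fetch, sleep=lambda s: None))
```

Injecting `sleep` keeps the backoff testable without real delays.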

Last fiddled with by chalsall on 2020-03-04 at 15:15
2020-03-04, 15:38   #4702
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
5·653 Posts

Quote:
Originally Posted by chalsall
But simply rerunning your failed sections should fix the issue right now.
I restarted both. The one seems normal:
Code:
20200304_153523:  Exponent  TF Level  % Done     ETA   GHzD/D  Itr Time |   Class #,   Seq # |    #FCs | SieveRate |  SieveP | Uptime
20200304_153538: 100180889 75 to 76    0.1%   1h39m  1102.38    6.236s |    0/4620,   1/960 |  40.81G | 6544.7M/s |   82485 |   0:02
20200304_153538: 100969277 P-1    77   0.00%  Stage: 1
20200304_153643: 100180889 75 to 76    1.5%   1h37m  1110.57    6.190s |   52/4620,  14/960 |  40.81G | 6593.3M/s |   82485 |   0:04
The other seems to have a lot more P-1 lines than I expect right on init:
Code:
20200304_153537: Installing needed packages
20200304_153554: Fetching initial work...
20200304_153556: Running GPU type Tesla T4

20200304_153556: running a simple selftest...
20200304_153605: Selftest statistics
20200304_153605:   number of tests           107
20200304_153605:   successfull tests         107
20200304_153605: selftest PASSED!
20200304_153605: Starting trial factoring M106899509 from 2^75 to 2^76 (71.58 GHz-days)

20200304_153605:  Exponent  TF Level  % Done     ETA   GHzD/D  Itr Time |   Class #,   Seq # |    #FCs | SieveRate |  SieveP | Uptime
20200304_153620: 106899509 75 to 76    0.1%  55m42s  1848.60    3.485s |    0/4620,   1/960 |  38.25G | 10974.9M/s |   82485 |   0:45
20200304_153620: 100968493 P-1    77   0.00%  Stage: 1
20200304_153620: 100968493 P-1    77   2.64%  Stage: 1
20200304_153620: 100968493 P-1    77   5.29%  Stage: 1
20200304_153620: 100968493 P-1    77   7.94%  Stage: 1
20200304_153620: 100968493 P-1    77   10.59%  Stage: 1
20200304_153620: 100969361 P-1    77   0.00%  Stage: 1
20200304_153722: 106899509 75 to 76    2.4%  55m52s  1801.06    3.577s |  111/4620,  23/960 |  38.25G | 10692.6M/s |   82485 |   0:47
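The extra lines above come from two different P-1 exponents (100968493 and 100969361) reporting at once. Here is a hedged sketch of how a payload could spot this from its own output. The regular expression is inferred from these log lines only, so it is an assumption about the format rather than a documented one, and `p1_exponents` is a hypothetical helper.

```python
import re
from typing import Iterable, Set

# Matches P-1 progress lines in the format shown in the log above, e.g.
# "20200304_153620: 100968493 P-1    77   2.64%  Stage: 1"
P1_PROGRESS = re.compile(r"^\d{8}_\d{6}: (\d+) P-1\s+\d+\s+[\d.]+%\s+Stage: \d+")


def p1_exponents(lines: Iterable[str]) -> Set[int]:
    """Return the distinct exponents that have P-1 progress lines."""
    exponents = set()
    for line in lines:
        match = P1_PROGRESS.match(line.strip())
        if match:
            exponents.add(int(match.group(1)))
    return exponents


if __name__ == "__main__":
    # Excerpt of the log above (the TF line is truncated).
    log = [
        "20200304_153620: 106899509 75 to 76    0.1%  55m42s  1848.60    3.485s",
        "20200304_153620: 100968493 P-1    77   0.00%  Stage: 1",
        "20200304_153620: 100968493 P-1    77   10.59%  Stage: 1",
        "20200304_153620: 100969361 P-1    77   0.00%  Stage: 1",
    ]
    running = p1_exponents(log)
    if len(running) > 1:
        print("warning: more than one P-1 job reporting:", sorted(running))
```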
2020-03-04, 16:12   #4703
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2×3×1,579 Posts

Quote:
Originally Posted by James Heinrich
I restarted both. The one seems normal: ... The other seems to have a lot more P-1 lines than I expect right on init:
Thanks for the data...

What is happening here is you now have two P-1 jobs running in parallel. An interesting edge case. There's nothing we can do about this, but it shows me some deltas I need to make to the payloads to handle these kinds of rare (but not impossible) edge cases.

Somewhat amusingly, yesterday Chuck had a similar situation. Even though he did a Factory Reset, two of his instances continued uploading debugging information for 24 hours; ~0.6 GB/minute... My /var/ was not amused...
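A common way to stop a re-run Section from launching a second P-1 job alongside one that is still running is an exclusive `flock()` lock. The following is a sketch under assumptions: a Linux VM (which Colab instances are), a hypothetical lock-file path and a hypothetical helper name. It is not what GPU72's payloads actually do.

```python
import fcntl
import os
import tempfile
from typing import Optional


def try_acquire_single_instance(lock_path: str) -> Optional[int]:
    """Try to take an exclusive, non-blocking flock() on lock_path.

    Returns the open file descriptor (keep it open for as long as the job
    runs) or None if another holder already has the lock. The kernel
    releases the lock when the descriptor is closed or the process exits,
    so a killed worker cannot leave a stale lock behind.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Record the holder's PID purely as a debugging aid.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "p1_worker.lock")  # hypothetical path
    first = try_acquire_single_instance(path)
    second = try_acquire_single_instance(path)  # e.g. a re-run Section
    print("first job started:", first is not None)
    print("second job started:", second is not None)
    if first is not None:
        os.close(first)
```

`flock()` locks belong to the open file description, so even two `os.open()` calls in the same process conflict. That is why the demo can show the second attempt being refused.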
2020-03-04, 16:46   #4704
PhilF
Feb 2005
Colorado
2·5·59 Posts

Quote:
Originally Posted by chalsall
two of his instances continued uploading debugging information for 24 hours; ~0.6 GB/minute... My /var/ was not amused...
Maybe that's a new tactic they are employing in order to try to drive us away, lol
2020-03-04, 17:36   #4705
Chuck
May 2011
Orange Park, FL
3²×97 Posts

Quote:
Originally Posted by chalsall
Somewhat amusingly, yesterday Chuck had a similar situation. Even though he did a Factory Reset, two of his instances continued uploading debugging information for 24 hours; ~0.6 GB/minute... My /var/ was not amused...
Did I do something wrong that enabled this debug uploading?
2020-03-04, 17:43   #4706
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
10010100000010₂ Posts

Quote:
Originally Posted by Chuck
Did I do something wrong that enabled this debug uploading?
No... I did.

All you did was stop and restart your Sections. Perfectly reasonable.

But then I had made an assumption that was incorrect, and my code started misbehaving. Then I had the code send back debugging information in such situations, not realizing just how large the data would be nor how often it would be sent...

A classic SPE, working in an environment where it's a "you'd better get this correct, because you have no control over it once it's running" situation.

And then, of course, not getting it correct... DWIM!!!
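At ~0.6 GB/minute, 24 hours of uploads comes to about 0.6 × 1,440 ≈ 864 GB, which is why a hard cap on debug traffic is worth having. Below is a minimal sketch of a per-instance byte budget using a simple fixed time window. The class name and default limits are hypothetical, chosen only for illustration.

```python
import time
from typing import Callable, Optional


class DebugUploadBudget:
    """Cap debug uploads at max_bytes per window_seconds (fixed window).

    Each payload is also truncated to max_payload bytes before it is
    counted. The defaults are illustrative, not values used by GPU72.
    """

    def __init__(
        self,
        max_bytes: int = 1_000_000,
        window_seconds: float = 3600.0,
        max_payload: int = 64_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_bytes = max_bytes
        self.window_seconds = window_seconds
        self.max_payload = max_payload
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def admit(self, payload: bytes) -> Optional[bytes]:
        """Return the (possibly truncated) payload to send, or None to drop it."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # A new window has started: reset the byte counter.
            self.window_start = now
            self.used = 0
        chunk = payload[: self.max_payload]
        if self.used + len(chunk) > self.max_bytes:
            return None  # over budget for this window; drop rather than send
        self.used += len(chunk)
        return chunk


if __name__ == "__main__":
    budget = DebugUploadBudget()
    sent = 0
    for _ in range(1000):
        chunk = budget.admit(b"x" * 600_000)  # a hypothetical oversized dump
        if chunk is not None:
            sent += len(chunk)
    print("bytes actually sent:", sent)
```

The clock is injectable so the window logic can be checked without waiting an hour.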
2020-03-04, 17:52   #4707
Chuck
May 2011
Orange Park, FL
1101101001₂ Posts

Quote:
Originally Posted by chalsall
No... I did.

All you did was stop and restart your Sections. Perfectly reasonable.
Sometimes when I restart a session, the little spinning indicator in the upper left corner scrolls up out of sight when I scroll the window to the bottom. When this happens, I've found that scrolling back to the top of the window and re-clicking "Default" for the logging level fixes it, and the indicator no longer scrolls off the top.

I thought I might have accidentally selected "Verbose".
2020-03-04, 21:04   #4708
Chuck
May 2011
Orange Park, FL
3²×97 Posts
Colab restarts

My sessions expired after 24 hours as usual. There were three sessions and as I restarted each, it went through the bootstrap process and exited immediately. I restarted the three sessions again and they then ran normally.

Evidently I picked up an extra P-1 assignment as each session is displaying two different P-1 progress lines starting at 0.00%.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.