#4797
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
1254₁₆ Posts
I can get two sessions each day, but only once, late in the evening.

Otherwise, as soon as I start one session it won't let me start another. Interestingly, for this one session I can start the GPU72 session without first starting the "tunnel" session. It still gives the message "No GPU available", but lets the CPU code run the P-1.
#4798
If I May
"Chris Halsall"
Sep 2002
Barbados
2627₁₆ Posts
You keep mentioning the "Tunnel" session. Are you running an Instance Root reverse-tunnel Section? It's not needed (but fun for the pretty graphs and other data).
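(For anyone unfamiliar: a reverse tunnel lets the instance, which cannot accept inbound connections, dial out to a server and expose its local sshd back through that connection. A rough sketch of the idea only, with the host, user, and ports entirely made up:)

[code]
# Rough sketch of a reverse tunnel: the Colab instance dials out and asks
# the remote server to forward one of its ports back to the instance's
# local sshd. Host, user, and ports here are made-up placeholders.
import subprocess

subprocess.Popen([
    "ssh", "-N",                   # no remote command, tunnel only
    "-R", "2222:localhost:22",     # remote port 2222 -> instance's sshd
    "tunnel@monitor.example.org",  # placeholder endpoint
])
[/code]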
#4799
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2²×3×17×23 Posts
Quote:
1. Start tunnels: sshd.pl
2. Run bootstrap.pl

If step 1 is no longer required, why can I not get a GPU without it?
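(To make the two quoted steps concrete, here is roughly what the corresponding notebook cells amount to; only the script names come from the quote, and the paths and invocations are guesses:)

[code]
# The two steps quoted above, as a Colab cell might run them. Only the
# script names come from the thread; the perl invocations are guesses.
import subprocess

subprocess.Popen(["perl", "sshd.pl"])                 # step 1: start tunnels
subprocess.run(["perl", "bootstrap.pl"], check=True)  # step 2: get work
[/code]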
#4800
"James Heinrich"
May 2004
ex-Northern Ontario
11·311 Posts |
Quote:
#4801
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts |
Quote:
I have no idea why you're noticing that correlation, but it shouldn't be causal. Once you're given a Session (read: connected to a Backend), you'll have a GPU, or you won't. Running an SSH Section won't magically attach you to a GPU.
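(One quick way to see which case you landed in, from inside the notebook itself; nvidia-smi is standard NVIDIA tooling, not GPU72-specific:)

[code]
# Check whether the backend Colab handed you actually has a GPU.
# "nvidia-smi -L" lists the attached GPUs, if any.
import subprocess

try:
    out = subprocess.run(["nvidia-smi", "-L"],
                         capture_output=True, text=True)
    if out.returncode == 0 and out.stdout.strip():
        print("GPU attached:", out.stdout.strip())
    else:
        print("No GPU on this backend; only the CPU work will run.")
except FileNotFoundError:
    print("No NVIDIA driver present; this backend is CPU-only.")
[/code]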
#4802
Romulan Interpreter
Jun 2011
Thailand
10010110110110₂ Posts
Hey Chris, I just upgraded to the new Colab script yesterday, the one which uses the CPU too, and there seems to be a bug with reporting results for the CPU.

First, I got a P-1 starting at 43% of Stage 1 (??). As I hadn't done any P-1 before (this is a new "notebook", with the ID starting with "b535..."), I assumed that you save the intermediate (full) residues from time to time, just in case Colab decides to kick someone's ass unexpectedly, and then you resume next time. But passing me the other guy's work (I assume you do it vice versa too?) is wrong, somehow, because assuming I can finish it, I would get the credit for it, therefore robbing the person who did the first 43% of the work. You should keep a record and assign the continuation of the work only to the user who did the first part of it. Not that I complain too much about the free resources Google gives us...

Secondly, Colab indeed kicked me off before I succeeded in finishing Stage 1 of that P-1 (the last time at almost 98%). When I resumed (starting a new session) I got the same exponent, but... starting at 43%. I am already doing this for the third time.

"102986021 P-1 77 46.23% Stage: 1 complete."

(The column is confusing there; it looks like Stage 1 is complete, but it is not. The message means that 46% of Stage 1 is complete; you would do better to display it as "Stage 1 complete: 46.xx%". But this is minor. My pain in the butt right now is repeating the same work over and over, with no progress. Am I doing something wrong? Do I need to use some "persistent" storage/drive on my side of Colab/Google Drive/whatever?)

Last fiddled with by LaurV on 2020-03-17 at 04:13
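(LaurV's display suggestion amounts to keeping the stage number and its percentage together. A trivial sketch, where the field names are assumptions and only the values come from the quoted status line:)

[code]
# Ambiguous column:  "102986021 P-1 77 46.23% Stage: 1 complete."
# LaurV's suggestion keeps the stage and its percentage together.
exponent, work, bits, pct, stage = 102986021, "P-1", 77, 46.23, 1
print(f"{exponent} {work} {bits} Stage {stage} complete: {pct:.2f}%")
# -> 102986021 P-1 77 Stage 1 complete: 46.23%
[/code]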
#4803
Romulan Interpreter
Jun 2011
Thailand
2·3·1,609 Posts |
OK, today it seems I got a better CPU (?!?), because after 5 hours it finished Stage 1, ran an unsuccessful GCD, and moved to Stage 2, which is now ~5.5% done. If the instance is killed at 10 hours as expected (or before), it is clear that it won't finish and report in time.

I just backed up the checkpoint files; in case it crashes I will finish it locally, to avoid doing the same work over and over. What's plan B? (You see, we didn't really keep in touch with the new "inventions" you made there, and most probably we are doing something wrong...)

Last fiddled with by LaurV on 2020-03-17 at 06:49
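(For context, the "unsuccessful GCD" is the standard end of P-1 Stage 1: raise a base to a highly smooth exponent E, the product of all prime powers up to the bound B1, modulo N, then take gcd(base^E − 1, N); a result of 1 means no factor yet, and Stage 2 begins. A toy sketch, with a tiny illustrative bound nothing like the notebook's real parameters:)

[code]
# Toy P-1 Stage 1 on N = 2^p - 1, to show what the "unsuccessful GCD"
# step is. B1 and the base 3 are illustrative; real runs use far larger
# bounds and FFT-based arithmetic.
from math import gcd

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
    return [i for i in range(2, n + 1) if sieve[i]]

def p_minus_1_stage1(p, B1):
    """Return gcd(3^E - 1, N), where E is the product of all prime
    powers <= B1; gcd == 1 is the "unsuccessful" case -> Stage 2."""
    N = (1 << p) - 1
    a = 3
    for q in primes_up_to(B1):
        e = q
        while e * q <= B1:  # largest power of q not exceeding B1
            e *= q
        a = pow(a, e, N)
    return gcd(a - 1, N)

# M29 = 233 * 1103 * 2089, and 29 divides q - 1 for every factor q, so
# B1 = 28 misses them all while B1 = 29 sweeps all three up at once.
print(p_minus_1_stage1(29, 28))  # 1 -> "unsuccessful GCD", go to Stage 2
print(p_minus_1_stage1(29, 29))  # 536870911 = N itself (all factors hit)
[/code]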
#4804
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts |
Quote:
But this should all be sane; many people are using it successfully, including my seven instances running the exact same code as everyone else.

To be clear... The P-1 checkpoint files should be thrown back to the server every ten minutes during the entire run(s). If an instance dies, the last checkpoint is sent out to the next requested instance (that you own, of course).

If you PM me the exponent in question, I can examine the logs and the checkpoint files themselves.
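(A minimal sketch of the ten-minute round-trip Chris describes, assuming a hypothetical upload endpoint, file pattern, and form fields; the actual GPU72 transport isn't shown in this thread:)

[code]
# Sketch of the checkpoint push described above. The URL, file pattern,
# and form fields are invented placeholders, not the real GPU72 code.
import glob, time, requests

UPLOAD_URL = "https://gpu72.example.org/checkpoint"  # placeholder

def push_checkpoints(instance_id):
    while True:
        for path in glob.glob("*.ckpt"):  # placeholder naming
            with open(path, "rb") as fh:
                requests.post(UPLOAD_URL, files={"file": fh},
                              data={"instance": instance_id})
        time.sleep(600)  # every ten minutes, per the post above
[/code]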
#4805
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2×4,909 Posts |
#4806
Romulan Interpreter
Jun 2011
Thailand
2·3·1,609 Posts |
Quote:
Edit: We manually stopped and restarted everything after some time, because we were not satisfied with the K80 we got for TF, and the P-1 Stage 2 resumed normally (61%). We are good here.

Last fiddled with by LaurV on 2020-03-18 at 05:06
#4807
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2·4,909 Posts |
MOD NOTE: BOINC related posts moved here:
https://www.mersenneforum.org/showthread.php?t=25383
Similar Threads

Thread | Thread Starter | Forum | Replies | Last Post
Status | Primeinator | Operation Billion Digits | 5 | 2011-12-06 02:35
62 bit status | 1997rj7 | Lone Mersenne Hunters | 27 | 2008-09-29 13:52
OBD Status | Uncwilly | Operation Billion Digits | 22 | 2005-10-25 14:05
1-2M LLR status | paulunderwood | 3*2^n-1 Search | 2 | 2005-03-13 17:03
Status of 26.0M - 26.5M | 1997rj7 | Lone Mersenne Hunters | 25 | 2004-06-18 16:46