mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU to 72 (https://www.mersenneforum.org/forumdisplay.php?f=95)
-   -   GPU to 72 status... (https://www.mersenneforum.org/showthread.php?t=16263)

EugenioBruno 2020-03-10 18:15

[QUOTE=chalsall;539295]
Basically, we have determined that it is "optimal" to TF to 77 bits before running the First Check.
[/QUOTE]

Considering the fact that primenet's "what makes most sense" assignment is 73 bit work, I assume there isn't a unanimous consensus?

I assume you have lots of data to back up 77 bits as the optimal level, considering all the work going through the project, so I'm a bit confused as to why primenet wouldn't also hand out that as "optimal". True, it takes more time, but even a simple DC of a 50M exponent takes days on a 3700X, FCs take waaay more, so I'm not sure I understand the time argument...

Just asking a lot of (annoying) questions to quickly learn about the various faces of this project :)

By the way, what Chuck said is so very true - I had no factors found until now, and this evening I got three (a bit of 73 bit work before my worktodo gets to the 77 jobs).

linament 2020-03-10 18:15

[QUOTE]Do you happen to know what the assignment was? And approximate time (UTC please)?[/QUOTE]


I have no idea what the assignment was (I only casually look when I shut my machine down for the night) other than it was in the 72-73 bit range. I do remember that the automatic breadth first assignments yesterday were scattered between 100M-107M (but that probably is not enough to help). As to the approximate time my GPU session stopped, I can only guess that it was near 2020-03-09 23:30 (found in an mprime log that I was running concurrently).

James Heinrich 2020-03-10 18:34

[QUOTE=EugenioBruno;539306]Considering the fact that primenet's "what makes most sense" assignment is 73 bit work, I assume there isn't a unanimous consensus?[/QUOTE]Everyone bases the calculation of optimal TF (and P-1) factoring on the predicted time saved (by finding a factor) vs the effort spent looking for factors (instead of just running a primality test).
PrimeNet assumes the same CPU will be used for TF, P-1 and LL/PRP, which makes the calculation easy, and was historically true for a long time before GPUs came on the scene.
Chris's calculations for GPU72 assume that TF will be done on GPU since they are [i]hugely[/i] faster than CPU at TF, whereas their advantage at P-1 and LL/PRP is less dramatic.
In a system where the same processor is used for TF, P-1 and LL/PRP, calculating the optimal effort distribution is pretty simple (e.g. [url=https://www.mersenne.ca/cudalucas.php?model=706]this graph for a GPU[/url]). In current practice, however, TF is largely done on GPUs whereas P-1 is almost entirely CPU, and LL/PRP is mostly CPU.
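That trade-off can be sketched in a few lines. This is an illustrative toy model only: the 1/b factor-probability heuristic is a common simplification, and all the cost numbers are made-up placeholders, not GPU72's actual parameters.

```python
# Toy model of the "is one more bit of TF worth it?" trade-off.
# Heuristic assumption: a Mersenne number has a factor between
# 2^(b-1) and 2^b with probability roughly 1/b.

def worth_one_more_bit(bit_level, tf_cost, test_cost, tests_saved=2.0):
    """Is TF'ing through bit_level worth it?

    tf_cost     -- effort to do this bit level (GHz-days, on whatever
                   hardware does the TF; a GPU makes this much smaller)
    test_cost   -- effort of one primality test (GHz-days)
    tests_saved -- tests avoided if a factor is found (e.g. first
                   check plus double-check)
    """
    p_factor = 1.0 / bit_level
    expected_saving = p_factor * tests_saved * test_cost
    return expected_saving > tf_cost

# Made-up numbers: a ~300 GHz-day test, TF bit level 74.
print(worth_one_more_bit(74, tf_cost=2.0, test_cost=300.0))   # True
print(worth_one_more_bit(74, tf_cost=10.0, test_cost=300.0))  # False
```

The GPU-vs-CPU point above shows up as `tf_cost`: measured in the same GHz-day units, a GPU's TF cost is tiny, so the break-even bit level moves up.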

I believe the general PrimeNet approach is to tell clients to TF to CPU-optimal levels in case they actually get TF'd on a CPU, but to let GPU72 micromanage the wavefront and direct GPU resources as needed so that hopefully nothing is actually being TF'd on a CPU.
In the end it's always going to be an imperfect calculation, exacerbated by limited compute resources. TF targets might be different if there was more (or less) GPU power available for TF, or a larger buffer between TF and P-1 and LL/PRP wavefronts, etc.
Personally I just trust Chris to keep his eye on the data and micromanage What Makes Sense to direct available resources where they... make sense. :smile:

EugenioBruno 2020-03-10 19:05

[QUOTE=James Heinrich;539312]
Personally I just trust Chris to keep his eye on the data and micromanage What Makes Sense to direct available resources where they... make sense. :smile:[/QUOTE]

Sounds like a pretty good idea to me. :)
Thanks for the explanation, makes sense.

chalsall 2020-03-10 19:17

[QUOTE=EugenioBruno;539315]Sounds like a pretty good idea to me. :) Thanks for the explanation, makes sense.[/QUOTE]

Thanks for the vote of confidence guys. Appreciated.

But to put it on the table: we still need a /lot/ more TF'ing compute to stay "optimal". We've got about two months to figure out where that's going to be coming from...

As always, if you have any GPU compute you might throw our way, it would be welcomed. And now you don't even need to have a GPU, just a Gmail account (or seven)! :wink:

EugenioBruno 2020-03-10 20:11

First result should be coming your way in about 3 hours.

After I'm done doing the work to 73 reserved through primenet, I should be contributing ~900GHz/d (gtx 1650) for about 16 hours a day.

Each according to their own compute capabilities :P

chalsall 2020-03-10 21:24

Primenet Username...
 
So, I finally had the cycles to add the UI front end, to allow people to [URL="https://www.gpu72.com/account/settings/"]enter their Primenet Username[/URL] into the system.

This form also lets you update your GPU72 Display Name, if you're currently appearing as "Anonymous", and you'd like to "come out".

Once the system knows your PNUN, it will start issuing you P-1 CPU work in parallel (or instead of, when GPUs aren't available to you). A few hours after the system notifies me of the new knowledge, a Virtual Machine will also be created on Primenet called GPU72_TF, into which Colab TF'ing results will automatically be submitted. (That step isn't scripted yet; a human (me) is still in the loop.)

For those not already auto submitting, please try out this form and let me know if you have any comments or SPE observations.

Also, there's a field for getting email notifications when assignments are overdue. This was requested by at least a few people, and I'll activate this in the next week or so.

James Heinrich 2020-03-10 21:36

[QUOTE=chalsall;539330]there's a field for getting email notifications when assignments are overdue...
please try out this form and let me know if you have any comments or SPE observations.[/QUOTE]I set email alerts to "one week" and the form came back and said "Account settings updated" but the drop-down still shows "no notifications". I don't know if it's actually not saving the setting, or just a display issue.

chalsall 2020-03-10 21:43

[QUOTE=James Heinrich;539332]I don't know if it's actually not saving the setting, or just a display issue.[/QUOTE]

The latter. Thanks... I'll make that field update, but the setting is being saved in the DB.

Uncwilly 2020-03-10 22:41

1 Attachment(s)
[QUOTE=chalsall;539330]So, I finally had the cycles to add the UI front end, to allow people to [URL="https://www.gpu72.com/account/settings/"]enter their Primenet Username[/URL] into the system.[/QUOTE]
I logged in and see nothing to modify or update.

chalsall 2020-03-10 22:45

[QUOTE=Uncwilly;539340]I logged in and see nothing to modify or update.[/QUOTE]

Thanks. SPE... Fixed.

EugenioBruno 2020-03-10 23:54

Just to make sure I did this right and I don't mess up my later jobs: I can just submit work via the manual form on mersenne.org as if I got it from there, correct? Then GPU72 will figure out that I did that job because it was assigned to me?

Uncwilly 2020-03-11 00:21

GPU72 only tracks work that passed through it. So, if you got it manually from PrimeNet, it won't care. If you are using Windows, you can use Misfit to automatically retrieve work and return results. It can work with GPU72 or PrimeNet.

EugenioBruno 2020-03-11 00:34

The work was assigned to me by gpu72. Does my question make more sense now?

Edit: it's also marked as completed in my gpu72 account so I assume my setup should be working...

LaurV 2020-03-11 05:12

You get work from GPU72, report it to PrimeNet. Via manual form, or automatically (Misfit or script).

GPU72 will see the work is complete, and credit you with it. You don't need to report back to GPU72.
As said before, if you are under Windoze, use Misfit. Will save you a lot of headache.

Uncwilly 2020-03-11 13:44

[QUOTE=EugenioBruno;539350]The work was assigned to me by gpu72. Does my question make more sense now?

Edit: it's also marked as completed in my gpu72 account so I assume my setup should be working...[/QUOTE]
Correct. I misread what you posted. I missed the "if" in "as if".

EugenioBruno 2020-03-11 14:26

Thanks. Also, thanks all for the suggestions for Misfit, and maybe I'll take another look at it sometime, but I just remember headaches trying to figure it out... I have a script in my startup folder to start mfaktc, and checking in and submitting work every once in a while is a cool little ritual in itself. Simple enough for me. I don't tend to like *over*abstraction :)

2M215856352p1 2020-03-11 14:56

I made a mistake when I experimented with the new feature.

Not knowing which PrimeNet username to use (the displayed name or the login id), I put the displayed name as the PrimeNet username.

Unfortunately, the results were credited to Anonymous unless I manually submitted them before the submission spider did.

However, once I put in my PrimeNet username and clicked on update, the field could no longer be updated. I wish to update the field so that the results will be credited to my PrimeNet account.

Any advice?

chalsall 2020-03-11 15:12

[QUOTE=2M215856352p1;539410]Any advice?[/QUOTE]

Can you please PM me your Primenet Username?

2M215856352p1 2020-03-11 15:21

[QUOTE=chalsall;539411]Can you please PM me your Primenet Username?[/QUOTE]

Have you received my PM? I am not very sure whether I did that correctly.

Chuck 2020-03-11 15:34

Error during P-1 submission
 
1 Attachment(s)
Don't know if this is of any interest, or just a transient problem. This is from a Colab CPU only notebook.

...Evidently a transient problem, the next hourly submission was successful.

chalsall 2020-03-11 15:34

[QUOTE=2M215856352p1;539414]Have you received my PM? I am not very sure whether I did that correctly.[/QUOTE]

Yes. And replied. Please reply to that, and I'll be able to fix you up. :smile:

chalsall 2020-03-11 15:53

[QUOTE=Chuck;539418]Don't know if this is of any interest, or just a transient problem. This is from a Colab CPU only notebook. ...Evidently a transient problem, the next hourly submission was successful.[/QUOTE]

Interesting. Thanks for reporting that.

This means there was a communications error between the mprime client in the instance and Primenet (through the GPU72 proxy).

I'll take a look at the logs, and see if the client actually reached the proxy or not. But, regardless, the INI file should lower that retry period setting.

P. S. BTW, thanks to everyone who's reporting the timestamps with their "hmmm..." observations. It helps debugging immensely. Geeks really appreciate timestamps. NTP'ed UTC is ideal! :smile:

EugenioBruno 2020-03-12 18:43

Speaking of optimal TF: what % do you guesstimate the current 77-bit work to be "more efficient" (I think the metric is time saved?) than the 73-bit work PrimeNet gives out?

Depending on the answer I might (or might not, or in different proportions) also do a bit of those, since they're very quick and I get more results per day - but not if the efficiency is way lower.

Uncwilly 2020-03-12 19:14

Based upon the First Time Check being done on a CPU and the TF work on a GPU, taking the factoring to 77 makes sense. The GPUs are saving the CPUs several percent of work (~4-5.3%), above and beyond what is saved by taking everything to 73 bits. That is, ~5% fewer exponents need to get tested, because the GPUs have found a factor.

CPUs doing first-time tests (PRP being preferred) or P-1 testing makes the most sense for them. Also, since there are many more of them doing tests, it makes the most sense at the moment to throw every available GPU at TF.
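The ~5% figure is consistent with the common heuristic (an approximation, not an exact result) that a Mersenne number has a factor between 2^(b-1) and 2^b with probability roughly 1/b; summing over the four extra bit levels from 73 to 77:

```python
# Chance of a factor turning up somewhere in bit levels 74 through 77,
# under the ~1/b-per-bit-level heuristic.
extra = sum(1.0 / b for b in range(74, 78))
print(f"{extra:.1%}")  # 5.3%
```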

James Heinrich 2020-03-12 19:18

[QUOTE=EugenioBruno;539532].. work to be "more efficient" (I think the metric is time saved?)[/QUOTE]The most "efficient" work in terms of time saved is always the highest exponent and lowest bit depth.

To quote an extreme example, the third-largest exponent I track is [url=https://www.mersenne.ca/exponent/9999999929]M9,999,999,929[/url] and it has a ~40-bit factor. Finding this trivial factor took about 0.0000000000009 GHz-days of effort and saves 275,833 GHz-days of effort to do a single PRP/LL test (double that if you want double-checks, not to mention another 1000 GHd or so for P-1). 600-quadrillion-to-one efficiency, can't beat it. :smile:
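For what it's worth, the quoted ratio checks out (numbers taken straight from the post above):

```python
tf_effort = 9e-13        # GHz-days spent finding the ~40-bit factor
test_effort = 275_833    # GHz-days for one PRP/LL test at that size
ratio = (2 * test_effort) / tf_effort  # x2 to include the double-check
print(f"{ratio:.2e}")  # 6.13e+17, i.e. ~600 quadrillion to one
```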

At the lower bit depths (up to what PrimeNet hands out, and a few bits above that) there is no question that it all needs to be done. The [I]only[/I] question, based on the TF resources available vs primality-testing progress, is what the final TF bit level should be. Right now that question hovers around whether it should be PrimeNet+4 or PrimeNet+5, and the answer varies according to how far ahead of the primality-testing wavefront the TF effort is and how much TF power is available. A few posts back I believe Chris said we have about a 2-month lead right now and it's shrinking, so we'll likely need to cut back to +4.

See this graph for an illustration: [url]https://mersenne.ca/graphs/factor_bits_1000M/[/url]
The pink curve is PrimeNet TF target, blue is PrimeNet+3, palegreen is PrimeNet+5.
The goal is to get the actual TF (black) up to [I]at least[/I] +3, but really should be +4 and (if we can manage it) +5.

chalsall 2020-03-12 19:35

[QUOTE=James Heinrich;539537]A few posts back I believe Chris said we have about a 2-month lead right now and it's shrinking, so we'll likely need to cut back to +4.[/QUOTE]

Yup.

Approximately 700 candidates are FC'ed each day at the moment. We're only averaging ~380 candidates to 77 bits per day over the last month. We do have ~50,000 candidates already TF'ed optimally as a buffer, but we are falling behind.

Part of my motivation for investing the time in the whole Colab thing was to get it sane and scalable. Now we work on getting the concurrent instance count up (and not just in Colab...).

For anyone who has a Gmail account and hasn't tried the Notebook thing, I would encourage you to give it a go. Pretty easy! :smile:

kriesel 2020-03-12 20:19

[QUOTE=James Heinrich;539537]The most "efficient" work in terms of time saved is always the highest exponent and lowest bit depth.

To quote an extreme example, the third-largest exponent I track is [URL="https://www.mersenne.ca/exponent/9999999929"]M9,999,999,929[/URL] and it has a ~40-bit factor. Finding this trivial factor took about 0.0000000000009 GHz-days of effort and saves 275,833 GHz-days of effort to do a single PRP/LL test (double that if you want double-checks, not to mention another 1000 GHd or so for P-1). [/QUOTE]But that is irrelevant. And so, useless activity. No one in his right mind is going to attempt P-1 or primality testing on such a large exponent this decade or next or for much longer. I take occasional flak for doing TF or P-1 to proper levels at less than one tenth that exponent value, as a means of exploring the limits of the currently available software. There are reasons why Ernst discourages people from attempting billion-digit primality tests with Mlucas, and why George feels no need to extend prime95's capabilities beyond about p~10[SUP]9[/SUP]. Even Mihai's gpuowl does not (quite) reach the billion-digit range. There is no software to P-1 factor or primality test such large exponents.
Instead of scattered activity on numerous exponents that won't be reached by the wavefront for decades or centuries, those reinforcements are needed at the barricades (wavefront).
[QUOTE=chalsall;539540]Approximately 700 candidates are FC'ed each day at the moment. We're only averaging ~380 candidates to 77 bits per day over the last month. We do have ~50,000 candidates already TF'ed optimally as a buffer, but we are falling behind.[/QUOTE]So 50,000/(700-380) = 156 days; buffer depletion in about 5 months. At or before buffer depletion, drop the terminal TF level to 76. (Which is what I'm running to now for production TF.) If TF to 77 is possible for ~380/day, to 76 would be possible for ~780/day.
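Spelling out that arithmetic (figures from the posts above; the doubling is the standard TF property that each bit level costs twice the previous one):

```python
buffer_size = 50_000   # candidates already TF'd to 77 bits
fc_per_day = 700       # candidates first-checked per day
tf_per_day = 380       # candidates TF'd to 77 bits per day

days_left = buffer_size / (fc_per_day - tf_per_day)
print(round(days_left))  # 156 -> roughly 5 months

# Each bit level doubles the TF work, so stopping at 76 instead of 77
# roughly doubles the number of candidates finished per day.
```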

chalsall 2020-03-12 20:36

[QUOTE=EugenioBruno;539532]Depending on the answer I might (or might not, or in different proportions) also do a bit of those, since they're very quick and I get more results per day - but not if the efficiency is way lower.[/QUOTE]

The short answer is: do whatever you enjoy doing! Every "bit" helps -- it all has to be done. And not everyone has a compute farm in their basement!

Heck, I myself am down to only running a single old GTX560 here in Bimshire, and that's mostly just for testing purposes. (I whine when Colab only gives me a K80 for free, until I remember that it's still twice as fast as my own kit.)

chalsall 2020-03-12 20:39

[QUOTE=kriesel;539544]So 50,000/(700-380) = 156 days; buffer depletion in about 5 months. At or before buffer depletion, drop terminal TF level to 76. (Which is what I'm running to now for production TF.) If TF to 77 is possible for ~380/day, to 76 would be possible for ~780/day.[/QUOTE]

That ~380/day was over the last month. We've been lower than that more recently (participation ebbs and flows).

And I'm prepared for the situation that we need to "drop" candidates early, although I don't want to. Working some angles on getting some additional firepower...

James Heinrich 2020-03-12 21:46

[QUOTE=kriesel;539544]But that is irrelevant. And so, useless activity.
Instead of scattered activity on numerous exponents that won't be reached by the wavefront for decades or centuries, those reinforcements are needed at the barricades (wavefront)[/QUOTE]It is germane to this discussion in that it provides a dramatic illustration of the posed question of how the "efficiency" scales towards large exponents and small factors.

That said, while I wouldn't call any factor-finding effort outright "useless" in and of itself, as it relates to GIMPS's mission of finding the Next Mersenne Prime then factoring large exponents is not useful. It will all [I]eventually[/I] need to be done, but that could be decades-to-centuries away depending on the exponent.

EugenioBruno 2020-03-12 22:36

I have to admit I didn't really understand the argument fully, but tomorrow I will search for explanations, probabilities, arguments and so on, so I can understand without annoying you folks too much.

I'm not even sure if the qualitative gist I get - deeply TFing what's about to be FCd is better because it saves FCs and moves the wavefront faster; TFing exponents far ahead is worse because they wouldn't be tested anyway for a while, and hardware is going to evolve in the meantime - is roughly correct.

James Heinrich 2020-03-12 22:47

[QUOTE=EugenioBruno;539566]deeply TFing what's about to be FCd is better because it saves FCs and moves the wavefront faster;
TFing exponents far ahead is worse because they wouldn't be tested anyway for a while[/QUOTE]You got it. :tu:

chalsall 2020-03-12 22:47

[QUOTE=EugenioBruno;539566]I have to admit I didn't really understand the argument fully, but tomorrow I will search for explanations, probabilities, arguments and so on, so I can understand without annoying you folks too much.[/QUOTE]

No problem. For those who ask (like you), there are many going "Why are they doing that?"...

Basically, it comes down to a (scarce) resource management problem.

As a thought experiment, imagine that you were the sole person looking for the next Mersenne Prime. Based on James' deep analysis, we've empirically determined that it is more efficient to first TF to 77 bits before doing the First Check ***on the same GPU***.

Then, add to the problem space the fact that there are thousands of participants in GIMPS, all volunteers, and all running a huge mix of CPUs and GPUs. Some like finding factors. Some hope to find the next MP. Some are content using mprime/Prime95 to ensure the sanity of their kit by running DCs.

Then, on top of that, consider that there are actually multiple "wavefronts". The Cat 0 through 4 assignment classes, plus the P-1'ers, plus the DC'ers.

At the end of the day, none of this really matters all that much. But it /does/ make for some really interesting driving problems... :smile:

Uncwilly 2020-03-12 22:48

[QUOTE=EugenioBruno;539566]I'm not even sure if the qualitative gist I get - deeply TFing what's about to be FCd is better because it saves FCs and moves the wavefront faster; TFing exponents far ahead is worse because they wouldn't be tested anyway for a while, and hardware is going to evolve in the meantime - is roughly correct.[/QUOTE]You got it right.

linament 2020-03-12 22:53

1 Attachment(s)
Occasionally, when I open my browser to the colab session running the notebook provided by GPU72, I get the message in the attached image. Is this expected?

[ATTACH]21880[/ATTACH]

chalsall 2020-03-12 23:01

[QUOTE=linament;539573]Occasionally, when I open my browser to the colab session running the notebook provided by GPU72, I get the message in the attached image. Is this expected?[/QUOTE]

Hmmm... No...

The Notebook should automatically detect that a GPU is available, and use it if it is.

The only reason it should revert to CPU only is if the nvidia-smi command doesn't work correctly. Perhaps this is another delta Colab has made to their environment, although I have never seen this myself.

One thing to try when you see that... Stop the Notebook Section, and then rerun it and see if it detects the GPU on the second run.

Another thing would be to "Connect to Hosted Runtime" (drop-down menu in the upper-right-hand side) and see if it complains about not being able to attach to a GPU backend.

Edit: Sorry... I glanced at your screenshot too quickly. You /are/ running both the GPU and CPU code in that instance. I have no idea why Google thinks you're not.

EugenioBruno 2020-03-12 23:50

[QUOTE=chalsall;539570]
Basically, it comes down to a (scarce) resource management problem.
[/QUOTE]

As Aragorn would say, "You have my GTX 1650!"

:D

chalsall 2020-03-12 23:59

[QUOTE=EugenioBruno;539577]As Aragorn would say, "You have my GTX 1650!"[/QUOTE]

And it's much appreciated! :smile:

Chuck 2020-03-15 11:44

Colab paid tier first time restricted
 
This morning after my 24 hour run time expired on three GPU sessions and one P-1, I was unable to get a GPU to start a new session, and only one P-1 session was allowed to connect.

Chuck 2020-03-15 15:31

Got GPUs again
 
Later this morning I was again able to get multiple GPU sessions.

linament 2020-03-15 19:13

I don't know if this is significant. However, whenever I interrupt execution of my Colab session (such as shutting down my machine for the night) that is using the GPU72 script, it terminates with the following error message. At this point, I am only running a P-1 instance because I have used up my GPU quota for the day.

[QUOTE]20200315_185854 ( 3:45): [Work thread Mar 15 18:58] M100990271 stage 1 is 73.65% complete. Time: 344.483 sec. Exiting... Can't locate LWP/UserAgent.pm in @INC (you may need to install the LWP::UserAgent module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at ./comms.pl line 32. BEGIN failed--compilation aborted at ./comms.pl line 32. Done.[/QUOTE]

chalsall 2020-03-15 19:33

[QUOTE=linament;539792]However, whenever I interrupt execution of my Colab session (such as shutting down my machine for the night) that is using the GPU72 script, it terminates with the following error message. At this point, I am only running a P-1 instance because I have used up my GPU quota for the day.[/QUOTE]

No, it's not a problem. Ungraceful, but not a problem.

The issue is the CPU Payload doesn't use the Perl LWP module, so it isn't installed. But at the end of the Notebook Section, the Comms module is called to let GPU72 know that the Section was stopped.
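For anyone curious, the error is just Perl failing to load the LWP::UserAgent module. A way to check for it (and the Debian/Ubuntu package that would supply it, on a Colab-style image) might look like the following; treat this as a sketch, not part of the official Notebook:

```shell
# Check whether this Perl installation has LWP::UserAgent available.
if perl -MLWP::UserAgent -e 1 2>/dev/null; then
    echo "LWP::UserAgent present"
else
    # On Debian/Ubuntu the module ships in the libwww-perl package.
    echo "LWP::UserAgent missing (install with: apt-get install -y libwww-perl)"
fi
```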

BTW... You don't actually need to stop your Section(s) when you're going to shut your machine down for the night. Just close your browser (and answer "Yes, I really want to leave this page") and your Session will continue working for an hour or so.

petrw1 2020-03-15 21:37

I can get 2 sessions each day but only once; late in the evening.
Otherwise as soon as I start 1 session it won't let me start another.

Interestingly, for this 1 session I can start the GPU72 session without first starting the "tunnel" session.
It gives the message "No GPU available" still but lets the CPU code run the P1.

chalsall 2020-03-15 22:09

[QUOTE=petrw1;539807]Interestingly, for this 1 session I can start the GPU72 session without first starting the "tunnel" session.[/QUOTE]

You keep mentioning the "Tunnel" session. Are you running an Instance Root reverse-tunnel Section? Not needed (but fun for the pretty graphs and other data).

petrw1 2020-03-15 22:24

[QUOTE=chalsall;539810]You keep mentioning the "Tunnel" session. Are you running am Instance Root reverse-tunnel Section? Not needed (but fun for the pretty graphs and other data).[/QUOTE]

I didn't realize the rules changed since I started last fall.
1. Start tunnels: sshd.pl
2. Run bootstrap.pl

If step 1 is no longer required why can I not get a GPU without it?

James Heinrich 2020-03-15 22:31

[QUOTE=petrw1;539812]If step 1 is no longer required why can I not get a GPU without it?[/QUOTE]I just open [url]https://colab.research.google.com/github/chalsall/GPU72_CoLab/blob/master/gpu72_tf.ipynb[/url] plop in my NAK and click Play, and I have no trouble getting a GPU (most of the time).

chalsall 2020-03-15 22:31

[QUOTE=petrw1;539812]I didn't realize the rules changed since I started last fall. ... If step 1 is no longer required why can I not get a GPU without it?[/QUOTE]

The sshd.pl Section has /never/ been needed for the GPU72_TF Notebook. It's more of a developer's tool.

I have no idea why you're noticing that correlation. But it shouldn't be causal.

Once you're given a Session (read: Connect to a Backend) you'll have a GPU, or you won't. Running an SSH Section won't magically attach you to a GPU.

LaurV 2020-03-17 04:05

Hey Chris, I just upgraded to the new colab script yesterday, the one which uses the CPU too, and there seems to be a bug with reporting results for CPU.

First, I got a P-1 starting at 43% of Stage 1 (??). As I didn't do any P-1 before (this is a new "notebook" with the ID starting with "b535..."), I assumed that you save the intermediate (full) residues from time to time, just in case Colab decides to kick someone's ass unexpectedly, and then you resume next time. But passing me another guy's work (I assume you do it vice versa too?) is wrong, somehow, because assuming I can finish it, I would get the credit for it, therefore robbing the person who did the first 43% of the work. You should keep a record and assign the continuation of the work only to the user who did the first part of it. Not that I complain too much about free resources given by Google to us...

Secondly, Colab indeed kicked me off before I succeeded in finishing Stage 1 of that P-1 (last time at almost 98% :rant:). When I resume (starting a new session) I get the same exponent, but.... starting at 43%. I am already doing this for the third time.

"102986021 P-1 77 46.23% Stage: 1 complete."

(The column is confusing there: it looks like Stage 1 is complete, but it is not; the message means "46% of Stage 1 is complete". It would be better displayed as "Stage 1 complete: 46.xx%", but this is minor. My pain in the butt is now repeating the same work over and over, with no progress. Am I doing something wrong? Do I need to use some "persistent" storage/drive on my side of Colab/Google Drive/whatever?)

LaurV 2020-03-17 06:48

Ok, today it seems I got a better CPU (?!?), because after 5 hours it finished Stage 1, tried an unsuccessful GCD, and moved to Stage 2, which is now ~5.5% done. If the instance is killed at 10 hours as expected (or before), it is clear that it won't finish and report in time.

I just backed up the checkpoint files; in case it crashes I will finish it locally, to avoid doing the same work over and over.

What's the plan B? (you see, we didn't really keep in touch with new "inventions" you did there, and most probably we are doing something wrong...)

chalsall 2020-03-17 17:04

[QUOTE=LaurV;539905]What's the plan B? (you see, we didn't really keep in touch with new "inventions" you did there, and most probably we are doing something wrong...)[/QUOTE]

OK... I'm /stupidly/ busy at the moment. Getting a company ready to work 100% remotely...

But this should all be sane; many people are using it successfully; including my seven instances running the exact same code as everyone else.

To be clear... The P-1 checkpoint files should be thrown back to the server every ten minutes during the entire run(s). If an instance dies, the last checkpoint is sent out to the next requested instance (that you own, of course).

If you PM me the exponent in question, I can examine the logs and the checkpoint files themselves.

Uncwilly 2020-03-17 19:35

[QUOTE=chalsall;539961]OK... I'm /stupidly/ busy at the moment. Getting a company ready to work 100% remotely...[/QUOTE][SIZE="3"][FONT="Lucida Sans Unicode"][COLOR="Green"][B]Bless you my son. You are doing work that is vital to keeping the world safe. It will be transparent to most people. But we here know that bits don't move by themselves.[/B][/COLOR][/FONT][/SIZE]
:awesome:
:bow wave:

LaurV 2020-03-18 04:56

[QUOTE=chalsall;539961]But this should all be sane; <...>
If you PM me the exponent in question <...>[/QUOTE]
The exponent was in the first post. It resumed Stage 2 at ~54% [U]normally[/U] today, after last night's kick-off, with only about 7-8 minutes of work lost (it seems, as you said, the checkpoint interval is around 10 minutes). You don't need to do anything, but if you have time, you can check the fact that we (Colab) did Stage 1 more than once, from ~43% to ~9x% (assuming the reports reached your server, but they probably did, because TF was reported normally in all this time).


Edit: We manually stopped and restarted everything after some time, because we were not satisfied with the K80 we got for TF, and the P-1 Stage 2 resumed again, normally (61%). We are good here.

Uncwilly 2020-03-19 16:18

[FONT="Arial Black"][COLOR="Red"][SIZE="3"]MOD NOTE: BOINC related posts moved here:[/SIZE][/COLOR][/FONT]
[url]https://www.mersenneforum.org/showthread.php?t=25383[/url]

chalsall 2020-03-19 21:23

[QUOTE=Uncwilly;539971]Bless you my son. You are doing work that is vital to keeping the world safe. It will be transparent to most people. But we here know that bits don't move by themselves.[/QUOTE]

That's very kind. Thank you.

And, yeah... The work that we do is not seen, nor even understood, by most people. The only time we're noticed is when things aren't working, which should be never.

bayanne 2020-03-22 05:59

For the first time this morning, I was unable to connect 2 of my 3 instances to process P-1 using CPU. The script advised that this was due to usage limits. It advised that it could not connect as there was no TPU [CPU] backend available.

I then changed runtime type to TPU, and was then able to connect. Must remember to switch that back to GPU when this session finishes.

kladner 2020-03-22 15:02

1 Attachment(s)
Just now (10:00 CDT), GPU72 is being tagged by Firefox as a potential security risk. While I don't believe this message, it prevents me from reaching the site.

James Heinrich 2020-03-22 15:06

[QUOTE=kladner;540484]Just now (10:00 CDT), GPU72 is being tagged by Firefox as a potential security risk. While I don't believe this message, it prevents me from reaching the site.[/QUOTE]It works fine here (5 mins later) in Firefox...?

chalsall 2020-03-22 15:17

[QUOTE=kladner;540484]Just now (10:00 CDT), GPU72 is being tagged by Firefox as a potential security risk. While I don't believe this message, it prevents me from reaching the site.[/QUOTE]

Huh??? That's really weird. Is anyone else seeing this?

The SSL cert doesn't expire for another two months.

Edit: Just a thought... Did you try to go to the https:// address, or the http:// address?

LaurV 2020-03-22 17:35

fine from here, no issue

kracker 2020-03-22 18:52

[QUOTE=chalsall;540486]Huh??? That's really weird. Is anyone else seeing this?

The SSL cert doesn't expire for another two months.

Edit: Just a thought... Did you try to go to the https:// address, or the http:// address?[/QUOTE]

Working fine here.

kladner 2020-03-22 20:30

[QUOTE=chalsall;540486]Huh??? That's really weird. Is anyone else seeing this?

The SSL cert doesn't expire for another two months.

Edit: Just a thought... Did you try to go to the https:// address, or the http:// address?[/QUOTE]
It's working now. I actually had the Assignments and Overall Statistics https:// pages up in tabs. The warnings went away when I refreshed the pages. Firefox seems to be pretty aggressive sometimes, so I expect false alarms; on a few occasions IE and Chrome (when I had it) agreed that something was amiss.

mrk74 2020-03-22 20:39

Total noob (at least I admit it!) question: I'm using Colab doing TF. I set it to do whatever needed done. The TF Level is only doing 74 to 75. Is that because I set it to do whatever needed done and that's what is needed?


Also, it looks like there was a code revision while I was running, and it says "Unsaved changes since X:XXpm". Is there something I can do about that?

chalsall 2020-03-22 20:49

[QUOTE=mrk74;540527]Total noob (at least I admit it!) question: I'm using Colab doing TF. I set it to do whatever needed done. The TF Level is only doing 74 to 75. Is that because I set it to do whatever needed done and that's what is needed?[/QUOTE]

Yup, at the moment we're getting ready to start releasing work for the P-1'ers, so many candidates are currently being worked in 99M. It's a constant balancing act; what's being worked can change day-by-day, depending on the expected "hunger" of the various wavefronts.

[QUOTE=mrk74;540527]Also it looks like there was a code revision while I was running and it says "Unsaved changes since X:XXpm. Is there something I can do about that?[/QUOTE]

Once the code is running, I can't make any changes to it. But Google's Colab might be telling you that you haven't saved your Notebook. Try going to the Colab "File" menu, and select "Save".

BTW... Thanks for trying out the Colab TF'ing thing. When you have a chance, go to your GPU72 [URL="https://www.gpu72.com/account/settings/"]Account Settings[/URL], and put in your Primenet Username (not display name). Then the results will be auto-submitted, and you'll also be given P-1'ing work to do in parallel on the CPU.

James Heinrich 2020-03-22 20:49

[QUOTE=mrk74;540527]The TF Level is only doing 74 to 75. Is that because I set it to do whatever needed done and that's what is needed?[/quote]Exactly. Expect to see varying exponent ranges and bit levels depending on what's most needed at that moment.

[QUOTE=mrk74;540527]it says "Unsaved changes since X:XXpm. Is there something I can do about that?[/QUOTE]Ignore that, it's normal to see that and you don't need to save anything.

mrk74 2020-03-22 22:35

[QUOTE=chalsall;540529]
BTW... Thanks for trying out the Colab TF'ing thing. When you have a chance, go to your GPU72 [URL="https://www.gpu72.com/account/settings/"]Account Settings[/URL], and put in your Primenet Username (not display name). Then the results will be auto-submitted, and you'll also be given P-1'ing work to do in parallel on the CPU.[/QUOTE]


Done! Thanks for the info!

Chuck 2020-03-23 00:13

[QUOTE=chalsall;540529]Yup, at the moment we're getting ready to start releasing work for the P-1'ers, so many candidates are currently being worked in 99M. It's a constant balancing act; what's being worked can change day-by-day, depending on the expected "hunger" of the various wavefronts.
[/QUOTE]

Is it recommended I continue Depth First with Colab or should I switch to Let GPU72 Decide?

James Heinrich 2020-03-23 00:23

[QUOTE=Chuck;540556]Is it recommended ... Let GPU72 Decide?[/QUOTE]It would seem (to me) to be self-evident that GPU72 would recommend that you do what GPU72 recommends you do. :smile:

Chuck 2020-03-23 00:26

[QUOTE=James Heinrich;540557]It would seem (to me) to be self-evident that GPU72 would recommend that you do what GPU72 recommends you do. :smile:[/QUOTE]

Sometime back Chris recommended I do Depth First.

chalsall 2020-03-23 00:41

[QUOTE=Chuck;540558]Sometime back Chris recommended I do Depth First.[/QUOTE]

Actually, if you could keep doing what you're doing, that would be great.

Because you're running the Colab "Paid tier", your instances are a reliable engine to take candidates up to 77 bits which have had P-1 done by the Cat 3/4 "churners". These are then immediately handed out by Primenet's assignment priority algorithms (when not, of course, factored by you).

petrw1 2020-03-23 01:34

I have consistency now.
 
I have 2 colab sessions.

Once every 24 hours I can start both; both will get a GPU and a CPU.
They run for 4 to 8 hours, then both stop.
I'm not sure if it stops after a limited number of user hours, at a specific time of day, or when they run out of GPUs and want to give them to others.

Anyway, I can then run 1 session. It will not find a GPU but will happily run CPU only.
This session runs for about 12 hours.

RINSE … REPEAT Daily

mrk74 2020-03-24 13:48

I've got 3 unfinished assignments because of Colab timing out. Is there a way I can go back and finish those assignments? One was at 97% before it stopped. One is a TF and the other 2 are P-1.

James Heinrich 2020-03-24 13:52

[QUOTE=mrk74;540745]I've got 3 unfinished assignments because of collab timing out. Is there a way I can go back and finish those assignments? One was at 97% before it stopped. One is a TF and the other 2 are P-1.[/QUOTE]They should just resume on your next session. Are they not?

mrk74 2020-03-24 15:15

[QUOTE=James Heinrich;540747]They should just resume on your next session. Are they not?[/QUOTE]
Nope. I was just letting it run overnight and it stopped for whatever reason. It was doing a P-1 of M103952483 stage 1, if I remember right. When I restarted it, it went to M103975037 stage 2, which was already 22% done. Could it have something to do with me refreshing the whole page and having to put in my access key again, possibly?


Edit: It stopped after a couple hours. Refresh. It goes BACK to 103952483. Odd...

petrw1 2020-03-24 15:49

[QUOTE=mrk74;540754]Nope. I was just letting it run overnight and it stopped for whatever reason. It was doing a P-1 of M103952483 stage 1 if I remember right. When I restarted it it went to M103975037 stage 2 that was already 22% done. Could it have something to do with me refreshing the whole page and having to put in my access key again possibly?[/QUOTE]

Mine have always finished in later sessions.
But not always in order... sometimes it starts a new one and then completes a partial one.

linament 2020-03-24 23:21

View Assignments Page
 
Just noticed something when looking at the View Assignments page on GPU72: all of my manual assignments now show a CPU named Colab_MAH. Also, the percent complete for every one of my manual assignments is the same: 51.66% (TF and P-1). Not a big deal, but I thought I would let you know.

chalsall 2020-03-24 23:36

[QUOTE=linament;540806]Not a big deal, but I thought I would let you know.[/QUOTE]

Ah... Thanks. Stupid Programmer Error -- I thought I had trapped for that. Fixed.

LaurV 2020-03-25 10:26

Hey Chris, can you spin a "colab toy" that launches [U]two[/U] copies of mfaktc when a K80 is detected? I am pretty sure we are only using half of it on Colab. This has yet to be tested, but it seems that for the P4 and P100 we get about 95%-110% of the theoretical performance (probably explainable by their clocks not being standard), while for the T4 and K80 we get less: one Colab T4 only gives us about 65% of the theoretical performance, while the K80 is capped at about 45%. This also (somewhat) matches James' tables. I don't know what the issue with the T4 is (it may indeed be running underclocked on Colab's servers, or something else may be going on that we don't know about), but for the K80 one explanation may be its dual-chip design; I assume we only use half of it (or only half is made available by Colab?). Could you try to play with it? Two folders, "-d 0", "-d 1", whatever (I don't know how that goes under Linux); launching two instances would be an interesting test. In the worst case, we get half the speed in each instance and learn that more is not possible; in the best case we gain a few percent of GHzDays/Day more (up to 100% more, ideally).
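If Colab ever does expose both halves of a K80, the experiment described above might look something like this sketch. The directory layout and binary name are assumptions, and whether a second device is visible at all is exactly the open question; this is not the actual GPU72 code.

```shell
# Run one mfaktc instance per GPU device, each from its own work directory
# so the two worktodo.txt / results.txt files don't collide.
( cd ~/tf0 && ./mfaktc -d 0 > mfaktc0.log 2>&1 ) &   # first GK210 die
( cd ~/tf1 && ./mfaktc -d 1 > mfaktc1.log 2>&1 ) &   # second die, if exposed
wait   # block until both instances exit
```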

kriesel 2020-03-25 13:52

[QUOTE=LaurV;540839]I assume we only use half of it (or only half is made available by colab?).[/QUOTE]Running nvidia-smi in Colab (in a script not using chalsall's code) shows only one GPU device available, regardless of GPU model. I think it was established early on that only one half of the physical dual-GPU card is made available by Colab in the VM, just as only one CPU core (with HT) is. The following nvidia-smi output is obtained about 12 seconds after the mfaktc run is launched as a background process, to give it time to get going and show power and memory utilization from a run in progress.

K80 before background process launch, during Colab script startup:
[CODE]Mon Feb 17 14:56:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 68C P8 33W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+[/CODE]Gpuowl P-1 run on K80 (might have still been ramping up; note the 0% GPU utilization indicated):[CODE]Sun Mar 15 19:04:07 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 50C P0 69W / 149W | 69MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+[/CODE]For comparison, gpuowl P-1 runs on other models:[CODE]Fri Mar 13 08:51:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:00:04.0 Off | 0 |
| N/A 38C P0 39W / 75W | 1111MiB / 7611MiB | 67% Default |
+-------------------------------+----------------------+----------------------+[/CODE][CODE]Mon Mar 23 16:57:09 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 |
| N/A 40C P0 148W / 250W | 16183MiB / 16280MiB | 99% Default |
+-------------------------------+----------------------+----------------------+[/CODE][CODE]Sat Feb 29 19:47:48 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 48C P0 64W / 70W | 2517MiB / 15079MiB | 100% Default |
+-------------------------------+----------------------+----------------------+[/CODE]I'll increase the 12 seconds and see what shows up in the logs. It could take weeks to get another try on a K80. Script is a version of the first attachment at [URL]https://www.mersenneforum.org/showpost.php?p=537155&postcount=16[/URL]
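The sampling procedure described above (background launch, short delay, then a snapshot) amounts to roughly the following; paths and the log file name are assumptions, the 12-second delay is as described:

```shell
# Launch TF in the background, give it time to ramp up, then sample the GPU.
./mfaktc > mfaktc.log 2>&1 &
sleep 12        # let the run reach steady state before sampling
nvidia-smi      # one-shot snapshot of power draw, memory use and utilization
```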


Edit: on a different account that had already been running with an 18-second sleep, I found one instance of this for a gpuowl P-1 at ~97M. Judging by the memory usage, that is P-1 stage 2:[CODE]Tue Mar 10 13:43:36 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 42C P0 147W / 149W | 11406MiB / 11441MiB | 100% Default |
+-------------------------------+----------------------+----------------------+[/CODE]

mrk74 2020-03-27 18:49

I haven't gotten GPU in a few days. I've had to use TPU. That connects pretty much right away every time.

chalsall 2020-03-27 19:03

[QUOTE=mrk74;541087]I haven't gotten GPU in a few days. I've had to use TPU. That connects pretty much right away every time.[/QUOTE]

I've been consistently getting a GPU once per day across each of my eight (8) front ends (I added another one to one of my VPN'ed virtual humans as a test), each session lasting between 7 and 7.5 hours.

I've found that Colab seems to settle on this kind of allotment within a day or two. Interestingly, each front end is given a GPU at approximately the same time of the day for each individual (Gmail) account.

Further, I've found that when I'm given a GPU, if I get a K80 or a P4 I can do a "Factory Reset" and after two to five attempts, I will be given a T4 or a P100.

Uncwilly 2020-03-27 21:24

With the GPU72 implementation I keep getting sessions that want to do P-1 on the same exponent at the same time; I have noticed this several times. Today two sessions were working on 100982867 at the same time.

chalsall 2020-03-27 21:45

[QUOTE=Uncwilly;541100]Today were working on 100982867 at the same time.[/QUOTE]

Hmmm... This should really be over on the GPU72 Status thread, but...

I see this candidate was assigned to you at 19:15 and then again at 21:06 (UTC). The checkpoint file issued was the same for both. Did the first run actually run for more than ten minutes? I see the second run only lasted for about 22 minutes.

Please let me know. If it did actually run, the only explanation would be that the "apt install cron" didn't "take", and thus the "cpoints.pl" script wasn't being launched by the crontab entry.

And, of course, I've never seen this before myself... Has anyone else noticed this kind of behavior? The code hasn't changed for a couple of weeks (not that that necessarily means it's entirely sane).
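For reference, the checkpoint mechanism described above presumably boils down to a crontab entry of roughly this shape. The path and the ten-minute interval are assumptions inferred from the post, not the actual GPU72 code.

```shell
# Hypothetical crontab line: run the checkpoint-upload script every 10 minutes.
# "cpoints.pl" is named in the post; its location and schedule are guesses.
*/10 * * * * /content/cpoints.pl >> /content/cpoints.log 2>&1
```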

James Heinrich 2020-03-27 22:06

[QUOTE=chalsall;541103]Has anyone else noticed this kind of behavior?[/QUOTE]I haven't, but LaurV reported something similar 10 days ago:[QUOTE=LaurV;539899]colab indeed kicked me off before succeeding in finishing the Stage 1 of that P-1 (last time at almost 98% :rant:). When resumed (starting new session) I am getting the same exponent, but.... starting at 43%. I am already doing this third time.[/QUOTE]

Uncwilly 2020-03-27 22:21

[QUOTE=chalsall;541103]Hmmm... This should really be over on the GPU72 Status thread, but...[/quote]For some reason I didn't seem to find the right one. I will look later and move the posts.

[quote]I see this candidate was assigned to you at 19:15 and then again at 21:06 (UTC). The checkpoint file issued was the same for both. Did the first run actually run for more than ten minutes? I see the second run only lasted for about 22 minutes.

Please let me know. If it did actually run, the only explanation would be that the "apt install cron" didn't "take", and thus the "cpoints.pl" script wasn't being launched by the crontab entry.

And, of course, I've never seen this before myself... Has anyone else noticed this kind of behavior? The code hasn't changed for a couple of weeks (not that that necessarily means it's entirely sane).[/QUOTE]I killed the run that started second. The other one is still up.
[CODE]20200327_221830 ( 3:11): 100982867 P-1 77 65.02% Stage: 2 complete. Time: 411.603 sec.[/CODE] I have seen this happen at least 2 times before.

petrw1 2020-03-27 22:41

[QUOTE=chalsall;541088]Further, I've found that when I'm given a GPU, if I get a K80 or a P4 I can do a "Factory Reset" and after two to five attempts, I will be given a T4 or a P100.[/QUOTE]

5 re-starts in a row P4.
Bad luck ... or do I need to wait a few minutes between restarts?

chalsall 2020-03-27 22:47

[QUOTE=Uncwilly;541105]The other one is still up.
[CODE]20200327_221830 ( 3:11): 100982867 P-1 77 65.02% Stage: 2 complete. Time: 411.603 sec.[/CODE] I have seen this happen at least 2 times before.[/QUOTE]

OK, thanks very much for the report. I am ***not*** seeing any CP files from the first instance, which can only be explained by the cron sub-system not being installed.

I'll look at making this more resilient. Perhaps have the Checkpointer script also be launched from the CPU Payload script, as well as collect some debugging information as to whether the apt install actually works.

Interesting... This is new(ish) behaviour. And/or an extremely rare edge-case.
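A more resilient install along the lines suggested above might look like this sketch: retry until the package manager confirms cron is present, then start the daemon. This is a generic pattern, not the actual GPU72 code.

```shell
#!/bin/bash
# Keep retrying until cron is actually installed, then make sure it runs.
until dpkg -s cron >/dev/null 2>&1; do
    apt-get update -qq          # refresh the package index before each attempt
    apt-get install -y -qq cron
    sleep 5                     # brief pause before retrying a failed install
done
service cron start
```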

chalsall 2020-03-27 22:49

[QUOTE=petrw1;541106]5 re-starts in a row P4. Bad luck ... or do I need to wait a few minutes between restarts?[/QUOTE]

I've /never/ not gotten at least a P100 after five attempts immediately after each other. But perhaps try waiting an hour and then do the Factory Reset thing.

Uncwilly 2020-03-29 00:45

[QUOTE=chalsall;541107]OK, thanks very much for the report. I am ***not*** seeing any CP files from the first instance, which can only be explained by the cron sub-system not being installed.[/QUOTE]I have 2 that are running that same exponent again. The second started at ~35.6% again, just like the first, same as yesterday. It is at 88% and I will be going offline with this laptop before it hits 100%. At this rate, it will never get done during a single session.
:groan:
Groundhog day, yet again.

chalsall 2020-03-29 01:01

[QUOTE=Uncwilly;541214]:groan: Groundhog day, yet again.[/QUOTE]

Weird. Sorry about this. I'll have some cycles to drill-down on this tomorrow.

Uncwilly 2020-03-29 03:06

I was able to stay connected long enough for it to complete. It shows as credited.

chalsall 2020-04-01 15:25

[QUOTE=chalsall;541107]OK, thanks very much for the report. I am ***not*** seeing any CP files from the first instance, which can only be explained by the cron sub-system not being installed.[/QUOTE]

OK... I've *finally* seen an example of this happening with one of my own instances. And, as inferred, it is because of the cron sub-system not being installed.[CODE]
20200331_145125: DEEP: AptA: Reading package lists...
Building dependency tree...
Reading state information...
Suggested packages:
anacron logrotate checksecurity exim4 | postfix | mail-transport-agent
The following NEW packages will be installed:
cron
0 upgraded, 1 newly installed, 0 to remove and 25 not upgraded.
Need to get 68.8 kB of archives.
After this operation, 253 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 cron amd64 3.0pl1-128.1ubuntu1 [68.8 kB]
Fetched 68.8 kB in 1s (108 kB/s)
dpkg: error: cannot access archive '/var/cache/apt/archives/cron_3.0pl1-128.1ubuntu1_amd64.deb': No such file or directory[/CODE]

Weirdly, I have *never* had this problem with my SSH Reverse Tunnels package, which uses the exact same command to install cron.

Does anyone who regularly works with Ubuntu understand what's going on here? It looks like the package was downloaded successfully, but drilling down on the filesystem shows the package is in fact not where it should be.

Now, to figure out how to recover from this rare edge case... Just retrying the install may not be enough, since both the CPUWrapper and CPUPayload scripts attempt to install and launch cron.

PhilF 2020-04-01 17:13

If you are using aptitude or apt-get to install packages, be sure to do:

aptitude update

or

apt-get update

before installing any packages.
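Applied to the failing cron install above, that advice amounts to the following sketch; refreshing the index first ensures the cached archive paths apt resolves actually exist before the download step.

```shell
apt-get update           # refresh the package index so archive paths are valid
apt-get install -y cron  # then the .deb can be fetched and unpacked normally
```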

chalsall 2020-04-01 18:04

[QUOTE=PhilF;541501]If you are using aptitude or apt-get to install packages, be sure to do:[/QUOTE]

Thank you!!! That almost certainly explains the issue.

ixfd64 2020-04-04 19:42

I currently have 300 assignments from GPU to 72 that I ran on two work computers. They should have finished a few weeks ago, but I'm stuck at home due to the COVID-19 pandemic and couldn't submit the results. Complicating matters, I don't have remote access to these systems. Is there a limit on how long assignments can be extended?

chalsall 2020-04-04 19:51

[QUOTE=ixfd64;541791]Is there a limit on how long assignments can be extended?[/QUOTE]

How does a year sound? :smile:

Assignments aren't actually expired any longer, except for cases where the user has clearly abandoned the work.

I'd much rather you stay safe than have a few results be submitted!

Chuck 2020-04-10 00:36

Current Trial Factoring Depth for all Candidates
 
On the "Current Trial Factoring Depth for all Candidates" report page, does factoring beyond 77 bits get added in to the 77 column?


All times are UTC. The time now is 06:41.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.