mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU to 72 (https://www.mersenneforum.org/forumdisplay.php?f=95)
-   -   GPU to 72 status... (https://www.mersenneforum.org/showthread.php?t=16263)

James Heinrich 2020-01-06 01:30

[QUOTE=petrw1;534346]As I understand The Math...
PrimeNet has already taken all exponents up to 190M to the prescribed limits[/QUOTE]Remember those limits are at [i]least[/i] 10 years old, derived from the before-time, before GPU-TF existed. Those numbers are no longer accurate (by a large margin).
Note on my [url=https://www.mersenne.ca/graphs/factor_bits_1000M/]graphs[/url], the red curved line is the old PrimeNet TF limits as per the Math page, the purple line is PrimeNet+3 bits (old GPU72 target), the greenish line is PrimeNet+5 bits (new GPU72 target).

Prime95 2020-01-06 03:45

[QUOTE=chalsall;534344]George: your thoughts on specifically this suggestion?[/QUOTE]

I think I can do that -- if I could log on to the server.

chalsall 2020-01-06 03:59

[QUOTE=Prime95;534358]I think I can do that -- if I could log on to the server.[/QUOTE]

A fairly trivial delta. Do you not have support staff? :wink:

P.S. I find I have to be *very* careful in what I say in some spaces. The immediate above was meant to be both funny, and serious, at the same time.

Prime95 2020-01-06 04:07

[QUOTE=chalsall;534359] Do you not have support staff? :wink:[/QUOTE]

More unpaid volunteers :wink:

Prime95 2020-01-07 05:50

[QUOTE=chalsall;534312]One thing which might be helpful is if Primenet would assign work sorted by TF level desc (as with P-1, grouped by 1M range). That way GPU72 workers could focus on the high end of each range, as the wavefronts race towards us[/QUOTE]

I think this is done.

It may operate more slowly: I used to have an index that matched the sort order, and now I think SQL Server will need to scan the available exponents in a 1M range.

Let me know if I screwed up.
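For anyone curious what that ordering looks like in practice, here is a minimal sketch in Python (the server-side implementation is of course SQL against the assignment tables, not this; the exponents and bit levels below are invented):

```python
# Hedged sketch of the assignment ordering described above: candidates are
# grouped by 1M range, and within each range the deepest-TF'd exponents
# are handed out first. All exponents and bit levels here are made up.
candidates = [
    (100_123_457, 73),  # (exponent, current TF bit level)
    (100_999_999, 75),
    (101_234_567, 72),
    (100_500_011, 74),
    (101_900_001, 76),
]

def assignment_order(cands):
    # Sort key: 1M range ascending, then TF bit level descending.
    return sorted(cands, key=lambda c: (c[0] // 1_000_000, -c[1]))

for exp, bits in assignment_order(candidates):
    print(f"{exp} @ 2^{bits}")
```

Within each 1M block, GPU72 workers focusing on the high end would therefore see the most-factored candidates released first.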

storm5510 2020-01-07 12:56

1 Attachment(s)
I periodically check my assignments on the GPU72 web site. Today, I found five which were close to expiration. When I click on each number, another page appears from [I]mersenne.org [/I]which shows I have completed the assignment. See the attached image below.

The remainder, which are not highlighted, I have queued in [I]mfaktc[/I] to get them done. This includes one Colaboratory assignment. One of my instances completed a single exponent this past Sunday, then stopped part way through a second. I need to look at my assignments page much more often.

chalsall 2020-01-07 13:26

[QUOTE=Prime95;534464]I think this is done.[/QUOTE]

OK, thanks George.

I'll take a look and start bringing in some appropriate 10xM ranges for us to work.

chalsall 2020-01-07 14:15

[QUOTE=storm5510;534485]I periodically check my assignments on the GPU72 web site. Today, I found five which were close to expiration. When I click on each number, another page appears from [I]mersenne.org [/I]which shows I have completed the assignment. See the attached image below.[/QUOTE]

Hmmmm... I'll drill down on this later today.

chalsall 2020-01-07 17:13

[QUOTE=Prime95;534464]Let me know if I screwed up.[/QUOTE]

Sorry to say, it looks like something "bad" has gone down...

My spiders are seeing unusual time-outs. And according to the Primenet assignment reports very, very few new assignments are being given.

chalsall 2020-01-07 17:19

[QUOTE=storm5510;534485]Today, I found five which were close to expiration. When I click on each number, another page appears from [I]mersenne.org [/I]which shows I have completed the assignment.[/QUOTE]

OK... You were assigned these to take to 77, but you only took them up to 74.

I just reset those assignments to be to 74; they're now credited to your account.

chalsall 2020-01-07 23:52

[QUOTE=James Heinrich;534348]Note on my [url=https://www.mersenne.ca/graphs/factor_bits_1000M/]graphs[/url], the red curved line is the old PrimeNet TF limits as per the Math page, the purple line is PrimeNet+3 bits (old GPU72 target), the greenish line is PrimeNet+5 bits (new GPU72 target).[/QUOTE]

James... I was drilling down on your [URL="https://www.mersenne.ca/graphs/"]various graphs linked here[/URL], and I noticed that a couple of them don't appear to have the red/green/blue curved lines correctly rendered.

[URL="https://www.mersenne.ca/graphs/factor_bits_100M/factor_bits_100M_20200107.png"]This graph (zoomed into the first 192M ranges)[/URL] is the one I'm most interested in.

Thanks.

storm5510 2020-01-07 23:59

[QUOTE=chalsall;534513]OK... You were assigned these to take to 77, but you only took them up to 74....[/QUOTE]

I noticed this after I made the post. The thing is, I never requested 77 bits. Before this, I was getting 75 when you rolled everything over after the 74's were all done.

James Heinrich 2020-01-08 03:34

[QUOTE=chalsall;534546]James... I was drilling down on your [URL="https://www.mersenne.ca/graphs/"]various graphs linked here[/URL], and I noticed that a couple of them don't appear to have the red/green/blue curved lines correctly rendered.[/QUOTE]The (factor_bits, 1000M, 10G) variants are correct, I believe; the (384M, 100M) variants are incorrect. I thought it was a simple graphing error since the latter two pull data from a finer-grained table, but it seems I'm actually getting different data. For example, at 100M I have average NF bit levels of 69.27 and 72.62 in the two data tables. Obviously at most one of these is correct, but I'll need to do some further digging into the code that compiles said data to find where the problem is. I'll post back (tomorrow, if things go well) when I've found and fixed the problem. Thanks for pointing this out.

James Heinrich 2020-01-08 20:36

After much self-confusion (6 hours of staring at numbers and wondering why [i]a[/i] != [i]b[/i], and what kind of dummy set the data up that way), the tl;dr is that everything on the graphs was in the right place except the default-TF-level (+0, +3, +5) lines on the 100M and 384M variants: on those, the default TF level lines were plotted for exponents an order of magnitude larger (the TF level at 10M was calculated as that of 100M, etc).

So I'm right now regenerating ~5000 graphs from the last 8 years. Just like I did about this time last year when I found a different problem. :davieddy: :blush:
At ~3.5s per graph this should take about 5 hours...
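That estimate checks out, for what it's worth:

```python
# Back-of-envelope check on the regeneration run quoted above:
# ~5000 graphs at ~3.5 s each.
graphs = 5000
seconds_per_graph = 3.5
hours = graphs * seconds_per_graph / 3600
print(f"~{hours:.1f} hours")  # just under 5 hours
```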

chalsall 2020-01-08 21:25

[QUOTE=James Heinrich;534610]After much self-confusion...[/QUOTE]

Dude. Been there. Done that. Repeatedly... :wink:

Thanks, much! :tu:

kladner 2020-01-09 06:21

[QUOTE=storm5510;534548]I noticed this after I made the post. The thing is, I never requested 77 bits. Before this, I was getting 75 when you rolled everything over after the 74's were all done.[/QUOTE]
I have on occasion forgotten to reset from the default bit level, which I think is 77.

James Heinrich 2020-01-10 04:29

[QUOTE=James Heinrich;534610]So I'm right now regenerating ~5000 graphs from the last 8 years. Just like I did about this time last year when I found a different problem. :davieddy: :blush:
At ~3.5s per graph this should take about 5 hours...[/QUOTE]So 5 is really 32... I found some minor database inconsistencies that interrupted the regeneration run several times until I fixed the data, but finally all the graphs have been rebuilt:
[url]https://www.mersenne.ca/graphs/factor_bits_100M/[/url]


Of course, now I find that today's graph is drawing weird.... what did I mess up now? :bangheadonwall:
edit: Stupid Programmer Error has been fixed (the error, not the programmer, he's still broken :cmd:)

chalsall 2020-01-10 17:27

[QUOTE=James Heinrich;534745]Stupid Programmer Error has been fixed (the error, not the programmer, he's still broken :cmd:)[/QUOTE]

ROFL!!! Thanks mate!

It's really sad that we're not going to be able to continue following the green line. Oh, well... Looking forward to the next MP in the next year or so! :tu:

chalsall 2020-01-10 17:35

[QUOTE=chalsall;534312]...and over the next couple of days I'll bring in some 10xMs to bring up as best we can.[/QUOTE]

OK, just so everyone knows, I've brought in a couple of thousand candidates in 100M to take up to 74 bits. These are in 100.[89]M -- the high end of the range to take advantage of George's new sort TF desc assignment clause.

This is in preparation for Cat 2 entering 100M. I'm still wrapping my head around how to deal with Cat 3 and 4 -- they are climbing so fast that it's going to be a bit tricky figuring out how best to "feed" them 74-bit candidates. Probably best to just do a few hundred at the top of each range, and wait for the wavefronts to scream through.

The good news is there isn't actually a whole lot of Cat 3 and 4 being assigned, and many of them will expire. They are climbing so fast simply because of the speed Cat 0, 1 and 2 are being done.

Truly, WOW!!!

As always, feedback welcome. As Red Green says, "We're all in this together."

kriesel 2020-01-10 18:51

[QUOTE=James Heinrich;534745]So 5 is really 32... [/QUOTE]Well, 32 mod 24 is 7 which is close to 5. Very close for an IT estimate.[QUOTE]
edit: Stupid Programmer Error has been fixed (the error, not the programmer, he's still broken :cmd:)[/QUOTE]Those of us who dare to try to program can relate. Thanks for the laugh.

c10ck3r 2020-01-10 19:09

[QUOTE=kriesel;534787]Well, 32 mod 24 is 7 which is close to 5. ...[/QUOTE]
Is it?
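It isn't, quite. A one-liner settles it (as kriesel concedes a few posts down):

```python
# 32 hours mod a 24-hour day leaves 8 hours, not 7.
print(32 % 24)
```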

chalsall 2020-01-10 19:28

[QUOTE=kriesel;534787]Well, 32 mod 24 is 7 which is close to 5. Very close for an IT estimate.Those of us who dare to try to program can relate. Thanks for the laugh.[/QUOTE]

I read that as f(x) = 2^x.

Yes, thanks for sharing. So few understand the kind of work we do... :wink:

petrw1 2020-01-10 19:41

[QUOTE=chalsall;534786]

The good news is there isn't actually a whole lot of Cat 3 and 4 being assigned, and many of them will expire. They are climbing so fast simply because of the speed Cat 0, 1 and 2 are being done.[/QUOTE]

IMHO, as summarized above, most expire anyway, so focus on Cat 0 and 1... maybe 2.

PhilF 2020-01-10 22:14

I don't know if "GPU Factoring" is participating in GPU to 72 or not, but considering the desire to try to stay ahead of the wave front why would someone want to trial factor an exponent that has had P-1 done, and on top of that has a PRP test completed?

[url]https://www.mersenne.org/report_exponent/?exp_lo=100832383&full=1[/url]

James Heinrich 2020-01-10 23:12

[QUOTE=PhilF;534811]I don't know if "GPU Factoring" is participating in GPU to 72 or not, but considering the desire to try to stay ahead of the wave front why would someone want to trial factor an exponent that has had P-1 done, and on top of that has a PRP test completed?[/QUOTE]As I recall, "GPU Factoring" is the PrimeNet username of the GPU72 spider that grabs work. It's always reserved as Trial Factoring on PrimeNet; the actual work performed by the GPU72 user it's assigned to may or may not be TF.

In the case of [M]100832383[/M], I'm not sure if the GPU72 spider pays attention to the fact that a PRP has already been done, even though the 100M exponent has only been TF'd to an abnormally low 2[sup]72[/sup] (an illustration of the TF shortfall we're experiencing).
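A rough sketch of the economics here, assuming the standard GIMPS heuristic that a factor between 2^b and 2^(b+1) turns up with probability of about 1/b (the exact crossover depends on test costs, which are omitted):

```python
# Hedged sketch of why TF after a completed PRP test is (mostly) wasted:
# by the usual ~1/b heuristic, a factor in bit level b turns up with
# probability ~1/b. Normally a factor saves two primality tests (first
# test plus double-check); once a PRP is already done, it can only save
# the double-check, halving the expected payoff while the TF cost stays
# the same (and doubles with every extra bit).
def expected_tests_saved(bit_level, tests_remaining):
    return tests_remaining / bit_level  # ~1/b chance times tests saved

before_prp = expected_tests_saved(73, 2)  # a factor would save 2 tests
after_prp = expected_tests_saved(73, 1)   # only the DC is left to save
print(before_prp, after_prp)
```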

PhilF 2020-01-10 23:31

[QUOTE=James Heinrich;534821]As I recall, "GPU Factoring" is the PrimeNet username of the GPU72 spider that grabs work. It's always reserved as Trial Factoring on PrimeNet; the actual work performed by the GPU72 user it's assigned to may or may not be TF.

In the case of [M]100832383[/M], I'm not sure if the GPU72 spider pays attention to the fact that a PRP has already been done, even though the 100M exponent has only been TF'd to an abnormally low 2[sup]72[/sup] (an illustration of the TF shortfall we're experiencing).[/QUOTE]

Maybe it should pay attention to that, because in my view any work other than a PRP double-check would be a waste of resources at this point in time.

chalsall 2020-01-10 23:52

[QUOTE=James Heinrich;534821]As I recall, "GPU Factoring" is the PrimeNet username of the GPU72 spider that grabs work. It's always reserved as Trial Factoring on PrimeNet, the actual work performed by the GPU72 user it's assigned to may or may not be TF.[/QUOTE]

Guys, please forgive me for this, but fuck me!

This situation has been unexpected, and I'm doing the best I can. Mistakes might happen. Sorry about that.

It takes more than a little bit of my time to manage this. All unpaid.

My deepest apologies for any mistakes I make.

PhilF 2020-01-11 00:27

[QUOTE=chalsall;534826]Guys, please forgive me for this, but fuck me!

This situation has been unexpected, and I'm doing the best I can. Mistakes might happen. Sorry about that.

It takes more than a little bit of my time to manage this. All unpaid.

My deepest apologies for any mistakes I make.[/QUOTE]

Oh I hope you didn't take me wrong. I was just pointing out that if someone was reserving exponents via an errant script (I didn't know who GPU Factoring was) that they would want to know.

We've all been there, done that, and got the tee shirt.

chalsall 2020-01-11 01:59

[QUOTE=PhilF;534830]We've all been there, done that, and got the tee shirt.[/QUOTE]

Sorry. I'm currently under attack from all fronts. I actually find it somewhat amusing.

To share, I was once ~85 feet underwater when my regulator failed. I followed the training and breathed out through the regulator, and only got water back.

I remember very clearly thinking "Oh, so this is how I'm going to die".

And, then, the additional training kicked in: You have an "octopus". You have a secondary air supply immediately available. I used that and survived.

My "buddy" still to this day says "I would have noticed".

At the end of the day, we all stand alone. Be comfortable with, and manage, that.

kladner 2020-01-11 07:52

I rely on, and greatly appreciate what you do and the results thereof. :cool:
Many thanks. That doesn't pay the bills, but it is the currency I have immediately at hand. :wink:

kriesel 2020-01-11 12:14

[QUOTE=c10ck3r;534790]Is it?[/QUOTE]Ack. And all primes are odd. Give or take 1. (What happens when I get 4 hours sleep.)

chalsall 2020-01-11 17:50

[QUOTE=kladner;534849]Many thanks. That doesn't pay the bills, but it is the currency I have immediately at hand. :wink:[/QUOTE]

Thanks guys. And sorry -- yesterday was a /very/ stressful day...

I finally got off my butt, and implemented a parallel spider which looks at PRP in addition to LL state. Previously GPU72 was not aware of candidates which had a PRP done instead of an LL.

This means the [URL="https://www.gpu72.com/reports/current_level/"]Current Levels[/URL] report is once again sane, and I won't accidentally bring in candidates to TF which are really a DC.

kladner 2020-01-16 03:45

Can I assume that 1st time LL TF is still the most helpful at the moment?

Also, I ended up sticking another 16GB of RAM in the new machine, 32 total. It's doing pretty well now at DC, but is there a current need for P-1?

chalsall 2020-01-16 04:15

[QUOTE=kladner;535196]Can I assume that 1st time LL TF is still the most helpful at the moment? ... It's doing pretty well now at DC, but is there a current need for P-1?[/QUOTE]

Yes, please. LLTF; ideally WMS to at least 74 or LG72D.

And, no. We're _good_ for P-1'ing, at least for Cat 0 through 2 (currently); Cat 3 and 4 are pointless to chase with P-1'ing.

kladner 2020-01-16 15:05

Thanks Chris. While I thought of P-1 when I upped the RAM, that was not the reason. I was following Mackerel's suggestion that filling the other two slots carries the same benefits as dual rank RAM. This turns out to be the case. LL performance went up about 30% as Mack predicted. CPU still outruns the RAM in the 4GHz+ range.

At 4GHz a single worker with 6 cores is getting about 2.47ms/it. A faster CPU makes exactly no difference. The same is true at lower speeds, but I am compromising for the sake of other tasks which will be added later. I suppose I could save some electricity for now by running at 3200MHz, which seemed to be about optimal, with the same throughput as all higher frequencies.


EDIT: Wrong! Running at 3200MHz is slower. I guess it was in other configurations that I encountered the same performance at different CPU speeds.
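As a sanity check on what 2.47 ms/it means in wall-clock terms (using a made-up wavefront-ish exponent, not one of kladner's actual assignments):

```python
# Rough wall-clock estimate for an LL test at the quoted 2.47 ms/iteration.
# An LL test of M(p) needs p-2 squarings; p here is an illustrative
# wavefront-sized exponent, not a real assignment.
ms_per_iter = 2.47
p = 100_000_000
days = (p - 2) * ms_per_iter / 1000 / 86400
print(f"~{days:.1f} days")  # about 2.9 days per test at this rate
```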

chalsall 2020-01-18 14:58

SPE error on GHzDays saved graphs...
 
Just so everyone knows, srow7 pointed out to me that the GHzDays Saved graphs weren't showing the factors found for work done in the 10xM ranges. This has now been fixed.

Just in case anyone stopped doing work because of this...

chalsall 2020-01-19 13:15

DNS issues...
 
Just a heads up...

My DNS provider, OpenSRS, is having some issues with their servers at the moment; this is intermittent. They're working the issue now, but if you see any "can't resolve" messages, this is the reason.

kracker 2020-01-30 02:45

1 Attachment(s)
Noticed a weird issue (which may be my fault). I run P95, running P-1 through the gpu72 proxy... I've noticed that the assignments that P95 is running are reserved - almost all of them (haven't checked) seem to be reserved on PrimeNet by someone else... I've stopped all my computers that are running P-1 through gpu72 for the time being.

EDIT: example: [url]https://www.mersenne.org/report_exponent/default.php?exp_lo=92310481&full=1[/url]
EDIT2: I noticed a lot of them are reserved by "alelele" - is that a gpu72 spider?

chalsall 2020-01-30 13:32

[QUOTE=kracker;536228]I've noticed that the assignments that P95 is running are reserved - almost all of them(haven't checked) seem to be reserved by primenet by someone else... I noticed a lot of them are reserved by "alelele" - is that a gpu72 spider?[/QUOTE]

Strategic clearing out of abandoned assignments.

These are going to be recycled in mid-February, so your machines which work through the proxy (and have a reliable and predictable production rate) and mine have been clearing out as many as we can before then. Actually, next week I was going to stop giving your machines work to ensure all the assignments were completed before being recycled.

kracker 2020-01-30 15:59

[QUOTE=chalsall;536240]Strategic clearing out of abandoned assignments.

These are going to be recycled in mid-February, so your machines which work through the proxy (and have a reliable and predictable production rate) and mine have been clearing out as many as we can before then. Actually, next week I was going to stop giving your machines work to ensure all the assignments were completed before being recycled.[/QUOTE]

Ahh, I see. That makes sense - thanks for clearing that up!

kriesel 2020-01-31 06:02

[QUOTE=chalsall;534304]Yeah... We really need to discuss this as a team, and figure out what's the best thing to do.

Basically, because of a certain individual, GIMPS LL throughput has more than tripled in the last three months (!). This is ***amazingly*** cool news! Thanks Ben!

However, this has completely messed with the goal of having all LL assignments "optionally" TF'ed, and P-1'ed (ideally "well", by P-1'ing "specialists").

Currently, it is "optimal" to TF to 77 "bits", but with the current TF'ing "firepower" we're only producing about 50 a day; to stay "steady-state" we would need to produce 900 a day!

We do have ~60,000 candidates already ready for LL assignment, but that's only going to last us two months. What I've currently got GPU72 doing is giving out work in such a way as we "chase" ahead of the Cat 2 assignments such that they are optimally TF'ed and P-1'ed. However, it won't take long until Cat 2 also gets into the 10xM ranges.

Then, I don't know... Should we start releasing at 75, hoping to occasionally get to 76?

And/or, should we bring in work in the 10xM ranges, and start bringing them up (many are still only at 72 bits).

I would really welcome suggestions as to what people want to see happen/thinks makes sense.

And, as always (but particularly now), if anyone has any GPU compute they could bring to bear, it would be much appreciated!

Thoughts?[/QUOTE]Find a way to make more of the 95M-100M TF available to ordinary GIMPSters that are not GPU72 participants. I'm generally relegated to 100M+ and have recently been putting ~3 THzD/day into it, despite requesting the lowest exponents from the manual assignments page. Concentrate the firepower close in front of the primality wavefront. Any significant TF work on exponents greater than ~1.2x the leading edge of the primality-testing wavefront is mostly wasted, other than for software testing and benchmarking.

I'm working through the following on my little fleet to help out a bit:

1) Thoroughly tuning mfaktc and upgrading to 2047Mib-capable-gpusievesize and tuning again for that on most of my gpus; squeezing out up to 10% more from existing gear, and almost all running multiple mfaktc instances in parallel for the last additional bit of throughput; they can be driven to 100% indicated gpu load in gpu-z or nvidia-smi (benchmarking results for several models were posted in the mfaktc thread); any gpu model over ~100GhzD/day seems to benefit a little from multiple TF instances.

2) Reactivating some older gpus I had lying idle, now that I have an open-frame 6-PCIE up and running mostly (Asrock H81 Pro BTC 2.0, lowly 8GB ram single-DIMM i7-4790, also running prime95 PRPDC and only using a third of the system ram), nice big high efficiency PS. IGP refuses to take an OpenCL driver, and a couple of old gpus are not starting up currently. And ample cooling in the form of winter weather.

3) Diverting short term a GTX1080Ti from another use to TF (3 instances in parallel after 2047-capable and serious tuning gets it to 99-100% gpu load) which takes its throughput to ~1.4ThzD/day, and shifting lesser gpus to TF also;

4) Getting ready for more incoming hardware.

5) Popping the covers off some old gpus to remove the dust/lint/felt buildup after the fan; even fixed-clock old Quadros have some sort of thermal protection, perhaps shutting down some cores. One was so clogged I think it was interfering with the fan rotor and had been removed from service, and is now after a cleaning, back in the fray.

6) Further development of my own multi-gpu-app management program (makes monitoring status and collecting results easier and more efficient, especially important when running 2 and 3 TF instances per gpu in a system)

This combined TF throughput is mostly just softening up the low end of the 100M bin a bit, outside of the GPU72 flow. Lately manual TF assignments direct from mersenne.org "lowest exponents" have dropped off from 75/76 bit assignments, to recently as low as 72/73. Occasionally I would get some 95M before GPU72 scooped them up again, but that hasn't happened in a while. I have some Quadro 2000s slogging through some 95M 75/76 at ~one a day each!

For the faster newer gpus, if mfaktc and mfakto were modified a bit more, to raise the max gpusievesize above 2047Mib (currently using a signed 32-bit variable to compute bit address) to perhaps 4095Mib (unsigned 32-bit), there appears to be a bit more gain yet to be had there; at least for GTX1080Ti and up. It's likely to matter more as faster gpus come out, judging by tests from a wide variety of gpu speeds. A percent here and there, times how many gpus? Probably the equivalent of adding whole gpus to the project.
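For illustration, the 2047 ceiling does follow directly from signed 32-bit bit-address arithmetic, assuming the sieve size is counted in mebibits as described:

```python
# Why a signed 32-bit bit address caps the sieve at 2047 Mibit:
# 2^31 - 1 is the largest value a signed 32-bit int can hold, and the
# sieve code computes bit addresses up to the full sieve size.
INT32_MAX = 2**31 - 1
mibit = 1024 * 1024  # bits per mebibit

assert 2047 * mibit <= INT32_MAX  # 2047 Mibit of bit addresses fits
assert 2048 * mibit > INT32_MAX   # 2048 Mibit overflows signed 32-bit
print(INT32_MAX // mibit)         # largest whole-Mibit sieve: 2047
```

Moving to an unsigned 32-bit bit address would lift the ceiling to (2^32 - 1) // 2^20 = 4095 Mibit, matching the suggestion above.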

Perhaps Ben could shift some of his horsepower from first primality tests to LLDC and PRPDC. Even 10% would help those a lot; 20+% to LLDC would be better, as it's lagging several years behind.

axn 2020-01-31 09:50

[QUOTE=kriesel;536286]Perhaps Ben could shift some of his horsepower from first primality tests to LLDC and PRPDC.[/QUOTE]

That is the dumbest thing I've heard. It is equivalent to saying, "slow down, you might find a prime too quickly!"

Y'all are putting the cart before the horse. TF is supposed to help the project by accelerating the LL/PRP wavefront; and now that somebody has deployed LL resources to do just that, you want them to slow down?!

Just do TF 1 or 2 bits less than optimal and call it a day. That last bit has very negligible impact on project throughput compared to the previous bits. It will be a crying shame if, in the pursuit of mathematical optimality, you're letting many undersieved exponents through to P-1 and Cat 3/4 testers.

petrw1 2020-01-31 11:08

[QUOTE=kriesel;536286]Perhaps Ben could shift some of his horsepower from first primality tests to LLDC and PRPDC. Even 10% would help those a lot; 20+% to LLDC would be better, as it's lagging several years behind.[/QUOTE]

Ryan P is going great guns there

chalsall 2020-01-31 11:53

[QUOTE=kriesel;536286]Find a way to make more of the 95M-100M TF available to ordinary GIMPSters that are not GPU72 participants. I'm generally relegated to 100M+ and putting recently ~3ThzD/day into it, despite requesting lowest exponents from the manual assignments page. Concentrate the firepower close in front of the primality wavefront.[/QUOTE]

The issue is that manual TF assignments from Primenet survive for six months, and so if not processed appropriately they risk being recycled and then being given to an LL'er sub-optimally TF'ed.

GPU72 very carefully targets its resources to "feed" the various wavefronts optimally, including Cats 3 and 4 to at least 75 bits and the P-1'ers to 77. Most of the Cat 3 and 4 assignments will be recycled, and can then be brought up to 77 before being given as a Cat 2 or lower.

Please note that Cat 3 and 4 are already in the 10xM ranges, and Cat 2 is about to enter there. So any work being done there will be "useful" quite quickly (particularly considering George's new assignment sort on TF depth clause).

Lastly, while I appreciate that ~3 THzD/D is impressive, please note that for the last month GPU72's participants have averaged a total of ~300 THzD/D.

I would argue that it's better to keep the disciplined targeted firepower working the way it is now. And, again, work in the 10xMs (ideally to 76 or 77) is needed right now.

kriesel 2020-01-31 14:01

[QUOTE=axn;536294]That is the dumbest thing I've heard.[/QUOTE]Balance is good. The lag between first-test and DC is around 8 years. If the mix of effort is LL 80% / DC 20%, the lag will hold about constant. I feel reducing the lag would be good. The lag had been growing even before Delo joined: ten years ago it was about 6 years; 20 years ago, only about 3.

Outrunning the collective TF effort so some of the first-time primality testing is wasted on factorable candidates does not occur to me as an ideal plan.

If one very well funded user does essentially all of the first-time primality testing, he gets essentially all of the probability of the next prime discovery; and if he does not contribute in other areas, other participants may begin to question whether their TF, P-1, and DC in support of that is worth their time and money. Participation is dropping. It used to be over 7000 with results in the past year; now it's below 6200 and seems to be steadily declining.
[QUOTE=petrw1;536297]Ryan P is going great guns there[/QUOTE]Propper is listed in top producers as ~98% ECM, 1% DC, 1% other. Delo is 99% first-primality, 1% DC, 0% everything else. Not exactly balanced. All contributions are welcome. But the heaviest hitters are encouraged to consider how their choice of mix may affect the project, including other participants' responses.

chalsall 2020-01-31 14:14

[QUOTE=kriesel;536304]Outrunning the collective TF effort so some of the first-time primality testing is wasted on factorable candidates does not occur to me as an ideal plan.[/QUOTE]

That isn't happening. And this is GIMPS, not GIMFS.

storm5510 2020-01-31 14:19

[QUOTE=chalsall;536300]The issue is that manual TF assignments from Primenet survive for six months, and so if not processed appropriately they risk being recycled and then being given to an LL'er sub-optimally TF'ed...

...And, again, work in the 10xMs (ideally to 76 or 77) is needed right now.[/QUOTE]

"Survive for six months." This is way too long. Ten days would be plenty. If whoever takes more than they can run in this period of time, they need to cut it back.

I believe many will TF to whatever level they feel is practical. Personally, I do not take on anything above 2^75. It is the time spent versus the chance of finding a factor, or not finding one. 75s take an hour on my hardware in the 98M to 100M area. Anything beyond that is 20xx territory.
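The trade-off described here can be made concrete: each extra TF bit costs about twice the previous one, while the chance of a factor in bit level b is only ~1/b (the standard GIMPS heuristic). The one-hour base cost below is taken from the post; everything else is illustrative.

```python
# Hedged sketch of the per-bit trade-off: cost doubles with each bit,
# while the chance of a factor in bit level b is only ~1/b.
# base_hours=1.0 for the 74->75 bit on ~100M exponents is from the post.
def marginal_bit(b, base_bits=75, base_hours=1.0):
    hours = base_hours * 2 ** (b - base_bits)  # cost doubles per bit
    p_factor = 1.0 / b                          # ~chance of a factor in this bit
    return hours, p_factor

for b in (75, 76, 77):
    hours, p = marginal_bit(b)
    print(f"2^{b}: ~{hours:g} h, ~{p:.3f} chance of a factor")
```

So going from 75 to 77 bits quadruples the time per bit level while the payoff per bit barely moves, which is why where to stop is largely a matter of hardware and taste.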

chalsall 2020-01-31 14:31

[QUOTE=storm5510;536310]"Survive for six months." This is way too long. Ten days would be plenty. If whoever takes more than they can run in this period of time, they need to cut it back.[/QUOTE]

This is a policy decision made by George, further exacerbated by the fact that TF and P-1 assignments are not constrained by the LL/DC assignment rules.

[QUOTE=storm5510;536310]I believe many will TF to whatever level they feel is practical. Personally, I do not take on anything above 2^75.[/QUOTE]

And that's perfectly fine. Your kit; your choice.

kriesel 2020-01-31 14:52

[QUOTE=storm5510;536310]Ten days would be plenty.[/QUOTE]Not in my opinion. I'm personally running over 30 manually queued and reported gpu application instances, and that's climbing over time as I add hardware and add instances per gpu for greater throughput. (Working on automating managing that small but growing herd.) Ten TF assignments 75/76 queued on a slow (Quadro2000) gpu is 11 days to complete. I reserve assignments in blocks of 10 or more per gpu instance, and try to avoid them ever running dry, so latency is likely to be more than two weeks; months occasionally is not out of the question. I do my best to avoid expiration.

People do go on long vacations sometimes, or business travel, or get sick or injured, or have a term paper due or exam coming up, also.

Uncwilly 2020-01-31 15:31

[QUOTE=kriesel;536313]Not in my opinion. I'm personally running over 30 manually queued and reported gpu application instances[/QUOTE][B][U]Why are you not using MISFIT?[/U][/B] Or a home-brewed script to handle this? :bangheadonwall:

Would 2 months be OK with you? Or 3, or 4?
Assignment recycling is important. Old TF assignments should be recycled ahead of the first-time LL wave in enough time that they can all get done.

Ben may discourage some (since he depresses the chance of them finding a prime). But, overall, there is more total throughput. And the total number of users does fall in the months after the spike around a new prime discovery.

kriesel 2020-01-31 15:37

[QUOTE=chalsall;536309]That isn't happening. And this is GIMPS, not GIMFS.[/QUOTE]I took your post [URL]https://www.mersenneforum.org/showpost.php?p=534304&postcount=4550[/URL] to mean, first time testing is outrunning the project's ability to optimally TF and P-1, and you were asking for input on how to cope. Was that wrong?

If Ben Delo chose to, he could probably cause double checking to complete to M57885161 by Christmas without reducing his rate of first-primality-test completion by half. And reduce the DC backlog by years, and ease the pressure on TF and P-1 in the bargain. Come to think of it, his industrial-grade kit would probably be quite good at P-1, which as implemented is relatively lacking in error detection and correction, and is apparently falling behind.

He could also drop a few grand on RTX2080s, set up MISFIT, pay the utility bill for them, and improve the situation in TF a bit.
[QUOTE=chalsall;536300]
Lastly, while I appreciate that ~3 THzD/D is impressive, please note that for the last month GPU72's participants have averaged a total of ~300 THzD/D.

I would argue that it's better to keep the disciplined targeted firepower working the way it is now. And, again, work in the 10xMs (ideally to 76 or 77) is needed right now.[/QUOTE]Soon to be ~6 THzD/day, and later ~8. But of course, a small component of the whole is just that: a small percentage of the total. And my little effort is not GPU72, and it is as disciplined and as targeted as GPU72's laying claim to massive amounts of exponents at the wavefront allows it to be. I do appreciate the description of the juggling act for the various categories. Simple it ain't.

kriesel 2020-01-31 16:07

[QUOTE=Uncwilly;536315][B][U]Why are you not using Misfit?[/U][/B] Or a home-brewed script to handle this?:bangheadonwall:

Would 2 months be ok with you? or 3 or 4?
Assignment recycling is important. Old TF assignments should be recycled ahead of the first time LL wave in enough time that they can all get done.

Ben may discourage some (since he depresses the chance of them finding a prime). But, overall there is more total throughput. And total number of users does fall in the months after the spike around a new prime discovery.[/QUOTE]
My understanding is MISFIT only does TF. MISFIT, as I recall, requires Windows Forms and .NET on every system it's installed on. (Not so easy for Linux users.) Nothing does CUDAPm1. No script that's been released does all the gpu apps I run.

I'm running a mix of everything I feel forwards the wavefronts toward discovery. TF, P-1, LLDC, PRP DC, PRP first test, LL first test, software QA, runtime scaling and limits probing, tuning tests and documentation, etc.

I have a homebrew script that does the big 6 gpu apps, to the extent of analyzing logs, determining active or dormant/hung/stopped status, gathering new results into one file per system, computing remaining worktodo in days of gpu throughput per app instance, etc. I haven't added get-work automation to the script yet; that's probably next or near it. I recently added a user-triggered-only self-update-from-file-share function (which is the 1% of it that is not OS-independent, only implemented for Win32 so far). I work on extending it now and then. It's Perl compiled to an executable, with no requirement for installing anything else on a system it runs on. It's been almost 2 years in the works and feels like it's nearing ready for release (months). I alternate among writing code, debugging, testing, documenting, and many other unrelated tasks. Sometimes I write the documentation first and code to that.

I'd probably be fine with 3 months TF expiration. And not assigning first-tests until optimally TF and P-1 complete. Let cpu first-test fall back to DC assignments and P-1. Not really my call though.

kriesel 2020-01-31 16:34

[QUOTE=chalsall;536311]This is a policy decision made by George, further exasperated by the fact that TF and P-1 assignments are not constrained by the LL/DC assignment rules.[/QUOTE]The issues are exacerbated, and so the participants may be exasperated.:smile::beer2:
Rules can be changed.

LaurV 2020-01-31 16:38

We are with axn here, if the man has resources and likes to LL or PRP, then he should LL or PRP. You won't tell me what to do with my rig, and the goal of the project is finding primes. Why the hack do we have the same discussion two times every year? :razz:

chalsall 2020-01-31 17:13

[QUOTE=kriesel;536316]I took your post [URL]https://www.mersenneforum.org/showpost.php?p=534304&postcount=4550[/URL] to mean, first time testing is outrunning the project's ability to optimally TF and P-1, and you were asking for input on how to cope. Was that wrong?[/QUOTE]

It was accurate at the time. However, some "big guns" stepped up to help readjust to the new realities. Oliver alone dumped 6.2 PHzD of work (!) in the last month. LaurV brought ~8 THzD/D of compute back to bear to help chase ahead of Cats 3 and 4 to 75 bits. And several others have stepped up their game as well.

I'll have some time to run the numbers again this weekend to see how we're looking, but there's a chance we'll be able to keep going to 77 for Cats 2 and below for the foreseeable future.

storm5510 2020-01-31 17:25

[QUOTE=kriesel;536313]...I'm personally running over 30 manually queued and reported gpu application instances...[/QUOTE]

I can see where you might want to keep all these going. I only have [U]one[/U] now, which I use sparingly. When I do run it, I reduce it to 80% capacity. It runs cooler this way, and there is nearly no impact on throughput. Still over 1,000 GHz-d/day.

chalsall 2020-01-31 17:34

[QUOTE=kriesel;536318]My understanding is MISFIT only does TF. MISFIT, as I recall, requires Windows Forms and .NET on every system it's installed on. (Not so easy for Linux users.) Nothing does CUDAPm1. No script that's been released does all the gpu apps I run.[/QUOTE]

Have you looked at Mark Rose's [URL="https://github.com/MarkRose/primetools"]primetools[/URL] codebase?

I don't use it currently myself, but I have in the past. Reliable.

If it doesn't do everything you want already, it would probably be a good base to build upon.

kriesel 2020-01-31 20:31

[QUOTE=chalsall;536326]It was accurate at the time. However, some "big guns" stepped up to help readjust to the new realities. Oliver alone dumped 6.2 PHzD of work (!) in the last month. LaurV brought ~8 THzD/D of compute back to bear to help chase ahead of Cats 3 and 4 to 75 bits. And several others have stepped up their game as well.

I'll have some time to run the numbers again this weekend to see how we're looking, but there's a chance we'll be able to keep going to 77 for Cats 2 and below for the foreseeable future.[/QUOTE]Wow, excellent, good to see the group respond effectively to the challenge, thanks for the update.

[QUOTE=chalsall;536330]Have you looked at Mark Rose's [URL="https://github.com/MarkRose/primetools"]primetools[/URL] codebase?
...
If it doesn't do everything you want already, it would probably be a good base to build upon.[/QUOTE]Yes, I've looked at it, and summarized its feature set in a table with the other similar stuff I could find. My impression is its feature set is a small subset of what I'm aiming for. Also I don't know Python. Reprogramming vintage wetware is slow.

[QUOTE=LaurV;536322]You won't tell me what to do with my rig... Why the hack do we have the same discussion two times every year? :razz:[/QUOTE]Wouldn't dream of trying, LaurV. But some things need to be discussed now and then. If nothing else, it improves someone's understanding of how things are. In this case, definitely including mine. Things changed a lot this month since Chalsall's Jan 5 post.

kracker 2020-02-01 00:28

[QUOTE=LaurV;536322]We are with axn here, if the man has resources and likes to LL or PRP, then he should LL or PRP. You won't tell me what to do with my rig, and the goal of the project is finding primes. Why the hack do we have the same discussion two times every year? :razz:[/QUOTE]

+1
It's his firepower... he can do whatever the hell he wants with it, which at the moment seems to be prime hunting... If he wants advice or opinions about something, he'll ask for it most likely.

Something I try to live by: Just because it's your opinion doesn't automatically make it fact.

kriesel 2020-02-01 14:47

LL vs. LLDC time gap trend over time
 
At one time, DC got done within the hardware lifetime of the system that did the first test.
Flaky systems got identified, and corrective action could be taken on all their output in a timely manner (early triple check); the user could also be notified that their still-producing system was unreliable, so hardware issues could be addressed and output reliability improved.

Over the past 20 years, the gap between first test and double check has grown drastically, from under two years, to nearly a decade, longer than typical hardware lifetime.

Recent trends are toward more primality testing, less DC, so the gap will worsen (is worsening).

I think the required DC rate should be increased to get this large and growing lag under control.

The method used to gauge the gap was to view a 1000-wide range of exponents beginning at the indicated base value, and find the longest time gap within that sample interval.

base  first-LL year  longest DC gap (years)
3M    —              (most date data missing)
4M    1998           1.5 (some first-test dates missing)
5M    1998           2.2
10M   2000           3.7
30M   2005           6.9
50M   2010           8.6
>50M  tbd
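The method above (longest first-test-to-DC gap within a 1000-wide sample of exponents) can be sketched as follows. The records and dates here are invented for illustration; real data would come from mersenne.ca exponent reports.

```python
from datetime import date

# Hypothetical (exponent, first-LL date, DC date) records for one
# sample window -- NOT real mersenne.ca data.
sample = [
    (50000017, date(2010, 3, 1), date(2018, 11, 20)),
    (50000021, date(2010, 5, 9), date(2019, 1, 4)),
    (50000047, date(2011, 2, 2), date(2017, 8, 30)),
]

def longest_gap_years(records):
    """Longest first-test-to-double-check gap in the sample, in years."""
    gaps = [(dc - ll).days / 365.25 for _, ll, dc in records if ll and dc]
    return max(gaps) if gaps else None

print(round(longest_gap_years(sample), 1))  # -> 8.7
```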

axn 2020-02-01 14:54

[QUOTE=kriesel;536381]Flaky systems got identified and corrective action could be taken regarding all their output in a timely manner (early triple check), and also the user could be notified their still-producing system was unreliable and any hardware issues addressed and output reliability improved.[/QUOTE]

You've made an excellent argument to stop LL and only hand out PRP. :smile:

kriesel 2020-02-01 16:06

[QUOTE=axn;536382]You've made an excellent argument to stop LL and only hand out PRP. :smile:[/QUOTE]Where the software/hardware combination can run either, yes. I've switched nearly all my primality testing from LL to PRP on cpu and gpu.

Some older gpus with a very good DP/SP ratio, but OpenCL below 2.0, can't run gpuowl, so can't run PRP. I have a Tesla C2075 trudging through LL DC, at about 2/week in 53M, in CUDALucas, for example. (CUDALucas lacks even the Jacobi check, but the gpu has ECC RAM, and I haven't yet had a bad final residue on it.)

PRP also needs DC. Since PRP is new, it has had very little. While it's more reliable than LL, due to the excellent GEC, there have been found errors outside the GEC protected code. Lots of PRP DC are needed to provide a big enough statistical sample to gauge what the PRP/GEC reliability figure is in practice.
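As a rough illustration of the sample-size point (this is a standard Wilson score bound, not GIMPS's actual methodology), even a run of perfectly clean PRP double-checks only pins the error rate down slowly:

```python
import math

def wilson_upper(errors, n, z=1.96):
    """95% Wilson score upper bound on an error rate seen in n double-checks."""
    if n == 0:
        return 1.0
    p = errors / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + margin) / denom

# Zero mismatches in 1,000 PRP double-checks still leaves the 95%
# upper bound on the per-test error rate near 0.4%:
print(round(wilson_upper(0, 1000), 4))  # -> 0.0038
```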

It would be good for the GIMPS project if some top producers did more DC, either LL DC or PRP DC or both. And transitioned more completely from LL to PRP on their first tests.

And as Woltman occasionally reminds us, there might be a missed prime lurking in the ~2% of LL tests that have bad residues, that have waited years for a double check.

I don't know that there's any good solution to a user running unreliable hardware, either through lack of knowledge of the problem or lack of interest in improving reliability.

Dylan14 2020-02-01 19:01

I'm not sure if this is a bug or not, but on the GPU72 site, if you log in on one version (say, the version whose language is determined based on your browser) and then switch to another language, the site will ask you to log in again to access your account. As I presume all the languages are on the same server, shouldn't one login suffice for all languages?

chalsall 2020-02-01 19:11

[QUOTE=Dylan14;536402]As I presume all the languages are on the same server, shouldn't one login suffice for all languages?[/QUOTE]

This is how HTTP Basic Auth works. Even though all the different language subdomains are indeed on the same server, as far as your browser is concerned they are different.
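For the curious, a minimal sketch of why: Basic Auth credentials are just a header the browser caches per (scheme, host, port, realm), so each language subdomain triggers its own challenge. The header itself is built like this (example credentials are the ones from RFC 7617, not real ones):

```python
import base64

def basic_auth_header(username, password):
    # RFC 7617: the Authorization header carries base64("user:pass").
    # Browsers cache it keyed by scheme/host/port/realm, which is why
    # each gpu72.com language subdomain prompts separately, even when
    # every name resolves to the same server.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

print(basic_auth_header("Aladdin", "open sesame"))
# -> {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```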

kladner 2020-02-03 07:11

I came home from work about quarter to eleven CST to find no mfaktc running on my sole surviving GPU, a GTX 1060. .bat file restart attempts failed instantly. From command line, trying to run 'mfaktc -st' ended in an error message about mismatches in CUDA versions in different contexts. After mucking about a bit, I had the bright idea to just update the graphics driver. The nvidia installer could not find any devices requiring the driver because I only use the card for computing, and run the display from the onboard Intel.

I am shut down for TF, at least for the moment. At which moment I am too tired, and working on being too "relaxed", to pursue reconnecting the 1060 by whatever compatible cable (preferably DVI) I can dig up, to update the driver. Meanwhile, I am down to a very laid-back 'garden' of an i7 6700k (4 cores, 1 worker, 4.3 GHz), and an i7 9700k (6 of 8 cores, 1 worker, 4.3 GHz). Both doing LLDC with DDR4-3200, dual rank or equivalent DRAM.

Life has setbacks. Deal with it. However, my guess is that Win 10 updated something that broke my mfaktc.

kriesel 2020-02-03 11:34

[QUOTE=kladner;536541]I came home from work about quarter to eleven CST to find no mfaktc running on my sole surviving GPU, a GTX 1060. .bat file restart attempts failed instantly. From command line, trying to run 'mfaktc -st' ended in an error message about mismatches in CUDA versions in different contexts. After mucking about a bit, I had the bright idea to just update the graphics driver
....

Life has setbacks. Deal with it. However, my guess is that Win 10 updated something that broke my mfaktc.[/QUOTE]
Maybe [URL]https://www.lifewire.com/how-to-roll-back-a-driver-in-windows-2619217[/URL] helps.
Finding out why it went awry and preventing a recurrence is needed too. It should not have updated to an incompatible version though.

kladner 2020-02-03 17:10

[QUOTE=kriesel;536552]Maybe [URL]https://www.lifewire.com/how-to-roll-back-a-driver-in-windows-2619217[/URL] helps.
Finding out why it went awry and preventing a recurrence is needed too. It should not have updated to an incompatible version though.[/QUOTE]
Thanks for the link. After going through some of the suggested procedures I have mfaktc running again. I think it was the "Update driver" approach that fixed it after a few attempts with the different options available.

ixfd64 2020-02-04 04:22

It seems some assignments are either being poached or being reassigned to other users. I should have 400 outstanding assignments, but GPU to 72 says I only have 283.

chalsall 2020-02-04 12:41

[QUOTE=ixfd64;536625]It seems some assignments are either being poached or being reassigned to other users. I should have 400 outstanding assignments, but GPU to 72 says I only have 283.[/QUOTE]

Can you give me any examples?

There's no recycling going on, and everything being given out is actually "owned" by GPU72.

chalsall 2020-02-04 14:31

[QUOTE=chalsall;536639]Can you give me any examples?[/QUOTE]

Hmmm... It looks like our "friend" Niels_Mache_Nextcloud is working "off the books" again... Not directly related to you, but [URL="https://www.mersenne.org/report_exponent/?exp_lo=107879231&full=1"]107879231[/URL] is an example of him blindly redoing work which someone else had already done.

Damn, I wish he wouldn't do this. There's no way I can predict what he'll poach, and it discourages legitimate participants.

ixfd64 2020-02-04 17:22

I'll have another word with him. And it definitely doesn't help that George is often unresponsive to requests to manually add results that have been rejected by the server.

chalsall 2020-02-04 17:28

[QUOTE=ixfd64;536656]I'll have another word with him.[/QUOTE]

Thanks.

[QUOTE=ixfd64;536656]And it definitely doesn't help that George is often unresponsive to requests to manually add results that have been rejected by the server.[/QUOTE]

It's a small consolation, I know, but you /do/ get the credit on GPU72 for any poached TF assignments.

linament 2020-02-04 17:31

Double check assignments
 
This morning I was able to bring up a GPU72 Trial Factoring Colaboratory Notebook with an access key, using a work type of "GPU72 decides." I noticed a little while ago, looking at my View Assignments, that I have been doing DC TF in the 102M range (e.g. 102432997). It seems a little odd to be doing DC in this range. If it is truly DC work, why is it important, seeing that the DC wavefront is decades from here?

chalsall 2020-02-04 17:39

[QUOTE=linament;536658]It seems a little odd to be doing DC in this range. If it is truly DC work, why is it important seeing that the DC wavefront is decades from here?[/QUOTE]

Colab "Let GPU72 Decide" is temporarily doing high DCTF which are at very low TF levels. This is just a bit of "clean-up".

If you would prefer to do LLTF, change your Notebook Instance's work-type.

chalsall 2020-02-06 14:58

[QUOTE=chalsall;536659]Colab "Let GPU72 Decide" is temporarily doing high DCTF which are at very low TF levels. This is just a bit of "clean-up".[/QUOTE]

Just so everyone knows, Colab LG72D is back to doing "breadth-first" LLTF work.
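For anyone unfamiliar with the jargon: "breadth-first" raises many candidates one bit level at a time, while "depth-first" takes each candidate straight to the target level. A toy sketch of the two orderings (exponents and levels invented; GPU72's real scheduler juggles categories, wavefronts, and work types on top of this):

```python
# Toy illustration of breadth-first vs depth-first TF assignment order.
candidates = {106000031: 74, 106000033: 74, 106000039: 75}
TARGET = 77

def breadth_first(cands):
    """Raise every candidate one bit level per pass until all hit TARGET."""
    work, levels = [], dict(cands)
    while any(b < TARGET for b in levels.values()):
        for p in sorted(levels):
            if levels[p] < TARGET:
                work.append((p, levels[p], levels[p] + 1))
                levels[p] += 1
    return work

def depth_first(cands):
    """Take each candidate all the way to TARGET before starting the next."""
    return [(p, b, b + 1) for p in sorted(cands) for b in range(cands[p], TARGET)]

print(breadth_first(candidates)[:3])  # three different exponents first
print(depth_first(candidates)[:3])    # one exponent, taken three levels deep
```

Both orders do the same total work; they differ only in when each exponent reaches the target depth, which is what matters for feeding the LL wavefront.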

chalsall 2020-02-10 23:27

[QUOTE=kladner;535196]It's doing pretty well now at DC, but is there a current need for P-1?[/QUOTE]

Actually... There is perhaps a need for some "quality" P-1'ing...

Cats 0 and 1 are more than fine, but Cat 2 might be worth "chasing" with P-1'ing. For this reason GPU72 is now only offering P-1'ing at "high" levels, in order to have at least a few ready for Cat 2 workers as they climb into the 10xMs.

So, anyone who might enjoy doing some near-term needed P-1'ing, please consider doing some through GPU72. On the other hand, those who simply want to process as many P-1s as possible per "wall-clock" time, consider getting the P-1 work directly from Primenet.

Or, perhaps, I should offer a new P-1'ing worktype?

Thoughts?

petrw1 2020-02-11 00:28

[QUOTE=chalsall;537263]
Or, perhaps, I should offer a new P-1'ing worktype?

Thoughts?[/QUOTE]

You already have this....don't you?

kladner 2020-02-14 16:59

[QUOTE]Actually... There is perhaps a need for some "quality" P-1'ing...[/QUOTE]
Would the desirable assignments come in through the proxy with P95 set to P-1? No assignments available through GPU72 manual page...

chalsall 2020-02-14 17:13

[QUOTE=kladner;537577]Would the desirable assignments come in through the proxy with P95 set to P-1? No assignments available through the GPU72 manual page...[/QUOTE]

It would be cool if you got them through the proxy (for tracking purposes), but...

They're also available from the manual assignment form. You just pointed out an SPE error in my preview code; it was constrained by a <99,000,000 clause...

Man, who would have thunk it? :smile: :tu:

chalsall 2020-02-14 17:16

[QUOTE=petrw1;537270]You already have this....don't you?[/QUOTE]

Ah, good point... When I have some cycles I'll bring in some candidates to P-1 in the high 100M range, in prep for Cat 2 entering there (~2 months or so). Then the options (Lowest, Highest) will make sense again.

chalsall 2020-02-14 18:04

So, a quick update (I'll have some cycles over the weekend to do a more in-depth report on where we are).

Right now, we are under a bit of a crunch in 106M (Cat 4).

Ben's throughput has stabilized a bit, so [URL="https://www.mersenne.org/primenet/graphs.php"]Primenet's net LL throughput is averaging about 680 LL's a day[/URL] (down from ~900 a day).

Because of this, the [URL="https://www.mersenne.org/thresholds/"]assignment ranges ("Cats")[/URL] have actually been dropping for Cats 1 through 3. Because I didn't foresee this, we've had to release some candidates at 76 and 75 "bits" in 106M.

Because of the recent availability of Colab again, it would be good if people choose "LL Depth First" for at least a few of their [URL="https://www.gpu72.com/account/instances/"]Instances[/URL]. This will assign work to 77 bits in the high 106M range.

Please know that each 76-to-77 run takes approximately 2 hours on a full T4. But this helps the P-1 job that is done on the CPU before the LL test (in addition, of course, to eliminating candidates).
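As a back-of-envelope sketch: each TF bit level covers roughly twice as many candidate factors as the one before, so runtime roughly doubles per level. Only the 2-hour 76-to-77 figure comes from this post; everything else below is extrapolation.

```python
# Rule of thumb: TF runtime roughly doubles per bit level.
# Calibrated to the ~2 h quoted for a 76->77 run on a full Colab T4.
HOURS_76_TO_77 = 2.0

def est_hours(from_bits, to_bits):
    """Estimated T4 hours to TF one exponent from from_bits to to_bits."""
    return sum(HOURS_76_TO_77 * 2 ** (b - 76) for b in range(from_bits, to_bits))

print(round(est_hours(74, 77), 2))  # 74->77 in one go: -> 3.5
```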

But, as always, it's your time (the compute is free). WMS or Breadth First is helpful too!

Man, what a problem to have!!! :wink:

kladner 2020-02-14 20:16

[QUOTE=chalsall;537578]It would be cool if you got them through the proxy (for tracking purposes), but...

They're also available from the manual assignment form. You just pointed out an SPE error in my preview code; it was constrained by a <99,000,000 clause...

Man, who would have thunk it? :smile: :tu:[/QUOTE]
Happy to go through the proxy. Just checking that would get the desired results. :smile:

chalsall 2020-02-14 20:28

[QUOTE=kladner;537594]Happy to go through the proxy. Just checking that would get the desired results. :smile:[/QUOTE]

Always a good idea to check. SPEs abound with things moving so quickly... Oh, and I just noticed that your machine grabbed a batch through the proxy. Sweet!

Just so everyone knows, I've also brought in 200 candidates to P-1 in the high 100M range. For anyone willing to work up there, they're available from the manual assignment page by choosing "Highest Exponent" under the "Options" field.

James Heinrich 2020-02-15 16:04

Are the ranges on [url=https://www.gpu72.com/account/factoring_cost/p-1/]Individual Factoring Cost P-1[/url] hardcoded? On my report I have scales of 45M-73M and 2^69-74, whereas I'm pretty sure I've P-1'd outside both of those.

kladner 2020-02-15 16:23

[QUOTE=chalsall;537595]Always a good idea to check. SPEs abound with things moving so quickly... Oh, and [U]I just noticed that your machine grabbed a batch through the proxy[/U]. Sweet!
[/QUOTE]
I fumbled around changing P95 settings. This led to a bunch of P-1 assignments getting thrown back at one point. That machine now has 15 minutes to go on its last DC. After that, P-1 will take over.

As I remember, there is some benefit to P-1 from multi-threading, but maybe not beyond 2 cores per worker. Again, this box has 32 GiB RAM. Any opinions on running 2 versus 4 (or 8) workers on an 8 core CPU with decent memory speed for dual channel?

EDIT: The DC finished. I reset the worker windows to four with 2 cores each, 28 GiB RAM allowed, all set for P-1. Repeatedly, at startup, the first worker pulls in another DCLL. I have unreserved the DC several times and it keeps coming back. This machine has been running DC for weeks with no problems. Why can't I make the first worker run P-1?


EDIT 2: Computer properties in GIMPS was set to DC for the first worker. I change that to P-1, and worker 1 still gets a DCLL. I really don't want an LL running there. What goes on?
NOTE: I guess this maybe belongs in the PrimeNet sub-forum, but it started here, and it's troublesome to change it now.

chalsall 2020-02-15 21:50

[QUOTE=James Heinrich;537642]Are the ranges on [url=https://www.gpu72.com/account/factoring_cost/p-1/]Individual Factoring Cost P-1[/url] hardcoded?[/QUOTE]

Sigh... Yes... :smile:

Added to my Todo list...

chalsall 2020-02-15 21:53

[QUOTE=kladner;537645]NOTE: I guess this maybe belongs in the PrimeNet sub-forum, but it started here, and it's troublesome to change it now.[/QUOTE]

OK... This is /possibly/ an SPE on my part. I seem to remember that the proxy sometimes had problems picking up when a preferred work-type was changed.

Something to try: Change the settings on Primenet, then change them on the client and have it communicate with Primenet.

If that doesn't work, PM me the machine name in question.

chalsall 2020-02-15 22:01

Colab Automatic Submissions!!!
 
OK, so I finally climbed into the code needed to do this. Was a /lot/ more work than I first thought (needed some rather interesting hacking; thanks to George and Aaron for the support and assistance)...

The automatic submission of completed work by Colab Instances is now (almost completely) working. The only thing left to do is handle "Factor Found" messages. I'm waiting for one of my own instances to find a factor so I can test the code path.

For those whose Primenet Username (not Display Name) I already knew, I've activated this. Basically, within a minute of a result being returned to GPU72, it is then sent to Primenet, and credited at both locations.

I need to build a form for this, but if anyone who's "Colabbing" still sees (oldish) results in their report, please PM me your Primenet USERNAME (but NOT your Password; I don't need, nor want, that).

A "virtual" computer is created on Primenet for each account called "GPU72_TF". This is what the results are submitted as.

This should make people's lives a whole lot easier. Just spin up an instance, and then forget about it (for a while). :smile:

Chuck 2020-02-15 23:17

Good work
 
[QUOTE=chalsall;537672]OK, so I finally climbed into the code needed to do this. Was a /lot/ more work than I first thought (needed some rather interesting hacking; thanks to George and Aaron for the support and assistance)...

The automatic submission of completed work by Colab Instances is now (almost completely) working. The only thing left to do is handle "Factor Found" messages. I'm waiting for one of my own instances to find a factor so I can test the code path.
[/QUOTE]

Good work. A very welcome addition. :cool:

Chuck 2020-02-15 23:23

[QUOTE=chalsall;537582]
Because of the recent availability of Colab again, it would be good if people choose "LL Depth First" for at least a few of their [URL="https://www.gpu72.com/account/instances/"]Instances[/URL]. This will assign work to 77 bits in the high 106M range.
[/QUOTE]

I thought WMS always reflected your requirements at the moment.

chalsall 2020-02-15 23:37

[QUOTE=Chuck;537676]I thought WMS always reflected your requirements at the moment.[/QUOTE]

It generally does. Although for Colab I don't have a WMS, but instead a "Let GPU72 Decide" (LG72D).

It can be a bit of a shock for people to suddenly get the "heavy" work, so I generally keep LG72D to be no more than 76 bits, and leave the "Depth" setting for those who understand what they're committing to (with LG72D prepping the work for the last step).

It's a constant juggle; keep the TF'ers happy, while keeping the P-1'ers and LL'ers fed... :smile:

Chuck 2020-02-16 00:03

Oh I forgot that Colab was "Let GPU72 Decide" instead of WMS.

Chuck 2020-02-16 03:34

Colab sessions not getting work
 
My COLAB sessions are not getting new work.

The last number I factored was 102995749, and then the session exited. I tried creating a new notebook but it still did not fetch work. I had just changed to "Depth first".

I changed back to GPU72 Decides and it successfully fetched work again.

chalsall 2020-02-16 04:19

[QUOTE=Chuck;537687]I had just changed to "Depth first". I changed back to GPU72 Decides and it successfully fetched work again.[/QUOTE]

Stick a fork in me; I'm done! :sad:

Sorry... SPE... I was trying to retarget to 102M, but forgot to change the low value. DWIM!!!

This should be working again; thanks for attempting Depth. Heading to bed now; will confirm sanity in the morning (of my code, not me. I'm definitely certifiable...).

Chuck 2020-02-16 14:09

All is well...thanks for quick fix.

I like the way the results submission process for Colab works using the GPU72_TF computer name. This separates out the work on my PrimeNet CPU report so I can see how much is coming from Colab.

James Heinrich 2020-02-18 00:14

[QUOTE=chalsall;537672]The automatic submission of completed work by Colab Instances is now (almost completely) working.
For those whose Primenet Username (not Display Name) I already knew, I've activated this.[/QUOTE]I notice that the returned results are queued in GPU72 for a minute or so, which is fine, but the text shown below says:[quote]Copy and paste the above results lines (only those in the green cell) into the Primenet Manual Results page.
Or, better yet, please email us with your Primenet USERNAME (but NOT your PASSWORD), and the results will be submitted automatically.[/quote]Perhaps it would be better to show that only for users for whom automatic submission is not enabled, and for those for whom it is to display something like "the above result(s) will be sent to Primenet automatically within a minute or so".

chalsall 2020-02-19 17:50

[QUOTE=James Heinrich;537796]Perhaps it would be better to show that only for users for whom automatic submission is not enabled, and for those for whom it is to display something like "the above result(s) will be sent to Primenet automatically within a minute or so".[/QUOTE]

Thanks... I often forget about the humans... :wink:

So, it appears that Colab is definitely viable again! A massive thanks to Google for all the compute. I have to assume they understand, and are tolerant of, what we're up to. :tu:

Because of this, I have invested the time in getting the GPU_TF Notebook to also run CPU jobs in parallel. This is close to being ready for production -- will be running initially with some beta testers (beyond just me).

Although, to be honest, there's not a whole lotta power there. Only a single (hyperthreaded) core of a Xeon @ 2.30GHz. 12G of RAM, though, so good for (slowish) P-1'ing. And a pity to just let it sit there idle.

In preparation for taking this "live", I have enabled the GUI function for CPU Worktype selection at the [URL="https://www.gpu72.com/account/instances/"]Instance List and Edit pages[/URL]. Note that by default every instance is set to "Let GPU72 Decide" -- I haven't actually figured out what makes sense yet. Perhaps some ECM'ing?

But, importantly, anyone who doesn't want a CPU job to be run in parallel, please choose the "Disabled" option. Then the CPU is left for whatever you might want to do with it.

Pointing out SPEs is appreciated. As are thoughts and suggestions.

