#1
Romulan Interpreter
Jun 2011
Thailand
5·17·109 Posts
We can't do any TF right now (no resources) and we somehow just managed to get assigned M332381057 for a PRP first test. We didn't realize that this is under-TF-ed (currently, to 77 bits) until gpuOwl started to do P-1 instead of the expected PRP. So we are doing the P-1 right now, which will take another hour or two, and then we will stop until we are able to TF it to at least 83 bits, which will be in a week or two.
The alternative is to unreserve it and get an assignment which is at least well TF-ed. Of course, unless somebody is willing to do the TF job for us (77 to 83). Any takers? Thanks in advance.

Last fiddled with by LaurV on 2020-05-17 at 13:12
#2
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3×19×43 Posts
Quote:
#3
Romulan Interpreter
Jun 2011
Thailand
5×17×109 Posts
Aaaa.. nope. Good point. I just restarted with proper limits. The TF limit is card-dependent: a 2080 Ti should go to 83, lower cards even to 84, Titans should stop at 81, and a Radeon VII should stop at 80. I said 83, but any bit over 77 helps. But don't start yet; I will let you know if the TF is still needed. Thanks.
Edit: I mean tomorrow, or on the weekend; it is 1:20 AM here now, I am going to sleep...

Last fiddled with by LaurV on 2020-04-23 at 18:18
#4
Romulan Interpreter
Jun 2011
Thailand
5·17·109 Posts
Timeout. Break.

gpuOwl is not usable for P-1, for the cards we have and/or for the settings we have... Full stop. More research is needed. We also believe we uncovered a bug in it, but that may be coincidental. This job is indefinitely postponed. Thread moved to personal blog.

TL;DR: (to be written in the following 30 minutes or so)

Last fiddled with by LaurV on 2020-04-25 at 05:16
#5
Romulan Interpreter
Jun 2011
Thailand
5·17·109 Posts
So, last weekend we decided to switch one wheelbarrow from "a lot of TF" to "a lot of LL/PRP".
These went out: (the foot is intentional, we can prove it is ours)
This is the bare barrow:
These went in:

The "sevens" were our self-inflicted Christmas present; we got them at the end of November from B&H, which is also where we got the mobo from, many years ago. It is a wonderful mobo, about which I posted in the past. But due to our "Australian detour" (from the beginning of December to the end of January), about which we also posted in the past, we had a lot of work stacked up, so we didn't really have time for playing God with our computers.

Last week, after a lot of TF work with GPU72 and other adventures, we desperately needed to clean the water fins of the CPU cooler, as it was starting to hit the junction limit when both mfaktc and P95 were munching. So we took the toy apart and decided it was a good time to try the "sevens", which were still unboxed (well, we had a look at what was in the box when we got them, but that's all).

They went successfully, and amazingly fast, through 15 or 20 LLDCs each with gpuOwl. (We could not report all of them: they were all our own former work, not yet verified, and while we were "DC-ing our own work" some exponents were reserved by "reliable" third parties, and we didn't want to poach, or they were already DC-ed. So we only reported the results for those unreserved, or reserved by "unreliable" anonymous users, but that is another story, and we know Madpoo will not resist the temptation to TC those exponents.)

There was only a single mismatch in all those DCs; we did the TC rerun and everything turned out fine (the initial residue was right, our current DC was wrong). All the other DCs turned out well. These cards are monsters for LL/PRP/DC. With proper cooling, they can go through one 55M DC in about six to seven hours. One error in 20 tests or so is what I would call reliable for a gaming card, and "very VERY fast" for an FFT implementation. Well done!
But this is where the praise stops. We didn't like that the checkpoint history is not retained, and there seems to be no way to tell the program to keep it. We have to make a batch to check every 30 minutes or so and, if there is any gpuowl.ll file in the folder, rename it to gpuowl.001.ll, then 002, 003, etc. Then, in case of mismatches (which are properly recorded in the logs!), we can resume from the proper checkpoint and avoid wasting the time of rerunning both tests from scratch. Otherwise a lot of resources are lost, and the toy will not be suitable for "long jobs", EVER.

So, back to the story. Drunk on the success we had with the DCs, we set out to run our new assignment on both cards in parallel and compare residues along the way. This was the plan. Finding that the runs differ somewhere in the middle, with no Gerbicz error signaled, would have made the news.

However, we didn't get that far. We didn't notice that the reserved exponent was not TFed enough until gpuOwl started to do P-1 on it, instead of the PRP that we expected. We said WTF? Well, actually, wait a moment: it is a good point for The Owl that it didn't let us PRP for ages an exponent for which a factor might have been found much faster. White ball for the owl. But on the other hand, we are now STUCK with the P-1, and we are quite unsatisfied with the actual version of gpuOwl when it comes to P-1.

Sorry, Mihai. I commend you for the work you invested in this toy to make it fast, and I know you have that genius spark and you are a hardworking guy, but the owl is far from being robust or reliable, or even useful, for long jobs. Short runs, which you can repeat in case of failure: yeah, we are good. Maybe that was the goal. But long runs, no. We repeated the P-1 three times, each time on both cards, and every time they differed. We stopped each run as soon as we saw the difference, but of course, that was sometimes hours after it happened, or even when the run was already in stage 2 (where, by the way, THERE IS NO RESIDUE OUTPUT!).
Now we have ended up with a total of 6 (partial) runs (on the two cards together), with 6 (partial) log files, all 6 different. One differs from the others at iteration ~680k, and another differs from the remaining 4 at about iteration 1.7M (out of a total of about 6M iterations for a P-1 with a ~5% chance of finding a factor). These two runs are, no doubt, wrong, because the other 4 agree on the residues up to a higher iteration count. Moreover, the two wrong runs came from the same card, so that card seems to be not as reliable as the other. Or it was just unlucky this time and hit an error sooner. All video cards are prone to errors, unless you buy specially dedicated GPGPUs (we had this discussion in the past, and I explained why, from the point of view of a guy working in electronic manufacturing design/industry).

So what, you would say; maybe the other card is also wrong, or you pushed it too much. Well, that MAY BE. But my concern is that I had to run three times, on two cards, FROM SCRATCH, because there is no history and no checkpoints. But NO, that CANNOT BE. Because the other 4 files (of which 3 were run on the same card at different times, and the fourth came from a different card, which is why I assume there is a bug in the P-1 code) all start differing at iteration 2.71M. That would be too much of a coincidence...

So, future plans, all 3 in parallel, at the same time:
1. Make a clever batch file to grab the checkpoint files as soon as they are saved by the Owl, and store them in a better-organized fashion, to be able to resume in case sh!t happens. Possibly read the iteration number from inside the file (would it have been so difficult for gpuOwl to write it in the name of the file?).
2. Test how PRP really behaves; maybe thanks to the Gerbicz check (GC), this is not needed anymore, and GC can indeed save us all the trouble. Note that we haven't gone far enough yet to run a long PRP test with GC active, and the LL and P-1 we played with have no GC-like check.
3. Continue to bother Mihai on all fronts, until he is totally pissed off at us and makes the Owl to our liking (similar to what we did during cudaLucas development).

Last fiddled with by LaurV on 2020-04-25 at 07:18
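The checkpoint-rotation batch described in point 1 can be sketched roughly like this. This is a hypothetical Python helper, not anything shipped with gpuOwl; the `gpuowl.ll` filename follows the post, and actual checkpoint names may differ between gpuOwl versions.

```python
import shutil
from pathlib import Path

def rotate_checkpoint(workdir, name="gpuowl.ll"):
    """Archive the current checkpoint as gpuowl.001.ll, gpuowl.002.ll, ...
    so older states survive and a bad run can be resumed mid-stream."""
    src = Path(workdir) / name
    if not src.exists():
        return None
    stem, suffix = name.split(".", 1)              # "gpuowl", "ll"
    archived = sorted(Path(workdir).glob(f"{stem}.[0-9][0-9][0-9].{suffix}"))
    next_n = int(archived[-1].name.split(".")[1]) + 1 if archived else 1
    dst = Path(workdir) / f"{stem}.{next_n:03d}.{suffix}"
    shutil.copy2(src, dst)   # copy rather than move: gpuOwl still owns src
    return dst

# Schedule rotate_checkpoint(".") every 30 minutes from cron, Task Scheduler,
# or a trivial loop; each call leaves one more numbered snapshot behind.
```

Copying instead of renaming avoids pulling the file out from under a program that may still be writing to it; the numbered snapshots give exactly the resume points the post asks for.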
#6
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3×19×43 Posts
I wonder if some of the mismatches you are seeing could be due to differences in sieving results between runs. As I recall, I have seen P-1 be nonreproducible also. Different bounds selected by CUDAPm1 from one run to the next, or different sieving of the same bounds, may occur. Sieving differences below B1 would affect res64 matches in stage 1. Sieving differences between B1 and B2 would affect res64 matches in stage 2, if there were any res64 there to look at in gpuowl.
Last fiddled with by kriesel on 2020-04-25 at 13:39 |
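Order effects in stage 1 are easy to demonstrate at toy scale. The sketch below (Python, small numbers only, nothing gpuOwl-specific) folds the same prime powers into the stage 1 exponent in two different orders: the final residue is identical, but every intermediate residue, which is what an interim res64 samples, differs.

```python
def small_primes(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, isp in enumerate(sieve) if isp]

def stage1(p, B1, prime_order):
    """P-1 stage 1 on M_p: x = 3^(2*p*prod q^k) mod (2^p - 1).
    Returns the final residue and every intermediate residue."""
    N = 2 ** p - 1
    x = pow(3, 2 * p, N)           # the 2*p factor comes free for Mersennes
    history = []
    for q in prime_order:
        k = 1
        while q ** (k + 1) <= B1:  # largest power of q not exceeding B1
            k += 1
        x = pow(x, q ** k, N)
        history.append(x)
    return x, history

primes = small_primes(1000)
final_fwd, hist_fwd = stage1(67, 1000, primes)
final_rev, hist_rev = stage1(67, 1000, primes[::-1])
assert final_fwd == final_rev      # end of stage 1 does not depend on order
assert hist_fwd != hist_rev        # interim residues do
```

So identical end-of-stage-1 residues are a legitimate expectation even when interim residues differ; runs that disagree at the *end* of stage 1 still point to a real error, or to different bounds having been selected.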
#7
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3×19×43 Posts
Maybe try a test run or more with known-factor exponents.
P-1 selftest candidates: https://www.mersenneforum.org/showpo...8&postcount=31
General background re P-1 errors (work in progress): https://www.mersenneforum.org/showth...937#post509937

Last fiddled with by kriesel on 2020-04-25 at 15:43
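The known-factor self-test idea can be illustrated at toy scale. This sketch is a generic P-1 stage 1, not gpuOwl's code, run on a small composite chosen so that exactly one factor is B1-smooth; a correct implementation must recover the planted factor.

```python
from math import gcd

def small_primes(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, isp in enumerate(sieve) if isp]

def pm1_stage1(N, B1):
    """Generic P-1 stage 1: raise 3 to every prime power <= B1, then take a
    gcd. Returns a nontrivial factor of N, or None."""
    x = 3
    for q in small_primes(B1):
        k = 1
        while q ** (k + 1) <= B1:
            k += 1
        x = pow(x, q ** k, N)
    g = gcd(x - 1, N)
    return g if 1 < g < N else None

# 2461 = 23 * 107; 23 - 1 = 2 * 11 is 20-smooth, 107 - 1 = 2 * 53 is not,
# so a B1 = 20 run must report 23; anything else flags a broken build.
print(pm1_stage1(23 * 107, 20))  # -> 23
```

The same idea scales up to kriesel's list: feed the program exponents whose factors (and their smoothness) are already on record, and check that the factors come back.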
#8
Romulan Interpreter
Jun 2011
Thailand
9265₁₀ Posts
Quote:
But it is an interesting idea... From what I have seen of the Owl at work (talking only about P-1), if you also witnessed nonreproducibility in the past, then my confidence in the P-1 results reported to PrimeNet by all other users, assuming the work was done with gpuOwl, is zero divided by four. And that is where more work has to be done.

Finally, I managed to finish one P-1 run giving the same Stage 1 residues as one of the 6 existing previous runs, on the same card. The other card (the same card that produced the other 3 wrong residues!) started differing at iteration ~3.5 million, and I scrapped that run, together with the other 5. So, from 8 runs on 2 cards, one card produced 4 bad runs, and the other produced 2 good and 2 bad. I let Stage 2 finish (on both cards, by copying the end-of-Stage-1 checkpoints) and reported the result to PrimeNet. No factors. But the Stage 2 result is still "not sure", because there is no residue output. What if both cards just went nuts?

I started PRP, which so far matches. I also tried mfakto; these cards output about 1100-1500 GHzD/day, depending on the bit level (the higher output is for the higher levels). So the "breaking point" is somewhere at 80 bits, and James' calculation is correct. Going higher with the "sevens" is a waste of time: you could clear exponents faster by PRP and PRPDC than you can clear them by TF to 81 bits. (One PRP run takes about 17 days, assuming everything goes perfectly, with no very long restarts or a lot of "resumes"; mind that I don't know yet how well PRP runs, and how efficient GC is at catching errors and resuming in a timely manner.) Therefore, if you want to run some TF, now is the time.

Last fiddled with by LaurV on 2020-04-25 at 17:20
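The "breaking point at 80 bits" claim is just expected-value arithmetic. A rough sketch in Python: the TF timing below is an assumption picked to match the post, not a measurement, and the one-in-b factor probability and the saved-tests count are the usual GIMPS rules of thumb, not exact values.

```python
def tf_worth_it(bit, tf_days, prp_days, tests_saved=2):
    """One more TF bit level pays off while its cost stays below the expected
    PRP time it saves: a factor at bit level b turns up with probability ~1/b
    and spares a first test plus a double check."""
    return tf_days < (tests_saved / bit) * prp_days

prp_days = 17.0   # one PRP run on a Radeon VII, per the post
tf_days = 0.1     # assumed cost of the 77->78 level; roughly doubles per bit
for bit in range(78, 83):
    print(bit, "worth it" if tf_worth_it(bit, tf_days, prp_days) else "skip")
    tf_days *= 2
```

With these assumed numbers the crossover lands between 80 and 81 bits, consistent with the estimate above; plug in measured per-bit-level timings to refine it for a specific card.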
#9
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·3·19·43 Posts
Quote:
Quote:
Quote:
Last fiddled with by kriesel on 2020-04-26 at 10:07 |
#10
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·3·19·43 Posts
#11
Romulan Interpreter
Jun 2011
Thailand
5×17×109 Posts
Quote:
So PRP (from start to finish) would end in 16 days (and a bit more), which works out to about 6% (and a bit more) per day. Meanwhile, a day and a half (of the 16) has already passed, and the PRP has progressed to the expected ~10%, without any error and without any GC resume. Both cards output exactly the same residues, all the way. Which says the cards are sane, and the app is sane for PRP; most probably, judging by the LLDCs I did before (see former posts), the app is also sane for LL (except that it is very hard to use, due to the missing worktodo facilities and checkpoint history). However, the app sucks for P-1. Unless it is not supposed to be reproducible, as you said, in which case I am wrong and wasted my time. Mihai's input here would be valuable...

Last fiddled with by LaurV on 2020-04-27 at 05:36