I did some experimenting with clLucas over the past month; however, I also had some system moves and liquid cooling changes at the same time. I ended up with one successful double check out of four. In one case I ran the same exponent twice and got different, non-matching residues each time. I was hoping to devote some GPU effort to the DC front, but I think I will wait until I can devote more time to tuning the cards for LL work before I continue.

[url]http://www.mersenne.org/report_exponent/?exp_lo=40452631&full=1[/url] verified
[url]http://www.mersenne.org/report_exponent/?exp_lo=40885763&full=1[/url] failed to match
[url]http://www.mersenne.org/report_exponent/?exp_lo=40458697&full=1[/url] failed to match twice

I also had one suspect LL result on my Macbook Pro, which has typically been very reliable, if anyone wants to double check it.
[url]http://www.mersenne.org/report_exponent/?exp_lo=72958471&full=1[/url]

What is the group stance on self double/triple checking? I have a number of machines and would normally assign any suspect results to another machine for a double check with my own resources, but if that is going to be viewed as less reliable later, I will leave them for others to follow up on. I would just prefer to pick up after myself.
[QUOTE=Madpoo;418021]All the more reason why a suspected bad system should only be doing double-checks until we're sure it's running okay, but I don't really know how that would work in practice.[/QUOTE]
Tangentially related to this... Since DC is currently about 10 years behind LL, might it make sense to do some DC'ing (perhaps 10%) on candidates where the original LLer's machine(s) success rate isn't yet known at all? Perhaps broken down by year / version. Once this is done, it would help narrow down suspect machines.

I would be happy to devote, say, 50% to 75% of my cycles to such an effort (I will still want to do some low DC'ing to ensure the ongoing sanity of my systems). Thoughts?
[QUOTE=airsquirrels;418022]I also had one suspect LL result on my Macbook Pro, which has typically been very reliable, if anyone wants to double check it.
[url]http://www.mersenne.org/report_exponent/?exp_lo=72958471&full=1[/url][/QUOTE] I would have taken it if it hadn't already been assigned to curtisc.

[QUOTE=airsquirrels;418022]What is the group stance on self double/triple checking? ... I would just prefer to pick up after myself.[/QUOTE] Generally frowned upon. Even for entirely trusted workers like you and LaurV et al., the consensus seems to be that candidates should be cleared by different actors, just to (try to) be absolutely certain.
[QUOTE=airsquirrels;418022]What is the group stance on self double/triple checking? I have a number of machines and would normally assign any suspect results to another machine for double check with my own resources, but if that is going to be viewed as less reliable later I will leave them for others to follow up on. I would just prefer to pick up after myself.[/QUOTE]
Like chalsall mentioned, it's generally frowned upon (I think?). I personally don't think we should even accept a self-verified double-check... the whole system of double-checking relies on the 64-bit residue being masked until it's verified, to avoid shenanigans. But those residues are known to whoever did the first test. Not to say any of the good, hard-working people out there are up to no good for no good reason, but still... you never know.

That's why I made it a personal mission to do independent triple checks for anything like that, and we had some great help earlier this year in clearing out years and years' worth of built-up candidates. Now, with changes to the assignment code, you can't accidentally get an automatic double-check of something you already tested, but there are still odd cases where someone manually works on something they did previously (or, in the case of curtisc's systems, I think the same assignment accidentally makes it onto more than one machine... that happens a couple of times per month, sadly). I still query for those and clear them out when I find them.

So if you had matched on that one test you ran twice, you can bet I would have done an independent test anyway. Like I always tell people: it's not that I don't trust you, it's so that others on the outside, or looking back at the project, will have more trust in the system itself. :smile:
That all makes good sense to me. Those of us "paying attention" can just swap double checks with each other. Checks and balances, etc.
[QUOTE=chalsall;418023]Tangentially related to this...
Since DC is currently about 10 years behind LL, might it make sense to do some DC'ing (perhaps 10%) on candidates where the original LLer's machine(s) success rate isn't yet known at all? Perhaps broken down by year / version. Once this is done it would help narrow down suspect machines. I would be happy to devote, say, 50% to 75% of my cycles to such an effort (will still want to do some low DC'ing to ensure the ongoing sanity of my systems). Thoughts?[/QUOTE]

It's a good idea... I did run through any systems that had no verified good/bad results and ran at least one double-check of their work, just to get them on the scoreboard, so to speak. Fortunately, the # of systems without any double-checks being done on them at all was relatively low... probably because many of them have done a double-check of their own, which gets them on the board.

There are some interesting avenues for picking out bad systems, one of them being to work through the 5100+ exponents that have been checked twice without a match. Of particular interest are the ~1000 of those where neither of the systems reported its result as suspect, but clearly one of them is wrong (maybe both... it happens). Quite a few of those don't have enough track record to say whether one or the other is bad. I can manually look at the different systems and make a guess, but it's not always clear-cut which one I'd pick as the "winner".

Anyway, the point being that when a machine messes up a result and there were enough errors to mark it as suspect, that's actually a good thing... it means the error was noticed in some way, and those exponents get handed out again as if it were a first check. But when a machine messes up and didn't even know something went awry, that's where it gets interesting...

So if you're curious about those, I can spit out lists of exponents that need a confirming triple-check where both previous runs gave their effort a clean bill of health. I work on those every now and then when the well of obviously bad first-time checks runs a little dry. Quite a few of them (208 out of 1020 right now) are in the < 36M range.

I could probably even narrow it down further to pull out just the exponents where both machines seem to be doing pretty well (or where both of them suck). I have some queries like those now, but they're kind of hard on the SQL server, so I don't run them often. Me and my inefficient queries...

For example, if I look for DC'd but mismatched exponents in the 35M-36M range where both systems involved have more good than bad, it's a list of only 32 exponents. One way or another, someone's track record is going to get worse once the triple-checks are done. :smile:

Example:
[CODE]exponent  Bad Good Unk Solo Sus Mis  CpuId
35926841    0   10   3    2   0   1   system #1
35926841    0    2  11   10   1   2   system #2[/CODE]
I could guess that system #1 is correct, but it's far from a sure thing.

Another example:
[CODE]exponent  Bad Good Unk Solo Sus Mis  CpuId
35830493    1    2   3    2   2   3   system #1
35830493    0  152  82   78   0   4   system #2[/CODE]
Okay, clearly system #2 is the right one, but then again, there's a first time for everything, and system #1 is still technically "more good than bad". :smile: Actually, in this case my analysis already guesses that system #2 is the right one, since it has a lot of good and zero bad. That's where the "1 bad" result on system #1 comes from: my automatic "guess". But it could be wrong.

Example 3:
[CODE]exponent  Bad Good Unk Solo Sus Mis  CpuId
35726939    0    3   6    5   0   1   system #1
35726939   18   87  23   16   0   7   system #2[/CODE]
Okay, this one is harder... system #2 has a lot of good, but geez, 18 bad. System #1 has 3 good / zero bad, but that's not really a lot to work with in terms of saying "yeah, sure, it's probably correct."
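The kind of guess described in the examples above can be sketched as a small scoring heuristic. This is only my own illustration of the idea: the (bad, good) pairs follow the table columns, but the threshold is an assumption, not Madpoo's actual analysis query.

```python
def likely_winner(sys1, sys2):
    """Guess which system's residue is more trustworthy, given each
    system's (bad, good) track record. Returns 1, 2, or 0 when the
    records are too thin or too close to call."""
    def score(bad, good):
        total = bad + good
        if total == 0:               # no verified history at all
            return None
        return good / total          # fraction of verified-good results
    s1, s2 = score(*sys1), score(*sys2)
    if s1 is None or s2 is None:
        return 0
    if abs(s1 - s2) < 0.2:           # assumed threshold: too close to call
        return 0
    return 1 if s1 > s2 else 2

# The three examples from the post, as (bad, good) per system:
assert likely_winner((0, 10), (0, 2)) == 0    # example 1: far from sure
assert likely_winner((1, 2), (0, 152)) == 2   # example 2: clearly system #2
assert likely_winner((0, 3), (18, 87)) == 0   # example 3: too close to call
```

With a 0.2 margin the heuristic agrees with the verdicts in the post: only the second example is clear-cut, while the first and third stay ambiguous and would need a triple-check either way.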
[QUOTE=Madpoo;418021]
May 11 = 3 good in one day
May 11 = non-suspect, but mismatched first result
June 3 = bad
June 6 = suspect
June 16 = bad

So, it's kind of spread out... After June 16th of 2014 that system got upgraded to a newer Prime95 and it was doing MUCH better.[/QUOTE]

Yeah, those May/June bad/suspect results made me work on the memory timings. I eventually gave up on trying to run the memory at 2400MHz and switched to 2133. It's been fine ever since.

You may also remember it was at this time that I implemented the server option to get a percentage of DC assignments. So some good came out of the machine becoming unreliable.
[QUOTE=airsquirrels;418022]What is the group stance on self double/triple checking? I have a number of machines and would normally assign any suspect results to another machine for double check with my own resources, but if that is going to be viewed as less reliable later I will leave them for others to follow up on. I would just prefer to pick up after myself.[/QUOTE]
You have matched results from different machines? Then report them. Let Madpoo deal with the headache :razz: Especially in the 100M-digit range. As the exponents get larger and larger, catching a rounding error as fast as possible saves AGES of work, and not all of us have ECC memory in our boxen.

Do you have two similarly fast CPUs or GPGPUs? Then run them in parallel on the same exponent, in different folders, and make a batch script that tests whether the residues match. Always check that they start with different shift values. For two Titans running cudaLucas, for example, the residues are saved in the file names, so you only need to compare file names between the two folders. Leave a perl/batch script running there that checks the contents of the folders every 15 minutes; when new files are detected, their names are compared. If a mismatch is detected, stop both cards (send ctrl-c) and resume from the last good saved file (substitute the file in the main folder, then resume).

Always keep the last two or three checkpoints, even when they match. I once had a situation where the file name was fine but, due to disk errors or whatever, the [U]content[/U] of the file was corrupt; in that case the next checkpoint doesn't match, and it keeps failing no matter how many times you retry, so you need to resume from the (n-2)th or (n-3)rd checkpoint. Delete the older checkpoint files beyond that. (Of course, comparing the files by content doesn't make sense, and you cannot catch content corruption that way: the two runs use different shifts, so their file contents are totally different.)
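The watcher script described above might look something like the following sketch in Python (instead of perl/batch). The filename pattern is an assumption for illustration; adjust `RESIDUE_RE` to whatever your cudaLucas build actually writes, and wire the mismatch branch up to however you stop and restore your runs.

```python
import os
import re
import time

# Assumed checkpoint naming: the file name ends with the 64-bit residue
# as 16 hex digits, e.g. "c40452631.2f13a9b4c5d6e7f8". Hypothetical!
RESIDUE_RE = re.compile(r"\.([0-9a-f]{16})$", re.IGNORECASE)

def latest_residue(folder):
    """Return the residue encoded in the newest checkpoint file, or None."""
    files = [f for f in os.listdir(folder) if RESIDUE_RE.search(f)]
    if not files:
        return None
    newest = max(files, key=lambda f: os.path.getmtime(os.path.join(folder, f)))
    return RESIDUE_RE.search(newest).group(1).lower()

def watch(folder_a, folder_b, interval=15 * 60):
    """Every `interval` seconds, compare the newest residues of both runs."""
    while True:
        ra, rb = latest_residue(folder_a), latest_residue(folder_b)
        if ra is not None and rb is not None and ra != rb:
            print("MISMATCH: stop both cards, roll back to an older checkpoint")
            break   # here you would ctrl-c both runs and restore saved files
        time.sleep(interval)
```

Note this compares only the newest file in each folder, so it assumes both runs checkpoint at roughly the same iteration; in practice you would match checkpoints by iteration count as well.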
Even if someone [U]always[/U] triple-checks your self-DC, and even if your cards hit a mismatch a few hundred million iterations in, you still save time for the project compared with the "classic" situation, where a test with errors runs all the way to the end. The error can happen near the beginning or near the end of any LL test, so statistically half a test is saved, and it can happen in the LL phase as well as the DC phase, so another half test is saved, statistically. And if a mismatch occurs in the classical situation, we don't know which result is wrong, so a THIRD test must be run; sometimes even more tests are needed to establish a good residue / correct test. Therefore, running two tests in parallel and spotting any error as soon as it happens saves time for the project, even if somebody always TCs your results. You save [U]at least[/U] one test, so "Madpoo's TC" comes into the project "for free". Think about it! And if you spent double the resources, why shouldn't you report both the LL and the DC and get double credit? As long as the system accepts it... That is why we struggled to introduce shifting in cudaLucas, etc. Let others worry about whether you are a cheater or not.

Those exponents which I (as in "me personally, LaurV") self-DC, I report. I have proposed many times a way to "mark" them as "low priority for TC". Not for newbies, but at least for "established" users. Like myself :razz: It has not been implemented yet. We still hope... I consider that I speed up the project, and I also have no worry in my heart that I could miss a prime. If somebody wants to TC those exponents in the future, be my guest; our grandsons will need only minutes for work that takes us days. Just my two coins.

If you have LL and DC matching for the same exponent, with different shifts, submit them!
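The accounting argument above can be put in back-of-envelope numbers. This is only a toy model under my own assumptions (errors can strike anywhere, checkpoints are frequent); the figures are illustrative, not project statistics.

```python
# Classic scheme, when a run goes bad: the erroneous LL still runs to the
# end (1 test of work), the DC runs to the end (1 test), the residues
# mismatch, and a tie-breaking triple-check is needed (1 more test).
classic_cost_on_error = 1.0 + 1.0 + 1.0

# Parallel self-check: two copies run side by side (2 tests of work), and
# a mismatch is caught at the next residue comparison, so only the work
# since the last checkpoint is redone.
rollback_overhead = 0.01      # assumed: comparisons every few minutes
parallel_cost_on_error = 2.0 + rollback_overhead

saved = classic_cost_on_error - parallel_cost_on_error
assert saved > 0.9            # roughly one full test saved per bad run
print(f"tests saved per erroneous run: {saved:.2f}")
```

So even with a guaranteed triple-check on top, the parallel scheme comes out roughly one full test ahead whenever an error actually occurs, which is the sense in which the TC comes "for free".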
[QUOTE=LaurV;418097]Sometimes more tests are needed, to establish a good residue / correct test. [/QUOTE]
What's the record for most mismatches for a single exponent?
[QUOTE=Madpoo;418033]So if you're curious about those, I can spit out lists of exponents that need a confirming triple-check where both previous runs gave their effort a clean bill of health. I work on those every now and then when the well of obviously bad first-time checks runs a little dry. Quite a few of them (208 out of 1020 right now) are in the < 36M range.[/QUOTE]
OK, thanks for the additional details. And, yes please, generate some lists which you think would be productive. The work has to be done anyway. And so you know, I've put the three candidates you gave in your examples onto my systems.
[QUOTE=LaurV;418097]You have matched results from different machines? You report them. Let the headache for Madpoo to solve :razz:
Especially in the 100M digits range. As the exponents get larger and larger, catching a rounding error as fast as possible saves AGES of work. And not all of us have ECC memory in their boxen. Do you have two similarly-fast CPUs or GPGPUs? Then run them in parallel for the same exponent, in different folders, and make a batch which tests the matching of the residues. [/QUOTE]

This is very close to what I have been working on, and I took it a bit further with clLucas. Since the FFT multiplication has a dependency chain, it basically takes two concurrent LL tests on the same card to utilize the card 100%. I take advantage of the cache and memory locality to run each iteration twice in parallel on the same card (through different compute units); the compare operation is very fast, runs on a few otherwise-unused compute units, and can be pipelined while the FFTs trudge forward. We only lose efficiency during a rollback.

This isn't as good at catching memory bit errors, but combined with the ECC already employed, my hope is that it brings GPU reliability closer to a CPU's while still maintaining the order-of-magnitude speed improvement. That will let me submit one LL result I have high confidence in, and let someone else handle the DC in a decade. At least in theory.

I've gone on before about the merits of running two LLs in parallel (even on a PrimeNet scale) and periodically comparing residues to prevent wasted cycles. It did not seem to get much traction here.