Best use of large-capacity server
The flagship server in my little cluster is a dual-socket 2011-v3 system with two Xeon E5-2698 v3 chips: 32 total cores at 2.3 GHz and 256 GB of RAM running at 2133. The primary purpose of this machine is as a GPU host, but that is a lot of CPU and RAM to leave idle, or to spend running DCs or LLs.
The CPU clock makes running LLs only occasionally worth it vs. my faster 8-core i7 systems, which also have slightly faster 2400 memory. I've seen alright performance if I dedicate many threads (it peaks around 8) to a single exponent, but it still doesn't seem to be the best use of the system. I've considered letting the CPU/RAM do 32 threads' worth of P-1 work. What is the best use of this system? |
Someone more knowledgeable than me will probably recommend elliptic curve factoring using GMP-ECM on Mersenne numbers with [url=http://www.mersenne.org/report_ecm/]no known factors[/url], as high bounds can benefit from the large amount of memory you have available. How all that is done is beyond my little head :)
|
That sort of machine could greatly help nfs@home with postprocessing the larger numbers. [url]http://mersenneforum.org/forumdisplay.php?f=98[/url]
It would be capable of doing the largest jobs although it could take a few months for some of them. |
I second Henry's suggestion, as there are tasks to be done that require 32 or even 64GB RAM, a spec in short supply. There are tasks that require even more memory, but as he said those also take months to complete (and a partial solution is not easy to transfer to someone else, since "nobody" else has 128GB or more with which to finish it). NFS post-processing is nicely parallelized for your 16 cores, so you'd do these tasks at least twice as fast as those of us with mere 6-core i7s.
Within the mersenne project, GMP-ECM is indeed a potent use of massive memory. Madpoo is likely to have info for you about how many LL tests will nearly saturate your memory, while the rest of the cores can be spent on ECM. GMP-ECM uses massive memory but is massively more efficient at finding factors; again, Madpoo experimented with it, and can give you some info if you don't find his thread about ECM. LL testing is fine for any Intel-based machine, but your server has unique capabilities due to memory capacity, whilst the CPU cycles for LL are no more potent than a similar number of cores spread over simple desktops. |
I second VBCurtis's opinion about GMP-ECM. A large amount of memory like the one you have available would be very useful when searching for large factors of very small exponents.
You may find lots of info here: [URL="http://www.mersenneforum.org/showthread.php?t=20092"]http://www.mersenneforum.org/showthread.php?t=20092[/URL] |
Thanks for the input, I have some attachment to the Mersenne search but I would also like to do the most good possible with these resources.
I actually have a second system with dual 1.8 GHz 4-core CPUs but also significant RAM resources; this weekend I will take a look at setting up some testing and see what makes sense. Multi-month jobs are not a problem, this is a dedicated number theory research cluster. |
[QUOTE=airsquirrels;408285]Thanks for the input, I have some attachment to the Mersenne search but I would also like to do the most good possible with these resources.
I actually have a second system with dual 1.8 GHz 4-core CPUs but also significant RAM resources; this weekend I will take a look at setting up some testing and see what makes sense. Multi-month jobs are not a problem, this is a dedicated number theory research cluster.[/QUOTE] My suggestion, then, would be to do some smaller jobs for nfs@home while working out the best setup on your machines. It might make sense not to use all the cores on each CPU, and to use the rest for LL/P-1/ECM. As a postprocessor you might be able to encourage the largest jobs to be factoring Mersenne numbers. In fact, they are currently sieving 2^1285-1. |
[QUOTE=henryzz;408300]As a postprocessor you might be able to encourage the largest jobs to be factoring mersenne numbers. In fact currently they are sieving 2^1285-1.[/QUOTE]
This suggestion is misleading. Postprocessing for a gnfs-218 (2^1285-1) cannot (and will not) be done on a single machine (even with 64 cores). There are smaller postprocessing jobs in the pipeline though for which this server can do some good. |
[QUOTE=Batalov;408301]This suggestion is misleading. Postprocessing for a gnfs-218 (2^1285-1) cannot (and will not) be done on a single machine (even with 64 cores).
There are smaller postprocessing jobs in the pipeline though for which this server can do some good.[/QUOTE] I was under the impression that jobs like that just needed enough memory and would take many months on a PC like this. What would be the timeframe/memory capacity needed for such a job (CPUs similar to the above)? |
Something like the Lonestar cluster would be needed (I think Lonestar has by now been retired; there are other resources at XSEDE).
Cf. [url]https://eprint.iacr.org/2012/444.pdf[/url] (Section 5). This job is only slightly larger. GNFS-218 is like a SNFS-335 (which is 1115 bits < 1285 so GNFS is clearly appropriate; M1061 was a 1061-bit job) |
Off topic: 768 bits is still the record for biggest GNFS? and 1061 bits for SNFS?
|
[QUOTE=airsquirrels;408285]Multi-month jobs are not a problem, this is a dedicated number theory research cluster.[/QUOTE]
If I may ask you, just what the heck do you do for a living? You are currently single handedly staying ahead of the GIMPS "churners" in the DCTF domain! |
[QUOTE=ATH;408314]Off topic: 768 bits is still the record for biggest GNFS? and 1061 bits for SNFS?[/QUOTE]
The CADO group did a bunch of 2^n-1 factorizations in the 1100s, all at once. Some of the relation-gathering work was re-used in multiple factorizations. See [url]https://eprint.iacr.org/2014/653.pdf[/url] for details. |
[QUOTE=chalsall;408316]If I may ask you, just what the heck do you do for a living?
You are currently single handedly staying ahead of the GIMPS "churners" in the DCTF domain![/QUOTE] I know - that was on purpose :) I would like to see DCTF get taken care of so we can move all our resources to the LL domain and accelerate the search for #49, though as always everyone's passion and use of their resources is their own. I'm hoping to turn on another 10,000 GHz-days/day or so of GPU power over the weekend. Once I get some more time I'm hoping to contribute on the math and kernel side - the BitTorrent crowd had a bit more motivation/monetary support in creating highly tuned GCN-ISA kernels than those of us purely interested in research have had.

To answer your question, my company does education- and business-focused wireless screen mirroring/sharing/collaboration software, but my background is encryption/security: originally credit card processing, followed by a bit of time in the medical industry (glad to be out). The number theory side of that is a passion stemming from that work, and we decided to kick off an R&D investment by building a dedicated research cluster. It doesn't hurt that AMD's Fury X cards kick out a solid 1,000 GHz-days/day for ~650 USD and 300 watts of power. Two racks and a lot of liquid cooling later, we have a pretty good workhorse. Unfortunately I also believe I have contributed to AMD's supply crunch, as I have been trying to get every card I can find.

I'm just here to help with everyone else; if there are better places to put resources or immediate needs, just let me know! |
[QUOTE=airsquirrels;408344]I'm hoping to turn on another 10,000GhzDay/Day or so of GPU power over the weekend.[/quote]
Yikes! I guess I won't be #1 for long lol [QUOTE]I'm just here to help with everyone else, if there are better places to put resources or immediate needs just let me know![/QUOTE] I noticed you're taking many exponents above the recommended level. Is there a reason why? |
[QUOTE=Mark Rose;408346]I noticed you're taking many exponents above the recommended level. Is there a reason why?[/QUOTE]
I wanted to find factors :) If it wasn't for that pesky effort doubling with every bit, I would probably not stop till I succeeded in factoring every exponent I looked at. I took the first couple of groups of assignments up to higher levels to get an idea of any GPU performance changes on longer jobs/different kernels, and to verify my hit rates. All of the newly taken assignments since yesterday are at recommended levels.

On the topic of this thread, it does seem like I am in a fairly unique position with this much horsepower in one place/machine. One area I intend to explore is working with multiple (very many) exponents at once. In the RSA-sized world I've done some work with pretty massive GCD trees along those lines; I'm not sure if there is something equivalent that would apply here. I need to brush up on the math involved, but at first glance the 2kp+1 aspect makes it seem like factors would be pretty rarely shared (only when k = another p or a multiple of other p's). I'm sure all of this has been thought about and is spelled out in a paper or somewhere in this forum. I will continue researching, but I do intend to at least set up the large server's CPUs to look at the other types of work suggested in this thread. |
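The "massive GCD trees" idea from the RSA world can be sketched in a few lines. This is a naive, illustrative version only; the `batch_gcd` helper name and the toy moduli are mine, not anything from GIMPS tooling, and real implementations use product/remainder trees rather than one big division per modulus:

```python
from math import gcd, prod

def batch_gcd(moduli):
    # For each modulus n, take gcd(n, product of all the others).
    # (P // n) % n keeps the intermediate numbers manageable.
    P = prod(moduli)
    return [gcd(n, (P // n) % n) for n in moduli]

# Toy "moduli" sharing small primes: 15 = 3*5, 21 = 3*7, 77 = 7*11.
# A result equal to the modulus itself (21 here) means every one of
# its prime factors also appears somewhere among the other moduli.
print(batch_gcd([15, 21, 77]))  # -> [3, 21, 7]
```

As the post suspects, this pays off for RSA keys that accidentally share primes, but essentially never for Mersenne candidates, whose factors turn out to be provably distinct across prime exponents.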
[QUOTE=airsquirrels;408348]On the topic in this thread, it does seem like I am in a fairly unique position with this much horsepower in one place/machine. One area I intend to explore is looking at working with multiple (very many) exponents at once. In the RSA-sized world I've done some work with pretty massive GCD trees along those lines, I'm not sure if there is something equivalent that would apply here. I need to brush up on the math involved, but at first glance the 2kp+1 aspect makes it seem like factors would be pretty rarely shared (Only when k = another p or a multiple of other p's).
I'm sure all of this has been thought about and is spelled out in a paper or somewhere in this forum. I will continue researching, but I do intend to at least setup the large servers CPUs to look at the other types of work suggested in this thread.[/QUOTE] I found reading the source code of mfaktc insightful. The 0.21 version is far cleaner if you looked at older versions previously. I'm not sure of the state of mfakto's code, as I don't use it, but I understand it's a fork that implements the same general algorithm. |
[QUOTE=airsquirrels;408348]but at first glance the 2kp+1 aspect makes it seem like factors would be pretty rarely shared (Only when k = another p or a multiple of other p's). [/QUOTE]
They are never shared, just to point out. There is a theorem about it, which you can prove very simply: assume some odd prime x divides both 2^p-1 and 2^q-1 (say p > q); then x divides their difference 2^p-2^q = 2^q(2^(p-q)-1), and since x is odd, x divides 2^(p-q)-1. Repeat (recognize Euclid's algorithm running on the exponents?) and you reach the fact that x divides 2^gcd(p,q)-1, which is impossible if (p,q)=1 (here they are both prime, but that is not required; it is enough that they be coprime).

Related to "going over the recommended TF level": if you use Fury X cards, you can safely go 1 or 2 bits [U]over[/U] the recommended level, as their LL-to-TF ratio is very low (they are much better at TF than other cards, but worse at LL than those same cards). But [U]please![/U] don't go higher than that! We know that finding factors is fun, but you waste your time without bringing any benefit to the project. For example, assume 72 is the recommended level for some range of expos you are working with: in the time it takes to factor one exponent from 72 to 73, you could factor 2 [U]other[/U] exponents to 72, and have 3 exponents TF-ed and ready for LL instead of only one (even if that one is factored higher). In the time it takes to factor to 73 and then 74, you could TF [B][U]six[/U][/B] other expos to 72, and have 7 ([U]seven[/U]) expos ready for LL instead of one. You will also find MORE factors this way (that is true! the probability of finding a 73-bit factor is about the same as for a 72-bit factor, but it costs double the time). And you will help the project more, too. :rant: |
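LaurV's identity gcd(2^a-1, 2^b-1) = 2^gcd(a,b)-1 is easy to check numerically; a quick sketch (the exponent pairs are arbitrary choices of mine):

```python
from math import gcd

def M(n):
    """The Mersenne number 2^n - 1."""
    return (1 << n) - 1

# gcd(2^a - 1, 2^b - 1) == 2^gcd(a, b) - 1, so for distinct prime
# exponents the gcd is 2^1 - 1 = 1: no factor is ever shared.
for a, b in [(11, 13), (12, 18), (127, 521)]:
    assert gcd(M(a), M(b)) == M(gcd(a, b))

print(gcd(M(12), M(18)))  # -> 63, i.e. 2^6 - 1 since gcd(12, 18) = 6
print(gcd(M(11), M(13)))  # -> 1: coprime exponents share nothing
```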
[QUOTE=airsquirrels;408344]I'm just here to help with everyone else, if there are better places to put resources or immediate needs just let me know![/QUOTE]
Wow! You've got some serious kit! Thanks for helping out! :smile: To answer your question, what you're doing is great -- many here would like to see DCTF "Die baby die!". On the other hand, we're currently /really/ tight "feeding" the P-1'ers at 75 bits. So, if you're so inclined, doing a bit of "What Makes Sense" or "Let GPU72 Decide" LLTF'ing would be much appreciated -- these options will give your machines candidates not yet P-1'ed. But, again, entirely up to you. |
[QUOTE=LaurV;408356]They are never shared, just to point out. There is a theorem about it, which you can prove very simple, assuming some odd prime x divides 2^p-1 and 2^q-1, it will divide the difference, i.e 2^(p-q)-1. Repeat (already recognize Euclid's algorithm at the exponents?) then you reach the fact that x divides 2^gcd(p,q)-1 which is impossible if (p,q)=1 (here they are both prime, but this is not required, only they be prime to each other is enough).
[/QUOTE] Thanks! I had a gut feeling that was the case but had not yet settled down to look into it. Of course, as with most math, this raises just as many new questions and potential approaches as it eliminates... [QUOTE=LaurV;408356]Related to "going over the recommended TF level" ... :rant:[/QUOTE] I totally agree regarding going over the recommended level; that was only temporary while I was fiddling with the hardware configuration. [QUOTE=chalsall;408381]...So, if you're so inclined, doing a bit of "What Makes Sense" or "Let GPU72 Decide" LLTF'ing would be much appreciated ....[/QUOTE] I've put some LL to 75 work in the queue for a few cards to help out (and immediately found a factor on the first assignment!); once I get the hardware situation settled this weekend I will look more closely at the distribution of work. That said, if I can get under full steam (~24,000 GHz-days/day), three weeks of work should nearly clear the DCTF pile to current release levels. Of course that all depends on adequately addressing the power, cooling, and supporting infrastructure issues.... |
[QUOTE=airsquirrels;408387]I've put some LL to 75 work in queue for a few cards to help out (And immediately found a factor on the first assignment!), once I get the hardware situation settled this weekend I will certainly look more closely at the distribution of work.[/QUOTE]
Nicely nicely... :smile: [QUOTE=airsquirrels;408387]That said, if I can get under full steam (~24,000 GHz-days/day), three weeks of work should nearly clear the DCTF pile to current release levels. Of course that all depends on adequately addressing the power, cooling, and supporting infrastructure issues....[/QUOTE] OMG! If you pull that off, you'll almost double our aggregate throughput!

One advantage of generating a large buffer of "optimally TF'ed DC candidates" is that that is where the "churners" live. Churners are new users who haven't yet proven their commitment to the project; often overclockers who use Prime95 / mprime to test the stability of their machines and don't read the "Only testing" language in the [G]UI.

One fear I've always had in the back of my mind is what happens when the next MP is found and announced -- there is always a surge of new users who don't appreciate just how much work is involved. The last time this happened we had to release candidates for LL'ing that had not yet been optimally TF'ed. Fortunately (?) most of these were never actually completed and were subsequently TF'ed appropriately. But around here balance is everything; eliminating DCTF would be cool, but we also have to "feed" the P-1'ers and LL'ers. |
I have noticed that a ton of CPU cycles in GIMPS get wasted on half-completed assignments by users who abandon them, even amongst some of my coworkers whom I convinced to run Prime95 on their machines... until they stopped. In 2015, with fast and ever-present Internet, wouldn't it be possible to actively checkpoint to PrimeNet so other users could pick up where an exponent was left off? Just wait till the residue for a specific iteration is small and upload a checkpoint :)
|
[QUOTE=airsquirrels;408401]I have noticed that a ton of CPU cycles in GIMPS get wasted on half completed assignments by users that abandon, even amongst some of my coworkers who I convinced to run prime95 on their machines... Until they stopped. In 2015 with fast and very present Internet wouldn't it be possible to actively checkpoint to PrimeNet so other users could pick up where an exponent was left off? Just wait till the residue for a specific iteration is small and upload a checkpoint :)[/QUOTE]
Yes, it would be possible. The checkpoint files aren't that big -- a few MB apiece. The problem is one of resources: someone to code the necessary changes, and paying for the TBs of storage. |
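A rough back-of-the-envelope check on those numbers (every count below is an assumption of mine for illustration, not a PrimeNet figure):

```python
# A full LL residue is p bits for exponent p.
exponent = 40_000_000            # a typical first-time-LL exponent
residue_bytes = exponent // 8    # ~5 MB per saved residue
in_flight_tests = 100_000        # assumed number of active assignments
checkpoints_kept = 4             # assumed retained checkpoints per test

total_bytes = residue_bytes * in_flight_tests * checkpoints_kept
print(total_bytes / 1e12, "TB")  # -> 2.0 TB under these assumptions
```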
[QUOTE=airsquirrels;408387]That said, if I can get under full steam (~24,000 GHz-days/day), three weeks of work should nearly clear the DCTF pile to current release levels. Of course that all depends on adequately addressing the power, cooling, and supporting infrastructure issues....[/QUOTE]
There is about 9 PHz days of DCTF work. Not all of it is being held by GPU72. Even at 30 THz-d/d, that's still going to take 10 months... but it would be pretty awesome to have that cleared in a year! |
[QUOTE=airsquirrels;408401]I have noticed that a ton of CPU cycles in GIMPS get wasted on half completed assignments by users that abandon, even amongst some of my coworkers who I convinced to run prime95 on their machines... Until they stopped. In 2015 with fast and very present Internet wouldn't it be possible to actively checkpoint to PrimeNet so other users could pick up where an exponent was left off?[/QUOTE]
There is a downside to implementing this. The quality of the result is only as good as the worst computer to work on the LL test. If an overclocker puts in a few million low-quality iterations, then a highly reliable machine may waste tens of millions of iterations finishing the LL test. |
[QUOTE=airsquirrels;408401]Just wait till the residue for a specific iteration is small and upload a checkpoint :)[/QUOTE]
You don't have to wait for a specific iteration, you can save the residue at every iteration, but to limit I/O writes it is usually done only every 5-15 minutes. |
[QUOTE=Mark Rose;408404]Yes, it would be possible. The checkpoint files aren't that big -- a few MB a piece. The problem is one of resources: someone to code the necessary changes and paying for the TBs of storage.[/QUOTE]
Hmm, well, one problem at a time I suppose. It would also add a pretty significant benefit in that double checks could fail as soon as they don't match a checkpoint, instead of needing a complete run. Last I heard a significant percentage still fail, so that's not an insignificant amount of resources. Ideally that would lead to less DC backlog, and also the ability for those 'slower' computers to contribute to LL by advancing exponents the same way we do with TF, one iteration level at a time. I know I'm new here so I don't want to overreach, but I'm just as happy to contribute code and storage when the time comes. |
[QUOTE=Prime95;408406]There is a downside to implementing this. The quality of the result is only as good as the worst computer to work on the LL test. If an overclocker puts a few million low quality iterations in, then a highly reliably machine may waste tens of millions iterations finishing the LL test.[/QUOTE]
One (perhaps unpopular) way to mitigate this would be real-time DC, with each iteration group assigned to two users. The benefit of catching and correcting errors early would probably save enough resources to be worth it. How many resources are wasted by a low-quality computer completing weeks of work on an exponent, when it could have been detected as having made a mistake within the first few days? I imagine it would also be easier to quickly flag and quarantine bad actors in that case. |
[QUOTE=VictordeHolland;408407]You don't have to wait for a specific iteration, you can save the residue at every iteration, but to limit I/O writes it is usually done only every 5-15 minutes.[/QUOTE]
Sorry for the triple reply; my thought on waiting was that at some points the residue will be much smaller than others, saving storage and bandwidth. Perhaps that is a premature optimization. |
[QUOTE=Mark Rose;408405]There is about 9 PHz days of DCTF work. Not all of it is being held by GPU72. Even at 30 THz-d/d, that's still going to take 10 months... but it would be pretty awesome to have that cleared in a year![/QUOTE]
771.7 THz-days by my calculations. This includes everything still to be worked, not just what is held by GPU72. That being said, it's been interesting watching the [URL="https://www.gpu72.com/reports/estimated_completion/primenet/"]Estimated Days to Complete Trial Factoring for all Candidates[/URL] drop precipitously in the DCTF table the last few days! :smile: |
[QUOTE=airsquirrels;408412]Sorry for the triple reply, my thought on waiting was that at some points the residue is going to be much smaller than others, saving storage and bandwidth.[/QUOTE]
Likely also wrong. My understanding is that the true residue is about as close to true noise as you can get. Read: incompressible. (Unless, of course, the candidate is an MP, at which point at the very last step it's *very* compressible!) |
[QUOTE=chalsall;408416]771.7 THz Days by my calculations.[/quote]
[url]http://imgur.com/szouqSx[/url] [quote] That being said, it's been interesting watching the [URL="https://www.gpu72.com/reports/estimated_completion/primenet/"]Estimated Days to Complete Trial Factoring for all Candidates[/URL] drop precipitously in the DCTF table the last few days! :smile:[/QUOTE] Yeah, it was 1400 not long ago! |
[QUOTE=chalsall;408418]Likely also wrong.
My understanding is the true residue is about as close to true noise as you can get. Read: uncompressable. (Unless, of course, the candidate is a MP, at which point at the very last step it's *very* compressible!)[/QUOTE] Well, there is certainly some debate as to how truly chaotic the residue sequence is, but for practical purposes and until we advance the state of the theory there we can treat it as a truly random number between 1 and 2^p-2. Any specific iteration (10,000, etc.) is going to be essentially random, but there will be points in the sequence where the residue is much closer to 1, and thus would take less bits to store. The more I think about this the less it is probably worth the effort to consider optimizing for those opportunities to checkpoint cheaply. |
[QUOTE=airsquirrels;408421]...for practical purposes and until we advance the state of the theory there we can treat it as a truly random number between 1 and 2^p-2.[/QUOTE]
Correct. For this reason, the top 20 bits will be all-zeros once in a million iterations; is that good savings? Not at all. [QUOTE=airsquirrels;408421]but there will be points in the sequence where the residue is much closer to 1[/QUOTE] The top 64 bits will be zeros ...never (for practical purposes). And even that - is it 'much closer to 1'? No. This data is also obviously incompressible (because it is truly random). |
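The incompressibility claim is easy to demonstrate: statistically random data does not shrink under a general-purpose compressor. A minimal sketch, with zlib standing in for any such compressor and random bytes standing in for a residue:

```python
import os
import zlib

residue = os.urandom(1 << 20)        # 1 MiB of random bytes
packed = zlib.compress(residue, 9)   # maximum compression effort

# Deflate cannot beat random input; it falls back to stored blocks,
# so the "compressed" form is at least as large as the original.
assert len(packed) >= len(residue)
print(len(residue), len(packed))
```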
[QUOTE=Batalov;408423]Correct. For this reason, the top 20 bits will be all-zeros once in a million iterations; is that good savings? Not at all.
The top 64 bits will be zeros ...never (for practical purposes). And even that - is it 'much closer to 1'? No. This data is also obviously incompressible (because it is truly random).[/QUOTE] I will concede to that irrefutable logic; I am far too used to dealing with much smaller residues (2048-bit moduli). It would still be possible to have frequent hash-based validity checks without much storage/bandwidth cost, and less frequent full-residue checkpoints for resuming. |
[QUOTE=airsquirrels;408411]I imagine it would also be easier to quickly flag and quarantine bad actors in that case[/QUOTE]
Just so you know, Aaron (AKA madpoo) has been working on this problem space. Please see [URL="http://mersenneforum.org/showthread.php?t=20372"]this thread[/URL] for details.

At the end of the day, it probably makes more sense for trusted users / machines to double- / triple-check candidates that were initially LL'ed by suspect machines than to attempt a major overhaul of the client and server code-base. Separately, as George mentioned, an untrusted machine might taint the results of a trusted machine. Further, there's the whole question of "credit": who gets to claim (or at least be named in) the find? The machine/user who did the last iterations, the machine/user who did the majority, or everyone who did a few? (Hint: if the latter were the case, some would do a few iterations on many candidates!) |
Dear airsquirrels,
Your machines would be well appreciated sieving for NFS@Home at [url]http://escatter11.fullerton.edu/nfs/[/url]. Kind Regards, Carlos |
[QUOTE=Batalov;408309]Something like the Lonestar cluster (I think Lonestar has by now been retired; there are other resources at XSEDE.)
Cf. [url]https://eprint.iacr.org/2012/444.pdf[/url] (Section 5). This job is only slightly larger. GNFS-218 is like a SNFS-335 (which is 1115 bits < 1285 so GNFS is clearly appropriate; M1061 was a 1061-bit job)[/QUOTE] That specifies the memory usage for the matrix to be 40GB; the system mentioned above has 128GB. It also mentions 35 CPU-years for the linear algebra, presumably including the inefficiency incurred by running that many cores across multiple machines. I would still be surprised if the system mentioned above couldn't finish this job in less than a year. As far as I am aware, a lot of the reason a cluster is used is that people don't usually want to commit an expensive machine (that amount of memory isn't cheap) to one job for 6 months+. As memory gets cheaper with DDR4, I imagine larger jobs will be done on home PCs again. |
[QUOTE=VBCurtis;408248]I second Henry's suggestion, as there are tasks to be done that require 32 or even 64GB RAM, a spec in short supply. There are tasks that require even more memory, but as he said those also take months to complete (and a partial solution is not easy to transfer to someone else, since "nobody" else has 128GB or more with which to finish it). NFS post-processing is nicely parallelized for your 16 cores, so you'd do these tasks at least twice as fast as those of us with mere 6-core i7s.
Within the mersenne project, GMP-ECM is indeed a potent use of massive memory. Madpoo is likely to have info for you about how many LL tests will nearly saturate your memory, while the rest of the cores can be spent on ECM. GMP-ECM uses massive memory but is massively more efficient at finding factors; again, Madpoo experimented with it, and can give you some info if you don't find his thread about ECM. LL testing is fine for any Intel-based machine, but your server has unique capabilities due to memory capacity, whilst the CPU cycles for LL are no more potent than a similar number of cores spread over simple desktops.[/QUOTE] I did testing to see how best I could use my dual-chip boxes for LL testing, and "aurashift" did some as well on his systems with up to 18 cores per CPU (I think). Mine were only 10-core chips. See this thread for the gory details... I think it's where we discussed most of it: [URL="http://www.mersenneforum.org/showthread.php?t=13185"]http://www.mersenneforum.org/showthread.php?t=13185[/URL]

In short, on a good Xeon chip you can keep adding all of the cores on one CPU (and even 1 core on the other CPU) with decreasing gains in LL performance, but still slightly faster with each core. It's only when you start adding additional cores on the other CPU (past the first one) that performance actually starts to get worse, as you flood the QPI channel.

That thread was specifically about larger exponents, but the same holds true for smaller ones as well. It wasn't until I got down to some really tiny exponents (like sub-5M) that I noticed cores waiting on memory, which was weird. If I'm testing a 50M exponent, all of the cores will be at 100%, but on a 5M exponent the first core is at 100% and the rest might be between 70-90% utilized. Oh well... they finish really fast at any rate.
:smile: For GMP-ECM work, you can get one instance running per core and, if you have sufficient free RAM, set the parameters of each instance to use as much as it needs. Depending on the exponent in question, you may be looking at a pretty large chunk when it's doing stage 2. I was running curves on small exponents like M1277 with some pretty large bounds... stage 2 could take 25-30 GB per instance (I think that was k=2), so obviously if you wanted to do that on 32 cores, you'd want a LOT of RAM. :) But doing ECM on "normal" exponents using "normal" bounds wouldn't use nearly as much. I just had this thing about 1277 since it's the lowest exponent without a known factor yet.

Some other thread has my gory details about getting GMP-ECM working well, with Prime95 doing stage 1 and feeding that to GMP-ECM. It's not the easiest process, but if that's something you're interested in, it'll work. Depending on your OS (Windows or Linux), the actual process of launching multiple gmp-ecm instances and setting affinity for each one will vary. I think I went into enough detail on how I did it with Windows to get you started, should you go down that path. For now I'm still devoting resources to triple-checking exponents where the first two results didn't match, so I'm not currently doing any ECM work... I may go back to it at some point. |
[QUOTE=chalsall;408391]...One fear I've always had in the back of my mind is when the next MP is found and announced -- there is always a surge of new users who don't appreciate just how much work is involved. The last time this happened we had to release for LL'ing candidates not yet optimally TF'ed. Fortunately (?) most of these were never actually completed and were subsequently TF'ed appropriately.[/QUOTE]
Personally I think I'd force new accounts to do one or two double-checks first before they could do any first-time checks. [LIST=1][*]DC is way behind first time, so the help would be appreciated[*]they're unproven[*]bad machines from newcomers might not be discovered as bad for years (I'm seeing that now, for machines that were "alive" 3+ years ago and we're just now discovering that nearly all of their tests were bad)[/LIST] Of course people who climb aboard the GIMPS train after a new discovery are probably doing so in hopes of finding another new one, as if another one would be found in the next few days... so DC work on their first assignment or two would be a buzz kill. |
[QUOTE=airsquirrels;408409]Hmm, well one problem at a time I suppose. It would also add a pretty significant benefit in that double checks could fail as soon as they don't match a checkpoint instead of needing a complete run. Last I heard a significant percentage still fail, so that's a not so insignificant amount of resources. Ideally that would lead to less DC backlog and also the ability for those 'slower' computers to contribute to the LL by advancing exponents the same way we do with TF, one iteration level at a time.
I know I'm new here so I don't want to overreach, but I'm just as happy to contribute code and storage when the time comes.[/QUOTE] It's an interesting point... my current "best guess" based on historical data is that 3-4% of first-time tests are bad.

Primenet saves the final residue (or the last 64 bits of it, anyway). Maybe George or someone could see some benefit in saving the partial residue at the 50% point as well, so that a double-checker would have some idea at the halfway point of whether or not they match that first check. I'm not entirely sure if that would be useful or not... it will either match by that point or not. If it matches, it could still be different at the end, so you'd complete the test to know. If it mismatches, the first one may be the bad one, so you'd still need to complete the test to know. Either way you would do the full test, but maybe there'd be some interest in knowing way ahead of time that a mismatch had occurred.

Unfortunately there's probably zero way to know at what point a bad result went off the rails... it could have been in the first hundred iterations, or it could have been the final one. So I'm just arbitrarily saying "50%". Maybe the rare people who save their residues and do simultaneous runs of the same work and have had mismatches occur could shed some light on "at what % did the results diverge?" |
[QUOTE=Madpoo;408530]Personally I think I'd force new accounts to do one or two double-checks first before they could do any first-time checks.
[LIST=1][*]DC is way behind first time, so the help would be appreciated[*]they're unproven[*]bad machines from newcomers might not be discovered as bad for years (I'm seeing that now, for machines that were "alive" 3+ years ago and we're just now discovering that nearly all of their tests were bad)[/LIST] [B]Of course people who climb aboard the GIMPS train after a new discovery are probably doing so in hopes of finding another new one, as if another one would be found in the next few days... so DC work on their first assignment or two would be a buzz kill.[/B][/QUOTE] I imagine Davieddy rubbing his hands and muttering, "Fools! I told you so!" This, and related issues were always favorite hobby horses of his. :davieddy: |
[QUOTE=kladner;408532]I imagine Davieddy rubbing his hands and muttering, "Fools! I told you so!" This, and related issues were always favorite hobby horses of his. :davieddy:[/QUOTE]
:ttu: Richardson approves! |
[QUOTE=Madpoo;408531]It's an interesting point... my current "best guess" based on historical data is that 3-4% of first-time tests are bad.
Primenet saves the final residue (or the last 64 bits of it anyway). Maybe George or someone could see some benefit in saving the partial residue at the 50% point as well, so that a double-checker would have some idea at the halfway point of whether or not they match that first check. I'm not entirely sure if that would be useful or not... it will either match by that point or not. If it matches, it could still be different at the end, but you'd complete the test to know. If it mismatches, the first one may be the bad one, so you'd still need to complete the test to know. Either way you would do the full test but maybe there'd be some interest in knowing way ahead of time if a mismatch had occurred. Unfortunately there's probably zero way to know at what point a bad result went off the rails... it could have been in the first hundred iterations, or it could have been the final one. So I'm just arbitrarily saying "50%". Maybe the rare people that save their residues and do simultaneous runs of the same work and have had mismatches occur could shed some light on "at what % did the results diverge?"[/QUOTE] My thought with checkpoints would be that you don't need to complete or redo the entire test. If there are a few checkpoints, then at the point the double-checking machine discovers a mismatch it would simply revert to the last checkpoint that matched, change the shift values around, and proceed to the mismatched checkpoint. The rest of that much smaller check would tell you with pretty good certainty whether the original was wrong or your machine is inconsistent with itself.

Having DC so far behind is the real problem I am looking at here: either lots of resources are spent with a low chance of learning anything new, or we spend very little on double checks and a mistake somewhere along the line 'misses' an important Mersenne. What is the current stat for how many double checks are started and never completed?
I'm not sure credit would be as important for DC, and the whole project would benefit from salvaging the work of churners who abandon the current low-granularity work units. |
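The revert-and-redo scheme described above can be sketched the same way. This is a toy model assuming invented checkpoint bookkeeping, not anything Prime95 actually stores: rewind to the last checkpoint where both runs agreed, redo only that short segment, and see which of the two mismatching residues the redo reproduces.

```python
# Toy sketch of arbitrating a checkpoint mismatch: re-run only the
# segment between the last agreed checkpoint and the first mismatched
# one, and see which of the two runs the redo confirms. The iteration
# and modulus are illustrative stand-ins for a real LL test.

def ll_iterate(s, m, start, stop, flip_at=None):
    """Advance the LL state from iteration `start` to `stop` (exclusive),
    optionally simulating a bit flip at one iteration."""
    for i in range(start, stop):
        s = (s * s - 2) % m
        if i == flip_at:
            s ^= 1  # simulated hardware error
    return s

def arbitrate(p, last_good_iter, last_good_residue, bad_iter,
              first_run_residue, dc_residue):
    """Redo only [last_good_iter, bad_iter) from the agreed residue and
    report which of the two mismatching results the redo confirms."""
    m = (1 << p) - 1
    redo = ll_iterate(last_good_residue, m, last_good_iter, bad_iter)
    if redo == first_run_residue:
        return "first run was right; DC machine is suspect"
    if redo == dc_residue:
        return "DC was right; first run is suspect"
    return "redo matches neither; all three runs disagree"
```

The appeal is exactly what the post says: the arbiter only repeats the short segment between two checkpoints, not the whole multi-week test.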
[QUOTE=kladner;408532]I imagine Davieddy rubbing his hands and muttering, "Fools! I told you so!" This, and related issues were always favorite hobby horses of his. :davieddy:[/QUOTE]
I don't really know much about other distributed computing projects. Do any others require that new systems "prove themselves worthy" in some way before they can get cooking on the important stuff? I'm just thinking to myself that 3-4% (or even the known 1-2%) is a pretty high error rate for any other endeavor... it wouldn't be tolerated in many situations, and it's just a good thing that we double-check our work (and hopefully aren't double-checking our own work). :smile: |
I think rather than punish the vast majority who have good systems, it would be best to prioritize double checks of work done by systems with no prior matching result. That could be done by giving those assignments, in priority, to the Cat 4 DC workers who have completed at least one assignment with a matching residue. Let the new Cat 4 DC'ers get assignments from machines with at least one good result, to give them the best chance of turning into proven workers. Errors will still happen, but this strategy would probably cut down on the quadruple checks.
|
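The matching policy above could be sketched roughly like this; the two-pool split and all the names are my own invention, not Primenet's actual assignment logic:

```python
# Toy sketch of the proposed DC matchmaking: proven DC workers get the
# risky exponents (first-tested by unproven machines), while unproven
# workers get exponents whose first test came from a proven machine.

def assign_dc(exponents, worker_is_proven):
    """exponents: list of (exponent, first_tester_proven) pairs.
    Return the pool this worker should draw from, in ascending order."""
    if worker_is_proven:
        # Proven workers arbitrate the results most likely to be bad.
        pool = [e for e, proven in exponents if not proven]
    else:
        # New workers verify results likely good, so a mismatch points
        # at the new worker's own machine and helps prove (or flag) it.
        pool = [e for e, proven in exponents if proven]
    return sorted(pool)
```

The design point is the second branch: pairing an unproven worker with an unproven first test would leave a mismatch ambiguous, whereas this split makes most mismatches informative about exactly one machine.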
[QUOTE=Madpoo;408574]I don't really know much about other distributed computing projects. Do any other ones require that new systems "prove themselves worthy" in some way before they can get cooking on important stuff?[/QUOTE]
In many other DC projects you can verify the solution once you have it. No, I'm not talking about bitcoin, hehe, but think about Foldit, for example: it takes ages to fold that protein around itself, but once you've done it, the solution is plain and clear, and easily verifiable. Nothing like the Lucas-Lehmer test...
[QUOTE=Madpoo;408529]
Some other thread has my gory details about getting GMP-ECM working well with having Prime95 doing stage 1 and feeding that to GMP-ECM. [/QUOTE] [URL="http://mersenneforum.org/showthread.php?t=20092&page=4"]http://mersenneforum.org/showthread.php?t=20092&page=4[/URL] |
[QUOTE=airsquirrels;408546]What is the current stat for how many double checks are started and never completed? I'm not sure credit would be as important for DC and the whole project would benefit from all the work of churners who abandon the current low granularity work units.[/QUOTE]
I haven't done a deep (read: highly accurate) query on that in a few months, but approximately 97% (±1%) of assignments to new users are never completed.
[QUOTE=Madpoo;408530]Personally I think I'd force new accounts to do one or two double-checks first before they could do any first-time checks.[/QUOTE]
And that's now the case -- ever since George moved the Churners down to the DCTF Cat 4 range. |
[QUOTE=chalsall;408621]I haven't done a deep (read: highly accurate) query on that in a few months, but approximately 97% (+- 1%) of assignments to new users are never completed.[/QUOTE]
Is there data available to query how much work that is, and how far those users get into it before abandoning? I'm curious whether 97% of new users abandoning their assignments/GIMPS means the lost work is only 1% or less of our throughput, or whether it's more significant.
[QUOTE=airsquirrels;408635]Is there data available to query how much work that is and how far those users get into it before abandoning?[/QUOTE]
Not easily available immediately over a large temporal domain, but doing a query against [URL="http://www.mersenne.org/assignments/?exp_lo=37700000&exp_hi=40000000&execm=1&exfirst=1&exp1=1&extf=1"]Primenet like this[/URL] might give you a reasonable idea of what we face. [QUOTE=airsquirrels;408635]I'm curious if it's 97% of new users abandon their assignments/gimps but the lost work is only 1% or less of our throughput or if it is more significant.[/QUOTE] Please note that I said 97% of assigned work, not 97% of new users. I've learnt (the hard way, over many years) that language is important.... :smile:
A [URL="http://www.mersenne.org/report_exponent/?exp_lo=37769773&full=1"]random example[/URL] from Chris's query: this was dropped many times and is not yet completed. Well... not exactly randomly picked; I cheated a bit. I picked it because it shows over 23000 (!) days till completion, so there is actually no chance it will be completed this time either. So, expect another drop... :smile:
|
[QUOTE=Prime95;408406]There is a downside to implementing this. The quality of the result is only as good as the worst computer to work on the LL test. If an overclocker puts a few million low quality iterations in, then a highly reliably machine may waste tens of millions iterations finishing the LL test.[/QUOTE]
Maybe use a points system? The more points an exponent file has, the less trusted it is. When you start an exponent fresh, that's zero points, which is considered best. Then apply some simple math, along with the predicted time it would take a new machine to complete the exponent from scratch, and voila, I've solved in my mind a problem that would probably take hundreds of man-hours to implement. (Sorry, I realized at the end how cheeky that sounded. But a trustworthiness algorithm, even a bad one, would be cool.) |
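For what it's worth, a toy version of that points idea might look like the sketch below. Every threshold, weight, and function name here is made up purely for illustration; the error-rate cutoff just echoes the ~3-4% figure quoted earlier in the thread.

```python
# Toy "distrust points" model: an exponent's partial-result file
# accumulates points from each machine that contributed iterations,
# and a fresh exponent (zero points) is the most trusted.

def machine_points(error_rate):
    """More distrust for machines with worse observed error rates;
    an unproven machine (no double-checked results) is worst case."""
    if error_rate is None:
        return 5                # unproven machine
    if error_rate > 0.03:       # worse than the ~3-4% historical average
        return 10
    return 1                    # proven, reliable machine

def file_trust(contributors):
    """Sum distrust over every machine that touched the file.
    0 points == fresh exponent == fully trusted."""
    return sum(machine_points(r) for r in contributors)

def prefer_fresh_start(contributors, fresh_cost, resume_cost, point_cost=0.02):
    """Weigh the work saved by resuming a tainted file against a
    (made-up) per-point penalty; True means start from scratch."""
    penalty = file_trust(contributors) * point_cost * fresh_cost
    return resume_cost + penalty >= fresh_cost
```

So a file mostly built by proven machines would be resumed, while one touched by an unproven or error-prone machine would be restarted, which is roughly the trade-off the post is gesturing at.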
I think madpoo and chalsall, among others, have put a lot of effort into defining reliability, and detecting it, or the lack thereof.
|
[QUOTE=kladner;408910]I think madpoo and chalsall, among others, have put a lot of effort into defining reliability, and detecting it, or the lack thereof.[/QUOTE]
[QUOTE=kladner;408910]I think madpoo and chalsall, among others, have put a lot of effort into defining reliability, and detecting it, or the lack thereof.[/QUOTE]
One point to make, in terms of figuring out how reliable systems are, is that we often won't have a clue how reliable a system is until at least a couple of its results have been double checked. Given the gap between first-time tests and double checks, that means it could be years and years before we find the bad systems.

We're just now going through the 34M exponents for DC work, and many of them had their first-time check back in 2008-2009. That represents the best time frame at the moment for figuring out which of those systems were spitting out bad results. As it is right now, I've had limited success in finding machines that have been spitting out bad results more recently. I've found one or two, and that's only because they did a double check of some smaller number (28M - 35M) that was bad and has now been triple checked, and then went on to do a bunch of larger first-time checks in the 55M-and-up range. I have a couple of bad machines for which I've been doing my awesome "strategic double checks"; most of their work was 55M and up, and most of it turned out to be bad as well. But like I said, that's ONLY because they did a small double check at some point that helped us figure out how bad that system was, years and years before we would have found out otherwise.

Thus my suggestion that if a machine is going to be sticking around for a while and doing lots of work, hey, do a DC here and there and help us help you. If your machine generates crappy residues, you really would want to know; otherwise you're just wasting your own time and murdering poor innocent electrons for no reason. :smile: |
Perhaps all new machines should be forced to complete one DC before they can take any LL work?
|
[QUOTE=LaurV;408651]A [URL="http://www.mersenne.org/report_exponent/?exp_lo=37769773&full=1"]random example[/URL] from Chris's query, this was dropped many times, and not yer completed, and well.. not exactly random picked, I cheated a bit, I picked it because it says over 23000 (!) days till completion, so there is actually no chance that it will be completed this time either. So, expect another drop... :smile:[/QUOTE]
I just looked for exponents with a lot of "churn" (for lack of a better word). None had more than 10 expired assignments, but quite a few (593) had 9 assignments over the years that all expired. These are *JUST* double check assignments. e.g. [URL="http://www.mersenne.org/M37326019"]M37326019[/URL]

Looking at exponents that haven't been tested at all yet, there's still churn, but so far no exponent had 8+ expired assignments, and only three had 7 expired assignments. For the nerds: [URL="http://www.mersenne.org/M69314629"]M69314629[/URL] [URL="http://www.mersenne.org/M69470263"]M69470263[/URL] [URL="http://www.mersenne.org/M69970099"]M69970099[/URL]

On a grander scale, ~137K exponents for DCs have been assigned and expired 2+ times, and ~44K exponents for first-time checks. Overall it adds up to 730K assignments (just DC and LL) that were expired. It's not insignificant. 458K of those never checked in after they were assigned... they just got the assignment and then disappeared. Nearly 530K never checked in after a day had passed.

Summary: Yes, lots of churn. Might be interesting to see, of those, how many are from runaway users... users who have never submitted one finished assignment. (It might be only 33K assignments where the user never returned anything at all... but it was a quick query, so I won't vouch for it.)
I queued the last one for LL, because I like how the digits are arranged :smile:
Note that the other two are already assigned for LL, as they are TF'ed to 75. This one is TF'ed to 74 only and is currently assigned to GPU72 for TF to 75, so I cannot get it "legally" assigned for LL. I tried to TF it myself, from GPU72, but it is not available (I got a 71M instead, which I returned to the pool). Maybe it is already assigned to another TF'er, or maybe the form went nuts (it has happened in the past). Chris, please reserve it for me if it is not assigned. Otherwise, no problem: my p95 may grab it after it is reported as TF'ed to 75, if it is not reassigned immediately to someone else (low chances in this range, so most probably my p95, which connects to the server once per day, will get it, as it is already added to the worktodo, without the N/A key).
[QUOTE=LaurV;408930]I queued the last one for LL, because I like how the digits are arranged :smile:
Remark that the other two are already assigned for LL, as they are TF-ed to 75. This one is TF-ed to 74 only, and is currently assigned to GPU72 for TF to 75, so I can not get it "legally" assigned for LL. I tried to get it to TF by myself, from GPU72, but is not available (got a 71M instead, which I returned to the pool). Maybe is assigned already to another TF-er, or maybe the form went nuts (it happened in the past). Chris, please reserve it for me if it is not assigned. Otherwise no problem, my p95 may grab it after it is reported as TF-ed to 75, if it is not reassigned immediately to someone else (low chances, in this range, so most probably my p95 which connects to the server once per day will get it, as it is already added to the worktodo, without the N/A key).[/QUOTE] I guess I just feel bad for those poor exponents... "always a bridesmaid, never a bride" kind of thing. LOL |
[QUOTE=LaurV;408930]Chris, please reserve it for me if it is not assigned. Otherwise no problem, my p95 may grab it after it is reported as TF-ed to 75, if it is not reassigned immediately to someone else (low chances, in this range, so most probably my p95 which connects to the server once per day will get it, as it is already added to the worktodo, without the N/A key).[/QUOTE]
Ah... Sorry guys... I've been busy moving several tens of thousands of litres of water back into our storage tanks. Turns out the "Pool Guys" didn't get the plumbing correct. Please stand by.... |