[QUOTE=James Heinrich;290902]Not yet. :no:
I'm experiencing infinitely more difficulty setting up a development environment than I expected. [URL="http://en.wikipedia.org/wiki/WIMP_%28software_bundle%29"]WIMP[/URL] != [URL="http://en.wikipedia.org/wiki/LAMP_%28software_bundle%29"]LAMP[/URL] (or even [URL="http://en.wikipedia.org/wiki/WAMP"]WAMP[/URL] as I have on my home/development server).[/QUOTE]

James, my factor for M[URL="http://www.mersenne.org/report_exponent/?exp_lo=58703263&exp_hi=10000&B1=Get+status"]58703263[/URL] showed up correctly today:

[CODE]Manual testing 58703263 [B]F[/B] 2012-03-02 16:45 0.0 920694316080604322623 1.7365[/CODE]

Did you make some changes?
[QUOTE=flashjh;291636]Did you make some changes?[/QUOTE]No. I'm working through changes, but nothing has been published to the site yet.
I assume you mean that your TF factor was correctly credited as TF (instead of P-1)? Your example is a [url=http://mersenne-aries.sili.net/M58703263]relatively small factor[/url] (<2^70), so that's probably expected. It's only when you submit factors larger than what PrimeNet considers "reasonable" for TF that it falsely assumes they must have come from P-1.
[QUOTE=James Heinrich;291638]No. I'm working through changes, but nothing has been published to the site yet.
I assume you mean that your TF factor was correctly credited as TF (instead of P-1)? It is a relatively small factor (<2^70) so that's probably expected. It's only when you submit factors larger than what PrimeNet considers "reasonable" for TF that it falsely assumes it must've come from P-1.[/QUOTE]

Ah, that explains some results from the past as well. Thanks for the update.
I got my first CUDA-capable card (a 560 Ti) a little over a month ago, and have been running mfaktc on 64-bit Linux. I have a few questions.
1. Can anyone explain why Compute Capability 2.1 is about "half" as fast as 2.0 for running mfaktc? [Yes, I know there are a billion or so fewer transistors, but what specific feature/function do the 2.0 cards have that 2.1 lacks that makes such a huge difference to mfaktc?]

2. I am disappointed in how much CPU it takes to feed my GPU. I would happily give up a fraction of my GPU performance to get back my CPU performance. [It's no trouble consuming nearly two full i7 cores to feed the GPU via two instances of mfaktc.] mfaktc is compiled with a minimum SievePrimes=5000. I have tweaked the code to let me run at SievePrimes=1000. Is there a discussion as to why the user shouldn't be allowed to set a lower SievePrimes than 5K?

3. Has anyone considered running the sieving on the GPU? Is it just that nobody has written the code, or is there a reason the idea was rejected? [If one were running the sieve and the trial factoring on the same processor, the proper tradeoff between sieving and trial factoring seems pretty clear: if trial factoring can test, say, 250 million candidates per second, then sieving should stop at the point where it can no longer remove more than 250 million candidates per second.]

Thanks in advance!
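To make the stopping rule in question 3 concrete, here's a minimal Python sketch. The function name and the rate table are mine (the rates are made-up illustration, not measurements); the rule itself is just "keep sieving while a prime removes candidates faster than trial factoring could test them":

```python
def sieve_depth_cutoff(tf_rate, removal_rates):
    """Return the largest sieve prime that still pays for itself.

    removal_rates: list of (prime, incremental candidates removed per
    second when that prime is added to the sieve), in increasing prime
    order. Sieving deeper only helps while a prime removes candidates
    faster than the trial-factoring stage could simply test them.
    """
    cutoff = None
    for prime, rate in removal_rates:
        if rate <= tf_rate:   # from here on, just testing is faster
            break
        cutoff = prime
    return cutoff

# Made-up removal rates (candidates/s), falling off with p since each
# new sieve prime p only removes a 1/p fraction of remaining candidates.
rates = [(13, 64e9), (1583, 1e9), (2297, 500e6), (4093, 261e6)]
print(sieve_depth_cutoff(250e6, rates))   # 4093: all four primes pay off
print(sieve_depth_cutoff(750e6, rates))   # 1583: stop before p=2297
```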
[url]http://www.mersenneforum.org/showthread.php?p=281245#post281245[/url]
^ That's the answer to your first question. For the other two, I'm not sure, though I'll cast my vote again for on-GPU sieving.
Hi!
[QUOTE=rcv;291954]2. I am disappointed in how much CPU it takes to feed my GPU. I would happily give up a fraction of my GPU performance to get back my CPU performance. [It's no trouble consuming nearly two full i7 cores to feed the GPU via two instances of mfaktc.][/QUOTE]Just run CudaLucas to free up your CPU. Primenet needs more LL and less TF!

[QUOTE=rcv;291954]mfaktc is compiled with a minimum SievePrimes=5000. I have tweaked the code to let me run at SievePrimes=1000. Is there a discussion as to why the user shouldn't be allowed to set a lower SievePrimes than 5K?[/QUOTE]Have you compared the speed of mfaktc when you lower SievePrimes to 1000? Avg. rate is [B]not[/B] the speed, time per class is the speed! Lower SievePrimes is just a waste of energy and [B]not validated[/B]!

src/params.h:
[CODE]
[...]
/******************************************************************************
*******************************************************************************
[B]*** DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO! ***
*** DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO! ***
*** DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO! ***
[/B]*******************************************************************************
******************************************************************************/
[...]
#define SIEVE_PRIMES_MIN 5000 /* DO NOT CHANGE! */
#define SIEVE_PRIMES_DEFAULT 25000 /* DO NOT CHANGE! */
#define SIEVE_PRIMES_MAX 200000 /* DO NOT CHANGE! */
[...]
[/CODE]

[QUOTE=rcv;291954]3. Has anyone considered running the sieving on the GPU? Is it just that nobody has written the code or is there a reason the idea was rejected?

[If one were running the sieve and the trial factoring on the same processor, the proper tradeoff between sieving and trial factoring seems pretty clear -- If trial factoring can test, say, 250 million candidates per second, then sieving should stop at the point it can no longer remove more than 250 million candidates per second.][/QUOTE]Yes, for sure... but I don't know how to implement this [B]efficiently[/B]. Bdot (mfakto) tried, and IIRC he got ~30M/s for a not-so-slow GPU. And the tradeoff is not that simple, because the number of candidates per second doesn't matter; the time per assignment matters!

Oliver
Btw, don't take it personally if my previous post sounds too rude. I'm getting the same questions again and again, so I might be a little bit annoyed. :sad:

Oliver
[QUOTE=TheJudger;291963]Have you compared the speed of mfaktc when you lower SievePrimes to 1000? Avg. rate is [B]not[/B] the speed, time per class is the speed![/QUOTE]
I've seen this misunderstanding quite a bit as well. Any thought of removing it from future versions? Maybe replace it with something like GHz-days/day, something that is easy to sum up among instances to see the total throughput?
Difficult to calculate and estimate. Sticking with the raw data is better, but we do need to figure out a better way to print it.
[QUOTE=kjaget;291984]I've seen this misunderstanding quite a bit as well. And thought into removing it from future versions? Maybe replacing it with something like GHz-days/day to something which is easy to sum up among instances to see the total throughput?[/QUOTE]I highly second this. The "M/s" value is somewhat meaningless for the user, and often misunderstood. The conversion of time-per-class into GHz-days-per-day should be very simple. GHz-days for the assignment is given by:

[code]0.016968 * pow(2, $bitlevel - 48) * 1680 / $exponent

// example using M50,000,000 from 2^69-2^70:
= 0.016968 * pow(2, 70 - 48) * 1680 / 50000000
= 2.3912767291392 GHz-days

// magic constant is 0.016968 for TF to 65-bit and above
// magic constant is 0.017832 for 63- and 64-bit
// magic constant is 0.01116 for 62-bit and below[/code]

Then all you need is:

[code]86400 / (time_per_class * classes_per_exponent) * ghz_days_assignment[/code]

Of course, the above is based on a single bitlevel, but it is easily adapted to multi-bitlevel assignments.
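For anyone who wants to script this, here's the same conversion as a runnable Python sketch. The function names are mine; it handles only the single-bitlevel, 65-bit-and-above case, and assumes mfaktc's 4620 classes per exponent:

```python
def ghz_days(exponent, bitlevel):
    """GHz-days credit for trial factoring one bitlevel (65-bit and above).

    0.016968 is the TF magic constant for 65 bits and above.
    """
    return 0.016968 * 2 ** (bitlevel - 48) * 1680 / exponent

def ghz_days_per_day(exponent, bitlevel, time_per_class, classes=4620):
    """Throughput: fraction of an assignment finished per day times its credit."""
    return 86400 / (time_per_class * classes) * ghz_days(exponent, bitlevel)

# M50,000,000 from 2^69 to 2^70, as in the example above:
print(ghz_days(50000000, 70))              # ≈ 2.3913 GHz-days
print(ghz_days_per_day(50000000, 70, 11))  # ≈ 4.07 GHz-days/day at 11 s/class
```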
@Dubslow: Thank you for the pointer. That's exactly what I was looking for.
@kjaget/TheJudger: On my setup, the time per class and the candidate rate (M/s) are in a lock-step inverse relationship with each other (with constant SievePrimes). Whether I set SievePrimes to 1000 or 1500 or 2000, the average rate remains a little above 125 million candidates per second on each of the two instances I am running. When I vary SievePrimes, the number of candidates changes, as expected, and the time per class changes proportionally. If I have a misunderstanding, I'm sure you folks will correct me. [See more below.]

[code][FONT=Courier New][SIZE=1]Starting trial factoring M52575179 from 2^71 to 2^72
    class | candidates |    time |   ETA | avg. rate | SievePrimes | CPU wait
1669/4620 |      1.39G | 11.097s | 1h53m | 125.30M/s |        1500 |    0.39%
1672/4620 |      1.39G | 10.776s | 1h49m | 129.03M/s |        1500 |    0.41%
1680/4620 |      1.39G | 10.533s | 1h47m | 132.01M/s |        1500 |    0.41%
1681/4620 |      1.39G | 11.658s | 1h58m | 119.27M/s |        1500 |    0.37%
1689/4620 |      1.39G | 10.670s | 1h48m | 130.31M/s |        1500 |    0.41%[/SIZE][/FONT][/code]

[QUOTE=TheJudger;291965]btw. don't take it personally if my previous post sounds too rude. I'm getting the same questions again and again so I might be a little bit annoyed. :sad:[/QUOTE]OK, I won't take it personally. For all you know, I just fell off the turnip truck. At least your answers are all here in one big bold place for future questioners to find. :smile:

[QUOTE=TheJudger;291963]Have you compared the speed of mfaktc when you lower SievePrimes to 1000? Avg. rate is [B]not[/B] the speed, time per class is the speed! Lower SievePrimes is just a waste of energy and [B]not validated[/B]![/QUOTE]I disagree completely about this being a waste of energy! I saw the warning in the code, which I heeded; I saw the word "unless". I've looked at the code. I've run the self-test. I've found 4 factors at the 71/72-bit size with the tweaked version. Whom should I see to get it validated? If you know of a problem, PLEASE let me know.
In the parlance of Mathematica, the fraction of candidates which pass the sieving is given by Apply[Times, (Prime[5 + Range[sp]] - 1)/Prime[5 + Range[sp]]], where sp is the number of SievePrimes. At SievePrimes=1500, the above formula yields 28.5914%. At SievePrimes=5000, it yields 25.0285%.

The number of candidates reported in each class by mfaktc (1.39G, as shown above) with SievePrimes=1500 agrees with the theoretical value: Floor[0.285914665945569*2^71/4620/52575179/2 + 1/2] = 1389675478 candidates per class. At SievePrimes=5000, the number of candidates per class is theoretically Floor[0.250284623178239*2^71/4620/52575179/2 + 1/2] = 1216497244.

When I switched from SievePrimes=5000 to SievePrimes=1500, the number of candidates per second remained constant, but the time per class increased by about 14% (0.285915/0.250285 - 1). As best I could tell, my CPU usage due to mfaktc went down by more than half, and now the GPU is almost never starved for work. In contrast, with high fixed values of SievePrimes, my CPU becomes saturated, the GPU is often starved for work, the net mfaktc throughput goes down, and I can't use my CPU for other useful work. With moderate values of SievePrimes, the CPU burns a lot of time and the GPU is sometimes starved for work.

With a smaller number of cores and a slower GPU, the default and minimum SievePrimes may make very good sense. But with a larger number of cores and a faster GPU, the minimum SievePrimes does not make sense for me, and I would respectfully suggest it may not make sense for other people. So let me re-ask my second question: aside from validating the code, is there a reason why the user shouldn't be allowed to set SievePrimes lower than 5K?

[QUOTE=TheJudger;291963]Yes, for sure... but I don't know how to implement this [B]efficiently[/B]. Bdot (mfakto) tried, and IIRC he got ~30M/s for a not-so-slow GPU.

And the tradeoff is not that simple, because the number of candidates per second doesn't matter; the time per assignment matters![/QUOTE]I maintain that if both sieving and trial factoring are done on the GPU, the tradeoff *is* as simple as matching the candidates per second being removed by the siever to the candidates per second being tested by the trial factorer.

I actually have some prototype sieving code. It is not optimized. At the smallest prime factor not inherently sieved by the class mechanism (p=13), it can sieve out 64 billion candidates per second. At p=1583, the incremental rate of candidate removal is 1 billion candidates per second; at p=2297 it is 500 million candidates per second; and at p=4093 it is 261 million candidates per second. But the curve is rather flat here. With my 560 Ti GPU, my prototype sieving code, and your trial factoring code, it would seem the tradeoff between more sieving and more trial factoring lies in the vicinity of SievePrimes=1000±500, and is not especially sensitive to variations. [This would leave your CPU essentially unused.]

@Bdot: If you are interested, would you please weigh in on how this compares with your results?

Thanks to all who responded!
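For anyone who wants to cross-check the numbers above, here is a small Python equivalent of the Mathematica product. The function names and the sieve bound of 60000 are my choices (the bound just needs to cover the first ~5005 primes); the skip of 5 primes matches the class mechanism handling 2, 3, 5, 7, and 11:

```python
def primes_below(limit):
    """Plain sieve of Eratosthenes: all primes below `limit`."""
    is_prime = bytearray([1]) * limit
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = b"\x00" * len(range(i * i, limit, i))
    return [i for i in range(limit) if is_prime[i]]

def surviving_fraction(sieve_primes, skip=5):
    """Fraction of candidates that survive sieving with `sieve_primes`
    primes, skipping the first `skip` primes (2..11 are handled by the
    class mechanism, so sieving starts at p=13)."""
    frac = 1.0
    for p in primes_below(60000)[skip:skip + sieve_primes]:
        frac *= (p - 1) / p
    return frac

print(surviving_fraction(1500))   # ≈ 0.285915, matching the 28.5914% above
print(surviving_fraction(5000))   # ≈ 0.250285
```

Multiplying the SievePrimes=1500 fraction out per class, surviving_fraction(1500) * 2**71 / 4620 / 52575179 / 2, lands on the same ~1.39G candidates per class that mfaktc reports.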
All times are UTC.
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.