#1608 |
|
"James Heinrich"
May 2004
ex-Northern Ontario
11·311 Posts |
No. I'm working through changes, but nothing has been published to the site yet.
I assume you mean that your TF factor was correctly credited as TF (instead of P-1)? Your example is a relatively small factor (<2^70) so that's probably expected. It's only when you submit factors larger than what PrimeNet considers "reasonable" for TF that it falsely assumes it must've come from P-1.
Last fiddled with by James Heinrich on 2012-03-02 at 22:08 |
|
|
|
|
|
#1609 | |
|
"Jerry"
Nov 2011
Vancouver, WA
10001100011₂ Posts |
Quote:
|
|
|
|
|
|
|
#1610 |
|
Dec 2011
11·13 Posts |
I got my first CUDA-capable card (560 Ti) a little over a month ago, and have been running mfaktc on 64-bit Linux. I have a few questions.
1. Can anyone explain why Compute Capability 2.1 is about "half" as fast as 2.0 for running mfaktc? [Yes, I know there are a billion or so fewer transistors, but what specific feature/function do the 2.0 cards have that 2.1 lacks that makes such a huge difference to mfaktc?]
2. I am disappointed in how much CPU it takes to feed my GPU. I would happily give up a fraction of my GPU performance to get back my CPU performance. [It's no trouble consuming nearly two full i7 cores to feed the GPU via two instances of mfaktc.] mfaktc is compiled with a minimum SievePrimes=5000. I have tweaked the code to let me run at SievePrimes=1000. Is there a discussion as to why the user shouldn't be allowed to set a lower SievePrimes than 5K?
3. Has anyone considered running the sieving on the GPU? Is it just that nobody has written the code, or is there a reason the idea was rejected? [If one were running the sieve and the trial factoring on the same processor, the proper tradeoff between sieving and trial factoring seems pretty clear: if trial factoring can test, say, 250 million candidates per second, then sieving should stop at the point it can no longer remove more than 250 million candidates per second.]
Thanks in advance! |
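The stopping rule suggested in question 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `last_useful_prime` is hypothetical, the sample removal rates are invented for the example (they are not mfaktc measurements), and the rates are assumed to fall steadily as the sieve primes grow.

```python
def last_useful_prime(removal_rates, tf_rate):
    """Return the last sieve prime worth using, or None if none qualifies.

    removal_rates: list of (prime, candidates_removed_per_second) pairs in
        increasing prime order. Hypothetical measurements, assumed to be
        decreasing as the primes grow.
    tf_rate: candidates per second the trial-factoring kernel can test.
    """
    last = None
    for prime, rate in removal_rates:
        # Stop as soon as sieving by this prime removes candidates no faster
        # than trial factoring could simply test them.
        if rate <= tf_rate:
            break
        last = prime
    return last


# Invented numbers, for illustration only.
sample = [(13, 60e9), (101, 9e9), (1009, 1.5e9), (5003, 240e6)]
print(last_useful_prime(sample, 250e6))
```

With these invented rates and a trial-factoring rate of 250 million candidates per second, the loop stops at the first prime whose removal rate is not above 250 million per second and reports the prime just before it.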
|
|
|
|
|
#1611 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts |
http://www.mersenneforum.org/showthr...245#post281245
^ That's the answer for your first question. For the other two, I'm not sure, though I'll cast my vote again for on-GPU sieving.
Last fiddled with by Dubslow on 2012-03-05 at 08:45 Reason: That's the answer for your first question |
|
|
|
|
|
#1612 | |||
|
"Oliver"
Mar 2005
Germany
11×101 Posts |
Hi!
Quote:
Quote:
src/params.h:
Code:
[...]
/******************************************************************************
*******************************************************************************
*** DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO! ***
*** DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO! ***
*** DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO! ***
*******************************************************************************
******************************************************************************/
[...]
#define SIEVE_PRIMES_MIN 5000 /* DO NOT CHANGE! */
#define SIEVE_PRIMES_DEFAULT 25000 /* DO NOT CHANGE! */
#define SIEVE_PRIMES_MAX 200000 /* DO NOT CHANGE! */
[...]
Quote:
Oliver |
|||
|
|
|
|
|
#1613 |
|
"Oliver"
Mar 2005
Germany
457₁₆ Posts |
btw. don't take it personally if my previous post sounds too rude.
I'm getting the same questions again and again so I might be a little bit annoyed.
Oliver |
|
|
|
|
|
#1614 |
|
Jun 2005
3·43 Posts |
I've seen this misunderstanding quite a bit as well. Any thought given to removing it from future versions? Maybe replacing it with something like GHz-days/day, which is easy to sum across instances to see the total throughput?
|
|
|
|
|
|
#1615 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Difficult to calculate and estimate. Sticking with the raw data is better, but we do need to figure out a better way to print it.
|
|
|
|
|
|
#1616 | |
|
"James Heinrich"
May 2004
ex-Northern Ontario
11·311 Posts |
Quote:
Code:
0.016968 * pow(2, $bitlevel - 48) * 1680 / $exponent
// example using M50,000,000 from 2^69-2^70:
= 0.016968 * pow(2, 70 - 48) * 1680 / 50000000
= 2.3912767291392 GHz-days
// magic constant is 0.016968 for TF to 65-bit and above
// magic constant is 0.017832 for 63- and 64-bit
// magic constant is 0.01116 for 62-bit and below
Of course, the above code is based on a single bitlevel, but it is easily adapted to multi-bitlevel assignments. |
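As a rough illustration of the multi-bitlevel adaptation, the formula above can be summed one bit level at a time. The following Python sketch uses the constants exactly as posted. The function names are hypothetical, and it assumes the magic constant is chosen from the upper bit of each level (so the step from 2^64 to 2^65 counts as "65-bit"), which the post does not spell out.

```python
def tf_constant(bit_to):
    """Magic constant for one bit level, keyed on the level's upper bit.

    Assumption: "65-bit and above" means levels ending at 2^65 or higher.
    """
    if bit_to >= 65:
        return 0.016968
    if bit_to >= 63:
        return 0.017832
    return 0.01116


def tf_ghz_days_level(exponent, bit_to):
    """GHz-days credit for trial factoring from 2^(bit_to-1) to 2^bit_to."""
    return tf_constant(bit_to) * 2.0 ** (bit_to - 48) * 1680 / exponent


def tf_ghz_days(exponent, bit_from, bit_to):
    """GHz-days credit for a multi-bitlevel assignment, summed per level."""
    return sum(tf_ghz_days_level(exponent, b)
               for b in range(bit_from + 1, bit_to + 1))


# The single-level example from the post: M50,000,000 from 2^69 to 2^70.
print(tf_ghz_days(50_000_000, 69, 70))
# A two-level assignment, 2^69 to 2^71, is the sum of both levels.
print(tf_ghz_days(50_000_000, 69, 71))
```

Because each level costs twice as much as the one before it, the two-level assignment comes to three times the single-level figure.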
|
|
|
|
|
|
#1617 | |||
|
Dec 2011
11·13 Posts |
@Dubslow: Thank you for the pointer. That's exactly what I was looking for.
@kjaget/TheJudger: On my setup, the time per class and the megaprimes per second are in a lock-step inverse relationship with each other (with constant SievePrimes). Whether I set SievePrimes to 1000 or 1500 or 2000, the average rate remains a little above 125 megacandidates per second, on each of the two instances I am running. When I vary SievePrimes, the number of candidates changes, as expected, and the time per class changes proportionally. If I have a misunderstanding, I'm sure you folks will correct me. [See more, below.]
Code:
Starting trial factoring M52575179 from 2^71 to 2^72
class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
1669/4620 | 1.39G | 11.097s | 1h53m | 125.30M/s | 1500 | 0.39%
1672/4620 | 1.39G | 10.776s | 1h49m | 129.03M/s | 1500 | 0.41%
1680/4620 | 1.39G | 10.533s | 1h47m | 132.01M/s | 1500 | 0.41%
1681/4620 | 1.39G | 11.658s | 1h58m | 119.27M/s | 1500 | 0.37%
1689/4620 | 1.39G | 10.670s | 1h48m | 130.31M/s | 1500 | 0.41%
Quote:
Quote:
In the parlance of Mathematica, the fraction of candidates which pass the sieving is given by Apply[Times,(Prime[5+Range[sp]]-1)/Prime[5+Range[sp]]], where sp is the number of SievePrimes. At SievePrimes=1500, the above formula yields 28.5914%. At SievePrimes=5000, the above formula yields 25.0285%.
The number of candidates reported in each class by mfaktc (1.39G, as shown above) with SievePrimes=1500 agrees with the theoretical value: Floor[.285914665945569*2^71/4620/52575179/2+1/2]=1389675478 candidates per class. At SievePrimes=5000, the number of candidates per class is theoretically Floor[0.250284623178239*2^71/4620/52575179/2+1/2]=1216497244.
When I switched from SievePrimes=5000 to SievePrimes=1500, the number of candidates per second remained constant, but the time per class increased by about 14% (0.285915/0.250285-1). As best as I could tell, my CPU usage due to mfaktc went down by more than half. Now, the GPU is almost never starved for work.
In contrast, with high fixed values of SievePrimes, my CPU becomes saturated, the GPU is often starved for work, the net mfaktc throughput goes down, and I can't use my CPU for other useful work. With moderate values of SievePrimes, the CPU burns a lot of time and the GPU is sometimes starved for work.
With a smaller number of cores and a slower GPU, the default and minimum SievePrimes may make very good sense. But with a larger number of cores and a faster GPU, the minimum SievePrimes does not make sense for me. And, I would respectfully suggest it may not make sense for other people.
So, let me re-ask my 2nd question... Aside from validating the code, is there a reason why the user shouldn't be allowed to set a lower SievePrimes than 5K?
Quote:
I actually have some prototype sieving code. It is not optimized. At the smallest prime factor not inherently sieved by the class mechanism (p=13), it can sieve out 64 billion candidates per second. At p=1583, the incremental rate of candidate removal is 1 billion candidates per second. At p=2297, it is 500 megacandidates per second, and at p=4093, it is 261 megacandidates per second. But the curve is rather flat here. With my 560 Ti GPU, my prototype sieving code, and your trial factoring code, it would seem the tradeoff between more sieving and more trial factoring is probably in the vicinity of SievePrimes=1000+/-500, and not especially sensitive to variations. [This would leave your CPU essentially unused.]
@Bdot: If you are interested, would you please weigh in on how this compares with your results?
Thanks to all who responded!
Last fiddled with by rcv on 2012-03-05 at 19:36 |
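For readers without Mathematica, the sieve-survival fraction and the candidates-per-class estimate from this post can be reproduced with a short Python sketch. It follows the post's formula: skip the first five primes (2, 3, 5, 7, 11, which the 4620-class mechanism already covers) and multiply (p-1)/p over the next sp primes. The function names and the simple prime generator are illustrative assumptions, not mfaktc code.

```python
import math


def first_primes(n):
    """Return the first n primes using a simple sieve of Eratosthenes."""
    limit = 64
    while True:
        flags = bytearray([1]) * (limit + 1)
        flags[0] = flags[1] = 0
        for i in range(2, math.isqrt(limit) + 1):
            if flags[i]:
                # Clear every multiple of i starting at i*i.
                flags[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
        primes = [i for i, is_prime in enumerate(flags) if is_prime]
        if len(primes) >= n:
            return primes[:n]
        limit *= 2  # Not enough primes yet: widen the sieve and retry.


def survival_fraction(sieve_primes, skipped=5):
    """Fraction of candidates left after sieving with `sieve_primes` primes.

    The first `skipped` primes are assumed to be handled by the classes.
    """
    fraction = 1.0
    for p in first_primes(skipped + sieve_primes)[skipped:]:
        fraction *= (p - 1) / p
    return fraction


def candidates_per_class(exponent, bit_from, sieve_primes, classes=4620):
    """Estimated candidates per class for factors q = 2kp+1 in
    [2^bit_from, 2^(bit_from+1)), rounded to the nearest integer."""
    total_k = 2.0 ** bit_from / (2 * exponent)
    return math.floor(survival_fraction(sieve_primes) * total_k / classes + 0.5)


for sp in (1500, 5000):
    print(sp, survival_fraction(sp), candidates_per_class(52575179, 71, sp))
```

If the sketch matches the post's formula, the two printed fractions should come out close to the 28.59% and 25.03% quoted above, and their ratio gives the roughly 14% increase in time per class.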
|||
|
|
|