#78
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts
The exponent affects memory use only by way of the FFT size needed for that exponent; B1 and B2 don't really have an effect. What really drives memory use is the Brent-Suyama exponent e and the number of relative primes p processed in a pass. If n is the FFT size, each data sequence uses 8 * n bytes. I think I can get by with an overhead of 4 such sequences; in addition, e + p sequences are needed.
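For concreteness, here is a rough stand-alone estimate of that stage 2 footprint (a sketch only: the function name, the example FFT size, and the pass size are illustrative, and the 4-sequence overhead follows the figure above rather than CUDAPm1's actual code):
Code:
/* Rough stage 2 memory estimate from the description above:
   (overhead + e + p) sequences of 8*n bytes each. */
#include <stdio.h>

static size_t stage2_bytes(size_t fft_len, unsigned e, unsigned p)
{
    const size_t seq_bytes = 8 * fft_len;  /* one 8-byte double per FFT element */
    const size_t overhead  = 4;            /* assumed working/temporary sequences */
    return (overhead + e + p) * seq_bytes;
}

int main(void)
{
    /* e.g. a 3200K FFT with e = 4 and 30 relative primes per pass (made-up numbers) */
    size_t need = stage2_bytes(3276800, 4, 30);
    printf("approx. %.1f MiB of device memory\n", need / (1024.0 * 1024.0));
    return 0;
}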
#79
"Nathan"
Jul 2008
Maryland, USA
2133₈ Posts
Is there any way of getting the GPU to make use of the system RAM? That would *really* give you some power.
#80
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
Host-to-device and back memory transfers are painfully slow. CUDALucas (at least pre-bit-shift) used one device-to-host memory transfer (just one!) per iteration at its maximum "-polite" setting -- that alone caused a performance hit of 20%. Actually using main memory in a useful way (many transfers of much data per "iteration") will be impossibly slow.
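For reference, a minimal CUDA sketch that times a single host<->device round trip with CUDA events; the buffer size is arbitrary, and it only illustrates the transfer cost being described, not CUDALucas's actual transfer path:
Code:
/* Time one host->device->host round trip of a Lucas-sized buffer. */
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 8u * 4 * 1024 * 1024;   /* roughly a 4M-point double buffer */
    double *h = NULL, *d = NULL;
    cudaMallocHost((void **)&h, bytes);          /* pinned host memory */
    cudaMalloc((void **)&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("round trip of %zu MiB took %.2f ms\n", bytes >> 20, ms);

    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}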
#81
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
473₈ Posts
There is one place where host RAM can be used: stage 2 initialization data can be stored there. That will make starting a new pass for the next batch of relative primes relatively quick and painless, and the host-to-device transfers would be spread out enough not to cause a logjam.
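A minimal sketch of that idea, assuming the stage 2 data sits in pinned host RAM and each chunk is queued with cudaMemcpyAsync on a separate stream; the chunk counts, sizes, and names are made up for illustration and are not CUDAPm1's internals:
Code:
/* Stream precomputed stage 2 data from pinned host RAM, one chunk at a time,
   so copies can overlap with kernels running on other streams. */
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int    chunks     = 16;
    const size_t chunk_size = 8u * 1024 * 1024;

    double *host_pool = NULL, *dev_buf = NULL;
    cudaMallocHost((void **)&host_pool, chunks * chunk_size);  /* pinned, DMA-friendly */
    cudaMalloc((void **)&dev_buf, chunk_size);

    cudaStream_t copy_stream;
    cudaStreamCreate(&copy_stream);

    for (int i = 0; i < chunks; ++i) {
        /* queue the copy; work on other streams keeps running meanwhile */
        cudaMemcpyAsync(dev_buf, (char *)host_pool + (size_t)i * chunk_size,
                        chunk_size, cudaMemcpyHostToDevice, copy_stream);
        cudaStreamSynchronize(copy_stream);   /* block only when the chunk is needed */
        /* ... launch the stage 2 kernels that consume dev_buf here ... */
    }

    cudaStreamDestroy(copy_stream);
    cudaFree(dev_buf);
    cudaFreeHost(host_pool);
    printf("staged %d chunks from host RAM\n", chunks);
    return 0;
}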
#82
"Nathan"
Jul 2008
Maryland, USA
5·223 Posts
Quote:

GPUs should have ever-increasing amounts of RAM in the years to come, anyway. It's also much faster RAM - I think GPUs are already at DDR5, while DDR4 system RAM is still in its infancy.
#83
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
Quote:

http://en.wikipedia.org/wiki/GDDR5
#84
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts
Cudapm1 output:
Code:
M61076737 has a factor: 432634830991289176546683053423

Edit: Looks like about 15 minutes longer to make e = 4.

Last fiddled with by owftheevil on 2013-04-13 at 23:33
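For anyone who wants to double-check the reported factor on the CPU, a small GMP sketch (not part of CUDAPm1) that verifies 2^61076737 ≡ 1 (mod f):
Code:
/* Host-side sanity check that f divides M61076737. */
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t f, base, exp, r;
    mpz_init_set_str(f, "432634830991289176546683053423", 10);
    mpz_init_set_ui(base, 2);
    mpz_init_set_ui(exp, 61076737);
    mpz_init(r);

    mpz_powm(r, base, exp, f);          /* r = 2^61076737 mod f */
    printf("%s\n", mpz_cmp_ui(r, 1) == 0 ? "factor confirmed" : "not a factor");

    mpz_clears(f, base, exp, r, NULL);
    return 0;
}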
#85
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
1A8₁₆ Posts
That would definitely put a dent in our P-1 deficit, though it's hard to trade 25x P-1 work for 125x factoring work.
EDIT: Not that it wouldn't get used, though. I was already trading 10 GHz-days of factoring per GHz-day of P-1, and this is a better deal than that.
Last fiddled with by Aramis Wyler on 2013-04-13 at 23:30 Reason: PS.
#86
Oct 2011
7×97 Posts
Is the program fairly stable now or is it still 'in beta'? Also, is everything done on the GPU or does it take up a CPU core?
#87
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
13B₁₆ Posts
It's not at all stable yet, and it lacks a lot of basic functionality besides. It makes heavy use of a CPU core during stage 1 initialization and when computing the GCD after either stage. Other than that, the CPU load is not noticeable, much like CUDALucas.
#88
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts
I look forward to trying it when it is ready to debut!