[QUOTE=ET_;334402]Question.
How is GPU memory use computed in relation to the exponent, B1 and B2? In other words, how can I know if my B1 and B2 ranges fit in my GPU memory? Luigi[/QUOTE] The exponent affects memory use by way of the FFT size needed for that exponent. B1 and B2 don't really have an effect. What really affects memory use is the Brent-Suyama (B-S) exponent e and the number of relative primes p processed in a pass. If n is the FFT size, each data sequence uses 8 * n bytes. I think I can get by with an overhead of 4 such sequences. In addition, e + p sequences are needed, for a total of roughly 8 * n * (4 + e + p) bytes. |
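The estimate in the post above can be turned into a quick fit check. This is only a sketch of that arithmetic, not code from the program: the function names and the example figures (a 2048k FFT, a 1 GiB budget) are made up for illustration, and the overhead of 4 sequences is the tentative number from the post. It also assumes "k" in an FFT size means 1024 and ignores anything else living on the device.

```python
BYTES_PER_ELEMENT = 8    # from the post: each sequence uses 8 * n bytes
OVERHEAD_SEQUENCES = 4   # the post's tentative overhead figure


def stage2_bytes(n, e, p, overhead=OVERHEAD_SEQUENCES):
    """Estimated device bytes for FFT length n, B-S exponent e,
    and p relative primes per pass: 8 * n * (overhead + e + p)."""
    return BYTES_PER_ELEMENT * n * (overhead + e + p)


def max_rp_per_pass(budget_bytes, n, e, overhead=OVERHEAD_SEQUENCES):
    """Largest p whose estimate fits in budget_bytes (0 if none fits)."""
    sequences = budget_bytes // (BYTES_PER_ELEMENT * n)
    return max(0, sequences - overhead - e)


# Hypothetical example values, not taken from the thread.
n = 2048 * 1024                      # "2048k" FFT, assuming k = 1024
print(stage2_bytes(n, e=2, p=8))     # 234881024 bytes (224 MiB)
print(max_rp_per_pass(2**30, n, e=2))  # 58 relative primes fit in 1 GiB
```

Solving for p rather than for bytes seems the more useful direction here, since the FFT size is fixed by the exponent and p is the knob left to turn.
|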
Is there any way of getting the GPU to make use of the system RAM? That would *really* give you some power.
|
[QUOTE=NBtarheel_33;334570]Is there any way of getting the GPU to make use of the system RAM? That would *really* give you some power.[/QUOTE]
Host to device and back memory transfers are [i]painfully[/i] slow. CUDALucas (at least pre-bit shift) used one device to host memory transfer (just one!) per iteration at its maximum "-polite" setting -- that caused a performance hit of 20%. Actually using main memory (many transfers of much data in an "iteration") in a useful way will be impossibly slow. |
[QUOTE=NBtarheel_33;334570]Is there any way of getting the GPU to make use of the system RAM? That would *really* give you some power.[/QUOTE]
There is one place where host ram can be used. Stage 2 initialization data can be stored there. That will make starting a new pass for the next batch of relative primes relatively quick and painless. The host to device transfers would be spread out enough so as not to cause a log jam. |
[QUOTE=Dubslow;334575]Host to device and back memory transfers are [I]painfully[/I] slow. CUDALucas (at least pre-bit shift) used one device to host memory transfer (just one!) per iteration at its maximum "-polite" setting -- that caused a performance hit of 20%. Actually using main memory (many transfers of much data in an "iteration") in a useful way will be impossibly slow.[/QUOTE]
Bummer. But owftheevil's idea of storing Stage 2 initialization data there is better than nothing, I suppose. GPUs should have ever-increasing amounts of RAM in the years to come, anyway. It's also much faster RAM - I think GPUs are already at DDR5, while DDR4 system RAM is still in its infancy. |
[QUOTE=NBtarheel_33;334752]
GPUs should have ever-increasing amounts of RAM in the years to come, anyway. It's also much faster RAM - I think GPUs are already at DDR5, while DDR4 system RAM is still in its infancy.[/QUOTE] Don't be confused -- it's GDDR5, not DDR5. It is rather faster than DDR3 though :smile: [url]http://en.wikipedia.org/wiki/GDDR5[/url] [quote]Like its predecessor, GDDR4, GDDR5 is based on DDR3 SDRAM memory which has double the data lines compared to DDR2 SDRAM...[/quote] (GDDR3 is based on DDR2 tech.) |
Cudapm1 output:
[CODE] M61076737 has a factor: 432634830991289176546683053423 [/CODE] Run with B1 = 65000, B2 = 12035000, n = 3360k, d = 2310, e = 2, 8 relative primes (rp) per pass. It used about 600 MB of device memory. Stage 2 took ~53 minutes. Edit: Setting e = 4 looks like it adds about 15 minutes. |
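For a sense of how many stage 2 passes those settings imply, here is a rough sketch. It assumes the program works through every residue below d that is coprime to d, in batches of "rp per pass"; that is my reading of the parameters, not something confirmed in the thread, and the function names are invented for the example.

```python
from math import gcd


def count_relative_primes(d):
    """Count residues r in [1, d) with gcd(r, d) == 1."""
    return sum(1 for r in range(1, d) if gcd(r, d) == 1)


def passes_needed(d, rp_per_pass):
    """Passes required to cover every relative prime (ceiling division)."""
    total = count_relative_primes(d)
    return -(-total // rp_per_pass)


print(count_relative_primes(2310))  # 480
print(passes_needed(2310, 8))       # 60
```

Since 2310 = 2 * 3 * 5 * 7 * 11, the count is 1 * 2 * 4 * 6 * 10 = 480, so 8 rp per pass works out to 60 passes under the assumption above.
|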
That would definitely put a dent in our P-1 deficit. Though it's hard to trade 25x P-1 work for 125x factoring work.
EDIT: Not that it wouldn't get used, though. I was trading up 10 GHz-days of factoring per GHz-day of P-1, and this is a better deal than that. :smile: |
Is the program fairly stable now or is it still 'in beta'? Also, is everything done on the GPU or does it take up a CPU core?
|
[QUOTE=bcp19;337058]Is the program fairly stable now or is it still 'in beta'? Also, is everything done on the GPU or does it take up a CPU core?[/QUOTE]
It's not at all stable yet, and lacks a lot of basic functionality besides. It makes heavy use of a CPU core during initialization of stage 1 and when computing the GCD after either stage. Other than that, the CPU load is not noticeable, much like CUDALucas. |
[QUOTE=owftheevil;337063]It's not at all stable yet, and lacks a lot of basic functionality besides. It makes heavy use of a CPU core during initialization of stage 1 and when computing the GCD after either stage. Other than that, the CPU load is not noticeable, much like CUDALucas.[/QUOTE]
I look forward to trying it when it is ready to debut! |