mersenneforum.org


Andrew Thall 2011-02-01 18:37

Talk on gpuLucas at GPGPU-4 Workshop in March
 
I'll be presenting a short paper describing the GPU Lucas-Lehmer code I reported on last month. If anyone is in the LA area, the workshop is held in conjunction with the ASPLOS XVI conference; GPGPU-4 will meet Saturday, March 5.

I'm revising the paper now, and I'd really like to say more about the other work that's been done on GPU-based Mersenne testing. If anyone could get in touch with me (thall) at my alma.edu address, I'd be grateful for more info on other systems, timing results, etc. I'll be glad to send you a copy of my current draft of the paper, and to credit any information you provide in the final draft. The deadline is soon, however.

I'll post the final preprint shortly, and I've been removing dependencies from the code and will distribute that as well. (You can email me about that, too, if you're interested.)

Prime95 2011-02-02 23:21

Is anyone helping out? If not, let's get to it folks!

Andrew, are there specific video cards and FFT lengths you are interested in?

Are you at all interested in some non-GPU timings for additional data? If so, single-core or all-cores-working-on-the-one-exponent?

ATH 2011-02-02 23:33

I'm interested in helping with either compiling, if I can, or speed testing. I have a GeForce GTX 460, and if you need CPU testing I can test on a Core2Duo, Core2Quad, Pentium 4 (Prescott) and a laptop with a Celeron (Penryn).

Prime95 2011-02-03 00:46

[QUOTE=ATH;251048]I'm interested in helping with either compiling, if I can, or speed testing. I have a GeForce GTX 460, and if you need CPU testing I can test on a Core2Duo, Core2Quad, Pentium 4 (Prescott) and a laptop with a Celeron (Penryn).[/QUOTE]

I read Andrew's post as wanting timings of CUDALucas on your GTX 460.

Andrew Thall 2011-02-03 02:38

Thanks, all. I've had a few volunteers by email already...it is mainly the CUDALucas timings I need, but I think we've got it covered for now. Just looking at msec per Lucas iteration for given FFT sizes on Fermi architecture cards.

I pulled some single CPU times off the benchmark pages but would be interested in the sorts of speedups you get with multiple cores...I did some early experiments with multicore FFTW that left me less than thrilled, but that was a few years ago, and I'd like to hear how your well-tuned FFTs perform. Too late to make it into this paper, though.

Finally had a chance to dig into the CUDALucas source code...a very different method than my approach, which is your academic, massively data-parallel, digit-per-thread sort of technique. It's likely faster than mine for large N, but without non-power-of-two transforms, it's not going to be as fast for any given Mersenne number. I suspect its performance won't scale as rapidly with a higher multiprocessing level (more cores), but we'll have to see when we get some better cards.
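
For readers unfamiliar with the test being timed here, this is a minimal sketch of the Lucas-Lehmer iteration in plain Python. It is not the gpuLucas or CUDALucas code: both of those perform the squaring step with FFT-based multiplication on the GPU, and it is that squaring step that the "msec per Lucas iteration" figures measure. The function name lucas_lehmer is chosen for this illustration only.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is an odd prime. Uses Python's built-in big integers,
    so it is only practical for small exponents.
    """
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 iterations of square, subtract 2, reduce
        s = (s * s - 2) % m
    return s == 0             # prime exactly when the final residue is zero


if __name__ == "__main__":
    odd_primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    print([p for p in odd_primes if lucas_lehmer(p)])
```

Each pass through the loop is one "Lucas iteration"; for exponents in the millions, the cost of the big-number squaring dominates, which is why the FFT length matters for the timings.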

Prime95 2011-02-03 02:59

[QUOTE=Andrew Thall;251069]I pulled some single CPU times off the benchmark pages but would be interested in the sorts of speedups you get with multiple cores...I did some early experiments with multicore FFTW that left me less than thrilled, but that was a few years ago, and I'd like to hear how your well-tuned FFTs perform. [/QUOTE]

Prime95 FFTs are well-tuned for single core only. Multi-threading was wedged in as an afterthought. The reason for this is simple: no matter how effective the multi-threaded version is, prime95 will get more throughput by testing one exponent on each core.

P.S. I'm glad the community sent you the information you needed. I eagerly await studying your preprint.
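
The throughput argument in the post above can be sketched as simple arithmetic. This is an illustration only: the function name and the example speedup are hypothetical, not measured Prime95 figures, and the sketch assumes that running one independent test per core does not slow any core down (it ignores memory-bandwidth contention).

```python
def throughput(cores: int, threaded_speedup: float,
               single_core_time: float = 1.0) -> tuple[float, float]:
    """Return (tests per unit time with one exponent per core,
               tests per unit time with all cores on one exponent).

    threaded_speedup is how many times faster a single test finishes
    when all cores work on it; it cannot exceed the number of cores.
    """
    per_core = cores / single_core_time             # each core finishes its own test
    threaded = threaded_speedup / single_core_time  # one test, finished sooner
    return per_core, threaded


if __name__ == "__main__":
    # Hypothetical example: 4 cores, multi-threaded run 3x faster than one core.
    print(throughput(4, 3.0))
```

Under these assumptions the one-exponent-per-core figure is never lower than the multi-threaded one, since the speedup is at most the core count; multi-threading only reduces the latency of a single test.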

kjaget 2011-02-03 14:46

[QUOTE=Andrew Thall;251069]I pulled some single CPU times off the benchmark pages but would be interested in the sorts of speedups you get with multiple cores...I did some early experiments with multicore FFTW that left me less than thrilled, but that was a few years ago, and I'd like to hear how your well-tuned FFTs perform. Too late to make it into this paper, though.[/QUOTE]

Assuming you haven't seen it, the perpetual benchmark thread in the main hardware forum would be a great place to mine data for multi-core Prime95 timings.

