mersenneforum.org  

Old 2011-02-01, 18:37   #1
Andrew Thall
 
Dec 2010

8₁₆ Posts
Talk on gpuLucas at GPGPU-4 Workshop in March

I'll be presenting a short paper describing the GPU Lucas-Lehmer code I reported on last month. If anyone is in the LA area, the workshop is being held in conjunction with the ASPLOS XVI conference; GPGPU-4 will meet Saturday, March 5.

I'm revising the paper that I'll be presenting, and I'd really like to say more about the other work that's been done on GPU-based Mersenne testing. If anyone could get in touch with me (thall) at my alma.edu address, I'd be grateful for more info on other systems, timing results, etc. I'll be glad to send you a copy of my current draft of the paper and to credit any information you provide in the final draft. The deadline is soon, however.

I'll post the final preprint shortly, and I've been removing dependencies from the code and will distribute that as well. (You can email me about that, too, if you're interested.)
Old 2011-02-02, 23:21   #2
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

79·89 Posts

Is anyone helping out? If not, let's get to it folks!

Andrew, are there specific video cards and FFT lengths you are interested in?

Are you at all interested in some non-GPU timings for additional data? If so, single-core or all-cores-working-on-the-one-exponent?
Old 2011-02-02, 23:33   #3
ATH
Einyen
 
 
Dec 2003
Denmark

7²·59 Posts

I'm interested in helping, either with compiling (if I can) or with speed testing. I have a GeForce GTX 460, and if you need CPU testing I can test on a Core 2 Duo, a Core 2 Quad, a Pentium 4 (Prescott), and a laptop with a Celeron (Penryn).
Old 2011-02-03, 00:46   #4
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

1101101110111₂ Posts

Quote:
Originally Posted by ATH
I'm interested in helping, either with compiling (if I can) or with speed testing. I have a GeForce GTX 460, and if you need CPU testing I can test on a Core 2 Duo, a Core 2 Quad, a Pentium 4 (Prescott), and a laptop with a Celeron (Penryn).
I read Andrew's post as wanting timings of CUDALucas on your GTX 460.
Old 2011-02-03, 02:38   #5
Andrew Thall
 
Dec 2010

10₈ Posts

Thanks, all. I've had a few volunteers by email already...it is mainly the CUDALucas timings I need, but I think we've got it covered for now. Just looking at msec per Lucas iteration for given FFT sizes on Fermi architecture cards.
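For readers unfamiliar with the metric, here is a minimal sketch of the Lucas-Lehmer test that reports milliseconds per iteration. It uses Python's built-in big integers rather than an FFT-based multiply, so it is not how gpuLucas or CUDALucas work internally; the function name `lucas_lehmer` and the sample exponents are illustrative choices only.

```python
import time


def lucas_lehmer(p):
    """Run the Lucas-Lehmer test on M_p = 2**p - 1 for an odd prime p.

    Returns (is_prime, ms_per_iteration). Plain big-integer arithmetic
    is used here; real GPU codes replace the squaring with an FFT.
    """
    m = (1 << p) - 1          # the Mersenne number under test
    s = 4                     # standard starting value
    start = time.perf_counter()
    for _ in range(p - 2):    # p - 2 squarings modulo M_p
        s = (s * s - 2) % m
    elapsed = time.perf_counter() - start
    return s == 0, 1000.0 * elapsed / (p - 2)


if __name__ == "__main__":
    for p in (521, 607, 1279, 2203):
        is_prime, ms = lucas_lehmer(p)
        print(f"p={p}: prime={is_prime}, {ms:.4f} ms/iteration")
```

The timings printed by this sketch are only meaningful relative to each other on one machine; they are not comparable to the GPU figures being collected in this thread.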

I pulled some single CPU times off the benchmark pages but would be interested in the sorts of speedups you get with multiple cores...I did some early experiments with multicore FFTW that left me less than thrilled, but that was a few years ago, and I'd like to hear how your well-tuned FFTs perform. Too late to make it into this paper, though.

Finally had a chance to dig into the CUDALucas source code...a very different method from my approach, which is your academic, massively data-parallel, digit-per-thread sort of technique. It's likely faster than mine for large N, but without non-power-of-two transforms, it's not going to be as fast for any given Mersenne number. I suspect its performance won't scale as rapidly with a higher multiprocessing level (more cores), but we'll have to see when we get some better cards.
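The point about non-power-of-two transforms can be illustrated with a rough sketch: a code restricted to power-of-two lengths must round the FFT length up to the next power of two, while a code that also accepts lengths with small prime factors can pick something closer to the minimum. The figure of 18 bits per word and the factor set {2, 3, 5, 7} are assumptions for illustration only; the usable bits per word depend on the transform length and round-off behaviour, and nothing here is taken from either program's source.

```python
BITS_PER_WORD = 18  # assumption for illustration; not a measured limit


def next_power_of_two(n):
    """Smallest power of two >= n."""
    length = 1
    while length < n:
        length *= 2
    return length


def is_smooth(n, primes=(2, 3, 5, 7)):
    """True if n has no prime factors outside `primes`."""
    for q in primes:
        while n % q == 0:
            n //= q
    return n == 1


def next_smooth(n):
    """Smallest integer >= n whose prime factors are all in {2, 3, 5, 7}."""
    while not is_smooth(n):
        n += 1
    return n


def fft_lengths(p, bits_per_word=BITS_PER_WORD):
    """Return (minimum length, power-of-two length, smooth length) for exponent p."""
    min_len = -(-p // bits_per_word)  # ceiling division
    return min_len, next_power_of_two(min_len), next_smooth(min_len)


if __name__ == "__main__":
    min_len, pow2, smooth = fft_lengths(43112609)
    print(f"minimum {min_len}, power of two {pow2}, smooth {smooth}")
    print(f"power-of-two overhead: {pow2 / min_len:.2f}x, smooth: {smooth / min_len:.2f}x")
```

Under these assumptions, the power-of-two length can be nearly twice the minimum for an unlucky exponent, which is the cost being described above.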

Old 2011-02-03, 02:59   #6
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

79×89 Posts

Quote:
Originally Posted by Andrew Thall
I pulled some single CPU times off the benchmark pages but would be interested in the sorts of speedups you get with multiple cores...I did some early experiments with multicore FFTW that left me less than thrilled, but that was a few years ago, and I'd like to hear how your well-tuned FFTs perform.
Prime95 FFTs are well-tuned for a single core only. Multi-threading was wedged in as an afterthought. The reason for this is simple: no matter how effective the multi-threaded version is, Prime95 will get more throughput by testing one exponent on each core.

P.S. I'm glad the community sent you the information you needed. I eagerly await studying your preprint.

Old 2011-02-03, 14:46   #7
kjaget
 
 
Jun 2005

3·43 Posts

Quote:
Originally Posted by Andrew Thall
I pulled some single CPU times off the benchmark pages but would be interested in the sorts of speedups you get with multiple cores...I did some early experiments with multicore FFTW that left me less than thrilled, but that was a few years ago, and I'd like to hear how your well-tuned FFTs perform. Too late to make it into this paper, though.
Assuming you haven't seen it, the perpetual benchmark thread in the main hardware forum would be a great place to mine for multi-core Prime95 timings.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.