mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

Old 2011-12-13, 19:03   #1
diamonddave
 
Feb 2004

2⁵×5 Posts
Sieving on GPU

I recently started preliminary work on sieving on the GPU (mfaktc). Right now I'm just at the design stage / proof of concept. Since it's my first time doing any programming on a GPU, and I've been away from C/C++ for 10 years, things are going a bit slower than expected. The debugging experience is also far from ideal!

Just getting the environment working was more work than I thought!

Right now it looks like a second card is required to do kernel code debugging. No wonder I haven't gotten very far! Does anyone know if the newer releases of the tools still require a second card to debug kernels? Right now I'm using CUDA 4.1 RC2, VS 2010 and Parallel Nsight 2.0.

Right now I'm planning on using CUDA 4.1; could that pose a problem?

I was looking for an instruction timing manual but have come up empty!

I'll keep you guys posted on the progress...
Old 2011-12-14, 01:21   #2
Ken_g6
 
Jan 2005
Caught in a sieve

5×79 Posts

It shouldn't require two cards for debugging. When I first developed ppsieve_cuda, I actually didn't use any cards at all! CUDA 2.3 was, I believe, the last version that included a GPU emulator. Unless you're using some of the newer library functions, I see little reason why you couldn't try debugging on CUDA 2.3 with the emulator, if you can't get anything else to work.

I also notice you're using a Release Candidate version. Aren't the released GPU tools buggy enough for you? (I find that most GPU tools are fairly buggy, one way or another.)
Old 2011-12-14, 01:46   #3
Ken_g6
 
Jan 2005
Caught in a sieve

5×79 Posts

Quote:
Originally Posted by diamonddave
I was looking for an instruction timing manual but have come up empty!
I didn't notice this nugget before. In some ways, instruction timing is easy: Each 32-bit operation takes one clock cycle. Except for multiplication. Pre-Fermi cards (300 series, 200 series, 9000 series, and 8000 series) can't do more than 24 bits x 24 bits = 48 bits at a time. This effectively limits them to 16x16 bits at a time, with carry, so four multiplies are required for a 32-bit multiply. Fermis can do a 32-bit multiply in one cycle, producing either the low 32 bits or the high 32 bits. (Two cycles for both.) Do the math like this to figure out how long a 64-bit or 72-bit multiply will take. Oh, and double-precision floating-point instructions take something like four times as long, unless you bought a Tesla or something. I don't do floating point on GPUs.
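The 16×16-bit decomposition described above can be sketched in plain C. This is my illustration, not code from the post or from mfaktc: it builds a 32×32 → 64-bit product out of four 16×16 → 32-bit partial products, which is the kind of work a pre-Fermi GPU effectively has to do for one full-width 32-bit multiply.

```c
#include <stdint.h>

/* Illustrative only: a 32x32 -> 64-bit multiply composed from four
 * 16x16 -> 32-bit partial products, mirroring the four-multiply cost
 * Ken_g6 describes for pre-Fermi GPUs. */
uint64_t mul32_from_16(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;   /* split into 16-bit halves */
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint64_t lo   = (uint64_t)al * bl;         /* contributes bits  0..31 */
    uint64_t mid1 = (uint64_t)al * bh;         /* contributes bits 16..47 */
    uint64_t mid2 = (uint64_t)ah * bl;         /* contributes bits 16..47 */
    uint64_t hi   = (uint64_t)ah * bh;         /* contributes bits 32..63 */

    return lo + (mid1 << 16) + (mid2 << 16) + (hi << 32);
}
```

The same schoolbook layout scales up: a 64-bit or 72-bit multiply out of 32-bit (or 24-bit) hardware multiplies costs a quadratic number of partial products, which is where the cycle counts above come from.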

Except it's not that simple. GPUs don't have a true single-cycle latency on instructions. They take 2-4 cycles to go through the pipeline. But if you have enough sets of instructions, called blocks, to occupy all the GPU processors all the time, (2-4 times as many threads as processors, at minimum), then instructions appear to have a single-cycle latency.

Except it's still not that simple, because you need to access memory to get the data to work with, and to save results. Accessing the main GPU memory takes hundreds of cycles! Supposedly, if you access it in the proper, parallelizable way, and have enough other instructions to process in the meantime, this can be pipelined away, but it's not easy. The very easiest thing to do is to read all your data into registers, work with them until you're done, and save the results. This is what I do with ppsieve-cuda. But you have to make sure your data fits in the registers. If it doesn't, on Fermi GPUs there's a data cache, which is nice, and supposedly is as fast as registers. There's also a shared memory area, which is fast, but has to be accessed in the proper, parallelizable way, and is still not as fast as registers.
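The "read your data into registers, work until you're done, save the results" pattern might look roughly like this as a per-thread routine. This is a plain-C stand-in for a kernel body that I've made up for illustration (the names and the single-word arithmetic are mine, not from ppsieve-cuda or mfaktc): one global read up front, all arithmetic on locals, one global write at the end.

```c
#include <stdint.h>

/* Hypothetical per-thread trial-factoring step: computes 2^p mod f
 * entirely in local variables (registers on a GPU), touching "global"
 * memory only once to read the candidate and once to write the verdict.
 * f divides the Mersenne number 2^p - 1 exactly when 2^p mod f == 1.
 * Note: r*r overflows 64 bits for f >= 2^32, so a real kernel would use
 * multiword arithmetic -- which is why multiply timing matters so much. */
static void tf_thread(const uint64_t *candidates, uint8_t *is_factor,
                      int tid, uint32_t p)
{
    uint64_t f = candidates[tid];       /* single read from global memory */

    uint64_t r = 1;                     /* left-to-right binary powering */
    for (int bit = 31; bit >= 0; bit--) {
        r = (r * r) % f;                /* square for every exponent bit */
        if ((p >> bit) & 1)
            r = (r * 2) % f;            /* multiply in base 2 when bit set */
    }

    is_factor[tid] = (r == 1);          /* single write of the result */
}
```

For example, 2^11 - 1 = 2047 = 23 × 89, so with p = 11 the candidates 23 and 89 report a factor and 7 does not.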

There's a spreadsheet called the CUDA Occupancy Calculator that can help you sort out your memory issues. (No idea where, but it's somewhere on nVIDIA's site.) And basically all of this is spelled out in the CUDA Programming Guide which should have come with your software.

Good luck!
Old 2011-12-16, 13:56   #4
Christenson
 
Dec 2010
Monticello

5×359 Posts

Link to Cudappsieve? State?
Old 2011-12-22, 05:15   #5
Ken_g6
 
Jan 2005
Caught in a sieve

18B₁₆ Posts

Quote:
Originally Posted by Christenson
Link to Cudappsieve? State?
http://sites.google.com/site/kenscode/prime-programs

State: PSieve-CUDA is extensively used by PrimeGrid (they're just finishing a race with it), and sometimes used by Twin Prime Search.
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.