[QUOTE=Brian Gladman;198713]Hi Jason,
I have a Visual Studio x64 build with CUDA linked in and compiled with the MS compiler but you mentioned that some extra files are needed. Is this correct? Once working, I am also looking for some way of testing this that does NOT take up a lot of machine time (I am running a 3.5GHz i7 Extreme with 24GB of memory and a dual GTX295 card).[/QUOTE] You will need to define HAVE_CUDA, compile common/cuda_xface.[ch], and compile gnfs/poly/stage1_gpu/* but not gnfs/poly/stage1/*. Also, you will need to run nvcc on all of the .cu files in stage1_gpu, and the resulting .ptx files have to be in the same directory as the final binary when it runs. The code only needs the driver API (in cuda.lib), not the runtime API. The GPU is used for all inputs, so a 90- to 105-digit input will be enough to test degree 4 and degree 5, and such a test would take 5-15 minutes.
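Concretely, the steps jasonp lists might look something like this in a Unix-like shell. This is only a sketch: the make invocation and flags are hypothetical placeholders for your real build settings, and a Windows build would use the Visual Studio projects instead.

```shell
# 1. Compile each stage 1 GPU kernel to PTX; the driver API then
#    JIT-compiles the PTX for whatever GPU is present at run time.
for f in gnfs/poly/stage1_gpu/*.cu; do
    nvcc -ptx "$f" -o "${f%.cu}.ptx"
done

# 2. Build msieve with HAVE_CUDA defined, linking against the
#    driver API (libcuda / cuda.lib), not the runtime (cudart).
#    The CFLAGS/LIBS values below are illustrative, not msieve's
#    actual makefile interface.
make CFLAGS="-DHAVE_CUDA" LIBS="-lcuda"

# 3. The .ptx files must sit in the same directory as the msieve
#    binary when it runs.
cp gnfs/poly/stage1_gpu/*.ptx ./
```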
[quote=jasonp;198715]You will need to define HAVE_CUDA, compile common/cuda_xface.[ch], and compile gnfs/poly/stage1_gpu/* but not gnfs/poly/stage1/*. Also, you will need to run nvcc on all of the .cu files in stage1_gpu, and the resulting .ptx files have to be in the same directory as the final binary when it runs.
The code only needs the driver API (in cuda.lib), not the runtime API. The GPU is used for all inputs, so a 90- to 105-digit input will be enough to test degree 4 and degree 5, and such a test would take 5-15 minutes.[/quote] Thanks Jason - that helps a lot - I should get there soon if all goes well. Brian
[quote=Brian Gladman;198720]Thanks Jason - that helps a lot - I should get there soon if all goes well.
Brian[/quote] I have added a build for CUDA to the MSIEVE Windows build - it is in build.cuda.vc9. It needs the Nvidia toolkit installed. It seems to work but it's very new and should NOT be relied on. It builds in both win32 and x64 but I have only tested the latter. Does anyone know where the shared memory architecture settings (SM_<n>) for various GPUs are documented? I would appreciate any testing that people are able to do. Brian
[QUOTE=Brian Gladman;198737]I would appreciate any testing that people are able to do. [/QUOTE]
Unfortunately I only have access to ATI cards, so I can't help you with this one. :down: Jeff.
Looking for a GNFS collaborator (w/ an Nvidia GPU)
I'd like to punch out a couple of those "smaller-but-needed" GNFS jobs on Wagstaff's Cunningham project page, at around 160 digits, and maybe some a bit larger: there are a few in the tables between 168-172 digits which look tantalizing.
I don't have an msieve-ready GPU, and, given the outrageous performance capabilities of the GPU-ready msieve code, I think it would be silly to crank such large polynomial searches on a regular CPU. Is anyone with the right GPU interested in collaborating? I'd be happy if you just cranked the polynomials, but if you wish to do sieving and/or linear algebra, I'm good with that as well. EDIT: PM me if you're interested, to avoid bifurcating this thread.
You can start with 11,275-. The poly is [URL="http://www.mersenneforum.org/showpost.php?p=192831&postcount=58"]here[/URL]. :smile:
[QUOTE=frmky;198895]You can start with 11,275-. The poly is [URL="http://www.mersenneforum.org/showpost.php?p=192831&postcount=58"]here[/URL]. :smile:[/QUOTE]
Hmm: thanks! 36 hours. I'm not as old as Batalov thinks he is, but I can remember when a c160 polynomial search was a big deal. I'll reserve that with Wagstaff.
[QUOTE=Brian Gladman;198737]
Does anyone know where the shared memory architecture settings (SM_<n>) for various GPUs are documented?[/QUOTE] If you mean the sm_1[0123] arguments to nvcc, all of the GPU code avoids anything beyond the default lowest-common-denominator architecture (sm_10). The msieve GPU driver is designed to make the Nvidia driver compile to the most appropriate architecture for the current GPU, so you don't have to. Choosing the argument to use for GPU code is described in Nvidia's nvcc reference manual (it's a somewhat confusing chapter).
[QUOTE=jasonp;197929]So a good option is to have one thread for each GPU to do stage 1, and one thread for each CPU core to do stage 2. GPU threads use a double-buffering technique to feed the GPU with low CPU utilization; while the GPU is working they can scan the double buffer for hits. Any hits get the size optimization pass from stage 2 immediately, and most hits will not be good enough to pass on to the root sieve. The stage 2 threads should form a thread pool that pull work from a queue filled by the GPU threads, then run the root sieve and final size optimization. The root sieve finishes quickly when a polynomial is mediocre, so there should probably be an initial phase that guesses how long it will take and spreads big jobs across multiple threads that are idle, maybe by filling the queue with multiple root sieve pieces for each polynomial.[/QUOTE]
What would help the most immediately would be to run the stage 2 job as a whole in a separate thread, while stage 1 continues to look for hits. As you said, maybe even have multiple threads to run several stage 2 jobs simultaneously. Unless you want to support multiple GPUs, I think the GPU core code and the code which feeds the GPU core do not need to change at all. If you want to support multiple GPUs, you could nearly accomplish the same thing by running multiple instances of msieve on different ranges, each using a different GPU.
SLI behavior
Has anyone run this code on a machine with two identical SLI cards?
I assume that it's at least a 2x boost, and that the CUDA drivers don't care.
[QUOTE=FactorEyes;201956]Has anyone run this code on a machine with two identical SLI cards?
I assume that it's at least a 2x boost, and that the CUDA drivers don't care.[/quote] According to the forum posts I'm seeing, if you have more than one card with SLI enabled then CUDA only sees one card out of the entire group. That makes sense, since most of the time a GPU kernel assumes that global memory is really visible to all processors, which is not true with SLI enabled.