mersenneforum.org  

Old 2009-10-12, 04:02   #1
jasonp
Tribal Bullet
 
Oct 2004

3,529 Posts
Msieve with GPU support

Msieve now has support for Nvidia graphics cards when performing NFS polynomial selection. This is my way of getting started with graphics card programming, and building something worthwhile at the same time.

If you want to play with this, you can grab the msieve-gpu branch from sourceforge and build from source, or use the following win32 binary:

www.boo.net/~jasonp/msieve144_gpu.zip

Unfortunately, any time third-party hardware and software are involved, the setup is a little complicated. To get this to work you will need:

- a CUDA-compliant Nvidia graphics card

- the latest CUDA drivers (version 2.3 for now)

- both files in the above zip file

If you want to build from source you will also need:

- Nvidia's CUDA runtime (version 2.3 for now)

- the MSVC project has not been updated to reflect the changes needed to compile GPU support. To build on windows, you will need MinGW as well as MSVC or MSVC Express. MSVC is required by Nvidia's compiler, even though Microsoft's compiler is not used by the msieve makefile. If I actually used Nvidia's compiler to build some of the host code and not just the device code, then the .ptx file included with the msieve binary could be embedded into the exe

- if building on unix, a lot of patience; I only have one suitable graphics card and it's not on my linux machine (heck, my linux machine doesn't even have PCI-e)

You use this binary the same as usual. It has a few limitations when performing NFS polynomial selection:

- it can only find degree 5 polynomials (for technical reasons)

- it only uses one GPU in your system, though that's easy to change

- there's a little more debugging output than usual. Occasionally, it will find a polynomial that is corrupt; I don't know why yet.

- the code feeds big blocks of work to the GPU; on windows, this causes a noticeable drop in system responsiveness. I don't know if it's my code, problems in windows, or problems in the driver.

The speedup in stage 1 of NFS polynomial selection when using a graphics card is incredible. I've tested this setup on a GeForce 9800GT (a medium-end card that cost a bit over $100) and stage 1 runs over 27x faster than stage 1 on one core of a 1.86GHz Core2Duo. Nvidia's CUDA profiler shows the GPU is only 15% utilized, so there's headroom for much more performance too. I'm still working on this, and will post updated binaries as I understand this hardware better.

Good luck,
jasonp

PS: No, I don't care about optimizing Prime95 :)

PPS: The GPU code has now been folded into the trunk of the msieve codebase

Last fiddled with by jasonp on 2009-12-19 at 17:40 Reason: changed URL
Old 2009-10-12, 04:37   #2
frmky
 
Jul 2003
So Cal

2,039 Posts

Error in 64-bit Linux. I copied the PTX file from the Windows zip.

Code:
[childers@physicstitan test]$ ./msieve_gpu -np -v

Msieve v. 1.43
Sun Oct 11 21:36:07 2009
random seeds: dce40fdf b5d2743c
factoring 9370548739750343689742077059611741296688413458087068027338328923603585147935698143105876573510157864118212297131774808193943011745511363829026508600700379919701 (160 digits)
no P-1/P+1/ECM available, skipping
commencing number field sieve (160-digit input)
commencing number field sieve polynomial selection
time limit set to 300.00 hours
searching leading coefficients from 1 to 3322515
deadline: 1600 seconds per coefficient
coeff 60-960 429744497 472718946 472718947 519990841
------- 429744497-472718946 472718947-519990841
error (line 244): CUDA_ERROR_LAUNCH_FAILED
Update:

I tried to compile the core, and got a trivial error:
Code:
[childers@physicstitan stage1]$ /usr/local/cuda/bin/nvcc -arch sm_13 --ptx -I /usr/local/cuda/include -L /usr/local/cuda/lib64 -O -o stage1_core.ptx stage1_core.cu
stage1_core.cu(339): error: identifier "__min" is undefined
I replaced the __min call with a simple ternary expression, (cond) ? a : b, and it now seems to work:
Code:
coeff 60-960 429744497 472718946 472718947 519990841
------- 429744497-472718946 472718947-519990841
poly  9 p 430304299 q 473248693 coeff 203640947094031207
poly 11 p 431535019 q 472813609 coeff 204035629743273571
poly 13 p 431254729 q 473068679 coeff 204013104960532991
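
The fix described above can be sketched in plain C as follows. `__min` is a Microsoft-specific macro from MSVC's `<stdlib.h>`, which would probably explain why the original source built on Windows but not with nvcc on Linux. The names `min_u32` and `clamp_batch` are illustrative assumptions and do not come from the msieve source; inside the actual `.cu` file a helper like this would also need a `__device__` qualifier.

```c
#include <assert.h>
#include <stdint.h>

/* Portable replacement for the MSVC-only __min macro:
   a plain ternary works with any C or CUDA toolchain. */
static inline uint32_t min_u32(uint32_t a, uint32_t b)
{
    return (a < b) ? a : b;
}

/* Hypothetical use: limit a batch length to the items that remain. */
static inline uint32_t clamp_batch(uint32_t remaining, uint32_t max_batch)
{
    assert(max_batch > 0);
    return min_u32(remaining, max_batch);
}
```

A function is used here instead of a macro so that each argument is evaluated only once.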

Last fiddled with by frmky on 2009-10-12 at 04:56
Old 2009-10-12, 04:40   #3
frmky
 
Jul 2003
So Cal

2039₁₀ Posts

Quote:
Originally Posted by jasonp View Post
- the code feeds big blocks of work to the GPU; on windows, this causes a noticeable drop in system responsiveness. I don't know if it's my code, problems in windows, or problems in the driver.
This is a common issue. The driver does not update the screen while a kernel is running. The only way to avoid this is to do less work during each kernel invocation.
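
As a rough host-side sketch of "less work per kernel invocation", the loop below splits one large job into several smaller launches. Everything here is an assumption made for illustration: `launch_fn` stands in for whatever actually launches the kernel, and `count_items` is a dummy callback; none of these names come from msieve or from the CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one kernel launch over items
   [start, start + count). */
typedef void (*launch_fn)(size_t start, size_t count, void *ctx);

/* Issue launches of at most max_per_launch items each until all
   `total` items are covered. Returns the number of launches issued. */
static size_t run_in_batches(size_t total, size_t max_per_launch,
                             launch_fn launch, void *ctx)
{
    size_t launches = 0;
    size_t start = 0;

    assert(max_per_launch > 0);
    while (start < total) {
        size_t left = total - start;
        size_t count = (left < max_per_launch) ? left : max_per_launch;

        launch(start, count, ctx);   /* short launch: display can refresh in between */
        start += count;
        launches++;
    }
    return launches;
}

/* Dummy callback for illustration: just tallies the items it was given. */
static void count_items(size_t start, size_t count, void *ctx)
{
    (void)start;
    *(size_t *)ctx += count;
}
```

Smaller batches trade some launch overhead for responsiveness, so the batch size would need tuning per card.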
Old 2009-10-12, 06:03   #4
frmky
 
Jul 2003
So Cal

2,039 Posts

Definitely seems to be working. For 11,275- (C160), it very quickly found

Code:
# norm 8.307323e-16 alpha -7.237004 e 9.177e-13
skew: 52448155.42
c0:  267038620364241191527667889616886081204148
c1: -49448434956421569875337754924497148
c2: -1522849815591746764247431467
c3:  374839148657816567502
c4:  754895057140
c5:  600
Y0: -27461351586329669434359784673597
Y1:  212660576998720663
I'll leave it running overnight.

Greg

Last fiddled with by frmky on 2009-10-12 at 06:05
Old 2009-10-12, 12:04   #5
jasonp
Tribal Bullet
 
Oct 2004

3,529 Posts

Thanks, fixed in SVN. I wonder how it could work in a windows development environment...

Your tesla card should run 5x faster than mine, so it doesn't surprise me that it flies there.

PS: windows binary updated too

Last fiddled with by jasonp on 2009-10-12 at 12:12
Old 2009-10-12, 16:52   #6
ltd
 
Apr 2003

2²·193 Posts

Could you also post a test worktodo.ini? That would allow two things.
First, we could compare whether different GPUs produce comparable results, or whether there are small differences between cards.
Second, we could check whether changes in your code improve the speed on different GPUs.

Edit: Forgot to report. Exe is running fine on a GTX260. (Windows Vista 64Bit OS)

Last fiddled with by ltd on 2009-10-12 at 16:56
Old 2009-10-12, 17:03   #7
jrk
 
May 2008

3×5×73 Posts

Jason, to compile on Linux, it was necessary for me to change a line in file include/cuda_xface.h from:

Code:
#include <cuda.h>
to:

Code:
#include <cuda/cuda.h>

Last fiddled with by jrk on 2009-10-12 at 17:04
Old 2009-10-12, 17:17   #8
frmky
 
Jul 2003
So Cal

2,039 Posts

Quote:
Originally Posted by jrk View Post
Jason, to compile on Linux, it was necessary for me to change a line in file include/cuda_xface.h from:
You can also add -I/usr/include/cuda or -I/usr/local/include/cuda, whichever is appropriate for your setup, to the Makefile. In my setup, I added -I/usr/local/cuda/include to the Makefile. Changing cuda.h to cuda/cuda.h would break the build here.
Old 2009-10-12, 17:27   #9
frmky
 
Jul 2003
So Cal

7F7₁₆ Posts

Quote:
Originally Posted by frmky View Post
I'll leave it running overnight.
Overnight, it found a few more polys with leading coefficient 1020, but nothing better.

Once I stop it, how do I know the maximum leading coefficient it has searched? And how do I compare runtimes with the CPU version? The CPU version claims a deadline of 800 sec/coefficient and the GPU version 1600 sec/coefficient. I presume therefore that the time spent on a coefficient range is not indicative of the work done?

And, of course, thanks again Jason for your work!
Old 2009-10-12, 17:28   #10
jrk
 
May 2008

3×5×73 Posts

Quote:
Originally Posted by frmky View Post
You can also add -I/usr/include/cuda or -I/usr/local/include/cuda, whichever is appropriate for your setup, to the Makefile. In my setup, I added -I/usr/local/cuda/include to the Makefile. Changing cuda.h to cuda/cuda.h would break the build here.
I also see that the Makefile refers to the environment vars CUDA_INC_PATH and CUDA_LIB_PATH, so one could just set those as well.
Old 2009-10-12, 17:45   #11
jasonp
Tribal Bullet
 
Oct 2004

DC9₁₆ Posts

When you install the CUDA tools in windows those environment variables are created; apparently the tools don't do so in linux.

In the GPU branch, there's a CPU version that performs the same computations but without any CUDA calls; you can switch it on by editing gnfs/poly/stage1/stage1_sieve.c

Comparing to the non-GPU branch is trickier; the two do not cover the same search space, and the CPU version restricts the rational coefficients that are searched. I think what matters the most here is the number of pairwise tests per second, since the rest of the poly generation process has no dependence at all on the precise factors in a leading rational coefficient, only the polynomial they generate.

Both GPU and CPU versions search multiple leading coefficients simultaneously for increased parallelism and reduced overhead. If interrupted in the middle of a run, the most you could conclude is that the 'coeff XXX-YYY' line indicated some searching of leading coefficients in that range.

Regarding the deadlines, it's possible these are too conservative, now that the underlying arithmetic proceeds so much faster than before. Greg is correct that there is no fixed amount of per-coefficient work to do; that's because you don't have a realistic chance of exhausting a fixed-size search space for even a single leading coefficient using Kleinjung's improved algorithm. Rather than penalize fast machines, I opted for a time deadline during which you should search as much as you can.
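
To make the time-deadline idea concrete, here is a minimal sketch of a search loop that runs batches until a deadline expires and then reports tests per second, the metric suggested above for comparing machines. This is not msieve's code: `search_until_deadline`, `search_stats`, the callback types, and the fake clock and batch functions are all hypothetical names chosen for illustration, and the clock is passed in as a callback only so the sketch can be exercised deterministically.

```c
#include <assert.h>
#include <stdint.h>

typedef double (*now_fn)(void *ctx);     /* current time in seconds */
typedef uint64_t (*batch_fn)(void *ctx); /* runs one batch, returns tests done */

typedef struct {
    uint64_t tests;   /* pairwise tests completed */
    double elapsed;   /* seconds spent when the loop stopped */
} search_stats;

/* Keep running batches until deadline_sec has elapsed. A faster
   machine simply completes more batches in the same time. */
static search_stats search_until_deadline(double deadline_sec,
                                          now_fn now, void *clock_ctx,
                                          batch_fn do_batch, void *work_ctx)
{
    search_stats s = { 0, 0.0 };
    double start = now(clock_ctx);

    assert(deadline_sec >= 0.0);
    for (;;) {
        s.elapsed = now(clock_ctx) - start;
        if (s.elapsed >= deadline_sec)
            break;
        s.tests += do_batch(work_ctx);
    }
    return s;
}

/* Throughput figure for comparing runs; 0 if no time elapsed. */
static double tests_per_second(search_stats s)
{
    return (s.elapsed > 0.0) ? (double)s.tests / s.elapsed : 0.0;
}

/* Illustration only: a clock that advances one second per reading. */
static double fake_clock(void *ctx)
{
    double *t = (double *)ctx;
    double current = *t;
    *t += 1.0;
    return current;
}

/* Illustration only: every batch performs a fixed number of tests. */
static uint64_t fake_batch(void *ctx)
{
    return *(uint64_t *)ctx;
}
```

In a real program the clock callback would wrap a wall-clock source and the batch callback would do one block of stage 1 work; the deadline is only checked between batches, so a run can overshoot it by up to one batch.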

Last fiddled with by jasonp on 2009-10-12 at 17:56

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.