mersenneforum.org  

2009-10-12, 17:54   #12
jrk
May 2008
3·5·73 Posts

Jason, when running msieve-gpu, it uses 100% of CPU. Is there a way to make it use less CPU and only do computation on GPU?

2009-10-12, 17:59   #13
jasonp
Tribal Bullet
Oct 2004
3,541 Posts

Beyond telling the CUDA driver that the process should not spin waiting for kernel calls to complete, I don't know how to force the application to yield the CPU while the graphics card is busy. Maybe you can run with -p to switch msieve to idle priority? I see 50-80% utilization on a dual core while poly selection is running, with occasional bursts up to 100%. Your card is a lot nicer than mine, so maybe the CPU has to work harder to keep up :)

Last fiddled with by jasonp on 2009-10-12 at 18:00

2009-10-12, 18:09   #14
jrk
May 2008
3·5·73 Posts

Quote:
Originally Posted by jasonp
Beyond telling the CUDA driver that the process should not spin waiting for kernel calls to complete,
Yes, getting rid of spin-waiting would be ideal. I don't know much about CUDA development, so I don't know how to do that.

Quote:
Originally Posted by jasonp
I don't know how to force the application to yield the CPU while the graphics card is busy. Maybe you can run with -p to switch msieve to idle priority?
That would still rob time from other idle tasks, so it is not ideal.

Quote:
Originally Posted by jasonp
I see 50-80% utilization on a dual core while poly selection is running, with occasional bursts up to 100%. Your card is a lot nicer than mine, so maybe the CPU has to work harder to keep up :)
Hmm. Well, it is an 8800GTS, so I don't think it is better than your 9800GT. So the difference in CPU consumption is for some other reason.

2009-10-12, 18:23   #15
axn
Jun 2003
5,051 Posts

Code:
commencing number field sieve polynomial selection
time limit set to 300.00 hours
searching leading coefficients from 1 to 2115959
deadline: 1600 seconds per coefficient
coeff 60-960 395459735 435005708 435005709 478506279
------- 395459735-435005708 435005709-478506279
poly  1 p 396019553 q 435057283 coeff 172291190743054499
poly  8 p 395868619 q 435084233 coeff 172236194466384227
poly 14 p 395853127 q 435133843 coeff 172249092415077061
poly 10 p 395661803 q 435139097 coeff 172167919674811891
Seeing output like the above. Does that mean it is currently doing stage1 for the range 60-960 using GPU and will only commence stage 2 after the GPU has done its job?

2009-10-12, 18:27   #16
ltd
Apr 2003
2²·193 Posts

My GTX260 also uses one complete "virtual core". It's an i7, so it has 4 cores, with HT making it 8 "virtual cores".

2009-10-12, 18:28   #17
frmky
Jul 2003
So Cal
2×3⁴×13 Posts

Quote:
Originally Posted by jasonp
Beyond telling the CUDA driver that the process should not spin waiting for kernel calls to complete, I don't know how to force the application to yield the CPU while the graphics card is busy.
In the cuCtxCreate() call in gnfs/poly/stage1/stage1_sieve.c, change CU_CTX_SCHED_YIELD to CU_CTX_BLOCKING_SYNC. This takes the CPU usage from near 100% to almost 0.
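
For anyone wondering why the yield setting can still show a busy core, here is a rough CPU-only analogy. It is not msieve or CUDA code: a sleeping thread stands in for the GPU kernel, and the caller waits for it by spinning, by calling sched_yield() in a loop, or by blocking on a condition variable. The names fake_kernel, wait_mode and wait_cost are made up for this sketch. On an otherwise idle machine the yield loop tends to burn nearly as much CPU as the plain spin, since there is nothing else to yield to, while the blocking wait costs next to nothing. Build with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for a GPU kernel: sleeps, then signals completion. */
struct fake_kernel {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    atomic_int      done;
    long            work_ms;
};

enum wait_mode { WAIT_SPIN, WAIT_YIELD, WAIT_BLOCK };

static void *fake_kernel_run(void *arg) {
    struct fake_kernel *k = arg;
    struct timespec ts = { k->work_ms / 1000, (k->work_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);                 /* the "GPU" is busy, the CPU is not */
    pthread_mutex_lock(&k->lock);
    atomic_store(&k->done, 1);
    pthread_cond_signal(&k->cond);
    pthread_mutex_unlock(&k->lock);
    return NULL;
}

/* CPU time consumed so far by the calling thread, in seconds. */
static double thread_cpu_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Returns the CPU seconds the caller burned while waiting for the fake kernel. */
double wait_cost(enum wait_mode mode, long work_ms) {
    struct fake_kernel k;
    pthread_t t;

    pthread_mutex_init(&k.lock, NULL);
    pthread_cond_init(&k.cond, NULL);
    atomic_init(&k.done, 0);
    k.work_ms = work_ms;

    double start = thread_cpu_seconds();
    int rc = pthread_create(&t, NULL, fake_kernel_run, &k);
    assert(rc == 0);

    switch (mode) {
    case WAIT_SPIN:                       /* poll as fast as possible */
        while (!atomic_load(&k.done)) { }
        break;
    case WAIT_YIELD:                      /* poll, offering the core to others */
        while (!atomic_load(&k.done)) sched_yield();
        break;
    case WAIT_BLOCK:                      /* sleep until signalled */
        pthread_mutex_lock(&k.lock);
        while (!atomic_load(&k.done)) pthread_cond_wait(&k.cond, &k.lock);
        pthread_mutex_unlock(&k.lock);
        break;
    }

    pthread_join(t, NULL);
    double cost = thread_cpu_seconds() - start;
    pthread_cond_destroy(&k.cond);
    pthread_mutex_destroy(&k.lock);
    return cost;
}
```

Comparing wait_cost(WAIT_SPIN, 100) with wait_cost(WAIT_BLOCK, 100) should show the spinning wait charging on the order of the full 100 ms to the CPU and the blocking wait charging almost none.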

As a crude comparison of relative speeds, I created a CPU version by changing the #if statement. The CPU version output 5 "poly" lines in the time that the GPU version output 66 "poly" lines. By this crude measure, it's somewhere around 11-17 times faster.
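
The ratio behind that estimate, spelled out. The middle figure comes directly from the two line counts; reading the 11-17 range as an allowance of about one line either way on the slower count is only a guess at how the bounds were chosen.

```latex
\[
\frac{66}{5} = 13.2, \qquad \frac{66}{6} = 11, \qquad \frac{66}{4} = 16.5
\]
```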

The S1070 here has 4 GPUs. Specifying the number of GPUs, and perhaps even which GPUs, would be useful.

Last fiddled with by frmky on 2009-10-12 at 18:32

2009-10-12, 18:54   #18
jrk
May 2008
1095₁₀ Posts

Quote:
Originally Posted by frmky
In the cuCtxCreate() call in gnfs/poly/stage1/stage1_sieve.c, change CU_CTX_SCHED_YIELD to CU_CTX_BLOCKING_SYNC. This takes the CPU usage from near 100% to almost 0.
That brings my CPU usage down to 88%.

2009-10-12, 18:57   #19
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10,753 Posts

The good news is that with the assistance of Jason and Greg Childers I managed to compile a binary.

The bad news is:
Code:
[pcl@maat msieve-gpu]$ ./msieve -np -i test.num
deadline: 1600 seconds per coefficient
error (line 137): CUDA_ERROR_FILE_NOT_FOUND
[pcl@maat msieve-gpu]$ grep CUDA_ERROR_FILE_NOT_FOUND */*
common/cuda_xface.c:    case CUDA_ERROR_FILE_NOT_FOUND: return "CUDA_ERROR_FILE_NOT_FOUND";
Binary file common/cuda_xface.o matches
Hmm. I'll investigate more tomorrow when I'm likely to be more sober and less tired than I am right now.


Paul

2009-10-12, 18:57   #20
jasonp
Tribal Bullet
Oct 2004
3541₁₀ Posts

axn: the code begins stage 2 after every output line appears there; this isn't like pol5, stage 1 output is not batched up and sent to stage 2 all at once.

Greg: thanks, that works much better. Am I correct that I can't use that flag if I want asynchronous behavior, i.e. if I build double-buffering into the code so that the CPU runs stage 2 while the GPU runs the next stage 1?

jrk: my CPU usage drops to 10-15% with Greg's change

Paul: once nvcc generates it, stage1_core.ptx is expected to be in the same directory as the msieve binary

all: binary has been updated again.
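
On the double-buffering idea: a minimal sketch of the pattern, with a pthread standing in for the GPU. This is not msieve's code, and it says nothing about which CUDA context flags permit it; stage1_fill, stage2_consume, run_pipeline and the batch layout are all hypothetical. While the main thread runs stage 2 on one buffer, the worker fills the other buffer with the next stage 1 batch. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>

#define BATCH 4                     /* hypothetical number of hits per batch */

struct batch {
    int  id;                        /* which stage 1 batch this buffer holds */
    long hits[BATCH];               /* placeholder for stage 1 output */
};

/* Stand-in for GPU stage 1: fills one buffer on a worker thread. */
static void *stage1_fill(void *arg) {
    struct batch *b = arg;
    for (int i = 0; i < BATCH; i++)
        b->hits[i] = (long)b->id * BATCH + i;
    return NULL;
}

/* Stand-in for CPU stage 2: consumes one finished buffer. */
static long stage2_consume(const struct batch *b) {
    long sum = 0;
    for (int i = 0; i < BATCH; i++)
        sum += b->hits[i];
    return sum;
}

/* Runs num_batches batches through both stages; returns the stage 2 total. */
long run_pipeline(int num_batches) {
    struct batch buf[2];
    pthread_t worker;
    long total = 0;

    if (num_batches <= 0)
        return 0;

    /* Prime the pipeline: start stage 1 for batch 0 in buffer 0. */
    buf[0].id = 0;
    int rc = pthread_create(&worker, NULL, stage1_fill, &buf[0]);
    assert(rc == 0);

    for (int n = 0; n < num_batches; n++) {
        int cur = n & 1;            /* buffer holding batch n */
        int next = cur ^ 1;         /* buffer free for batch n + 1 */

        pthread_join(worker, NULL); /* wait until batch n is complete */

        if (n + 1 < num_batches) {  /* start batch n + 1 in the other buffer */
            buf[next].id = n + 1;
            rc = pthread_create(&worker, NULL, stage1_fill, &buf[next]);
            assert(rc == 0);
        }

        /* Stage 2 on batch n overlaps with stage 1 on batch n + 1. */
        total += stage2_consume(&buf[cur]);
    }
    return total;
}
```

The two buffers never alias: stage 2 only reads the buffer whose worker has already been joined, and the next worker only writes the buffer that stage 2 finished with one iteration earlier.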

Last fiddled with by jasonp on 2009-10-12 at 19:08

2009-10-12, 18:58   #21
jrk
May 2008
3·5·73 Posts

Quote:
Originally Posted by jasonp
Nvidia's CUDA profiler shows the GPU is only 15% utilized, so there's headroom for much more performance too. I'm still working on this, and will post updated binaries as I understand this hardware better.
Just a quick thought: How much time do the kernels spend on the GPU relative to the overhead of loading them?

2009-10-12, 19:00   #22
jrk
May 2008
1095₁₀ Posts

Quote:
Originally Posted by xilman
The good news is that with the assistance of Jason and Greg Childers I managed to compile a binary.

The bad news is:
Code:
[pcl@maat msieve-gpu]$ ./msieve -np -i test.num
deadline: 1600 seconds per coefficient
error (line 137): CUDA_ERROR_FILE_NOT_FOUND
[pcl@maat msieve-gpu]$ grep CUDA_ERROR_FILE_NOT_FOUND */*
common/cuda_xface.c:    case CUDA_ERROR_FILE_NOT_FOUND: return "CUDA_ERROR_FILE_NOT_FOUND";
Binary file common/cuda_xface.o matches
Hmm. I'll investigate more tomorrow when I'm likely to be more sober and less tired than I am right now.


Paul
You need the stage1_core.ptx file in your msieve directory. If you are on Linux you'll need to compile this file per frmky's instructions in the second post here.
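
A small pre-flight check along these lines would turn the bare CUDA_ERROR_FILE_NOT_FOUND into something more obvious. This is only a sketch: check_ptx is a hypothetical helper, not part of msieve, and it assumes the caller already knows the path at which the PTX file will be looked up.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the PTX file at `path` can be opened
 * for reading, otherwise prints a hint to stderr and returns 0. */
int check_ptx(const char *path) {
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        fprintf(stderr,
                "cannot open %s: generate it with nvcc and place it "
                "where the msieve binary expects it\n", path);
        return 0;
    }
    fclose(f);
    return 1;
}
```

Calling check_ptx("stage1_core.ptx") before the module is loaded would report the missing file by name instead of only the error code.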

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.