2015-08-12, 08:37  #1
Aug 2015
2^{2}×5 Posts 
Get msieve 1.52 working with CUDA 7
Hello all,
I am new here :) I am trying to get msieve working with my GPU. My setup:
CUDA v 7, and I added the cudart64_55.dll path by hand to the environment variable.
My msieve version is: msieve152_svn942_win64_cuda
I have the sort_engine_*.dll files in the ggnfs directory.
My system is Windows 8, 64-bit.
I am stuck at: error (line 1080): CUDA_ERROR_FILE_NOT_FOUND
I would appreciate any help. BR
2015-08-12, 14:26  #2
Aug 2015
2^{2}×5 Posts 
Update: I placed the cudart64_55.dll in the directory of ggnfs. Thanks to "wombatman"
Here is the error I see:
Msieve v. 1.52 (SVN unknown)
Wed Aug 12 16:23:27 2015
random seeds: 4d942750 a41475b6
factoring 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453 (100 digits)
searching for 15-digit factors
commencing number field sieve (100-digit input)
commencing number field sieve polynomial selection
polynomial degree: 4
max stage 1 norm: 1.58e+017
max stage 2 norm: 3.44e+015
min E-value: 8.85e-009
poly select deadline: 1317
time limit set to 0.37 CPU-hours
expecting poly E from 1.43e-008 to > 1.64e-008
searching leading coefficients from 1 to 64740271
using GPU 0 (GeForce GTX 980M)
selected card has CUDA arch 5.2
deadline: 5 CPU-seconds per coefficient
error (line 1080): CUDA_ERROR_FILE_NOT_FOUND
2015-08-12, 16:50  #3
I moo ablest echo power!
May 2013
1,741 Posts 
Try one thing. Make a folder in GGNFS named "b40c". Then copy the sort_engine dlls into it. It probably won't work, but I just want to make sure.

2015-08-13, 06:43  #4
"Antonio Key"
Sep 2011
UK
3^{2}·59 Posts 
Code:
cudart64_55.dll
cufft64_55.dll
pthreadVC2.dll
sort_engine_sm20.dll
stage1_core_sm20.ptx
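Since CUDA_ERROR_FILE_NOT_FOUND points at a file msieve could not locate, a quick presence check of the files in the listing above may help narrow things down. The following is only a sketch: the function name `missing_files` is made up for illustration, and it assumes the listed files are expected to sit in a single directory next to the msieve executable. The exact set needed for a different build or GPU architecture may differ.

```python
from pathlib import Path

# File names copied from the listing in this post. Assumption: other msieve
# builds or GPU architectures may need a different set (e.g. other sm versions).
REQUIRED_FILES = [
    "cudart64_55.dll",
    "cufft64_55.dll",
    "pthreadVC2.dll",
    "sort_engine_sm20.dll",
    "stage1_core_sm20.ptx",
]


def missing_files(directory, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]
```

Calling `missing_files(".")` from the folder that holds msieve returns an empty list when every listed file is present, and otherwise the names that still need to be copied in.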

2015-08-13, 21:25  #5
Aug 2015
14_{16} Posts 
Here is what I have done:
1. I copied "cudart64_55.dll" from CUDA 7 to the ggnfs folder. msieve crashes!
2. I installed CUDA 5.5 (without removing CUDA 7) and copied "cudart64_55.dll" from the 5.5 version. Now msieve is working. Thanks to wombatman.
Antonio: what is pthreadVC2.dll, and where can I get it?
Now I need your help, guys, to confirm that my settings are working:
1. I want to use msieve for polynomial selection. I have a GeForce GTX 980M and expect a large time gain compared to YAFU for the polynomial selection step.
2. I chose a 320-bit number to factor; here are the results from both YAFU and msieve.
YAFU: "c:\ .. \yafu-x64.exe -threads 12 -noecm"
492 2310901259263 220881370469797408706386
hashtable: 1024 entries, 0.02 MB
492 4753593644573 220881370405355353824960
hashtable: 1024 entries, 0.02 MB
elapsed time of 405.5493 seconds exceeds 75 second deadline; poly select done
nfs: commencing algebraic side lattice sieving over range: 828337 - 830004
MSIEVE: "c:\ ...\msieve -v -np -t 12"
Thu Aug 13 14:51:02 2015 Msieve v. 1.52 (SVN unknown)
Thu Aug 13 14:51:02 2015 random seeds: 26de9c40 896a5b2a
Thu Aug 13 14:51:02 2015 factoring 1171120157961862379212541202839180646708107670917765283552896655485570855165005510866723839109497 (97 digits)
Thu Aug 13 14:51:02 2015 searching for 15-digit factors
Thu Aug 13 14:51:03 2015 commencing number field sieve (97-digit input)
Thu Aug 13 14:51:03 2015 commencing number field sieve polynomial selection
Thu Aug 13 14:51:03 2015 polynomial degree: 4
Thu Aug 13 14:51:03 2015 max stage 1 norm: 2.64e+016
Thu Aug 13 14:51:03 2015 max stage 2 norm: 1.34e+015
Thu Aug 13 14:51:03 2015 min E-value: 1.32e-008
Thu Aug 13 14:51:03 2015 poly select deadline: 986
Thu Aug 13 14:51:03 2015 time limit set to 0.27 CPU-hours
Thu Aug 13 14:51:03 2015 expecting poly E from 2.32e-008 to > 2.67e-008
Thu Aug 13 14:51:03 2015 searching leading coefficients from 1 to 33775013
Thu Aug 13 14:51:03 2015 using GPU 0 (GeForce GTX 980M)
Thu Aug 13 14:51:03 2015 selected card has CUDA arch 5.2
Thu Aug 13 19:41:49 2015 polynomial selection complete
Thu Aug 13 19:41:49 2015 R0: 105160964616771536423352
Thu Aug 13 19:41:49 2015 R1: 3495768873991
Thu Aug 13 19:41:49 2015 A0: 40279315312944749723464025
Thu Aug 13 19:41:49 2015 A1: 75131233264987802384
Thu Aug 13 19:41:49 2015 A2: 1627385590123895
Thu Aug 13 19:41:49 2015 A3: 613883062
Thu Aug 13 19:41:49 2015 A4: 9576
Thu Aug 13 19:41:49 2015 skew 331681.64, size 3.415e-013, alpha 4.691, combined = 2.122e-008 rroots = 2
Thu Aug 13 19:41:49 2015 elapsed time 04:50:47
==>> As you can see, msieve needs 4 hours 50 minutes, compared to 7 minutes for YAFU, to get the poly select done! Am I missing something?
Note: I used GPU-Z 0.8.5 to check the utilization of my GPUs; the GPU core clock and memory clock were at the maximum, but the GPU load was always 0%!
Any comments?
Thanks

2015-08-13, 21:48  #6
I moo ablest echo power!
May 2013
1,741 Posts 
YAFU doesn't sieve anywhere near as well as MSieve does. You can also try playing with some of MSieve's options (for instance, stage1_norm and stage2_norm) to speed it up further.
The reason your GPU load is basically 0% is that the GPU works far faster than the CPU can keep up with. If you break the -np step into, say, -np1 (the GPU part) and -nps -npr (the CPU part), you will probably find that your GPU is more consistently loaded. I would pick a small number (say 110 digits or so) and play around with the parameters to understand what each does. That's the best way to learn it!
Edit: Also note that MSieve defaulted to a polynomial of degree 4. You could try changing that to degree 5 and see how it affects your timing.
Last fiddled with by wombatman on 2015-08-13 at 21:49
2015-08-13, 22:17  #7
Aug 2015
20_{10} Posts 
Thanks wombatman for your great help
I already changed to -np1 and increased the number of threads to 100, i.e.: msieve -v -np1 -t 100
Now one of the GPUs (I have two in my system) is loaded at about 50% on average. The results are as follows:
polynomial selection complete
error generating or reading NFS polynomials
elapsed time 00:10:15
What is this error, "generating or reading NFS polynomials"?
Now the silly questions :)
1. What is the difference between -np1 (the GPU part) and -nps -npr (the CPU part)? Is there any doc or link that explains the difference?
2. What do you mean by "YAFU doesn't sieve anywhere near as well as MSieve does"?
3. What are stage1_norm and stage2_norm?
Again, my objective is to speed up finding prime factors with NFS. I moved the poly select step to msieve instead of YAFU to reduce the factorization time. Am I missing something?
Thanks


2015-08-14, 00:31  #8
I moo ablest echo power!
May 2013
1,741 Posts 
I would start with just typing msieve.exe (with nothing else after it) and take a look at all the different options that come up. Each one has a description of what it means or does. That will explain what -np1, -nps, and -npr do.
As for the error, I'm betting it's because you already have a file with a .fb extension in the GGNFS folder. When you do what I wrote above, you'll see a few options related to that. In general, you're going to want to specify a name (option -nf). Otherwise, msieve defaults to a certain name (I forget what), so if that file is already present, it can cause a problem.
Lastly, there's no need to do -t 100. You will be limited at -t 8 or so.
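To test the guess above about a leftover .fb file, one can list any such files in the working folder before starting a new run. This is a minimal sketch; the helper name `existing_fb_files` is hypothetical, and it only looks at the one directory given, not subfolders.

```python
from pathlib import Path


def existing_fb_files(directory):
    """Return the sorted names of files ending in .fb directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.fb") if p.is_file())
```

If `existing_fb_files(".")` returns anything from an earlier factorization, moving those files aside (or naming a fresh file with the -nf option mentioned above) is a cheap thing to try before rerunning.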
2015-08-14, 02:21  #9
Romulan Interpreter
Jun 2011
Thailand
5·17·109 Posts 
Also, if your number originated from an SNFS-able form, yafu is clever enough to find this out and spit out an SNFS polynomial in a few minutes. Think of yafu as a "more automated" version of msieve; you can do more with msieve if you know how and what. Yafu is for people like me who do not know the math but like to play around: you set it and forget it (but it will call msieve functions when it needs them). The rest is like he said.
Last fiddled with by LaurV on 2015-08-14 at 02:33 Reason: s/fill/feel/ my English sucks, as usual

2015-08-14, 03:26  #10
I moo ablest echo power!
May 2013
1,741 Posts 
That's fair. I meant more that non-YAFU Msieve will do GPU sieving and you actually have control over it, so you can dictate where it stops. In YAFU, if I recall correctly, it sets a pretty fast time limit on the CPU sieving, so you don't actually sieve very far. For small numbers, it doesn't really matter, but once you get to higher ones, it obviously makes a difference.

2015-08-14, 06:08  #11
"Curtis"
Feb 2005
Riverside, CA
123E_{16} Posts 
Any GPU poly select on degree 4 polys is a waste of effort. A C97 should be factored in an hour or so on a multi-core system, so poly select might take ~5 minutes. Even up to C110 or so, YAFU's simplicity outweighs any time savings you might get from doing GPU poly select on your own. Once you get up over C130, the GPU's power on the first step of the 3-step poly select pays meaningful benefits; I wouldn't bother with CPU select at all over C135.
Poly selection has 3 steps: generating raw polynomial candidates (GPU does this 100x faster, or more), followed by size optimization (the -nps flag in msieve, CPU only), followed by root optimization (-npr, also CPU only). The improvement in final poly output is mostly created by doing the -nps and -npr steps on a small subset of the polys found in step 1 (this is controlled by stage1_norm).
The general plan is to set stage1_norm so that the output from the GPU (the lines on the screen of coefficients of the polys) is in the range of 25/second. A desktop CPU core can do the size optimization at roughly that rate, so running msieve with the -np1 (GPU step) and -nps (size opt step) flags both set results in one core fully loaded doing size optimization, while the GPU generates a TON of polys but stage1_norm rejects 98% of them before the -nps step is run.
Alternately, you can leave stage1_norm alone, and just have the GPU spit out thousands of polys to a file with -np1. You then direct msieve later to -nps that file, while the GPU (presumably) does something else or sits idle. Running in this manner causes the CPU step to take ~20x as long as the GPU step!
The stage2_norm flag is used to control how many of the -nps output polys are root-optimized (a slow process). There are two ways to deal with this: sort the .ms file that -nps outputs by the last column (a measure of poly quality), then truncate the file to only run -npr on the best 100 (or 200, whatever suits your fancy). *Or*, set stage2_norm to be about a factor of 20 (or 25) lower than the default msieve setting (listed in the log), which causes -nps to output 100-200 polys per GPU-day of search.
If you run -np all at once, msieve isn't very good at these filters, and will take a LONG time to -npr the massive output from the GPU/-nps steps. This is why it's not worth bothering to use GPU select for composites below C120 or so: it's not worth the human time to run 2-3 stages and filter the outputs.
Hope this helps! Ask questions if you wish, but do read the msieve docs too.
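The "sort by the last column, then truncate" option described above could be sketched roughly as follows. This is an illustration under assumptions, not msieve's own tooling: the function names and the `smaller_is_better` switch are invented here, the file is assumed to hold one whitespace-separated candidate per line with a numeric last column, and whether a smaller or larger value counts as "best" should be checked against the msieve docs before relying on the default.

```python
def best_candidates(lines, keep=100, smaller_is_better=True):
    """Return up to `keep` lines ranked by the numeric value in their last column."""
    scored = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            score = float(fields[-1])
        except ValueError:
            continue  # skip lines whose last column is not a number
        scored.append((score, line.rstrip("\n")))
    # Assumption: ascending order puts the best candidates first; flip the
    # switch if the quality measure turns out to work the other way round.
    scored.sort(key=lambda pair: pair[0], reverse=not smaller_is_better)
    return [text for _, text in scored[:keep]]


def truncate_ms_file(src, dst, keep=100, smaller_is_better=True):
    """Write the `keep` best lines of `src` to `dst` and return how many were written."""
    with open(src) as handle:
        kept = best_candidates(handle, keep, smaller_is_better)
    with open(dst, "w") as handle:
        handle.writelines(text + "\n" for text in kept)
    return len(kept)
```

Writing to a separate `dst` keeps the full -nps output intact, so the cut-off (100, 200, ...) can be changed later without rerunning size optimization.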
