mersenneforum.org: get msieve 1.52 working with CUDA 7

2015-08-12, 08:37   #1
Anyone

Aug 2015

14₁₆ Posts

Hello all, I am new here :)
I am trying to get msieve working with my GPU. I have:
- CUDA 7, and I added the cudart64_55.dll path to the environment variable by hand.
- msieve version: msieve152_svn942_win64_cuda
- the sort_engine_*.dll files in the ggnfs directory.
- Windows 8, 64-bit.
I am stuck at:
error (line 1080): CUDA_ERROR_FILE_NOT_FOUND
I would appreciate any help.
BR

2015-08-12, 14:26   #2
Anyone

Aug 2015

10100₂ Posts

Update: I placed cudart64_55.dll in the ggnfs directory. Thanks to wombatman, here is the error I now see:
Msieve v. 1.52 (SVN unknown)
Wed Aug 12 16:23:27 2015
random seeds: 4d942750 a41475b6
factoring 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453 (100 digits)
searching for 15-digit factors
commencing number field sieve (100-digit input)
commencing number field sieve polynomial selection
polynomial degree: 4
max stage 1 norm: 1.58e+017
max stage 2 norm: 3.44e+015
min E-value: 8.85e-009
poly select deadline: 1317
time limit set to 0.37 CPU-hours
expecting poly E from 1.43e-008 to > 1.64e-008
searching leading coefficients from 1 to 64740271
using GPU 0 (GeForce GTX 980M)
selected card has CUDA arch 5.2
deadline: 5 CPU-seconds per coefficient
error (line 1080): CUDA_ERROR_FILE_NOT_FOUND

2015-08-12, 16:50   #3
wombatman
I moo ablest echo power!

May 2013

3²·193 Posts

Try one thing. Make a folder in GGNFS named "b40c". Then copy the sort_engine dlls into it. It probably won't work, but I just want to make sure.

2015-08-13, 06:43   #4
Antonio

"Antonio Key"
Sep 2011
UK

3²·59 Posts

Quote:
 Originally Posted by Anyone
Hello all, I am new here :) I am trying to get msieve working with my GPU. I have:
- CUDA 7, and I added the cudart64_55.dll path to the environment variable by hand.
- msieve version: msieve152_svn942_win64_cuda
- the sort_engine_*.dll files in the ggnfs directory.
- Windows 8, 64-bit.
I am stuck at: error (line 1080): CUDA_ERROR_FILE_NOT_FOUND
I would appreciate any help. BR
To get msieve working on my system I needed the following support files:

Code:
cudart64_55.dll
cufft64_55.dll
sort_engine_sm20.dll
stage1_core_sm20.ptx
They are all in the same directory as msieve.
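As a quick pre-flight check before launching msieve, a small script can confirm that the support files listed above actually sit next to the executable. This is a minimal sketch: the file names are copied from Antonio's list, the directory is whatever you pass on the command line, and the `_55`/`sm20` suffixes depend on the particular msieve build, so adjust the list to match yours. The helper name `missing_support_files` is hypothetical.

```python
from pathlib import Path

# Support files Antonio listed for his msieve CUDA build.
# The exact names (CUDA 5.5 runtime, sm20 kernels) depend on the build you downloaded.
REQUIRED_FILES = [
    "cudart64_55.dll",
    "cufft64_55.dll",
    "sort_engine_sm20.dll",
    "stage1_core_sm20.ptx",
]


def missing_support_files(msieve_dir, required=REQUIRED_FILES):
    """Return the required file names that are not present in msieve_dir."""
    folder = Path(msieve_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_support_files(target)
    if missing:
        print(f"Missing from {target}: {', '.join(missing)}")
    else:
        print(f"All listed support files found in {target}")
```

Run it as `python check_msieve_files.py C:\path\to\ggnfs` (script name and path are placeholders).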

2015-08-13, 21:25   #5
Anyone

Aug 2015

2²·5 Posts

Quote:
 Originally Posted by Antonio
To get msieve working on my system I needed the following support files:
Code:
cudart64_55.dll
cufft64_55.dll
pthreadVC2.dll
sort_engine_sm20.dll
stage1_core_sm20.ptx
They are all in the same directory as msieve.

Here is what I have done:
1- I copied "cudart64_55.dll" from CUDA 7 to the ggnfs folder: msieve crashes!
2- I installed CUDA 5.5 (without removing CUDA 7) and copied "cudart64_55.dll" from the 5.5 version. Now msieve is working. Thanks to wombatman.

Antonio: what is pthreadVC2.dll and where can I get it?

Now I need your help, guys, to confirm that my settings are working:
1- I want to use msieve for polynomial selection. I have a GeForce GTX 980M and expect a big time gain over YAFU for the polynomial selection step.
2- I chose a 320-bit number to factor; here are the results from both YAFU and msieve:

YAFU:
"c:\ .. \yafu-x64.exe -threads 12 -noecm"

492 2310901259263 220881370469797408706386
hashtable: 1024 entries, 0.02 MB
492 4753593644573 220881370405355353824960
hashtable: 1024 entries, 0.02 MB
elapsed time of 405.5493 seconds exceeds 75 second deadline; poly select done
nfs: commencing algebraic side lattice sieving over range: 828337 - 830004

MSIEVE:
"c:\ ...\msieve -v -np -t 12"

Thu Aug 13 14:51:02 2015 Msieve v. 1.52 (SVN unknown)
Thu Aug 13 14:51:02 2015 random seeds: 26de9c40 896a5b2a
Thu Aug 13 14:51:02 2015 factoring 1171120157961862379212541202839180646708107670917765283552896655485570855165005510866723839109497 (97 digits)
Thu Aug 13 14:51:02 2015 searching for 15-digit factors
Thu Aug 13 14:51:03 2015 commencing number field sieve (97-digit input)
Thu Aug 13 14:51:03 2015 commencing number field sieve polynomial selection
Thu Aug 13 14:51:03 2015 polynomial degree: 4
Thu Aug 13 14:51:03 2015 max stage 1 norm: 2.64e+016
Thu Aug 13 14:51:03 2015 max stage 2 norm: 1.34e+015
Thu Aug 13 14:51:03 2015 min E-value: 1.32e-008
Thu Aug 13 14:51:03 2015 poly select deadline: 986
Thu Aug 13 14:51:03 2015 time limit set to 0.27 CPU-hours
Thu Aug 13 14:51:03 2015 expecting poly E from 2.32e-008 to > 2.67e-008
Thu Aug 13 14:51:03 2015 searching leading coefficients from 1 to 33775013
Thu Aug 13 14:51:03 2015 using GPU 0 (GeForce GTX 980M)
Thu Aug 13 14:51:03 2015 selected card has CUDA arch 5.2
Thu Aug 13 19:41:49 2015 polynomial selection complete
Thu Aug 13 19:41:49 2015 R0: -105160964616771536423352
Thu Aug 13 19:41:49 2015 R1: 3495768873991
Thu Aug 13 19:41:49 2015 A0: 40279315312944749723464025
Thu Aug 13 19:41:49 2015 A1: -75131233264987802384
Thu Aug 13 19:41:49 2015 A2: -1627385590123895
Thu Aug 13 19:41:49 2015 A3: -613883062
Thu Aug 13 19:41:49 2015 A4: 9576
Thu Aug 13 19:41:49 2015 skew 331681.64, size 3.415e-013, alpha -4.691, combined = 2.122e-008 rroots = 2
Thu Aug 13 19:41:49 2015 elapsed time 04:50:47

==>> As you can see, msieve needs 4 hours 50 minutes, compared to about 7 minutes for YAFU, to finish poly select! Am I missing something?
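To put the two timings on the same scale: YAFU's log reports 405.5493 seconds, while msieve reports an elapsed time of 04:50:47. The sketch below converts both to minutes and also converts the "time limit set to 0.27 CPU-hours" line from the msieve log for reference; all numbers are copied from the logs above.

```python
def hms_to_seconds(hms):
    """Convert an msieve-style 'HH:MM:SS' elapsed time to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


yafu_seconds = 405.5493                       # "elapsed time of 405.5493 seconds" (YAFU)
msieve_seconds = hms_to_seconds("04:50:47")   # "elapsed time 04:50:47" (msieve)
msieve_limit_seconds = 0.27 * 3600            # "time limit set to 0.27 CPU-hours"

print(f"YAFU:   {yafu_seconds / 60:.1f} min")
print(f"msieve: {msieve_seconds / 60:.1f} min ({msieve_seconds / yafu_seconds:.0f}x YAFU)")
print(f"msieve's own time limit: {msieve_limit_seconds / 60:.1f} min")
```

This works out to roughly 6.8 minutes for YAFU against about 291 minutes for msieve, a factor of about 43.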

Note: I used GPU-Z 0.8.5 to check the utilization of my GPUs. The GPU core clock and memory clock were at maximum, but the GPU load was always 0%!

Thanks

2015-08-13, 21:48   #6
wombatman
I moo ablest echo power!

May 2013

3²×193 Posts

YAFU doesn't sieve anywhere near as well as MSieve does. You can also try playing with some of MSieve's options (for instance, stage1_norm and stage2_norm) to speed it up further.

The reason your GPU load is basically 0% is that the GPU works far faster than the CPU can keep up with. If you break the -np step into, say, -np1 (the GPU part) and -nps -npr (the CPU part), you will probably find that your GPU is more consistently loaded.

I would pick a small number (say 110 digits or so) and play around with the parameters to understand what each does. That's the best way to learn it!

Edit: Also note that MSieve defaulted to a polynomial of degree 4. You could try changing that to degree 5 and see how it affects your timing.

Last fiddled with by wombatman on 2015-08-13 at 21:49
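A minimal sketch of the split wombatman describes, written as argument lists you could hand to `subprocess.run`. The flags `-v`, `-t`, `-np1`, `-nps` and `-npr` are the ones used in this thread. Passing the norms and the degree as a single `key=value` string right after the stage flag (`stage1_norm=...`, `stage2_norm=...`, `polydegree=...`) is an assumption about msieve's syntax, so confirm it against the usage text msieve prints when run with no arguments. The helper `staged_poly_commands` and the norm values are hypothetical placeholders.

```python
def staged_poly_commands(msieve="msieve", threads=8, stage1_norm=None,
                         stage2_norm=None, degree=None):
    """Build argv lists for the three poly-select stages (hypothetical helper).

    Stage 1 (-np1) runs on the GPU; size optimization (-nps) and root
    optimization (-npr) run on the CPU. The 'key=value' option string is an
    ASSUMPTION about msieve's syntax; check msieve's usage text.
    """
    def opts(**kwargs):
        # Join the options that were actually set into one argument,
        # the same thing a quoted "a=1 b=2" string would give on a shell.
        parts = [f"{key}={value}" for key, value in kwargs.items() if value is not None]
        return [" ".join(parts)] if parts else []

    base = [msieve, "-v", "-t", str(threads)]
    stage1 = base + ["-np1"] + opts(polydegree=degree, stage1_norm=stage1_norm)
    size_opt = base + ["-nps"] + opts(stage2_norm=stage2_norm)
    root_opt = base + ["-npr"]
    return stage1, size_opt, root_opt


# Placeholder values for illustration only; tune them for your own number.
for argv in staged_poly_commands(stage1_norm="2e16", stage2_norm="6.7e13", degree=5):
    print(argv)
```

Keeping the stages as separate commands lets you run `-np1` on the GPU for as long as you like and start the CPU stages later, which is the point of the split.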
2015-08-13, 22:17   #7
Anyone

Aug 2015

2²×5 Posts

Thanks wombatman for your great help
I already switched to -np1 and increased the number of threads to 100, i.e.:
msieve -v -np1 -t 100

Now one of the GPUs (I have two in my system) is loaded at 50% on average.
The results are as follows:

polynomial selection complete
error generating or reading NFS polynomials
elapsed time 00:10:15

What does this error mean? "error generating or reading NFS polynomials"

Now the silly questions :)
1- What is the difference between -np1 (the GPU part) and -nps -npr (the CPU part)? Is there any doc or link that explains the difference?

2- What do you mean by "YAFU doesn't sieve anywhere near as well as MSieve does"?
3- What are stage1_norm and stage2_norm?

Again, my objective is to speed up factoring with NFS. I moved the poly select step from YAFU to msieve to reduce the factorization time. Am I missing something?

Thanks

Quote:
 Originally Posted by wombatman
YAFU doesn't sieve anywhere near as well as MSieve does. You can also try playing with some of MSieve's options (for instance, stage1_norm and stage2_norm) to speed it up further. The reason your GPU load is basically 0% is that the GPU works far faster than the CPU can keep up with. If you break the -np step into, say, -np1 (the GPU part) and -nps -npr (the CPU part), you will probably find that your GPU is more consistently loaded. I would pick a small number (say 110 digits or so) and play around with the parameters to understand what each does. That's the best way to learn it! Edit: Also note that MSieve defaulted to a polynomial of degree 4. You could try changing that to degree 5 and see how it affects your timing.

2015-08-14, 00:31   #8
wombatman
I moo ablest echo power!

May 2013

3²·193 Posts

I would start by just typing msieve.exe (with nothing after it) and looking at all the different options that come up. Each one has a description of what it means or does. That will explain what -np1, -nps, and -npr do.

As for the error, I'm betting it's because you already have a file with a .fb extension in the GGNFS folder. When you do what I wrote above, you'll see a few options related to that. In general, you'll want to specify a name (option -nf). Otherwise, msieve defaults to a certain name (I forget what), so if that file is already present, it can cause a problem.

Lastly, there's no need for -t 100. You'll be limited to -t 8 or so.
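Following wombatman's advice, a small wrapper can give each run its own factor-base file name via `-nf`, so a stale `.fb` file from an earlier run never gets in the way, and can cap the thread count. This is a sketch under stated assumptions: the `<label>_<timestamp>.fb` naming scheme and the helper `poly_run_argv` are hypothetical, and the cap of 8 is wombatman's rough figure, not a limit documented by msieve.

```python
import time
from pathlib import Path


def poly_run_argv(work_dir, label, requested_threads, msieve="msieve", max_threads=8):
    """Build an msieve -np1 command with its own factor-base file (hypothetical helper).

    The '<label>_<timestamp>.fb' pattern is only an illustrative convention;
    msieve just needs a name that is not already in use.
    """
    stamp = time.strftime("%Y%m%d_%H%M%S")
    fb_path = Path(work_dir) / f"{label}_{stamp}.fb"
    if fb_path.exists():
        raise FileExistsError(f"{fb_path} already exists; pick another label")
    # wombatman's estimate: going much past ~8 threads does not help here.
    threads = max(1, min(requested_threads, max_threads))
    return [msieve, "-v", "-np1", "-nf", str(fb_path), "-t", str(threads)]
```

For example, `poly_run_argv(".", "c97", 100)` asks for 100 threads but produces `-t 8`, together with a fresh `.fb` name in the current directory.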
2015-08-14, 02:21   #9
LaurV
Romulan Interpreter

Jun 2011
Thailand

2·3·31·47 Posts

Quote:
 Originally Posted by Anyone
2- What do you mean by "YAFU doesn't sieve anywhere near as well as MSieve does"?
I can answer this: he is wrong. yafu uses msieve, so there is no difference when it comes to "sieving". For your C97, yafu would use its own very fast quadratic sieve (SIQS, "the fastest publicly available QS"; google "yafu siqs"), so it would be faster to factor that number without any polynomial selection. The difference you feel shows up with NFS, that is, from about C110 upward, where yafu uses the CPU for poly selection and msieve uses the GPU, which is about two orders of magnitude faster for this step. The difference comes from the hardware, not the software. Once you have the poly, both sieve on the CPU at the same speed.

Also, if your number comes from an SNFS-able form, yafu is clever enough to detect this and spit out an SNFS polynomial in a few minutes. Think of yafu as a "more automated" version of msieve; you can do more with msieve if you know how and what. Yafu is for people like me who do not know the math but like to play around: you set it and forget it (and it calls msieve functions when it needs them).

The rest is as he said.

Last fiddled with by LaurV on 2015-08-14 at 02:33 Reason: s/fill/feel/ my English sucks, as usual

2015-08-14, 03:26   #10
wombatman
I moo ablest echo power!

May 2013

3²×193 Posts

That's fair. I meant more that standalone (non-YAFU) msieve will do GPU sieving and you actually have control over it, so you can dictate where it stops. In YAFU, if I recall correctly, it sets a pretty short time limit on the CPU sieving, so you don't actually sieve very far. For small numbers it doesn't really matter, but once you get to larger ones, it obviously makes a difference.
2015-08-14, 06:08   #11
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

103418 Posts

Quote:
 Originally Posted by wombatman
That's fair. I meant more that standalone (non-YAFU) msieve will do GPU sieving and you actually have control over it, so you can dictate where it stops. In YAFU, if I recall correctly, it sets a pretty short time limit on the CPU sieving, so you don't actually sieve very far. For small numbers it doesn't really matter, but once you get to larger ones, it obviously makes a difference.
Which meaning of "sieve" do you mean here? LaurV's reply concerned NFS sieving, as in finding relations, while you seem to be referring to part of the poly-select step.

Anyone-
GPU poly select on degree 4 polys is a waste of effort. A C97 should be factored in an hour or so on a multi-core system, so poly select might take ~5 minutes. Even up to C110 or so, YAFU's simplicity outweighs any time savings you might get from doing GPU poly select on your own. Once you get up over C130, the GPU's power on the first step of the 3-step poly select pays meaningful benefits; I wouldn't bother with CPU select at all over C135.
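VBCurtis's rules of thumb, together with LaurV's point that yafu's SIQS handles a number like this C97 without any polynomial selection, can be condensed into a small lookup. The digit thresholds are the approximate figures quoted in this thread, not crossover values used internally by yafu or msieve, and the function name is hypothetical.

```python
def poly_select_advice(digits):
    """Rough poly-select guidance by composite size, using this thread's thresholds."""
    if digits < 110:
        # C97-class inputs: yafu's simplicity wins (SIQS, or NFS with CPU poly select).
        return "let yafu handle it (SIQS or NFS with CPU poly select)"
    if digits < 130:
        # Gray zone: GPU select saves little relative to the extra manual steps.
        return "yafu's CPU poly select is usually good enough; GPU optional"
    if digits <= 135:
        return "GPU stage 1 (-np1) starts to pay off"
    return "use GPU poly select; skip CPU-only select"


for size in (97, 120, 133, 150):
    print(size, "->", poly_select_advice(size))
```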
Poly selection has 3 steps: Generating raw polynomial candidates (GPU does this 100x faster, or more), followed by size optimization (the -nps flag in msieve, CPU only), followed by root optimization (-npr, also CPU only).
The improvement in the final poly output comes mostly from running the nps and npr steps on only a small subset of the polys found in step 1 (this is controlled by stage1_norm). The general plan is to set stage1_norm so that the output from the GPU (the lines of poly coefficients on the screen) is in the range of 2-5 per second. A desktop CPU core can do the size optimization at roughly that rate, so running msieve with both the -np1 (GPU step) and -nps (size-opt step) flags set results in one core fully loaded with size optimization, while the GPU generates a TON of polys but stage1_norm rejects 98% of them before the -nps step runs.
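The arithmetic behind the stage1_norm target is simple: the fraction of GPU output that should survive the filter is the CPU's size-optimization rate divided by the GPU's raw output rate. In the sketch below the GPU rate of 200 polys per second is a made-up illustration, and the CPU rate of 4 per second is picked from the 2-5 per second range above; together they reproduce the 98% rejection figure. Finding the stage1_norm value that achieves a given pass rate is still done by trial, since that mapping is not modeled here.

```python
def stage1_keep_fraction(gpu_polys_per_sec, cpu_sizeopt_per_sec=4.0):
    """Fraction of stage-1 candidates that should pass stage1_norm so that
    a single -nps CPU core keeps up with the GPU's output."""
    if gpu_polys_per_sec <= 0:
        raise ValueError("GPU rate must be positive")
    return min(1.0, cpu_sizeopt_per_sec / gpu_polys_per_sec)


keep = stage1_keep_fraction(200.0)   # illustrative GPU rate, not a measurement
print(f"keep {keep:.0%}, reject {1 - keep:.0%}")
```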
Alternatively, you can leave stage1_norm alone and just have the GPU spit out thousands of polys to a file with -np1. You then direct msieve later to -nps that file, while the GPU (presumably) does something else or sits idle. Running this way makes the CPU step take ~20x as long as the GPU step!
The stage2_norm setting controls how many of the -nps output polys are root-optimized (a slow process). There are two ways to deal with this: sort the .ms file that -nps outputs by its last column (a measure of poly quality), then truncate the file so -npr runs only on the best 100 (or 200, whatever suits your fancy). *Or*, set stage2_norm about a factor of 20 (or 25) lower than the default msieve setting (listed in the log), which makes -nps output 100-200 polys per GPU-day of search.
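Both stage-2 options can be sketched in a few lines. The first function scales the default stage 2 norm from the log down by the suggested factor (for the C97 above the log shows `max stage 2 norm: 1.34e+015`). The second keeps the best N lines of a `.ms` file ranked by the last column. It assumes whitespace-separated fields and that a smaller value in that column means a better polynomial; pass `reverse=True` if your output ranks the other way. Both function names are hypothetical.

```python
from pathlib import Path


def scaled_stage2_norm(default_norm, factor=20.0):
    """Stage 2 norm tightened by 'factor' relative to msieve's logged default."""
    return default_norm / factor


def keep_best_ms_lines(ms_path, out_path, keep=100, reverse=False):
    """Write the 'keep' best lines of a .ms file, ranked by the last column.

    ASSUMES whitespace-separated fields and that smaller values are better;
    pass reverse=True if your msieve build ranks the other way.
    """
    lines = [ln for ln in Path(ms_path).read_text().splitlines() if ln.strip()]
    lines.sort(key=lambda ln: float(ln.split()[-1]), reverse=reverse)
    best = lines[:keep]
    Path(out_path).write_text("\n".join(best) + "\n")
    return len(best)


# Default from the C97 log above, tightened by a factor of 20 and 25.
print(f"{scaled_stage2_norm(1.34e15):.3g} to {scaled_stage2_norm(1.34e15, 25):.3g}")
```

The truncated file can then be fed to the -npr step in place of the full .ms output.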

If you run -np all at once, msieve isn't very good at these filters and will take a LONG time to -npr the massive output from the GPU/nps steps. This is why GPU select isn't worth bothering with for composites below C120 or so: it's not worth the human time to run 2-3 stages and filter the outputs.

Hope this helps! Ask questions if you wish, but do read the msieve docs too.


All times are UTC.
