#12 |
|
Sep 2017
9₁₆ Posts |
wombatman, thank you.
But something strange is happening. Today I tried to run the GPU version with the parameters: msieve -g 0 -v -np 1,4000 -t 4 for the 256-bit N: 84995282910877845319177434936754876201592264917464386708389127475187790029013 and everything now works, both with your sortlib and with the original one, but I don't see anything in the log saying that the video card was detected. The speed is the same with the "-g 0" flag and without it.
I then wrote a 335-bit test value to worktodo.ini: 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453 and again got CUDA_ERROR_FILE_NOT_FOUND. The modified sortlib engine does not help.
I need to factor a 512-bit value, and my CPU does 0.1% in about 14 minutes. I think it would be a little faster with my GPU, but most likely it is incompatible with CUDA 6.1 at this time.
Last fiddled with by usermode on 2017-10-02 at 07:43 |
#13 |
|
Sep 2009
2078₁₀ Posts |
84995282910877845319177434936754876201592264917464386708389127475187790029013 is 78 digits and msieve won't use NFS for numbers with fewer than 85 digits; it just uses the quadratic sieve on them. So it won't have tried to use the GPU.
2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453 is a reasonable test case where it will try to use the GPU. Or you could give up on the GPU and use msieve on the CPU for polynomial selection. The speed difference is probably less than the time you've spent trying to get the GPU to work.
@jasonp, why doesn't msieve say the name of the file it can't find? That would save a lot of puzzling.
Chris
Last fiddled with by chris2be8 on 2017-10-02 at 16:05 |
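The size rule described above can be sketched in Python. This is only an illustration: the 85-digit cutoff is taken from the post rather than from the msieve source, and the helper name msieve_method is hypothetical, not part of msieve.

```python
# Sketch of the rule from the post: inputs below 85 digits go to the
# quadratic sieve (QS); larger inputs go to NFS, where the GPU can be used.
NFS_MIN_DIGITS = 85  # cutoff as stated in the post, not read from msieve itself


def msieve_method(n: int) -> str:
    """Return 'QS' or 'NFS' for a positive integer n (hypothetical helper)."""
    digits = len(str(n))
    return "NFS" if digits >= NFS_MIN_DIGITS else "QS"


# The two inputs discussed in the thread.
small = 84995282910877845319177434936754876201592264917464386708389127475187790029013
large = 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453

for n in (small, large):
    # Print the decimal length, the bit length, and the method the rule selects.
    print(len(str(n)), "digits,", n.bit_length(), "bits ->", msieve_method(n))
```

Running it prints the digit and bit counts of both inputs, which is a quick way to see whether a given test value is large enough to exercise the GPU code path at all.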
#14 |
|
Tribal Bullet
Oct 2004
3,541 Posts |
That error is from CUDA, which also doesn't say the file name it was looking for. It could actually refer to either their driver or a CUBIN section of a DLL that is supposed to have the GPU code for the relevant card.
I have seen cases where for a modern enough Pascal card you must compile with the 8.0 toolkit or it won't work even if it doesn't crash.
Last fiddled with by jasonp on 2017-10-03 at 14:36 |
#15 |
|
Jun 2003
Ottawa, Canada
1173₁₀ Posts |
I just tried running the factmsieve.py script from my example page with the msieve GPU binary that Brian Gladman compiled and is hosted on my website (http://gilchrist.ca/jeff/factoring/) and it works fine on my GTX 1070 card. It just takes a *really* long time for polynomial selection (5 hours vs 1 minute using 4 thread CPU).
@usermode do you have cudart64_80.dll in your system path or in the same folder as msieve? If not you can download it from here https://developer.nvidia.com/cuda-downloads
Code:
Thu Oct 05 10:00:57 2017 -> factmsieve.py (v0.86)
Thu Oct 05 10:00:57 2017 -> This is client 1 of 1
Thu Oct 05 10:00:57 2017 -> Running on 4 Cores with 1 hyper-thread per Core
Thu Oct 05 10:00:57 2017 -> Working with NAME = example
Thu Oct 05 10:00:57 2017 -> Running polynomial selection ...
Thu Oct 05 10:00:57 2017 -> msieve -s ..\GPU\example.dat -l ..\GPU\example.log -i ..\GPU\example.ini -nf ..\GPU\example.fb -g 0 -v -np 1,4000 -t 4
Thu Oct 5 10:00:57 2017
Thu Oct 5 10:00:57 2017
Thu Oct 5 10:00:57 2017 Msieve v. 1.53 (SVN 998)
Thu Oct 5 10:00:57 2017 random seeds: 830ea260 d235ef08
Thu Oct 5 10:00:57 2017 factoring 2881039827457895971881627053137530734638790825166127496066674320241571446494762386620442953820735453 (100 digits)
Thu Oct 5 10:00:57 2017 searching for 15-digit factors
Thu Oct 5 10:00:58 2017 commencing number field sieve (100-digit input)
Thu Oct 5 10:00:58 2017 commencing number field sieve polynomial selection
Thu Oct 5 10:00:58 2017 polynomial degree: 4
Thu Oct 5 10:00:58 2017 max stage 1 norm: 1.58e+17
Thu Oct 5 10:00:58 2017 max stage 2 norm: 3.44e+15
Thu Oct 5 10:00:58 2017 min E-value: 8.85e-09
Thu Oct 5 10:00:58 2017 poly select deadline: 1317
Thu Oct 5 10:00:58 2017 time limit set to 0.37 CPU-hours
Thu Oct 5 10:00:58 2017 expecting poly E from 1.43e-08 to > 1.64e-08
Thu Oct 5 10:00:58 2017 searching leading coefficients from 1 to 4000
Thu Oct 5 10:00:58 2017 using GPU 0 (GeForce GTX 1070)
Thu Oct 5 10:00:58 2017 selected card has CUDA arch 6.1
Thu Oct 5 15:17:29 2017 polynomial selection complete
Thu Oct 5 15:17:29 2017 R0: -1191805077826652345824255
Thu Oct 5 15:17:29 2017 R1: 1949275902691
Thu Oct 5 15:17:29 2017 A0: -900094273514840852683747752
Thu Oct 5 15:17:29 2017 A1: -7337844764575786222070
Thu Oct 5 15:17:29 2017 A2: -3360162038991689
Thu Oct 5 15:17:29 2017 A3: 258820560
Thu Oct 5 15:17:29 2017 A4: 1428
Thu Oct 5 15:17:29 2017 skew 1641629.80, size 1.033e-13, alpha -5.078, combined = 1.251e-08 rroots = 2
Thu Oct 5 15:17:29 2017 elapsed time 05:16:32
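The DLL check suggested in this post (is cudart64_80.dll on the system path or next to msieve?) can be scripted. The following is a minimal sketch; the function name find_file and the use of "." as a stand-in for the msieve folder are assumptions for illustration only.

```python
import os
from pathlib import Path


def find_file(filename, extra_dirs=(), env_path=None):
    """Return every existing copy of filename in extra_dirs or on PATH."""
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    # Search the explicitly given folders first, then each PATH entry.
    candidates = [Path(d) for d in extra_dirs]
    candidates += [Path(d) for d in env_path.split(os.pathsep) if d]
    return [d / filename for d in candidates if (d / filename).is_file()]


if __name__ == "__main__":
    # "." stands in for the msieve folder; point it at wherever msieve.exe lives.
    hits = find_file("cudart64_80.dll", extra_dirs=["."])
    print(hits if hits else "cudart64_80.dll not found")
```

An empty result only shows that the runtime DLL is not where this search looks; it does not by itself explain a CUDA_ERROR_FILE_NOT_FOUND.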
#16 |
|
Jun 2003
Ottawa, Canada
3×17×23 Posts |
The only problem I had was when msieve was finally done GPU poly selection there was an error and the script stopped (could not open msieve.dat.p file). The files were actually written so I was able to just re-run the script and it continued from the right spot.
During the long poly run it seemed like the msieve.dat.p file was 0 bytes, except at the end it actually had about 15MB worth of data in it so not sure if there is an issue with it not flushing to disk or something before the end. Any ideas @jasonp?
Last fiddled with by Jeff Gilchrist on 2017-10-05 at 21:51 |
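Buffered output is one possible reason a file can look empty until the end of a run. The sketch below only demonstrates that general effect in Python; it is an assumption that something similar happens inside msieve, and nothing here reflects msieve's actual file handling.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload: str = "candidate polynomial\n"):
    """Write a small payload; report the on-disk size before and after flushing."""
    fd, path = tempfile.mkstemp(suffix=".p")
    os.close(fd)
    try:
        with open(path, "w") as f:
            f.write(payload)                # data sits in the user-space buffer
            before = os.path.getsize(path)  # size as seen by other programs
            f.flush()                       # hand the buffered data to the OS
            os.fsync(f.fileno())            # ask the OS to commit it to disk
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(sizes_before_and_after_flush())
```

A small write stays in the buffer, so the file reports zero bytes until the flush, which matches the symptom described above without proving it is the cause.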
#17 | |
|
Sep 2017
3² Posts |
Quote:
1. I currently have a previous CPU factoring result at 26% of the work. If I continue the process with GPU support, will the "cpu" results be continued or recreated?
2. The CPU is not fully loaded right now (only 4 of 8 threads are loaded, at 40-50%) and there is only one msieve.exe process. Is that normal?
3. Which build is better for an i7 6700K CPU: msieve.gpu.core2 or msieve.gpu.ivybridge?
4. Oh, ~5+ hours with the GPU for a 100-digit number. With the CPU it takes 0.35 hours for me. I need to factor a 155-digit number. Will it be faster with the CPU?
Last fiddled with by usermode on 2017-10-05 at 22:17 |
#18 | |
|
I moo ablest echo power!
May 2013
29×61 Posts |
Quote:
Code:
(msieve file name here) -s ..\GPU\example.dat -l ..\GPU\example.log -i ..\GPU\example.ini -nf ..\GPU\example.fb -g 0 -v -np1 1,4000 -t 4
Last fiddled with by wombatman on 2017-10-05 at 23:54 |
#19 | |
|
Sep 2017
3² Posts |
Quote:
https://www.upload.ee/image/7529357/err2.png
The message is: error generating or reading NFS polynomials
The "gpu" dir contains the following files:
example.log
example.dat.m (157.7 Mb)
example.ini
What is the next step?
Last fiddled with by usermode on 2017-10-06 at 06:10 |
#20 | |
|
I moo ablest echo power!
May 2013
6E9₁₆ Posts |
Quote:
Code:
(msieve_file_name) --help
Second, that's not actually an error. It just means you've only done one part of the polynomial selection. Your next step is to run the same command line, but replace "-np1 1,4000 -t 4" with "-nps -npr". This will run the size and root optimization of the candidates you generated (in the file example.dat.m). Note again that this step is single-threaded, so it will take a bit longer. Once you've finished this step, start playing with the command line parameters related to these steps (stage1_norm, stage2_norm, min_evalue, and so on).
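The two-step sequence described here can be written down as a pair of argument lists. This is a sketch only: every flag and path is copied from the command lines quoted in this thread, while the function name poly_select_commands and its parameters are hypothetical.

```python
def poly_select_commands(msieve_exe, name="example", gpu_dir="..\\GPU"):
    """Build the stage-1 (-np1) and follow-up (-nps -npr) argument lists."""
    # Options shared by both runs, as they appear in the thread.
    base = [
        msieve_exe,
        "-s", f"{gpu_dir}\\{name}.dat",
        "-l", f"{gpu_dir}\\{name}.log",
        "-i", f"{gpu_dir}\\{name}.ini",
        "-nf", f"{gpu_dir}\\{name}.fb",
        "-g", "0",
        "-v",
    ]
    stage1 = base + ["-np1", "1,4000", "-t", "4"]  # GPU stage 1
    followup = base + ["-nps", "-npr"]             # size and root optimization
    return stage1, followup


if __name__ == "__main__":
    # "msieve.exe" is a placeholder for the actual binary name.
    for cmd in poly_select_commands("msieve.exe"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) so that the second step only starts if the first one exits cleanly.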
#21 | |
|
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts |
Quote:
The default msieve settings were created before the GPU code was written, so the GPU first stage (-np1) generates more data than -nps and -npr can handle efficiently. For instance, 3 hrs of -np1 GPU work might generate 8 hours of -nps, and the -nps step might generate 12+ hrs of -npr work.
You can order the -nps output by score ("sort" command in linux) and run -npr on only the top 100 or 200 results to save time, or you can adjust the msieve settings with the above flags. I alter stage1_norm and stage2_norm by first looking at the default settings in msieve.log, then dividing stage 1 by 8 to 10 and stage 2 by 25 to 35. This reduces output enough to make the -nps and -npr steps comparable in length to the -np1 GPU-enabled step, and saves me from having to sort/edit the -nps output.
Note that the -t threads setting in -np1 is the number of threads sent to the GPU, *not* threads on the CPU. The CPU handles some overhead to manage the data generated by the GPU, and for small jobs or fast cards setting threads to 3 or 4 can utilize the GPU more fully by reducing the time the GPU waits. In no case does this need to be 10+ threads! |
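The norm adjustment and the top-N filtering described in this post can be sketched as follows. The default norms are the ones printed in the 100-digit log earlier in the thread. The helper names are hypothetical, and the layout of the -nps output (which column holds the score, and whether lower is better) is an assumption left as parameters, since the thread does not specify it.

```python
def scaled_norm_range(default_norm, min_divisor, max_divisor):
    """Return (low, high) after dividing a default norm by a divisor range."""
    return default_norm / max_divisor, default_norm / min_divisor


def top_candidates(lines, score_column, keep=100, lower_is_better=True):
    """Keep the best `keep` whitespace-separated records by one numeric column."""
    records = [ln for ln in lines if ln.strip()]  # drop blank lines
    records.sort(key=lambda ln: float(ln.split()[score_column]),
                 reverse=not lower_is_better)
    return records[:keep]


# Defaults from the log: max stage 1 norm 1.58e+17, max stage 2 norm 3.44e+15.
stage1_low, stage1_high = scaled_norm_range(1.58e17, 8, 10)
stage2_low, stage2_high = scaled_norm_range(3.44e15, 25, 35)
print(f"stage1_norm between {stage1_low:.3g} and {stage1_high:.3g}")
print(f"stage2_norm between {stage2_low:.3g} and {stage2_high:.3g}")
```

The two approaches are alternatives: tightening the norms reduces what -np1 emits in the first place, while top_candidates trims the -nps output after the fact.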
#22 | |
|
I moo ablest echo power!
May 2013
3351₈ Posts |
Quote: