mersenneforum.org > Factoring Projects > GMP-ECM
Old 2016-08-10, 16:59   #408
cgy606

I am using a GTX 980 (2048 CUDA cores) in my laptop and running 1024 stage 1 instances per process. I have been running curves with ecm_gpu (stage 1 on the GPU, multi-threaded stage 2 on the CPU via the ecm.py script). Suddenly, the GPU speed was cut in half. I tried playing around with the number of curves running on the GPU using the -gpucurves option; it was still slow. Then I shut the laptop down and restarted, and it returned to normal. Has anybody experienced this? My concern is that it might drop back to "half speed" later on, in the middle of some big job...
Old 2016-08-10, 17:42   #409
xilman
Bamboozled!

Quote:
Originally Posted by cgy606 View Post
I am using a GTX 980 on my laptop with 2048 cuda cores and running 1024 stage 1 instances per process. I have been running curves using ecm_gpu (stage 1 on my gpu and multi-threading stage 2 on my cpu using ecm.py script). Suddenly, the gpu speed cut in half. I tried playing around with the number of curves running on my gpu using the -gpucurves tag. It was still slow. Then I shut my laptop off and restarted... it returned to normal. Has anybody ever experienced this? My concern is that it might go back to "half speed" later on in the middle of some big job...
I haven't experienced it myself, but laptops are notorious for slowing things down if they feel the temperature is too high or the estimated battery lifetime is too low. I had endless fun persuading my MacBook that I knew better than it did when it came to deciding whether or not to use the GPU.
Old 2016-08-10, 17:57   #410
Gordon

Quote:
Originally Posted by cgy606 View Post
I am using a GTX 980 on my laptop with 2048 cuda cores and running 1024 stage 1 instances per process. I have been running curves using ecm_gpu (stage 1 on my gpu and multi-threading stage 2 on my cpu using ecm.py script). Suddenly, the gpu speed cut in half. I tried playing around with the number of curves running on my gpu using the -gpucurves tag. It was still slow. Then I shut my laptop off and restarted... it returned to normal. Has anybody ever experienced this? My concern is that it might go back to "half speed" later on in the middle of some big job...
This happens when you start a second copy of gmp-ecm with only a single GPU available... perhaps the same applies to ecm_gpu?
Old 2016-08-10, 18:37   #411
wombatman
I moo ablest echo power!

Yeah, my assumption would be the GPU getting too hot. GPU-ECM will definitely raise the temperature.
Old 2016-08-11, 05:56   #412
LaurV
Romulan Interpreter

Run GPU-Z in parallel with GPU-ECM and watch it closely. If it is a temperature issue, it will happen again after roughly the same amount of time; at that moment, look at what GPU-Z reports (it shows why the card's speed is being restricted: temperature limits, power limits, etc.). Also check whether the clock changes. The card may lower the clock to reduce power, or it may keep the clock and insert idle cycles instead; each method has advantages and disadvantages, but for a clear temperature issue the clock will be cut, for sure.
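If GPU-Z isn't handy, or you want a log you can grep afterwards, NVIDIA's command-line nvidia-smi exposes the same throttle information. A minimal watcher sketch: the query field names are NVIDIA's, but everything else here (the parser, the output format, the 5-second interval) is illustrative, not any official tool:

```python
import csv
import io
import subprocess
import time

# Query fields exposed by nvidia-smi; "clocks_throttle_reasons.active"
# is a hex bitmask that is non-zero whenever the card is being slowed down.
FIELDS = "timestamp,temperature.gpu,clocks.sm,clocks_throttle_reasons.active"

def parse_sample(csv_line):
    """Parse one CSV line from nvidia-smi into (timestamp, temp_C, sm_MHz, throttle_mask)."""
    row = next(csv.reader(io.StringIO(csv_line)))
    ts, temp, clock, mask = (field.strip() for field in row)
    # nvidia-smi prints values like "87" for temperature and "540 MHz" for clocks.
    return ts, int(temp), int(clock.split()[0]), int(mask, 16)

def watch(interval=5):
    """Poll the GPU and flag any sample where a throttle reason is active."""
    while True:
        out = subprocess.check_output(
            ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader"],
            text=True)
        for line in out.strip().splitlines():
            ts, temp, sm_mhz, mask = parse_sample(line)
            flag = "THROTTLED" if mask else "ok"
            print(f"{ts}  {temp}C  {sm_mhz}MHz  {flag}")
        time.sleep(interval)

# watch()  # uncomment to poll every 5 s; requires nvidia-smi on the PATH
```

If the slowdown recurs, the first sample where the mask goes non-zero (with the clock dropping at the same moment) tells you whether it is thermal or power capping.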

Last fiddled with by LaurV on 2016-08-11 at 05:57
Old 2016-08-19, 22:29   #413
cgy606

Quote:
Originally Posted by wombatman View Post
I truthfully don't remember all the steps I went through to get everything compiled successfully with VS2012 (I have no experience with VS2015). I'm attaching the gpu_ecm exe I have built. Try it out and see what dlls you need for it.
Hi Wombatman,

I was wondering whether you have thought more about the steps you took to compile GMP-ECM with GPU support. I have the standard 1018-bit version and was wondering whether a lower-bit version would run more quickly while remaining effective (i.e., still finding factors in stage 1, or in stage 2 on the CPU). Someone from a much earlier post seems to have gotten it to work:

Quote:
Originally Posted by debrouxl View Post
I've switched to CC 2.0 compilation as well, and the default number of curves was raised from 32 to 64 - same change as xilman saw above.

I haven't yet seen a mention of a non-power-of-2 NB_DIGITS in this thread... therefore I tried it, even though I have no idea whether it should work.
Well, at least, it does not seem to fail horribly:
* the resulting executable doesn't crash;
* the size of the executable is between the size of the 512-bit version and the size of the 1024-bit version;
* on both a C211 and a C148, the 768-bit version is faster than the 1024-bit-arithmetic version:

Code:
$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ./gpu_ecm_24 -vv -save 76663_210_ecm24_3e6 3000000
#Compiled for a NVIDIA GPU with compute capability 2.0.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.252s
Input number is 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 (212 digits)
Using B1=3000000, firstinvd=435701810, with 64 curves
...
gpu_ecm took : 1444.690s (0.000+1444.686+0.004)
Throughput : 0.044

$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ./gpu_ecm_32 -vv -save 76663_210_ecm32_3e6 3000000
...
gpu_ecm took : 1814.801s (0.000+1814.797+0.004)
Throughput : 0.035
Code:
for i in 16 24 32; do echo 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 | "./gpu_ecm_$i" -vv -save "80009_213_ecm${i}_3e6" 3000000; done
...
gpu_ecm took : 865.578s (0.000+865.574+0.004)
Throughput : 0.074
...
gpu_ecm took : 1707.302s (0.000+1707.298+0.004)
Throughput : 0.037
...
gpu_ecm took : 2044.451s (0.000+2044.447+0.004)
Throughput : 0.031

Comparison against CPU GMP-ECM running on 1 hyperthread of a SandyBridge i7, whose other 7 hyperthreads are used to the max as well:
Code:
$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ecm -c 1 3e6
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 (211 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1718921992
Step 1 took 34590ms
Step 2 took 11536ms
Code:
$ echo 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 | ecm -c 1 3e6
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 (148 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3766168691

Step 1 took 21521ms
Step 2 took 8016ms
For composites of those sizes, the GT 540M can beat one hyperthread of i7-2670QM if the CPU is busy, but not if the CPU is idle.
I was thinking it would be nice to have multiple versions of gpu-ecm at varying bit levels, in order to speed up stage 1 on the GPU relative to stage 2 on the CPU. I have observed a constant 3.2 hr on my GTX 980 for every input of C307 and below, for 1024 curves at B1 = 43M. My quad-core 4 GHz processor takes 4.1 hr to run 1024 stage 2 curves (using 8 hyper-threads). For a C155, however, CPU stage 2 takes only 2.4 hr, so I am looking to make up the difference using a 634-bit version (changing NB_DIGITS from the standard 32, i.e. 1018 bits, down to 20) for this particular number. Any help would be appreciated...
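For reference, the kernel sizes quoted in this thread are consistent with 32-bit words and a few reserved bits per NB_DIGITS (32 words giving 1018 bits, 20 words giving 634). A back-of-the-envelope sketch of the potential stage 1 saving, under the deliberately naive assumption that stage 1 cost scales with the square of the operand width, as schoolbook modular multiplication would:

```python
# Modulus size implied by NB_DIGITS: 32-bit words with 6 bits reserved.
# This matches the 1018- and 634-bit figures quoted in the thread;
# treat it as an observation about these builds, not a guarantee.
def kernel_bits(nb_digits):
    return 32 * nb_digits - 6

assert kernel_bits(32) == 1018
assert kernel_bits(20) == 634

# Naive scaling model: stage 1 time ~ (operand width)^2. Real GPU
# kernels won't follow this exactly (limb packing, launch overhead).
stage1_1018 = 3.2  # hours observed on the GTX 980, 1024 curves, B1 = 43M
stage1_634 = stage1_1018 * (kernel_bits(20) / kernel_bits(32)) ** 2
print(f"estimated 634-bit stage 1: {stage1_634:.1f} h")  # ~1.2 h under this model
```

If that model held, stage 1 for the C155 would come in well under the 2.4 hr the CPU needs for stage 2; in practice only a timing run with the rebuilt binary can confirm the actual gain.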

Last fiddled with by cgy606 on 2016-08-19 at 22:34
Old 2016-08-19, 23:06   #414
wombatman
I moo ablest echo power!

Here's the program compiled with NB_DIGITS set to 20. I would try finding a known factor to ensure it works properly. It passed the first few tests in test.gpuecm, which generally indicates it is working, but it's good to be sure.
Attached Files
File Type: zip ecm_gpu_ndigits20.zip (364.8 KB, 175 views)
Old 2016-08-20, 04:52   #415
cgy606

Some interesting behavior, to say the least. I ran a test on a C147 that I know has a p10, a p17, and two p20s at B1 = 250K. I ran 512 curves on the GPU using the ndigits = 20 build, and it found all of the factors in stage 1:

Code:
GMP-ECM 7.0.1-dev [configured with MPIR 2.7.0, --enable-gpu, --enable-openmp] [ECM]
Input number is (99!+5)/9176362385 (147 digits)
Using B1=250000, B2=0, sigma=3:3407157017-3:3407157528 (512 curves)
Block: 20x32x1 Grid: 16x1x1

Computing 512 Step 1 took 2328ms of CPU time / 19709ms of GPU time
********** Factor found in step 1: 5275321151
Found probable prime factor of 10 digits: 5275321151
Composite cofactor ((99!+5)/9176362385)/5275321151 has 137 digits
********** Factor found in step 1: 42645646522247063
Found probable prime factor of 17 digits: 42645646522247063
Composite cofactor (((99!+5)/9176362385)/5275321151)/42645646522247063 has 120 digits
********** Factor found in step 1: 61133702826671342149
Found probable prime factor of 20 digits: 61133702826671342149
Composite cofactor ((((99!+5)/9176362385)/5275321151)/42645646522247063)/61133702826671342149 has 100 digits
********** Factor found in step 1: 31905776268663843113
Found probable prime factor of 20 digits: 31905776268663843113
Probable prime cofactor (((((99!+5)/9176362385)/5275321151)/42645646522247063)/61133702826671342149)/31905776268663843113 has 81 digits
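Since the input has a compact closed form, the factors reported in that run are easy to double-check independently with Python's arbitrary-precision integers. A quick sanity sketch, using only the values printed in the log above:

```python
from math import factorial

# Rebuild the C147 input from its closed form, (99!+5)/9176362385.
assert (factorial(99) + 5) % 9176362385 == 0
n = (factorial(99) + 5) // 9176362385

# The four factors GMP-ECM reported in stage 1.
factors = [5275321151, 42645646522247063,
           61133702826671342149, 31905776268663843113]

cofactor = n
for p in factors:
    assert cofactor % p == 0  # each reported factor really divides the input
    cofactor //= p

print(len(str(cofactor)))  # digit count of the remaining cofactor (81 per the log)
```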
However, I then tried to run on the GPU a composite for which I was reasonably certain no factors would be found in stage 1. I chose a C189 that contains two p24s, using B1 = 250K. I ran a total of 3072 curves at that B1 and no factors were found in stage 1. I then ran stage 2 on the save files using the ecm.py script written by WraithX, and I didn't find a single factor in stage 2 either (finding no factor when I know there are two p24s, after roughly a 7*t30 search, is highly improbable).

The scientist in me decided that I should try another 512 curves at B1 = 250K (a t30 search) using the default ndigits = 32. Here is the output I got:
Code:
ON GPU

GMP-ECM 7.0.1-dev [configured with MPIR 2.7.0, --enable-gpu, --enable-openmp] [ECM]
Input number is (126!+5)/79768672096773991353065 (189 digits)
Using B1=250000, B2=0, sigma=3:290623459-3:290623970 (512 curves)
Block: 32x32x1 Grid: 16x1x1
300000 iterations to go
200000 iterations to go
100000 iterations to go
90000 iterations to go
80000 iterations to go
70000 iterations to go
60000 iterations to go
50000 iterations to go
40000 iterations to go
30000 iterations to go
20000 iterations to go
10000 iterations to go
GPU: factor 324295084094116127662247 found in Step 1 with curve 368 (-sigma 3:290623827)
Computing 512 Step 1 took 2344ms of CPU time / 33876ms of GPU time
********** Factor found in step 1: 324295084094116127662247
Found probable prime factor of 24 digits: 324295084094116127662247
Composite cofactor ((126!+5)/79768672096773991353065)/324295084094116127662247 has 165 digits

STAGE 2 RESUME ON CPU

-> ___________________________________________________________________
-> | Running ecm.py, a Python driver for distributing GMP-ECM work   |
-> | on a single machine.  It is copyright, 2011-2016, David Cleaver |
-> | and is a conversion of factmsieve.py that is Copyright, 2010,   |
-> | Brian Gladman. Version 0.40 (Python 2.6 or later)  6th Aug 2016 |
-> |_________________________________________________________________|

-> Resuming work from resume file: 126fac5_250e3_0.save
-> Spreading the work across 8 thread(s)
->=============================================================================
-> Working on the number(s) in the resume file: 126fac5_250e3_0.save
-> Using up to 8 instances of GMP-ECM...
-> Found 512 unique resume lines to work on.
-> Will start working on the 512 resume lines.
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t00.txt 250000 > resume_job_126fac5_250e3_0-save_out_t00.txt  (64 resume lines)
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t01.txt 250000 > resume_job_126fac5_250e3_0-save_out_t01.txt  (64 resume lines)
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t02.txt 250000 > resume_job_126fac5_250e3_0-save_out_t02.txt  (64 resume lines)
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t03.txt 250000 > resume_job_126fac5_250e3_0-save_out_t03.txt  (64 resume lines)
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t04.txt 250000 > resume_job_126fac5_250e3_0-save_out_t04.txt  (64 resume lines)
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t05.txt 250000 > resume_job_126fac5_250e3_0-save_out_t05.txt  (64 resume lines)
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t06.txt 250000 > resume_job_126fac5_250e3_0-save_out_t06.txt  (64 resume lines)
-> ecm -resume resume_job_126fac5_250e3_0-save_inp_t07.txt 250000 > resume_job_126fac5_250e3_0-save_out_t07.txt  (64 resume lines)
GMP-ECM 7.0.1-dev [configured with MPIR 2.7.0, --enable-gpu, --enable-openmp] [ECM]
Using B1=250000-250000, B2=128992510, polynomial Dickson(3), 8 threads
____________________________________________________________________________
 Curves Complete |   Average seconds/curve   |    Runtime    |      ETA
-----------------|---------------------------|---------------|--------------
    17 of    512 | Stg1 0.000s | Stg2 0.523s |   0d 00:00:02 |   0d 00:02:09

Resume line 17 out of 512:

Using B1=250000-250000, B2=128992510, polynomial Dickson(3), sigma=3:290623649
Step 1 took 0ms
Step 2 took 594ms
********** Factor found in step 2: 452655830807187689684039
Found probable prime factor of 24 digits: 452655830807187689684039
Probable prime cofactor (((126!+5)/79768672096773991353065)/324295084094116127662247)/452655830807187689684039 has 142 digits
Not only does it find one of the factors in stage 1 (after 512 stage 1 curves at 250K), but it finds the other p24 after only 17 curves in stage 2. Now, I know ECM is a probabilistic algorithm, but these results are not due to sheer 'luck'...

I think something is grossly wrong in the ECM code when ndigits is changed from the default (I know Cyril has pointed to this), but I haven't a clue what it is...
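The "not sheer luck" claim can be put in numbers with the usual rule of thumb that a full t-level of work misses a factor of the corresponding size with probability about 1/e. A rough sketch, treating the two p24s as independent and using 1/e per t30 as a deliberately pessimistic miss rate (a p24 is well below the t30 target size, so the true per-t30 miss rate is much lower):

```python
from math import exp

# Rule of thumb: one t-level misses a same-size factor with probability ~1/e,
# so k t-levels miss it with probability ~exp(-k). For a p24 at t30-level
# effort the real miss rate is far smaller, making this an upper bound.
t_levels = 7                  # ~7 * t30 of stage 1 + stage 2 work was done
miss_one = exp(-t_levels)     # chance one p24 survives all of it
miss_both = miss_one ** 2     # chance both independent p24s survive

print(f"P(miss one p24)  < {miss_one:.2e}")
print(f"P(miss both)     < {miss_both:.2e}")
```

Even with the pessimistic rate, that is under a tenth of a percent for missing one factor and under one in a million for missing both, so the blank run really does point at the NB_DIGITS = 20 build rather than at variance.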

Last fiddled with by Batalov on 2016-08-20 at 15:49 Reason: formatted the program output chunks into /code/ blocks
Old 2016-08-20, 07:09   #416
henryzz
Just call me Henry

From memory, only 16 and 32 worked when I looked at it a long while ago. Not a clue why.
Old 2016-08-20, 14:31   #417
VBCurtis

I agree with Henry: only power-of-2 values worked, and only 32 worked without issue. 16 and 64 "mostly worked", but were never found to be 100% reliable for anyone.
IMO, it's not worth the missed factors to gain the time savings from using 16, unless you're running CPU-only ECM in parallel.
Old 2016-08-20, 15:12   #418
wombatman
I moo ablest echo power!

Yeah, I played around with NB_DIGITS before, but only bumping it up, as I recall. It will compile and run and can find factors in Stage 1, but it always had issues with Stage 2.