mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GMP-ECM (https://www.mersenneforum.org/forumdisplay.php?f=55)
-   -   ECM for CUDA GPUs in latest GMP-ECM ? (https://www.mersenneforum.org/showthread.php?t=16480)

Karl M Johnson 2012-01-27 21:27

ECM for CUDA GPUs in latest GMP-ECM ?
 
Greetings.

While working on gmp-ecm, I discovered a "gpu" folder inside the latest svn.
Can somebody shed some light on it?
Is it a fully working gmp-ecm implementation?
Or only partially GPU-accelerated?
Any info will do!

xilman 2012-01-27 21:47

[QUOTE=Karl M Johnson;287461]Greetings.

While working on gmp-ecm, I discovered a "gpu" folder inside the latest svn.
Can somebody shed some light on it?
Is it a fully working gmp-ecm implementation?
Or only partially GPU-accelerated?
Any info will do![/QUOTE]I'm playing with it. Not yet found a new factor with it. Current activity is to work out what's going on and then to see whether I can contribute to the effort. So far I've found areas where I may be able to help Cyril with development.

So far, it is Stage 1 only (as one should expect) and each bit in the product of prime powers up to B1 requires a separate kernel call. It would be easy enough to make the entire sequence of elliptic curve arithmetic operations a single kernel but not obviously a good idea (think about it).

The code currently uses fixed kilobit arithmetic and so is limited to factoring integers somewhat under that size. One of the areas I may be able to help is to add flexibility in that regard. Another is to improve the underlying arithmetic primitives. A third is to reduce the (presently extortionate IMO) amount of cpu time used by busy-waiting for the kernels to complete.
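To get a sense of scale for those per-bit kernel calls: log2 of the product of prime powers up to B1 is about 1.44*B1, so B1=3e6 already means roughly 4.3 million launches (compare the "#s has 4328086 bits" lines in the verbose output further down this thread). A standalone sketch that computes it, assuming the standard choice of s (this is not gpu_ecm code):
[code]
/* Standalone sketch, not gpu_ecm code: estimate the bit length of
 * s = prod_{prime p <= B1} p^floor(log_p B1), i.e. roughly how many
 * per-bit kernel launches a Montgomery-ladder stage 1 needs.
 * Build with: gcc -O2 -std=c99 sbits.c -o sbits -lm
 */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    unsigned long B1 = (argc > 1) ? strtoul(argv[1], NULL, 10) : 3000000UL;
    char *composite = calloc(B1 + 1, 1);
    double bits = 0.0;

    for (unsigned long p = 2; p <= B1; p++) {
        if (composite[p])
            continue;                      /* p is composite, skip it */
        for (unsigned long q = 2 * p; q <= B1; q += p)
            composite[q] = 1;              /* sieve out multiples of p */
        unsigned long pk = p;              /* largest power of p <= B1 */
        while (pk <= B1 / p)
            pk *= p;
        bits += log2((double) pk);
    }
    /* For B1 = 3000000 this prints about 4.33e6 bits (~1.443 * B1),
     * matching the "#s has 4328086 bits" lines elsewhere in the thread. */
    printf("B1 = %lu: s has about %.0f bits (%.3f * B1)\n",
           B1, bits, bits / (double) B1);
    free(composite);
    return 0;
}
[/code]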


Paul

Karl M Johnson 2012-01-28 06:55

Oh, alright.
It can't substitute for gmp-ecm yet.

Stage 2 on GPUs is not really feasible (except on Teslas) because it requires a lot of RAM, right?

xilman 2012-02-10 15:35

[QUOTE=Karl M Johnson;287497]Oh, alright.
It can't substitute for gmp-ecm yet.[/QUOTE]Actually it can. I just found my first factor with gpu-ecm :surprised:[code]

Resuming ECM residue saved by pcl@anubis.home.brnikat.com with GPU-ECM 0.1
Input number is 27199999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (275 digits)
Using B1=110000000-0, B2=110000000-776278396540, polynomial Dickson(30), A=112777948379516601562499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
Step 1 took 0ms
Step 2 took 176954ms
********** Factor found in step 2: 7219650603145651593481420276356225303436099
Found probable prime factor of 43 digits: 7219650603145651593481420276356225303436099
Composite cofactor 3767495339476507669490528975999036413199084363632138583182977316467718682572689799651214383863597597569613063674073098348731470273762390945944125050829513627232967008803673015815350396060271580465332444072090424114938478394067646101 has 232 digits[/code]
A total of 1792 curves were run at B1=110M and the factor found on the 1349th second stage. I'm running the remainder in case another factor can be found.

Each stage one took 70 seconds on a GTX-460. The latest ECM takes 679 seconds per stage 1 on a single core of a 1090T clocked at 3264MHz, so the GPU version is close to 10 times faster in this situation.

Paul

Brain 2012-02-10 15:46

GPU ECM 0.1
 
Could anybody provide a (link to a) (Windows 64) binary for GPU-ECM 0.1?

xilman 2012-02-10 16:08

[QUOTE=xilman;288906]Actually it can. I just found my first factor with gpu-ecm[/QUOTE]Surprise is no longer adequate, and I'm forced to resort to astonishment :shock:[code]

Using B1=110000000-0, B2=110000000-776278396540, polynomial Dickson(30), A=113233923912048339843749999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
Step 1 took 0ms
Step 2 took 176408ms
********** Factor found in step 2: 2315784840580190375316972295830305082761
Found probable prime factor of 40 digits: 2315784840580190375316972295830305082761
Composite cofactor 11745478044145667127387304121681137574169838448099828222193156280305822325912582394902125137121744003039884215185197839759773264126893037178852537820600462333360764644042755609744438099938488940900666235577910910744169710591419476281159 has 236 digits[/code] took another 18 curves.

If I was unlucky not to find the p43 earlier than this, given the amount of ECM work performed, I was especially unlucky not to find the p40. The candidate number, GW(10,272), had no previously known factors despite having had a complete t40 run through the ECMNET server here and clients around the world.

The cofactor is now c193 so it's worth seeing whether the remaining curves will find anything.


Paul

xilman 2012-02-10 16:23

[QUOTE=Brain;288909]Could anybody provide a (link to a) (Windows 64) binary for GPU-ECM 0.1?[/QUOTE]Not me.

pinhodecarlos 2012-02-10 16:25

[QUOTE=xilman;288913]Not me.[/QUOTE]

And for linux?:bow:

xilman 2012-02-10 16:56

[QUOTE=pinhodecarlos;288914]And for linux?:bow:[/QUOTE]If you have Linux you can build from the SVN sources as easily as I can.

The process really is very straightforward and you'll end up with something which doesn't carry the risk of the Linux equivalent of DLL-hell.

If you really are not lazy(*) enough to build your own, I could make available the binary I use. No guarantees that it will work, or even run, on any other Linux system. It almost certainly won't work optimally unless you have exactly the same environment as me.

Paul

* Sometimes it's much better to do some work ahead of time to remove the need to do much more work later. That's true laziness.

Karl M Johnson 2012-02-10 21:30

Windows binary wanted:smile:

R.D. Silverman 2012-02-11 00:07

[QUOTE=xilman;288906]Actually it can. I just found my first factor with gpu-ecm :surprised:[code]

Resuming ECM residue saved by pcl@anubis.home.brnikat.com with GPU-ECM 0.1
Input number is 27199999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (275 digits)
Using B1=110000000-0, B2=110000000-776278396540, polynomial Dickson(30), A=112777948379516601562499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
Step 1 took 0ms
Step 2 took 176954ms
********** Factor found in step 2: 7219650603145651593481420276356225303436099
Found probable prime factor of 43 digits: 7219650603145651593481420276356225303436099
Composite cofactor 3767495339476507669490528975999036413199084363632138583182977316467718682572689799651214383863597597569613063674073098348731470273762390945944125050829513627232967008803673015815350396060271580465332444072090424114938478394067646101 has 232 digits[/code]
A total of 1792 curves were run at B1=110M and the factor found on the 1349th second stage. I'm running the remainder in case another factor can be found.

Each stage one took 70 seconds on a GTX-460. The latest ECM takes 679 seconds per stage 1 on a single core of a 1090T clocked at 3264MHz, so the GPU version is close to 10 times faster in this situation.

Paul[/QUOTE]

Awesome.

Is the code specific to a particular GPU? How portable is it?

Batalov 2012-02-11 00:23

For Windows, it would have been nice if someone could downgrade the build/solution files from VS2010 to VS2008 and post them. For comparison, the CUDA toolkit contains scripts for 2005, 2008 and 2010 -- that is kinda user-friendly. Not everyone has 2010. (Well, temporarily one can get the trial license.)

xilman 2012-02-11 00:27

[QUOTE=Batalov;288969]For Windows, it would have been nice if someone could downgrade the build/solution files vs2010 to vs2008 and post. For comparison, CUDA toolkit contains scripts for 2005, 2008, 2010 -- that is kinda user friendly. Not everyone has 2010. (Well, temporarily one can get the trial license.)[/QUOTE]If you want it, why don't you do it?

Prime95 2012-02-11 01:21

[QUOTE=Batalov;288969]Not everyone has 2010. (Well, temporarily one can get the trial license.)[/QUOTE]

VS 2010 is free (the express edition). Yes, it is 32-bit only, but you can get the Windows SDK for free which contains the 64-bit compiler.

Batalov 2012-02-11 02:51

[QUOTE=xilman;288971]If you want it, why don't you do it?[/QUOTE]
Because it is ugly! I am not talking about .sln files; for the .vcxproj -> .vcproj conversion, most of the internet-based advice amounts to 'you might be best served by using the "New project from existing code" wizard to build a new VC2008 project for the code rather than trying to convert the existing project.' It is best done by the authors, who know their source and dependencies.

I'll try the free version.

frmky 2012-02-11 08:03

[QUOTE=R.D. Silverman;288966]Is the code specific to a particular GPU? How portable is it?[/QUOTE]
It uses CUDA, thus requires an nVidia GPU supporting CC 1.3 or higher.

frmky 2012-02-11 08:09

[QUOTE=xilman;288906]Each stage one took 70 seconds on a GTX-460.[/QUOTE]
The GTX 460 has 7 MP's. Did you do 224 curves in parallel or more? Are there memory latencies that are hidden by doing more?

debrouxl 2012-02-11 08:59

On my GT540M (admittedly a fairly low-end model, with 2 MPs), under Debian unstable x86_64, gpu_ecm with 64 curves in parallel seems to be somewhat slower than CPU-based stage 1 (tuned GMP-ECM binary) running on Core i7-2670QM @ 2.2 GHz.
I have tested B1 bounds from 5e4 to 16e6, and 32, 64 or 128 parallel curves. 32 curves gives markedly lower throughput than 64, but 128 is hardly better than 64.

xilman 2012-02-11 09:14

[QUOTE=frmky;289014]The GTX 460 has 7 MP's. Did you do 224 curves in parallel or more? Are there memory latencies that are hidden by doing more?[/QUOTE]The first 896 curves were done in four batches of 224 --- the default. The second 896 ran all in parallel with a block of 32x32 and a grid of 70x1x1. I seem to have lost the detailed timing information for the earlier curves :sad: As best I recall, running 896 took slightly less time than running 224 four times, but I could be quite wrong.

xilman 2012-02-11 09:23

Big oops.
 
I screwed up computing the time per curve :redface:
[QUOTE=xilman;288906]Each stage one took 70 seconds on a GTX-460. The latest ECM takes 679 seconds per stage 1 on a single core of a 1090T clocked at 3264MHz, so the GPU version is close to 10 times faster in this situation.[/QUOTE]

1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve.

The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve.
Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core.

Sorry about that.

Karl M Johnson 2012-02-11 11:23

[QUOTE=frmky;289013]It uses CUDA, thus requires an nVidia GPU supporting CC 1.3 or higher.[/QUOTE]
Are DP FP calculations being performed?

jasonp 2012-02-11 12:30

No, it looks to be integer only. Compute Capability 1.3 also has advanced synchronization primitives that the CC 1.3 branch uses. The code is [url="https://gforge.inria.fr/scm/viewvc.php/trunk/gpu/?root=ecm"]here[/url] for anyone who's interested.

pinhodecarlos 2012-02-11 12:54

[QUOTE=xilman;289019]I screwed up computing the time per curve :redface:


1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve.

The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve.
Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core.

Sorry about that.[/QUOTE]

Paul, do you see CPU usage when running GPU-ECM?
BTW, I'm too lazy even to install Linux... lol. When I look at the software I use on Windows, I just don't think I can use all of it on Linux.

ATH 2012-02-11 16:39

[QUOTE=xilman;288917]If you have Linux you can build from the SVN sources as easily as I can.

The process really is very straightforward and you'll end up with something which doesn't carry the risk of the Linux equivalent of DLL-hell.[/QUOTE]

Can you post your compile options please? Then maybe I can figure out how to compile this in msys/mingw for "windoze".

xilman 2012-02-11 17:43

[QUOTE=ATH;289050]Can you post your compile options please? Then maybe I can figure out how to compile this in msys/mingw for "windoze".[/QUOTE]I don't use compile options [i]per se[/i], just configure and make.

You will almost certainly find life much, much easier if you install VirtualBox or the like and then a Linux inside a virtual machine. Building GMP and GMP-ECM is then a complete doddle --- essentially a matter of saying "./configure; make ; make check ; make install" in the respective build directories. Once you've done that for each, you have everything you need --- working binaries which you can use as-is or use as a gold standard against which to check new builds, together with all the documentation, compile options, etc., which you can cut and paste into either the host environment or into other hosted machines.

Brain 2012-02-11 19:01

Makefile for CC13
 
1 Attachment(s)
Here's the currently available trunk makefile for CC13.

WraithX 2012-02-11 19:06

[QUOTE=xilman;289057]I don't use compile options [i]per se[/i], just configure and make.

Building GMP and GMP-ECM is then a complete doddle --- essentially a matter of saying "./configure; make ; make check ; make install" in the respective build directories.[/QUOTE]

I was wondering, inside the gpu directory is a makefile and there are also two other directories (gpu_ecm and gpu_ecm_cc13) that both have makefiles. In which directory, or directories, do you run make? In which directory do you create the binary that you are referencing?

Also, inside the gpu directories, I see no configure file. So, it's not "configure and make" that you run, it's just "make", correct?

If I had an nVidia video card, I would try this myself. However, I do not, so I will leave it to others to try.

xilman 2012-02-11 19:16

[QUOTE=WraithX;289062]I was wondering, inside the gpu directory is a makefile and there are also two other directories (gpu_ecm and gpu_ecm_cc13) that both have makefiles. In which directory, or directories, do you run make? In which directory do you create the binary that you are referencing?

Also, inside the gpu directories, I see no configure file. So, it's not "configure and make" that you run, it's just "make", correct?

If I had an nVidia video card, I would try this myself. However, I do not, so I will leave it to others to try.[/QUOTE]My main machine is a Fermi so I didn't even bother with the cc13 version.


To answer your other question: you should read README.dev in the trunk directory. I'm not being wilfully obtuse. You really should read how to configure the development code environment.

Once everything is in place, you do indeed just run make.

xilman 2012-02-11 19:24

[QUOTE=xilman;289064]I'm not being wilfully obtuse. You really should read how to configure the development code environment.[/QUOTE]In case it is not clear to bystanders, this code is [b]not[/b] fire and forget. It is [b]not[/b] production quality.

If you want to use it, you will need to get your hands dirty. I'm prepared to help as best I can [b]after[/b] you've followed the instructions in the svn distro and after you've made a sincere effort to get things working by yourself. I am not prepared to bottle-feed, to wipe noses or to change {nappies,diapers}.

That may sound harsh but it's the way the world of alpha-code development works and you'll need to get used to it if you want to play with the big boys and girls. Once you pass the audition you'll find most developers are very friendly and helpful.

Neither am I addressing these remarks to any particular individuals who may, or may not, have posted in this thread.

Paul

frmky 2012-02-11 22:34

I'm getting results slower than the cpu. I'm using a c144 (from the 4788 aliquot sequence) on a Core i7 CPU and GTX 480 GPU:

[CODE]~/ecmtest$ ~/bin/ecm 11e6 < c144
GMP-ECM 6.5-dev [configured with GMP 5.0.4, --enable-asm-redc] [ECM]
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1115846
Step 1 took 35420ms
Step 2 took 13620ms

~/bin/gpu_ecm -n 256 -save test 11000000 < c144
Precomputation of s took 0.950s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, firstinvd=724674352, with 256 curves
gpu_ecm took : 13144.730s (0.000+13144.720+0.010)
Throughput : 0.019

~/bin/gpu_ecm -n 480 -save test 11000000 < c144
Precomputation of s took 0.950s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, firstinvd=1789558835, with 480 curves
gpu_ecm took : 24198.970s (0.000+24198.960+0.010)
Throughput : 0.020
[/CODE]
This GPU has 15 MP's, so gpu_ecm defaults to 480 curves, but that was only slightly faster per curve than using 256 curves:
CPU: 35.4 s/curve
GPU 256: 51.3 s/curve
GPU 480: 50.4 s/curve


Hmmm... Why do larger numbers take less time?
[CODE]
~/bin/gpu_ecm -n 480 11000 < c144
Precomputation of s took 0.000s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000, firstinvd=1718283956, with 480 curves
gpu_ecm took : 24.260s (0.000+24.250+0.010)
Throughput : 19.786

~/bin/gpu_ecm -n 480 11000 < 10p332
Precomputation of s took 0.000s
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000, firstinvd=412318627, with 480 curves
gpu_ecm took : 12.530s (0.000+12.520+0.010)
Throughput : 38.308
[/CODE]

jasonp 2012-02-12 02:19

Maybe having more multiprocessors means that larger blocks of work have to be given in a kernel launch.

frmky 2012-02-12 06:27

With the larger number 10,332+, gpu_ecm is indeed about 4x faster:
[CODE]~/bin/gpu_ecm -n 480 11000000 < 10p332
Precomputation of s took 0.950s
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000000, firstinvd=197457519, with 480 curves
gpu_ecm took : 12460.500s (0.000+12460.490+0.010)
Throughput : 0.039
26 s/curve

~/bin/ecm 11e6 < 10p332
GMP-ECM 6.5-dev [configured with GMP 5.0.4, --enable-asm-redc] [ECM]
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=770548151
Step 1 took 116030ms
Step 2 took 31240ms[/CODE]

debrouxl 2012-02-12 08:08

Indeed, it's much better with larger numbers, even on fairly low-end GPUs :smile:

[code]$ echo 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 | ./gpu_ecm -vv -n 64 -save 77677_149_3e6_1 3000000
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.260s
Input number is 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 (299 digits)
Using B1=3000000, firstinvd=1956725845, with 64 curves
8+64*d=15748722851276397705078124999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999979751642048358917236328125000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000037
8+64*d=15748816728591918945312499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999979751521348953247070312500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000037
#Begin GPU computation...
Block: 32x16x1 Grid: 4x1x1

#Looking for factors for the curves with (d*2^32) mod N = 1956725845
xfin=30793582623383249085792654048330071529605422286616239783953502163671251485785103766950295774964496127287193014815633605233749525283700023789095348986097253179578725011454469815600504523632074726379316756899242430619968238212335497266401636095557828925843807562412336497359441279441288718847426104007
zfin=15653600481320091921866091998449420910733335116855114761017777079039414436628501359278897983474928453087970457407357858920433473350010208701033703353460574476309980980321812580804654671402344055561817931400864403015931562744329996813634620940647894525755619616828739385722340718168011611105085982277
xunif=39408042568104336805270379492712518016456213440668263700932960319496692204712994493267826611404606523547436560316003809780479305567060943184017323896559838396801417755478902976203909415762293776133099854903949460128487164294408353102629988943560190937509387321676681877121514979954245436789912376565

#Looking for factors for the curves with (d*2^32) mod N = 1956725908
xfin=14966698215750072697023404424489655322417510285848104176664523355430448144237921356160942849117869928416482104860850242503535702178551676734700459600950072236295600757108345379537820143078621679600800366849565012827321265584610218377563003322400088365819158936957519419145860156952262705564563161722
zfin=13196899771013716409148933418583531970092448862012559897727436095137691137124639002299074611035125565787268265934160640735274655897627921875382930307151913405846752684117468279045581956406343173251902392466987748520166146927437149020702523884077620927133537101563815962380052471821344666329942845796
xunif=16137963010874506957647933426009827242074704110758480638668956781797848968933741211221737736217202543887736175366874915988459776979538810976607154822667437291905995042783104430898367764816031215094913761969808431783659168083194336651034563970539910660975759146036916968022060602536777108941718805587
gpu_ecm took : 1420.292s (0.000+1420.288+0.004)
Throughput : 0.045
(~22 s/curve)


$ echo 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 | ecm -c 1 3000000
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 (299 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2227022774
Step 1 took 58271ms
Step 2 took 17197ms[/code]

debrouxl 2012-02-12 10:07

Another data point, for numbers between C144 and C29x: a C237 is slower on the GPU than a C29x, but obviously faster on the CPU:
[code]$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ./gpu_ecm -vv -n 64 -save 80009_248_3e6_1 3000000
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.256s
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, firstinvd=563947071, with 64 curves
[snip]
gpu_ecm took : 1637.614s (0.000+1637.610+0.004)
Throughput : 0.039


$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ./ecm -c 1 3000000
bash: ./ecm: Aucun fichier ou dossier de ce type
debrouxl@asus2:~/ecm/gpu/gpu_ecm$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ecm -c 1 3000000
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=379651352
Step 1 took 42974ms
Step 2 took 12981ms[/code]

On that number, the core busy with gpu_ecm spends a bit more than half of its time in "system" state. I guess that's what xilman mentioned above?
[quote]A third is to reduce the (presently extortionate IMO) amount of cpu time used by busy-waiting for the kernels to complete.[/quote]

lorgix 2012-02-12 12:58

[QUOTE=xilman;289019]I screwed up computing the time per curve :redface:


1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve.

The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve.
Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core.

Sorry about that.[/QUOTE]
Still a very interesting development. Congrats on the factors.

xilman 2012-02-12 13:18

Since my first experiments, I've been playing with a version which uses 512-bit arithmetic (fudged with CFLAGS+=-DNB_DIGITS=16 in the relevant line of the Makefile). As expected, ECM runs around 3 times faster on ~500-bit numbers with this change.

One of the things on my to-do list is to add greater flexibility to the choice of bignum sizes.

Experiments with both 1024 and 512-bit arithmetic indicate that running more than the default number of curves is a Good Thing, presumably by hiding memory latency. The downside, of course, is that the display stays rather sluggish for a proportionately long time. I'm trying to estimate how long a run will take and then kick it off overnight when display latency is likely to be unimportant.


Paul

frmky 2012-02-12 21:26

[QUOTE=xilman;289134]I'm trying to estimate how long a run will take and then kick it off overnight when display latency is likely to be unimportant.[/QUOTE]
I added a percent-complete counter in the for loop launching the kernels in cudautil.cu. I don't think adding an ETA would be difficult.
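For instance, something along these lines, called from the launch loop, would give a rough ETA (names are invented; this is not the actual cudautil.cu code):
[code]
/* Hypothetical ETA helper, not the actual cudautil.cu code: call it every
 * jmod iterations with the total number of ladder iterations, how many
 * remain, and the wall-clock time at which the loop started. */
#include <stdio.h>
#include <time.h>

static void print_eta(unsigned long total, unsigned long remaining, time_t start)
{
    unsigned long done = total - remaining;
    double elapsed = difftime(time(NULL), start);

    if (done == 0)
        return;                           /* nothing to extrapolate from yet */
    printf("%lu/%lu iterations done, %.0fs elapsed, about %.0fs to go\n",
           done, total, elapsed, elapsed * (double) remaining / (double) done);
}
[/code]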

R.D. Silverman 2012-02-12 22:07

[QUOTE=xilman;289065]In case it is not clear to bystanders, this code is [b]not[/b] fire and forget. It is [b]not[/b] production quality.

If you want to use it, you will need to get your hands dirty. I'm prepared to help as best I can [b]after[/b] you've followed the instructions in the svn distro and after you've made a sincere effort to get things working by yourself. I am not prepared to bottle-feed, to wipe noses or to change {nappies,diapers}.

That may sound harsh but it's the way the world of alpha-code development works and you'll need to get used to it if you want to play with the big boys and girls. Once you pass the audition you'll find most developers are very friendly and helpful.

Neither am I addressing these remarks to any particular individuals who may, or may not, have posted in this thread.

Paul[/QUOTE]

I totally agree with you.

However, allow me to point out that when I present a similar attitude toward the learning of the algorithms discussed herein and the mathematics behind them, I am lambasted for my efforts.

Participants should be willing to put in the effort or they should leave.

xilman 2012-02-12 23:40

[QUOTE=R.D. Silverman;289203]I totally agree with you.

However, allow me to point out that when I present a similar attitude toward the learning of the algorithms discussed herein and the mathematics behind them, I am lambasted for my efforts.

Participants should be willing to put in the effort or they should leave.[/QUOTE]It seems to me that one difference is that there is a large amount of fire-and-forget code available and that code is suited to the majority of the people here. [b]Only[/b] those who are prepared for all the frustrations of working at the bleeding edge have any great need to be able to build, debug and install alpha code from a subversion repository.

Much of the mathematics discussed here is [b]not[/b] at the bleeding edge, IMO. It is closer in spirit to oft-times cranky but nonetheless well understood and supported applications such as mainstream gmp-ecm.

IMO, your diatribes against those wishing to perform bleeding edge mathematics are fully justified. They are less appropriate, again IMO, further away from the bleeding edge. I hope I would never feel the urge to issue my earlier warnings to those who only wish to use gmp-ecm and are confused by its jargon and multitudinous options.

R.D. Silverman 2012-02-13 00:40

[QUOTE=xilman;289218]It seems to me that one difference is that there is a large amount of fire-and-forget code available and that code is suited to the majority of the people here. [b]Only[/b] those who are prepared for all the frustrations of working at the bleeding edge have any great need to be able to build, debug and install alpha code from a subversion repository.
[/QUOTE]

We agree.

Indeed. I have even heard one of the people (whom I hold in contempt) admit that he does not even know how to use a compiler.

[QUOTE]

Much of the mathematics discussed here is [b]not[/b] at the bleeding edge, IMO. It is closer in spirit to oft-times cranky but nonetheless well understood and supported applications such as mainstream gmp-ecm.
[/QUOTE]

And from my point of view too many of the participants herein do not understand things even at that level. Nor do they seem willing to make the attempt. They don't even understand mathematics that was known 150+ years ago. Nor do they want to make the effort.

xilman 2012-02-14 19:12

[QUOTE=xilman;289134]Experiments with both 1024 and 512-bit arithmetic indicate that running more than the default number of curves is a Good Thing[/QUOTE]Another data point shows that even choosing the correct default number is significant.

Out of the box (well, my box anyway) the default build appears to use parameters suitable for a CC1.3 system, despite there being a Fermi card installed. A run on a C302 with these parameters chooses 112 curves arranged 32x16 x 7x1x1 and takes 3845.428 seconds. Rebuilding with "make cc=2" and re-running took 5539.049 seconds for 224 curves arranged 32x32 x 7x1x1. The ratio (224/112) * (3845.428 / 5539.049) is 1.388.

I suggest a 39% speed-up is worth having.

Ralf Recker 2012-02-14 22:02

A few quick tests with a small B1 value
 
CC 2.0 card (GTX 470, stock clocks), 512-bit arithmetic, CUDA SDK 4.0. The c151 was taken from the aliquot sequence 890460:i898

[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=24351435, with 448 curves
gpu_ecm took : 116.363s (0.000+116.355+0.008)
Throughput : 3.850[/CODE]Doubling the number of curves improves the throughput:

[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 896 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1471710578, with 896 curves
gpu_ecm took : 179.747s (0.000+179.731+0.016)
Throughput : 4.985[/CODE]With 32 fewer curves the throughput increases by another 30%:
[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 864 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1374804691, with 864 curves
gpu_ecm took : 130.964s (0.000+130.948+0.016)
Throughput : 6.597
[/CODE]The throughput on a CC 2.1 card (GTX 460, 725 MHz factory OC) for the same number:

[CODE] 224 curves - Throughput : 2.289
416 curves - Throughput : 4.223
448 curves - Throughput : 4.547
480 curves - Throughput : 3.039
672 curves - Throughput : 4.233
896 curves - Throughput : 4.638
1792 curves - Throughput : 4.753[/CODE]

ET_ 2012-02-15 19:19

gpu_ecm ready to work
 
OK, I downloaded the source code with cc=1.3, and successfully compiled it :smile:

Sadly, I see differences between xilman's and Ralf Recker's outputs.

The executable passes the test.

What does the (required) parameter N on the command line represent? All I can see is that it has to do with the xfin, zfin and xunif values, and that it should be odd...

I also tried ./gpu_ecm 9699691 11000 -n 1 <in, where in contains the number 65798732165875434667. I got the factor 347, which is not a factor of the number in the input file...

To show my good will:

[code]
./gpu_ecm 9699691 11000 -n 1 <in
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GTX 275, compute capability 1.3, 30 MPs.
#gpu_ecm launched with :
N=9699691
B1=11000
curves=1
firstsigma=11
#used seed 1329332970 to generate sigma

#Begin GPU computation...
#All kernels launched, waiting for results...
#All kernels finished, analysing results...
#Looking for factors for the curves with sigma=11
xfin=3111202
zfin=7720056
#Factor found : 347 (with z)
#Results : 1 factor found

#Temps gpu : 15.080 init&copy=0.040 computation=15.040
[/code]

Now, I understand that the program is not "fire and forget", and I would really, REALLY like to know more about it, but the interface is not documented, the usage differs from gmp-ecm, and in the link posted by Jason there is no indication that a README file is present anywhere in the trunk.

Would you mind (now that my hands have been contaminated by bits and compilers) shedding some light on this obscure valley? Even a link explaining what N means in this context would suffice... :smile:

Many thanks...

Luigi

P.S. After some more fiddling, I noticed that 347 is a factor of 9699691, so I think I got the meaning of N after all... :redface:

With N3 and 448 curves, my GTX275 has the same speed as my Intel i5-750.

jasonp 2012-02-15 19:44

That directory was not the trunk, [url="https://gforge.inria.fr/scm/viewvc.php/trunk/?root=ecm"]this[/url] is, complete with lots of readme files.

Ralf Recker 2012-02-15 19:52

The usage is simple (type ./gpu_ecm -h for help). Try:

./gpu_ecm -n 1 11000 < in

B1 is the last parameter. gpu_ecm will of course run more than one curve anyway ;)

BTW: Prime numbers don't factor very well :smile:

ET_ 2012-02-15 20:04

[QUOTE=jasonp;289470]That directory was not the trunk, [url="https://gforge.inria.fr/scm/viewvc.php/trunk/?root=ecm"]this[/url] is, complete with lots of readme files.[/QUOTE]

Thank you Jason, I skimmed around there when I noticed I had some problem...

It wasn't at all a criticism of the link you pointed to. :smile:

Luigi

ET_ 2012-02-15 20:11

[QUOTE=Ralf Recker;289473]The usage is simple (type ./gpu_ecm -h for help). Try:

./gpu_ecm -n 1 11000 < in

B1 is the last parameter. gpu_ecm will of course run more than one curve anyway ;)

BTW: Prime numbers don't factor very well :smile:[/QUOTE]

Then we definitely have different versions :sad:

When I try that command I get a "Error in call function: wrong number of arguments." message.
The "usage" reports "./gpu_ecm N B1 [ -s firstsigma ] [ -n number of curves ] [ -d device ]".

Thank you anyway, I'm doing my own tests to get acquainted with the new wonderful program. :smile:

Luigi

Ralf Recker 2012-02-15 20:15

[QUOTE=ET_;289479]Then we definitely have different versions :sad:

When I try that command I get a "Error in call function: wrong number of arguments." message.
The "usage" reports "./gpu_ecm N B1 [ -s firstsigma ] [ -n number of curves ] [ -d device ]".

Thank you anyway, I'm doing my own tests to get acquainted with the new wonderful program. :smile:

Luigi[/QUOTE]
I used the program in the gpu_ecm subdirectory, not that in the gpu_ecm_cc13 subdirectory.

The command line for the other version would be:

./gpu_ecm 65798732165875434667 11000

but like I said: I would try to factor another number ;)

ET_ 2012-02-15 20:21

[QUOTE=Ralf Recker;289480]I used the program in the gpu_ecm subdirectory, not that in the gpu_ecm_cc13 subdirectory.[/QUOTE]

I guessed it when I noticed that your work was done on cc=2.0 and cc=2.1, but for some reason I thought that the repository was updated with the same code, apart from the cc details...

Thank you all, I (think I) understood how the program works. I have been a bit naive in thinking that an alpha version would keep the same user interface as the trunk. The problem was on my side, between the monitor and the chair... Next time I will turn on my gray cells before writing.

(and please excuse me for the multiple postings)

Luigi

Ralf Recker 2012-02-15 20:46

[QUOTE=ET_;289481]I guessed it when I noticed that your work was done on cc=2.0 and cc=2.1, but for some reason I thought that the repository was updated with the same code, apart from the cc details...[/QUOTE]
You should be able to compile and run both versions on your CC 1.3 capable card (you can type make cc=2 in the gpu_ecm subdirectory, if you want a fermi build).

debrouxl 2012-02-15 21:18

I've switched to CC 2.0 compilation as well, and the default number of curves has gone up from 32 to 64 - the same change as xilman saw above.

I haven't yet seen a mention of non-power-of-2 NB_DIGITS in this thread... therefore, I tried it, even though I have no idea whether it should work :smile:
Well, at least, it does not seem to fail horribly:
* the resulting executable doesn't crash;
* the size of the executable is between the size of the 512-bit version and the size of the 1024-bit version;
* on both a C211 and a C148, the 768-bit version is faster than the 1024-bit-arithmetic version:

[code]$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ./gpu_ecm_24 -vv -save 76663_210_ecm24_3e6 3000000
#Compiled for a NVIDIA GPU with compute capability 2.0.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.252s
Input number is 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 (212 digits)
Using B1=3000000, firstinvd=435701810, with 64 curves
...
gpu_ecm took : 1444.690s (0.000+1444.686+0.004)
Throughput : 0.044

$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ./gpu_ecm_32 -vv -save 76663_210_ecm32_3e6 3000000
...
gpu_ecm took : 1814.801s (0.000+1814.797+0.004)
Throughput : 0.035[/code]

[code]for i in 16 24 32; do echo 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 | "./gpu_ecm_$i" -vv -save "80009_213_ecm${i}_3e6" 3000000; done
...
gpu_ecm took : 865.578s (0.000+865.574+0.004)
Throughput : 0.074
...
gpu_ecm took : 1707.302s (0.000+1707.298+0.004)
Throughput : 0.037
...
gpu_ecm took : 2044.451s (0.000+2044.447+0.004)
Throughput : 0.031
[/code]


Comparison against CPU GMP-ECM running on 1 hyperthread of a SandyBridge i7, whose other 7 hyperthreads are used to the max as well:
[code]$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ecm -c 1 3e6
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 (211 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1718921992
Step 1 took 34590ms
Step 2 took 11536ms[/code]

[code]$ echo 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 | ecm -c 1 3e6
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 (148 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3766168691

Step 1 took 21521ms
Step 2 took 8016ms[/code]

For composites of those sizes, the GT 540M can beat one hyperthread of i7-2670QM if the CPU is busy, but not if the CPU is idle.

frmky 2012-02-15 23:08

Looking at the source, does NB_DIGITS really need to be a power of two? I haven't thought carefully about the memory access patterns, but it doesn't seem to need to be. And does it really need to be a compile-time constant? If the answer to both is no, then the code can adjust NB_DIGITS to the minimum needed for a particular number. Without doing any profiling, for numbers much smaller than the max allowed for a particular NB_DIGITS, I suspect a lot of time is spent spinning in the comparison function.
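For illustration, the kind of run-time selection that would imply, assuming 32-bit digits so that the capacity is 32*NB_DIGITS bits (a standalone sketch with invented names and an assumed two bits of headroom, not gpu_ecm code):
[code]
/* Standalone sketch, not gpu_ecm code: pick the smallest NB_DIGITS that
 * fits the input read from stdin, assuming 32-bit digits and stepping in
 * multiples of 8 digits (256 bits).  The two bits of headroom for the
 * modular arithmetic are an assumption, not gpu_ecm's actual requirement.
 * Build with: gcc -O2 nbdigits.c -o nbdigits -lgmp
 */
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t n;
    mpz_init(n);
    if (mpz_inp_str(n, stdin, 10) == 0) {
        fprintf(stderr, "could not read a number from stdin\n");
        return 1;
    }

    size_t nbits = mpz_sizeinbase(n, 2);      /* bit length of the input */
    int nb_digits = 8;                        /* 256-bit minimum, say */
    while ((size_t) (32 * nb_digits) < nbits + 2)
        nb_digits += 8;

    printf("%zu-bit input: NB_DIGITS=%d (%d-bit arithmetic)\n",
           nbits, nb_digits, 32 * nb_digits);
    mpz_clear(n);
    return 0;
}
[/code]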

firejuggler 2012-02-24 21:56

I must have done something wrong... (Ubuntu 10.04 in a VirtualBox, can't install the nvidia driver)
[code]
./gpu_ecm 155 11000 -s 11 -n 5
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device -1 : �@, compute capability 18927808.0 (you should compile the program for this compute capability to be more efficient), 0 MPs.
#gpu_ecm launched with :
N=155
B1=11000
curves=5
firstsigma=11

#Begin GPU computation...
#All kernels launched, waiting for results...
#All kernels finished, analysing results...
#Looking for factors for the curves with sigma=11
xfin=4
zfin=1
xunif=4
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=12
xfin=132
zfin=1
xunif=132
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=13
xfin=153
zfin=1
xunif=153
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=14
xfin=1
zfin=1
xunif=1
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=15
xfin=80
zfin=1
xunif=80
#No factors found. You should try with a smaller B1
#Results : No factor found

#Temps gpu : 0.050 init&copy=0.000 computation=0.050
[/code]

how come zfin is always=1?

Cyril 2012-02-24 23:13

Hey

As said above, the more recent GPU-ECM program is in the gpu_ecm subdirectory (and NOT gpu_ecm_cc13, even for cards of compute capability 1.3).

The NB_DIGITS stuff is still highly experimental and will most of the time either crash or return wrong results. Only the 1024-bit arithmetic (the default case) is working for now. But any feedback from experiments with NB_DIGITS is welcome.

Cyril

firejuggler 2012-02-25 04:14

1 Attachment(s)
I searched the web a little more and found that VirtualBox can't handle CUDA. So I installed a proper Ubuntu, and now it seems to work.
Another problem I found is that a found factor can be repeated... multiple times.

A short excerpt from running ./gpu_ecm -n 100 -vv 1000 < c50 >test.txt:
[code]
#Compiled for a NVIDIA GPU with compute capability 2.0.
#Will use device 0 : GeForce GTX 560, compute capability 2.1, 7 MPs.
#s has 1438 bits
Precomputation of s took 0.000s
Input number is 35969183562720316973971642318240294003662279400539 (50 digits)
Using B1=1000, firstinvd=3505000919, with 96 curves
8+64*d=2290718356455159082183201030488769570180030093771
8+64*d=7718354218174420648608739771694329402521941399016
#Begin GPU computation...
Block: 32x32x1 Grid: 3x1x1

#Looking for factors for the curves with (d*2^32) mod N = 3505000919
xfin=29784982820960298529336575761344388540459270300965
zfin=34069908628801445601848672255228786699573682666789
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505000919
xunif=29784982820960298529336575761344388540459270300965
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505000920
********** Factor found in step 1: 3
Found probable prime factor of 1 digits: 3
Composite cofactor 11989727854240105657990547439413431334554093133513 has 50 digits
Factor found with (d*2^32) mod N = 3505000921
********** Factor found in step 1: 159
Found composite factor of 3 digits: 159
Composite cofactor 226221280268681238830010329045536440274605530821 has 48 digits

[snip]
[/snip]Factor found with (d*2^32) mod N = 3505001007
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001008
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001009
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001010
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001011
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001012
********** Factor found in step 1: 1473
Found composite factor of 4 digits: 1473
Composite cofactor 24418997666476793600795412300231021047971676443 has 47 digits
Factor found with (d*2^32) mod N = 3505001013

#Looking for factors for the curves with (d*2^32) mod N = 3505001014
xfin=27705706019293510342398168717956449375083997353432
zfin=7141519035842313136767879128655301632201227912190
********** Factor found in step 1: 159
Found composite factor of 3 digits: 159
Composite cofactor 226221280268681238830010329045536440274605530821 has 48 digits
Factor found with (d*2^32) mod N = 3505001014
xunif=27705706019293510342398168717956449375083997353432
gpu_ecm took : 0.800s (0.000+0.800+0.000)
Throughput : 120.000
[/code]Another question: is sigma random?

Batalov 2012-02-25 20:10

Small factors will be found with very many sigma values - and it doesn't matter if random or sequential values are used for sigma.
Prefactor your input number (with yafu, for example) until it becomes interesting to run ECM on it.
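For instance, stripping out the tiny factors by trial division before handing the cofactor to ECM looks something like this (a minimal GMP sketch; the bound of 10^6 is arbitrary, and this is of course not what yafu does internally):
[code]
/* Minimal trial-division prefactoring sketch with GMP (not yafu): remove
 * every prime factor below BOUND from n before giving it to ECM.
 * Build with: gcc -O2 prefactor.c -o prefactor -lgmp
 */
#include <stdio.h>
#include <gmp.h>

#define BOUND 1000000UL   /* arbitrary small-prime bound for illustration */

int main(void)
{
    mpz_t n;
    mpz_init(n);
    if (mpz_inp_str(n, stdin, 10) == 0)
        return 1;

    for (unsigned long p = 2; p <= BOUND; p += (p == 2) ? 1 : 2) {
        while (mpz_divisible_ui_p(n, p)) {     /* does p divide n? */
            printf("small factor: %lu\n", p);
            mpz_divexact_ui(n, n, p);          /* divide it out exactly */
        }
    }
    gmp_printf("remaining cofactor: %Zd\n", n);
    mpz_clear(n);
    return 0;
}
[/code]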

xilman 2012-02-26 10:14

[QUOTE=Cyril;290764]Hey

As said above, the more recent GPU-ECM program is in the gpu_ecm subdirectory (and NOT gpu_ecm_cc13, even for cards of compute capability 1.3).

The NB_DIGITS stuff is still highly experimental and will most of the time either crash or return wrong results. Only the 1024-bit arithmetic (the default case) is working for now. But any feedback from experiments with NB_DIGITS is welcome.

Cyril[/QUOTE]Hi Cyril, welcome aboard!

Thanks for the warning. I'd not seen any difficulties myself with halving NB_DIGITS, but it's good to know that I'm on dangerous ground.

Paul

rcv 2012-03-29 02:35

Can someone answer a theory question on the ECM algorithm, please.

I know it is normal to run curves one level at a time to remove small factors. But does the presence of small factors inhibit higher-level curves from finding big factors?

Practically speaking, if gmp_ecm runs at the same speed for all numbers, regardless of whether they are 769 bits or 1018 bits, then could I just run a set of curves at B1=260e6 without bothering to remove the small factors?

For example, if I run (2^1009+1) through the program, at B1=260000000, nearly every curve will find factors. Are those small factors merely noise or do the small factors prevent the algorithm from finding factors of the big C244?

I'm talking ONLY stage 1, here.

Thanks in advance.

firejuggler 2012-03-29 03:18

The factor here is *time*: if 1 curve takes 10 sec at B1=11e3, it will take about 1000 sec at B1=1e6 (or close to that). I would rather do 200 curves at B1=11e3 and find *all* the 15-digit factors than 3 curves at B1=1e6 to find them. And with a large B1 you are more prone to find a *composite* factor that you will then have to ... factor.

rcv 2012-03-29 04:04

[QUOTE=firejuggler;294599]the factor here is *time* ...[/QUOTE]
Thank you. I understand all that.

But, I'm really asking a deeper question about the theoretical behavior of the ECM algorithm.

jyb 2012-03-29 04:20

[QUOTE=rcv;294605]Thank you. I understand all that.

But, I'm really asking a deeper question about the theoretical behavior of the ECM algorithm.[/QUOTE]

I believe firejuggler's answer did address at least one aspect of the theoretical behavior you're talking about. If you run ECM with a high B1 when there are small factors, those small factors don't in any way prevent the algorithm from finding larger factors. But if it does find a large factor, it will probably also find the small factors, so the result of the algorithm will be a composite number which is the product of all the prime factors found.

In other words, it will find large factors with the same probability (per curve) as if there were no small factors. But it may not give them to you in a form you want.

And of course, as firejuggler pointed out, the total time you take to find factors will likely be much greater than if you first ran curves with lower B1 to find and divide out the small factors. But subject to your hypothetical ("if gmp_ecm runs at the same speed for all numbers, regardless of whether they are 769 bits or 1018 bits"), then it will find the large factors just as quickly whether or not the small factors are there. It just won't find them alone.

Does that answer your question adequately, or are you looking for something else?

rcv 2012-03-29 12:15

[QUOTE=jyb;294608]... it will find large factors with the same probability (per curve) as if there were no small factors. ...

Does that answer your question adequately, or are you looking for something else?[/QUOTE]
Thank you. That's *exactly* what I was asking.

xilman 2012-04-06 06:49

Result
 
[code]********** Factor found in step 1: 123606794672656155230910235277199616421317
Found probable prime factor of 42 digits: 123606794672656155230910235277199616421317
Probable prime cofactor 14866190468551593309306643619217939222105550143777577617032523708180871799254017557065876729519990261631832681060060230768948753357462444734554678109987149185755720805584735539302523326567649552404524657646651599763316335399327139435288373515331 has 245 digits
Factor found with (d*2^32) mod N = 1001097
[/code]The input number was the 286-digit cofactor of 296*11^296+1, aka GC(11,296). The factor was found on the 1098th curve of a batch of 1120.

Two things are especially noteworthy. You don't often see p42 factors found in stage 1, even with B1 as high as 110M. The other can be seen in this snippet if you know what you are looking for.
[code]time GPU: 209474.766s
time CPU: 716.798s (0.001+716.737+0.060)
Throughput: 0.005

real 3491m28.677s
user 8m59.778s
sys 3m4.908s
[/code]This was my first production run with gpu_ecm incorporating a patch provided by Rocke Verser (rcv of this parish) which replaced the old busy-wait behaviour. System cpu time fell from an expected 50+ hours to a little over three minutes. Very impressive in my view. Thanks Rocke.

Paul

R.D. Silverman 2012-04-06 13:40

[QUOTE=rcv;294605]Thank you. I understand all that.

But, I'm really asking a deeper question about the theoretical behavior of the ECM algorithm.[/QUOTE]

Read my joint paper with Sam Wagstaff: A Practical Analysis of ECM.

rcv 2012-04-06 22:45

[QUOTE=xilman;295541]Thanks Rocke.[/QUOTE]
You're welcome. It's not that I did anything great. It's more that NVIDIA's CUDA drivers are disrespectful of the CPU. When trying to keep the GPU busy, special care must be taken to prevent them from wasting an entire core in spin-loops. [I've now used the technique on three separate CUDA programs with significant reductions in CPU wastage in each case.]
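The generic way to get the CUDA runtime to behave (not necessarily what the actual patch does) is to opt into blocking synchronisation instead of the driver's default spin-wait, along these lines:
[code]
/* Generic blocking-sync sketch for the CUDA runtime, not the actual patch:
 * let the CPU sleep on an event instead of spinning until the kernel ends. */
#include <cuda_runtime.h>

__global__ void stage1_step(void) { /* stands in for the real ECM kernel */ }

int main(void)
{
    /* Must be set before the CUDA context is created, i.e. before the
     * first runtime call that touches the device. */
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, cudaEventBlockingSync | cudaEventDisableTiming);

    for (int i = 0; i < 1000; i++) {          /* e.g. one launch per bit of s */
        stage1_step<<<7, dim3(32, 32)>>>();
        cudaEventRecord(done, 0);
        cudaEventSynchronize(done);           /* CPU blocks instead of spinning */
    }

    cudaEventDestroy(done);
    return 0;
}
[/code]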

BTW, Cyril also has the patch, and hints that it will be in a future release of his code.

[QUOTE=R.D. Silverman;295562]Read my joint paper with Sam Wagstaff: A Practical Analysis of ECM.[/QUOTE]
Thanks for the reference! For others who need a more complete reference, the paper was published in Mathematics of Computation, July 1993, v61, n203, pp 445-462. The paper is presently available online at [URL]http://www.ams.org/journals/mcom/1993-61-203/S0025-5718-1993-1122078-7/[/URL]

While a little dated (the largest ECM factors discovered at the time of its writing were 38 digits), I would recommend the paper to those interested in a derivation of the various parameters used by GMP-ECM and other factoring software.

While the paper is interesting, I fail to see that it directly answers my question. Table 3, for example, gives L, the expected number of curves to find a factor of a given size. However, the assumptions that lead to that derivation do not hold in the theoretical scenario I was asking about.

As with other papers and articles I have read concerning ECM, a basic assumption is that you factor out the small factors as you find them. My question was more "...but what happens if you don't remove the small factors?"

While not referring to my specific theoretical question, the second Remark on page 460 seems to acknowledge that analysis is difficult when the number being factored "does not have the same expected smoothness properties of some other arbitrary integer of about the same size."

I will reread the paper in more depth over the next few days.

debrouxl 2012-04-07 15:49

In SVN HEAD, the CPU usage was greatly reduced (thanks to rcv's patch, I guess), and the PRINT_REMAINING_ITER define is interesting for making the program's progress more observable - great :smile:


I've just made a trivial patch for adding timestamps to the output of GPU-ECM, which makes it easier (for me, at least) to estimate when the job will finish.
Here it is, in case someone else is interested:
[code]
Index: cudakernel.cu
===================================================================
--- cudakernel.cu (révision 1908)
+++ cudakernel.cu (copie de travail)
@@ -191,7 +191,14 @@
if (j < 1000000) jmod = 100000;
if (j < 100000) jmod = 10000;
if (j % jmod == 0)
- printf("%lu iterations to go\n", j);
+ {
+ char buf[32];
+ time_t curtime = time(NULL);
+
+ strcpy(buf, ctime(&curtime));
+ *(strchr(buf, '\n')) = 0;
+ printf("%s %lu iterations to go\n", buf, j);
+ }
#endif
}

[/code]
(unsurprisingly, the timestamping code is basically the same as in msieve)

ET_ 2012-04-20 15:02

No more cc1.3 for GPU-ECM
 
Is it true? :sad:

Luigi

xilman 2012-04-20 18:19

[QUOTE=ET_;296928]Is it true? :sad:

Luigi[/QUOTE]First I've heard of it. Have you mailed Cyril?

ET_ 2012-04-21 09:55

[QUOTE=xilman;296944]First I've heard of it. Have you mailed Cyril?[/QUOTE]

I read this advice yesterday in a README file located in the GPU-ECM trunk of the svn.
But the makefile still had the comment to switch compilation between cc1.3 and 2.0 (2.0 by default).

I asked here because it's the easiest thing to do for me, as I don't know the developers.

Luigi

Cyril 2012-04-24 15:35

[QUOTE=ET_;296928]Is it true? :sad:

Luigi[/QUOTE]

Yes it is. The CUDA code in the svn repository requires a GPU with compute capability at least 2.0 and the CUDA toolkit version should be at least 4.1.

Cyril

ET_ 2012-04-24 15:40

[QUOTE=Cyril;297208]Yes it is. The CUDA code in the svn repository requires a GPU with compute capability at least 2.0 and the CUDA toolkit version should be at least 4.1.

Cyril[/QUOTE]

Thank you for your answer Cyril.

It's time I got a new graphics board...

Luigi

xilman 2012-04-24 16:57

[QUOTE=Cyril;297208]Yes it is. The CUDA code in the svn repository requires a GPU with compute capability at least 2.0 and the CUDA toolkit version should be at least 4.1.

Cyril[/QUOTE]:sad:

Perhaps it's time I started adding more code to the reversion suppository.

I've a Tesla C1060 which is still a nice compute engine despite being CC 1.x

ET_ 2012-09-23 10:20

ECM mailing list
 
Sorry to pop up here again...

Is the development of GPU-ECM discussed on the GMP-ECM mailing list?

If so, may I be added to it (read-only)?

Luigi

WraithX 2012-09-23 21:34

[QUOTE=ET_;312502]Is the development of GPU-ECM discussed on the GMP-ECM mailing list?

If so, may I be added to it (read-only)?

Luigi[/QUOTE]

You can find the gmp-ecm web page here: [url]https://gforge.inria.fr/projects/ecm/[/url]
From there you can click the "Lists" link, and then select subscribe/unsubscribe link for the "ecm-discuss" list.
Also, from the "Lists" link, you can click on "ecm-discuss Archives" to view all previous messages sent to the list.

I would say that GPU-ECM can be discussed there, but I don't see much discussion of it. The first/last burst of traffic about GPU-ECM looks to be around March 2012.

Also, a request for anyone capable: can someone post a Windows x64 binary for GPU-ECM? I would really like to try running this, but I can't seem to build it on my own. I cannot get it to compile with MinGW64, and my skills with Visual Studio are non-existent, so I never seem to be able to build one there. I understand that this is considered beta/non-production code for now, but I would still like to try running it. Thanks to anyone who can help.

ET_ 2012-09-23 22:53

[QUOTE=WraithX;312572]You can find the gmp-ecm web page here: [url]https://gforge.inria.fr/projects/ecm/[/url]
From there you can click the "Lists" link, and then select subscribe/unsubscribe link for the "ecm-discuss" list.
Also, from the "Lists" link, you can click on "ecm-discuss Archives" to view all previous messages sent to the list.

I would say that GPU-ECM can be discussed there, but I don't see much discussion of it. The first/last burst of traffic about GPU-ECM looks to be around March 2012.
[/QUOTE]

Thank you very much. :bow:

In fact I'd expect to see runs, tables and benchmarks here, as well as tests done...

Luigi

xilman 2012-12-24 12:03

Nailing bugs.
 
A very long standing problem with the GPU version on my machine is that the second test in "make check" fails by finding the input number. I returned to the issue today.

I've not fixed the bug but have characterized it better and probably have a work-around. The failure appears to be in how stage2 is set up after running stage1 on the GPU. If only stage1 is run and a save file created, that file can be used successfully to complete the factorization. The proposed work-around should now be obvious. When I better understand the stage1 to stage2 conversion routines I'll try to fix the bug properly. My best guess is that something is not being initialised properly from a default zero value.

Another trivial bug was also fixed in the latest SVN --- 2310 --- which prevented compilation under CUDA. Cyril has been informed about these developments.

Cyril 2013-01-04 15:04

[QUOTE=xilman;322498]A very long standing problem with the GPU version on my machine is that the second test in "make check" fails by finding the input number. I returned to the issue today.

I've not fixed the bug but have characterized it better and probably have a work-around. The failure appears to be in how stage2 is set up after running stage1 on the GPU. If only stage1 is run and a save file created, that file can be used successfully to complete the factorization. The proposed work-around should now be obvious. When I better understand the stage1 to stage2 conversion routines I'll try to fix the bug properly. My best guess is that something is not being initialised properly from a default zero value.

Another trivial bug was also fixed in the latest SVN --- 2310 --- which prevented compilation under CUDA. Cyril has been informed about these developments.[/QUOTE]

Indeed, I'm working on it. While correcting these bugs, I found out that the GPU code is not correct when compiled with CUDA 5.0 (at least on my machine), but it works when compiled with CUDA 4.2.

Cyril

Cyril 2013-01-10 09:25

After looking carefully at the assembly code produced by CUDA 5.0, I found a bug in CUDA 5.0 and reported it to the Nvidia people. They are looking into it. In the meantime [U][B]I recommend not using CUDA 5.0 to compile the GPU version of GMP-ECM.[/B][/U] (If you manage to successfully run make check with a version compiled with CUDA 5.0, I'll be interested to hear about it.)

Cyril

ET_ 2013-01-10 12:52

I noticed that on [url]https://gforge.inria.fr/scm/viewvc.php/trunk/?root=ecm[/url] the ecm /trunk corresponds to version 6.4.2 of the source code, while on [url]http://www.loria.fr/~zimmerma/records/ecmnet.html[/url] the stable version is still listed as 6.2.2. [URL="https://gforge.inria.fr/frs/?group_id=135&release_id=7362#ecm-_6.4.3b-title-content"]This[/URL] page offers the source code of version 6.4.3 [U]without[/U] GPU extensions, and there are other precompiled executables around the forum.

What should I do if I need the complete 6.4.3 source code [U]with[/U] the GPU extensions?

Luigi

Cyril 2013-01-10 15:33

The last release is 6.4.3, but it does not contain any GPU code. The GPU version of GMP-ECM exists, for now, only in the development version of GMP-ECM. If you want to try it, you should download the svn repository. But be aware that, being a development version, it should not be used for important computations, only for testing.

Cyril

ET_ 2013-01-10 15:41

[QUOTE=Cyril;324284]The last release is 6.4.3, but it does not contain any GPU code. The GPU version of GMP-ECM exists, for now, only in the development version of GMP-ECM. If you want to try it, you should download the svn repository. But be aware that, being a development version, it should not be used for important computations, only for testing.

Cyril[/QUOTE]

Thank you Cyril, I understand.

Luigi

Cyril 2013-01-18 16:02

[QUOTE=Cyril;324247]After looking carefully at the assembly code produced by CUDA 5.0, I found a bug in CUDA 5.0 and reported it to the Nvidia people. They are looking into it. In the meantime [U][B]I recommend not using CUDA 5.0 to compile the GPU version of GMP-ECM.[/B][/U] (If you manage to successfully run make check with a version compiled with CUDA 5.0, I'll be interested to hear about it.)

Cyril[/QUOTE]

As of revision 2342, CUDA 5.0 can be used to compile GMP-ECM for GPU. The "bug" was that I used the carry flag inside assembly statements not protected by __volatile__. This did not cause any problem with CUDA 4.2, but was incorrect when compiled with CUDA 5.0.
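
To illustrate the kind of construct involved, here is a minimal sketch (my own example, not the actual GMP-ECM kernel code): the carry flag written by one PTX instruction is consumed by the next, so the chain must not be reordered, split or duplicated by the compiler; keeping the whole chain inside a single asm volatile statement avoids the problem.
[code]
/* Sketch of a 3-word add using the PTX carry flag.  The whole carry chain
 * lives in one asm volatile statement, so the compiler cannot reorder or
 * separate the instructions that share the condition code. */
__device__ void add3 (unsigned int r[3],
                      const unsigned int a[3],
                      const unsigned int b[3])
{
  asm volatile ("add.cc.u32  %0, %3, %6;\n\t"  /* r0 = a0 + b0, sets carry */
                "addc.cc.u32 %1, %4, %7;\n\t"  /* r1 = a1 + b1 + carry     */
                "addc.u32    %2, %5, %8;"      /* r2 = a2 + b2 + carry     */
                : "=r"(r[0]), "=r"(r[1]), "=r"(r[2])
                : "r"(a[0]), "r"(a[1]), "r"(a[2]),
                  "r"(b[0]), "r"(b[1]), "r"(b[2]));
}
[/code]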

Cyril

Cyril 2013-02-12 16:57

[QUOTE=xilman;322498]A very long standing problem with the GPU version on my machine is that the second test in "make check" fails by finding the input number. I returned to the issue today.

I've not fixed the bug but have characterized it better and probably have a work-around. The failure appears to be in how stage2 is set up after running stage1 on the GPU. If only stage1 is run and a save file created, that file can be used successfully to complete the factorization. The proposed work-around should now be obvious. When I better understand the stage1 to stage2 conversion routines I'll try to fix the bug properly. My best guess is that something is not being initialised properly from a default zero value.

Another trivial bug was also fixed in the latest SVN --- 2310 --- which prevented compilation under CUDA. Cyril has been informed about these developments.[/QUOTE]

Can you try to run make check with the GPU code enabled, with SVN --- 2396 ---? Does the error still happen?

Cyril

ET_ 2013-03-06 18:01

[QUOTE=Cyril;329114]Can you try to run make check with the GPU code enabled, with SVN --- 2396 ---? Does the error still happen?

Cyril[/QUOTE]

I just did it, SVN 2478.

I'm afraid the bug is still there :sad:

It works fine when a factor is found in Step 1:

[code]
luigi@luigi-ubuntu:~/luigi/CUDA/gpu-ecm/trunk$ echo 2432902008176640001 | ./ecm -gpu -v 1000
GMP-ECM 7.0-dev [configured with MPIR 2.5.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Running on luigi-ubuntu
Input number is 2432902008176640001 (19 digits)
Using MODMULN [mulredc:0, sqrredc:0]
Computing batch product (of 1438 bits) of primes below B1=1000 took 0ms
GPU: compiled for a NVIDIA GPU with compute capability 2.0.
GPU: will use device 0: GeForce GTX 580, compute capability 2.0, 16 MPs.
GPU: Selection and initialization of the device took 116ms
Using B1=1000, B2=51606, sigma=3:2200948026-3:2200948537 (512 curves)
dF=32, k=6, d=240, d2=7, i0=-2
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
1.3e+11 Inf Inf Inf Inf Inf Inf Inf Inf Inf
Computing 512 Step 1 took 72ms of CPU time / 2528ms of GPU time
Throughput: 202.555 curves by second (on average 4.94ms by Step 1)
********** Factor found in step 1: 20639383
Found probable prime factor of 8 digits: 20639383
Probable prime cofactor 117876683047 has 12 digits
********** Factor found in step 1: 117876683047
Found input number N
[/code]

But when I lower the B1 parameter, I get this:

[code]
luigi@luigi-ubuntu:~/luigi/CUDA/gpu-ecm/trunk$ echo 2432902008176640001 | ./ecm -gpu 20
GMP-ECM 7.0-dev [configured with MPIR 2.5.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is 2432902008176640001 (19 digits)
Using B1=20, B2=210, sigma=3:2107982176-3:2107982687 (512 curves)
Computing 512 Step 1 took 72ms of CPU time / 157ms of GPU time
********** Factor found in step 2: 2432902008176640001
Found input number N
[/code]

Luigi

mklasson 2013-04-27 16:08

If anyone is still looking for windows binaries of the gpu version I posted some at [url]http://www.mersenneforum.org/showpost.php?p=338530&postcount=9[/url].

lorgix 2013-04-27 18:37

[QUOTE=mklasson;338532]If anyone is still looking for windows binaries of the gpu version I posted some at [URL]http://www.mersenneforum.org/showpost.php?p=338530&postcount=9[/URL].[/QUOTE]
Thanks for the upload!

I'm having trouble.

It finds the input number in stage 2 sometimes, but it never finds a factor.
Rarely it will find a factor in stage 1. It doesn't seem to be using stage 2 right.
What am I doing wrong?

[CODE]gpu_ecm.exe -n -v -gpu -gpucurves 32 -one -c 32 11000 <factorme.txt >> log.txt
pause[/CODE]Finding input number in stg2:
[CODE]GMP-ECM 7.0-dev [configured with MPIR 2.6.0, --enable-gpu] [ECM]
Input number is (3^467-1)/2 (223 digits)
Using MODMULN [mulredc:0, sqrredc:1]
Computing batch product (of zu bits) of primes below B1=0 took 0ms
GPU: compiled for a NVIDIA GPU with compute capability 2.0.
GPU: will use device 0: GeForce GT 440, compute capability 2.1, 2 MPs.
GPU: Selection and initialization of the device took 0ms
Using B1=11000, B2=1873422, sigma=3:3900211286-3:3900211317 (32 curves)
dF=256, k=3, d=2310, d2=13, i0=-8
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
4298501 2.8e+008 2.2e+010 Inf Inf Inf Inf Inf Inf Inf
Computing 32 Step 1 took 93ms of CPU time / 2143ms of GPU time
Throughput: 14.934 curves by second (on average 66.96ms by Step 1)
Using 27 small primes for NTT
Estimated memory usage: 1800K
Initializing tables of differences for F took 0ms
Computing roots of F took 0ms
Building F from its roots took 16ms
Computing 1/F took 0ms
Initializing table of differences for G took 0ms
Computing roots of G took 0ms
Building G from its roots took 0ms
Computing roots of G took 0ms
Building G from its roots took 15ms
Computing G * H took 0ms
Reducing G * H mod F took 16ms
Computing roots of G took 0ms
Building G from its roots took 0ms
Computing G * H took 0ms
Reducing G * H mod F took 16ms
Computing polyeval(F,G) took 15ms
Computing product of all F(g_i) took 0ms
Step 2 took 78ms
********** Factor found in step 2: 3270362983146927377028671682671960437107912062834199545091118347495738861506779345734946890846481108479144446587334081489280966903453000172689482880397175599299632714862220456046073976859568442978416930175676229727557533293
Found input number N
[/CODE]Finding a factor in stg1:
[CODE]GMP-ECM 7.0-dev [configured with MPIR 2.6.0, --enable-gpu] [ECM]
Input number is (3^467-1)/2 (223 digits)
Using MODMULN [mulredc:0, sqrredc:1]
Computing batch product (of zu bits) of primes below B1=0 took 0ms
GPU: compiled for a NVIDIA GPU with compute capability 2.0.
GPU: will use device 0: GeForce GT 440, compute capability 2.1, 2 MPs.
GPU: Selection and initialization of the device took 0ms
Using B1=11000, B2=1873422, sigma=3:2535707131-3:2535707162 (32 curves)
dF=256, k=3, d=2310, d2=13, i0=-8
Expected number of curves to find a factor of n digits:
35 40 45 50 55 60 65 70 75 80
4298501 2.8e+008 2.2e+010 Inf Inf Inf Inf Inf Inf Inf
Computing 32 Step 1 took 140ms of CPU time / 2143ms of GPU time
Throughput: 14.933 curves by second (on average 66.97ms by Step 1)
********** Factor found in step 1: 27836167022857
Found probable prime factor of 14 digits: 27836167022857
Probable prime cofactor ((3^467-1)/2)/27836167022857 has 210 digits
[/CODE]

mklasson 2013-04-27 18:59

Oh, right, I noticed some full Ns in stage 2 as well, but figured I was just unlucky...

You might be right that there's some problem though. Alas, I have no idea if it's specific to my build, or how to fix it if it is.

ET_ 2013-04-29 07:39

[QUOTE=mklasson;338546]Oh, right, I noticed some full Ns in stage 2 as well, but figured I was just unlucky...

You might be right that there's some problem though. Alas, I have no idea if it's specific to my build, or how to fix it if it is.[/QUOTE]

It is definitely not specific to your build. I ran into the same "feature" with my Linux build, and told the thread about it.

Luigi

xilman 2013-04-29 08:17

[QUOTE=mklasson;338546]Oh, right, I noticed some full Ns in stage 2 as well, but figured I was just unlucky...

You might be right that there's some problem though. Alas, I have no idea if it's specific to my build, or how to fix it if it is.[/QUOTE]Run stage 1 on the gpu and create a save file. Continue from that file on the cpu and the factor should appear.

It's the workaround which seems to be the only way of getting factors right now. Cyril is aware of this problem and the workaround.


Paul

mklasson 2013-04-29 17:49

[QUOTE=xilman;338725]Run stage 1 on the gpu and create a save file. Continue from that file on the cpu and the factor should appear.[/QUOTE]

Ah, great, thanks!

xilman 2013-05-08 10:34

[QUOTE=mklasson;338762]Ah, great, thanks![/QUOTE]To clarify, a run has just finished here. I've snipped out the relevant lines so that you may see if you can reproduce them on your machine if you wish. The save file entry can be used as-is but if you want to try to reproduce it from a gpu run you will need to set sigma on the command line to the required value.[code]
echo 101482149355388048731487881935340331090889262212842924363713738989156240527153150278043500061215942991261583087754330974561238357811350557823367611213054527945175907871764674837850271132890282786375916516004587107515545577946886906206187414317583840986632315132836398937826193398451 | ecm -gpu -savea gpu.save -c 448 3000000 0 >> gpu.out
ecm -resume gpu.save 3000000 5706890290 > gpu2.out
[/code]The B2=5706890290 was taken from a preliminary run with ecm -v to discover what the default B2 should be.

Snip from gpu.save:[code]METHOD=ECM; PARAM=3; SIGMA=449286141; B1=3000000; N=101482149355388048731487881935340331090889262212842924363713738989156240527153150278043500061215942991261583087754330974561238357811350557823367611213054527945175907871764674837850271132890282786375916516004587107515545577946886906206187414317583840986632315132836398937826193398451; X=0xd94b0c9842b63bdc4ddf6fe836281864cf5e4a6e32f441c357219ee321da4ba0a1a42a7096ce7b9191ea65ede910405e3a2d602d2a23ae403873a49689270fb6493d768d98601a20cf387f9cd155a2429608c2beb9a55198eca823b9fc787cdaed067fd137c7e60f66b21771ef80e59a12c064748; CHECKSUM=937322733; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=pcl@anubis.home.brnikat.com; TIME=Sat May 4 09:48:08 2013;[/code]Relevant portion of gpu2.out:[code]
Resuming ECM residue saved by pcl@anubis.home.brnikat.com with GMP-ECM 7.0-dev on Sat May 4 09:48:08 2013
Input number is 101482149355388048731487881935340331090889262212842924363713738989156240527153150278043500061215942991261583087754330974561238357811350557823367611213054527945175907871764674837850271132890282786375916516004587107515545577946886906206187414317583840986632315132836398937826193398451 (282 digits)
Using B1=3000000-3000000, B2=5706890290, polynomial Dickson(6), sigma=3:449286141
Step 1 took 0ms
Step 2 took 7098ms
********** Factor found in step 2: 445224571829374761288699666131147495198356727
Found probable prime factor of 45 digits: 445224571829374761288699666131147495198356727
Composite cofactor 227934745241957285881778043049404755009011820338900145654372601421491779248241846890768278353591918611426577107776860159203725842047017937949868936707925419578871783371101851788601523957788893378997332682158710312764704944917884656543013 has 237 digits
[/code] Good luck!

Paul

VBCurtis 2013-05-08 17:06

I achieve this by using -B2scale 0 on the gpu run, and then -resume with the same B1 on the CPU stage 2. ECM runs the default B2 by doing so.

ecm -resume savefile.txt 43e6 >> output.txt does the trick for me.

xilman 2013-05-08 18:24

[QUOTE=VBCurtis;339717]I achieve this by using -B2scale 0 on the gpu run, and then -resume with the same B1 on the CPU stage 2. ECM runs the default B2 by doing so.

ecm -resume savefile.txt 43e6 >> output.txt does the trick for me.[/QUOTE]Thanks for the tip. I'll try to remember it in future.

Paul

Ralf Recker 2013-07-07 11:07

FYI
 
The -save(a) option of current SVN HEAD generates savefiles with B1 set to 1.

Here is an example from a GPU run:

[CODE]METHOD=ECM; PARAM=3; SIGMA=1338409529; [B]B1=1[/B]; N=107337638919967483141063623365542229910680957563823617797446929412356389149831517148531981651634180847916212539333192454506616947630553940911141905872245222153540770799676352775350889317617472274748340543075085097852507186666728619014330952808580810310164700637173471479651231324428337639337605274423; X=0x80e2c39ebf6af056d56c5ca03fdb34173afd8189a7ae9c4e42291b3cc4e2dcfec3cb5b27f3d45e3c02cded29e571c7f00cb0d4cb536eb10e4a585e888b1ba37804d73c0d6dc715129091957eb888d831bc16417c30180be983c5fa89d77d0caac142d864eed7a09b14340a41c46dd30a49f4ce673ce450fb3bbca03d; CHECKSUM=1807052727; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=ralf@quadriga; TIME=Sun Jul 7 12:54:06 2013;[/CODE]and from a CPU run:

[CODE]METHOD=ECM; PARAM=1; SIGMA=2099601580; [B]B1=1[/B]; N=107337638919967483141063623365542229910680957563823617797446929412356389149831517148531981651634180847916212539333192454506616947630553940911141905872245222153540770799676352775350889317617472274748340543075085097852507186666728619014330952808580810310164700637173471479651231324428337639337605274423; X=0xec9ba92e05563539a531b6eb6aaf42e3eb8c2c05ddd1b2bdf08d4908b5ceb4875a3ceb7a6a5f9046af2ba2f27aaca39d08c51cb927ae3ffabc682df4420515b2b354631183762317ea2c4a35a965d15f5c892c63daf8e97f672bc38f2f5268b39e0d14667b82d542bde08c6d72ccd602bbf0d7586d39b08992502d5f; CHECKSUM=1971712291; PROGRAM=GMP-ECM 7.0-dev; Y=0x0; X0=0x0; Y0=0x0; WHO=ralf@quadriga; TIME=Sun Jul 7 12:57:27 2013;[/CODE]B1 was set to 11e3 in both (test)cases. batch_last_B1_used contains the B1 value given in the command line.

B1done=1 when write_resumefile is called from main.c:1581

Here is a debugger output from the GPU run:

[CODE]#0 main (argc=2, argv=0x7fffffffe3b0) at main.c:1581
(gdb) display params
1: params = {{method = 0, x = {{_mp_alloc = 6955, _mp_size = 6954,
_mp_d = 0x7ec450}}, y = {{_mp_alloc = 1, _mp_size = 0,
_mp_d = 0x6da4c0}}, param = 3, sigma = {{_mp_alloc = 2, _mp_size = 1,
_mp_d = 0x6da4e0}}, sigma_is_A = 0, E = 0x6da300, go = {{
_mp_alloc = 1, _mp_size = 1, _mp_d = 0x6da5c0}}, [B]B1done = 1[/B], B2min = {{
_mp_alloc = 1, _mp_size = -1, _mp_d = 0x6da5e0}}, B2 = {{
_mp_alloc = 1, _mp_size = -1, _mp_d = 0x6da600}}, k = 0, S = 0,
repr = 0, nobase2step2 = 0, verbose = 1, os = 0x7ffff72c57a0,
es = 0x7ffff72c5880, chkfilename = 0x0, TreeFilename = 0x0, maxmem = 0,
stage1time = 0, rng = {{_mp_seed = {{_mp_alloc = 313, _mp_size = 32767,
_mp_d = 0x6da620}}, _mp_alg = GMP_RAND_ALG_DEFAULT, _mp_algdata = {
_mp_lc = 0x4af140}}}, use_ntt = 1,
stop_asap = 0x405a1c <stop_asap_test>, batch_s = {{_mp_alloc = 249,
_mp_size = 249, _mp_d = 0x6de290}}, [B]batch_last_B1_used = 11000[/B],
gpu = 1, gpu_device = -1, gpu_device_init = 1, gpu_number_of_curves = 448,
gw_k = 0, gw_b = 0, gw_n = 0, gw_c = 0}}
(gdb)
[/CODE]Trying to resume from these savefiles leads to an internal error:

[CODE]Error, x0 should be equal to 2 with this parametrization
Please report internal errors at <ecm-discuss@lists.gforge.inria.fr>.[/CODE]

wombatman 2013-07-28 16:48

Apologies for bumping this thread, but I've been trying to compile the GPU-enabled GMP-ECM and can't seem to manage it despite multiple attempts. I've tried to get the VS2010 build up and running, but there are many issues there (it can't find header files that it should be able to find, trouble with CUDA, etc.).

So I figured I would try to compile it under MinGW (32-bit, Windows XP), since I've not really had many issues there. I pulled the latest SVN (v.2521) and tried to use

[CODE]./configure.in[/CODE]

and get

[CODE]./configure.in: line 1: syntax error near unexpected token `[ECM_VERSION_AC],'
./configure.in: line 1: `m4_define([ECM_VERSION_AC], [7.0-dev])'[/CODE]

Any help with this would be appreciated. System specs listed below:

AMD Phenom II X4, Windows XP 32-bit, GTX570 (CC 2.0)

Thanks,
Ben

jasonp 2013-07-28 20:35

You don't run configure.in directly; it is an input to the autotools, which you have to run manually on a repository checkout in order to generate the configure script. Release distributions of GMP-ECM have done that for you already.

wombatman 2013-07-28 20:55

Thanks Jason. Another opportunity to learn more about compilation background. Off to play with autotools...

