mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GMP-ECM (https://www.mersenneforum.org/forumdisplay.php?f=55)
-   -   ECM for CUDA GPUs in latest GMP-ECM ? (https://www.mersenneforum.org/showthread.php?t=16480)

pinhodecarlos 2012-02-11 12:54

[QUOTE=xilman;289019]I screwed up computing the time per curve :redface:


1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve.

The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve.
Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core.

Sorry about that.[/QUOTE]

Paul, do you see CPU usage when running GPU-ECM?
BTW, I'm too lazy even to install Linux... lol. When I look at the software I use on Windows, I just don't think I could use all of it on Linux.
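
For anyone who wants to check the corrected arithmetic in the quoted post, here is a minimal Python sketch. The two inputs (1792 curves, 141 hours) come from xilman's post; the variable names are only illustrative.

```python
# Figures from the quoted post: 1792 curves finished in 141 hours.
CURVES = 1792
HOURS = 141
SECONDS_PER_HOUR = 3600

# The mistaken expression multiplied curves by hours and divided by 3600.
mistaken = CURVES * HOURS / SECONDS_PER_HOUR

# The corrected expression converts hours to seconds, then divides by curves.
seconds_per_curve = HOURS * SECONDS_PER_HOUR / CURVES

print(f"mistaken figure:  {mistaken:.0f} s per curve")
print(f"corrected figure: {seconds_per_curve:.0f} s per curve")
print(f"ratio:            {seconds_per_curve / mistaken:.1f}x")
```

The ratio comes out close to four, which matches the "four times worse" remark in the quote.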

ATH 2012-02-11 16:39

[QUOTE=xilman;288917]If you have Linux you can build from the SVN sources as easily as I can.

The process really is very straightforward and you'll end up with something which doesn't carry the risk of the Linux equivalent of DLL-hell.[/QUOTE]

Can you post your compile options please? Then maybe I can figure out how to compile this in msys/mingw for "windoze".

xilman 2012-02-11 17:43

[QUOTE=ATH;289050]Can you post your compile options please? Then maybe I can figure out how to compile this in msys/mingw for "windoze".[/QUOTE]
I don't use compile options [i]per se[/i], just configure and make.

You will almost certainly find life much, much easier if you install VirtualBox or the like and then a Linux inside a virtual machine. Building GMP and GMP-ECM is then a complete doddle --- essentially a matter of saying "./configure; make ; make check ; make install" in the respective build directories. Once you've done that for each, you have everything you need --- working binaries which you can use as-is or use as a gold standard against which to check new builds, together with all the documentation, compile options, etc., which you can cut and paste into either the host environment or into other hosted machines.

Brain 2012-02-11 19:01

Makefile for CC13
 
1 Attachment(s)
Here's the currently available trunk makefile for CC13.

WraithX 2012-02-11 19:06

[QUOTE=xilman;289057]I don't use compile options [i]per se[/i], just configure and make.

Building GMP and GMP-ECM is then a complete doddle --- essentially a matter of saying "./configure; make ; make check ; make install" in the respective build directories.[/QUOTE]

I was wondering, inside the gpu directory is a makefile and there are also two other directories (gpu_ecm and gpu_ecm_cc13) that both have makefiles. In which directory, or directories, do you run make? In which directory do you create the binary that you are referencing?

Also, inside the gpu directories, I see no configure file. So, it's not "configure and make" that you run, it's just "make", correct?

If I had an nVidia video card, I would try this myself. However, I do not, so I will leave it to others to try.

xilman 2012-02-11 19:16

[QUOTE=WraithX;289062]I was wondering, inside the gpu directory is a makefile and there are also two other directories (gpu_ecm and gpu_ecm_cc13) that both have makefiles. In which directory, or directories, do you run make? In which directory do you create the binary that you are referencing?

Also, inside the gpu directories, I see no configure file. So, it's not "configure and make" that you run, it's just "make", correct?

If I had an nVidia video card, I would try this myself. However, I do not, so I will leave it to others to try.[/QUOTE]
My main machine is a Fermi so I didn't even bother with the cc13 version.


To answer your other question: you should read README.dev in the trunk directory. I'm not being wilfully obtuse. You really should read how to configure the development code environment.

Once everything is in place, you do indeed just run make.

xilman 2012-02-11 19:24

[QUOTE=xilman;289064]I'm not being wilfully obtuse. You really should read how to configure the development code environment.[/QUOTE]
In case it is not clear to bystanders, this code is [b]not[/b] fire and forget. It is [b]not[/b] production quality.

If you want to use it, you will need to get your hands dirty. I'm prepared to help as best I can [b]after[/b] you've followed the instructions in the svn distro and after you've made a sincere effort to get things working by yourself. I am not prepared to bottle-feed, to wipe noses or to change {nappies,diapers}.

That may sound harsh but it's the way the world of alpha-code development works and you'll need to get used to it if you want to play with the big boys and girls. Once you pass the audition you'll find most developers are very friendly and helpful.

Neither am I addressing these remarks to any particular individuals who may, or may not, have posted in this thread.

Paul

frmky 2012-02-11 22:34

I'm getting slower results from the GPU than from the CPU. I'm using a c144 (from the 4788 aliquot sequence) on a Core i7 CPU and GTX 480 GPU:

[CODE]~/ecmtest$ ~/bin/ecm 11e6 < c144
GMP-ECM 6.5-dev [configured with GMP 5.0.4, --enable-asm-redc] [ECM]
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1115846
Step 1 took 35420ms
Step 2 took 13620ms

~/bin/gpu_ecm -n 256 -save test 11000000 < c144
Precomputation of s took 0.950s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, firstinvd=724674352, with 256 curves
gpu_ecm took : 13144.730s (0.000+13144.720+0.010)
Throughput : 0.019

~/bin/gpu_ecm -n 480 -save test 11000000 < c144
Precomputation of s took 0.950s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, firstinvd=1789558835, with 480 curves
gpu_ecm took : 24198.970s (0.000+24198.960+0.010)
Throughput : 0.020
[/CODE]
This GPU has 15 MPs, so gpu_ecm defaults to 480 curves, but that was only slightly faster per curve than using 256 curves. Step 1 time per curve:
CPU: 35.4 s
GPU 256: 51.3 s
GPU 480: 50.4 s
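
The per-curve figures above are just the gpu_ecm total divided by the number of curves. A minimal Python sketch of that division, using the totals printed in the log; the function name `seconds_per_curve` is made up for illustration and is not part of gpu_ecm:

```python
def seconds_per_curve(total_seconds, curves):
    """Average wall-clock time per curve for one gpu_ecm run."""
    return total_seconds / curves

# Totals copied from the gpu_ecm output above: curves -> total seconds.
gpu_runs = {256: 13144.730, 480: 24198.970}

# "Step 1 took 35420ms" from the CPU ecm run on the same c144.
cpu_step1_seconds = 35420 / 1000

print(f"CPU: {cpu_step1_seconds:.1f} s")
for curves, total in gpu_runs.items():
    per_curve = seconds_per_curve(total, curves)
    print(f"GPU {curves}: {per_curve:.1f} s "
          f"({per_curve / cpu_step1_seconds:.2f}x the CPU step 1 time)")
```

The comparison is against CPU step 1 only, since the gpu_ecm output shown here reports a single total and no separate step 2 time.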


Hmmm... Why do larger numbers take less time?
[CODE]
~/bin/gpu_ecm -n 480 11000 < c144
Precomputation of s took 0.000s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000, firstinvd=1718283956, with 480 curves
gpu_ecm took : 24.260s (0.000+24.250+0.010)
Throughput : 19.786

~/bin/gpu_ecm -n 480 11000 < 10p332
Precomputation of s took 0.000s
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000, firstinvd=412318627, with 480 curves
gpu_ecm took : 12.530s (0.000+12.520+0.010)
Throughput : 38.308
[/CODE]
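
The "Throughput" line is not explained in the output itself. The printed values are consistent with curves divided by total seconds, so the sketch below assumes that definition; it is an inference from the numbers in this post, not something taken from the gpu_ecm source.

```python
def throughput(curves, total_seconds):
    """Curves completed per second of wall-clock time (assumed definition)."""
    return curves / total_seconds

# (label, curves, total seconds) copied from the gpu_ecm runs in this post.
runs = [
    ("c144,   B1=11e6, 256 curves", 256, 13144.730),
    ("c144,   B1=11e6, 480 curves", 480, 24198.970),
    ("c144,   B1=11e3, 480 curves", 480, 24.260),
    ("10p332, B1=11e3, 480 curves", 480, 12.530),
]

for label, curves, seconds in runs:
    print(f"{label}: {throughput(curves, seconds):.3f} curves/s")
```

Under that assumption, the 295-digit input roughly doubles the throughput of the 144-digit one at B1=11000, which is the oddity the question above is about.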

jasonp 2012-02-12 02:19

Maybe having more multiprocessors means that larger blocks of work have to be given in a kernel launch.

frmky 2012-02-12 06:27

With the larger number 10,332+, gpu_ecm is indeed about 4x faster per curve than step 1 on the CPU:
[CODE]~/bin/gpu_ecm -n 480 11000000 < 10p332
Precomputation of s took 0.950s
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000000, firstinvd=197457519, with 480 curves
gpu_ecm took : 12460.500s (0.000+12460.490+0.010)
Throughput : 0.039
26 s/curve

~/bin/ecm 11e6 < 10p332
GMP-ECM 6.5-dev [configured with GMP 5.0.4, --enable-asm-redc] [ECM]
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=770548151
Step 1 took 116030ms
Step 2 took 31240ms[/CODE]
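
To put a number on the comparison, here is a small Python sketch that divides the CPU step 1 time by the GPU time per curve. All inputs are copied from the logs in this thread; the helper name `speedup` is hypothetical, and step 2 is left out because the gpu_ecm output shown reports no step 2 time.

```python
def speedup(cpu_step1_ms, gpu_total_seconds, gpu_curves):
    """CPU step 1 time divided by GPU wall-clock time per curve."""
    gpu_seconds_per_curve = gpu_total_seconds / gpu_curves
    return (cpu_step1_ms / 1000.0) / gpu_seconds_per_curve

# 10,332+ (295 digits), B1=11e6, 480 curves on the GPU.
print(f"10p332: {speedup(116030, 12460.500, 480):.2f}x")

# c144 (144 digits), B1=11e6, 480 curves on the GPU, from the earlier post.
print(f"c144:   {speedup(35420, 24198.970, 480):.2f}x")
```

A value above 1 means the GPU finishes a curve's step 1 faster than one CPU core does; a value below 1 means it is slower, as with the c144.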

debrouxl 2012-02-12 08:08

Indeed, it's much better with larger numbers, even on fairly low-end GPUs :smile:

[code]$ echo 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 | ./gpu_ecm -vv -n 64 -save 77677_149_3e6_1 3000000
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.260s
Input number is 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 (299 digits)
Using B1=3000000, firstinvd=1956725845, with 64 curves
8+64*d=15748722851276397705078124999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999979751642048358917236328125000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000037
8+64*d=15748816728591918945312499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999979751521348953247070312500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000037
#Begin GPU computation...
Block: 32x16x1 Grid: 4x1x1

#Looking for factors for the curves with (d*2^32) mod N = 1956725845
xfin=30793582623383249085792654048330071529605422286616239783953502163671251485785103766950295774964496127287193014815633605233749525283700023789095348986097253179578725011454469815600504523632074726379316756899242430619968238212335497266401636095557828925843807562412336497359441279441288718847426104007
zfin=15653600481320091921866091998449420910733335116855114761017777079039414436628501359278897983474928453087970457407357858920433473350010208701033703353460574476309980980321812580804654671402344055561817931400864403015931562744329996813634620940647894525755619616828739385722340718168011611105085982277
xunif=39408042568104336805270379492712518016456213440668263700932960319496692204712994493267826611404606523547436560316003809780479305567060943184017323896559838396801417755478902976203909415762293776133099854903949460128487164294408353102629988943560190937509387321676681877121514979954245436789912376565

#Looking for factors for the curves with (d*2^32) mod N = 1956725908
xfin=14966698215750072697023404424489655322417510285848104176664523355430448144237921356160942849117869928416482104860850242503535702178551676734700459600950072236295600757108345379537820143078621679600800366849565012827321265584610218377563003322400088365819158936957519419145860156952262705564563161722
zfin=13196899771013716409148933418583531970092448862012559897727436095137691137124639002299074611035125565787268265934160640735274655897627921875382930307151913405846752684117468279045581956406343173251902392466987748520166146927437149020702523884077620927133537101563815962380052471821344666329942845796
xunif=16137963010874506957647933426009827242074704110758480638668956781797848968933741211221737736217202543887736175366874915988459776979538810976607154822667437291905995042783104430898367764816031215094913761969808431783659168083194336651034563970539910660975759146036916968022060602536777108941718805587
gpu_ecm took : 1420.292s (0.000+1420.288+0.004)
Throughput : 0.045
(~22 s/curve)


$ echo 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 | ecm -c 1 3000000
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 (299 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2227022774
Step 1 took 58271ms
Step 2 took 17197ms[/code]
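
For anyone collecting several of these logs, a minimal Python sketch that pulls the curve count and total time out of gpu_ecm output and compares the result with the CPU step 1 time. The regular expressions assume the output format shown in this thread, which may well change since this is development code; `parse_gpu_ecm` is a made-up helper name.

```python
import re

def parse_gpu_ecm(log_text):
    """Return (curves, total_seconds) from gpu_ecm output as seen in this thread."""
    curves = int(re.search(r"with (\d+) curves", log_text).group(1))
    total = float(re.search(r"gpu_ecm took : ([\d.]+)s", log_text).group(1))
    return curves, total

# Three lines copied from the GT 540M run above.
sample = """Using B1=3000000, firstinvd=1956725845, with 64 curves
gpu_ecm took : 1420.292s (0.000+1420.288+0.004)
Throughput : 0.045
"""

curves, total = parse_gpu_ecm(sample)
per_curve = total / curves
cpu_step1_seconds = 58271 / 1000  # "Step 1 took 58271ms" from the ecm run

print(f"{curves} curves, {per_curve:.1f} s per curve")
print(f"CPU step 1 / GPU per curve: {cpu_step1_seconds / per_curve:.2f}x")
```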


All times are UTC.
