ECM for CUDA GPUs in latest GMP-ECM?
Greetings.
While working on GMP-ECM, I discovered a "gpu" folder inside the latest SVN. Can somebody shed some light on it? Is it a fully working GMP-ECM implementation? Or partially GPU-accelerated? Any info will do! 
[QUOTE=Karl M Johnson;287461]Greetings.
While working on GMP-ECM, I discovered a "gpu" folder inside the latest SVN. Can somebody shed some light on it? Is it a fully working GMP-ECM implementation? Or partially GPU-accelerated? Any info will do![/QUOTE]I'm playing with it. Not yet found a new factor with it. Current activity is to work out what's going on and then to see whether I can contribute to the effort. So far I've found areas where I may be able to help Cyril with development.

So far, it is Stage 1 only (as one should expect), and each bit in the product of prime powers up to B1 requires a separate kernel call. It would be easy enough to make the entire sequence of elliptic curve arithmetic operations a single kernel, but not obviously a good idea (think about it). The code currently uses fixed kilobit arithmetic and so is limited to factoring integers somewhat under that size.

One of the areas where I may be able to help is to add flexibility in that regard. Another is to improve the underlying arithmetic primitives. A third is to reduce the (presently extortionate, IMO) amount of CPU time used by busy-waiting for the kernels to complete.

Paul 
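For bystanders: the "product of prime powers up to B1" Paul mentions is the stage 1 scalar s, and per his description the number of kernel calls scales with the bit length of s. Here is a toy sketch of my own, purely to illustrate how s is formed; the actual GPU code is CUDA and does not look like this:

```python
# Toy illustration (NOT the gpu_ecm code): stage 1 of ECM multiplies the
# starting point by s = product of all prime powers p^k <= B1.
# Per the post, each bit of s costs roughly one kernel call on the GPU.
def stage1_exponent(B1):
    # simple sieve of Eratosthenes, fine for illustration-sized B1
    sieve = bytearray([1]) * (B1 + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(B1 ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    s = 1
    for p in range(2, B1 + 1):
        if sieve[p]:
            pk = p
            while pk * p <= B1:   # raise p to the largest power <= B1
                pk *= p
            s *= pk
    return s

s = stage1_exponent(11)           # 2^3 * 3^2 * 5 * 7 * 11
print(s, s.bit_length())          # prints: 27720 15
```

For the real bound B1=3000000 used later in the thread, gpu_ecm's -vv output reports that s has 4328086 bits, i.e. millions of kernel launches per batch of curves.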
Oh, alright.
It can't substitute GMP-ECM yet. Stage 2 on GPUs is not very feasible (except on Teslas) because it requires a lot of RAM? 
[QUOTE=Karl M Johnson;287497]Oh, alright.
It can't substitute GMP-ECM yet.[/QUOTE]Actually it can. I just found my first factor with gpu_ecm :surprised:[code]
Resuming ECM residue saved by pcl@anubis.home.brnikat.com with GPU-ECM 0.1
Input number is 27199999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (275 digits)
Using B1=1100000000, B2=110000000776278396540, polynomial Dickson(30), A=112777948379516601562499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
Step 1 took 0ms
Step 2 took 176954ms
********** Factor found in step 2: 7219650603145651593481420276356225303436099
Found probable prime factor of 43 digits: 7219650603145651593481420276356225303436099
Composite cofactor 3767495339476507669490528975999036413199084363632138583182977316467718682572689799651214383863597597569613063674073098348731470273762390945944125050829513627232967008803673015815350396060271580465332444072090424114938478394067646101 has 232 digits
[/code]A total of 1792 curves were run at B1=110M, and the factor was found on the 1349th second stage. I'm running the remainder in case another factor can be found. Each stage one took 70 seconds on a GTX 460. The latest ECM takes 679 seconds per stage 1 on a single core of a 1090T clocked at 3264 MHz, so the GPU version is close to 10 times faster in this situation.

Paul 
GPU ECM 0.1
Could anybody provide a (link to a) Windows 64-bit binary for GPU-ECM 0.1?

[QUOTE=xilman;288906]Actually it can. I just found my first factor with gpu_ecm[/QUOTE]Surprise is no longer adequate, and I'm forced to resort to astonishment :shock:[code]
Using B1=1100000000, B2=110000000776278396540, polynomial Dickson(30), A=113233923912048339843749999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999998
Step 1 took 0ms
Step 2 took 176408ms
********** Factor found in step 2: 2315784840580190375316972295830305082761
Found probable prime factor of 40 digits: 2315784840580190375316972295830305082761
Composite cofactor 11745478044145667127387304121681137574169838448099828222193156280305822325912582394902125137121744003039884215185197839759773264126893037178852537820600462333360764644042755609744438099938488940900666235577910910744169710591419476281159 has 236 digits
[/code]It took another 18 curves. If I was unlucky not to find the p43 earlier than this, given the amount of ECM work performed, I was especially unlucky not to find the p40. The candidate number, GW(10,272), had no previously known factors despite having had a complete t40 run through the ECMNET server here and clients around the world. The cofactor is now a c193, so it's worth seeing whether the remaining curves will find anything.

Paul 
[QUOTE=Brain;288909]Could anybody provide a (link to a) (Windows 64) binary for GPUECM 0.1?[/QUOTE]Not me.

[QUOTE=xilman;288913]Not me.[/QUOTE]
And for Linux? :bow: 
[QUOTE=pinhodecarlos;288914]And for Linux?:bow:[/QUOTE]If you have Linux you can build from the SVN sources as easily as I can.
The process really is very straightforward, and you'll end up with something which doesn't carry the risk of the Linux equivalent of DLL hell.

If you really are not lazy(*) enough to build your own, I could make the binary I use available. No guarantees that it will work, or even run, on any other Linux system. It almost certainly won't work optimally unless you have exactly the same environment as me.

Paul

(*) Sometimes it's much better to do some work ahead of time to remove the need to do much more work later. That's true laziness. 
Windows binary wanted:smile:

[QUOTE=xilman;288906]Actually it can. I just found my first factor with gpu_ecm :surprised:
[snip]
A total of 1792 curves were run at B1=110M and the factor found on the 1349th second stage. I'm running the remainder in case another factor can be found. Each stage one took 70 seconds on a GTX460. The latest ECM takes 679 seconds per stage 1 on a single core of a 1090T clocked at 3264MHz, so the GPU version is close to 10 times faster in this situation. Paul[/QUOTE] Awesome. Is the code specific to a particular GPU? How portable is it? 
For Windows, it would have been nice if someone could downgrade the build/solution files from VS2010 to VS2008 and post them. For comparison, the CUDA toolkit contains scripts for 2005, 2008 and 2010, which is kinda user-friendly. Not everyone has 2010. (Well, temporarily one can get the trial license.)

[QUOTE=Batalov;288969]For Windows, it would have been nice if someone could downgrade the build/solution files from VS2010 to VS2008 and post them. For comparison, the CUDA toolkit contains scripts for 2005, 2008 and 2010, which is kinda user-friendly. Not everyone has 2010. (Well, temporarily one can get the trial license.)[/QUOTE]If you want it, why don't you do it?

[QUOTE=Batalov;288969]Not everyone has 2010. (Well, temporarily one can get the trial license.)[/QUOTE]
VS 2010 is free (the Express edition). Yes, it is 32-bit only, but you can get the Windows SDK for free, which contains the 64-bit compiler. 
[QUOTE=xilman;288971]If you want it, why don't you do it?[/QUOTE]
Because it is ugly! I am not talking about .sln files; for the .vcxproj -> .vcproj conversion, most of the internet-based advice amounts to 'you might be best served by using the "New project from existing code" wizard to build a new VC2008 project for the code rather than trying to convert the existing project.' It is best done by the authors, who know their source and dependencies. I'll try the free version. 
[QUOTE=R.D. Silverman;288966]Is the code specific to a particular GPU? How portable is it?[/QUOTE]
It uses CUDA, and thus requires an nVidia GPU supporting CC 1.3 or higher. 
[QUOTE=xilman;288906]Each stage one took 70 seconds on a GTX460.[/QUOTE]
The GTX 460 has 7 MPs. Did you do 224 curves in parallel, or more? Are there memory latencies that are hidden by doing more? 
On my GT 540M (admittedly a fairly low-end model, with 2 MPs), under Debian unstable x86_64, gpu_ecm with 64 curves in parallel seems to be somewhat slower than CPU-based stage 1 (tuned GMP-ECM binary) running on a Core i7-2670QM @ 2.2 GHz.
I have tested B1 bounds from 5e4 to 16e6, and 32, 64 or 128 parallel curves. 32 curves has markedly lower throughput than 64, but 128 is hardly better than 64 for throughput. 
[QUOTE=frmky;289014]The GTX 460 has 7 MP's. Did you do 224 curves in parallel or more? Are there memory latencies that are hidden by doing more?[/QUOTE]The first 896 curves were done in four batches of 224, the default. The second 896 ran all in parallel, with a block of 32x32 and a grid of 70x1x1. I seem to have lost the detailed timing information for the earlier curves :sad: As best I recall, running 896 took slightly less than running 224 four times, but I could be quite wrong.

Big oops.
I screwed up computing the time per curve :redface:
[QUOTE=xilman;288906]Each stage one took 70 seconds on a GTX460. The latest ECM takes 679 seconds per stage 1 on a single core of a 1090T clocked at 3264MHz, so the GPU version is close to 10 times faster in this situation.[/QUOTE]The 1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve. The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve. Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core. Sorry about that. 
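The corrected arithmetic, spelled out with the numbers from the post (nothing new here, just the two expressions side by side):

```python
curves = 1792
hours = 141.0
slip = curves * hours / 3600        # ~70.2: the mistaken curves*hours/3600
per_curve = hours * 3600 / curves   # ~283.3 s of wall time per stage 1 curve
cpu = 679.0                         # s per stage 1 on one 1090T core, per the thread
print(round(per_curve), round(cpu / per_curve, 1))   # prints: 283 2.4
```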
[QUOTE=frmky;289013]It uses CUDA, thus requires an nVidia GPU supporting CC 1.3 or higher.[/QUOTE]
Are DP FP calculations being performed? 
No, it looks to be integer only. Compute Capability 1.3 also has advanced synchronization primitives that the CC 1.3 branch uses. The code is [url="https://gforge.inria.fr/scm/viewvc.php/trunk/gpu/?root=ecm"]here[/url] for anyone who's interested.

[QUOTE=xilman;289019]I screwed up computing the time per curve :redface:
1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve. The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve. Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core. Sorry about that.[/QUOTE]Paul, do you see CPU usage when running GPU-ECM?

BTW, I'm too lazy even to install Linux... lol. When I look back at the software I use on Windows, I just don't think I can use all of it in Linux. 
[QUOTE=xilman;288917]If you have Linux you can build from the SVN sources as easily as I can.
The process really is very straightforward and you'll end up with something which doesn't carry the risk of the Linux equivalent of DLL hell.[/QUOTE]Can you post your compile options please? Then maybe I can figure out how to compile this in msys/mingw for "windoze". 
[QUOTE=ATH;289050]Can you post your compile options please? Then maybe I can figure out how to compile this in msys/mingw for "windoze".[/QUOTE]I don't use compile options [i]per se[/i], just configure and make.
You will almost certainly find life much, much easier if you install VirtualBox or the like, and then a Linux inside a virtual machine. Building GMP and GMP-ECM is then a complete doddle, essentially a matter of saying "./configure ; make ; make check ; make install" in the respective build directories. Once you've done that for each, you have everything you need: working binaries which you can use as-is, or use as a gold standard against which to check new builds, together with all the documentation, compile options, etc., which you can cut and paste into either the host environment or into other hosted machines. 
Makefile for CC13
1 Attachment(s)
Here's the currently available trunk makefile for CC13.

[QUOTE=xilman;289057]I don't use compile options [i]per se[/i], just configure and make.
Building GMP and GMP-ECM is then a complete doddle, essentially a matter of saying "./configure; make ; make check ; make install" in the respective build directories.[/QUOTE]I was wondering: inside the gpu directory there is a makefile, and there are also two other directories (gpu_ecm and gpu_ecm_cc13) that both have makefiles. In which directory, or directories, do you run make? In which directory do you create the binary that you are referencing?

Also, inside the gpu directories I see no configure file. So it's not "configure and make" that you run, it's just "make", correct? If I had an nVidia video card, I would try this myself. However, I do not, so I will leave it to others to try. 
[QUOTE=WraithX;289062]I was wondering, inside the gpu directory is a makefile and there are also two other directories (gpu_ecm and gpu_ecm_cc13) that both have makefiles. In which directory, or directories, do you run make? In which directory do you create the binary that you are referencing?
Also, inside the gpu directories, I see no configure file. So, it's not "configure and make" that you run, it's just "make", correct?[/QUOTE]My main machine is a Fermi, so I didn't even bother with the cc13 version. To answer your other question: you should read README.dev in the trunk directory. I'm not being wilfully obtuse; you really should read how to configure the development code environment. Once everything is in place, you do indeed just run make. 
[QUOTE=xilman;289064]I'm not being wilfully obtuse. You really should read how to configure the development code environment.[/QUOTE]In case it is not clear to bystanders, this code is [b]not[/b] fire and forget. It is [b]not[/b] production quality.
If you want to use it, you will need to get your hands dirty. I'm prepared to help as best I can [b]after[/b] you've followed the instructions in the SVN distro, and after you've made a sincere effort to get things working by yourself. I am not prepared to bottle-feed, to wipe noses or to change {nappies,diapers}. That may sound harsh, but it's the way the world of alpha-code development works, and you'll need to get used to it if you want to play with the big boys and girls. Once you pass the audition you'll find most developers are very friendly and helpful.

Neither am I addressing these remarks to any particular individuals who may, or may not, have posted in this thread.

Paul 
I'm getting results slower than the CPU. I'm using a c144 (from the 4788 aliquot sequence) on a Core i7 CPU and a GTX 480 GPU:
[CODE]~/ecmtest$ ~/bin/ecm 11e6 < c144
GMP-ECM 6.5-dev [configured with GMP 5.0.4, --enable-asm-redc] [ECM]
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1115846
Step 1 took 35420ms
Step 2 took 13620ms

~/bin/gpu_ecm -n 256 -save test 11000000 < c144
Precomputation of s took 0.950s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, firstinvd=724674352, with 256 curves
gpu_ecm took : 13144.730s (0.000+13144.720+0.010)
Throughput : 0.019

~/bin/gpu_ecm -n 480 -save test 11000000 < c144
Precomputation of s took 0.950s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000000, firstinvd=1789558835, with 480 curves
gpu_ecm took : 24198.970s (0.000+24198.960+0.010)
Throughput : 0.020
[/CODE]This GPU has 15 MPs, so gpu_ecm defaults to 480 curves, but per curve that was only slightly faster than using 256 curves:

CPU: 35.4 s
GPU 256: 51.3 s
GPU 480: 50.4 s

Hmm... Why do larger numbers take less time? 
[CODE]~/bin/gpu_ecm -n 480 11000 < c144
Precomputation of s took 0.000s
Input number is 216210261026078873728038575619824007502275880651339130269087415140753033343108746166779571643387473335848998664028620971224681169067812545897739 (144 digits)
Using B1=11000, firstinvd=1718283956, with 480 curves
gpu_ecm took : 24.260s (0.000+24.250+0.010)
Throughput : 19.786

~/bin/gpu_ecm -n 480 11000 < 10p332
Precomputation of s took 0.000s
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000, firstinvd=412318627, with 480 curves
gpu_ecm took : 12.530s (0.000+12.520+0.010)
Throughput : 38.308
[/CODE] 
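A sanity check on what the "Throughput" figure means: it appears to be completed curves per second of wall time (my reading of the output above; I have not seen it documented). Recomputing it from the (curves, seconds) pairs quoted in the last two posts:

```python
# (curves, seconds) pairs taken from the gpu_ecm runs quoted above;
# assumption: Throughput = curves / wall-clock seconds
runs = [(256, 13144.73), (480, 24198.97), (480, 24.26), (480, 12.53)]
for curves, secs in runs:
    print(round(curves / secs, 3))   # 0.019, 0.02, 19.786, 38.308
```

The values match the printed Throughput figures (modulo trailing zeros), which supports that reading.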
Maybe having more multiprocessors means that larger blocks of work have to be given in a kernel launch.

With the larger number, 10,332+, gpu_ecm is indeed about 4x faster:
[CODE]~/bin/gpu_ecm -n 480 11000000 < 10p332
Precomputation of s took 0.950s
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000000, firstinvd=197457519, with 480 curves
gpu_ecm took : 12460.500s (0.000+12460.490+0.010)
Throughput : 0.039
(26 s/curve)

~/bin/ecm 11e6 < 10p332
GMP-ECM 6.5-dev [configured with GMP 5.0.4, --enable-asm-redc] [ECM]
Input number is 3082036244247618744713879350181267942494229636149227133619560368864804688115816966917438461372823837680425045410470575056718115654210704653050148781462686145415984611154261527877775921978501350266306075811598788040720480163782506686648165217270804627622798871662974986806951627082442232588805761 (295 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=770548151
Step 1 took 116030ms
Step 2 took 31240ms
[/CODE] 
Indeed, it's much better with larger numbers, even on fairly low-end GPUs :smile:
[code]$ echo 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 | ./gpu_ecm -vv -n 64 -save 77677_149_3e6_1 3000000
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.260s
Input number is 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 (299 digits)
Using B1=3000000, firstinvd=1956725845, with 64 curves
8+64*d=15748722851276397705078124999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999979751642048358917236328125000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000037
8+64*d=15748816728591918945312499999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999979751521348953247070312500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000037
#Begin GPU computation...
Block: 32x16x1 Grid: 4x1x1
#Looking for factors for the curves with (d*2^32) mod N = 1956725845
xfin=30793582623383249085792654048330071529605422286616239783953502163671251485785103766950295774964496127287193014815633605233749525283700023789095348986097253179578725011454469815600504523632074726379316756899242430619968238212335497266401636095557828925843807562412336497359441279441288718847426104007
zfin=15653600481320091921866091998449420910733335116855114761017777079039414436628501359278897983474928453087970457407357858920433473350010208701033703353460574476309980980321812580804654671402344055561817931400864403015931562744329996813634620940647894525755619616828739385722340718168011611105085982277
xunif=39408042568104336805270379492712518016456213440668263700932960319496692204712994493267826611404606523547436560316003809780479305567060943184017323896559838396801417755478902976203909415762293776133099854903949460128487164294408353102629988943560190937509387321676681877121514979954245436789912376565
#Looking for factors for the curves with (d*2^32) mod N = 1956725908
xfin=14966698215750072697023404424489655322417510285848104176664523355430448144237921356160942849117869928416482104860850242503535702178551676734700459600950072236295600757108345379537820143078621679600800366849565012827321265584610218377563003322400088365819158936957519419145860156952262705564563161722
zfin=13196899771013716409148933418583531970092448862012559897727436095137691137124639002299074611035125565787268265934160640735274655897627921875382930307151913405846752684117468279045581956406343173251902392466987748520166146927437149020702523884077620927133537101563815962380052471821344666329942845796
xunif=16137963010874506957647933426009827242074704110758480638668956781797848968933741211221737736217202543887736175366874915988459776979538810976607154822667437291905995042783104430898367764816031215094913761969808431783659168083194336651034563970539910660975759146036916968022060602536777108941718805587
gpu_ecm took : 1420.292s (0.000+1420.288+0.004)
Throughput : 0.045
(~22 s/curve)

$ echo 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 | ecm -c 1 3000000
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 77777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777677777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777777 (299 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=2227022774
Step 1 took 58271ms
Step 2 took 17197ms
[/code] 
Another data point, for numbers between the C144 and the C29x: a C237 is slower on the GPU, but obviously faster on the CPU, than the C29x:
[code]$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ./gpu_ecm -vv -n 64 -save 80009_248_3e6_1 3000000
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.256s
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, firstinvd=563947071, with 64 curves
[snip]
gpu_ecm took : 1637.614s (0.000+1637.610+0.004)
Throughput : 0.039

$ echo 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 | ecm -c 1 3000000
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 472367364481324943429608990380363865230376899949857658144588096283146783114430372207621802600829155058766951167631153619328587819346877117165453306904995816614534365740792256712736351604580048562330248528078693598071309876495244264859329 (237 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=379651352
Step 1 took 42974ms
Step 2 took 12981ms
[/code]On that number, the core busy with gpu_ecm is spending a bit more than half of its time in "system" state. I guess that's what xilman mentioned above? [quote]A third is to reduce the (presently extortionate, IMO) amount of CPU time used by busy-waiting for the kernels to complete.[/quote] 
[QUOTE=xilman;289019]I screwed up computing the time per curve :redface:
1792 curves took 141 hours to run. I evaluated (1792 * 141 / 3600) to obtain the quoted figure of 70 seconds per curve. The correct expression is (141 * 3600 / 1792), which evaluates to 283 seconds per curve. Although this is four times worse than the initial figure, it is still 2.4 times faster than a single core. Sorry about that.[/QUOTE]Still a very interesting development. Congrats on the factors. 
Since my first experiments, I've been playing with a version which uses 512-bit arithmetic (fudged with CFLAGS += -DNB_DIGITS=16 in the relevant line of the Makefile). As expected, ECM runs around 3 times faster on ~500-bit numbers with this change.
One of the things on my to-do list is to add greater flexibility to the choice of bignum sizes. Experiments with both 1024- and 512-bit arithmetic indicate that running more than the default number of curves is a Good Thing, presumably by hiding memory latency. The downside, of course, is that the display stays rather sluggish for a proportionately long time. I'm trying to estimate how long a run will take and then kick it off overnight, when display latency is likely to be unimportant.

Paul 
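For scale, assuming NB_DIGITS counts 32-bit words (my inference from "fixed kilobit arithmetic"; I have not checked the source), the fudge halves the operand width:

```python
DIGIT_BITS = 32                 # assumption: 32-bit limbs in the GPU code
default_digits = 32             # assumption: the stock "kilobit" build
fudged_digits = 16              # the CFLAGS += -DNB_DIGITS=16 override
print(default_digits * DIGIT_BITS, fudged_digits * DIGIT_BITS)  # prints: 1024 512
# Schoolbook modular multiplication is quadratic in the limb count, so one
# would expect up to (1024/512)^2 = 4x; the ~3x observed is in that ballpark.
```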
[QUOTE=xilman;289134]I'm trying to estimate how long a run will take and then kick it off overnight when display latency is likely to be unimportant.[/QUOTE]
I added a percent-complete counter in the for loop launching the kernels in cudautil.cu. I don't think adding an ETA would be difficult. 
[QUOTE=xilman;289065]In case it is not clear to bystanders, this code is [b]not[/b] fire and forget. It is [b]not[/b] production quality.
If you want to use it, you will need to get your hands dirty.[/QUOTE]I totally agree with you. However, allow me to point out that when I present a similar attitude toward the learning of the algorithms discussed herein, and the mathematics behind them, I am lambasted for my efforts. Participants should be willing to put in the effort or they should leave. 
[QUOTE=R.D. Silverman;289203]I totally agree with you.
However, allow me to point out that when I present a similar attitude toward the learning of the algorithms discussed herein and the mathematics behind them, I am lambasted for my efforts. Participants should be willing to put in the effort or they should leave.[/QUOTE]It seems to me that one difference is that there is a large amount of fire-and-forget code available, and that code is suited to the majority of the people here. [b]Only[/b] those who are prepared for all the frustrations of working at the bleeding edge have any great need to be able to build, debug and install alpha code from a Subversion repository.

Much of the mathematics discussed here is [b]not[/b] at the bleeding edge, IMO. It is closer in spirit to ofttimes cranky but nonetheless well understood and supported applications such as mainstream GMP-ECM. IMO, your diatribes against those wishing to perform bleeding edge mathematics are fully justified. They are less appropriate, again IMO, further away from the bleeding edge. I hope I would never feel the urge to issue my earlier warnings to those who only wish to use GMP-ECM and are confused by its jargon and multitudinous options. 
[QUOTE=xilman;289218]It seems to me that one difference is that there is a large amount of fire-and-forget code available and that code is suited to the majority of the people here. [b]Only[/b] those who are prepared for all the frustrations of working at the bleeding edge have any great need to be able to build, debug and install alpha code from a subversion repository.[/QUOTE]We agree. Indeed. I have even heard one of the people (whom I hold in contempt) admit that he does not even know how to use a compiler.

[QUOTE]Much of the mathematics discussed here is [b]not[/b] at the bleeding edge, IMO. It is closer in spirit to ofttimes cranky but nonetheless well understood and supported applications such as mainstream gmp-ecm.[/QUOTE]And from my point of view, too many of the participants herein do not understand things even at that level. Nor do they seem willing to make the attempt. They don't even understand mathematics that was known 150+ years ago. Nor do they want to make the effort. 
[QUOTE=xilman;289134]Experiments with both 1024 and 512bit arithmetic indicate that running more than the default number of curves is a Good Thing[/QUOTE]Another data point shows that even choosing the correct default number is significant.
Out of the box (well, my box anyway) the default build appears to use parameters suitable for a CC 1.3 system, despite there being a Fermi card installed. A run on a C302 with these parameters chose 112 curves, arranged 32x16 x 7x1x1, and took 3845.428 seconds. Rebuilding with "make cc=2" and rerunning took 5539.049 seconds for 224 curves, arranged 32x32 x 7x1x1. The ratio (224/112) * (3845.428 / 5539.049) is 1.388. I suggest a 39% speedup is worth having. 
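The quoted ratio is simply the two builds' throughputs (curves per second) compared, using the numbers from the post above:

```python
cc13_curves, cc13_secs = 112, 3845.428   # default build (CC 1.3 parameters)
cc20_curves, cc20_secs = 224, 5539.049   # rebuilt with "make cc=2"
ratio = (cc20_curves / cc20_secs) / (cc13_curves / cc13_secs)
print(round(ratio, 3))   # prints: 1.388 -- i.e. ~39% more throughput
```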
A few quick tests with a small B1 value
CC 2.0 card (GTX 470, stock clocks), 512-bit arithmetic, CUDA SDK 4.0. The c151 was taken from the Aliquot sequence 890460:i898.
[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=24351435, with 448 curves
gpu_ecm took : 116.363s (0.000+116.355+0.008)
Throughput : 3.850
[/CODE]Doubling the number of curves improves the throughput:
[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 896 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1471710578, with 896 curves
gpu_ecm took : 179.747s (0.000+179.731+0.016)
Throughput : 4.985
[/CODE]32 curves fewer, and the throughput increases by another 30%:
[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 864 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1374804691, with 864 curves
gpu_ecm took : 130.964s (0.000+130.948+0.016)
Throughput : 6.597
[/CODE]The throughput on a CC 2.1 card (GTX 460, 725 MHz factory OC) for the same number:
[CODE] 224 curves - Throughput : 2.289
 416 curves - Throughput : 4.223
 448 curves - Throughput : 4.547
 480 curves - Throughput : 3.039
 672 curves - Throughput : 4.233
 896 curves - Throughput : 4.638
1792 curves - Throughput : 4.753
[/CODE] 
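Tabulating the curve-count sweeps quoted above (my own collation of those numbers): on both cards the throughput is far from monotone in the curve count, so sweeping is worthwhile:

```python
# curves -> Throughput (curves/s), copied from the runs quoted above
gtx470 = {448: 3.850, 896: 4.985, 864: 6.597}
gtx460 = {224: 2.289, 416: 4.223, 448: 4.547, 480: 3.039,
          672: 4.233, 896: 4.638, 1792: 4.753}
for name, data in (("GTX 470", gtx470), ("GTX 460", gtx460)):
    best = max(data, key=data.get)
    print(name, best, data[best])
# prints:
# GTX 470 864 6.597
# GTX 460 1792 4.753
```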
gpu_ecm ready to work
OK, I downloaded the source code with cc=1.3, and successfully compiled it :smile:
Sadly, I see differences between Xilman's and Ralf Recker's outputs. The executable passes the test. What does the (required) parameter N on the command line represent? All I can see is that it has to do with the xfin, zfin and xunif parameters, and should be odd... I also tried ./gpu_ecm 9699691 11000 -n 1 < in, where in contains the number 65798732165875434667. I got the factor 347, which is not a factor of the input number... To testify to my good will:
[code]
./gpu_ecm 9699691 11000 -n 1 < in
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 0 : GeForce GTX 275, compute capability 1.3, 30 MPs.
#gpu_ecm launched with : N=9699691 B1=11000 curves=1 firstsigma=11
#used seed 1329332970 to generate sigma
#Begin GPU computation...
#All kernels launched, waiting for results...
#All kernels finished, analysing results...
#Looking for factors for the curves with sigma=11 xfin=3111202 zfin=7720056
#Factor found : 347 (with z)
#Results : 1 factor found
#Temps gpu : 15.080 init=0.040 computation=15.040
[/code]
Now, I understand that the program is not "fire and forget", and I would really, REALLY like to know more about it, but the interface is not documented, the usage differs from gmp-ecm, and in the link posted by Jason there is no indication that a README file is present anywhere in the trunk. Would you mind (now that my hands have been contaminated by bits and compilers) shedding some light on this obscure valley? Even a link explaining what N means in this context would suffice... :smile: Many thanks... Luigi P.S. after some more fiddling, I noticed that 347 is a factor of 9699691, so I think I got the meaning of N after all... :redface: With N3 and 448 curves, my GTX 275 has the same speed as my Intel i5-750.
That directory was not the trunk, [url="https://gforge.inria.fr/scm/viewvc.php/trunk/?root=ecm"]this[/url] is, complete with lots of readme files.

The usage is simple (type ./gpu_ecm -h for help). Try:
./gpu_ecm -n 1 11000 < in
B1 is the last parameter. gpu_ecm will of course run more than one curve anyway ;) BTW: Prime numbers don't factor very well :smile:
[QUOTE=jasonp;289470]That directory was not the trunk, [url="https://gforge.inria.fr/scm/viewvc.php/trunk/?root=ecm"]this[/url] is, complete with lots of readme files.[/QUOTE]
Thank you Jason, I skimmed around there when I noticed I had some problem... It wasn't at all meant as a remark about your pointer. :smile: Luigi
[QUOTE=Ralf Recker;289473]The usage is simple (type ./gpu_ecm -h for help). Try:
./gpu_ecm -n 1 11000 < in
B1 is the last parameter. gpu_ecm will of course run more than one curve anyway ;) BTW: Prime numbers don't factor very well :smile:[/QUOTE] Then we definitely have different versions :sad: When I try that command I get an "Error in call function: wrong number of arguments." message. The "usage" reports "./gpu_ecm N B1 [-s firstsigma] [-n number of curves] [-d device]". Thank you anyway, I'm doing my own tests to get acquainted with the wonderful new program. :smile: Luigi
[QUOTE=ET_;289479]Then we definitely have different versions :sad:
When I try that command I get a "Error in call function: wrong number of arguments." message. The "usage" reports "./gpu_ecm N B1 [ s firstsigma ] [ n number of curves ] [ d device ]". Thank you anyway, I'm doing my own tests to get acquainted with the new wonderful program. :smile: Luigi[/QUOTE] I used the program in the gpu_ecm subdirectory, not that in the gpu_ecm_cc13 subdirectory. The command line for the other version would be: ./gpu_ecm 65798732165875434667 11000 but like I said: I would try to factor another number ;) 
[QUOTE=Ralf Recker;289480]I used the program in the gpu_ecm subdirectory, not that in the gpu_ecm_cc13 subdirectory.[/QUOTE]
I guessed it when I noticed that your work was done on cc=2.0 and cc=2.1, but for some reason I thought that the repository was updated with the same code, apart from the cc details... Thank you all, I (think I) understood how the program works. I have been a bit naïve in thinking that an alpha version would keep the same user interface as the trunk. The problem was on my side, between the monitor and the chair... Next time I will turn on my gray cells before writing. (and please excuse me for the multiple postings) Luigi
[QUOTE=ET_;289481]I guessed it when I noticed that your work was done on cc=2.0 and cc=2.1, but for some reason I thought that the repository was updated with the same code, apart from the cc details...[/QUOTE]
You should be able to compile and run both versions on your CC 1.3-capable card (you can type make cc=2 in the gpu_ecm subdirectory, if you want a Fermi build).
I've switched to CC 2.0 compilation as well, and the default number of curves has risen from 32 to 64, the same change xilman saw above.
I haven't yet seen a mention of non-power-of-two NB_DIGITS in this thread... therefore, I tried it, even if I have no idea whether it should work :smile: Well, at least, it does not seem to fail horribly:
* the resulting executable doesn't crash;
* the size of the executable is between the size of the 512-bit version and the size of the 1024-bit version;
* on both a C211 and a C148, the 768-bit version is faster than the 1024-bit-arithmetic version:
[code]$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ./gpu_ecm_24 -vv -save 76663_210_ecm24_3e6 3000000
#Compiled for a NVIDIA GPU with compute capability 2.0.
#Will use device 0 : GeForce GT 540M, compute capability 2.1, 2 MPs.
#s has 4328086 bits
Precomputation of s took 0.252s
Input number is 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 (212 digits)
Using B1=3000000, firstinvd=435701810, with 64 curves
...
gpu_ecm took : 1444.690s (0.000+1444.686+0.004)
Throughput : 0.044

$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ./gpu_ecm_32 -vv -save 76663_210_ecm32_3e6 3000000
...
gpu_ecm took : 1814.801s (0.000+1814.797+0.004)
Throughput : 0.035[/code]
[code]for i in 16 24 32; do echo 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 | "./gpu_ecm_$i" -vv -save "80009_213_ecm${i}_3e6" 3000000; done
...
gpu_ecm took : 865.578s (0.000+865.574+0.004)
Throughput : 0.074
...
gpu_ecm took : 1707.302s (0.000+1707.298+0.004)
Throughput : 0.037
...
gpu_ecm took : 2044.451s (0.000+2044.447+0.004)
Throughput : 0.031
[/code]
Comparison against CPU GMP-ECM running on 1 hyperthread of a Sandy Bridge i7, whose other 7 hyperthreads are used to the max as well:
[code]$ echo 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 | ecm -c 1 3e6
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 7666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666666663 (211 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1718921992
Step 1 took 34590ms
Step 2 took 11536ms[/code]
[code]$ echo 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 | ecm -c 1 3e6
GMP-ECM 6.5-dev [configured with GMP 5.0.90, --enable-asm-redc, --enable-assert] [ECM]
Input number is 3068628376360794912078530386432442844396649484227245118385713667577336042284107359110543525586164007547649873239035755922916136752709773803297694127 (148 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3766168691
Step 1 took 21521ms
Step 2 took 8016ms[/code]
For composites of those sizes, the GT 540M can beat one hyperthread of an i7-2670QM if the CPU is busy, but not if the CPU is idle.
Looking at the source, does NB_DIGITS really need to be a power of two? I haven't thought carefully about the memory access patterns, but it doesn't seem to need to be. And does it really need to be a compile-time constant? If the answer to both is no, then the code could adjust NB_DIGITS to the minimum needed for a particular number. Without doing any profiling, I suspect that for numbers much smaller than the maximum allowed for a particular NB_DIGITS, a lot of time is spent spinning in the comparison function.
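To put a rough number on that: with schoolbook multiplication the work grows with the square of the limb count, so a fixed kilobit build handling a 512-bit input does about four times the word multiplies it actually needs. A back-of-envelope model, not a profile of the real code:

```python
# Back-of-envelope cost of fixed kilobit arithmetic on smaller inputs,
# assuming schoolbook multiplication (~limbs^2 word multiplies).
def limbs(bits, word_bits=32):
    return -(-bits // word_bits)  # ceiling division

fixed = limbs(1024)    # 32 limbs in the fixed kilobit build
for n_bits in (512, 768, 1024):
    waste = (fixed / limbs(n_bits)) ** 2
    print(f"{n_bits:4d}-bit input: ~{waste:.1f}x the multiplies needed")
```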

I must have done something wrong... (Ubuntu 10.04 on a VirtualBox, can't install the NVIDIA driver)
[code]
./gpu_ecm 155 11000 -s 11 -n 5
#Compiled for a NVIDIA GPU with compute capability 1.3.
#Will use device 1 : �@, compute capability 18927808.0 (you should compile the program for this compute capability to be more efficient), 0 MPs.
#gpu_ecm launched with : N=155 B1=11000 curves=5 firstsigma=11
#Begin GPU computation...
#All kernels launched, waiting for results...
#All kernels finished, analysing results...
#Looking for factors for the curves with sigma=11 xfin=4 zfin=1 xunif=4
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=12 xfin=132 zfin=1 xunif=132
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=13 xfin=153 zfin=1 xunif=153
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=14 xfin=1 zfin=1 xunif=1
#No factors found. You shoud try with a bigger B1.
#Looking for factors for the curves with sigma=15 xfin=80 zfin=1 xunif=80
#No factors found. You should try with a smaller B1
#Results : No factor found
#Temps gpu : 0.050 init=0.000 computation=0.050
[/code]
How come zfin is always 1?
Hey
As said above, the more recent GPU-ECM program is in the gpu_ecm subdirectory (and NOT gpu_ecm_cc13, even for cards of compute capability 1.3). The NB_DIGITS stuff is still highly experimental and will most of the time either crash or return wrong results. Only the 1024-bit arithmetic (the default case) is working for now. But any feedback from experiments with NB_DIGITS is welcome. Cyril
1 Attachment(s)
I searched on the web a little more and I found that VirtualBox can't handle CUDA. So I installed a proper Ubuntu, and now it seems to work.
Another problem I found is that a found factor can be repeated... multiple times. A short excerpt, running ./gpu_ecm -n 100 -vv 1000 < c50 > test.txt
[code]
#Compiled for a NVIDIA GPU with compute capability 2.0.
#Will use device 0 : GeForce GTX 560, compute capability 2.1, 7 MPs.
#s has 1438 bits
Precomputation of s took 0.000s
Input number is 35969183562720316973971642318240294003662279400539 (50 digits)
Using B1=1000, firstinvd=3505000919, with 96 curves
8+64*d=2290718356455159082183201030488769570180030093771
8+64*d=7718354218174420648608739771694329402521941399016
#Begin GPU computation...
Block: 32x32x1 Grid: 3x1x1
#Looking for factors for the curves with (d*2^32) mod N = 3505000919
xfin=29784982820960298529336575761344388540459270300965
zfin=34069908628801445601848672255228786699573682666789
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505000919
xunif=29784982820960298529336575761344388540459270300965
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505000920
********** Factor found in step 1: 3
Found probable prime factor of 1 digits: 3
Composite cofactor 11989727854240105657990547439413431334554093133513 has 50 digits
Factor found with (d*2^32) mod N = 3505000921
********** Factor found in step 1: 159
Found composite factor of 3 digits: 159
Composite cofactor 226221280268681238830010329045536440274605530821 has 48 digits
[snip]
Factor found with (d*2^32) mod N = 3505001007
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001008
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001009
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001010
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001011
********** Factor found in step 1: 78069
Found composite factor of 6 digits: 78069
Composite cofactor 460735805027864030203687024532660774490031631 has 45 digits
Factor found with (d*2^32) mod N = 3505001012
********** Factor found in step 1: 1473
Found composite factor of 4 digits: 1473
Composite cofactor 24418997666476793600795412300231021047971676443 has 47 digits
Factor found with (d*2^32) mod N = 3505001013
#Looking for factors for the curves with (d*2^32) mod N = 3505001014
xfin=27705706019293510342398168717956449375083997353432
zfin=7141519035842313136767879128655301632201227912190
********** Factor found in step 1: 159
Found composite factor of 3 digits: 159
Composite cofactor 226221280268681238830010329045536440274605530821 has 48 digits
Factor found with (d*2^32) mod N = 3505001014
xunif=27705706019293510342398168717956449375083997353432
gpu_ecm took : 0.800s (0.000+0.800+0.000)
Throughput : 120.000
[/code]
Another question: is sigma random?
Small factors will be found with very many sigma values, and it doesn't matter whether random or sequential values are used for sigma.
Pre-factor your input number (with yafu, for example) until it becomes interesting to run ECM on it.
[QUOTE=Cyril;290764]Hey
As said above, the more recent GPU-ECM program is in the gpu_ecm subdirectory (and NOT gpu_ecm_cc13, even for cards of compute capability 1.3). The NB_DIGITS stuff is still highly experimental and will most of the time either crash or return wrong results. Only the 1024-bit arithmetic (the default case) is working for now. But any feedback from experiments with NB_DIGITS is welcome. Cyril[/QUOTE]Hi Cyril, welcome aboard! Thanks for the warning. I'd not seen any difficulties myself with halving NB_DIGITS, but it's good to know that I'm on dangerous ground. Paul
Can someone answer a theory question on the ECM algorithm, please.
I know it is normal to run curves one level at a time to remove small factors. But does the presence of small factors inhibit higher-level curves from finding big factors? Practically speaking, if gmp_ecm runs at the same speed for all numbers, regardless of whether they are 769 bits or 1018 bits, then could I just run a set of curves at B1=260e6, without bothering to remove the small factors? For example, if I run (2^1009+1) through the program at B1=260000000, nearly every curve will find factors. Are those small factors merely noise, or do the small factors prevent the algorithm from finding factors of the big C244? I'm talking ONLY stage 1 here. Thanks in advance.
The factor here is *time*: if 1 curve takes 10 sec at B1=11e3, it will take 1000 sec at B1=1e6 (or close). I would rather do 200 curves at B1=11e3 and find *all* the 15-digit factors than 3 curves at B1=1e6 to find them. And with a large B1 you are more prone to find a *composite* factor, which you will then have to... factor.
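That roughly linear scaling follows from the shape of stage 1: the exponent is the product of prime powers up to B1, which by the prime number theorem has about B1/ln 2 bits, so the work grows roughly in proportion to B1. A quick check of firejuggler's figures:

```python
# Stage 1 cost is roughly linear in B1: the exponent has ~B1/ln(2) bits.
import math

t_small, b1_small, b1_large = 10.0, 11e3, 1e6   # figures from the post above
t_large = t_small * (b1_large / b1_small)
print(f"~{t_large:.0f} s at B1=1e6")            # ~909 s, "1000 sec (or close)"
print(f"stage 1 exponent: ~{b1_large / math.log(2):.3g} bits")
```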

[QUOTE=firejuggler;294599]the factor here is *time* ...[/QUOTE]
Thank you. I understand all that. But, I'm really asking a deeper question about the theoretical behavior of the ECM algorithm. 
[QUOTE=rcv;294605]Thank you. I understand all that.
But, I'm really asking a deeper question about the theoretical behavior of the ECM algorithm.[/QUOTE] I believe firejuggler's answer did address at least one aspect of the theoretical behavior you're talking about. If you run ECM with a high B1 when there are small factors, those small factors don't in any way prevent the algorithm from finding larger factors. But if it does find a large factor, it will probably also find the small factors, so the result of the algorithm will be a composite number which is the product of all the prime factors found. In other words, it will find large factors with the same probability (per curve) as if there were no small factors. But it may not give them to you in a form you want. And of course, as firejuggler pointed out, the total time you take to find factors will likely be much greater than if you first ran curves with lower B1 to find and divide out the small factors. But subject to your hypothetical ("if gmp_ecm runs at the same speed for all numbers, regardless of whether they are 769 bits or 1018 bits"), then it will find the large factors just as quickly whether or not the small factors are there. It just won't find them alone. Does that answer your question adequately, or are you looking for something else? 
[QUOTE=jyb;294608]... it will find large factors with the same probability (per curve) as if there were no small factors. ...
Does that answer your question adequately, or are you looking for something else?[/QUOTE] Thank you. That's *exactly* what I was asking. 
Result
[code]********** Factor found in step 1: 123606794672656155230910235277199616421317
Found probable prime factor of 42 digits: 123606794672656155230910235277199616421317
Probable prime cofactor 14866190468551593309306643619217939222105550143777577617032523708180871799254017557065876729519990261631832681060060230768948753357462444734554678109987149185755720805584735539302523326567649552404524657646651599763316335399327139435288373515331 has 245 digits
Factor found with (d*2^32) mod N = 1001097
[/code]The input number was the 286-digit cofactor of 296*11^296+1, aka GC(11,296). The factor was found on the 1098th curve of a batch of 1120. Two things are especially noteworthy. You don't often see p42 factors found in stage 1, even with B1 as high as 110M. The other can be seen in this snippet if you know what you are looking for.
[code]time GPU: 209474.766s
time CPU: 716.798s (0.001+716.737+0.060)
Throughput: 0.005

real    3491m28.677s
user    8m59.778s
sys     3m4.908s
[/code]This was my first production run with gpu_ecm incorporating a patch provided by Rocke Verser (rcv of this parish) which replaced the old busy-wait behaviour. System cpu time fell from an expected 50+ hours to a little over three minutes. Very impressive in my view. Thanks Rocke. Paul
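The scale of that saving makes sense: a polling loop charges essentially the whole kernel runtime to the CPU, while a blocking wait charges almost nothing. A language-agnostic illustration in plain Python threads (nothing to do with the actual CUDA API or with Rocke's patch):

```python
# Busy-waiting vs. blocking on a completion signal: the spin loop burns
# CPU for the whole wait, the blocking wait costs almost nothing.
import threading, time

def wait_for(seconds, block):
    done = threading.Event()
    threading.Timer(seconds, done.set).start()  # the "kernel" finishing
    t0 = time.process_time()
    if block:
        done.wait()                  # scheduler puts this thread to sleep
    else:
        while not done.is_set():     # spin, like the old kernel wait
            pass
    return time.process_time() - t0

cpu_spin = wait_for(0.2, block=False)
cpu_block = wait_for(0.2, block=True)
print(f"spin: {cpu_spin:.3f}s CPU, block: {cpu_block:.3f}s CPU")
```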
[QUOTE=rcv;294605]Thank you. I understand all that.
But, I'm really asking a deeper question about the theoretical behavior of the ECM algorithm.[/QUOTE] Read my joint paper with Sam Wagstaff: A Practical Analysis of ECM. 
[QUOTE=xilman;295541]Thanks Rocke.[/QUOTE]
You're welcome. It's not that I did anything great. It's more that NVIDIA's CUDA drivers are disrespectful of the CPU. When trying to keep the GPU busy, special care must be taken to prevent them from wasting an entire core in spin loops. [I've now used the technique on three separate CUDA programs, with significant reductions in CPU wastage in each case.] BTW, Cyril also has the patch, and hints that it will be in a future release of his code. [QUOTE=R.D. Silverman;295562]Read my joint paper with Sam Wagstaff: A Practical Analysis of ECM.[/QUOTE] Thanks for the reference! For others who need a more complete reference, the paper was published in Mathematics of Computation, July 1993, v61, n203, pp. 445-462. The paper is presently available online at [URL]http://www.ams.org/journals/mcom/1993-61-203/S0025-5718-1993-1122078-7/[/URL] While a little dated (the largest ECM factors discovered at the time of its writing were 38 digits), I would recommend the paper to those interested in a derivation of the various parameters used by GMP-ECM and other factoring software. While the paper is interesting, I fail to see that it directly answers my question. Table 3, for example, gives L, the expected number of curves to find a factor of a given size. However, the assumptions that lead to that derivation are not true in the theoretical question I was asking. As with other papers and articles I have read concerning ECM, a basic assumption is that you factor out the small factors as you find them. My question was more "...but what happens if you don't remove the small factors?" While not referring to my specific theoretical question, the second Remark on page 460 seems to acknowledge that analysis is difficult when the number being factored "does not have the same expected smoothness properties of some other arbitrary integer of about the same size." I will re-read the paper in more depth over the next few days.
In SVN HEAD, the CPU usage was greatly reduced (thanks to rcv's patch, I guess), and the PRINT_REMAINING_ITER define is interesting for making the program's progress more observable. Great :smile:
I've just made a trivial patch for adding timestamps to the output of GPU-ECM, which makes it easier (for me, at least) to estimate when the job will finish. Here it is, in case someone else is interested:
[code]
Index: cudakernel.cu
===================================================================
--- cudakernel.cu (révision 1908)
+++ cudakernel.cu (copie de travail)
@@ -191,7 +191,14 @@
       if (j < 1000000) jmod = 100000;
       if (j < 100000) jmod = 10000;
       if (j % jmod == 0)
-        printf("%lu iterations to go\n", j);
+        {
+          char buf[32];
+          time_t curtime = time(NULL);
+
+          strcpy(buf, ctime(&curtime));
+          *(strchr(buf, '\n')) = 0;
+          printf("%s %lu iterations to go\n", buf, j);
+        }
 #endif
 }
[/code]
(unsurprisingly, the timestamping code is basically the same as in msieve)
No more cc1.3 for GPUECM
Is it true? :sad:
Luigi 
[QUOTE=ET_;296928]Is it true? :sad:
Luigi[/QUOTE]First I've heard of it. Have you mailed Cyril? 
[QUOTE=xilman;296944]First I've heard of it. Have you mailed Cyril?[/QUOTE]
I read this advice yesterday in a README file located in the GPU-ECM trunk of the svn. But the makefile still had the comment to switch compilation between cc=1.3 and cc=2.0 (2.0 by default). I asked here because it's the easiest thing for me to do, as I don't know the developers. Luigi
[QUOTE=ET_;296928]Is it true? :sad:
Luigi[/QUOTE] Yes it is. The CUDA code in the svn repository requires a GPU with compute capability at least 2.0 and the CUDA toolkit version should be at least 4.1. Cyril 
[QUOTE=Cyril;297208]Yes it is. The CUDA code in the svn repository requires a GPU with compute capability at least 2.0 and the CUDA toolkit version should be at least 4.1.
Cyril[/QUOTE] Thank you for your answer Cyril. It's time I get a new graphic board... Luigi 
[QUOTE=Cyril;297208]Yes it is. The CUDA code in the svn repository requires a GPU with compute capability at least 2.0 and the CUDA toolkit version should be at least 4.1.
Cyril[/QUOTE]:sad: Perhaps it's time I started adding more code to the reversion suppository. I've a Tesla C1060 which is still a nice compute engine despite being CC 1.x 
ECM mailing list
Sorry to pop out here again...
Is the development of GPU-ECM described in the GMP-ECM mailing list? May I be (read-only) added to it, in case? Luigi
[QUOTE=ET_;312502]Is the development of GPU-ECM described in the GMP-ECM mailing list?
May I be (read-only) added to it, in case? Luigi[/QUOTE] You can find the gmp-ecm web page here: [url]https://gforge.inria.fr/projects/ecm/[/url] From there you can click the "Lists" link, and then select the subscribe/unsubscribe link for the "ecm-discuss" list. Also, from the "Lists" link, you can click on "ecm-discuss Archives" to view all previous messages sent to the list. I would say that GPU-ECM can be discussed there, but I don't see much discussion of it. The first/last burst of traffic about GPU-ECM looks to be around March 2012. Also, a request for anyone capable: can someone post a Windows x64 binary of GPU-ECM? I would really like to try running this, but I can't seem to build it on my own. I cannot get it to compile with MinGW-w64, and my skills with Visual Studio are nonexistent, so I never seem to be able to build one there. I understand that this is considered beta/non-production code for now, but I would still like to try running it. Thanks to anyone who can help.
[QUOTE=WraithX;312572]You can find the gmpecm web page here: [url]https://gforge.inria.fr/projects/ecm/[/url]
From there you can click the "Lists" link, and then select the subscribe/unsubscribe link for the "ecm-discuss" list. Also, from the "Lists" link, you can click on "ecm-discuss Archives" to view all previous messages sent to the list. I would say that GPU-ECM can be discussed there, but I don't see much discussion of it. The first/last burst of traffic about GPU-ECM looks to be around March 2012. [/QUOTE] Thank you very much. :bow: In fact I'd expect to see runs, tables and benchmarks here, as well as tests done... Luigi
Nailing bugs.
A very long-standing problem with the GPU version on my machine is that the second test in "make check" fails by finding the input number. I returned to the issue today.
I've not fixed the bug but have characterized it better, and probably have a workaround. The failure appears to be in how stage 2 is set up after running stage 1 on the GPU. If only stage 1 is run and a save file created, that file can be used successfully to complete the factorization. The proposed workaround should now be obvious. When I better understand the stage 1 to stage 2 conversion routines I'll try to fix the bug properly. My best guess is that something is not being initialised properly from a default zero value. Another trivial bug, which prevented compilation under CUDA, was also fixed in the latest SVN (2310). Cyril has been informed about these developments.
[QUOTE=xilman;322498]A very long standing problem with the GPU version on my machine is that the second test in "make check" fails by finding the input number. I returned to the issue today.
I've not fixed the bug but have characterized it better, and probably have a workaround. The failure appears to be in how stage 2 is set up after running stage 1 on the GPU. If only stage 1 is run and a save file created, that file can be used successfully to complete the factorization. The proposed workaround should now be obvious. When I better understand the stage 1 to stage 2 conversion routines I'll try to fix the bug properly. My best guess is that something is not being initialised properly from a default zero value. Another trivial bug, which prevented compilation under CUDA, was also fixed in the latest SVN (2310). Cyril has been informed about these developments.[/QUOTE] Indeed, I'm working on it. During the process of correcting these bugs, I found out that the GPU code is not correct when compiled with CUDA 5.0 (at least on my machine) but it works when compiled with CUDA 4.2. Cyril
After looking carefully at the assembly code produced by CUDA 5.0, I found a bug in CUDA 5.0 and reported it to the Nvidia people. They are looking into it. In the meantime [U][B]I recommend not using CUDA 5.0 to compile the GPU version of GMP-ECM.[/B][/U] (If you manage to successfully run make check with a version compiled with CUDA 5.0, I'll be interested to hear about it.)
Cyril 
I noticed that on [url]https://gforge.inria.fr/scm/viewvc.php/trunk/?root=ecm[/url] there is the ecm /trunk corresponding to version 6.4.2 of the source code, while on [url]http://www.loria.fr/~zimmerma/records/ecmnet.html[/url] the stable version is still 6.2.2. [URL="https://gforge.inria.fr/frs/?group_id=135&release_id=7362#ecm_6.4.3btitlecontent"]This[/URL] page reports the source code of version 6.4.3 [U]without[/U] GPU extensions, and there are other precompiled executables around the forum.
What should I do if I need the complete 6.4.3 source code [U]with[/U] the GPU extensions? Luigi 
The last release is 6.4.3, but it does not contain any GPU code. The GPU version of GMP-ECM is, for now, only in the development version of GMP-ECM. If you want to try it you should download the svn repository. But be aware that, being a development version, it should not be used for important computations but only for testing.
Cyril 
[QUOTE=Cyril;324284]The last release is the 6.4.3 but it does not contain any GPU code. The GPU version of GMPECM is only, for now, in the development version of GMPECM. If you want to try it you should download the svn repository. But be aware that being a development version, it should not be used for important computations but only for test.
Cyril[/QUOTE] Thank you Cyril, I understand. Luigi 
[QUOTE=Cyril;324247]After looking carefully at the assembly code produced by CUDA 5.0, I found a bug in CUDA 5.0 and report it to Nvidia people. They are looking into it. In the meantime [U][B]I recommend not to use CUDA 5.0 to compile the GPU version of GMPECM.[/B][/U] (if you manage to succefully run make check with a version compile with CUDA 5.0, i'll be interested to hear about it).
Cyril[/QUOTE] From revision 2342, CUDA 5.0 can be used to compile GMP-ECM for GPU. The "bug" was that I used the carry flag inside an assembly statement not protected by __volatile__. This did not raise any problem with CUDA 4.2, but was incorrect when compiled with CUDA 5.0. Cyril
[QUOTE=xilman;322498]A very long standing problem with the GPU version on my machine is that the second test in "make check" fails by finding the input number. I returned to the issue today.
I've not fixed the bug but have characterized it better, and probably have a workaround. The failure appears to be in how stage 2 is set up after running stage 1 on the GPU. If only stage 1 is run and a save file created, that file can be used successfully to complete the factorization. The proposed workaround should now be obvious. When I better understand the stage 1 to stage 2 conversion routines I'll try to fix the bug properly. My best guess is that something is not being initialised properly from a default zero value. Another trivial bug, which prevented compilation under CUDA, was also fixed in the latest SVN (2310). Cyril has been informed about these developments.[/QUOTE] Can you try to run make check with the GPU code enabled, with SVN 2396? Does the error still happen? Cyril
[QUOTE=Cyril;329114]Can you try to run make check with the GPU code enabled, with SVN 2396? Does the error still happen?
Cyril[/QUOTE] I just did it, SVN 2478. I'm afraid the bug is still there :sad: It works fine when a factor is found in Step 1:
[code]
luigi@luigiubuntu:~/luigi/CUDA/gpuecm/trunk$ echo 2432902008176640001 | ./ecm -gpu -v 1000
GMP-ECM 7.0-dev [configured with MPIR 2.5.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Running on luigiubuntu
Input number is 2432902008176640001 (19 digits)
Using MODMULN [mulredc:0, sqrredc:0]
Computing batch product (of 1438 bits) of primes below B1=1000 took 0ms
GPU: compiled for a NVIDIA GPU with compute capability 2.0.
GPU: will use device 0: GeForce GTX 580, compute capability 2.0, 16 MPs.
GPU: Selection and initialization of the device took 116ms
Using B1=1000, B2=51606, sigma=3:2200948026-3:2200948537 (512 curves)
dF=32, k=6, d=240, d2=7, i0=2
Expected number of curves to find a factor of n digits:
35      40      45      50      55      60      65      70      75      80
1.3e+11 Inf     Inf     Inf     Inf     Inf     Inf     Inf     Inf     Inf
Computing 512 Step 1 took 72ms of CPU time / 2528ms of GPU time
Throughput: 202.555 curves by second (on average 4.94ms by Step 1)
********** Factor found in step 1: 20639383
Found probable prime factor of 8 digits: 20639383
Probable prime cofactor 117876683047 has 12 digits
********** Factor found in step 1: 117876683047
Found input number N
[/code]
But when I lower the B1 parameter, I get this:
[code]
luigi@luigiubuntu:~/luigi/CUDA/gpuecm/trunk$ echo 2432902008176640001 | ./ecm -gpu 20
GMP-ECM 7.0-dev [configured with MPIR 2.5.1, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is 2432902008176640001 (19 digits)
Using B1=20, B2=210, sigma=3:2107982176-3:2107982687 (512 curves)
Computing 512 Step 1 took 72ms of CPU time / 157ms of GPU time
********** Factor found in step 2: 2432902008176640001
Found input number N
[/code]
Luigi
If anyone is still looking for Windows binaries of the GPU version, I posted some at [url]http://www.mersenneforum.org/showpost.php?p=338530&postcount=9[/url].

[QUOTE=mklasson;338532]If anyone is still looking for windows binaries of the gpu version I posted some at [URL]http://www.mersenneforum.org/showpost.php?p=338530&postcount=9[/URL].[/QUOTE]
Thanks for the upload! I'm having trouble. It finds the input number in stage 2 sometimes, but it never finds a factor. Rarely it will find a factor in stage 1. It doesn't seem to be using stage 2 right. What am I doing wrong? [CODE]gpu_ecm.exe -n -v -gpu -gpucurves 32 -one -c 32 11000 <factorme.txt >> log.txt
pause[/CODE]Finding input number in stg2: [CODE]
GMP-ECM 7.0-dev [configured with MPIR 2.6.0, --enable-gpu] [ECM]
Input number is (3^467-1)/2 (223 digits)
Using MODMULN [mulredc:0, sqrredc:1]
Computing batch product (of zu bits) of primes below B1=0 took 0ms
GPU: compiled for a NVIDIA GPU with compute capability 2.0.
GPU: will use device 0: GeForce GT 440, compute capability 2.1, 2 MPs.
GPU: Selection and initialization of the device took 0ms
Using B1=11000, B2=1873422, sigma=3:3900211286-3:3900211317 (32 curves)
dF=256, k=3, d=2310, d2=13, i0=8
Expected number of curves to find a factor of n digits:
35      40        45        50   55   60   65   70   75   80
4298501 2.8e+008  2.2e+010  Inf  Inf  Inf  Inf  Inf  Inf  Inf
Computing 32 Step 1 took 93ms of CPU time / 2143ms of GPU time
Throughput: 14.934 curves by second (on average 66.96ms by Step 1)
Using 27 small primes for NTT
Estimated memory usage: 1800K
Initializing tables of differences for F took 0ms
Computing roots of F took 0ms
Building F from its roots took 16ms
Computing 1/F took 0ms
Initializing table of differences for G took 0ms
Computing roots of G took 0ms
Building G from its roots took 0ms
Computing roots of G took 0ms
Building G from its roots took 15ms
Computing G * H took 0ms
Reducing G * H mod F took 16ms
Computing roots of G took 0ms
Building G from its roots took 0ms
Computing G * H took 0ms
Reducing G * H mod F took 16ms
Computing polyeval(F,G) took 15ms
Computing product of all F(g_i) took 0ms
Step 2 took 78ms
********** Factor found in step 2: 3270362983146927377028671682671960437107912062834199545091118347495738861506779345734946890846481108479144446587334081489280966903453000172689482880397175599299632714862220456046073976859568442978416930175676229727557533293
Found input number N
[/CODE]Finding a factor in stg1: [CODE]
GMP-ECM 7.0-dev [configured with MPIR 2.6.0, --enable-gpu] [ECM]
Input number is (3^467-1)/2 (223 digits)
Using MODMULN [mulredc:0, sqrredc:1]
Computing batch product (of zu bits) of primes below B1=0 took 0ms
GPU: compiled for a NVIDIA GPU with compute capability 2.0.
GPU: will use device 0: GeForce GT 440, compute capability 2.1, 2 MPs.
GPU: Selection and initialization of the device took 0ms
Using B1=11000, B2=1873422, sigma=3:2535707131-3:2535707162 (32 curves)
dF=256, k=3, d=2310, d2=13, i0=8
Expected number of curves to find a factor of n digits:
35      40        45        50   55   60   65   70   75   80
4298501 2.8e+008  2.2e+010  Inf  Inf  Inf  Inf  Inf  Inf  Inf
Computing 32 Step 1 took 140ms of CPU time / 2143ms of GPU time
Throughput: 14.933 curves by second (on average 66.97ms by Step 1)
********** Factor found in step 1: 27836167022857
Found probable prime factor of 14 digits: 27836167022857
Probable prime cofactor ((3^467-1)/2)/27836167022857 has 210 digits
[/CODE] 
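For what it's worth, the stage 1 result in the second transcript checks out. Here's a small Python verification (my own illustrative check, not part of GMP-ECM) that the reported 14-digit factor genuinely divides (3^467-1)/2, so GPU stage 1 looks fine and the oddity is confined to stage 2:

```python
# Verify the step-1 factor reported above for (3^467 - 1)/2.
n = (3**467 - 1) // 2
assert len(str(n)) == 223       # matches the "(223 digits)" in the output

p = 27836167022857              # the factor reported in step 1
assert n % p == 0               # it really divides the input
assert len(str(n // p)) == 210  # cofactor size matches "has 210 digits"
```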
Oh, right, I noticed some full Ns in stage 2 as well, but figured I was just unlucky...
You might be right that there's some problem though. Alas, I have no idea if it's specific to my build, or how to fix it if it is. 
[QUOTE=mklasson;338546]Oh, right, I noticed some full Ns in stage 2 as well, but figured I was just unlucky...
You might be right that there's some problem though. Alas, I have no idea if it's specific to my build, or how to fix it if it is.[/QUOTE] It is definitely not specific to your build. I ran into the same "feature" with my Linux build, and told the thread about it. Luigi 
[QUOTE=mklasson;338546]Oh, right, I noticed some full Ns in stage 2 as well, but figured I was just unlucky...
You might be right that there's some problem though. Alas, I have no idea if it's specific to my build, or how to fix it if it is.[/QUOTE]Run stage 1 on the GPU and create a save file. Continue from that file on the CPU and the factor should appear. This workaround seems to be the only way of getting factors right now. Cyril is aware of the problem and the workaround. Paul 
[QUOTE=xilman;338725]Run stage 1 on the gpu and create a save file. Continue from that file on the cpu and the factor should appear.[/QUOTE]
Ah, great, thanks! 
[QUOTE=mklasson;338762]Ah, great, thanks![/QUOTE]To clarify, a run has just finished here. I've snipped out the relevant lines so that you may see if you can reproduce them on your machine if you wish. The save file entry can be used as-is but if you want to try to reproduce it from a GPU run you will need to set sigma on the command line to the required value.[code]
echo 101482149355388048731487881935340331090889262212842924363713738989156240527153150278043500061215942991261583087754330974561238357811350557823367611213054527945175907871764674837850271132890282786375916516004587107515545577946886906206187414317583840986632315132836398937826193398451 | ecm -gpu -savea gpu.save -c 448 3000000 0 >> gpu.out
ecm -resume gpu.save 3000000-3000000 5706890290 > gpu2.out
[/code]The B2=5706890290 was taken from a preliminary run with ecm -v to discover what the default B2 should be. Snip from gpu.save:[code]METHOD=ECM; PARAM=3; SIGMA=449286141; B1=3000000; N=101482149355388048731487881935340331090889262212842924363713738989156240527153150278043500061215942991261583087754330974561238357811350557823367611213054527945175907871764674837850271132890282786375916516004587107515545577946886906206187414317583840986632315132836398937826193398451; X=0xd94b0c9842b63bdc4ddf6fe836281864cf5e4a6e32f441c357219ee321da4ba0a1a42a7096ce7b9191ea65ede910405e3a2d602d2a23ae403873a49689270fb6493d768d98601a20cf387f9cd155a2429608c2beb9a55198eca823b9fc787cdaed067fd137c7e60f66b21771ef80e59a12c064748; CHECKSUM=937322733; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=pcl@anubis.home.brnikat.com; TIME=Sat May 4 09:48:08 2013;[/code]Relevant portion of gpu2.out:[code]
Resuming ECM residue saved by pcl@anubis.home.brnikat.com with GMP-ECM 7.0-dev on Sat May 4 09:48:08 2013
Input number is 101482149355388048731487881935340331090889262212842924363713738989156240527153150278043500061215942991261583087754330974561238357811350557823367611213054527945175907871764674837850271132890282786375916516004587107515545577946886906206187414317583840986632315132836398937826193398451 (282 digits)
Using B1=3000000-3000000, B2=5706890290, polynomial Dickson(6), sigma=3:449286141
Step 1 took 0ms
Step 2 took 7098ms
********** Factor found in step 2: 445224571829374761288699666131147495198356727
Found probable prime factor of 45 digits: 445224571829374761288699666131147495198356727
Composite cofactor 227934745241957285881778043049404755009011820338900145654372601421491779248241846890768278353591918611426577107776860159203725842047017937949868936707925419578871783371101851788601523957788893378997332682158710312764704944917884656543013 has 237 digits
[/code] Good luck! Paul 
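Incidentally, the save-file entries quoted above are single-line records of semicolon-separated KEY=VALUE fields, so they're easy to script against. A minimal parser sketch (my own illustrative helper, not an official GMP-ECM API):

```python
def parse_ecm_savefile_line(line: str) -> dict:
    """Split one GMP-ECM save-file record into a KEY -> value dict."""
    fields = {}
    for chunk in line.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        key, _, value = chunk.partition("=")
        fields[key.strip()] = value.strip()
    return fields

# Example modelled on the gpu.save snippet above (N abridged here).
record = "METHOD=ECM; PARAM=3; SIGMA=449286141; B1=3000000; CHECKSUM=937322733;"
fields = parse_ecm_savefile_line(record)
assert fields["SIGMA"] == "449286141"
assert fields["B1"] == "3000000"
```

Something like this is handy for, say, collecting all sigma values from a batch of GPU runs before resuming them on the CPU.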
I achieve this by using -B2scale 0 on the GPU run, and then resume with the same B1 on the CPU stage 2. ECM runs the default B2 by doing so.
ecm -resume savefile.txt 43e6 >> output.txt does the trick for me. 
[QUOTE=VBCurtis;339717]I achieve this by using -B2scale 0 on the GPU run, and then resume with the same B1 on the CPU stage 2. ECM runs the default B2 by doing so.
ecm -resume savefile.txt 43e6 >> output.txt does the trick for me.[/QUOTE]Thanks for the tip. I'll try to remember it in future. Paul 
FYI
The -save(a) option of current SVN HEAD generates savefiles with B1 set to 1.
Here is an example from a GPU run: [CODE]METHOD=ECM; PARAM=3; SIGMA=1338409529; [B]B1=1[/B]; N=107337638919967483141063623365542229910680957563823617797446929412356389149831517148531981651634180847916212539333192454506616947630553940911141905872245222153540770799676352775350889317617472274748340543075085097852507186666728619014330952808580810310164700637173471479651231324428337639337605274423; X=0x80e2c39ebf6af056d56c5ca03fdb34173afd8189a7ae9c4e42291b3cc4e2dcfec3cb5b27f3d45e3c02cded29e571c7f00cb0d4cb536eb10e4a585e888b1ba37804d73c0d6dc715129091957eb888d831bc16417c30180be983c5fa89d77d0caac142d864eed7a09b14340a41c46dd30a49f4ce673ce450fb3bbca03d; CHECKSUM=1807052727; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=ralf@quadriga; TIME=Sun Jul 7 12:54:06 2013;[/CODE]and from a CPU run: [CODE]METHOD=ECM; PARAM=1; SIGMA=2099601580; [B]B1=1[/B]; N=107337638919967483141063623365542229910680957563823617797446929412356389149831517148531981651634180847916212539333192454506616947630553940911141905872245222153540770799676352775350889317617472274748340543075085097852507186666728619014330952808580810310164700637173471479651231324428337639337605274423; X=0xec9ba92e05563539a531b6eb6aaf42e3eb8c2c05ddd1b2bdf08d4908b5ceb4875a3ceb7a6a5f9046af2ba2f27aaca39d08c51cb927ae3ffabc682df4420515b2b354631183762317ea2c4a35a965d15f5c892c63daf8e97f672bc38f2f5268b39e0d14667b82d542bde08c6d72ccd602bbf0d7586d39b08992502d5f; CHECKSUM=1971712291; PROGRAM=GMP-ECM 7.0-dev; Y=0x0; X0=0x0; Y0=0x0; WHO=ralf@quadriga; TIME=Sun Jul 7 12:57:27 2013;[/CODE]B1 was set to 11e3 in both (test) cases. batch_last_B1_used contains the B1 value given on the command line. 
B1done=1 when write_resumefile is called from main.c:1581. Here is a debugger output from the GPU run: [CODE]#0 main (argc=2, argv=0x7fffffffe3b0) at main.c:1581
(gdb) display params
1: params = {{method = 0, x = {{_mp_alloc = 6955, _mp_size = 6954, _mp_d = 0x7ec450}}, y = {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0x6da4c0}}, param = 3, sigma = {{_mp_alloc = 2, _mp_size = 1, _mp_d = 0x6da4e0}}, sigma_is_A = 0, E = 0x6da300, go = {{_mp_alloc = 1, _mp_size = 1, _mp_d = 0x6da5c0}}, [B]B1done = 1[/B], B2min = {{_mp_alloc = 1, _mp_size = 1, _mp_d = 0x6da5e0}}, B2 = {{_mp_alloc = 1, _mp_size = 1, _mp_d = 0x6da600}}, k = 0, S = 0, repr = 0, nobase2step2 = 0, verbose = 1, os = 0x7ffff72c57a0, es = 0x7ffff72c5880, chkfilename = 0x0, TreeFilename = 0x0, maxmem = 0, stage1time = 0, rng = {{_mp_seed = {{_mp_alloc = 313, _mp_size = 32767, _mp_d = 0x6da620}}, _mp_alg = GMP_RAND_ALG_DEFAULT, _mp_algdata = {_mp_lc = 0x4af140}}}, use_ntt = 1, stop_asap = 0x405a1c <stop_asap_test>, batch_s = {{_mp_alloc = 249, _mp_size = 249, _mp_d = 0x6de290}}, [B]batch_last_B1_used = 11000[/B], gpu = 1, gpu_device = 1, gpu_device_init = 1, gpu_number_of_curves = 448, gw_k = 0, gw_b = 0, gw_n = 0, gw_c = 0}}
(gdb)[/CODE]Trying to resume from these savefiles leads to an internal error: [CODE]Error, x0 should be equal to 2 with this parametrization
Please report internal errors at <ecm-discuss@lists.gforge.inria.fr>.[/CODE] 
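Until this is fixed upstream, a record damaged in this way could in principle be repaired before resuming by rewriting the bogus B1 field with the B1 actually used for stage 1. Here is a hypothetical little fix-up of my own (assuming B1=11000, as in the test above); note it only edits the text record and does not address the separate "x0 should be equal to 2" parametrization error:

```python
import re

def patch_b1(record: str, real_b1: int) -> str:
    """Replace a bogus 'B1=1;' field in a GMP-ECM save-file record
    with the B1 value that was actually used for stage 1."""
    return re.sub(r"\bB1=1;", f"B1={real_b1};", record, count=1)

# Abridged record modelled on the savefiles quoted above.
broken = "METHOD=ECM; PARAM=3; SIGMA=1338409529; B1=1; CHECKSUM=1807052727;"
fixed = patch_b1(broken, 11000)
assert "B1=11000;" in fixed
assert "B1=1;" not in fixed
```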
Apologies for popping this thread up, but I was trying to compile the GPU-enabled GMP-ECM, and I can't seem to get it to happen despite multiple attempts. I've tried to get the VS2010 build up and running, but there are many issues there (it can't find header files that it should be able to find, trouble with CUDA, etc.).
So I figured I would try and compile it under MinGW (32-bit, Windows XP), since I've not really had many issues there. I pulled the latest SVN (v.2521) and tried to use [CODE]./configure.in[/CODE] and get [CODE]./configure.in: line 1: syntax error near unexpected token `[ECM_VERSION_AC],'
./configure.in: line 1: `m4_define([ECM_VERSION_AC], [7.0dev])'[/CODE] Any help with this would be appreciated. System specs listed below: AMD Phenom II X4, Windows XP 32-bit, GTX570 (CC 2.0) Thanks, Ben 
You don't run configure.in directly; it is an input to the autotools, which you have to run manually on a repository checkout in order to generate the configure script. Release distributions of GMP-ECM have done that for you already.

Thanks Jason. Another opportunity to learn more about compilation background. Off to play with autotools...
