With 3^1009-1, I find the factor 2 in Stage 1.
I find 10091, 54534433, and 15283224119, but not 684071451517 with B1=100000 in Stage 2. |
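Taking the factors reported above at face value, each one is easy to verify independently with a modular exponentiation, since p divides 3^1009 - 1 exactly when 3^1009 ≡ 1 (mod p):

```python
# Check the factors of 3^1009 - 1 reported above.
# p divides 3^1009 - 1 iff 3^1009 ≡ 1 (mod p).
factors = [2, 10091, 54534433, 15283224119, 684071451517]
for p in factors:
    # 1 % p handles the special case p = 2
    assert pow(3, 1009, p) == 1 % p, f"{p} does not divide 3^1009 - 1"
print("all reported factors divide 3^1009 - 1")
```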
[QUOTE=wombatman;379212]Follow-up question: If it can find the two known factors (apart from any larger ones it may find) of 2^63997-1, would you have confidence in it working properly?[/QUOTE]
As a baseline, I already have confidence that the code is working properly, because it is maintained by a well-respected team of researchers with a modern bug- and task-tracking pipeline. If the build quality needs checking, try the test cases included with the software ([I]make; make check[/I]; and only then, [I]make install[/I] - almost everyone knows the conventional three commands, yet many people skip the middle one ... for speed or whatever other reason). [QUOTE=wombatman;379236]My results are very hit-or-miss.[/QUOTE] ECM with random initial values (which is the default) is by definition random - or, if you prefer, hit-or-miss. Running 890 curves at 1e6 and not finding a factor proves nothing either way: neither a bug nor the absence of one. What you want to do is find a specific sigma that does produce a factor (in a controlled environment, e.g. with the old CPU code if it is the GPU code you are testing), then transform that sigma into the new GPU-compatible form (I am unsure about this step because of the new GPU parallelism; check the documentation/source), and expect the same factor from the code under test. Beyond that, I am not clear on what you are trying to achieve. |
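The hit-or-miss behavior is quantifiable: if the expected number of curves at a given B1 for a factor of a given size is E, each random curve succeeds with probability about 1/E, so the chance of finding nothing after n curves is roughly (1 - 1/E)^n. A minimal sketch; the E values below are made-up placeholders, not ecm's real expected-curve table (ecm prints the real figures with -v):

```python
def miss_probability(curves_run, expected_curves):
    """Chance of finding nothing after curves_run independent random
    curves, each succeeding with probability 1/expected_curves."""
    return (1.0 - 1.0 / expected_curves) ** curves_run

# E values below are illustrative assumptions only (check ecm -v for
# the real expected-curve counts at your B1 and factor size).
for E in (100, 1000, 25000):
    p = miss_probability(890, E)
    print(f"E={E:>6}: chance of missing after 890 curves = {p:.3g}")
```

The point: 890 curves without a factor is damning evidence only if E for the factor size you care about is far below 890; if E is in the tens of thousands, an empty run is the expected outcome.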
To be clear, all of my concerns are with GPU-ECM only. CPU-based ECM is not causing me any issues. I did the make/make check/make install and everything completes perfectly.
Unfortunately, I can't do the make check portion with a Windows-based build of GPU-based ECM via Visual Studio, so I'm only able to use a file I was given that essentially runs some basic Stage 1 checks for a GPU-enabled build. This is what it checks: [CODE]# Exit statuses returned by GMP-ECM:
# 0 Normal program termination, no factor found
# 1 Error
# 2 Composite factor found, cofactor is composite
# 6 Probable prime factor found, cofactor is composite
# 8 Input number found
# 10 Composite factor found, cofactor is a probable prime
# 14 Probable prime factor found, cofactor is a probable prime

#test for stage 1 on GPU
echo 458903930815802071188998938170281707063809443792768383215233 | $ECM -sigma 3:227 125 0
checkcode $? 14

#test for stage 1 on GPU
echo "2^349-1" | $ECM -sigma 3:279 587 0
checkcode $? 6

#2^1018-1 is the maximum number that can be used
echo "2^1018-1" | $ECM -sigma 3:92 4 0
checkcode $? 2[/CODE] I'll try to figure out how to convert the sigma from CPU-based to GPU-based. My goal is to confirm that the GPU-ECM build is working properly with a higher bit limit than 1018. To use a recent example, I am concerned when I don't find a 12-digit factor (684071451517, of 3^1009-1) while doing 480 curves on the GPU with B1=100,000, which is more than a t25's worth of work. Maybe I have the wrong impression, but wouldn't I reasonably expect to catch that 12-digit factor? |
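For readers without that test file, the `checkcode` helper it calls can be sketched as a small shell function. This is a plausible reconstruction of what GMP-ECM's own test scripts do, not the code shipped with the package:

```shell
#!/bin/sh
# Plausible reconstruction of the checkcode helper used above:
# compare the actual exit status ($1) against the expected one ($2)
# and abort the test run on any mismatch.
checkcode() {
  if [ "$1" -ne "$2" ]; then
    echo "ERROR: expected exit status $2, got $1"
    exit 1
  fi
  echo "ok: exit status $1"
}

# Usage mirrors the test file: run ecm, then check $?.
# Here a subshell stands in for the real $ECM invocation.
( exit 14 )
checkcode $? 14
```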
Maybe they meant it when they said "2^1018-1 is the maximum number that can be used" - for this particular implementation, that is. If you are just changing some #define and recompiling, there's no guarantee that you are not breaking internal dependencies that are not easily visible - you would have to have a working knowledge of all the internals. I don't know the ecm code all that well, so I will give you examples from programs that I know in more intimate detail:
1) NewPgen: you cannot just raise some limits and get it to use more memory - it is written in 32-bit asm, and to extend its capabilities one needs to rewrite the asm, which is much hairier than one would want, so it is best tweaked within its limits. 2) The K-F siever: there is no magic way around the bulk of the code being written with short ints - and that makes adapting the "17e" siever from the old code practically impossible; if you convert the short ints to ints you won't have enough memory on most computers, and if you wanted to keep 16 bits with one extra bit somehow cleverly packed away, you would end up discarding almost everything and writing from scratch. 3) mmff: one cannot just add 32 in a few places and have the code start working with larger numbers; you have to implement quite a bit - which is easier than writing from scratch, but still tedious (copy the template code, make everything wider, then scratch your head and write quite a bit extra for a couple of evenings; and then debug until it works). Etc. When Paul playfully says that the code very likely can be pushed much further, that [TEX]\ne[/TEX] "we double a few #define'd limits and everything just works". More likely, quite a bit has to be re-implemented if some values jump out of their Procrustean beds ([I]even if[/I] only occasionally, while fitting in them most of the time). |
I certainly concede that it's far more complicated.
I guess I'm ultimately just confused about why it still seems to work, if only partially. For instance, running (((3^1009-1)/2)/8410464655934379916957) (the factors mentioned in my last post) with B1=1e6, I find another factor--684071451517 at sigma 3:473815648--with the remaining 448-digit composite being well past the apparent limit of 2^1018. Regardless, thanks for taking the time to help out with my confusion. I do very much appreciate it. |
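Those numbers are all mutually consistent, which can be checked directly (every value below is taken from the posts above): 8410464655934379916957 is exactly the product of the three Stage 2 factors, and dividing it and the new factor out of (3^1009-1)/2 leaves a 448-digit cofactor:

```python
# Consistency check of the numbers quoted in this thread.
n = (3**1009 - 1) // 2
known = 10091 * 54534433 * 15283224119
assert known == 8410464655934379916957   # product of the three Stage 2 factors
assert n % known == 0
cof = n // known
assert cof % 684071451517 == 0           # the factor found at sigma 3:473815648
remaining = cof // 684071451517
print(len(str(remaining)), "digits remain")  # 448 digits remain
```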
[QUOTE=wombatman;379236]If anybody finds a large factor (25-35 digits?) with a known sigma from a composite larger than 2^1018-1, please let me know.[/QUOTE]From an email received just short of 10 years ago (9 August 2004 to be precise):
[code]
GMP-ECM 5.0.3 [powered by GMP 4.1.3] [ECM]
Input number is 791395160180513434925493302042988391765189206021240027609905182633953874141701415568465466331240326981326923973788166276499799514661251096178510399218287572287099081763160076327898905016494232924287370411286185115460579419942236967762802195925225659431940912994144487907043366577681908978575539552129317826543837057074224429283959629605570832177613666749841648425748727472212410659081090982546999644806715165903661973226553262903054634522680493789572092376708237522695533293372981915261383302824644696323709130493657087 (519 digits)
Using B1=1000000, B2=839549780, polynomial Dickson(6), sigma=1743901906
Step 1 took 179265ms
Step 2 took 83297ms
********** Factor found in step 2: 147380237642809197871546843418239
Found probable prime factor of 33 digits: 147380237642809197871546843418239
Composite cofactor 5369750875952168414443816420699332431795820624727865885821034376732136830156150321744936437728716035183053122667573256784267448160985961215791775970975869884330782622104150907555736955403408627414489081834152723633862437836650013891244407645622806325285053174011039418524274630141893462696783549120812199818549229694139352185629606669322896747319898095601307656723865978708079670509098659082110655805803426884366044802794880808466077343593119322766188996763096027433864806175462275976833 has 487 digits

Jo Yeong Uk
[/code] |
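Taken at face value, that 2004 log is internally consistent: the 33-digit factor times the 487-digit cofactor reproduces the 519-digit input, which is easy to confirm directly:

```python
# Consistency check of the 2004 log above: factor * cofactor == input.
n = 791395160180513434925493302042988391765189206021240027609905182633953874141701415568465466331240326981326923973788166276499799514661251096178510399218287572287099081763160076327898905016494232924287370411286185115460579419942236967762802195925225659431940912994144487907043366577681908978575539552129317826543837057074224429283959629605570832177613666749841648425748727472212410659081090982546999644806715165903661973226553262903054634522680493789572092376708237522695533293372981915261383302824644696323709130493657087
f = 147380237642809197871546843418239
c = 5369750875952168414443816420699332431795820624727865885821034376732136830156150321744936437728716035183053122667573256784267448160985961215791775970975869884330782622104150907555736955403408627414489081834152723633862437836650013891244407645622806325285053174011039418524274630141893462696783549120812199818549229694139352185629606669322896747319898095601307656723865978708079670509098659082110655805803426884366044802794880808466077343593119322766188996763096027433864806175462275976833
assert f * c == n
assert (len(str(n)), len(str(f)), len(str(c))) == (519, 33, 487)
print("the quoted factorization checks out")
```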
Or (as already hinted a bit earlier),
[URL="http://mersenneforum.org/showpost.php?p=209764&postcount=9"]F12's factor[/URL] with sigma=1428526317. Quite a nice catch, that was. |
Thanks for those, guys. I'm going to do some systematic testing of both base 2 and non-base 2 numbers and see if I can pin down at least a pattern of what's happening. Maybe that will help me figure out what might need changing.
|
[QUOTE=Batalov;379288]Or (as already hinted a bit earlier),
[URL="http://mersenneforum.org/showpost.php?p=209764&postcount=9"]F12's factor[/URL] with sigma=1428526317. Quite a nice catch, that was.[/QUOTE] Or also the 36-digit factor 775608719589345260583891023073879169 of the hundred-thousands-digit 19th Sylvester number (see [URL="http://primerecords.dk/sylvester-factors.htm"]http://primerecords.dk/sylvester-factors.htm[/URL]) found with sigma=787582611. Not sure this 106721-digit number is suitable for your tests :wink: [CODE]GMP-ECM 6.4.3 [configured with GMP 5.0.5, --enable-asm-redc] [ECM]
Running on dmi-t5500-cc
Input number is 1166841411...6400110443 (106721 digits)
Using REDC
Using B1=250000, B2=183032866, polynomial Dickson(3), sigma=787582611
dF=2880, k=2, d=30030, d2=17, i0=-8
Expected number of curves to find a factor of n digits:
35      40      45      50      55      60      65      70      75      80
4550    64790   1126804 2.3e+07 5.3e+08 1.4e+10 2.1e+13 3.1e+18 4.1e+23 Inf
Step 1 took 24318335ms
Estimated memory usage: 4627M
Initializing tables of differences for F took 59063ms
Computing roots of F took 496365ms
Building F from its roots took 351980ms
Computing 1/F took 114670ms
Initializing table of differences for G took 65536ms
Computing roots of G took 423292ms
Building G from its roots took 352247ms
Computing roots of G took 425429ms
Building G from its roots took 351820ms
Computing G * H took 82705ms
Reducing G * H mod F took 113569ms
Computing polyeval(F,G) took 590871ms
Computing product of all F(g_i) took 26225ms
Step 2 took 3454012ms
********** Factor found in step 2: 775608719589345260583891023073879169
Found probable prime factor of 36 digits: 775608719589345260583891023073879169
[/CODE] |
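That factor can be verified without ever constructing the 106721-digit number: the Sylvester sequence s_0 = 2, s_{n+1} = s_n^2 - s_n + 1 can be iterated modulo p. Here "the 19th Sylvester number" is taken to mean s_19 under that 0-based indexing, which is the term whose digit count matches the 106721 digits quoted in the log:

```python
# Verify that 775608719589345260583891023073879169 divides the 19th
# Sylvester number (s_0 = 2, s_{n+1} = s_n^2 - s_n + 1, 0-based index)
# by iterating the recurrence modulo p instead of in full precision.
p = 775608719589345260583891023073879169
s = 2
for _ in range(19):
    s = (s * s - s + 1) % p
assert s == 0, "p does not divide s_19"
print("p divides the 19th Sylvester number")
```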
[QUOTE=Ralf Recker;289402]CC 2.0 card (GTX 470, stock clocks), 512 bit arithmetic, CUDA SDK 4.0. The c151 was taken from the Aliquot sequence 890460:i898
[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=24351435, with 448 curves
gpu_ecm took : 116.363s (0.000+116.355+0.008)
Throughput : 3.850[/CODE]Doubling the number of curves improves the throughput: [CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 896 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1471710578, with 896 curves
gpu_ecm took : 179.747s (0.000+179.731+0.016)
Throughput : 4.985[/CODE]32 curves less and the throughput increases by another 30% [CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 864 -save c151.save 250000 < c151
Precomputation of s took 0.004s
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, firstinvd=1374804691, with 864 curves
gpu_ecm took : 130.964s (0.000+130.948+0.016)
Throughput : 6.597[/CODE][/QUOTE] CC 5.2 card (GTX 970), 512 bit arithmetic, CUDA SDK 6.5.19. The same c151 as above.
Compiled for sm_50: [CODE]GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, B2=128992510, sigma=3:2693078490-3:2693080153 (1664 curves)
Computing 1664 Step 1 took 3056ms of CPU time / 45778ms of GPU time[/CODE]Throughput is 36.349 CC 5.2 card (GTX 970), 512 bit arithmetic, CUDA SDK 6.5.19. The same c151 as above. Compiled for sm_52: [CODE]GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits)
Using B1=250000, B2=128992510, sigma=3:1776416363-3:1776418026 (1664 curves)
Computing 1664 Step 1 took 3128ms of CPU time / 46272ms of GPU time[/CODE]Throughput is 35.961 1024 bit arithmetic / sm_50: [CODE]Computing 1664 Step 1 took 3052ms of CPU time / 128693ms of GPU time[/CODE]1024 bit arithmetic / sm_52: [CODE]Computing 1664 Step 1 took 3116ms of CPU time / 140051ms of GPU time[/CODE] |
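The "Throughput" figures in these logs appear to be nothing more than curves completed per second of reported runtime (total wall time for the standalone gpu_ecm runs, GPU time for the GMP-ECM 7.0-dev runs); recomputing them from the quoted numbers reproduces all five values:

```python
# Recompute "Throughput" = curves / seconds from the logs quoted above.
runs = [
    (448,  116.363, 3.850),   # GTX 470, standalone gpu_ecm
    (896,  179.747, 4.985),
    (864,  130.964, 6.597),
    (1664, 45.778,  36.349),  # GTX 970, GMP-ECM 7.0-dev, sm_50
    (1664, 46.272,  35.961),  # GTX 970, sm_52
]
for curves, seconds, quoted in runs:
    recomputed = curves / seconds
    assert abs(recomputed - quoted) < 0.01, (curves, seconds, recomputed)
print("throughput = curves / seconds matches all five quoted figures")
```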
Ralf: thanks for posting your data. It prompted me to do something similar for my 460. This card has 448 CUDA cores, and GMP-ECM believes that the best number of curves to run in parallel is 224. Timing data for a C242 at B1=100K tells a different story. First, the tabulated data:
[code]
gpucurves   ms cpu    ms gpu   ms/curve
---------------------------------------
     224      1070     48104     214.75
     256       950     89919     351.25
     384       960     90021     234.43
     416       890     89835     215.95
     448       920     90126     201.27
     608       960    127456     209.63
     640       920    127450     199.14
     672       950    127526     189.77 **
     832      1040    179530     215.78
     864       990    179556     207.82
     896       970    179649     200.50
[/code] Taking a hint from you, I ran integer multiples of 224 and the same decreased by 32 and 64. Just for kicks I also ran 256 curves in parallel (i.e., 224 + 32). This last was, as expected, much worse than all the others. Also expected was the essentially constant cpu overhead. What was not expected was that GPU-ECM chose very nearly the [i]worst[/i] number of curves to run in parallel. My card differs from yours in that reducing the number of curves by 32 gave poorer performance. The best result, with a 13% performance increase, comes from running 672 (= 3 * 224) curves in parallel. The moral, I suppose, is that one should run performance tests before putting a card into production use. Yes, I know I should have done that long ago but I must have run out of round tuits back in the day. Paul |
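The ms/curve column is just the gpu-ms figure divided by the number of curves, and re-deriving it from the table picks out the same 672-curve winner (data copied from the table above; only the minimum is asserted here):

```python
# Re-derive ms/curve from Paul's table and find the best batch size.
data = {  # gpucurves: gpu ms
    224: 48104, 256: 89919, 384: 90021, 416: 89835, 448: 90126,
    608: 127456, 640: 127450, 672: 127526, 832: 179530,
    864: 179556, 896: 179649,
}
per_curve = {n: ms / n for n, ms in data.items()}
best = min(per_curve, key=per_curve.get)
print(best, round(per_curve[best], 2))  # 672 189.77
```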