![]() |
[QUOTE=wombatman;379236]If anybody finds a large factor (25-35 digits?) with a known sigma from a composite larger than 2^1018-1, please let me know.[/QUOTE]From an email received just short of 10 years ago (9 August 2004 to be precise):
[code] GMP-ECM 5.0.3 [powered by GMP 4.1.3] [ECM] Input number is 791395160180513434925493302042988391765189206021240027609905182633953874141701415568465466331240326981326923973788166276499799514661251096178510399218287572287099081763160076327898905016494232924287370411286185115460579419942236967762802195925225659431940912994144487907043366577681908978575539552129317826543837057074224429283959629605570832177613666749841648425748727472212410659081090982546999644806715165903661973226553262903054634522680493789572092376708237522695533293372981915261383302824644696323709130493657087 (519 digits) Using B1=1000000, B2=839549780, polynomial Dickson(6), sigma=1743901906 Step 1 took 179265ms Step 2 took 83297ms ********** Factor found in step 2: 147380237642809197871546843418239 Found probable prime factor of 33 digits: 147380237642809197871546843418239 Composite cofactor 5369750875952168414443816420699332431795820624727865885821034376732136830156150321744936437728716035183053122667573256784267448160985961215791775970975869884330782622104150907555736955403408627414489081834152723633862437836650013891244407645622806325285053174011039418524274630141893462696783549120812199818549229694139352185629606669322896747319898095601307656723865978708079670509098659082110655805803426884366044802794880808466077343593119322766188996763096027433864806175462275976833 has 487 digits Jo Yeong Uk [/code] |
Or (as already hinted a bit earlier),
[URL="http://mersenneforum.org/showpost.php?p=209764&postcount=9"]F12's factor[/URL] with sigma=1428526317. Quite a nice catch, that was. |
Thanks for those, guys. I'm going to do some systematic testing of both base 2 and non-base 2 numbers and see if I can pin down at least a pattern of what's happening. Maybe that will help me figure out what might need changing.
|
[QUOTE=Batalov;379288]Or (as already hinted a bit earlier),
[URL="http://mersenneforum.org/showpost.php?p=209764&postcount=9"]F12's factor[/URL] with sigma=1428526317. Quite a nice catch, that was.[/QUOTE] Or also the 36-digit factor 775608719589345260583891023073879169 of the hundred-thousands-digit 19th Sylvester number (see [URL="http://primerecords.dk/sylvester-factors.htm"]http://primerecords.dk/sylvester-factors.htm[/URL]) found with sigma=787582611. Note sure this 106721-digit number is suitable for your tests :wink: [CODE]GMP-ECM 6.4.3 [configured with GMP 5.0.5, --enable-asm-redc] [ECM] Running on dmi-t5500-cc Input number is 1166841411...6400110443 (106721 digits) Using REDC Using B1=250000, B2=183032866, polynomial Dickson(3), sigma=787582611 dF=2880, k=2, d=30030, d2=17, i0=-8 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 4550 64790 1126804 2.3e+07 5.3e+08 1.4e+10 2.1e+13 3.1e+18 4.1e+23 Inf Step 1 took 24318335ms Estimated memory usage: 4627M Initializing tables of differences for F took 59063ms Computing roots of F took 496365ms Building F from its roots took 351980ms Computing 1/F took 114670ms Initializing table of differences for G took 65536ms Computing roots of G took 423292ms Building G from its roots took 352247ms Computing roots of G took 425429ms Building G from its roots took 351820ms Computing G * H took 82705ms Reducing G * H mod F took 113569ms Computing polyeval(F,G) took 590871ms Computing product of all F(g_i) took 26225ms Step 2 took 3454012ms ********** Factor found in step 2: 775608719589345260583891023073879169 Found probable prime factor of 36 digits: 775608719589345260583891023073879169 [/CODE] |
[QUOTE=Ralf Recker;289402]CC 2.0 card (GTX 470, stock clocks), 512 bit arithmetic, CUDA SDK 4.0. The c151 was taken from the Aliquot sequence 890460:i898
[CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -save c151.save 250000 < c151 Precomputation of s took 0.004s Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits) Using B1=250000, firstinvd=24351435, with 448 curves gpu_ecm took : 116.363s (0.000+116.355+0.008) Throughput : 3.850[/CODE]Doubling the number of curves improves the throughput: [CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 896 -save c151.save 250000 < c151 Precomputation of s took 0.004s Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits) Using B1=250000, firstinvd=1471710578, with 896 curves gpu_ecm took : 179.747s (0.000+179.731+0.016) Throughput : 4.985[/CODE]32 curves less and the throughput increases by another 30% [CODE]ralf@quadriga:~/dev/gpu_ecm$ LD_LIBRARY_PATH=/usr/local/cuda/lib64/ ./gpu_ecm -d 0 -n 864 -save c151.save 250000 < c151 Precomputation of s took 0.004s Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits) Using B1=250000, firstinvd=1374804691, with 864 curves gpu_ecm took : 130.964s (0.000+130.948+0.016)[/CODE]Throughput : 6.597 [/QUOTE] CC 5.2 card (GTX 970), 512 bit arithmetic, CUDA SDK 6.5.19. The same c151 as above. 
Compiled for sm_50: [CODE]GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM] Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits) Using B1=250000, B2=128992510, sigma=3:2693078490-3:2693080153 (1664 curves) Computing 1664 Step 1 took 3056ms of CPU time / 45778ms of GPU time[/CODE]Throughput is 36.349 CC 5.2 card (GTX 970), 512 bit arithmetic, CUDA SDK 6.5.19. The same c151 as above. Compiled for sm_52: [CODE] GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM] Input number is 4355109846524047003246531292211765742521128216321735054909228664961069056051308281896789359834792526662067203883345116753066761522281210568477760081509 (151 digits) Using B1=250000, B2=128992510, sigma=3:1776416363-3:1776418026 (1664 curves) Computing 1664 Step 1 took 3128ms of CPU time / 46272ms of GPU time[/CODE]Throughput is 35.961 1024 bit arithmetic / sm_50 : [CODE]Computing 1664 Step 1 took 3052ms of CPU time / 128693ms of GPU time[/CODE]1024 bit arithmetic / sm_52 : [CODE]Computing 1664 Step 1 took 3116ms of CPU time / 140051ms of GPU time[/CODE] |
Ralf: thanks for posting your data. It prompted me to do something similar for my 460. This card has 448 CUDA cores and GMP-ECM believes that the best number of curves to run in parallel is 224. Timing data for a C242 and B1=100K tells a different story. First the tabulated data
[code]
gpucurves   ms cpu    ms gpu   ms/curve
---------------------------------------
   224       1070      48104    214.75
   256        950      89919    351.25
   384        960      90021    234.43
   416        890      89835    215.95
   448        920      90126    201.27
   608        960     127456    209.63
   640        920     127450    199.14
   672        950     127526    189.77 **
   832       1040     179530    215.78
   864        990     179556    207.82
   896        970     179649    200.50
[/code] Taking a hint from you I ran integer multiples of 224 and the same decreased by 32 and 64. Just for kicks I also ran 256 curves in parallel (i.e., 224 + 32). This last was, as expected, much worse than all the others. Also expected was the essentially constant cpu overhead. What was not expected was that GPU-ECM chose very nearly the [i]worst[/i] number of curves to run in parallel. My card differs from yours in that reducing the number of curves by 32 gave poorer performance. The best result, with a 13% performance increase, comes from running 672 (= 3 * 224) curves in parallel. The moral, I suppose, is that one should run performance tests before putting a card into production use. Yes, I know I should have done that long ago but I must have run out of round tuits back in the day. Paul |
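The ms/curve column, and the conclusion that 672 curves is the sweet spot, can be re-derived from the raw GPU timings above. A minimal Python sketch (the dictionary simply re-keys the tabulated data):

```python
# GPU time in ms for each tested value of -gpucurves (from the table above).
gpu_ms = {224: 48104, 256: 89919, 384: 90021, 416: 89835, 448: 90126,
          608: 127456, 640: 127450, 672: 127526, 832: 179530,
          864: 179556, 896: 179649}

# Per-curve cost: total GPU time divided by the number of concurrent curves.
per_curve = {n: t / n for n, t in gpu_ms.items()}
best = min(per_curve, key=per_curve.get)
print(best, round(per_curve[best], 2))  # 672 189.77
```

Note that the GPU time is essentially flat within each multiple-of-224 band (one, two, or three kernel launches' worth of work), which is why packing the most curves into each band wins.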
Oh, this thread is certainly alive.
Is there a page with gmp-ecm CUDA binaries for Windows? |
Hi, would this work for 69 * 2 ^ n - 1 ?
I would want to give it a shot at n > 4 million if there is any hope it can work for it efficiently... Thanks, Vincent |
GPU-ECM is limited to 2^1018-1 (~300 digits).
|
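That limit is easy to check against a candidate before queueing work. A minimal sketch (the helper name is mine, not part of GMP-ECM): 2^1018-1 has 307 digits, so a 69*2^n-1 candidate with n in the millions is far out of range.

```python
# GPU stage 1 works on fixed-width residues, capped at 2^1018 - 1.
GPU_ECM_LIMIT = 2**1018 - 1  # about 307 decimal digits

def fits_gpu_ecm(n: int) -> bool:
    """Hypothetical helper: True if n is small enough for GPU stage 1."""
    return 0 < n <= GPU_ECM_LIMIT

print(len(str(GPU_ECM_LIMIT)))            # 307
print(fits_gpu_ecm(69 * 2**4000000 - 1))  # False: over a million digits
```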
Just got a new 750 Ti. I am trying to get ecm_gpu to work but it keeps on crashing for me. It seems to work with very small values of B1 (2000 is around the peak). Is there something I am missing? mfaktc works.
I am using the binary in post 243. Is it possible that the binary uses cpu instructions I don't have on my core2? I would guess it wouldn't work at all if that was the case. |
[QUOTE=henryzz;390927]Just got a new 750 Ti. I am trying to get ecm_gpu to work but it keeps on crashing for me. It seems to work with very small values of B1(2000 is around the peak). Is there something I am missing? mfaktc works.
I am using the binary in post 243. Is it possible that the binary uses cpu instructions I don't have on my core2? I would guess it wouldn't work at all if that was the case.[/QUOTE]Have you tried rebuilding from a completely clean distro? Sometimes it's too easy to screw up the config options. |
[QUOTE=xilman;390937]Have you tried rebuilding from a completely clean distro? Sometimes it's too easy to screw up the config options.[/QUOTE]
It wasn't me that compiled the binary. mfaktc worked out of the box. Do I need to compile the gmp-ecm binary myself? I just realized I never specified that I was working on Windows, apart from the binary I referenced. If there are instructions somewhere on the forum for installing CUDA for Linux, those would be useful for the occasions when I am on there. Windows is what I really care about, though. |
Henry, that binary I made probably is using instructions that the core2 doesn't have. It was compiled in VS2012 on an Ivy Bridge-E.
|
[QUOTE=wombatman;390946]Henry, that binary I made probably is using instructions that the core2 doesn't have. It was compiled in VS2012 on an Ivy Bridge-E.[/QUOTE]
I will attempt to set up VS2012 Express with CUDA tomorrow. I haven't had much success getting MPIR and GMP_ECM working well together under VS in the past. Hopefully it is easier now. |
I found the MPIR build to be very simple to run (this was version 2.7.0). That is, I was able to open the solution file and pretty much build straight away without any major changes. Setting up the GPU_ECM project is a little more involved in that you have to change a number of things (paths to MPIR, etc), but isn't too bad.
|
Getting mpir and ecm to compile was easy. I still haven't managed to get the gpu version to compile without error.
Getting various errors among the standard headers for some reason. Is anyone able to compile a gpu version of ecm and msieve that should work on my hardware (750 Ti and Q6600)? It seems something is screwed up on my system. |
Has anyone tested if ecm-gpu works correctly if you reduce ECM_GPU_SIZE_DIGIT to 16 bits (the default's 32)? Or got any performance figures for it?
I've searched the ecm-discuss mailing list archives but couldn't find any mention of it. Chris |
[QUOTE=chris2be8;400801]Has anyone tested if ecm-gpu works correctly if you reduce ECM_GPU_SIZE_DIGIT to 16 bits (the default's 32)? Or got any performance figures for it?[/QUOTE]I haven't, though I've experimented with 512 and 2048 bit arithmetic.
Why do you think it may be useful? |
[QUOTE=xilman;400805] Why do you think it may be useful?[/QUOTE]
I'm using my GPU for ECM pre-testing of GNFS targets around 100 digits. At that size it takes the GPU nearly as long to do stage 1 as it takes 1 core to do stage 2. So when I have done T30 and start T35 I end up with the CPU waiting for the GPU to finish stage 1 to the higher B1. I've been trying to avoid that by only increasing B1 by about 40% between steps, but that's still too large and makes the script messy. Speeding up stage 1 on the GPU would be a much better option. Saving electricity would be nice too. Chris |
I just tried building ECM_GPU with the change you requested. It compiled without issue, but it fails the basic check that I was provided a while back. So, at the very least, doing a 512 bit version isn't a simple change.
|
[QUOTE=chris2be8;400899]I'm using my GPU for ECM pre-testing of GNFS targets around 100 digits. At that size it takes the GPU nearly as long to do stage 1 as it takes 1 core to do stage 2. So when I have done T30 and start T35 I end up with the CPU waiting for the GPU to finish stage 1 to the higher B1.
I've been trying to avoid that by only increasing B1 by about 40% between steps, but that's still too large and makes the script messy. Speeding up stage 1 on the GPU would be a much better option. Saving electricity would be nice too. Chris[/QUOTE] If stage1 & stage2 take the same amount of time you should just increase B1, or am I missing something? |
It's time doing stage 1 on my GPU vs time doing stage 2 on my CPU that's the issue. The GPU takes the same time for any size of number while the CPU is faster for smaller numbers. So for smallish GNFS targets it's difficult to ensure the CPU never has to wait for the GPU to finish a set of stage 1 curves before it can start stage 2.
So if I run T30 curves with B1=25e4 and then start T35 with B1=1e6 the CPU will finish stage 2 on the last set from T30 long before the first set for T35 is ready. I can get round it by increasing B1 by no more than 30% at a time, but that's rather messy to code. A version that runs stage 1 faster on smaller numbers would be a lot easier to handle. But if it won't work I'll have to live with what I've got. Thanks for the information that it needs more than a simple change Wombatman. Chris |
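The B1 stepping Chris describes can at least be generated mechanically rather than hand-coded. A minimal sketch, assuming a 30% growth cap per step (the function and its defaults are my illustration, not his actual script):

```python
# Build a ladder of B1 values from `start` to `target`, growing by at most
# `step` per run, so each GPU stage-1 batch finishes before the CPU drains
# the previous stage-2 queue.
def b1_ladder(start, target, step=1.3):
    b1, ladder = float(start), []
    while b1 < target:
        ladder.append(int(b1))
        b1 *= step
    ladder.append(int(target))
    return ladder

print(b1_ladder(25e4, 1e6))
# [250000, 325000, 422500, 549250, 714025, 928232, 1000000]
```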
[QUOTE=chris2be8;400899]I'm using my GPU for ECM pre-testing of GNFS targets around 100 digits. At that size it takes the GPU nearly as long to do stage 1 as it takes 1 core to do stage 2. So when I have done T30 and start T35 I end up with the CPU waiting for the GPU to finish stage 1 to the higher B1.[/quote]
If you get the GNFS targets in batches, it's worth multiplying them together into larger chunks before doing GPU-ECM with the current version; at 100 digits you might well be able to take a product of three. |
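The batching trick works because any factor ECM finds of the product can be attributed back to an individual target with a gcd afterwards. A toy sketch with small semiprimes standing in for ~100-digit composites (the numbers and the "found" factor are made up for illustration):

```python
# Multiply several targets into one GPU-ECM input; a factor of the product
# is mapped back to whichever target(s) it divides via gcd.
from math import gcd, prod

targets = [91, 221, 437]   # stand-ins: 7*13, 13*17, 19*23
batch = prod(targets)      # run GPU stage 1 on this single number

f = 19                     # pretend ECM reported this factor of `batch`
owners = [n for n in targets if gcd(f, n) > 1]
print(owners)  # [437]
```

A factor shared by two targets (13 here) would match both, which is harmless. Three ~100-digit targets give a ~300-digit product, still under the 2^1018-1 GPU limit.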
[QUOTE=chris2be8;400971]It's time doing stage on my GPU vs time doing stage 2 on my CPU that's the issue. The GPU takes the same time for any size of number while the CPU is faster for smaller numbers. So for smallish GNFS targets it's difficult to ensure the CPU never has to wait for the GPU to finish a set of stage 1 curves before it can start stage 2.
So if I run T30 curves with B1=25e4 and then start T35 with B1=1e6 the CPU will finish stage 2 on the last set from T30 long before the first set for T35 is ready. I can get round it by increasing B1 by no more than 30% at a time, but that's rather messy to code. A version that runs stage 1 faster on smaller numbers would be a lot easier to handle. But if it won't work I'll have to live with what I've got. Thanks for the information that it needs more than a simple change Wombatman. Chris[/QUOTE] Aha, OK. You want to minimize wall-clock-time rather than optimize the efficiency of hardware utilization. That makes sense if work/wall-time is your main concern, and you have no other work for your CPU. I was thinking in terms of work/hardware-time. I think many people would like a version with other than 1024 bit arithmetic. I sure would. |
[QUOTE=fivemack;400972]If you get the GNFS targets in batches, it's worth multiplying them together into larger chunks before doing GPU-ECM with the current version; at 100 digits you might well be able to take a product of three.[/QUOTE]That's exactly what I have been doing, though my GNFS targets were around 150 digits so I was running ECM on them in pairs.
Paul |
[QUOTE=fivemack;400972]If you get the GNFS targets in batches, it's worth multiplying them together into larger chunks before doing GPU-ECM with the current version; at 100 digits you might well be able to take a product of three.[/QUOTE]
Will that make stage 2 slower overall? I'm trying to get the most work out of the CPU, GPU time is nearly free for me. And it'll be a fair job to update my scripts to handle numbers in parallel. Chris |
GPU test fails with "Error cuda : too many resources requested for launch."
Good evening,
I have just downloaded the latest version of ecm and compiled it with --enable-gpu=sm_21 in order to try to run stage1 on my Nvidia GeForce GT 525M. It compiles without issues and all the ECM tests pass during "make check". However when testing the GPU with "./test.gpuecm ./ecm" I get the following CUDA error: [QUOTE]$./test.gpuecm ./ecm GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM] Input number is 458903930815802071188998938170281707063809443792768383215233 (60 digits) Using B1=125, B2=0, sigma=3:227-3:258 (32 curves) cudakernel.cu(216) : Error cuda : too many resources requested for launch.[/QUOTE]When I launch a manual test with "./ecm -v -gpu 125" I get the following more verbose output ending with the same error message: [QUOTE]$ ./ecm -v -gpu 125 GMP-ECM 7.0-dev [configured with GMP 6.0.0, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM] Running on blackbox 458903930815802071188998938170281707063809443792768383215233 Input number is 458903930815802071188998938170281707063809443792768383215233 (60 digits) Using MODMULN [mulredc:0, sqrredc:1] Computing batch product (of 176 bits) of primes below B1=125 took 0ms GPU: compiled for a NVIDIA GPU with compute capability 2.1. GPU: will use device 0: GeForce GT 525M, compute capability 2.1, 2 MPs. GPU: Selection and initialization of the device took 14ms Using B1=125, B2=2706, sigma=3:2586393407-3:2586393470 (64 curves) dF=8, k=6, d=60, d2=7, i0=-4 Expected number of curves to find a factor of n digits: 35 40 45 50 55 60 65 70 75 80 Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf cudakernel.cu(216) : Error cuda : too many resources requested for launch.[/QUOTE]After some googling it seems the issue is that the CUDA kernel is using too many registers/variables/resources for my card to handle. Unfortunately I don't know if this is the expected behaviour on my card or how to fix it. Any help will be greatly appreciated. 
I tried to reduce the number of simultaneous curves with the -gpucurves parameter hoping it would reduce the resources to something acceptable. It turns out that the minimum value is 32 and I still get the same error. In case it's helpful here is my version of nvcc: [QUOTE]$ nvcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2015 NVIDIA Corporation Built on Mon_Feb_16_22:59:02_CST_2015 Cuda compilation tools, release 7.0, V7.0.27[/QUOTE] |
I can't help you with that problem, but I think you should be using either -gpucurves 48 or 96.
48 if you want the system to be somewhat responsive. |
[QUOTE=lorgix;402904]... but I think you should be using either -gpucurves 48 or 96.[/QUOTE]
Thank you for your answer. Why are those two values the only ones I should be using? I thought the program automatically selects the maximum that the card can handle. On your hardware it may well be 96, but on mine it seems to be 64 and as I said any curve number below 32 gets adjusted to 32. Any number strictly above 32 gets adjusted to 64, so I guess those two values are the only possible ones for me. Regardless, I would be very glad for information with which I could diagnose the problem further, if there is a problem at all (maybe my card can't handle the provided cudakernel and this behaviour is expected). I could write to the dedicated mailing list at gforge.inria.fr but I don't want to bother developers if the issue is trivial and an answer can be found here. |
[QUOTE=Singularity;403250]Thank you for your answer. Why are those two values the only ones I should be using? I thought the program automatically selects the maximum that the card can handle. On your hardware it may well be 96, but on mine it seems to be 64 and as I said any curve number below 32 gets adjusted to 32. Any number strictly above 32 gets adjusted to 64, so I guess those two values are the only possible ones for me.
Regardless, I would be very glad for information with which I could diagnose the problem further, if there is a problem at all (maybe my card can't handle the provided cudakernel and this behaviour is expected). I could write to the dedicated mailing list at gforge.inria.fr but I don't want to bother developers if the issue is trivial and an answer can be found here.[/QUOTE] It should select the maximum if you don't tell it otherwise. I thought GT525M had two "blocks" of 48 cores, but maybe it has three "blocks" of 32 like you say. From what I can see it has 48 cores per SM, and in my experience that decides what number of concurrent curves makes the most sense. But I'm not 100% on this. |
Anybody know how to resolve this error:
[CODE]unresolved external symbol mpn_mul_fft referenced in function __ecm_mpres_mul[/CODE] It seems to be caused by MPIR not having "mpn_mul_fft" defined or mapped or something. But MPIR works so well on windows. |
[QUOTE=wombatman;403386]Anybody know how to resolve this error:
[CODE]unresolved external symbol mpn_mul_fft referenced in function __ecm_mpres_mul[/CODE] It seems to be caused by MPIR not having "mpn_mul_fft" defined or mapped or something. But MPIR works so well on windows.[/QUOTE] MPIR does have this function and, since GMP-ECM builds fine with MSVC, this seems to be a problem specific to your build environment. How are you building GMP-ECM? |
Building in VS 2012 with MPIR 2.7.0 (also built in VS 2012) as 64-bit using CUDA 7.0.
The definition is supposed to be here: [CODE]#if defined( __MPIR_RELEASE ) && __MPIR_RELEASE >= 20600
#if __MPIR_RELEASE == 20600
#error MPIR 2.6 does not support GMP-ECM, please use an alternative version
#endif
/* WARNING - the following two defintions map the internal interface of the
   new FFT in MPIR 2.6 (and later) to the GMP FFT interface - they work in
   this context but the parameters for mpn_fft_next_size and fft_adjust_limbs
   have different semantics, which means that these definitions may fail if
   used in other circumstances */
# define mpn_fft_best_k(n, k) (0)
# define mpn_fft_next_size(n, k) fft_adjust_limbs(n)
#else
#define mpn_mul_fft __gmpn_mul_fft
mp_limb_t __gmpn_mul_fft (mp_ptr, mp_size_t, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t, int);
#define mpn_mulmod_bnm1 __gmpn_mulmod_bnm1
void mpn_mulmod_bnm1 (mp_ptr, mp_size_t, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t, mp_ptr);
#define mpn_mul_fft_full __gmpn_mul_fft_full
void __gmpn_mul_fft_full (mp_ptr, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t);
#define mpn_fft_next_size __gmpn_fft_next_size
mp_size_t __gmpn_fft_next_size (mp_size_t, int);
#define mpn_mulmod_bnm1_next_size __gmpn_mulmod_bnm1_next_size
mp_size_t mpn_mulmod_bnm1_next_size (mp_size_t);
#define mpn_fft_best_k __gmpn_fft_best_k
int __gmpn_fft_best_k (mp_size_t, int);
#endif[/CODE] But I think it's not hitting the else like it should? |
[QUOTE=wombatman;403536]Building in VS 2012 with MPIR 2.7.0 (also built in VS 2012) as 64-bit using CUDA 7.0.
The definition is supposed to be here: [CODE]#if defined( __MPIR_RELEASE ) && __MPIR_RELEASE >= 20600 #if __MPIR_RELEASE == 20600 #error MPIR 2.6 does not support GMP-ECM, please use an alternative version #endif /* WARNING - the following two defintions map the internal interface of the new FFT in MPIR 2.6 (and later) to the GMP FFT interface - they work in this context but the parameters for mpn_fft_next_size and fft_adjust_limbs have different semantics, which means that these definitions may fail if used in other circumstances */ # define mpn_fft_best_k(n, k) (0) # define mpn_fft_next_size(n, k) fft_adjust_limbs(n) #else #define mpn_mul_fft __gmpn_mul_fft mp_limb_t __gmpn_mul_fft (mp_ptr, mp_size_t, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t, int); #define mpn_mulmod_bnm1 __gmpn_mulmod_bnm1 void mpn_mulmod_bnm1 (mp_ptr, mp_size_t, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t, mp_ptr); #define mpn_mul_fft_full __gmpn_mul_fft_full void __gmpn_mul_fft_full (mp_ptr, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t); #define mpn_fft_next_size __gmpn_fft_next_size mp_size_t __gmpn_fft_next_size (mp_size_t, int); #define mpn_mulmod_bnm1_next_size __gmpn_mulmod_bnm1_next_size mp_size_t mpn_mulmod_bnm1_next_size (mp_size_t); #define mpn_fft_best_k __gmpn_fft_best_k int __gmpn_fft_best_k (mp_size_t, int); #endif[/CODE] But I think it's not hitting the else like it should?[/QUOTE] No, that is not the problem, as it should NOT take the else branch for MPIR 2.6 and later versions. The missing function declaration (mpn_mul_fft) should be picked up from gmp.h here: [CODE]#define mpn_mul_fft __MPN(mul_fft)
__GMP_DECLSPEC int mpn_mul_fft __GMP_PROTO((mp_ptr rp, mp_size_t rn, mp_srcptr ap, mp_size_t an, mp_srcptr bp, mp_size_t bn, int k));[/CODE] so we need to work out why this declaration is not being seen (or is failing in some way) during the build of GMP-ECM. |
I went into my MPIR 2.7.0 folder and found gmp.h. I removed the one in the GMP-ECM VS solution and added that one. That gmp.h doesn't have mpn_mul_fft either.
This is the only bit found when searching gmp.h for "mpn_mul_fft": [CODE]/**************** MPN API for FFT ****************/ #define mpn_mul_fft_main __MPN(mul_fft_main) __GMP_DECLSPEC void mpn_mul_fft_main __GMP_PROTO ((mp_ptr r1, mp_srcptr i1, mp_size_t n1, mp_srcptr i2, mp_size_t n2));[/CODE] |
[QUOTE=wombatman;403572]I went into my MPIR 2.7.0 folder and found gmp.h. I removed the one in the GMP-ECM VS solution and added that one. That gmp.h doesn't have mpn_mul_fft either.
This is the only bit found when searching gmp.h for "mpn_mul_fft": [CODE]/**************** MPN API for FFT ****************/ #define mpn_mul_fft_main __MPN(mul_fft_main) __GMP_DECLSPEC void mpn_mul_fft_main __GMP_PROTO ((mp_ptr r1, mp_srcptr i1, mp_size_t n1, mp_srcptr i2, mp_size_t n2));[/CODE][/QUOTE] If gmp.h doesn't have a declaration of mpn_mul_fft, something has gone wrong when MPIR was built. During an MPIR build, the files mpir.h and gmp.h are automatically generated from the file gmp_h.in in the MPIR root directory. If gmp_h.in includes the declaration of mpn_mul_fft (check this, as it should do so), then it seems that the gmp.h and mpir.h files are not being written correctly when MPIR is built. It would make sense to do a completely clean build of MPIR to see if these files are being generated properly. |
Will do, and I'll post again when I've had time to do it and see if it works. Thanks for the suggestions.
[STRIKE]Edit: Just checked gmp_h.in from my MPIR 2.7.0 source folder, and it only has the "mpn_mul_fft_main" definition as posted above. The file is showing a last modified date of 4-2-2014. Here's the header:[/STRIKE] Edit 2: Seems I didn't have a recent enough version. I downloaded the alpha 12 version of 2.7.0 and it has the mpn_mul_fft defined in gmp_h.in. I'll work on rebuilding the GPU-ECM and see if it finds everything now. Edit 3: The mpn_mul_fft unresolved symbols are taken care of. Now I've got [CODE]LNK2001: unresolved external symbol __gmp_bits_per_limb[/CODE] This exists in the gmp.h that's being included, so I'm not sure what the issue is. |
[QUOTE=wombatman;403594]Will do, and I'll post again when I've had time to do it and see if it works. Thanks for the suggestions.
[STRIKE]Edit: Just checked gmp_h.in from my MPIR 2.7.0 source folder, and it only has the "mpn_mul_fft_main" definition as posted above. The file is showing a last modified date of 4-2-2014. Here's the header:[/STRIKE] Edit 2: Seems I didn't have a recent enough version. I downloaded the alpha 12 version of 2.7.0 and it has the mpn_mul_fft defined in gmp_h.in. I'll work on rebuilding the GPU-ECM and see it finds everything now. Edit 3: The mpn_mul_fft unresolved symbols are taken care of. Now I've got [CODE]LNK2001: unresolved external symbol __gmp_bits_per_limb[/CODE] This exists in the gmp.h that's being included, so I'm not sure what the issue is.[/QUOTE] The symbol mp_bits_per_limb is redefined as __gmp_bits_per_limb and is supplied in the MPIR file mp_bpl.c. The link failure is odd since I don't think this symbol is used by GMP-ECM anyway (maybe I am wrong about this). If it is used it should be possible to use GMP_LIMB_BITS instead. |
So to be clear, anywhere "mp_bits_per_limb" is used, I should try substituting GMP_LIMB_BITS? It looks like mp_bits_per_limb only shows up once.
Edit: Felt saucy, so I went ahead and changed the one line where mp_bits_per_limb shows up from [CODE]if (mp_bits_per_limb != GMP_NUMB_BITS)[/CODE] to [CODE]if (GMP_LIMB_BITS != GMP_NUMB_BITS)[/CODE] This allows it to compile, although it now needs mpir.dll to be present, which I don't remember needing before. Regardless, it also throws a c0000005 exception when trying to use test.gpuecm. Some progress, though! |
[QUOTE=wombatman;403635]So to be clear, anywhere "mp_bits_per_limb" is used, I should try substituting GMP_LIMB_BITS? It looks like mp_bits_per_limb only shows up once.[/QUOTE]
Can you let me know where it shows up before I answer that? |
Heh, we managed to cross-post a bit. See my edit.
Oh, and the line I changed is found in main.c of gmp-ecm. |
[QUOTE=wombatman;403637]Heh, we managed to cross-post a bit. See my edit.
Oh, and the line I changed is found in main.c of gmp-ecm.[/QUOTE] For some reason the Visual Studio search I did didn't find this (I probably set the wrong search space). But now that you have found a use it is clear that this cannot be substituted in the way I suggested as it appears to need a run-time check rather than a compile time one. So, I am afraid that we need to work out why this symbol is not available to the link process. Do you know how to use DUMP to inspect the MPIR library binary to determine if the missing symbol is present? It would be useful to know if it is not present or whether it is present but is just not being seen. |
I don't. Can you point me to a tutorial or how-to? I'll look on my own as well, but if you have a good resource, I'd be happy to look at that too.
|
Ok, another step forward. I realized that I had built the mpir dll AFTER building the lib. These dump into the same directory by default, and the mpir.lib file was being overwritten with something that didn't have all that was needed. I rebuilt just the mpir.lib file, and GPU-ECM compiled flawlessly.
Upon testing, though, I get a c0000374 error: [CODE]0xC0000374: A heap has been corrupted (parameters: 0x0000000076FEB4B0).[/CODE] This happens after Step 1 is finished: [CODE]Input number is 458903930815802071188998938170281707063809443792768383215233 (60 digits) Using B1=125, B2=0, sigma=3:227-3:242 (16 curves) Block: 64x16x1 Grid: 1x1x1 Starting iterations = 175 Computing 16 Step 1 took 15ms of CPU time / 369ms of GPU time (Errors here)[/CODE] I'll keep playing around with it, but I wanted to update. |
[QUOTE=wombatman;403645]Ok, another step forward. I realized that I had built the mpir dll AFTER building the lib. These dump into the same directory by default, and the mpir.lib file was being overwritten with something that didn't have all that was needed. I rebuilt just the mpir.lib file, and GPU-ECM compiled flawlessly.[/QUOTE]
Yes, the build files for VS 2012 are a long way out of date and do not separate the DLL and Library outputs. The 2012 build is no longer being maintained so it is not all that surprising that there are issues in using these to build MPIR 2.7.0. You will have a lot less trouble with the Visual Studio 2013 build files, which leads me to wonder why you need to use VS 2012? |
No particular need. Just had VS 2012 :smile:
I'll see about getting some version of VS 2013 installed. |
[QUOTE=wombatman;403652]No particular need. Just had VS 2012 :smile:
I'll see about getting some version of VS 2013 installed.[/QUOTE] Visual Studio 2013 Community is free for open source development and has all the features of the professional version. It supports the NVIDIA 'Nsight for Visual Studio' add-on that is needed for GPU development. This is the version I am using to maintain the MSVC build files for both MPIR and GMP-ECM, so it is pretty likely to work 'out of the box'. I also support MPIR builds using Visual Studio 2015 Community RC, but the NVIDIA add-on is not yet available for this version, so the GPU build of GMP-ECM cannot be built with it. MPIR development has been limited recently because Jason Moxham, our brilliant lead assembler developer for x86_64, sadly passed away. But Bill Hart has just secured financial support for hiring a full-time developer for a year to work on MPIR assembler optimisation. And I will be working to ensure that such developments are available on Windows using the native development tools (i.e. the MS and Intel compilers). |
Apologies for taking so long to get around to this, but I've just tried building the latest GPU-ECM SVN with VS2013 (Ultimate? Dunno, but it was free from Microsoft).
I rebuilt MPIR (both the dll and lib files for sandybridge/ivybridge) and that worked flawlessly. I'm now working on GPU-ECM. I pointed the include directories to the lib/ folder of MPIR. libecm_gpu builds without an issue. ecm_gpu, though, still throws this error: [CODE]LNK2019: unresolved external symbol mpn_mul_fft referenced in function __ecm_mpres_mul (mpmod.obj) LNK2001: unresolved external symbol mpn_mul_fft (schoen_strass.obj)[/CODE] Any ideas? :smile: |
The symbol mpn_mul_fft is redefined in gmp.h (or mpir.h) as __gmpn_mul_fft so the fact that it turns up without being redefined during the link stage of the build suggests that gmp.h or mpir.h is not being included properly in the build (or the wrong version of the header is being included).
__ecm_mpres_mul is a redefinition of mpres_mul(...) in the file mpmod.c in gmp-ecm. If you open this file in VS 2013 and locate line 1443, where mpn_mul_fft(...) is called, you can use Visual Studio to go to the definition of mpn_mul_fft, and this should take you to the line: #define mpn_mul_fft __MPN(mul_fft) in gmp.h - does this work for you? I am also a bit puzzled why you need to change the location of the include directories, as I have these already set up in the build. I should also mention that building ecm_gpu with Visual Studio 2013 requires that Visual Studio 2012 is also installed, because the NVIDIA CUDA tools require the earlier Microsoft compiler. |
It does not bring up gmp.h, no.
The only reference I can find (aside from the mpmod and schoen .c files) is from ecm-gmp.h: [CODE]#define mpn_mul_fft __gmpn_mul_fft mp_limb_t __gmpn_mul_fft (mp_ptr, mp_size_t, mp_srcptr, mp_size_t, mp_srcptr, mp_size_t, int);[/CODE] So I think you're right that gmp.h is not being included properly or something. As for the directories, I had to change it because I'm a bit unorganized and thus my MPIR folder is in a pretty different place than ECM-GPU. That's all. :smile: And I do have VS2012 still installed, so that shouldn't be an issue. |
I also checked to make sure that gmp.h was included in both libecm_gpu and ecm_gpu. It is, and it's the file from the MPIR 2.7.0 VS2013 build. This is the only thing that comes up when searching for mpn_mul_fft:
[CODE]#define mpn_mul_fft_main __MPN(mul_fft_main) __GMP_DECLSPEC void mpn_mul_fft_main __GMP_PROTO ((mp_ptr r1, mp_srcptr i1, mp_size_t n1, mp_srcptr i2, mp_size_t n2));[/CODE] So it seems I don't have the right gmp.h? |
Is the gmp.h file listed in the project solution the correct one (i.e. if you open it, does it open the correct version - the one in your MPIR location)? I am assuming that you have changed the MPIR library location in the link stage to your library location.
|
It is, yes. Although when I look at the "Date Modified" on the gmp.h file, it's 5/8/2014 while the mpir.lib file is from when I actually built it. Does that sound right, or should the gmp.h be built every time as well? Is there another way to do the rebuild on gmp.h (such as deleting it from the folder)?
|
The gmp.h file is only updated if it has changed so it can be a lot older than the binary. What we need to discover is why the file mpmod.c in ecm_gpu doesn't see gmp.h. Since this occurs when the libecm_gpu project is built, it might make sense to change the implicit inclusion of gmp.h in this project to an explicit one to see if this makes any difference. To do this expand the 'External Dependencies' item for the libecm_gpu project in the Visual Studio Solution Explorer, locate and right click on the gmp.h file and select 'include in Project'. After doing this open the gmp.h file and check that it has opened the right one. Then try to rebuild the two projects.
|
That's actually what I did to make sure gmp.h was being included for both projects. Rebuilding still gives the same error.
[STRIKE]My concern is that I don't see the mpn_mul_fft line in gmp.h that should be there (#define mpn_mul_fft __MPN(mul_fft)). Instead, I only have this:[/STRIKE] [CODE]#define mpn_mul_fft_main __MPN(mul_fft_main) __GMP_DECLSPEC void mpn_mul_fft_main __GMP_PROTO ((mp_ptr r1, mp_srcptr i1, mp_size_t n1, mp_srcptr i2, mp_size_t n2));[/CODE] [STRIKE]That's why I was wondering if I don't have the correct gmp.h file built.[/STRIKE] Edit: I just went back to the MPIR site and found that there's a later version than what I have. I had MPIR 2.7.0, but it was from well before June 26, 2015. So I've downloaded that and rebuilt the files, and now gmp.h has the appropriate mpn_mul_fft. I haven't built GPU-ECM yet, but I suspect it will work now. I'll follow up shortly. Edit 2: Well, ECM-GPU builds now (using CUDA 7.0 and VS2013), but it still errors on the first test in test.gpuecm. |
The error thrown, by the way, is this: [CODE]Unhandled exception at 0x0000000140012DAF in ecm_gpu.exe: 0xC0000005: Access violation reading location 0x0000000032B7FFF8.[/CODE]
at line 32 [CODE]MPN_SIZEINBASE (result, PTR(x), ABSIZ(x), base);[/CODE] in sizeinbase.c. |
Yes, I can also see some test failures. These tests were working before I updated to CUDA 7.0 so it would seem that this update may be problematic.
|
Ok. I'll play around with 6.0 and 6.5 and see if they work for me. Thanks for all your help (and tolerating my idiocy :bangheadonwall:) with this.
|
6.0 doesn't work with VS2013. 6.5 compiles fine, but has the same error as 7.0.
The error I'm getting is consistent. It pops up in cudawrapper.c at Line 350: [CODE]mpres_init (P.y, modulus);[/CODE] Note that mpres_init(P.x, modulus); seems to be fine. As best I can tell, P.y isn't passing in properly or something. Weird. |
One last correction: initialization of P.x is not working either; the error just shows up at the next line. So maybe something is wrong with mpres_init?
|
This appears to be a compiler issue of some kind. If I compile ecm_gpu in debug mode all the tests then pass.
|
I have found the problem. This line in cudawrapper.c:
[CODE] ASSERT_ALWAYS (mpmod_init (modulus, n, repr) == 0); [/CODE] becomes: [CODE] assert(mpmod_init (modulus, n, repr) == 0); [/CODE] and will be completely eliminated when NDEBUG is true, as it will be in release mode. As a result the values in modulus never get initialised. This following quick fix removes the problem: [CODE] #ifdef _MSC_VER { int ret; ret = mpmod_init (modulus, n, repr); ASSERT_ALWAYS(ret == 0); } #else ASSERT_ALWAYS (mpmod_init (modulus, n, repr) == 0); #endif [/CODE] I'll ask the GMP-ECM team about a more permanent fix for this. |
That seems to have worked in terms of fixing the crash. Now getting this:
[CODE]cudakernel.cu(219) : Error cuda : an illegal memory access was encountered.[/CODE] |
I don't see this I'm afraid - the GPU tests now pass for me with CUDA 7.0.
Since this problem seems to be CUDA related, I imagine that our results are different because we are running different GPUs (I am running an NVIDIA Quadro K2000M). Maybe you need to compile with different architecture/capability settings? I am not sure I can help, but if you think of anything that I might do, just let me know. |
Yeah, I'm running on a GTX 560 (CC 2.0). I did change that to the correct one in the VS2013 properties, but I'll keep playing around with it. Thanks for all your help!
|
Just in case...
Edit: Figured out the memory error. Caused by my messing around previously. Still doesn't pass any of the tests, but it isn't crashing now at least.
|
Last post since it now works perfectly. Note to self: stop messing with code. You don't know what you're doing.
Thanks again Brian for helping out. |
I am glad to have been able to help - especially so, since you picked up a bug in the Visual Studio GMP-ECM build that I was unaware of. In fact, I suspect that this might well be a bug in GMP-ECM itself.
It is an ongoing nuisance that to build with the NVIDIA tools, both Visual Studio 2012 and 2013 need to be installed. And, with Visual Studio 2015 arriving next week, we may then need all three versions installed! |
CUDA Kernel issue solved
[QUOTE=Singularity;402886]Good evening,
I have just downloaded the latest version of ecm and compiled it with --enable-gpu=sm_21 in order to try to run stage 1 on my Nvidia GeForce GT 525M. It compiles without issues and all the ECM tests pass during "make check". However, when testing the GPU with "./test.gpuecm ./ecm" I get the following CUDA error: When I launch a manual test with "./ecm -v -gpu 125" I get the following more verbose output ending with the same error message: After some googling, it seems the issue is that the CUDA kernel is using too many registers/variables/resources for my card to handle. Unfortunately I don't know if this is the expected behaviour on my card or how to fix it. Any help will be greatly appreciated. I tried to reduce the number of simultaneous curves with the -gpucurves parameter, hoping it would reduce the resources to something acceptable. It turns out that the minimum value is 32 and I still get the same error. In case it's helpful, here is my version of nvcc:[/QUOTE] I finally managed to get it working. In case someone else bumps into a similar issue, here's what I did: I changed the "ECM_GPU_CURVES_BY_BLOCK" constant in cudakernel.h from 32 to 16 in the block corresponding to my compute capability (which is 2.1). Running "make" after that recompiles the CUDA kernel, and all tests pass successfully when running "./test.gpuecm ./ecm". |
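For anyone hunting for the same spot, the region of cudakernel.h involved looks roughly like this; this is a hedged sketch reconstructed from the description above, so the exact guards and surrounding code will differ in your copy:

```c
/* cudakernel.h (sketch, not the verbatim upstream source): number of
 * curves launched per CUDA block, chosen per compute capability.
 * Halving the value halves the per-block register demand, which is
 * what allowed the kernel to launch on a CC 2.1 card here. */
#define ECM_GPU_CURVES_BY_BLOCK 16   /* was 32 */
```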
[QUOTE=Singularity;407657]I finally managed to get it working. In case someone else will bump into a similar issue, here's what I did: I changed the "ECM_GPU_CURVES_BY_BLOCK" constant in cudakernel.h from 32 to 16 in the block corresponding to my compute capability (which is 2.1).
Running "make" after that recompiles the CUDA kernel and all tests are passed successfully when running "./test.gpuecm ./ecm".[/QUOTE] Any chance of putting up a link to the binaries? |
build woes
I follow the instructions in README.dev:
% sudo apt-get install libtool % svn co [url]https://scm.gforge.inria.fr/anonscm/svn/ecm[/url] % cd ecm/trunk % aclocal % automake -c -a % autoconf % ./configure automake tells me [code] pumpkin@pumpkin:~/cuda-ecm/ecm-20151118/ecm/trunk$ automake -c -a configure.ac:158: installing './compile' configure.ac:9: installing './config.guess' configure.ac:9: installing './config.sub' configure.ac:8: installing './install-sh' configure.ac:167: error: required file './ltmain.sh' not found configure.ac:8: installing './missing' Makefile.am: installing './INSTALL' configure.ac:11: error: required file 'config.h.in' not found Makefile.am: installing './depcomp' [/code] and then I have [code] configure: creating ./config.status config.status: error: cannot find input file: `Makefile.in' [/code] This is on ubuntu-14.04 |
[QUOTE=fivemack;416519]I follow the instructions in README.dev:
% sudo apt-get install libtool % svn co [url]https://scm.gforge.inria.fr/anonscm/svn/ecm[/url] % cd ecm/trunk % aclocal % automake -c -a % autoconf % ./configure This is on ubuntu-14.04[/QUOTE] I've never tried that part of the instructions. Have you tried the other part? This part has always worked for me: [CODE]$ autoreconf -i $ ./configure --with-gmp=<directory_where_gmp_is_installed> $ make $ make check[/CODE] And, if gmp is installed, then you don't have to add the "--with-gmp" part. |
[QUOTE=WraithX;416533]And, if gmp is installed, then you don't have to add the "--with-gmp" part.[/QUOTE]Yes and no.
If, like me, you run the development version of GMP but the system uses the stable one, you may wish to set --with-gmp if you want to live at the bleeding edge. |
autoreconf -i appears to be working fine for me now that I've installed libtool; the error messages it gave previously were not the least cryptic way of saying 'please install libtool'.
[code] pumpkin@pumpkin:~/cuda-ecm/ecm-20151118/ecm/trunk$ ./ecm -gpudevice 0 -gpu -gpucurves 1664 -save FROG-1664.s1 1e5 1 < c155 GMP-ECM 7.0-dev [configured with GMP 6.1.0, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM] Input number is 95209938255048826235189575712705128366296557149606415206280987204268594538412191641776798249266895999715600261737863698825644292938050707507901970225804581 (155 digits) Using B1=100000, B2=1, sigma=3:4188778356-3:4188780019 (1664 curves) Computing 1664 Step 1 took 1008ms of CPU time / 140212ms of GPU time [/code] |
GPU-ECM error on Ubuntu
I'm trying to get GPU-ECM compiled on Ubuntu with a GTX 570 (CC 2.0) card. nvidia-smi recognizes the card and gives the correct driver version (352.xx). I compiled GMP-ECM with --enable-gpu=sm_20, and it passes the non-GPU checks with flying colors. On the first GPU check, however, it fails and says that too many resources were requested.
This card worked fine on my Windows box for GPU-ECM until it was replaced, so it shouldn't be a hardware issue. Is there something I'm forgetting to do or need to set? CUDA version is 7.5. |
Well, changing the curves per block parameter in the appropriate header from 32 to 16 for CC2.0 worked.
|
I'm just a factoring newb with a relatively powerful gaming machine. M1277 is bugging me because it's not prime but does not have any known factors yet. I can't trial factor it with Prime95 or mfaktc because it's too small. The only thing I can do with it is ECM, but it's taking so long on my Intel 5960X. I was looking for a GPU-accelerated version of ECM and saw this thread, but there seems to be a lot of talk of Linux.
Is there a GPU version of ECM that I can use to work on finding factors for numbers like M1277 and larger? If so, is there a Windows version that is already compiled and ready to download that will work on my GTX 980s with relatively little fuss? |
[QUOTE=ssateneth;425418]I'm just a factoring newb with relatively powerful gaming machine. M1277 is bugging me because its not prime but does not have any known factors yet. I can't trial factor it with prime95 or mfaktc because it's too small. Only thing I can do with it is ECM, but its taking so long on my intel 5960x. I was looking for a GPU accelerated version of ECM and saw this thread, but there seems to be a lot of talk of linux.
Is there a GPU version of ECM that I can use to work on finding factors for numbers like M1277 and larger? If so, is there a windows version that is already compiled and ready to download that will work on my GTX 980's with relatively little fuss?[/QUOTE] Unfortunately, the actual GPU-ECM is hard-limited to 1018 bits, otherwise I would be more than happy to compile one for you. Just consider that GPU-ECM can run about 500 stage 1 curves at once, but has no GPU speedup for stage 2. Luigi |
[QUOTE=ssateneth;425418]I'm just a factoring newb with relatively powerful gaming machine. M1277 is bugging me because its not prime but does not have any known factors yet. I can't trial factor it with prime95 or mfaktc because it's too small. Only thing I can do with it is ECM, but its taking so long on my intel 5960x. I was looking for a GPU accelerated version of ECM and saw this thread, but there seems to be a lot of talk of linux.
Is there a GPU version of ECM that I can use to work on finding factors for numbers like M1277 and larger? If so, is there a windows version that is already compiled and ready to download that will work on my GTX 980's with relatively little fuss?[/QUOTE] Just curious, are you using Prime95 for stage1 and GMP-ECM for stage2? If so, what B1 and B2 settings? |
[QUOTE=VictordeHolland;425455]Just curious, are you using Prime95 for stage1 and GMP-ECM for stage2?
If so, what B1 and B2 settings?[/QUOTE] This. The fastest way to run ECM on M1277 is to run Prime95 with B2 set to 1, and a flag (GmpEcmHook=1) in the prime.txt file to output residues after stage 1. GMP-ECM is then run on the results file to do stage 2. P95 is about 30% faster than GMP-ECM for stage 1 on M1277. I found B1 = 1.6e9 and B2 = 51e12 to be good settings; stage 2 takes 9GB of memory for those. Since stage 2 takes quite a bit less time than stage 1, you can run multiple threads of P95 and feed the outputs sequentially to ECM; a 3-to-1 ratio will almost allow ECM to keep up while using all 4 threads. I am specifically trying to take advantage of the large memory available, so I chose bigger bounds than the usual B1 = 800M. The bigger bounds trade a small amount of efficiency in finding a ~65-digit factor for a greater chance (per unit time) of finding a ~70-digit factor. On a machine with more memory, I chose B1 = 4.5e9 to try for yet larger factors (B2 = 200e12 takes 17GB of memory). |
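A sketch of that Prime95 + GMP-ECM pipeline (the worktodo line, curve count, and file names are illustrative assumptions; check the documentation for your versions):

```text
# prime.txt (Prime95): emit GMP-ECM-compatible residues after stage 1
GmpEcmHook=1

# worktodo.txt: ECM on M1277 with B1=1.6e9 and B2=1 (stage 1 only);
# the exact ECM2= field layout varies by Prime95 version
ECM2=1,2,1277,-1,1600000000,1,100

# then run GMP-ECM stage 2 on the saved residues:
ecm -resume results.txt 1600000000 51e12
```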
Hello,
I'm looking for a binary of GMP-ECM for an Intel Core i7 (Windows 7). Is there a chance to get such a binary? At the moment I'm very frustrated since I've got some binaries, but none of them is running. In most cases Windows reports that the program could not be started. msieve's poly selection step with -np1 is running, so the card should be GMP-ECM "compatible". Thank you in advance. Alfred |
[QUOTE=Alfred;425489]Hello,
I'm looking for a binary of GMP-ECM for an Intel Core i7 (Windows 7). Is there a chance to get such a binary? At the moment I'm very frustrated since I've got some binaries, but none of them is running. In most cases Windows reports that the program could not be started. msieve's poly selection step with -np1 is running, so the card should be GMP-ECM "compatible". Thank you in advance. Alfred[/QUOTE] [url]http://www.mersenneforum.org/showthread.php?t=4087[/url] |
[QUOTE=VictordeHolland;425530][url]http://www.mersenneforum.org/showthread.php?t=4087[/url][/QUOTE]
Thank you. GMP-ECM (CPU) is running well (of course). I meant GPU-ECM (GMP-ECM for CUDA). Did my misspelling cause the confusion? I'm ashamed. BTW, I'm not able to reach the URL that your link [url]http://www.mersenneforum.org/showthread.php?t=4087[/url] points to. I get server error 404. Alfred |
[QUOTE=Alfred;425534]Thank you.
GMP-ECM (CPU) is running well (of course). I meant GPU-ECM (GMP-ECM for CUDA). Caused my frustration my misspelling? I'm ashamed. BTW, I'm not able to reach the url where your [URL]http://www.mersenneforum.org/showthread.php?t=4087[/URL] is pointing at. I get server error 404. Alfred[/QUOTE] It's working on Firefox 43, the thread is named: "Links to Precompiled GMP-ECM versions" in the GMP-ECM subforum. |
Here there is nothing wrong with the link; it points where it is supposed to point.
|
[QUOTE=VictordeHolland;425573]It's working on Firefox 43, the thread is named:
"Links to Precompiled GMP-ECM versions" in the GMP-ECM subforum.[/QUOTE] [QUOTE=LaurV;425591]Here there is nothing wrong with the link, it points where it is assumed to point.[/QUOTE] Perhaps OP is talking about the very first link *in* that thread? |
None of the binaries I have compiled in the other thread runs on GPU. I have not looked into compiling GMP-ECM for GPU.
|
Apparently it is not possible to compile the GPU version with MSYS2+MinGW for Windows, as it looks for "libcudart.so", which does not exist in the Windows version of CUDA. Any possibility of adding that option?
I have no idea how to compile it with Visual Studio. |
GPU ECM for Windows
About to buy a new laptop with a GTX 980 card. Looking through the posts in this thread, it would appear that some of you have gotten a GPU version of ECM to work on Windows. However, I also see posts stating that a Windows binary is 'wanted' by some. Could somebody tell me the current status of running ECM on an Nvidia GPU in Windows (Windows 7/8/10)? Thanks
|
I run GPU-ECM on Windows 7 with a GTX 980 Ti. I have to compile it with Visual Studio.
|
[QUOTE=wombatman;437455]I run GPU-ECM on Windows 7 with a GTX 980 Ti. I have to compile it with Visual Studio.[/QUOTE]
Do you think there will be an issue between Windows 7 and Windows 10? Also, what version of VS did you compile it with? |
Honestly, I have no idea about 7-to-10 compatibility, but I don't think there should be an issue. I compile with VS2012 since I got a free copy of the full edition somewhere along the line.
|
Also consider that GPU-ECM (the version of GMP-ECM that uses Nvidia acceleration) is limited to 1018 bits.
Luigi |
Running into a problem I posted previously on here, but my solution then isn't working now. I'm trying to get GPU-ECM running on Ubuntu 14.04 with a GTX 570. I configured for CC 2.0, and everything compiles without issue.
I can pass all the standard ECM tests, so I don't think there's anything wrong there, but when I try to run test.gpuecm, I fail immediately with a "too many resources requested for launch" error. Using verbose, the correct GPU is identified along with the correct compute capability (2.0). The block size is 32x32x1, and the grid size is 1x1x1. CUDA version is 7.5. Any help/advice is appreciated. |
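One way to check whether register pressure really is the culprit is to ask ptxas for its per-kernel resource report at compile time (the file name and the numbers in the sample output are illustrative):

```shell
# Add -Xptxas -v (equivalently --ptxas-options=-v) to the nvcc flags:
nvcc -arch=sm_20 -Xptxas -v -c cudakernel.cu

# ptxas then prints something like:
#   ptxas info : Used 63 registers, 4096 bytes smem

# A CC 2.x multiprocessor has 32768 registers in total, and a 32x32x1
# block is 1024 threads, so at 63 registers/thread the block would need
# 64512 registers - more than the hardware has, hence "too many
# resources requested for launch".  Halving the curves per block
# (as earlier in this thread) halves that demand.
```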
[QUOTE=wombatman;438539]The block size is 32x32x1, and the grid size is 1x1x1. CUDA version is 7.5.[/QUOTE]
I'm just guessing but 32 x 32 = 1048 but GTX 570 only has 480 cores. That is more than twice the cores. |
[QUOTE=RichD;438542]I'm just guessing but 32 x 32 = 1048...[/QUOTE]
32 * 32 == 2^5 * 2^5 == 2^10 == 1024.... :wink: |
[QUOTE=RichD;438542]I'm just guessing but 32 x 32 = 1048 but GTX 570 only has 480 cores. That is more than twice the cores.[/QUOTE]
True, but it does the same with my 980Ti, which passes all tests using the test.gpuecm. |