So I'm running into an incompatibility issue with CUDA--essentially, nvcc won't play with any gcc version newer than 4.6. Is the simplest solution to just get gcc 4.6, put it in a separate directory, and point this particular configure step to that gcc?
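Something like this, I'm guessing, since readme.gpu mentions a --with-cc-for-cuda flag for exactly this (path purely illustrative):
[CODE]$ ./configure --enable-gpu=sm_20 --with-cc-for-cuda=/path/to/gcc-4.6/bin[/CODE]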
|
I'm actually going to try the CUDA 5.5 release candidate--it appears to have support for GCC 4.7.
|
Well, downloaded CUDA 5.5 toolkit (and VS2012 Express, which is now required) and got everything installed. Tried the compilation again, and it still fails with:
[CODE]checking for nvcc... /c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.5/bin/nvcc
checking for compatibility between gcc and nvcc... no
configure: error: gcc version is not compatible with nvcc
GCC is version 4.7.2[/CODE]
According to the release notes [URL="http://developer.download.nvidia.com/compute/cuda/5_5/rc/docs/CUDA_Toolkit_Release_Notes.pdf"]here[/URL], GCC 4.7 is supported on at least a few recent Linux distributions. Accordingly, it seems like it should be supported on Windows as well. Any suggestions on code that can be modified in the configure file to get around this issue?
EDIT: Running
[CODE]nvcc -dryrun -c conftest.cu -o conftest.o[/CODE]
which is the test used in the configure file, outputs the following:
[CODE]$ nvcc -dryrun -c conftest.cu -o conftest.o
#$ _SPACE_=
#$ _CUDART_=cudart
#$ _HERE_=c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin
#$ _THERE_=c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ _WIN_PLATFORM_=x64
#$ TOP=c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin/..
#$ NVVMIR_LIBRARY_DIR=c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin/../nvvm/libdevice
#$ PATH= *SNIIIIIIP*
#$ INCLUDES="-Ic:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin/../include"
#$ LIBRARIES= "/LIBPATH:c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\bin/../lib/x64"
#$ CUDAFE_FLAGS=
#$ OPENCC_FLAGS=
#$ PTXAS_FLAGS=
nvcc : fatal error : nvcc cannot find a supported version of Microsoft Visual Studio. Only the versions 2008, 2010, and 2012 are supported[/CODE] |
In Windows, nvcc requires that you have an installation of Microsoft's compiler. I use the free VS2008 download. You can structure your code so that you use the CUDA driver interface with any compiler and link to the Nvidia library like any other DLL. Use of the CUDA runtime interface with the fancy C++ extensions requires that you run nvcc (only) on host code, but you must still have Microsoft's compiler even to compile device code.
|
Yeah, I have cl.exe (and have confirmed that it will operate by running "cl" from the MinGW command line). I actually went and dug up a version of mingw with gcc 4.6.4, which should, by my reading of various websites and compatibility checks, be fine with CUDA 5.0.
I run this: [CODE]$ ./configure --enable-gpu=sm_20 --with-cc-for-cuda=/c/Documents\ and\ Settings/Ben/Desktop/4.6.4/mingw32/bin[/CODE] Which, according to the readme.gpu file, should compile with support for CC 2.0 (I have a GTX 570, so that's what I need) and use the gcc from the location I specified. I still get the same "compatibility" error as before. This is on a Windows XP 32-bit desktop. I can compile, for instance, MSIEVE with CUDA support with no issue. I've confirmed that the script compiles with no issue when the --enable-gpu flag is not present, so it doesn't seem to be anything wrong with the configure script itself. Edit: And I tried downloading CUDA 5.5 again to see if that would help at all. It doesn't, even though the host_config.h file now looks for gcc versions more recent than 4.9. So it's something within the script itself. I'll keep digging. |
I'm still searching for answers, but the immediate issue seems to be in getting nvcc to play nice with Microsoft's compiler. The configure file uses this as a test:
[CODE]nvcc -c conftest.cu -o conftest.o[/CODE]
to make sure nvcc is functioning fine. When I run this myself, it returns:
[CODE]$ nvcc -c testconf.cu -o testconf.o
nvcc : fatal error : Microsoft Visual Studio configuration file 'vcvars64.bat' could not be found for installation at 'c:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin/../..'[/CODE]
Apparently this file is only included in the professional installations of VS, not the express one. The solution that I can find is to use the SetEnv program with "/x64" to accomplish what vcvars64.bat does. Unfortunately, I don't know how this might be accomplished with MinGW.
Edit: Trying to configure and make without the gpu support gives this error during the make step:
[CODE]stage2.c: In function '__ecm_memory_use':
stage2.c:297:44: error: 'MUL_NTT_THRESHOLD' undeclared (first use in this function)
stage2.c:297:44: note: each undeclared identifier is reported only once for each function it appears in
stage2.c:306:6: error: 'MPZSPV_NORMALISE_STRIDE' undeclared (first use in this function)[/CODE]
As always, I'm open to suggestions!
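Edit 2: The workaround I keep seeing suggested is to create the missing vcvars64.bat yourself so that it simply calls SetEnv. Completely untested, and both the directory nvcc wants (VC\bin\amd64 is my guess from the error above) and the SDK's SetEnv.cmd location are guesses that will vary with the install, but from the MSYS shell it would be something like:
[CODE]$ mkdir -p "/c/Program Files (x86)/Microsoft Visual Studio 11.0/VC/bin/amd64"
$ cat > "/c/Program Files (x86)/Microsoft Visual Studio 11.0/VC/bin/amd64/vcvars64.bat" <<'EOF'
CALL "C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /x64
EOF[/CODE] |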
Doesn't the free Visual Studio install come with a special command prompt that has all the environment stuff set? It should be in the Start Menu.
(You can run all the MinGW binaries from outside the MinGW prompt if the directories with all the binaries are in your path) |
Do you mean to run MSYS from within that 64-bit environment command prompt?
And thanks for taking the time to make suggestions, Jason. I appreciate it. |
Quick update on some things that I got worked out--
This error:
[CODE]stage2.c: In function '__ecm_memory_use':
stage2.c:297:44: error: 'MUL_NTT_THRESHOLD' undeclared (first use in this function)
stage2.c:297:44: note: each undeclared identifier is reported only once for each function it appears in
stage2.c:306:6: error: 'MPZSPV_NORMALISE_STRIDE' undeclared (first use in this function)[/CODE]
is just from ecm-params.h not being properly defined. If I run tune.exe from my ecm-6.4.4 folder (which works fine) and copy that ecm-params.h file over, I'm good. I also had to copy the following lines from ecm-impl.h:
[CODE]#define TUNE_MULREDC_TABLE {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
#define TUNE_SQRREDC_TABLE {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
#define LIST_MUL_TABLE {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}[/CODE]
for the same reason.
On both Windows XP and Windows 7 (running MinGW), running
[CODE]./configure && make ecm-params[/CODE]
errors with the following:
[CODE]Optimising parameters for your system, please be patient.
test -z "ecm-params.h" || rm -f ecm-params.h
./tune > ecm-params.h
Assertion failed: __gmpz_divisible_ui_p (f[i], M[i][i]), file ks-multiply.c, line 164[/CODE]
The tune.exe from ecm-6.4.4 works just fine, and the ecm-params from that will allow the "make" step to work. I'm running the Cunningham numbers and no errors show up from that.
I should also note that when I run "make check" I get this:
[CODE]GMP-ECM 7.0-dev [configured with GMP 5.1.1, --enable-assert] [ECM]
Input number is 31622776601683791911 (20 digits)
Error, parametrization ECM_PARAM_BATCH_SQUARE works only with GMP_NUMB_BITS=64
Please report internal errors at <ecm-discuss@lists.gforge.inria.fr>.
############### ERROR ###############
Expected return code 0 but got 1
make: *** [check] Error 1[/CODE]
Still can't get --enable-gpu to work, but I'll keep working. |
Sorry to keep on posting to myself, but I can't edit after an hour.
Another small victory! To fix the error when running "make ecm-params", I used
[CODE]./configure --disable-assert[/CODE]
I still have to add
[CODE]#define TUNE_MULREDC_TABLE {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
#define TUNE_SQRREDC_TABLE {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}[/CODE]
to ecm-params.h myself, however. And "make check" still fails with
[CODE]Error, parametrization ECM_PARAM_BATCH_SQUARE works only with GMP_NUMB_BITS=64[/CODE] |
Small victory #2.
If I run
[CODE]touch conftest.cu
nvcc --machine 32 -c conftest.cu -o conftest.o[/CODE]
nvcc compiles without an error. I added the "--machine 32" flag to the configure file, but I still get the gcc/nvcc error from configure. The relevant lines from the configure are:
[CODE]touch conftest.cu
nvcc_flags="--machine 32 -c conftest.cu -o conftest.o"
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for compatibility between gcc and nvcc" >&5
$as_echo_n "checking for compatibility between gcc and nvcc... " >&6; }
if test -n "$cc_for_cuda"; then :
  nvcc_flags+=" --compiler-bindir $cc_for_cuda"
fi
$NVCC $nvcc_flags > /dev/null 2>&1
if test "$?" -eq "0"; then :
  { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5
$as_echo "ok" >&6; }
else
  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
$as_echo "no" >&6; }
  as_fn_error $? "gcc version is not compatible with nvcc" "$LINENO" 5[/CODE]
The command used to run the configure file is:
[CODE]./configure --disable-assert --enable-gpu=sm_20[/CODE] |
Ok, narrowing down the issue even more--I removed the > /dev/null part from the nvcc configure section so I could see what error was being output. What I got is this:
[CODE]checking for nvcc... /c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.0/bin/nvcc
checking for compatibility between gcc and nvcc... [B]./configure: line 14746: /c/Program: No such file or directory[/B][/CODE]
So even though it correctly finds nvcc in $CUDA_PATH, when it tries to run
[CODE]$NVCC $nvcc_flags 2>&1[/CODE]
it errors out due to the spaces in the path name. Is there a way around this? I tried passing the CUDA path in quotes using the --with-cuda flag, but that didn't work either. |
Well, I got it to compile. Victory there. But it still won't make. Defeat. Here's what I've done so far.
To get it to compile:
I added NVCC="nvcc" at Line 14737, immediately before "touch conftest.cu". This gets around the issue with the path to nvcc.
In all cases of 'test' commands, I went ahead and put quotes if any variable with "$" was used. This solved most of the other issues with nvcc compilation (see the sketch at the end of this post). I should also note that the "--machine 32" flag is NOT necessary for successful configuration. I was wrong about that.
For the library search, I inserted
[CODE]CUDA_LIB_FLAGS="-L$cu_dir/lib$lib_suffix/win32 -lcudart"[/CODE]
at Line 14848 and commented out the error message.
Running make, however, I get this:
[CODE]libtool: link: ar cru .libs/libecm.a libecm_la-ecm.o libecm_la-ecm2.o libecm_la-pm1.o libecm_la-pp1.o libecm_la-getprime.o libecm_la-listz.o libecm_la-lucas.o libecm_la-stage2.o libecm_la-mpmod.o libecm_la-mul_lo.o libecm_la-polyeval.o libecm_la-median.o libecm_la-schoen_strass.o libecm_la-ks-multiply.o libecm_la-rho.o libecm_la-bestd.o libecm_la-auxlib.o libecm_la-random.o libecm_la-factor.o libecm_la-sp.o libecm_la-spv.o libecm_la-spm.o libecm_la-mpzspm.o libecm_la-mpzspv.o libecm_la-ntt_gfp.o libecm_la-ecm_ntt.o libecm_la-pm1fs2.o libecm_la-sets_long.o libecm_la-auxarith.o libecm_la-batch.o libecm_la-parametrizations.o libecm_la-cudawrapper.o libecm_la-listz_handle.o libecm_la-addlaws.o libecm_la-torsions.o cudakernel.o
C:\MinGW\bin\ar.exe: cudakernel.o: No such file or directory[/CODE]
So yeah... I'm becoming more convinced that it can't be compiled on MinGW, because the Linux CUDA can't be set up properly within MinGW. I'll keep tinkering, but unless one of the programmers sees fit to make some changes to enable Windows usage, I don't see it working.
[B]EDIT: I'm going to backtrack on some of my pessimism about the likelihood of compiling this in MinGW. Some further tweaking gets me most of the way through the make...[/B]
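For the record, here's the shape of the quoting fix (just a sketch of the pattern — only NVCC and nvcc_flags are real names from the script; the test line is illustrative):
[CODE]# unquoted, the expansion word-splits on the space in "Program Files"
# and the shell tries to run /c/Program:
$NVCC $nvcc_flags > /dev/null 2>&1

# quoted (or with NVCC="nvcc" as above), the path stays one word:
"$NVCC" $nvcc_flags > /dev/null 2>&1

# and similarly, quote variables inside test commands:
if test "$enable_gpu" = "yes"; then : ... fi[/CODE] |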
Ok, I can get basically all the way through the make process, but I'm falling short at this step:
[CODE]libtool: compile: nvcc --compile "-I/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.0/include" -DGPU_CC20 -DWITH_GPU -arch=sm_20 --machine 32 --ptxas-options=-v -O2 --compiler-options -fno-strict-aliasing cudakernel.cu -o cudakernel.o
cl : Command line warning D9002 : ignoring unknown option '-fno-strict-aliasing'
cudakernel.cu[/CODE]
The error (and there are several of the same at different lines) is:
[CODE]c:\users\ben meekins\desktop\gmp-ecm\trunk\cudakernel_default.cu(15): error: expected a "("[/CODE]
Here are the 1st 22 lines of the cudakernel_default.cu--errors are thrown on lines 15, 18, and 20:
[CODE]/* Default code for GPU */
/* A compute capability of 2.0 at least is required */

__device__ void Cuda_Fully_Normalize (biguint_t A, bigint_t cy)
{
  carry_t cytemp;
  unsigned int thm1;

  while(__any(cy[threadIdx.x])!=0)
  {
    thm1 = (threadIdx.x - 1) % ECM_GPU_NB_DIGITS;
    cytemp = cy[thm1];

    [B]__add_cc(A[threadIdx.x], A[threadIdx.x], cytemp);[/B]

    if (cytemp >= 0)
      [B]__addcy(cy[threadIdx.x]);[/B]
    else /* if (cytemp < 0) */
      [B]__subcy(cy[threadIdx.x]);[/B]
  }
}[/CODE]
It seems that it doesn't like the double underscore, but I don't understand why or what I can do about it. If we can get this figured out, I'm actually hopeful that it might compile... |
Wombatman, have you seen this thread? [url]http://www.mersenneforum.org/showpost.php?p=338530&postcount=9[/url] Might be helpful for you.
|
You know, I think I did look through it at some point. That's good for my laptop (which is 64 bit), but it'd be nice to get GPU-ECM compiled for my 32-bit desktop as well since the video card there is a bit more powerful. Thank you for bringing it to my attention, though!
|
mklasson, your VS work actually fixed one of my errors! I also changed [CODE]__volatile__[/CODE] to [CODE]volatile[/CODE] and now cudakernel.cu compiles correctly! However...as always, there is a small error stopping me. I encounter this error immediately afterwards--the command is included:
[CODE]/bin/sh ./libtool --tag=CC *SNIP* /mingw/lib/libgmp.a -L"/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.0/lib/win32/" -lcudart -lm -lm -lm -lm -lm
[B]libtool: link: cannot find the library `' or unhandled argument `Files/NVIDIA'[/B][/CODE]
Got around the above error by copying the CUDA v5.0 folder to a path without any white spaces. Running into the following two errors now:
[CODE]*** Warning: Trying to link with static lib archive /mingw/lib/libgmp.a.
*** I have the capability to make that library automatically link in when
*** you link to this library.  But I can only do this if you have a
*** shared version of the library, which you do not appear to have
*** because the file extensions .a of this argument makes me believe
*** that it is just a static archive that I should not use here.[/CODE]
[CODE]Warning: .drectve `/DEFAULTLIB:"libcpmt" /DEFAULTLIB:"LIBCMT" /DEFAULTLIB:"OLDNAMES" ' unrecognized
./.libs/libecm.a(cudakernel.o):(.text[_lround]+0x13): undefined reference to `_ftol2_sse'
./.libs/libecm.a(cudakernel.o):(.text[_llround]+0x13): undefined reference to `_ftol2'
c:/mingw/bin/../lib/gcc/mingw32/4.7.2/../../../../mingw32/bin/ld.exe: ./.libs/libecm.a(cudakernel.o): bad reloc address 0x13 in section `.text[_llround]'
c:/mingw/bin/../lib/gcc/mingw32/4.7.2/../../../../mingw32/bin/ld.exe: final link failed: Invalid operation
collect2.exe: error: ld returned 1 exit status
make[2]: *** [ecm.exe] Error 1[/CODE]
SO CLOSE. |
Got close, but it looks like the ".drectve" error is caused by the fact that you can't link to libraries generated outside of MinGW when compiling using nvcc (see [url]http://mingw.5.n7.nabble.com/hundreds-of-undefined-references-td9008.html[/url]).
Part of the problem is that I'm not sure which library is not allowing the linking, so I'm lost. Here's the relevant output:
[CODE]libtool: link: gcc -std=gnu99 -g -DGPU_CC20 -DWITH_GPU -W -Wall -Wundef -m32 -O2 -pedantic -fomit-frame-pointer -mtune=amdfam10 -march=amdfam10 -o ecm.exe ecm-auxi.o ecm-b1_ainc.o ecm-candi.o ecm-eval.o ecm-random.o ecm-main.o ecm-resume.o ecm-getprime.o ./.libs/libecm.a -L/c/CUDA/v5.5/lib/win32/cudart.lib /c/mingw/lib/libgmp.a
Warning: .drectve `/DEFAULTLIB:"libcpmt" /DEFAULTLIB:"LIBCMT" /DEFAULTLIB:"OLDNAMES" ' unrecognized
./.libs/libecm.a(cudakernel.o):(.text[?cuda_errCheck@@YAXW4cudaError@@PBDH@Z]+0xa): undefined reference to `cudaGetErrorString@4'
./.libs/libecm.a(cudakernel.o):(.text[?__cudaUnregisterBinaryUtil@@YAXXZ]+0x11): undefined reference to `__cudaUnregisterFatBinary@4'
c:/mingw/bin/../lib/gcc/mingw32/4.7.2/../../../../mingw32/bin/ld.exe: ./.libs/libecm.a(cudakernel.o): bad reloc address 0x11 in section `.text[?__cudaUnregisterBinaryUtil@@YAXXZ]'
c:/mingw/bin/../lib/gcc/mingw32/4.7.2/../../../../mingw32/bin/ld.exe: final link failed: Invalid operation[/CODE]
If anyone has a workaround, I'm all ears. |
Well, decided to try with --enable-shared to hopefully allow for full compilation. The cudakernel file compiles without issue, but I error out with the following output:
[CODE].libs/libecm_la-cudawrapper.o: In function `_ecm_gpu_ecm_stage1':
c:\Documents and Settings\Ben\Desktop\Dorkdom\SVN Checkout\trunk/cudawrapper.c:174: undefined reference to `cuda_Main'
c:\Documents and Settings\Ben\Desktop\Dorkdom\SVN Checkout\trunk/cudawrapper.c:174: undefined reference to `cuda_Main'
.libs/libecm_la-cudawrapper.o: In function `_ecm_gpu_ecm':
c:\Documents and Settings\Ben\Desktop\Dorkdom\SVN Checkout\trunk/cudawrapper.c:375: undefined reference to `select_and_init_GPU'
c:\Documents and Settings\Ben\Desktop\Dorkdom\SVN Checkout\trunk/cudawrapper.c:373: undefined reference to `select_and_init_GPU'[/CODE]
The relevant cudawrapper.c code is:
[CODE]extern int select_and_init_GPU (int, unsigned int*, int);
extern float cuda_Main (biguint_t, biguint_t, biguint_t, digit_t, biguint_t*,
                        biguint_t*, biguint_t*, biguint_t*, mpz_t, unsigned int,
                        unsigned int, int);

*gputime = cuda_Main (h_N, h_3N, h_M, h_invN, h_xarray, h_zarray, h_x2array,
                      h_z2array, s, firstsigma, number_of_curves,
                      (verbose >= OUTPUT_DEVVERBOSE));

youpi = select_and_init_GPU (device, nb_curves, 1);[/CODE]
And the functions are defined in cudakernel.cu as:
[CODE]extern "C" float cuda_Main (biguint_t h_N, biguint_t h_3N, biguint_t h_M,
                            digit_t h_invN, biguint_t *h_xarray,
                            biguint_t *h_zarray, biguint_t *h_x2array,
                            biguint_t *h_z2array, mpz_t s,
                            unsigned int firstinvd,
                            unsigned int number_of_curves,
                            int is_verbose_enough)
{

extern "C" int select_and_init_GPU (int device, unsigned int *number_of_curves,
                                    int verbose)
{[/CODE]
No idea why these two functions aren't found. Advice appreciated. |
NVIDIA has at long last added CUDA capabilities to Visual Studio 2012 so I have been able to update the ECM GPU build. But I now find that the lowest compute capability supported is 2.0 whereas my GTX cards have 1.3 capabilities, which means that I can't test the binaries. :sad:
But I can supply binaries if anyone is interested and willing to see if they work (I would need to know which compute capability to build for). Brian |
[QUOTE=Brian Gladman;348517]
But I can supply binaries if anyone is interested and willing to see if they work (I would need to know which compute capability to build for). Brian[/QUOTE] Brian, I'd like to test a 64 bit sm_35 binary. |
[QUOTE=Karl M Johnson;348521]Brian, I'd like to test a 64 bit sm_35 binary.[/QUOTE]
Can you PM me with an email address to send it? I need to warn you that it is quite likely to fail as it will be completely untested. Brian |
I would love to test a 32 bit SM_20 binary, Brian.
|
[QUOTE=wombatman;348535]I would love to test a 32 bit SM_20 binary, Brian.[/QUOTE]
Hi, I must admit that I haven't produced a 32-bit binary for several years now but I tried this and it did compile. But I don't know which cuda redistributable libraries you would need (cuda toolkit v5.5) - do you have these already? If you PM me with an email address I can send you a binary without the redistributables. Edit: I think I can compile against a static cuda library so, maybe, you wouldn't need the redistributables. |
I just want to announce to everyone here that Brian's 32-bit binary worked wonderfully on Windows 7 (and I'll be testing it on Win XP to confirm that tonight). The only thing to note is that you need the appropriate cudart32_**.dll file in the same folder as the binary. For CUDA 5.5, it's cudart32_55.dll, which can be found in the /bin/ folder of your CUDA installation.
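From the MSYS shell that's just a copy into the folder you run from (path assumes the default install location):
[CODE]$ cp "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v5.5/bin/cudart32_55.dll" .[/CODE]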
|
Brian's binary did not work on Windows XP, but I was able to get the VS2010 solution to build, and it seems to be working successfully (passed a few tests). I'm going to run it through the Cunningham.in file for at least a few hours, and then I'll post up a thread with the binary (assuming that's permissible).
HUGE thanks to Brian Gladman and mklasson. |
So does this mean we have a functioning GPU ECM factoring program?
|
After installing the [URL="http://www.microsoft.com/en-us/download/details.aspx?id=30679"]VC dependency[/URL], it worked fine.
No true way to compare to classic gmp-ecm, since the GPU version uses a single thread for stage2 (perhaps there is a way, I just don't know how to force it to use n threads), but here are the benchmark results nevertheless:
[CODE]7942818512396221120536898926034544680614447375617547588581707383587816365157999
512 curves
B1=1M B2=1B
CPU version used ecm70dev-svn2518-x64-bulldozer via YAFU.
GPU version used ecm_gpu supplied by Brian.

CPU: 245s
GPU: S1 19.687s CPU + 338.498s GPU ; S2 723.033s CPU[/CODE]
1. Is there a better way to benchmark CPU vs GPU performance? I've noticed the GPU was only used to ~70% by ECM, and that's not enough.
2. Is there a way to force S2 to use n threads? If not, why was it compiled with OpenMP in the first place? |
I'd love to know the answer to the Stage 2 question as well. As for pushing more out of the GPU, you can pass -gpucurves n (n = # of curves to run in parallel) as an argument. That can be used to force more curves to be run at once, which may achieve what you want.
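For example (B1 and the curve count here are made up, and I'm using the syntax of the SVN build):
[CODE]$ ecm -gpu -gpucurves 512 1e6 < input.txt[/CODE]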
What CPU do you have? Mine takes an inordinately long time (think 600+ seconds for stage 2 of 480 curves with B1=1e4), but it's a Phenom II X4 that's about 6 years old at this point as well. |
I've tried several powers of 2 for gpu curves (256, 512, 1024) and multiples of 2 (1344, 2688).
As for the CPU, it's an FX-8150@4 GHz, nothing special. The binary, however, is built specifically for the Bulldozer arch (xop and avx included) with the latest and greatest MPIR. That could be the source of a decent speedup. |
[QUOTE=wombatman;348629]Brian's binary did not work on Windows XP, but I was able to get the VS2010 solution to build, and it seems to be working successfully (passed a few tests). I'm going to run it through the Cunningham.in file for at least a few hours, and then I'll post up a thread with the binary (assuming that's permissible).
HUGE thanks to Brian Gladman and mklasson.[/QUOTE] Which version are you using on XP? I am interested because it would be useful to know what causes a 32-bit version that works on Windows 7/8 to fail on XP. |
Remember that you'll have to adjust B1 & B2.
My experience: You should use B1 about one step higher than when done on CPU, so for a t35 you'd use B1=3e6. You'll also be using a far lower B2/B1 ratio. Aim for stg2time/(stage1time+stage2time) = 0.41. The speedup has been around ~5 times for me. |
Hmm...I might have to pull the latest MPIR then. Thanks.
[QUOTE=Karl M Johnson;348656]I've tried several powers of 2 for gpu curves (256, 512, 1024) and multiples of 2 (1344, 2688). As for the CPU, it's FX-8150@4 Ghz, nothing special. The binary, however, is built specifically for Bulldozer arch(xop and avx included) with latest and greatest MPIR. That could be the source of decent speedup.[/QUOTE] |
I sent you an email, but I'm not sure what you mean by which version.
-printconfig gives this:
[CODE]Compilation options:
Included MPIR header files version 2.6.0
ECM_TUNE_CASE not defined.
GWNUM_VERSION undefined
HAVE_SSE2 undefined
HAVE___GMPN_ADD_NC undefined
HAVE___GMPN_MOD_34LSUB1 = 1
HAVE___GMPN_REDC_1 = 1
USE_ASM_REDC undefined
WANT_ASSERT undefined
_OPENMP = 200203
MPZMOD_THRESHOLD = 30
REDC_THRESHOLD = 512
MUL_NTT_THRESHOLD = 4096
NTT_GFP_TWIDDLE_DIF_BREAKOVER = 13
NTT_GFP_TWIDDLE_DIT_BREAKOVER = 13
PREREVERTDIVISION_NTT_THRESHOLD = 1024
POLYINVERT_NTT_THRESHOLD = 8192
POLYEVALT_NTT_THRESHOLD = 4096
MPZSPV_NORMALISE_STRIDE = 256
WITH_GPU = 1[/CODE]
[QUOTE=Brian Gladman;348657]Which version are you using on XP? I am interested because it would be useful to know what causes a 32-bit version that works on Windows 7/8 to fail on XP.[/QUOTE] |
[QUOTE=wombatman;348685]I sent you an email, but I'm not sure what you mean by which version.
-printconfig gives this:
[CODE]Compilation options:
Included MPIR header files version 2.6.0
ECM_TUNE_CASE not defined.
GWNUM_VERSION undefined
HAVE_SSE2 undefined
HAVE___GMPN_ADD_NC undefined
HAVE___GMPN_MOD_34LSUB1 = 1
HAVE___GMPN_REDC_1 = 1
USE_ASM_REDC undefined
WANT_ASSERT undefined
_OPENMP = 200203
MPZMOD_THRESHOLD = 30
REDC_THRESHOLD = 512
MUL_NTT_THRESHOLD = 4096
NTT_GFP_TWIDDLE_DIF_BREAKOVER = 13
NTT_GFP_TWIDDLE_DIT_BREAKOVER = 13
PREREVERTDIVISION_NTT_THRESHOLD = 1024
POLYINVERT_NTT_THRESHOLD = 8192
POLYEVALT_NTT_THRESHOLD = 4096
MPZSPV_NORMALISE_STRIDE = 256
WITH_GPU = 1[/CODE][/QUOTE]
I believe there are three versions in circulation. Mklasson's correction of my outdated build files. My recent correction of these build files in the GMP-ECM repository (which I intend to remove shortly). And my recent addition to the GMP-ECM repository (ecmlib_gpu and ecm_gpu). The latter is for VS 2012.
Since you built from source, I just wondered which of the three sets of source code you are using. |
Does the latest redistributable package support XP? Have they begun removing support for it?
|
[QUOTE=Brian Gladman;348688]I believe there are three versions in circulation. Mklasson's correction of my outdated build files. My recent correction of these build files in the GMP-ECM repository (which I intend to remove shortly). And my recent addition to the GMP-ECM repository (ecmlib_gpu and ecm_gpu). The latter is for VS 2012.
Since you built from source, I just wondered which of the three sets of source code you are using.[/QUOTE] Ah, ok! I pulled from the repository -- SVN Revision 2521 from July 20th. The ecm-params.h I actually got by compiling under MinGW. I can get all the tuning/benching programs to run under MinGW, just not the final product. I guess I should probably run the tune compiled with VS2010 to make sure I'm not screwing anything up. Does that answer your question? |
[QUOTE=henryzz;348695]Does the latest redistributable package support XP? Have they begun removing support for it?[/QUOTE]
If you mean the VS2012 redistributable, it does have an x86 version that installed without issue on XP, although I was still unable to run Brian's binary, even with that. Still gives the same "bad file number/Not a valid Win32 app" error. |
[QUOTE=wombatman;348697]Ah, ok! I pulled from the repository -- SVN Revision 2521 from July 20th. The ecm-params.h I actually got by compiling under MinGW. I can get all the tuning/benching programs to run under MinGW, just not the final product. I guess I should probably run the tune compiled with VS2010 to make sure I'm not screwing anything up. Does that answer your question?[/QUOTE]
Yes, thanks. I updated the repository after your revision with what I hope will be the definitive Visual Studio 2012 version of ecm_gpu. I assume you must have updated the out of date build files for the old version. So we still don't know why the new binaries don't run on XP :-( Brian |
Can you walk through how you set yours up? Mine was something like this:
Compiler: VS2010
GPU Code: CUDA 5.5
Built libecm_gpu.lib
Linked to libecm_gpu.lib and mpir.lib explicitly (that is, I put the directory paths, not using the mp_lib stuff)
Built ecm_gpu.exe
Unfortunately, I can't try and build the VS2012 on XP since VS2012 flat out will not work on it. I will try, however, building your latest SVN version on my laptop (Win 7) and seeing if I can generate a binary that does work on my XP machine. |
[QUOTE=wombatman;348704]Can you walk through how you set yours up? Mine was something like this:
Compiler: VS2010
GPU Code: CUDA 5.5
Built libecm_gpu.lib
Linked to libecm_gpu.lib and mpir.lib explicitly (that is, I put the directory paths, not using the mp_lib stuff)
Built ecm_gpu.exe
Unfortunately, I can't try and build the VS2012 on XP since VS2012 flat out will not work on it. I will try, however, building your latest SVN version on my laptop (Win 7) and seeing if I can generate a binary that does work on my XP machine.[/QUOTE]
I use VS2012 with the CUDA Toolkit v5.5 and NSight 3.1. I build against the current GIT repository version of MPIR. I load the solution ecm_gpu.sln, which is configured to find MPIR via mp_lib. This contains two projects, libecm_gpu.lib and ecm_gpu.lib, which I build in this order (after cleaning them as partial rebuilds sometimes give link errors).
Brian |
Well, crap. Now I'm getting a vsyasm error when it tries to compile the two assembler files for libecm_gpu. Do you use the VS2010 vsyasm from [URL="http://yasm.tortall.net/Download.html"]here[/URL]? I tried both the 32-bit and 64-bit versions with no success.
|
[QUOTE=wombatman;348731]Well, crap. Now I'm getting a vsyasm error when it tries to compile the two assembler files for libecm_gpu. Do you use the VS2010 vsyasm from [URL="http://yasm.tortall.net/Download.html"]here[/URL]? I tried both the 32-bit and 64-bit versions with no success.[/QUOTE]
Ive sent you my 64-bit version. Brian |
1 Attachment(s)
Alright, it compiled! I won't be able to test it on my WinXP machine for a few hours at minimum (and maybe not tonight at all). So I'm going to post it here--if you run WinXP, could you please test it? It passes my tests on Windows 7.
|
Karl, I found this in the readme file for GMP-ECM. It looks like the multithreading is supposed to occur automatically, but even after setting OMP_NUM_THREADS as an environment variable, I didn't see any change.
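For reference, this is how I was setting it from the MSYS shell before launching (thread count arbitrary):
[CODE]$ export OMP_NUM_THREADS=4
$ ecm 1e6 < input.txt[/CODE]
Anyway, here's the relevant section from the readme: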
"As of version 6.2, a new stage 2 for the P-1 and P+1 algorithms is implemented. It uses less memory and is faster than the previous code, thus allowing larger B2 values. If GMP-ECM is configured with the "--enable-openmp" flag and is compiled with a compiler that implements OpenMP, it uses multi-threading for computation of polynomial roots and NTT multiplication. When not using the NTT, it benefits from multi-threading only in the computation of roots phase. The number of threads to use can be controlled with the OMP_NUM_THREADS environment variable. Unlike the previous generic stage 2, the new stage 2 cannot use the Brent-Suyama extension (-power and -dickson parameters). Specifying these options on the command line forces use of the generic stage 2. Note: the notation of the parameters follows that in the paper, the number of multi-point evaluations (similar to "blocks") is given by s_2. You can specify a lower limit for s_2 by the -k command line parameter." [QUOTE=Karl M Johnson;348646]After installing the [URL="http://www.microsoft.com/en-us/download/details.aspx?id=30679"]VC dependency[/URL], it worked fine. No true way to compare to classic gmp-ecm, for the GPU version uses a single thread for stage2 (perhaps there is a way, I just don't know how to force it to use n threads), but here are the benchmark results nevertheless: [CODE] 7942818512396221120536898926034544680614447375617547588581707383587816365157999 512 curves B1=1M B2=1B CPU version used ecm70dev-svn2518-x64-bulldozer via YAFU. GPU version used ecm_gpu supplied by Brian. CPU: 245s GPU: S1 19.687s CPU + 338.498s GPU ; S2 723.033s CPU [/CODE]1. Is there a better way to benchmark CPU vs GPU performance? I've noticed GPU was only used to ~70% by ECM, that's not enough. 2. Is there a way to force S2 to use n threads? If not, why was it compiled with OpenMP in the first place?[/QUOTE] |
Well, I just tried it on my Win XP box, and it fails with the same error. So, if you want a 32-bit version for now, grab the one from the other thread.
|
[QUOTE=wombatman;348748]Karl, I found this in the readme file for GMP-ECM. It looks like the multithreading is supposed to occur automatically, but even after setting OMP_NUM_THREADS as an environment variable, I didn't see any change.
[/QUOTE] Yep, I've also tried adding the -ntt flag (it's there by default, right?), but that didn't change a thing. Any devs care to shed some light on the situation? |
[QUOTE=Karl M Johnson;348815]Yep, I've also tried adding the -ntt flag (it's there by default, right?), but that didn't change a thing.
Any devs care to shed some light on the situation?[/QUOTE]
Did anybody address the bug that prevents Stage2 from showing the found factor in gpu-ecm?
Luigi |
[QUOTE=wombatman;348796]Well, I just tried it on my Win XP box, and it fails with the same error. So, if you want a 32-bit version for now, grab the one from the other thread.[/QUOTE]
In Visual Studio 2012, there is a specific compiler -- Visual Studio 2012 - Windows XP (v110_xp) -- for building binaries for XP. You would need to use this for building both MPIR and GMP-ECM for use on XP. So we cannot expect 'normal' VS2012 binaries to work on XP (I suspect this is also true for VS2010). |
What's the bug? I hadn't seen anything about that.
[QUOTE=ET_;348834]Did anybody address the bug that prevents Stage2 to show the found factor in gpu-ecm? Luigi[/QUOTE] |
[QUOTE=wombatman;348865]What's the bug? I hadn't seen anything about that.[/QUOTE]
Post #86 and forward in this thread describes that bug, and how to work around it. |
Haha, I just went back through and found that. Odd bug--hopefully someone can get it worked out.
Unrelated question: how is loop mode activated? I've tried -c 0 -one, but that didn't do it. Also wanted to note that the bug mentioned here: [url]http://mersenneforum.org/showpost.php?p=345528&postcount=94[/url] still exists, so the workaround for GPU factors not showing up doesn't appear to work. |
Good catch. I went back through and it was set to VS2012, not VS2012 -v110_xp. I've recompiled and used "file" in MinGW, which returns:
[CODE]ecm_gpu_32bit.exe: PE32 executable for MS Windows (console) Intel 80386 32-bit[/CODE] So if anybody is willing to test out this program, please PM me your email address, and I'll email the file to you. Thanks! [QUOTE=Brian Gladman;348837]In Visual Studio 2012, there is a specific compiler -- Visual Studio 2012 - Windows XP (v110_xp) -- for building binaries for XP. You would need to use this for building both MPIR and GMP-ECM for use on XP. So we cannot expect 'normal' VS2012 binaries to work on XP (I suspect this is also true for VS2010).[/QUOTE] |
Well, never heard from anybody, but the VS2012 version I compiled does work on Win XP, though it requires VCOMP110D.dll because it was compiled with OpenMP enabled. I have a .rar file if anybody wants it.
|
[QUOTE=wombatman;348867] Unrelated question: how is loop mode activated? I've tried -c 0 -one, but that didn't do it.
[/QUOTE] I'm using: ecm -c 270 -one -nn 25e4 <s4663441_17-1.ini to factor small numbers from factordb. The count after -c can vary but should be 1 or more. s4663441_17-1.ini contains the number to factor (name and contents vary). -one is optional. See the man page for ecm for more details. Chris |
Yeah, that works for me with the regular ecm, but it doesn't work on the gpu-enabled version. Maybe it's a bug, but the "-c" and "-one" options appear to be ignored.
|
Does the GPU version do several curves in parallel on the GPU? Try -c 10000 (any number larger than the number of cores on the GPU) and see what happens.
Chris PS. I don't have a GPU so I'm guessing here. But asking for zero curves with -c 0 as in your first post looks very odd. |
The -c parameter is replaced by -gpucurves in the CUDA version of ECM.
|
[QUOTE=Karl M Johnson;349137]The -c parameter is replaced by -gpucurves in the CUDA version of ECM.[/QUOTE]
-c still specifies how many curves to compute. -gpucurves specifies how many to do at a time. At least that's my experience. |
I just realized I'm a moron. I was running with things like "-c 2" with the expectation that it would run twice and then stop. Why? I have no idea. Running with, say, c=100000 and gpucurves=240 does just fine.
Also, Lorgix is right. Both -c and -gpucurves work, gpucurves is # in parallel and -c is total number of curves. |
[QUOTE=lorgix;349146]-c still specifies how many curves to compute.
-gpucurves specifies how many to do at a time.[/QUOTE] Right-o, my bad:smile: |
Why is the maximum number the GPU is allowed to take only 2^1018? I know where it's set in the code, but I'm not clear on why. Is this a hard limit?
|
Someone earlier in this thread changed that parameter to 512 bits instead of 1024, and did not find errors. You should try doubling it to 2048 to see if it will compile/run. That version would run at half the speed for every number, but if it can handle larger numbers that would be good to know/test.
|
[QUOTE=wombatman;349253]Why is the maximum number the GPU is allowed to take only 2^1018? I know where it's set in the code, but I'm not clear on why. Is this a hard limit?[/QUOTE]The arithmetic is coded for 32-bit integers with 32-word precision, a design decision which maps well onto the hardware architecture and onto "most" usage patterns. Although 32*32 = 1024, a few bits are needed to guard against overflow.
I and others have reduced the precision to 512 bits without any noticeable problem. I can't see any reason why it can't be increased but you'll have to suck it and see. |
Thanks! I'll play around with it and see what I can come up with. Did we ever come up with an answer on the multithreading question for stage 2?
|
Well, it seems to work ok! In ecm-gpu.h, I changed
[CODE]#define ECM_GPU_SIZE_DIGIT 32[/CODE] to [CODE]#define ECM_GPU_SIZE_DIGIT 64[/CODE] I'm currently running through the cunningham.in list, and it's handling everything without issue. The largest number in the list I have is 300+ digits, and 2^1018 is 307 digits. I'm going to let it run through the list (B1=10000 and B2=100, just to speed things up), but it appears as though this will work fine. I should note that I tried changing ECM_GPU_NB_DIGITS from 32 to 64 previously, and that (as best I recall) did throw an error. But I'll try it again later and attempt to confirm. |
[QUOTE=wombatman;349284]Thanks! I'll play around with it and see what I can come up with. Did we ever come up with an answer on the multithreading question for stage 2?[/QUOTE]
Yes, I'd like to know the answer to this one too. All this fancy GPU acceleration goes down the drain if S2 is single threaded. And the early mfaktc-like workaround is a poor solution to the problem for the GPU utilization is quite decent already. Looked at [URL="https://pastee.org/gvrp4"]verbose output[/URL] of gpu-ecm, found no errors or hints. Perhaps the devs of this fine beta know what's going on? |
[QUOTE=Karl M Johnson;349356]All this fancy GPU acceleration goes down the drain if S2 is single threaded.[/QUOTE]Not really, in my experience. I use a save file and run stage 1 only on the GPU. The CPU usage of this is tiny in comparison with the GPU.
When the batch has finished, I use split(1) to feed multiple instances of stage 2 only, each on a separate core. While this is going on, another batch of stage 1 can be running on the GPU. This approach has found me seven previously unknown factors of GCW numbers in the last two or three weeks. There's a fair chance that another one or two are waiting for me in the presently unprocessed save file. |
Well now, you learn something new every day.
Reading the README cleared up the other questions, except for why stage 2 is not behaving like it should. OpenMP support for stage 2 was added in version 6.2, and it [OpenMP] was last mentioned in May-June 2011, according to the changelog. Did it ever work as intended for Windows? |
Are you using split to generate pieces of your save file and then running each piece on a core? I looked at split's help, but I'm also wondering if there's a way to have it split evenly or not. If not, it's not a huge deal. Lastly, do you encounter that savefile B1 bug where B1=1 no matter what?
[QUOTE=xilman;349382]Not really, in my experience. I use a save file and run stage 1 only on the GPU. The cpu usage of this is tiny in comparison with the GPU. When the batch has finished, I use split(1) to feed multiple instances of stage 2 only, each on a separate core. While this is going on, another batch of stage 1 can be running on the GPU. This approach has found me seven previously unknown factors of GCW numbers in the last two or three weeks. There's a fair chance that another one or two are waiting for me in the presently unprocessed save file.[/QUOTE] |
[QUOTE=wombatman;349397]Are you using split to generate pieces of your save file and then running each piece on a core? I looked at split's help, but I'm also wondering if there's a way to have it split evenly or not. If not, it's not a huge deal. Lastly, do you encounter that savefile B1 bug where B1=1 no matter what?[/QUOTE]I generally just say "split -2240 gpu.save" to produce a bunch of files with at most 2240 lines in each. The choice of that magic number arises from my GTX460 running curves in batches of 224.
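Putting the whole cycle together, it looks roughly like this (a sketch — B1/B2, the file names and the chunk size are illustrative):
[CODE]# stage 1 only, on the GPU (B2 = 0 skips stage 2)
ecm -gpu -save gpu.save 3e6 0 < candidates.txt

# split the residues into per-core chunks (a multiple of the GPU batch size)
split -2240 gpu.save

# stage 2 only, one instance per core, while the GPU starts on the next batch
for f in xa?; do ecm -resume "$f" 3e6 5706890290 & done[/CODE]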
I don't see a bug in the save file. Anyway, wasn't that involving sigma rather than B1? The version here announces itself as follows:
[code][pcl@anubis nums]$ ecm -printconfig
Compilation options:
Included GMP header files version 5.0.2
Tuning parameters from x86_64/k8/params.h
GWNUM_VERSION undefined
HAVE_SSE2 undefined
HAVE___GMPN_ADD_NC = 1
HAVE___GMPN_MOD_34LSUB1 = 1
HAVE___GMPN_REDC_1 = 1
USE_ASM_REDC = 1
WINDOWS64_ABI undefined
WANT_ASSERT = 1
_OPENMP undefined
MPZMOD_THRESHOLD = 21
REDC_THRESHOLD = 512
MUL_NTT_THRESHOLD = 8192
NTT_GFP_TWIDDLE_DIF_BREAKOVER = 12
NTT_GFP_TWIDDLE_DIT_BREAKOVER = 17
PREREVERTDIVISION_NTT_THRESHOLD = 16
POLYINVERT_NTT_THRESHOLD = 1024
POLYEVALT_NTT_THRESHOLD = 1024
MPZSPV_NORMALISE_STRIDE = 256
WITH_GPU = 1
[pcl@anubis nums]$ ecm 1000
GMP-ECM 7.0-dev [configured with GMP 5.0.2, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM][/code]
and it's SVN 2520. If you want to play, the following three saved intermediates are from the "xaa" file in the current batch. Run
[code]ecm -resume xaa 3000000 5706890290[/code]
and you should find a p40 factor in the second.
[code]METHOD=ECM; PARAM=3; SIGMA=3646482223; B1=3000000; N=118322990695036678955785432384207179831828752600563592556520452205316920104672116469999707769353959745936851261210323402525408166267638430422594163638611365443718648278136032418816241316531648524461801740836064884241844413591028350073777; X=0x11bd5008598207557858a50fd00a46b1fe573ef40d5702618f6f3860281f13c281bd10ec2558ebafa6b1d1f85120cccdb40067c2262a1af09d0fee4c61c10ffefc6461fa139b0ae80a7560fb8faf8a2c14b106941b27f8fcec42ff55f4f0ccf569b83; CHECKSUM=2630827221; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=pcl@anubis.home.brnikat.com; TIME=Sun Aug 11 00:42:44 2013;
METHOD=ECM; PARAM=3; SIGMA=3646482222; B1=3000000; N=118322990695036678955785432384207179831828752600563592556520452205316920104672116469999707769353959745936851261210323402525408166267638430422594163638611365443718648278136032418816241316531648524461801740836064884241844413591028350073777; X=0xec72526e665388a431e623a76fca2c8c9bb86ee213a55405e0fbdea1f30caffbfdc74e558b15b26293390b7ea7ab35e5b0ee466a0d79b1e6a30eb6ffc19bea76324f273b4ff5409ad19d1f029c84b1b2ff7d9416bb323347b2e475b135743539ef83; CHECKSUM=3204746086; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=pcl@anubis.home.brnikat.com; TIME=Sun Aug 11 00:42:44 2013;
METHOD=ECM; PARAM=3; SIGMA=3646482221; B1=3000000; N=118322990695036678955785432384207179831828752600563592556520452205316920104672116469999707769353959745936851261210323402525408166267638430422594163638611365443718648278136032418816241316531648524461801740836064884241844413591028350073777; X=0x8b0872778fbd3525ad4823e50648f1f629b3b1539c32b9e4a5d056566beaa81a3c06e68ec0dc15e200ba31ba772845f8d7d663570fe06219c191458480ccd37a00aa3e184e6684ac213d1c90d74cba5f626d6dcad6d2c8add56c69714c1de7aa1073; CHECKSUM=1793142081; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=pcl@anubis.home.brnikat.com; TIME=Sun Aug 11 00:42:44 2013;[/code] |
I was actually referring to the bug below when using the save flag. But I'll double-check and make sure it's still happening. If it is, I'll see if I can figure out a way to fix it.
Thanks for the advice on split--my gpu does 480 curves, so I'll adjust it accordingly. And I'll play around with that save file as well to make sure my GPU-ECM finds the factor appropriately. Thanks!
[QUOTE=Ralf Recker;345528]The -save(a) option of current SVN HEAD generates savefiles with B1 set to 1. Here is an example from a GPU run:
[CODE]METHOD=ECM; PARAM=3; SIGMA=1338409529; [B]B1=1[/B]; N=107337638919967483141063623365542229910680957563823617797446929412356389149831517148531981651634180847916212539333192454506616947630553940911141905872245222153540770799676352775350889317617472274748340543075085097852507186666728619014330952808580810310164700637173471479651231324428337639337605274423; X=0x80e2c39ebf6af056d56c5ca03fdb34173afd8189a7ae9c4e42291b3cc4e2dcfec3cb5b27f3d45e3c02cded29e571c7f00cb0d4cb536eb10e4a585e888b1ba37804d73c0d6dc715129091957eb888d831bc16417c30180be983c5fa89d77d0caac142d864eed7a09b14340a41c46dd30a49f4ce673ce450fb3bbca03d; CHECKSUM=1807052727; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=ralf@quadriga; TIME=Sun Jul 7 12:54:06 2013;[/CODE]
and from a CPU run:
[CODE]METHOD=ECM; PARAM=1; SIGMA=2099601580; [B]B1=1[/B]; N=107337638919967483141063623365542229910680957563823617797446929412356389149831517148531981651634180847916212539333192454506616947630553940911141905872245222153540770799676352775350889317617472274748340543075085097852507186666728619014330952808580810310164700637173471479651231324428337639337605274423; X=0xec9ba92e05563539a531b6eb6aaf42e3eb8c2c05ddd1b2bdf08d4908b5ceb4875a3ceb7a6a5f9046af2ba2f27aaca39d08c51cb927ae3ffabc682df4420515b2b354631183762317ea2c4a35a965d15f5c892c63daf8e97f672bc38f2f5268b39e0d14667b82d542bde08c6d72ccd602bbf0d7586d39b08992502d5f; CHECKSUM=1971712291; PROGRAM=GMP-ECM 7.0-dev; Y=0x0; X0=0x0; Y0=0x0; WHO=ralf@quadriga; TIME=Sun Jul 7 12:57:27 2013;[/CODE]
B1 was set to 11e3 in both (test)cases. batch_last_B1_used contains the B1 value given in the command line. B1done=1 when write_resumefile is called from main.c:1581. Here is a debugger output from the GPU run:
[CODE]#0 main (argc=2, argv=0x7fffffffe3b0) at main.c:1581
(gdb) display params
1: params = {{method = 0, x = {{_mp_alloc = 6955, _mp_size = 6954, _mp_d = 0x7ec450}},
  y = {{_mp_alloc = 1, _mp_size = 0, _mp_d = 0x6da4c0}}, param = 3,
  sigma = {{_mp_alloc = 2, _mp_size = 1, _mp_d = 0x6da4e0}}, sigma_is_A = 0,
  E = 0x6da300, go = {{_mp_alloc = 1, _mp_size = 1, _mp_d = 0x6da5c0}},
  [B]B1done = 1[/B], B2min = {{_mp_alloc = 1, _mp_size = -1, _mp_d = 0x6da5e0}},
  B2 = {{_mp_alloc = 1, _mp_size = -1, _mp_d = 0x6da600}}, k = 0, S = 0,
  repr = 0, nobase2step2 = 0, verbose = 1, os = 0x7ffff72c57a0, es = 0x7ffff72c5880,
  chkfilename = 0x0, TreeFilename = 0x0, maxmem = 0, stage1time = 0,
  rng = {{_mp_seed = {{_mp_alloc = 313, _mp_size = 32767, _mp_d = 0x6da620}},
  _mp_alg = GMP_RAND_ALG_DEFAULT, _mp_algdata = {_mp_lc = 0x4af140}}},
  use_ntt = 1, stop_asap = 0x405a1c <stop_asap_test>,
  batch_s = {{_mp_alloc = 249, _mp_size = 249, _mp_d = 0x6de290}},
  [B]batch_last_B1_used = 11000[/B], gpu = 1, gpu_device = -1, gpu_device_init = 1,
  gpu_number_of_curves = 448, gw_k = 0, gw_b = 0, gw_n = 0, gw_c = 0}}
(gdb)[/CODE]
Trying to resume from this savefile(s) leads to an internal error:
[CODE]Error, x0 should be equal to 2 with this parametrization
Please report internal errors at <ecm-discuss@lists.gforge.inria.fr>.[/CODE][/QUOTE] |
So I ran your save file as suggested, and it only found the input number as a factor.
[CODE]Input number is 118322990695036678955785432384207179831828752600563592556520452205316920104672116469999707769353959745936851261210323402525408166267638430422594163638611365443718648278136032418816241316531648524461801740836064884241844413591028350073777 (237 digits)
Using B1=3000000-3000000, B2=5706890290, sigma=3:3646482222-3:3646482349 (128 curves)
Computing 128 Step 1 took 54367ms of CPU time / 638453ms of GPU time
********** Factor found in step 2: 118322990695036678955785432384207179831828752600563592556520452205316920104672116469999707769353959745936851261210323402525408166267638430422594163638611365443718648278136032418816241316531648524461801740836064884241844413591028350073777
Found input number N[/CODE]
Here's my -printconfig
[CODE]$ gpu_ecm.exe -printconfig
Compilation options:
Included MPIR header files version 2.6.0
Tuning parameters from x86_64/corei7/params.h
GWNUM_VERSION undefined
HAVE_SSE2 undefined
HAVE___GMPN_ADD_NC = 1
HAVE___GMPN_MOD_34LSUB1 = 1
HAVE___GMPN_REDC_1 = 1
USE_ASM_REDC undefined
WANT_ASSERT undefined
_OPENMP undefined
MPZMOD_THRESHOLD = 21
REDC_THRESHOLD = 512
MUL_NTT_THRESHOLD = 256
NTT_GFP_TWIDDLE_DIF_BREAKOVER = 17
NTT_GFP_TWIDDLE_DIT_BREAKOVER = 17
PREREVERTDIVISION_NTT_THRESHOLD = 8
POLYINVERT_NTT_THRESHOLD = 128
POLYEVALT_NTT_THRESHOLD = 128
MPZSPV_NORMALISE_STRIDE = 512
WITH_GPU = 1[/CODE]
This is the 64-bit version that Brian compiled. I'm not sure which revision this is. |
[QUOTE=xilman;349272]The arithmetic is coded for 32-bit integers with 32-word precision, a design decision which maps well onto the hardware architecture and onto "most" usage patterns. Although 32*32 = 1024, a few bits are needed to guard against overflow.
I and others have reduced the precision to 512 bits without any noticeable problem. I can't see any reason why it can't be increased but you'll have to suck it and see.[/QUOTE]
Would it be tricky to make it say 768 bits?
[QUOTE=wombatman;349340]Well, it seems to work ok! In ecm-gpu.h, I changed
[CODE]#define ECM_GPU_SIZE_DIGIT 32[/CODE]
to
[CODE]#define ECM_GPU_SIZE_DIGIT 64[/CODE]
I'm currently running through the cunningham.in list, and it's handling everything without issue. The largest number in the list I have is 300+ digits, and 2^1018 is 307 digits. I'm going to let it run through the list (B1=10000 and B2=100, just to speed things up), but it appears as though this will work fine. I should note that I tried changing ECM_GPU_NB_DIGITS from 32 to 64 previously, and that (as best I recall) did throw an error. But I'll try it again later and attempt to confirm.[/QUOTE]
This is very interesting. If you're getting it to work correctly, could you provide binaries for say 512 bit and 2048 bit? (minus a few bits to guard against overflow) |
1 Attachment(s)
Happy to! I've attached the file here. It has both the 512 and 2048 bit ones. They're both compiled in VS2010 on Win XP 32-bit. The 2048 one is the one I currently use, and the 512 one should work fine too. In addition, these have the iterative countdown, so you can see how many loops are left. If the 512-bit exe doesn't work, just let me know and I'll test it out.
|
[QUOTE=wombatman;350024]Happy to! I've attached the file here. It has both the 512 and 2048 bit ones. They're both compiled in VS2010 on Win XP 32-bit. The 2048 one is the one I currently use, and the 512 one should work fine too. In addition, these have the iterative countdown, so you can see how many loops are left. If the 512-bit exe doesn't work, just let me know and I'll test it out.[/QUOTE]
Thanks! Since I'm running 64 bit I get an error message. Missing cudart[B]32[/B]_55.dll. Would you happen to have a 64 bit-version? |
[QUOTE=wombatman;349405]I was actually referring to the bug below when using the save flag. But I'll double-check and make sure it's still happening. If it is, I'll see if I can figure out a way to fix it.
Thanks for the advice on split--my gpu does 480 curves, so I'll adjust it accordingly. And I'll play around with that save file as well to make sure my GPU-ECM finds the factor appropriately. Thanks![/QUOTE] I've found where it is happening on my build (Git clone of SVN2525), and it makes me wonder how Paul isn't running into the same thing. SVN2519 added the following at the end of ecm_factor. [CODE] else /* restore original value */ 189 : p->B1done = oldB1done; [/CODE] When stepping through, p->B1done is set to 11000, which is right for the parameters given, but then when it hits this line, it is returned to 1.0000009536743164, which is what it was before the procedure executed. The write_resumefile method uses param->B1done, and so it writes B1=1. This also occurs for save files which do not use the GPU. I'm going to try using batch_last_B1_used instead. (It seems to hold the correct B1 value even for CPU calculations, unless someone knows where this could be a problem. |
[QUOTE=joral;350034]I've found where it is happening on my build (Git clone of SVN2525), and it makes me wonder how Paul isn't running into the same thing.
SVN2519 added the following at the end of ecm_factor:
[CODE]else /* restore original value */
  p->B1done = oldB1done;[/CODE]
When stepping through, p->B1done is set to 11000, which is right for the parameters given, but then when it hits this line, it is returned to 1.0000009536743164, which is what it was before the procedure executed. The write_resumefile method uses param->B1done, and so it writes B1=1. This also occurs for save files which do not use the GPU.
I'm going to try using batch_last_B1_used instead. (It seems to hold the correct B1 value even for CPU calculations, unless someone knows where this could be a problem.)[/QUOTE]Very strange. However, I can assure you that it really does work for me. I even posted representative output.
The bug whereby stage 2 finds factor=N unless a save file is used is still present and Cyril still can't reproduce it, AFAIK. Nailing bugs like these can be incredibly frustrating.
Paul |
[QUOTE=lorgix;350026]Thanks!
Since I'm running 64 bit I get an error message. Missing cudart[B]32[/B]_55.dll. Would you happen to have a 64 bit-version?[/QUOTE] Not at the moment, but I might be able to tomorrow. I'm having some issues getting CUDA 5.5 reinstalled on my laptop running Windows 7, so I'm not sure. If I do build it 64-bit, it'll be with CUDA 5.0 most likely. If you have CUDA installed, you should actually have that dll--I think it just has to be in the same folder as the exe. |
[QUOTE=wombatman;350024]Happy to! I've attached the file here. It has both the 512 and 2048 bit ones. They're both compiled in VS2010 on Win XP 32-bit. The 2048 one is the one I currently use, and the 512 one should work fine too. In addition, these have the iterative countdown, so you can see how many loops are left. If the 512-bit exe doesn't work, just let me know and I'll test it out.[/QUOTE]
Is there a way to have the new feature on Linux? Luigi |
Yup, the GPU-enabled version of GMP-ECM works on Linux as well, see up this topic :smile:
|
[QUOTE=wombatman;350056]Not at the moment, but I might be able to tomorrow. I'm having some issues getting CUDA 5.5 reinstalled on my laptop running Windows 7, so I'm not sure. If I do build it 64-bit, it'll be with CUDA 5.0 most likely.
If you have CUDA installed, you should actually have that dll--I think it just has to be in the same folder as the exe.[/QUOTE] It turns out I don't have that dll. I have some older CUDA version. And nVidia wouldn't let me install the 5.5 toolkit because I don't have a supported version of Visual Studio... Then I tried updating my graphics driver, because the latest one was supposed to include support for content created with 5.5 and earlier. But that didn't give me that dll either. I'm hoping someone who knows where to get it can send me a PM. |
[QUOTE=lorgix;350112]It turns out I don't have that dll. I have some older CUDA version.
And nVidia wouldn't let me install the 5.5 toolkit because I don't have a supported version of Visual Studio... Then I tried updating my graphics driver, because the latest one was supposed to include support for content created with 5.5 and earlier. But that didn't give me that dll either. I'm hoping someone who knows where to get it can send me a PM.[/QUOTE] [url=https://dl.dropboxusercontent.com/s/qy365grbx7yhaoi/cudart32_55.dll]There you go[/url]. |
Thanks for doing that Karl!
|
Thanks a bunch!
It's working now, but I have to run. I'll test it more in a few hours. |
[QUOTE=debrouxl;350098]Yup, the GPU-enabled version of GMP-ECM works on Linux as well, see up this topic :smile:[/QUOTE]
I just "have to suck it and see"... :smile: Luigi |
[QUOTE=wombatman;350024]Happy to! I've attached the file here. It has both the 512 and 2048 bit ones. They're both compiled in VS2010 on Win XP 32-bit. The 2048 one is the one I currently use, and the 512 one should work fine too. In addition, these have the iterative countdown, so you can see how many loops are left. If the 512-bit exe doesn't work, just let me know and I'll test it out.[/QUOTE]
I tested the 512 bit one with a c152. I used values of B1 from 2e3 to 25e4. The speed-up was only 0.9~1.1%. IIRC, xilman managed to get sizable speed-ups. I wonder what he did. |
Maybe 32 bit vs 64 bit? I've just managed to get CUDA 5.5 reinstalled, so I could play around with making a 64 bit binary for you to test out.
|
Just a heads-up. The 512 bit and 2048 bit programs I generated fail the 1st test in the test.gpuecm file. It is supposed to find a factor in Step 1, but it exits reporting no factor found. I confirmed that a standard 1024 bit version works fine.
|
Apologies if I've missed something -- was a link to a 64-bit build for Linux (or source) posted earlier on this thread?
|
I'm pretty sure there's one in here somewhere, though I don't remember exactly where.
|
Is there a simple way to only run, say, stage 1 of GPU-ECM? I'd like to run the stage 1 on my desktop's GPU and then run stage 2 on my laptop (64-bit processor). I can churn through 480 stage 1 curves at a time on my GPU, but then my (relatively) ancient processor takes an inordinately long amount of time to do the stage 2 work before the residues show up in the save file. So if at all possible, I'd like to cut out stage 2 and simply use "-resume" with the residues file.
Edit: And just like that, I seem to have answered my own question. Setting B2=0 appears to have done it. |
Hello
Can this gpu-ecm.exe executable be used with yafu by just renaming it? Or is it not compatible? |
I imagine it would work fine by directing yafu to it, but you wouldn't be using the GPU to actually run the curves. As such, it would be simpler to use the standard GMP-ECM. In addition, there's a bug (at least in the versions I've compiled) where factors that should be found in Stage 2 are not found by the GPU version. So you'd likely miss factors if you tried to use the GPU-enabled version.
|
Has anyone gotten save/resume to work? Here's what I get:
[CODE]./ecm -gpu -save c192.ecm 4e4 0 < C192
GMP-ECM 7.0-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Input number is 361867761627749619574674677769104106609222942556126375131641520128131226876942053491915389383216114202550615939995196059592933366284139274937165773815556682024285008268664983092158298987980517 (192 digits)
Using B1=40000, B2=0, sigma=3:3642364246-3:3642365077 (832 curves)
Computing 832 Step 1 took 472ms of CPU time / 21426ms of GPU time[/CODE]
This gives a save file like
[CODE]METHOD=ECM; PARAM=3; SIGMA=3642365077; B1=1; N=361867761627749619574674677769104106609222942556126375131641520128131226876942053491915389383216114202550615939995196059592933366284139274937165773815556682024285008268664983092158298987980517; X=0x7cb7309ea2ec4ee5d88962fa7516f43385ad69f16e28b04b8beee48d15091cf009cd4119d57a505d6c062036b1d8a784d11d4d87873df12d62bc03b9f10e7a1671e666030d895d55d22d5da9743992e; CHECKSUM=3136759905; PROGRAM=GMP-ECM 7.0-dev; X0=0x0; Y0=0x0; WHO=cluster@compute-0-1.local; TIME=Wed Oct 9 17:49:08 2013;[/CODE]
Then on resume:
[CODE]./ecm -resume c192.ecm 4e4
GMP-ECM 7.0-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert, --enable-openmp] [ECM]
Resuming ECM residue saved by cluster@compute-0-1.local with GMP-ECM 7.0-dev on Wed Oct 9 17:49:08 2013
Input number is 361867761627749619574674677769104106609222942556126375131641520128131226876942053491915389383216114202550615939995196059592933366284139274937165773815556682024285008268664983092158298987980517 (192 digits)
Error, x0 should be equal to 2 with this parametrization
Please report internal errors at <ecm-discuss@lists.gforge.inria.fr>.[/CODE]
Does this work for anyone? Thanks! |
Sorry to have edited so many times, but you're using the right command--it looks like the issue is with the B1 being wrong in the save file. I just ran with the GPU using the input number, and I have no issues using -resume. I don't remember what was done to fix the B1 save file issue though.
|
Here's a post that might have a fix for the B1=1 bug:
[url]http://mersenneforum.org/showpost.php?p=350034&postcount=174[/url] Also, my version is 32-bit. Are you using a 32 or 64 bit build? I went ahead and ran 480 curves of your C192 at 4e4--no factors found. |