Msieve v1.47 feedback
Just starting things up here. This version includes some tuning for Nvidia's latest graphics cards, so if you have a Fermi card and know how to run NFS polynomial selection, then please give it a try. The precompiled binary on SourceForge now has two sets of PTX files; if you have a Fermi card and version 3.0 or later of Nvidia's CUDA toolkit, use the PTX files in the 'fermi' subdirectory.
|
Thank you for the new version.
About MPI code, can it be used in a local area network? I want to take advantage of all computers I have in my LAN. If it is a stupid question please say so. |
[QUOTE=em99010pepe;230463]Thank you for the new version.
About MPI code, can it be used in a local area network? I want to take advantage of all computers I have in my LAN. If it is a stupid question please say so.[/QUOTE]
Yes, you can install MPI on all your LAN machines and spread out the work via your LAN. |
When I compile msieve with VS 2010 in 32-bit Release mode using MPIR 2.1.2, it seems to crash when doing QS. The 64-bit version works fine, and so does the pre-compiled 32-bit binary from Jason. Can anyone else reproduce this with a 32-bit binary built with VS?
It crashes for me with this number: 432717521981854371284364875772580616114001895598914408712561227
[CODE]
Sun Sep 19 13:56:15 2010
Sun Sep 19 13:56:15 2010
Sun Sep 19 13:56:15 2010 Msieve v. 1.47
Sun Sep 19 13:56:15 2010 random seeds: 2d89fa40 345674ae
Sun Sep 19 13:56:15 2010 factoring 432717521981854371284364875772580616114001895598914408712561227 (63 digits)
Sun Sep 19 13:56:16 2010 searching for 15-digit factors
Sun Sep 19 13:56:16 2010 commencing quadratic sieve (63-digit input)
Sun Sep 19 13:56:16 2010 using multiplier of 43
Sun Sep 19 13:56:16 2010 using VC8 32kb sieve core
Sun Sep 19 13:56:16 2010 sieve interval: 10 blocks of size 32768
Sun Sep 19 13:56:16 2010 processing polynomials in batches of 21
Sun Sep 19 13:56:16 2010 using a sieve bound of 101281 (4788 primes)
Sun Sep 19 13:56:16 2010 using large prime bound of 5064050 (22 bits)
Sun Sep 19 13:56:16 2010 using trial factoring cutoff of 22 bits
Sun Sep 19 13:56:16 2010 polynomial 'A' values have 8 factors
Sun Sep 19 13:56:28 2010 5073 relations (2328 full + 2745 combined from 23261 partial), need 4884
Sun Sep 19 13:56:28 2010 begin with 25589 relations
Sun Sep 19 13:56:28 2010 reduce to 7450 relations in 2 passes
Sun Sep 19 13:56:28 2010 attempting to read 7450 relations
Sun Sep 19 13:56:28 2010 recovered 7450 relations
Sun Sep 19 13:56:28 2010 recovered 5951 polynomials
Sun Sep 19 13:56:28 2010 attempting to build 5073 cycles
Sun Sep 19 13:56:28 2010 found 5073 cycles in 1 passes
Sun Sep 19 13:56:28 2010 distribution of cycle lengths:
Sun Sep 19 13:56:28 2010 length 1 : 2328
Sun Sep 19 13:56:28 2010 length 2 : 2745
Sun Sep 19 13:56:28 2010 largest cycle: 2 relations
Sun Sep 19 13:56:28 2010 matrix is 4788 x 5073 (0.6 MB) with weight 139188 (27.44/col)
Sun Sep 19 13:56:28 2010 sparse part has weight 139188 (27.44/col)
Sun Sep 19 13:56:28 2010 filtering completed in 4 passes
Sun Sep 19 13:56:28 2010 matrix is 4427 x 4491 (0.5 MB) with weight 119972 (26.71/col)
Sun Sep 19 13:56:28 2010 sparse part has weight 119972 (26.71/col)
Sun Sep 19 13:56:28 2010 commencing Lanczos iteration
Sun Sep 19 13:56:28 2010 memory use: 0.6 MB
[/CODE]
At the Lanczos stage it crashes with this error:
[B]Unhandled exception at 0x0046b060 in msieve32.exe: 0xC0000005: Access violation reading location 0x00b4e000.[/B]
The actual violation happens in the last line of this code snippet:
[CODE]
#elif defined(MSC_ASM32A)
	i = 0;
	ASM_M {
		push ebx
		mov edi,y
		lea ebx,c
		mov esi,x
		mov ecx,i
		align 16
	L0:	movq mm0,[edi+ecx*8]
[/CODE]
So at the L0: stage. Here is what the memory browser is showing for values (name, value, type):
L0 0x0046b060 L0 void *
ecx 15355 unsigned long
edi 11730984 unsigned long
i 0 unsigned int
mm0 10956266787831142436 unsigned __int64
Any ideas? |
[QUOTE=Jeff Gilchrist;230468]Yes, you can install MPI on all your LAN machines and spread out the work via your LAN.[/QUOTE]
Note that while it's possible to use a collection of machines on a LAN to run an arbitrary MPI program, in practice it helps a great deal if all the machines are of similar speed and the switch you use is as fast as possible. Don't even bother with MPI on a LAN for the LA (linear algebra) if you don't have a gigabit switch.
Jeff, does the crash happen with v1.46 too? (I would predict yes.) It's possible something in the LA is not initialized properly when compiling with MSVC. Actually, a small matrix like this would use the unpacked solver code, and the crash happens in the vector-vector multiply. An array index of 15355 is a lot bigger than the matrix dimension, so is something wrapping around? |
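The numbers in the crash report can be cross-checked with a little arithmetic. The sketch below is only an illustration and is not msieve code: the function names are hypothetical, the 4 KiB page size is an assumption about 32-bit x86 Windows, and 4491 is simply the larger matrix dimension printed in the log. It recomputes the operand address of `movq mm0,[edi+ecx*8]` from the reported register values and compares the index with the matrix dimension.

```c
#include <stdint.h>

/* Hypothetical helpers for checking the crash report; not part of msieve. */

/* Address read by `movq mm0,[edi+ecx*8]`: base register plus
   the index scaled by 8 bytes (one 64-bit word per element). */
uint32_t effective_address(uint32_t base, uint32_t index)
{
    return base + index * 8u;
}

/* Nonzero if addr is the first byte of a page.
   Assumption: 4 KiB pages, as on 32-bit x86 Windows. */
int is_page_start(uint32_t addr)
{
    return (addr & 0xFFFu) == 0;
}

/* Nonzero if index is a valid position in a vector of n 64-bit words. */
int index_in_range(uint32_t index, uint32_t n)
{
    return index < n;
}
```

With the reported values, `effective_address(11730984u, 15355u)` gives `0x00b4e000`, which is exactly the location named in the access violation. That address is page-aligned, which would be consistent with the loop reading past the end of its buffer until it hit the first unmapped page, and `index_in_range(15355u, 4491u)` is false, in line with the observation that the index is far larger than the matrix dimension.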
[QUOTE=jasonp;230472]Jeff, does the crash happen with v1.46 too? (I would predict yes). It's possible something in the LA is not initialized properly when compiling with MSVC. Actually, a small matrix like this would use the unpacked solver code, and the crash happens in the vector-vector multiply. An array index of 15355 is a lot bigger than the matrix dimension, so is something wrapping around?[/QUOTE]
Yes, it does crash in v1.46 as well when compiled with MSVC. I'm not sure about the wrapping; is there something in particular I can check for you? Brian might be able to debug this more easily. |
New 64-bit Windows binaries are now available for 1.47: one built with default settings and a second compiled with LARGEBLOCKS and TD=80.
[url]http://gilchrist.ca/jeff/factoring/[/url] Jeff. |
Thanks Jeff,
Your binaries are always helpful to me, and I'll be happy to test the LARGEBLOCKS version with the c154 I am dealing with.
A question: I am very interested in the latest changes to the GPU-enabled polynomial selection, because of the introduction of a multiplier of 12 instead of the "classical" 60 for the smallest coefficient. This is the multiplier I used with ggnfs (along with 144 and 720, to cover different search intervals). My problem is that I am totally unable to compile the GPU version of msieve with cygwin (I fear this isn't possible), so I cannot test source changes myself.
So I have a request: do you have the tools, and the time, to compile a 64-bit version of the GPU-enabled source? I currently run np1 with the 32-bit GPU-capable binaries from SourceForge and plan to switch to yours for np2. Assuming that works, would step 2 be faster this way?
Thanks again for your work, and many thanks to Jason for improving msieve!
Regards, Philippe |
[QUOTE=Jeff Gilchrist;230475]Yes it does crash in v1.46 as well when compiled with MSVC. Not sure about the wrapping stuff, is there something in particular I can check for you? Brian might be able to debug this more easily.[/QUOTE]
I can see this bug in win32, which I suspect is a compiler bug because the registers are not being loaded with the correct values. Brian |
Philippe, I thought of you when jrk committed this patch :) It should be possible to compile the GPU version of msieve in cygwin, but you'd need to find a way to link to Nvidia's precompiled DLLs, and must have Microsoft's compiler to run Nvidia's compiler (the free MSVC Express is what I use). Note that poly selection really does not benefit from using a 64-bit binary, though optimizations are possible in the CPU branch that can significantly improve performance.
Oh, and you're welcome :) |
[QUOTE=Brian Gladman;230511]I can see this bug in win32, which I suspect is a compiler bug because the registers are not being loaded with the correct values.
Brian[/QUOTE]
Sadly, this was a bug in my inline assembler code, which I have now corrected in the msieve SVN repository. I have also corrected a minor win32 build configuration issue.
Brian |