mersenneforum.org > Factoring Projects > Msieve
2011-02-21, 22:22   #12
jasonp

Did this run try to go straight to the square root after the LA? If yes, maybe there would be some difference if you only ran with -ncr? If one of the nodes can't find a dependency file because the final dependencies are still being written out, and so calls exit(), and mpirun then kills node zero, that could cause the errors you're seeing.

2011-02-21, 23:36   #13
Jeff Gilchrist
Quote:
Originally Posted by jasonp
Did this run try to go straight to the square root after the LA? If yes, maybe there would be some difference if you only ran with -ncr? If one of the nodes can't find a dependency file because the final dependencies are still being written out, and so calls exit(), and mpirun then kills node zero, that could cause the errors you're seeing.
It was with -nc3 as well. I just tried with -ncr by itself and got the same thing:

Code:
linear algebra at 99.8%, ETA 0h 3m
checkpointing every 180000 dimensions
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)

lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos halted after 92679 iterations (dim = 5860478)
lanczos error: only trivial dependencies found
lanczos error: only trivial dependencies found
lanczos error: only trivial dependencies found
lanczos error: only trivial dependencies found
BLanczosTime: 311
BLanczosTime: 311
elapsed time 00:05:13
BLanczosTime: 311
elapsed time 00:05:13
BLanczosTime: 311
lanczos error: only trivial dependencies found
lanczos error: only trivial dependencies found
elapsed time 00:05:13
elapsed time 00:05:13
lanczos error: only trivial dependencies found
lanczos error: only trivial dependencies found
BLanczosTime: 310
BLanczosTime: 311
elapsed time 00:05:13
BLanczosTime: 311
elapsed time 00:05:13
BLanczosTime: 310
elapsed time 00:05:13
elapsed time 00:05:13
lanczos error: only trivial dependencies found
BLanczosTime: 311
lanczos error: only trivial dependencies found
elapsed time 00:05:13
BLanczosTime: 311
elapsed time 00:05:13
lanczos error: only trivial dependencies found
lanczos error: only trivial dependencies found
BLanczosTime: 311
elapsed time 00:05:13
BLanczosTime: 311
elapsed time 00:05:13
lanczos error: only trivial dependencies found
BLanczosTime: 311
lanczos error: only trivial dependencies found
lanczos error: only trivial dependencies found
elapsed time 00:05:13
lanczos error: only trivial dependencies found
BLanczosTime: 311
BLanczosTime: 311
elapsed time 00:05:13
elapsed time 00:05:13
BLanczosTime: 311
elapsed time 00:05:13
lanczos error: only trivial dependencies found
BLanczosTime: 312
elapsed time 00:05:14
lanczos error: only trivial dependencies found
BLanczosTime: 312
elapsed time 00:05:14
lanczos error: only trivial dependencies found
BLanczosTime: 312
elapsed time 00:05:14
lanczos error: only trivial dependencies found
BLanczosTime: 312
elapsed time 00:05:14
lanczos error: only trivial dependencies found
BLanczosTime: 312
elapsed time 00:05:14
lanczos error: only trivial dependencies found
BLanczosTime: 312
elapsed time 00:05:15
lanczos error: only trivial dependencies found
BLanczosTime: 313
elapsed time 00:05:15
lanczos error: only trivial dependencies found
BLanczosTime: 313
elapsed time 00:05:15
recovered 38 nontrivial dependencies
BLanczosTime: 320
elapsed time 00:05:22

2011-02-22, 01:32   #14
jasonp
Quote:
recovered 38 nontrivial dependencies
BLanczosTime: 320
On the contrary, it looks like it worked!

More fundamentally, it makes more sense for the LA to detect that multiple MPI processes are running and just have all the non-root MPI processes go away, instead of making up a fake error condition. You've shown that the fake error is worse than confusing: it actually sabotages the entire run. Though it is also of questionable value to have the square root continue on one node while all the others wait for it to finish, possibly for hours.

2011-02-22, 14:19   #15
Jeff Gilchrist
Quote:
Originally Posted by jasonp
On the contrary, it looks like it worked!

More fundamentally, it makes more sense for the LA to detect that multiple MPI processes are running and just have all the non-root MPI processes go away, instead of making up a fake error condition. You've shown that the fake error is worse than confusing: it actually sabotages the entire run. Though it is also of questionable value to have the square root continue on one node while all the others wait for it to finish, possibly for hours.
Sorry, you are right, running -ncr did work. I missed the success message near the end, buried among all the other error messages. It did indeed create a .dep file and I was able to complete -nc3 and got the factors. Woohoo!

Besides changing the error messages, is there anything you can do to fix running -nc2 -nc3 with MPI so that it doesn't kill rank 0 before it finishes writing the .dep file?

Jeff.
2011-02-22, 17:28   #16
jasonp
Honestly, the best policy might be to just refuse to run if you want to do parallel LA and the square root consecutively. You can do the square root easily on multiple MPI processes, but there's little point in doing so unless the parallelism is pushed into the code for a single square root. You also run the risk of exhausting memory if all the MPI processes are on a single machine and you're doing several dependencies at once.

The alternative is to make the LA have all the non-root processes call MPI_Finalize when the Lanczos code finishes; this makes all the non-root MPI processes wait until the dependencies are written, but also makes them wait while the root completes the square root.

Thanks again for the bug report, it probably has saved a lot of headaches for everyone else.

2011-02-23, 04:19   #17
Jeff Gilchrist
I guess I was just so used to chaining the 3 stages on a multi-threaded machine that I automatically did the same thing with the MPI version. Now that I know the pitfalls, I can easily avoid it.
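For anyone else hitting the same pitfall, the safer invocation pattern the thread converges on is to run the MPI-parallel LA as its own step and the square root afterwards as a single process. A sketch (binary paths, process counts, and remaining arguments are placeholders for a real job):

Code:
mpirun -np 25 ./msieve -nc2 ...   # linear algebra alone, on a 5x5 MPI grid
./msieve -nc3 ...                 # square root as a separate single process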
2011-02-26, 11:18   #18
Jeff Gilchrist
I think I found another bug. I completed another 30-bit factoring job using a 5x5 grid with MPI and it worked fine. I then tried to launch one using a 6x6 grid with 36 CPUs, but it fails, saying that the matrix expects <= 35 MPI procs. Here is the log from rank 0.

Code:
Fri Feb 25 22:15:36 2011  
Fri Feb 25 22:15:36 2011  
Fri Feb 25 22:15:36 2011  Msieve v. 1.48
Fri Feb 25 22:15:36 2011  random seeds: 8f6dcfec 1e657090
Fri Feb 25 22:15:36 2011  MPI process 0 of 36
Fri Feb 25 22:15:36 2011  factoring 61377934632499616387908546877397575571582016265152677612398342795764922510357526469234310265459567285560840877704465244744514347092220346785330673622832591683289857296301979438391898112628510050636796071812183520024551173853 (224 digits)
Fri Feb 25 22:15:39 2011  no P-1/P+1/ECM available, skipping
Fri Feb 25 22:15:39 2011  commencing number field sieve (224-digit input)
Fri Feb 25 22:15:39 2011  R0: -20000000000000000000000000000000000000
Fri Feb 25 22:15:39 2011  R1:  1
Fri Feb 25 22:15:39 2011  A0:  1
Fri Feb 25 22:15:39 2011  A1:  0
Fri Feb 25 22:15:39 2011  A2:  0
Fri Feb 25 22:15:39 2011  A3:  0
Fri Feb 25 22:15:39 2011  A4:  0
Fri Feb 25 22:15:39 2011  A5:  0
Fri Feb 25 22:15:39 2011  A6:  6250
Fri Feb 25 22:15:39 2011  skew 0.23, size 3.143e-11, alpha -0.950, combined = 1.477e-12 rroots = 0
Fri Feb 25 22:15:39 2011  
Fri Feb 25 22:15:39 2011  commencing linear algebra
Fri Feb 25 22:15:39 2011  initialized process (0,0) of 6 x 6 grid
Fri Feb 25 22:15:45 2011  read 5168723 cycles
Fri Feb 25 22:16:02 2011  cycles contain 14325173 unique relations
Fri Feb 25 22:21:09 2011  read 14325173 relations
Fri Feb 25 22:21:54 2011  using 20 quadratic characters above 1073739768
Fri Feb 25 22:23:11 2011  building initial matrix
Fri Feb 25 22:28:30 2011  memory use: 1962.6 MB
Fri Feb 25 22:28:49 2011  read 5168723 cycles
Fri Feb 25 22:28:51 2011  matrix is 5168543 x 5168723 (1719.3 MB) with weight 516830953 (99.99/col)
Fri Feb 25 22:28:51 2011  sparse part has weight 393855757 (76.20/col)
Fri Feb 25 22:30:01 2011  filtering completed in 1 passes
Fri Feb 25 22:30:03 2011  matrix is 5168543 x 5168723 (1719.3 MB) with weight 516830953 (99.99/col)
Fri Feb 25 22:30:03 2011  sparse part has weight 393855757 (76.20/col)
Fri Feb 25 22:31:24 2011  error: matrix expects MPI procs <= 35
All the nodes except rank 0 have an initialized line like:
commencing linear algebra
initialized process (0,1) of 6 x 6 grid

Four of the non-rank-0 processes also have the error message after their "initialized process" message:
error: matrix expects MPI procs <= 35

Is there any other info you want to see or want me to try?

Jeff.
2011-02-26, 14:00   #19
jasonp
Try the latest SVN; this was an error check that needed fixing, and was noticed by Ilya Popovyan a few weeks ago.

2011-03-10, 13:14   #20
xilman
Quote:
Originally Posted by jasonp
Now available on sourceforge.

The major change in this release is a huge overhaul of stage 1 of NFS polynomial selection, courtesy of a great deal of work by jrk with help from me. The code is a great deal better, big CPU searches are actually feasible now, and the GPU code has received an overhaul that should enable higher throughput (no numbers here yet, though). Poly selection is fast enough that degree 4 is used for all inputs < 110 digits.
Has anyone managed to build this under 64-bit Linux and CUDA 3.2 for a Fermi GPU?

I'm running into all sorts of hassles here and have had to hack the Makefile in several ways to make progress. CUDA_ROOT had to be set explicitly (to /usr/local/cuda/ in my case) and the runtime library is presumably /usr/local/cuda/lib64/libcudart.so --- there is no cuda.lib as given in the Makefile.

The linker keeps failing to find the CUDA library routines:
Code:
gcc -D_FILE_OFFSET_BITS=64 -O3 -fomit-frame-pointer -march=k8 -DNDEBUG -D_LARGEFILE64_SOURCE  -Wall -W -I. -Iinclude -Ignfs -Ignfs/poly -Ignfs/poly/stage1 -I"/usr/local/cuda/include" -DHAVE_CUDA demo.c -o msieve  \
        libmsieve.a -lz -lgmp -lm -lpthread
libmsieve.a(stage1.no): In function `poly_stage1_run':
stage1.c:(.text+0x280): undefined reference to `cuCtxCreate_v2'
stage1.c:(.text+0x2a1): undefined reference to `cuModuleLoad'
stage1.c:(.text+0x2c2): undefined reference to `cuModuleLoad'
stage1.c:(.text+0x7ba): undefined reference to `cuCtxDestroy'
libmsieve.a(stage1_sieve_gpu_nosq.no): In function `sieve_lattice_gpu_nosq':
stage1_sieve_gpu_nosq.c:(.text+0x516): undefined reference to `cuModuleGetFunction'
stage1_sieve_gpu_nosq.c:(.text+0x6af): undefined reference to `cuMemAlloc_v2'
stage1_sieve_gpu_nosq.c:(.text+0x6d0): undefined reference to `cuModuleGetGlobal_v2'
stage1_sieve_gpu_nosq.c:(.text+0x6ef): undefined reference to `cuFuncGetAttribute'
stage1_sieve_gpu_nosq.c:(.text+0x715): undefined reference to `cuFuncSetBlockShape'
stage1_sieve_gpu_nosq.c:(.text+0x75e): undefined reference to `cuMemAlloc_v2'
stage1_sieve_gpu_nosq.c:(.text+0x952): undefined reference to `cuParamSetv'
stage1_sieve_gpu_nosq.c:(.text+0x982): undefined reference to `cuParamSetv'
stage1_sieve_gpu_nosq.c:(.text+0x999): undefined reference to `cuParamSetSize'
stage1_sieve_gpu_nosq.c:(.text+0xa50): undefined reference to `cuModuleGetFunction'
stage1_sieve_gpu_nosq.c:(.text+0xaf6): undefined reference to `cuParamSeti'
stage1_sieve_gpu_nosq.c:(.text+0xc0e): undefined reference to `cuMemcpyHtoD_v2'
stage1_sieve_gpu_nosq.c:(.text+0xc29): undefined reference to `cuParamSeti'
stage1_sieve_gpu_nosq.c:(.text+0xcb3): undefined reference to `cuMemcpyHtoD_v2'
stage1_sieve_gpu_nosq.c:(.text+0xccd): undefined reference to `cuParamSeti'
stage1_sieve_gpu_nosq.c:(.text+0xd0c): undefined reference to `cuLaunchGrid'
stage1_sieve_gpu_nosq.c:(.text+0xd3a): undefined reference to `cuMemcpyDtoH_v2'
stage1_sieve_gpu_nosq.c:(.text+0xf24): undefined reference to `cuMemFree_v2'
stage1_sieve_gpu_nosq.c:(.text+0xf35): undefined reference to `cuMemFree_v2'
libmsieve.a(stage1_sieve_gpu_sq.no): In function `trans_batch_sq.clone.1':
stage1_sieve_gpu_sq.c:(.text+0xb9): undefined reference to `cuParamSetv'
stage1_sieve_gpu_sq.c:(.text+0xe6): undefined reference to `cuParamSetv'
stage1_sieve_gpu_sq.c:(.text+0x101): undefined reference to `cuParamSeti'
stage1_sieve_gpu_sq.c:(.text+0x12e): undefined reference to `cuParamSetf'
stage1_sieve_gpu_sq.c:(.text+0x145): undefined reference to `cuParamSetSize'
stage1_sieve_gpu_sq.c:(.text+0x18f): undefined reference to `cuMemcpyHtoD_v2'
stage1_sieve_gpu_sq.c:(.text+0x28e): undefined reference to `cuMemcpyHtoD_v2'
stage1_sieve_gpu_sq.c:(.text+0x2a9): undefined reference to `cuParamSeti'
stage1_sieve_gpu_sq.c:(.text+0x2d9): undefined reference to `cuLaunchGrid'
stage1_sieve_gpu_sq.c:(.text+0x2f9): undefined reference to `cuMemcpyDtoH_v2'
libmsieve.a(stage1_sieve_gpu_sq.no): In function `sieve_lattice_gpu_sq':
stage1_sieve_gpu_sq.c:(.text+0x728): undefined reference to `cuModuleGetFunction'
stage1_sieve_gpu_sq.c:(.text+0x745): undefined reference to `cuModuleGetFunction'
stage1_sieve_gpu_sq.c:(.text+0x819): undefined reference to `cuMemAlloc_v2'
stage1_sieve_gpu_sq.c:(.text+0x834): undefined reference to `cuMemAlloc_v2'
stage1_sieve_gpu_sq.c:(.text+0x853): undefined reference to `cuFuncGetAttribute'
stage1_sieve_gpu_sq.c:(.text+0x879): undefined reference to `cuFuncSetBlockShape'
stage1_sieve_gpu_sq.c:(.text+0x898): undefined reference to `cuFuncGetAttribute'
stage1_sieve_gpu_sq.c:(.text+0x8be): undefined reference to `cuFuncSetBlockShape'
stage1_sieve_gpu_sq.c:(.text+0x910): undefined reference to `cuMemAlloc_v2'
stage1_sieve_gpu_sq.c:(.text+0xbd2): undefined reference to `cuParamSetv'
stage1_sieve_gpu_sq.c:(.text+0xc07): undefined reference to `cuParamSetv'
stage1_sieve_gpu_sq.c:(.text+0xc20): undefined reference to `cuParamSeti'
stage1_sieve_gpu_sq.c:(.text+0xc55): undefined reference to `cuParamSetv'
stage1_sieve_gpu_sq.c:(.text+0xc6c): undefined reference to `cuParamSetSize'
stage1_sieve_gpu_sq.c:(.text+0xdfc): undefined reference to `cuMemcpyHtoD_v2'
stage1_sieve_gpu_sq.c:(.text+0xe1a): undefined reference to `cuParamSeti'
stage1_sieve_gpu_sq.c:(.text+0xf5a): undefined reference to `cuMemcpyHtoD_v2'
stage1_sieve_gpu_sq.c:(.text+0xf75): undefined reference to `cuParamSeti'
stage1_sieve_gpu_sq.c:(.text+0xf93): undefined reference to `cuLaunchGrid'
stage1_sieve_gpu_sq.c:(.text+0xfb9): undefined reference to `cuMemcpyDtoH_v2'
stage1_sieve_gpu_sq.c:(.text+0x11e7): undefined reference to `cuMemFree_v2'
stage1_sieve_gpu_sq.c:(.text+0x11fd): undefined reference to `cuMemFree_v2'
stage1_sieve_gpu_sq.c:(.text+0x1213): undefined reference to `cuMemFree_v2'
stage1_sieve_gpu_sq.c:(.text+0x169b): undefined reference to `cuModuleGetFunction'
libmsieve.a(cuda_xface.o): In function `gpu_init':
cuda_xface.c:(.text+0x1e9): undefined reference to `cuInit'
cuda_xface.c:(.text+0x1f9): undefined reference to `cuDeviceGetCount'
cuda_xface.c:(.text+0x229): undefined reference to `cuDeviceGet'
cuda_xface.c:(.text+0x256): undefined reference to `cuDeviceGetName'
cuda_xface.c:(.text+0x274): undefined reference to `cuDeviceComputeCapability'
cuda_xface.c:(.text+0x288): undefined reference to `cuDeviceGetProperties'
cuda_xface.c:(.text+0x2f1): undefined reference to `cuDeviceTotalMem_v2'
cuda_xface.c:(.text+0x30c): undefined reference to `cuDeviceGetAttribute'
cuda_xface.c:(.text+0x326): undefined reference to `cuDeviceGetAttribute'
cuda_xface.c:(.text+0x33d): undefined reference to `cuDeviceGetAttribute'
collect2: ld returned 1 exit status
make: *** [x86_64] Error 1
/etc/ld.so.conf.d/cuda.conf contains /usr/local/cuda/lib64, and I've also tried setting $LD_LIBRARY_PATH, to no avail.

I'm stuck at the moment.

Paul
2011-03-10, 14:26   #21
jasonp
cuda.lib is the name of the driver library under Windows. On Linux it would be libcuda.a, though I don't know where in the file tree it would be, probably right next to libcudart.a.

Msieve uses the driver API, not the runtime API, so that a compiled binary can work on a machine with only the Nvidia driver installed (and not the whole CUDA toolkit).
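If that's the case, the undefined cu* references in Paul's log would simply mean the link line never pulls in the driver library. A hypothetical Makefile tweak (the paths are guesses; on Linux the driver typically ships libcuda.so alongside the driver install rather than in the toolkit, and its location varies by distro):

Code:
# Hypothetical fix: link the CUDA driver library explicitly.
# Adjust the -L paths to wherever the Nvidia driver installed libcuda.so.
CUDA_ROOT = /usr/local/cuda
LIBS += -L$(CUDA_ROOT)/lib64 -L/usr/lib64 -lcuda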

FWIW, I had access to a Linux system and could not even install the CUDA toolkit to the point where the sample applications worked.
2011-03-10, 16:20   #22
xilman
Quote:
Originally Posted by jasonp
cuda.lib is the name of the driver library under Windows. On Linux it would be libcuda.a, though I don't know where in the file tree it would be, probably right next to libcudart.a.

Msieve uses the driver API, not the runtime API, so that a compiled binary can work on a machine with only the Nvidia driver installed (and not the whole CUDA toolkit).

FWIW, I had access to a Linux system and could not even install the CUDA toolkit to the point where the sample applications worked.
AFAICT, the standard install doesn't have a libcudart.a --- there are only shared libraries on the system.

If you wish I can give you ssh access to a machine with CUDA 3.2, 64-bit Fedora 14, a GT460 and the 260.19.36 Nvidia driver. The CUDA kit installed without any problems here and the SDK examples all run correctly.

Paul