mersenneforum.org > Factoring Projects > Msieve

Old 2012-04-12, 14:28   #23
jrk
 
Quote:
Originally Posted by poily
I'll play with the GPU mode later after jrk sends me the patches he promised.
Check your messages.
Old 2012-04-12, 20:56   #24
Batalov
 
Quote:
Originally Posted by Batalov
The question of when to bail out is one everyone decides for themselves - there are too many parameters (how many sievers will be used, how many poly select threads are used, etc.). Or leave it up to the built-in timeout...
It occurred to me that one change that can be very easily added (and is probably about right) is
to multiply those expected values by another 1.15 -- #ifdef HAVE_CUDA.
Old 2012-04-13, 18:44   #25
jrk
 
Quote:
Originally Posted by jrk
Check your messages.
For the record, the problem that poily and I identified was a discrepancy in pointer sizes between the msieve binary and the PTX assembly on his MacBook: the msieve binary was built 64-bit but the PTX was built 32-bit, which caused an overflow in the CUDA parameter assignment and led to the reported error.

The fix was to add -m 64 to the nvcc build options, which is apparently necessary on the Mac platform. I'm not sure of the most proper way to handle this in the Makefile.
Old 2012-04-13, 18:47   #26
jasonp

Looks like this project is starting to really hurt from not having an autoconf script. I've been desperately avoiding trying to learn how the autotools work, but the contortions to avoid them are starting to exceed the pain of learning.
Old 2012-04-13, 18:56   #27
Batalov
 
You could go really fancy and redress it as a CMake project.
I know that's what Illumina did with their sequencing pipeline. Looks trendy.
Old 2012-04-14, 07:30   #28
debrouxl
 
On the one hand, definitions for CMake are usually somewhat easier to create than definitions for the autotools.
On the other hand, CMake is much less widely used, and doesn't support as wide a range of platforms, functions, quirks and workarounds as the autotools do.
Old 2012-04-17, 09:44   #29
xilman
Linear algebra timings

I have a matrix which is 3564431 x 3564656 with 251590133 nonzero entries (70.58/col). Stock OpenMPI is installed on a Fedora 15 system.

Running "msieve -nc2 -t 8 ..." on a dual 4-core machine gives a predicted time of 15h38m --- in line with previous experience with other matrices of comparable size.

Using "mpirun -np 8 msieve -nc2 2x4 ..." on the same machine predicts 17h34m --- significantly longer. I would hope that performance would be better, or at least comparable, but perhaps I've misunderstood others' postings here.


What, if anything, am I likely to be doing wrong?


(Incidentally, I found out the hard way that setting -t 8 and using mpirun together leads to a messy crash. I wasn't trying to use multithreading in this run; I just failed to cut and paste correctly.)


Paul
Old 2012-04-17, 11:55   #30
jasonp

Did you mean '2,4' and not '2x4'? The CADO tools take the latter format, but msieve should not. Anyway, 8 processes are kind of a lot for a matrix of that size; does a 2x2 grid do any better?

Finally, Tom reported in his blog that there are some nasty contortions needed to assign process affinity when the processes are generated through a script like mpirun.
Old 2012-04-17, 12:21   #31
fivemack

Try something like

Code:
nohup mpirun -n 8 --bind-to-core --report-bindings numactl -l msieve -v -nc2 2,4 >> aus 2>> err

(The numactl -l might be unnecessary; it's there to ensure that the jobs allocate memory only on the NUMA partition that they're bound to, which they should be doing by default.) And ensure that the machine is otherwise idle, because (as you know) starting new jobs on any processor that's running an MPI slice slows the whole job down - I found that I even needed to move hourly cronjobs off my big machine.

My timings for a 1M matrix on a machine that you might recognise are in

http://www.mersenneforum.org/showpos...5&postcount=65
Old 2012-04-17, 12:26   #32
xilman

Quote:
Originally Posted by fivemack
Try something like
nohup mpirun -n 8 --bind-to-core --report-bindings numactl -l msieve -v -nc2 2,4 >> aus 2>> err
...
Thanks. (And thanks also to jasonp for pointing out my typo and suggesting that 4 processes may be more appropriate than 8.)

I'll let the current run complete and then conduct experiments based on Tom's advice.

Paul
Old 2012-04-18, 08:12   #33
xilman

Quote:
Originally Posted by xilman
I'll let the current run complete and then conduct experiments based on Tom's advice.
Finished successfully and factors found on first dependency.
Code:
Tue Apr 17 10:47:26 2012  linear algebra at 0.0%, ETA 17h 5m
Tue Apr 17 10:47:35 2012  checkpointing every 210000 dimensions
Wed Apr 18 04:34:02 2012  lanczos halted after 56365 iterations (dim = 3564427)
Wed Apr 18 04:34:08 2012  recovered 35 nontrivial dependencies
Wed Apr 18 04:34:08 2012  BLanczosTime: 64456
Wed Apr 18 04:34:08 2012  elapsed time 17:54:17
Now to start experimenting.