#23 |
May 2008
447₁₆ Posts |
#24 | |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
11×19×47 Posts |
Quote:
to multiply those expected values by another 1.15 -- #ifdef HAVE_CUDA. |
|
#25 |
May 2008
3×5×73 Posts |
For the record, the problem that poily & I identified was a mismatch in pointer sizes between the msieve binary and the PTX assembly on his MacBook. The msieve binary was built for 64-bit and the PTX for 32-bit; this caused an overflow in the CUDA parameter assignment, which led to the reported error.
The fix was to add -m 64 to the nvcc build options, which is apparently necessary on the Mac platform. I'm not sure what the proper way to handle this in the Makefile is. |
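To illustrate the failure mode described above, here is a minimal Python sketch (not msieve or CUDA code) of what happens when a 64-bit host pointer has to fit into the 4-byte parameter slot that 32-bit PTX would expect. The address value and the helper name `pack_pointer` are hypothetical, chosen only for illustration.

```python
import struct

def pack_pointer(address, pointer_bits):
    """Pack an address into a parameter slot of the given width (32 or 64 bits)."""
    fmt = "<I" if pointer_bits == 32 else "<Q"
    return struct.pack(fmt, address)

# Hypothetical 64-bit host address, for illustration only.
host_address = 0x00007F8A12345678

# A 64-bit slot holds the address without loss.
slot64 = pack_pointer(host_address, 64)
roundtrip = struct.unpack("<Q", slot64)[0]

# A 32-bit slot cannot hold it: struct rejects the out-of-range value.
try:
    pack_pointer(host_address, 32)
    fits_in_32 = True
except struct.error:
    fits_in_32 = False

print("64-bit slot round-trips:", roundtrip == host_address)
print("fits in 32-bit slot:", fits_in_32)
```

Building both halves with the same pointer width (which is what adding -m 64 to the nvcc options achieves) avoids the mismatch.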
#26 |
Tribal Bullet
Oct 2004
5×709 Posts |
Looks like this project is starting to really hurt from not having an autoconf script. I've been desperately avoiding trying to learn how the autotools work, but the contortions to avoid them are starting to exceed the pain of learning.
|
#27 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
11·19·47 Posts |
You could go really fancy and redress it as a CMake project.
I know that that's what Illumina did to their sequencing pipeline. Looks trendy. |
#28 |
Sep 2009
978₁₀ Posts |
On the one hand, definitions for CMake are usually somewhat easier to create than definitions for autotools.
On the other hand, CMake is much less widely used, and doesn't support a range of platforms, functions, quirks and workarounds as wide as the autotools do. |
#29 |
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2C23₁₆ Posts |
I've a matrix which is 3564431 x 3564656 x 251590133 (70.58/col). Installed is stock OpenMPI on a Fedora 15 system.
Running "msieve -nc2 -t 8 ..." on a dual 4-core machine gives a predicted time of 15h38m --- in line with previous experience with other matrices of comparable size. Using "mpirun -np 8 msieve -nc2 2x4 ..." on the same machine predicts 17h34m --- quite significantly longer.
I would hope that performance would be better or at least comparable but perhaps I've misunderstood others' postings here. What, if anything, am I likely to be doing wrong?
(Incidentally, I found out the hard way that setting -t 8 and using mpirun leads to a messy crash. I wasn't trying to use multithreading in this run, just failed to cut and paste correctly.)
Paul |
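To put the two predictions above side by side, a short Python sketch of the arithmetic follows. It assumes that the third figure in the matrix description (251590133) is the number of nonzero entries, which is consistent with the quoted 70.58/col; the helper name `to_minutes` is hypothetical.

```python
def to_minutes(hours, minutes):
    """Convert an XhYm prediction to minutes."""
    return hours * 60 + minutes

threaded = to_minutes(15, 38)  # predicted for: msieve -nc2 -t 8
mpi = to_minutes(17, 34)       # predicted for: mpirun -np 8 msieve -nc2 ...

slowdown = mpi / threaded - 1.0
print(f"MPI prediction is {mpi - threaded} minutes ({slowdown:.1%}) longer")

# Assumption: the third matrix figure is the nonzero count.
columns = 3564656
nonzeros = 251590133
density = f"{nonzeros / columns:.2f}"
print(f"{density} nonzeros per column")
```

On these figures the MPI prediction is 116 minutes, roughly 12%, longer than the threaded one.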
#30 |
Tribal Bullet
Oct 2004
3545₁₀ Posts |
Did you mean '2,4' and not '2x4'? The CADO tools take the latter format but msieve should not. Anyway, 8 processes is kind of a lot for a matrix of that size; does a 2x2 grid do any better?
Finally, Tom reported in his blog that there are some nasty contortions needed to assign process affinity when the processes are generated through a script like mpirun. |
#31 |
(loop (#_fork))
Feb 2006
Cambridge, England
1100100110101₂ Posts |
Try something like
Code:
nohup mpirun -n 8 --bind-to-core --report-bindings numactl -l msieve -v -nc2 2,4 >> aus 2>> err
(The numactl -l might be unnecessary; it's there to ensure that the jobs allocate memory only on the NUMA partition that they're bound to, which they should be doing by default.)
And ensure that the machine is otherwise idle, because (as you know) starting new jobs on any processor that's running an MPI slice slows the whole job down. I found that I even needed to move hourly cronjobs off my big machine.
My timings for a 1M matrix on a machine that you might recognise are in http://www.mersenneforum.org/showpos...5&postcount=65 |
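For trying different grid shapes (2,4 versus 2,2 and so on), a small Python sketch like the following could assemble the mpirun invocation so that the process count always matches the grid. The flags are copied from the command in this post, not checked against any particular Open MPI release (newer releases may spell the binding option differently); the helper name `mpirun_command` is hypothetical.

```python
import shlex

def mpirun_command(rows, cols, msieve_args=("-v", "-nc2")):
    """Build the mpirun argument list for an msieve -nc2 run on a rows,cols grid."""
    nprocs = rows * cols  # one MPI process per grid cell
    return [
        "mpirun", "-n", str(nprocs),
        "--bind-to-core", "--report-bindings",  # flags as given in the post
        "numactl", "-l",
        "msieve", *msieve_args, f"{rows},{cols}",
    ]

cmd = mpirun_command(2, 4)
print(shlex.join(cmd))
```

Calling `mpirun_command(2, 2)` gives the 4-process variant suggested earlier in the thread; output redirection and nohup are left to the calling shell.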
#32 | |
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
10110000100011₂ Posts |
Quote:
I'll let the current run complete and then conduct experiments based on Tom's advice. Paul |
|
#33 | |
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
11,299 Posts |
Quote:
Code:
Tue Apr 17 10:47:26 2012 linear algebra at 0.0%, ETA 17h 5m
Tue Apr 17 10:47:35 2012 checkpointing every 210000 dimensions
Wed Apr 18 04:34:02 2012 lanczos halted after 56365 iterations (dim = 3564427)
Wed Apr 18 04:34:08 2012 recovered 35 nontrivial dependencies
Wed Apr 18 04:34:08 2012 BLanczosTime: 64456
Wed Apr 18 04:34:08 2012 elapsed time 17:54:17 |
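As a quick reading aid for the log above, this Python sketch converts the reported BLanczosTime (assumed here to be in seconds, which matches the elapsed-time line to within a second) and compares it with the ETA printed at 0.0% and with the 15h38m predicted for the threaded run earlier in the thread.

```python
from datetime import timedelta

blanczos = timedelta(seconds=64456)             # BLanczosTime from the log
initial_eta = timedelta(hours=17, minutes=5)    # "ETA 17h 5m" at 0.0%
threaded_eta = timedelta(hours=15, minutes=38)  # -t 8 prediction from the thread

print("BLanczosTime:", blanczos)
print("over the initial ETA by:", blanczos - initial_eta)
print("over the threaded prediction by:", blanczos - threaded_eta)
```

The Lanczos time works out to 17:54:16, about 49 minutes over the ETA shown at the start of this run.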
|