mersenneforum.org  

2013-12-15, 18:08   #1
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
10,753 Posts

CUDA poly failure, SVN 956

I'll be investigating further, but if anyone has any ideas to short circuit it ...

Code:
pcl@maat ~/msieve/trunk/fac $ ../msieve -np -i w3_621.n
deadline: 400 CPU-seconds per coefficient
coeff 12 specialq 1 - 19171315 other 8002 - 19207
error (line 734): unexpected error
pcl@maat ~/msieve/trunk/fac $ tail -16 msieve.log
Sun Dec 15 18:00:19 2013  Msieve v. 1.52 (SVN 956)
Sun Dec 15 18:00:19 2013  random seeds: 43e1a3b7 342cd8f7
Sun Dec 15 18:00:19 2013  factoring 686576626044476447828739679360983394818886834910307572215033151362313366218819423561586972940254871463193639906413334652671424206383019103791825401 (147 digits)
Sun Dec 15 18:00:20 2013  no P-1/P+1/ECM available, skipping
Sun Dec 15 18:00:20 2013  commencing number field sieve (147-digit input)
Sun Dec 15 18:00:20 2013  commencing number field sieve polynomial selection
Sun Dec 15 18:00:20 2013  polynomial degree: 5
Sun Dec 15 18:00:20 2013  max stage 1 norm: 3.17e+22
Sun Dec 15 18:00:20 2013  max stage 2 norm: 4.81e+20
Sun Dec 15 18:00:20 2013  min E-value: 6.57e-12
Sun Dec 15 18:00:20 2013  poly select deadline: 491470
Sun Dec 15 18:00:20 2013  time limit set to 136.52 CPU-hours
Sun Dec 15 18:00:20 2013  expecting poly E from 7.90e-12 to > 9.09e-12
Sun Dec 15 18:00:20 2013  searching leading coefficients from 1 to 7323407
Sun Dec 15 18:00:20 2013  using GPU 0 (Quadro FX 1700)
Sun Dec 15 18:00:20 2013  selected card has CUDA arch 1.1
pcl@maat ~/msieve/trunk/fac $

This is an amd64 Gentoo system with CUDA 5.5.


Paul

2013-12-15, 18:38   #2
firejuggler
Apr 2010
Over the rainbow
2³·5²·13 Posts

Doesn't the latest version require CUDA arch 2.0?

Last fiddled with by firejuggler on 2013-12-15 at 18:39

2013-12-15, 18:56   #3
xilman

Quote:
Originally Posted by firejuggler
    Doesn't the latest version require CUDA arch 2.0?

Ah ...

May well do. Interesting that the 1.1 and 1.0 ptx files are still around.


Thanks. I'll investigate. A 580 should roll up tomorrow, all being well.

2013-12-15, 21:17   #4
jasonp
Tribal Bullet
Oct 2004
3,541 Posts

Polynomial selection can still use a capability 1.1 card; unfortunately one of the kernels uses an atomic operation that disqualifies use of 1.0 cards.

How much memory does the card have?
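
The cutoff described above can be sketched in a few lines of Python. This is only an illustration, not msieve's own check: the helper name `can_run_kernel_with_atomics` is hypothetical, and the threshold is the one NVIDIA documents for 32-bit atomic operations on global memory (compute capability 1.1 and later).

```python
# Minimum compute capability for 32-bit atomic operations on global memory,
# as documented by NVIDIA. msieve's real device check may differ.
MIN_CC_FOR_GLOBAL_ATOMICS = (1, 1)


def can_run_kernel_with_atomics(major, minor):
    """Hypothetical helper: True if a card of compute capability
    major.minor supports 32-bit global-memory atomics."""
    return (major, minor) >= MIN_CC_FOR_GLOBAL_ATOMICS


for cc in [(1, 0), (1, 1), (1, 2), (2, 0)]:
    verdict = "usable" if can_run_kernel_with_atomics(*cc) else "not usable"
    print(f"CC {cc[0]}.{cc[1]}: {verdict}")
```

By this test the arch 1.1 card in the log passes, so the atomic operation alone would not explain the failure.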

2013-12-16, 11:24   #5
xilman

Quote:
Originally Posted by jasonp
    Polynomial selection can still use a capability 1.1 card; unfortunately one of the kernels uses an atomic operation that disqualifies use of 1.0 cards.

    How much memory does the card have?

512M.

2013-12-16, 13:21   #6
jasonp

That could be the problem; the code tries to use at most 30% of the on-card memory, or a max of about 300M, whichever comes first.
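
As a rough illustration of that rule, the following Python sketch takes the smaller of 30% of the card's memory and a 300 MiB cap. The function name, the exact cap of 300 MiB (the post only says "about 300M") and the sample card sizes are assumptions; msieve's actual code may round or compute this differently.

```python
MIB = 1024 * 1024


def gpu_memory_budget(total_bytes, fraction=0.30, cap_bytes=300 * MIB):
    """Hypothetical helper: the smaller of fraction * total and the cap."""
    return min(int(total_bytes * fraction), cap_bytes)


# Illustrative card sizes in MiB, including the 512M card from this thread.
for total_mib in (512, 1024, 4096):
    budget = gpu_memory_budget(total_mib * MIB)
    print(f"{total_mib} MiB card -> budget {budget / MIB:.1f} MiB")
```

Under these assumptions a 512M card is limited by the 30% rule to roughly half of what a larger card gets, since larger cards reach the fixed cap instead.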

Another possibility is that the maximum size of a kernel launch is more restricted for capability < 2.0, and the code could be silently violating that.
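
A minimal Python sketch of the kind of launch-size check this refers to is given below. The limits are the per-block and per-grid figures NVIDIA publishes for compute capability 1.x and 2.x; the helper names, the one-dimensional launch shape and the sample sizes are assumptions, and nothing in the thread says which limit, if any, msieve's kernels actually exceed.

```python
# Published limits per compute capability major version (NVIDIA tables).
LIMITS = {
    1: {"max_threads_per_block": 512, "max_grid_dim": 65535},   # CC 1.x
    2: {"max_threads_per_block": 1024, "max_grid_dim": 65535},  # CC 2.x
}


def blocks_needed(num_items, threads_per_block):
    """Ceiling division: blocks required for one thread per item."""
    return (num_items + threads_per_block - 1) // threads_per_block


def launch_fits(cc_major, num_items, threads_per_block):
    """Hypothetical check for a one-dimensional kernel launch."""
    limits = LIMITS[cc_major]
    return (
        threads_per_block <= limits["max_threads_per_block"]
        and blocks_needed(num_items, threads_per_block) <= limits["max_grid_dim"]
    )


# The same launch shape can be legal on CC 2.x and illegal on CC 1.x.
print(launch_fits(1, 1_000_000, 1024))
print(launch_fits(2, 1_000_000, 1024))
# Too many blocks for a one-dimensional grid on either generation.
print(launch_fits(1, 100_000_000, 256))
```

A launch that breaks one of these limits fails at run time rather than at compile time, which is why such a violation could go unnoticed on newer cards.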

2013-12-21, 10:02   #7
xilman

Quote:
Originally Posted by jasonp
    That could be the problem; the code tries to use at most 30% of the on-card memory, or a max of about 300M, whichever comes first.

    Another possibility is that the maximum size of a kernel launch is more restricted for capability < 2.0, and the code could be silently violating that.

Looks like the memory size is the cause. I plugged in a 4GB C1060 Tesla card, also CC=1.1, and it is now running fine.

Paul

2014-01-10, 03:20   #8
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns
3817₁₀ Posts

Am I having a similar or entirely different failure? I have 1024 MB:
Code:
MPI process 0 of 1
factoring 68482425847675570595542326599064126332924657356117138067976231353171490880041876507923980473551830545978048235812132484593056452451287 (134 digits)
searching for 15-digit factors
commencing number field sieve (134-digit input)
commencing number field sieve polynomial selection
polynomial degree: 5
max stage 1 norm: 2.23e+20
max stage 2 norm: 4.75e+18
min E-value: 3.55e-11
poly select deadline: 98277
time limit set to 27.30 CPU-hours
expecting poly E from 4.86e-11 to > 5.59e-11
searching leading coefficients from 1 to 1639297
using GPU 0 (Quadro FX 880M)
selected card has CUDA arch 1.2
deadline: 200 CPU-seconds per coefficient
coeff 12 specialq 1 - 1490691 other 4027 - 9666
error (line 734): unexpected error
Msieve Error: return value 255. Is CUDA enabled? Terminating...

Thanks for all...

2014-01-10, 14:42   #9
jasonp

If those two MPI processes are on the same machine then they will overwrite each other's output files. MPI polynomial selection has never been tested, so try a non-MPI binary to remove one potential source of trouble.
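
The clash can be illustrated with a short Python sketch. It assumes the "MPI process 0 of 1" banner follows the usual MPI convention of rank, then total process count, with ranks numbered from 0. The per-rank file naming shown is purely hypothetical and is not something msieve is known to do.

```python
def rank_banner(rank, size):
    """Format a banner in the style seen in the log: rank, then process count."""
    return f"MPI process {rank} of {size}"


def per_rank_output(base_name, rank, size):
    """Hypothetical naming scheme that keeps ranks from sharing one file.
    Not msieve's actual behaviour."""
    if size == 1:
        return base_name
    return f"{base_name}.rank{rank}"


print(rank_banner(0, 1))  # a single-process run
for rank in range(2):
    # With two ranks and no suffix, both would write to the same file.
    print(per_rank_output("msieve.log", rank, 2))
```

With a process count of 1 there is only rank 0, so no second process exists to overwrite anything.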

2014-01-10, 15:53   #10
EdH

Quote:
Originally Posted by jasonp
    If those two MPI processes are on the same machine then they will overwrite each other's output files. MPI polynomial selection has never been tested, so try a non-MPI binary to remove one potential source of trouble.

I assumed "MPI process 0 of 1" meant only one process was running.

I recompiled without MPI, and the result did not change:
Code:
Msieve v. 1.52 (SVN 956)
Fri Jan 10 10:33:06 2014
random seeds: 9f4e89db 24e94162
factoring 58056616587745511771680421207297639376513580077124191292534786503244595213334531384547593301 (92 digits)
searching for 15-digit factors
commencing number field sieve (92-digit input)
commencing number field sieve polynomial selection
polynomial degree: 4
max stage 1 norm: 4.70e+15
max stage 2 norm: 6.39e+14
min E-value: 2.74e-08
poly select deadline: 778
time limit set to 0.22 CPU-hours
expecting poly E from 4.32e-08 to > 4.97e-08
searching leading coefficients from 1 to 14786534
using GPU 0 (Quadro FX 880M)
selected card has CUDA arch 1.2
deadline: 5 CPU-seconds per coefficient
coeff 12 specialq 1 - 299020 other 930 - 2232
error (line 734): unexpected error

I changed to a smaller composite to see if that affected anything and watched the GPU statistics. The memory never showed any shift from 19%, but the GPU Utilization jumped to 99% briefly.

Although msieve compiled, is it possible this is a CUDA library linking issue? I was not totally clear with that section of my install...

Thanks for any help...

2014-01-25, 04:45   #11
wombatman
I moo ablest echo power!
May 2013
29×61 Posts

Because I'm having a CUDA problem with Revision 956 as well, I figured I would add it here.

If I compile without CUDA (i.e., WIN=1 ECM=1), it works fine. If I compile with CUDA, however, running msieve generates a c0000005 exception code when it tries to start the polynomial search.

OS is Windows 7 Ultimate 64-bit. -march flag is corei7 and CUDA is V5.5.

Again, Msieve compiles without error in MinGW64, but throws the exception when I actually run it, and only when compiled with CUDA.

Any help is greatly appreciated.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.