mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   cudaPm1 Stage 2 does not start (https://www.mersenneforum.org/showthread.php?t=24636)

dcheuk 2019-07-29 21:37

cudaPm1 Stage 2 does not start
 
This thread is reposted to avoid hijacking [URL="https://www.mersenneforum.org/showthread.php?t=24625"]dominicanpapi82's thread[/URL].

[M]333898333[/M]

My computer ran CUDAPm1 through stage 1, but it refuses to start stage 2. The program launches, prints the following output, and exits within 1-3 seconds without changing anything in the directory.

[CODE]
Using up to 6560M GPU memory.
Selected B1=3130000, B2=71207500, 3.82% chance of finding a factor
Using B1 = 3130000 from savefile.
Continuing stage 2 from a partial result of M333898333 fft length = 20480K
Starting stage 2.
Using b1 = 3130000, b2 = 71207500, d = 2310, e = 12, nrp = 21
(program exits on its own after this line ...)[/CODE]

Any help would be appreciated.

Update: I have started a PRP test and am skipping the P-1. If anyone is interested in running P-1 stage 2, my stage 1 save file and worktodo.txt for CUDAPm1 are at the Google Drive link below. I will remove the files once the PRP or P-1 completes. :smile:

[url]https://drive.google.com/open?id=1PhyBj02HJgmRbghHZLTrkb-uhTc5iLhx[/url]

LaurV 2019-07-30 17:07

Smells like memory allocation failure. Stage 2 needs a lot of RAM. How much RAM does the card have?
OTOH, you should report stage 1 result, use B2=B1 and do a manual report.

kriesel 2019-07-30 21:31

Sometimes a retry or 3 will work. Sometimes bumping the fft length up on the command line and retrying will work. Sometimes it's helped by reducing the interval between save files. What gpu was this run on?

From post 373 in [URL]http://www.mersenneforum.org/showthread.php?t=17835[/URL]
"excessive stage 2 round-off errors simply halt the program without error messages."
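
The "retry or 3" advice can be automated with a small relaunch wrapper. The following is only a sketch in Python, not the batch wrapper used later in this thread; the function name `run_with_retries` and its parameters are hypothetical, and the command shown is a stand-in that always fails rather than a real CUDAPm1 invocation.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_resets=3, pause_s=0.0):
    """Run cmd, relaunching after each nonzero exit, up to max_resets relaunches.

    Returns the list of exit codes observed, one per launch.
    """
    exit_codes = []
    for reset_count in range(max_resets + 1):
        print(f"(re)launch, reset count {reset_count} of max {max_resets}")
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing left to retry
        time.sleep(pause_s)  # optional pause before the next attempt
    return exit_codes


if __name__ == "__main__":
    # Stand-in command that always fails; substitute the real program and arguments.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_with_retries(failing, max_resets=3))
```

A wrapper like this only helps with intermittent failures. A crash that reproduces on every launch, as reported in this thread, will simply use up the reset budget.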

dcheuk 2019-07-30 23:15

[QUOTE=LaurV;522614]Smells like memory allocation failure. Stage 2 needs a lot of RAM. How much RAM does the card have?
OTOH, you should report stage 1 result, use B2=B1 and do a manual report.[/QUOTE]

[QUOTE=kriesel;522646]Sometimes a retry or 3 will work. Sometimes bumping the fft length up on the command line and retrying will work. Sometimes it's helped by reducing the interval between save files. What gpu was this run on?

From post 373 in [URL]http://www.mersenneforum.org/showthread.php?t=17835[/URL]
"excessive stage 2 round-off errors simply halt the program without error messages."[/QUOTE]

It seems someone has already completed the P-1; that was quick. Thanks One Man! :thumbs-up:

The card is an RTX 2080. I believe it has 8 GB of GDDR6, but for some reason it says only about 6 GB was available.

kriesel 2019-07-31 00:21

[QUOTE=dcheuk;522659]It seems like someone has completed the P-1 that was quick. Thanks One Man! :thumbs-up:

The card is an RTX 2080, I believe it has 8GB of gddr6 some reason it says only about 6gb was available.[/QUOTE]Perhaps some gpu ram was occupied by the display, but that's a lot of difference.

The bounds reported by One Man B1=97,122, B2=1,165,464 are [B]tiny[/B] compared to what's appropriate; see [URL]https://www.mersenne.ca/exponent/333898333[/URL]
gpu72 bounds B1=2,600,000 B2=59,800,000; 123 GhzDays 3.5% probability of factor; half a week on Tesla C2075, or 1.5 days on GTX1080Ti; around 6 days for an i7-8750H 6-core worker. (All figures for both stages; about half as long for 1 stage.)

CUDAPm1 runs in progress can be promoted to gpus with equal or more RAM, but typically stage 2 does not demote to smaller ram cards and work, so my Tesla is out (6GB nom, 5.25 net of ECC). I don't have a way of moving P-1 in progress between CUDAPm1 and prime95, so the cpus are out.
You might want to repost that interim file. The earlier link gives a 404 error now. If you do, I might give it a shot after some other work clears out of the queue.

dcheuk 2019-07-31 01:39

[QUOTE=kriesel;522665]Perhaps some gpu ram was occupied by the display, but that's a lot of difference.

The bounds reported by One Man B1=97,122, B2=1,165,464 are [B]tiny[/B] compared to what's appropriate; see [URL]https://www.mersenne.ca/exponent/333898333[/URL]
gpu72 bounds B1=2,600,000 B2=59,800,000; 123 GhzDays 3.5% probability of factor; half a week on Tesla C2075, or 1.5 days on GTX1080Ti; around 6 days for an i7-8750H 6-core worker. (All figures for both stages; about half as long for 1 stage.)

CUDAPm1 runs in progress can be promoted to gpus with equal or more RAM, but typically stage 2 does not demote to smaller ram cards and work, so my Tesla is out (6GB nom, 5.25 net of ECC). I don't have a way of moving P-1 in progress between CUDAPm1 and prime95, so the cpus are out.
You might want to repost that interim file. The earlier link gives a 404 error now. If you did, I might give it a shot after some other work clears out of the queue.[/QUOTE]

Oh okay, I fixed the link; the folder is up again.

I ran it on an identical graphics card without any display attached, and it produced the following output:

[CODE]No GeForceRTX2080_fft.txt file found. Using default fft lengths.
For optimal fft selection, please run
./CUDAPm1 -cufftbench 1 8192 r
for some small r, 0 < r < 6 e.g.
CUDA reports 6705M of 8192M GPU memory free.
Using threads: norm1 512, mult 256, norm2 256.
No stage 2 checkpoint.
Using up to 6560M GPU memory.
Selected B1=3130000, B2=71207500, 3.82% chance of finding a factor
Using B1 = 3130000 from savefile.
Continuing stage 2 from a partial result of M333898333 fft length = 20480K
Starting stage 2.
Using b1 = 3130000, b2 = 71207500, d = 2310, e = 12, nrp = 21[/CODE]

... then it crashed/quit again. No warnings, errors, no files modified. :bangheadonwall:

dcheuk 2019-07-31 01:45

[QUOTE=dcheuk;522670]Oh okay I fixed the link the folder is up again.

I ran it on an identical graphics card without any display attached and produced the following message

[CODE]No GeForceRTX2080_fft.txt file found. Using default fft lengths.
For optimal fft selection, please run
./CUDAPm1 -cufftbench 1 8192 r
for some small r, 0 < r < 6 e.g.
[COLOR="SeaGreen"]CUDA reports 6705M of 8192M GPU memory free.[/COLOR]
Using threads: norm1 512, mult 256, norm2 256.
No stage 2 checkpoint.
Using up to 6560M GPU memory.
Selected B1=3130000, B2=71207500, 3.82% chance of finding a factor
Using B1 = 3130000 from savefile.
Continuing stage 2 from a partial result of M333898333 fft length = 20480K
Starting stage 2.
Using b1 = 3130000, b2 = 71207500, d = 2310, e = 12, nrp = 21[/CODE]

... then it crashed/quit again. No warnings, errors, no files modified. :bangheadonwall:[/QUOTE]

Found it weird that only 6705M of 8192M is free. Restarted the pc, got the same message as above, and then as expected it crashed.

Note that this was run on the secondary (but identical) graphics card, on a computer with no external display plugged in. Hmmm, it's weird.
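
To make the memory figures easier to compare across cards, the two relevant console lines can be pulled apart programmatically. This is a minimal sketch that assumes the log lines have exactly the wording quoted above; `parse_gpu_memory` is a hypothetical helper, not part of CUDAPm1.

```python
import re


def parse_gpu_memory(log_text):
    """Extract the memory figures (in M, as printed) from CUDAPm1-style console output."""
    report = re.search(r"CUDA reports (\d+)M of (\d+)M GPU memory free", log_text)
    limit = re.search(r"Using up to (\d+)M GPU memory", log_text)
    if report is None or limit is None:
        raise ValueError("expected memory lines not found in log text")
    free_m, total_m = int(report.group(1)), int(report.group(2))
    return {
        "total_m": total_m,
        "free_m": free_m,
        "in_use_m": total_m - free_m,  # already occupied before the run starts
        "limit_m": int(limit.group(1)),
    }


# The two lines as posted for the RTX 2080.
RTX_2080_LOG = """\
CUDA reports 6705M of 8192M GPU memory free.
Using up to 6560M GPU memory.
"""

if __name__ == "__main__":
    print(parse_gpu_memory(RTX_2080_LOG))
```

For the output posted above, this gives 1487M already in use before stage 2 allocates anything, which is the gap that looks odd on a card with no display attached.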

kriesel 2019-07-31 05:58

[QUOTE=dcheuk;522672]Found it weird that only 6705M of 8192M is free. Restarted the pc, got the same message as above, and then as expected it crashed.

Note that this is ran on the secondary (but identical) graphics card on a computer with no external display plugged in. Hmmm it's weird.[/QUOTE]Yes, odd.
Which version of CUDAPm1 are you running? I use v0.20 not v0.22 for production.
I notice you have no fft file for your gpu, indicating you haven't done the fft tuning or threads tuning yet.

dcheuk 2019-07-31 19:53

[QUOTE=kriesel;522689]Yes, odd.
Which version of CUDAPm1 are you running? I use v0.20 not v0.22 for production.
I notice you have no fft file for your gpu, indicating you haven't done the fft tuning or threads tuning yet.[/QUOTE]

Oh oops, my bad. I tried it on a different card but forgot about the benchmarking.

It's CUDAPm1 v0.22.

Did it again now and got exactly the same message. Oh well.

kriesel 2019-07-31 21:18

[QUOTE=dcheuk;522770]Oh oops my bad. Tried it on a different card but forgot about the benchmarkings.

cudapm1 0.22.

Did it again now and got exactly the same message. Oh well.[/QUOTE]Quick death reproducible here, CUDAPm1 V0.20, GTX 1080 Ti.[CODE]batch wrapper reports (re)launch at Wed 07/31/2019 15:58:59.30 reset count 0 of max 3
CUDAPm1 v0.20
------- DEVICE 0 -------
name GeForce GTX 1080 Ti
Compatibility 6.1
clockRate (MHz) 1620
memClockRate (MHz) 5505
totalGlobalMem zu
totalConstMem zu
l2CacheSize 2883584
sharedMemPerBlock zu
regsPerBlock 65536
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 28
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment zu
deviceOverlap 1

CUDA reports 10988M of 11264M GPU memory free.
Using threads: norm1 32, mult 32, norm2 64.
No stage 2 checkpoint.
Using up to 5120M GPU memory.
Selected B1=2740000, B2=67130000, 3.71% chance of finding a factor
Using B1 = 3130000 from savefile.
Continuing stage 2 from a partial result of M333898333 fft length = 20480K
batch wrapper reports exit at Wed 07/31/2019 16:00:23.95 [/CODE]Will try a couple other things. Renamed c file out of the way, redoing s1 gcd from t file now. If it works, it will give bigger NRP, maybe larger e, on the 11GB.
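
Why a bigger NRP would help can be shown with some arithmetic on the d = 2310 and nrp = 21 values from the original post. This is a rough sketch under an assumption the thread does not spell out: that nrp is the number of residues coprime to d that stage 2 handles per pass. Both function names are hypothetical.

```python
from math import gcd


def coprime_residue_count(d):
    """Count the residues r in 1..d-1 with gcd(r, d) == 1 (Euler's totient of d)."""
    return sum(1 for r in range(1, d) if gcd(r, d) == 1)


def passes_needed(d, nrp):
    """Passes required if each pass covers nrp of the residues coprime to d (assumed model)."""
    total = coprime_residue_count(d)
    return -(-total // nrp)  # ceiling division


if __name__ == "__main__":
    print(coprime_residue_count(2310))  # 480, since 2310 = 2*3*5*7*11
    print(passes_needed(2310, 21))      # 23
    print(passes_needed(2310, 48))      # 10
```

Under that assumption, nrp = 21 means 23 passes over the stage 2 range, and a card with room for a larger nrp would need fewer.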

kriesel 2019-07-31 22:23

[QUOTE=kriesel;522780]Renamed c file out of the way, redoing s1 gcd from t file now. If it works, it will give bigger NRP, maybe larger e, on the 11GB.[/QUOTE]Nope.
[CODE]batch wrapper reports (re)launch at Wed 07/31/2019 16:00:24.21 reset count 1 of max 3
CUDAPm1 v0.20
------- DEVICE 0 -------
name GeForce GTX 1080 Ti
Compatibility 6.1
clockRate (MHz) 1620
memClockRate (MHz) 5505
totalGlobalMem zu
totalConstMem zu
l2CacheSize 2883584
sharedMemPerBlock zu
regsPerBlock 65536
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 28
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment zu
deviceOverlap 1

CUDA reports 10988M of 11264M GPU memory free.
Using threads: norm1 32, mult 32, norm2 64.
Using up to 5120M GPU memory.
Selected B1=2740000, B2=67130000, 3.71% chance of finding a factor
Using B1 = 3130000 from savefile.
Continuing stage 1 from a partial result of M333898333 fft length = 20480K, iteration = 4515001
M333898333, 0xea7d398e8effff52, n = 20480K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 27:58:46batch wrapper reports exit at Wed 07/31/2019 16:37:40.58
[/CODE][CODE]Problem signature:
Problem Event Name: APPCRASH
Application Name: CUDAPm1_win64_20130923_CUDA_55.exe
Application Version: 0.0.0.0
Application Timestamp: 523f9925
Fault Module Name: CUDAPm1_win64_20130923_CUDA_55.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 523f9925
Exception Code: c0000005
Exception Offset: 000000000000d884
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 1033
Additional Information 1: 44b2
Additional Information 2: 44b2372ff3e894f68e6c85eaaa6183b2
Additional Information 3: cc46
Additional Information 4: cc46d3b48197ff03ca4e5004a6dbb86f

[/CODE]Exception c0000005 is an access violation. I get similar results from the resulting c file with a variety of fft lengths in both v0.20 and v0.22. (Those attempts are much faster to try since it crashes quickly, but my cpu is slow, so the gcds take a long time.)
Now it's starting to tick me off.
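
For reading other Windows crash reports of this kind, the "Exception Code" field can be looked up against the standard NTSTATUS values. The sketch below is illustrative only: the table holds just three well-known codes, and `describe_exception` is a hypothetical helper name.

```python
# A few well-known Windows NTSTATUS exception codes (deliberately not exhaustive).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exception(code_text):
    """Map the hex string from a crash report's 'Exception Code' field to a name."""
    code = int(code_text, 16)
    return NTSTATUS_NAMES.get(code, f"unknown (0x{code:08X})")


if __name__ == "__main__":
    print(describe_exception("c0000005"))  # STATUS_ACCESS_VIOLATION
```

An access violation means the process touched memory it was not allowed to read or write. That would fit a failed allocation being used without a check, though the crash report alone does not prove it.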

kriesel 2019-07-31 23:23

I don't see anything unusual in the headers of the files originally posted to Google Drive, except that the fft length is a bit large at 20480K (the header's "N 20971520"). The FFT files I have here indicate it could be done in 18816K.[CODE]>perl cudapm1export.pl

CUDAPm1export for Windows V0.1 2018-08-15

Input file name received as c333898333s1.
Extracted exponent q 333898333 from filename
Opened file c333898333s1 for read
Length of input file data read is 41737392 bytes, 10434348 words.
Exported file's header will read as follows:

Format Mersenne Neutral Exchange V1.0
FileOrigin "CUDAPm1export for Windows" "V0.1 2018-08-15" c333898333s1 2019 Jul 31 22:42:11 UTC
Type P-1 stage 2
Exponent 333898333
Iteration 4515547
N 20971520
AccumulatedTime 100719
B1 3130000
Reserved6 0
Reserved7 0
Reserved8 0
Reserved9 0
B2 0
D 0
E 0
NRP 0
M 0
K 0
T 0
Midpasstransforms 0
Itran_done 0
PtrandonePlusNumtran 0
Itime 0
Ptime 0
Reserved22 0
Reserved23 0
Reserved24 0
DataFormat binary bytes
CRC32 0x53c404f5
DataBinaryByteCount 41737292
EndOfHeader 0x2e04cfbb

Exported file header has length 546 including record terminators.
Processing bulk binary data input without output to console, please wait until program indicates Done.

Res64 of binary form of data is: 0xea7d398e8effff52

Printing 41737292 bytes bulk binary data in binary form.
Done.


>perl cudapm1export.pl

CUDAPm1export for Windows V0.1 2018-08-15

Input file name received as t333898333s1.
Extracted exponent q 333898333 from filename
Opened file t333898333s1 for read
Length of input file data read is 41737392 bytes, 10434348 words.
Exported file's header will read as follows:

Format Mersenne Neutral Exchange V1.0
FileOrigin "CUDAPm1export for Windows" "V0.1 2018-08-15" t333898333s1 2019 Jul 31 22:45:14 UTC
Type P-1 stage 1
Exponent 333898333
Iteration 4515001
N 20971520
AccumulatedTime 100719
B1 3130000
Reserved6 0
Reserved7 0
Reserved8 0
Reserved9 0
DataFormat binary bytes
CRC32 0x24bdce34
DataBinaryByteCount 41737292
EndOfHeader 0xe45759b0

Exported file header has length 389 including record terminators.
Processing bulk binary data input without output to console, please wait until program indicates Done.

Res64 of binary form of data is: 0x2f8ad881eed5913c

Printing 41737292 bytes bulk binary data in binary form.
Done.
[/CODE]
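
The sizes in the two export listings above are internally consistent, which can be checked with a few lines of arithmetic. This is a sketch only: it assumes 4-byte words and a bit-packed residue, which is an interpretation of the numbers rather than a statement about the actual save file layout, and `check_sizes` is a hypothetical helper.

```python
def check_sizes(exponent, file_bytes, data_bytes, fft_n):
    """Cross-check the sizes printed by the export tool (interpretation, not a format spec)."""
    return {
        "words": file_bytes // 4,              # 4-byte words in the input file
        "packed_bytes": (exponent + 7) // 8,   # bytes needed to hold `exponent` bits
        "extra_bytes": file_bytes - data_bytes,  # whatever is not bulk residue data
        "fft_k": fft_n // 1024,                # FFT length in units of K
        "bits_per_fft_word": exponent / fft_n,
    }


if __name__ == "__main__":
    sizes = check_sizes(333898333, 41737392, 41737292, 20971520)
    print(sizes)
    # Same exponent at the smaller 18816K length mentioned in the post:
    print(333898333 / (18816 * 1024))
```

With the posted numbers, the 41737292 data bytes match the exponent's bit count rounded up to whole bytes, the file carries 100 further bytes, and N = 20971520 is 20480K. That works out to roughly 15.9 bits per FFT word at 20480K, against about 17.3 at 18816K.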

dcheuk 2019-08-01 03:57

[QUOTE=kriesel;522793]I don't see anything unusual in the headers of the files originally posted to Google drive, except that the fft length is a bit large, at 20480K. "N 20971520". FFT files I have here indicate it could be done in 18816K.[/QUOTE]

I could be wrong, but I think the RTX runs this faster at 20480K, which is why it was chosen.

Yeah this is weird. Oh well


All times are UTC.