mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl
Old 2019-09-04, 21:37   #1321
preda
"Mihai Preda"
Apr 2015

10101011011₂ Posts

Quote:
Originally Posted by pinhodecarlos View Post
How much memory do those P-1 tasks need? Can it be run without having a GIMPS account? I can dedicate a couple of cores to the cause.
In general P-1 stage2 is faster with more RAM available. With GpuOwl I'd recommend a GPU with 8GB or more. I need to do a more detailed write-up on the memory use and performance impact.
Old 2019-09-04, 22:24   #1322
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100101011₂ Posts
First V6.x P-1 success on NVIDIA, a good-news/bad-news story

On Windows 7 Pro x64, NVIDIA GTX 1080 Ti:

By command line option specification...

Successful stage 1 factor find:
Code:
{"exponent":"4444091", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.7-2-g7cf95d0"}, "timestamp":"2019-09-04 19:13:31 UTC", "fft-length":229376, "B1":40000, "factors":["1809798096458971047321927127"]}
Successful two-stage, no-factor run:
Code:
{"exponent":"6972593", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.7-2-g7cf95d0"}, "timestamp":"2019-09-04 19:33:17 UTC", "fft-length":360448, "B1":60000, "B2":780000}
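These result records are one JSON object per line, so they are easy to check or collect mechanically. A quick illustration (plain Python, not part of gpuowl), using the NF record above verbatim:

```python
import json

# The "NF" record quoted above, reflowed onto one string.
line = ('{"exponent":"6972593", "worktype":"PM1", "status":"NF", '
        '"program":{"name":"gpuowl", "version":"v6.7-2-g7cf95d0"}, '
        '"timestamp":"2019-09-04 19:33:17 UTC", '
        '"fft-length":360448, "B1":60000, "B2":780000}')

rec = json.loads(line)
# Note the mixed types: "exponent" is a JSON string, the bounds are numbers.
assert rec["status"] in ("F", "NF")
factors = rec.get("factors", [])   # present only on "F" records
print(rec["exponent"], rec["status"], rec["B1"], rec["B2"], factors)
```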
By worktodo file:
50M appeared to run successfully, but the program crashed before the stage 2 gcd completed and after the next worktodo line, a 100M task, had already begun. It was apparently in the midst of updating the worktodo file; there's a worktodo.bak containing the line
Code:
B1=440000,B2=8360000;PFactor=0,1,2,50001781,-1,73,2
and a worktodo.txt without it; there is no indication in gpuowl.log or the console whether a factor was found. A flush to log and console immediately upon stage completion might be good, to preserve whatever result was achieved. (Nice result you got there; it would be a shame if anything happened to it.) It is reproducible from the command line as well.
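For what it's worth, the worktodo.bak/worktodo.txt pair suggests a snapshot-then-rewrite update. A crash-safer shape is write-the-new-file-fully-then-rename, sketched below; the file names and one-task-per-line layout are assumptions here, and gpuowl's real logic may well differ:

```python
import os

def complete_first_line(path="worktodo.txt"):
    """Drop the completed first task, keeping the old file as a .bak.

    Illustrative only; gpuowl's actual update logic may differ.
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines:
        return None
    done, rest = lines[0], lines[1:]
    with open(path + ".bak", "w") as f:   # snapshot of the old state
        f.writelines(lines)
    tmp = path + ".new"
    with open(tmp, "w") as f:             # write the replacement fully...
        f.writelines(rest)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)                 # ...then swap it in atomically
    return done.strip()
```

With this shape a crash at any point leaves either the old worktodo.txt or the new one on disk, never a truncated file.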

The issue also occurred for the command-line input
Code:
gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 24036583 -B1 220000 -B2 3960000
so it appears to be parameter- or memory-size related; max GPU usage was ~1.9GB during that run.

Many cases, summarized very briefly, were run from the command line with the options shown; exponents up to 10M are OK, 11M and up are a problem. (Meanwhile on AMD, v6.6 happily runs 200M and probably higher.)
Code:
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 4444091 -B1 40000 -B2 480000 factor in stage 1
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 6972593 -B1 60000 -B2 780000 ok no factor
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 50001781 -B1 830000 -B2 17430000 fails near stage 2 gcd

:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 24036583 -B1 220000 -B2 3960000 fails near stage 2 gcd
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 24000001 -B1 220000 -B2 3960000 factor in stage 1
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 13466917 -B1 110000 -B2 1760000 fails near stage 2 gcd
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 13466951 -B1 110000 -B2 1760000 factor in stage 1
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 10000019 -B1 80000 -B2  1200000 factor in stage 1

:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 10000139 -B1 80000 -B2  1200000 completes 2 stages no factor, peak 9183MiB
:gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 12000073 -B1 110000 -B2 1650000 peak 7495MiB, fails near stage 2 gcd

gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 11000159 -B1 100000 -B2 1500000 peak 10218MiB, fails near stage 2 gcd
Generally: CUDAPm1 reports the number of relative primes run together in a pass in stage 2, and the e value for the Brent-Suyama extension. I don't see either of these in gpuowl's output. Prime95, as I recall, shows them, and includes e in the result record if it is larger than 2.
Gpuowl reports us/sq in P-1 stage 1 but ms/mul in stage 2; the timings are comparable.
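For reference, the full set of stage-2 "relative primes" for a wheel modulus D, like the D=30030 gpuowl prints in its stage 2 log lines, is the set of residues coprime to D, counted by Euler's totient (halved when +/- pairs are combined); CUDAPm1 runs some subset of these per pass. A quick check:

```python
from math import gcd

def phi(n):
    """Euler's totient by direct count (fine for small n like D=30030)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

D = 30030  # = 2*3*5*7*11*13, the D shown in gpuowl's stage-2 log lines
print(phi(D))       # 5760 residues coprime to D
print(phi(D) // 2)  # 2880 when +/- pairs are combined
```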
Attached Thumbnails: 50M P-1 crash.png (93.6 KB)
Attached Files: 50M P-1 appcrash.txt (813 bytes)

Last fiddled with by kriesel on 2019-09-04 at 22:27
Old 2019-09-05, 11:24   #1323
preda
"Mihai Preda"
Apr 2015

55B₁₆ Posts

Quote:
Originally Posted by kriesel View Post
50M appeared to run successfully, but the program crashed before the stage 2 gcd completed and after the next worktodo line, a 100M task, had already begun. It was apparently in the midst of updating the worktodo file; there's a worktodo.bak containing the line
Code:
B1=440000,B2=8360000;PFactor=0,1,2,50001781,-1,73,2
and a worktodo.txt without it; there is no indication in gpuowl.log or the console whether a factor was found. A flush to log and console immediately upon stage completion might be good, to preserve whatever result was achieved. (Nice result you got there; it would be a shame if anything happened to it.) It is reproducible from the command line as well.

Issue also occurred for cmd line input
Code:
gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 24036583 -B1 220000 -B2 3960000
so it appears to be parameter or memory size related; max gpu usage was ~1.9GB during that run.
What it looks like happened there is:
- P-1 stage2 completed, and GCD was started on CPU in the background,
- worktodo.txt was correctly updated
- at this point, the program is supposed to just stay there and wait for the GCD to finish. Instead it crashed.
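The sequence above (finish the GPU work, hand the GCD to a CPU thread, keep the main flow going, then wait) can be sketched in outline; this toy Python version only illustrates the control flow, not gpuowl's actual C++:

```python
from concurrent.futures import ThreadPoolExecutor
from math import gcd

def finish_stage2(residue, modulus):
    """Sketch of the flow described above: GPU done, GCD on a CPU thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(gcd, residue - 1, modulus)  # the P-1 GCD step
        # Here the main thread is free: update worktodo.txt, start the
        # next task, and so on.  The crash reported above happened in
        # this window, before fut.result() ever returned.
        return fut.result()   # wait for the background GCD to come back

# Toy numbers only; real residues have millions of digits.
print(finish_stage2(2**64, 2**48 - 1))  # gcd(2^64-1, 2^48-1) = 2^16-1 = 65535
```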

So, there was no result to write (or flush) yet. I did run your example (-pm1 24036583 -B1 220000 -B2 3960000) on my AMD GPU: no crash. Is your crash reproducible? Does it happen every single time? Only on Nvidia? What is the last message in the log before the crash?

I see in the screenshot of the crash that the next task is starting. But you can also reproduce it with -pm1 on the command line, where there isn't any "next" task, right?

Did stage2 find a factor successfully in any run, e.g. when trying an exponent with a known factor that should be found in stage 2?

What is the "additional information" on the windows crash window?

With command line "-pm1", if there is a crash, is there a delay between the last line written to output, and the crash? (i.e. does the crash occur when the GCD (on CPU) comes back?)

Could you ever run stage2 to completion on windows without a crash? (i.e. maybe the cause is Windows, not Nvidia?)

Last fiddled with by preda on 2019-09-05 at 11:41
Old 2019-09-05, 12:26   #1324
preda
"Mihai Preda"
Apr 2015

1371₁₀ Posts

Ken: the crash at the end of stage2 might be fixed now; please retry.

Last fiddled with by preda on 2019-09-05 at 12:28
Old 2019-09-05, 16:19   #1325
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by preda View Post
What it looks like happened there is:
- P-1 stage2 completed, and GCD was started on CPU in the background,
- worktodo.txt was correctly updated
- at this point, the program is supposed to just stay there and wait for the GCD to finish. Instead it crashed.

So, there was no result to write (or flush) yet. I did run your example (-pm1 24036583 -B1 220000 -B2 3960000) on my AMD GPU: no crash. Is your crash reproducible? Does it happen every single time? Only on Nvidia? What is the last message in the log before the crash?

I see in the screen-shot with the crash, there is the next task starting. But you can also reproduce it with -pm1 on the command line, where there is not any "next" task, right?
Yes, both the worktodo and command-line cases crash, and both were provided. I didn't duplicate exactly, but explored at what p the issue occurs vs. doesn't.
Quote:
Did stage2 find any successful factor -- maybe when trying a known-factor that should be found in stage2 ?
For p=50001781, k = 43927815717583124 = 2² × 29 × 983 × 94709 × 4067587, so
Quote:
{"exponent":"50001781", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.6-5-g667954b"}, "timestamp":"2019-09-03 21:12:48 UTC", "fft-length":2883584, "B1":95000, "B2":4100000, "factors":["4392938042637898431087689"]}
on AMD RX480 and Windows 7; but on a different system with an NVIDIA GTX 1080 Ti and Windows 7, with somewhat different bounds for the same exponent, it gave this in the log:
Code:
2019-09-04 12:38:12 Note: no config.txt file found
2019-09-04 12:38:12 config: -device 0 -use ORIG_X2 -maxAlloc 10240 
2019-09-04 12:38:12 50001781 FFT 2816K: Width 8x8, Height 256x8, Middle 11; 17.34 bits/word
2019-09-04 12:38:12 using short carry kernels
2019-09-04 12:38:13 OpenCL args "-DEXP=50001781u -DWIDTH=64u -DSMALL_HEIGHT=2048u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a3abda189b648p-3 -DIWEIGHT_STEP=0xa.208a4c5c1d58p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-09-04 12:38:18 

2019-09-04 12:38:18 OpenCL compilation in 5272 ms
2019-09-04 12:38:19 50001781 P-1 starting stage1
2019-09-04 12:38:47 50001781       10000  1.58%; 2804 us/sq; ETA 0d 00:29; 044528bffde3983b
2019-09-04 12:39:15 50001781       20000  3.15%; 2816 us/sq; ETA 0d 00:29; 415fbce27a9baab2
2019-09-04 12:39:44 50001781       30000  4.73%; 2824 us/sq; ETA 0d 00:28; 21f570c38ee14685
2019-09-04 12:40:12 50001781       40000  6.30%; 2835 us/sq; ETA 0d 00:28; 4bdecfdeb836f41e
2019-09-04 12:40:41 50001781       50000  7.88%; 2838 us/sq; ETA 0d 00:28; 95d006c6a337a424
2019-09-04 12:41:09 50001781       60000  9.45%; 2853 us/sq; ETA 0d 00:27; 81c56d44009699ea
2019-09-04 12:41:38 50001781       70000 11.03%; 2859 us/sq; ETA 0d 00:27; be957ab31eaa093a
2019-09-04 12:42:07 50001781       80000 12.60%; 2858 us/sq; ETA 0d 00:26; 453bc86fd585b610
2019-09-04 12:42:35 50001781       90000 14.18%; 2858 us/sq; ETA 0d 00:26; d4859abbb88f685e
2019-09-04 12:43:04 50001781      100000 15.75%; 2866 us/sq; ETA 0d 00:26; 2acc32952e6a8413
2019-09-04 12:43:33 50001781      110000 17.33%; 2870 us/sq; ETA 0d 00:25; 1a465e1dd2daf5e7
2019-09-04 12:44:01 50001781      120000 18.90%; 2870 us/sq; ETA 0d 00:25; ad8304e19075e6d2
2019-09-04 12:44:30 50001781      130000 20.48%; 2869 us/sq; ETA 0d 00:24; 9e83c080c8678aa4
2019-09-04 12:44:59 50001781      140000 22.05%; 2870 us/sq; ETA 0d 00:24; 437a7a3a4b6cbf1c
2019-09-04 12:45:28 50001781      150000 23.63%; 2869 us/sq; ETA 0d 00:23; e0f5c2193df0bafc
2019-09-04 12:45:56 50001781      160000 25.20%; 2870 us/sq; ETA 0d 00:23; fd62439dd6f046a9
2019-09-04 12:46:25 50001781      170000 26.78%; 2870 us/sq; ETA 0d 00:22; 852da47da8286b37
2019-09-04 12:46:54 50001781      180000 28.35%; 2870 us/sq; ETA 0d 00:22; e1cce4b69a7acf51
2019-09-04 12:47:23 50001781      190000 29.93%; 2869 us/sq; ETA 0d 00:21; c3b8528c7dd22b9a
2019-09-04 12:47:51 50001781      200000 31.50%; 2870 us/sq; ETA 0d 00:21; c8635b0ffaea5d2c
2019-09-04 12:48:20 50001781      210000 33.08%; 2870 us/sq; ETA 0d 00:20; d1b17f46043060f7
2019-09-04 12:48:49 50001781      220000 34.65%; 2870 us/sq; ETA 0d 00:20; d901653e490557cf
2019-09-04 12:49:18 50001781      230000 36.23%; 2870 us/sq; ETA 0d 00:19; 7952ed49512db373
2019-09-04 12:49:47 50001781      240000 37.80%; 2872 us/sq; ETA 0d 00:19; 4ce2439936761fbb
2019-09-04 12:50:15 50001781      250000 39.38%; 2870 us/sq; ETA 0d 00:18; 0ce0df8abef61983
2019-09-04 12:50:44 50001781      260000 40.95%; 2867 us/sq; ETA 0d 00:18; 007b9e0a50ad8d5a
2019-09-04 12:51:13 50001781      270000 42.53%; 2874 us/sq; ETA 0d 00:17; 74cfe20ed732099c
2019-09-04 12:51:42 50001781      280000 44.10%; 2870 us/sq; ETA 0d 00:17; 682a467ea133a2cb
2019-09-04 12:52:10 50001781      290000 45.68%; 2870 us/sq; ETA 0d 00:16; 4eef8363ac560c71
2019-09-04 12:52:39 50001781      300000 47.25%; 2869 us/sq; ETA 0d 00:16; 539a790a21e18704
2019-09-04 12:53:08 50001781      310000 48.83%; 2872 us/sq; ETA 0d 00:16; 9f35af2bf71dfaa3
2019-09-04 12:53:37 50001781      320000 50.40%; 2870 us/sq; ETA 0d 00:15; f41e03524a1b3550
2019-09-04 12:54:05 50001781      330000 51.98%; 2870 us/sq; ETA 0d 00:15; 41e0191cbdb2c2e2
2019-09-04 12:54:34 50001781      340000 53.55%; 2869 us/sq; ETA 0d 00:14; fc1568cbec210f4a
2019-09-04 12:55:03 50001781      350000 55.13%; 2869 us/sq; ETA 0d 00:14; d2f37138d933edd7
2019-09-04 12:55:32 50001781      360000 56.70%; 2872 us/sq; ETA 0d 00:13; 65f321d7053de9a7
2019-09-04 12:56:01 50001781      370000 58.28%; 2870 us/sq; ETA 0d 00:13; 4dbbb77f22fb8d41
2019-09-04 12:56:29 50001781      380000 59.85%; 2872 us/sq; ETA 0d 00:12; f3ba9e252fc12f01
2019-09-04 12:56:58 50001781      390000 61.43%; 2872 us/sq; ETA 0d 00:12; 53c045418c81bd8b
2019-09-04 12:57:27 50001781      400000 63.00%; 2869 us/sq; ETA 0d 00:11; e52f23bff88b0a9a
2019-09-04 12:57:56 50001781      410000 64.58%; 2870 us/sq; ETA 0d 00:11; d287c9bbab002582
2019-09-04 12:58:24 50001781      420000 66.15%; 2870 us/sq; ETA 0d 00:10; aea3464b6dfe3a8f
2019-09-04 12:58:53 50001781      430000 67.73%; 2870 us/sq; ETA 0d 00:10; 5d1edc8ceca74739
2019-09-04 12:59:22 50001781      440000 69.30%; 2869 us/sq; ETA 0d 00:09; ea09dc56f88da2d5
2019-09-04 12:59:51 50001781      450000 70.88%; 2870 us/sq; ETA 0d 00:09; b8f65c44da16de4a
2019-09-04 13:00:19 50001781      460000 72.45%; 2870 us/sq; ETA 0d 00:08; 3d736562f926feef
2019-09-04 13:00:48 50001781      470000 74.03%; 2872 us/sq; ETA 0d 00:08; f3d3c48e37f43911
2019-09-04 13:01:17 50001781      480000 75.61%; 2869 us/sq; ETA 0d 00:07; bba070d88e98da6c
2019-09-04 13:01:46 50001781      490000 77.18%; 2872 us/sq; ETA 0d 00:07; cf8f1fc9f1b625f2
2019-09-04 13:02:14 50001781      500000 78.76%; 2870 us/sq; ETA 0d 00:06; d8f71fe45cd6f826
2019-09-04 13:02:43 50001781      510000 80.33%; 2869 us/sq; ETA 0d 00:06; 98741e7599457544
2019-09-04 13:03:12 50001781      520000 81.91%; 2870 us/sq; ETA 0d 00:05; 52abcb428b1911ce
2019-09-04 13:03:41 50001781      530000 83.48%; 2872 us/sq; ETA 0d 00:05; 090130b256f94a74
2019-09-04 13:04:10 50001781      540000 85.06%; 2869 us/sq; ETA 0d 00:05; 126d786c4aac7519
2019-09-04 13:04:38 50001781      550000 86.63%; 2869 us/sq; ETA 0d 00:04; 791bb2e18fe5c2c3
2019-09-04 13:05:07 50001781      560000 88.21%; 2870 us/sq; ETA 0d 00:04; f5d36f0483684169
2019-09-04 13:05:36 50001781      570000 89.78%; 2870 us/sq; ETA 0d 00:03; 810578cfd7c6080e
2019-09-04 13:06:05 50001781      580000 91.36%; 2870 us/sq; ETA 0d 00:03; b40d29da4dcc0e70
2019-09-04 13:06:33 50001781      590000 92.93%; 2872 us/sq; ETA 0d 00:02; 2a9a2764b6e6b770
2019-09-04 13:07:02 50001781      600000 94.51%; 2869 us/sq; ETA 0d 00:02; 0aa6b86e8621d47d
2019-09-04 13:07:31 50001781      610000 96.08%; 2872 us/sq; ETA 0d 00:01; 73d8a20f091dd815
2019-09-04 13:08:00 50001781      620000 97.66%; 2872 us/sq; ETA 0d 00:01; c2adf07b3ea6c52c
2019-09-04 13:08:28 50001781      630000 99.23%; 2869 us/sq; ETA 0d 00:00; 7fc58198458ecd8a
2019-09-04 13:08:43 P-1 stage2 using 291 buffers of 22.0 MB each
2019-09-04 13:08:43 P-1 (B1=440000, B2=8360000, D=30030): primes 525451, expanded 529413, doubles 91707 (left 343777), singles 342037, total 433744 (83%)
2019-09-04 13:08:43 50001781 P-1 stage2: 264 blocks starting at block 15 (433744 selected)
2019-09-04 13:11:06 Round 1 of 9: init 5.40 s; 3.14 ms/mul; 43851 muls
2019-09-04 13:11:06 50001781 P-1 stage1 GCD: no factor
2019-09-04 13:13:28 Round 2 of 9: init 5.01 s; 3.14 ms/mul; 43611 muls
2019-09-04 13:15:51 Round 3 of 9: init 4.99 s; 3.14 ms/mul; 43957 muls
2019-09-04 13:18:13 Round 4 of 9: init 5.02 s; 3.14 ms/mul; 43828 muls
2019-09-04 13:20:36 Round 5 of 9: init 4.99 s; 3.14 ms/mul; 43849 muls
2019-09-04 13:22:59 Round 6 of 9: init 5.04 s; 3.14 ms/mul; 43914 muls
2019-09-04 13:25:21 Round 7 of 9: init 5.01 s; 3.14 ms/mul; 43766 muls
2019-09-04 13:27:43 Round 8 of 9: init 5.09 s; 3.14 ms/mul; 43653 muls
2019-09-04 13:30:06 Round 9 of 9: init 5.07 s; 3.14 ms/mul; 43909 muls
2019-09-04 13:30:07 100002499 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-09-04 13:30:07 using short carry kernels
2019-09-04 14:13:12 Note: no config.txt file found
That 43-minute gap at the end of the log is where the crash occurred.
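As a sanity check, the k factorization quoted above for p=50001781 is internally consistent, and the reported factor really divides the Mersenne number; a few lines of Python confirm it (note the largest prime in k, 4067587, is below the B2=4100000 of the successful AMD run, consistent with a stage-2 find):

```python
p = 50001781
f = 4392938042637898431087689   # factor from the quoted result record

k = (f - 1) // (2 * p)
assert f == 2 * k * p + 1        # factors of 2^p-1 have the form 2kp+1
# k's prime factorization, as quoted above (2^2, not "22"):
assert k == 2**2 * 29 * 983 * 94709 * 4067587
# and f really divides 2^p - 1:
assert pow(2, p, f) == 1
print(k)  # 43927815717583124
```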
Quote:

What is the "additional information" on the windows crash window?
Re "additional information" fields 1 through 4: the best I could find with a bit of web searching was the not-encouraging https://stackoverflow.com/questions/...h-message-mean

Quote:

With command line "-pm1", if there is a crash, is there a delay between the last line written to output, and the crash? (i.e. does the crash occur when the GCD (on CPU) comes back?)
A rerun of the rather faster 11M case from a .bat file, with the same parameters as before and Task Manager CPU usage observed for the process, gave a successful completion just now. I didn't change anything from the 11M run that failed yesterday.
Code:
Thu 09/05/2019 10:15:18.53 C:\Users\ken\Documents\gpuowl-win-v6.7-2-g7cf95d0>gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 11000159
 -B1 100000 -B2 1500000
2019-09-05 10:15:18 gpuowl v6.7-2-g7cf95d0
2019-09-05 10:15:18 Note: no config.txt file found
2019-09-05 10:15:18 config: -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 11000159 -B1 100000 -B2 1500000
2019-09-05 10:15:18 11000159 FFT 576K: Width 8x8, Height 64x8, Middle 9; 18.65 bits/word
2019-09-05 10:15:18 using short carry kernels
2019-09-05 10:15:19 OpenCL args "-DEXP=11000159u -DWIDTH=64u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0xa.327adcb358cc8p-3 -DIWEIGHT_ST
EP=0xc.8d6f66d928aap-4 -DWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-3 -DIWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-4 -DORIG_X2=1  -I. -cl-fast-relaxed-math -
cl-std=CL2.0"
2019-09-05 10:15:19

2019-09-05 10:15:19 OpenCL compilation in 124 ms
2019-09-05 10:15:19 11000159 P-1 starting stage1
2019-09-05 10:15:26 11000159       10000  6.93%;  684 us/sq; ETA 0d 00:02; 857dc3f55c49f7f9
2019-09-05 10:15:33 11000159       20000 13.85%;  678 us/sq; ETA 0d 00:01; 8c796ecf1913c37c
2019-09-05 10:15:39 11000159       30000 20.78%;  679 us/sq; ETA 0d 00:01; f57a3dec737f6b42
2019-09-05 10:15:46 11000159       40000 27.71%;  678 us/sq; ETA 0d 00:01; d1c262f051139c4c
2019-09-05 10:15:53 11000159       50000 34.63%;  686 us/sq; ETA 0d 00:01; 7685ea05184d048c
2019-09-05 10:16:00 11000159       60000 41.56%;  684 us/sq; ETA 0d 00:01; 567e68b51cd96072
2019-09-05 10:16:07 11000159       70000 48.48%;  684 us/sq; ETA 0d 00:01; 77e00f0b94aabab3
2019-09-05 10:16:14 11000159       80000 55.41%;  684 us/sq; ETA 0d 00:01; 6c257069b38766c0
2019-09-05 10:16:21 11000159       90000 62.34%;  684 us/sq; ETA 0d 00:01; b4861281d5a4562a
2019-09-05 10:16:27 11000159      100000 69.26%;  684 us/sq; ETA 0d 00:01; 27ab55b3171901c7
2019-09-05 10:16:34 11000159      110000 76.19%;  684 us/sq; ETA 0d 00:00; 9a4e427a3913db26
2019-09-05 10:16:41 11000159      120000 83.12%;  684 us/sq; ETA 0d 00:00; e0cb66fc9cf30198
2019-09-05 10:16:48 11000159      130000 90.04%;  684 us/sq; ETA 0d 00:00; 7aac556735c65680
2019-09-05 10:16:55 11000159      140000 96.97%;  684 us/sq; ETA 0d 00:00; a366518040c11593
2019-09-05 10:16:58 P-1 stage2 using 1646 buffers of 4.5 MB each
2019-09-05 10:16:58 P-1 (B1=100000, B2=1500000, D=30030): primes 104563, expanded 104563, doubles 19996 (left 64571), singles 64571, total 8
4567 (81%)
2019-09-05 10:16:58 11000159 P-1 stage2: 48 blocks starting at block 3 (84567 selected)
2019-09-05 10:17:40 Round 1 of 1: init 8.28 s; 0.70 ms/mul; 47901 muls
2019-09-05 10:17:40 11000159 P-1 stage1 GCD: no factor
2019-09-05 10:17:41 11000159 P-1 final GCD: no factor
2019-09-05 10:17:41 {"exponent":"11000159", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.7-2-g7cf95d0"}, "time
stamp":"2019-09-05 15:17:41 UTC", "fft-length":589824, "B1":100000, "B2":1500000}
 2019-09-05 10:17:41 Bye
So I tried
Code:
gpuowl-win -device 0 -use ORIG_X2 -maxAlloc 10240 -pm1 24000577 -B1 220000 -B2 3960000
and saw no notable delay between the round 18 of 18 console line output and the appcrash; certainly much shorter than the stage 1 GCD took. The CPU usage profile on the Windows 7 GTX 1080 Ti run averaged a bit more than one core during stage 1, rising to two cores during the stage 2 / stage 1 GCD overlap, and returning to about one core for the rest of stage 2. This is on an 8-core hyperthreaded system, dual Xeon E5520; approx. 12:54 of CPU time was shown in Task Manager up to the point of the crash. (Unfortunately the process entry disappears from Task Manager when the crash occurs.)
Note: in a duplicate run of 24000577 to the same bounds on Win7 x64 and an AMD RX550, running now, I do not see the equivalent CPU overhead; 15 minutes into an estimated 1-hour stage 1, it's showing 0% CPU usage and 8 seconds accumulated on that process. That is the same system that contains the RX480, a dual Xeon E5645.
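As an aside, the "N buffers of X MB each" lines in both logs are consistent with one stage-2 buffer holding one FFT-length residue in 8-byte doubles; this is an inference from the numbers, not a statement about gpuowl internals:

```python
def buffer_mib(fft_length_words):
    """One stage-2 buffer = one residue of fft_length doubles (8 bytes),
    assuming double-precision storage (an inference, not gpuowl docs)."""
    return fft_length_words * 8 / 2**20

# 50M run: FFT 2816K -> log said "291 buffers of 22.0 MB each"
print(round(buffer_mib(2816 * 1024), 1))   # 22.0
# 11M run: FFT 576K  -> log said "1646 buffers of 4.5 MB each"
print(round(buffer_mib(576 * 1024), 1))    # 4.5
```

Under that assumption, -maxAlloc 10240 (MB) would cap the 50M run at roughly 10240 / 22 ≈ 465 buffers, so the 291 actually used fit comfortably.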
Quote:
Could you ever run stage2 to completion on windows without a crash? (i.e. maybe the cause is Windows, not Nvidia?)
Yes; see the known-prime case provided earlier, p=6972593, and now the 11M case, on Windows and the NVIDIA GTX 1080 Ti. Also larger exponents I've run on Win7 and AMD, including p ~ 200M.

Thanks for the quick turnaround to post 1324, one hour after your questions were posed. I'll give it a try.

Last fiddled with by kriesel on 2019-09-05 at 17:14
Old 2019-09-05, 18:55   #1326
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12453₈ Posts

Quote:
Originally Posted by preda View Post
Ken: crash at end of stage2 might be fixed now, please retry.
Better on v6.7-3-745faae; running multiple test cases from worktodo now on the GTX 1080 Ti.
It's hit an odd case: when a worktodo line is ignored, it gets confused about the exponent, reporting a result for an exponent it completed but substituting the 10M exponent value of one it's still running, at the console, in the log, and in results.txt, in a record with the fft length, B1, and B2 of the 24M exponent (a Mersenne prime used as a high-confidence no-factor test case):
Code:
2019-09-05 13:25:02 24036583      260000 81.88%; 1366 us/sq; ETA 0d 00:01; 8ce7bc8b431645fa
2019-09-05 13:25:16 24036583      270000 85.03%; 1366 us/sq; ETA 0d 00:01; 204263c8f6c0b028
2019-09-05 13:25:30 24036583      280000 88.18%; 1368 us/sq; ETA 0d 00:01; 70804d872c97c737
2019-09-05 13:25:43 24036583      290000 91.32%; 1374 us/sq; ETA 0d 00:01; 1ad360e96af9200b
2019-09-05 13:25:57 24036583      300000 94.47%; 1373 us/sq; ETA 0d 00:00; 44e5901c72dbac9f
2019-09-05 13:26:11 24036583      310000 97.62%; 1371 us/sq; ETA 0d 00:00; 7f422aba9a518aa5
2019-09-05 13:26:21 P-1 stage2 using 156 buffers of 10.0 MB each
2019-09-05 13:26:22 P-1 (B1=220000, B2=3960000, D=30030): primes 260946, expanded 262000, doubles 47491 (left 166492), singles 165964, total
 213455 (82%)
2019-09-05 13:26:22 24036583 P-1 stage2: 126 blocks starting at block 7 (213455 selected)
2019-09-05 13:26:40 Round 1 of 18: init 1.45 s; 1.49 ms/mul; 11487 muls
2019-09-05 13:26:59 Round 2 of 18: init 1.28 s; 1.49 ms/mul; 11549 muls
2019-09-05 13:26:59 24036583 P-1 stage1 GCD: no factor
2019-09-05 13:27:17 Round 3 of 18: init 1.31 s; 1.49 ms/mul; 11388 muls
2019-09-05 13:27:35 Round 4 of 18: init 1.31 s; 1.49 ms/mul; 11496 muls
2019-09-05 13:27:54 Round 5 of 18: init 1.31 s; 1.49 ms/mul; 11508 muls
2019-09-05 13:28:12 Round 6 of 18: init 1.31 s; 1.49 ms/mul; 11644 muls
2019-09-05 13:28:31 Round 7 of 18: init 1.33 s; 1.49 ms/mul; 11554 muls
2019-09-05 13:28:49 Round 8 of 18: init 1.31 s; 1.49 ms/mul; 11629 muls
2019-09-05 13:29:08 Round 9 of 18: init 1.31 s; 1.49 ms/mul; 11486 muls
2019-09-05 13:29:26 Round 10 of 18: init 1.33 s; 1.49 ms/mul; 11571 muls
2019-09-05 13:29:45 Round 11 of 18: init 1.29 s; 1.49 ms/mul; 11589 muls
2019-09-05 13:30:04 Round 12 of 18: init 1.33 s; 1.49 ms/mul; 11554 muls
2019-09-05 13:30:22 Round 13 of 18: init 1.29 s; 1.49 ms/mul; 11597 muls
2019-09-05 13:30:41 Round 14 of 18: init 1.33 s; 1.49 ms/mul; 11544 muls
2019-09-05 13:30:59 Round 15 of 18: init 1.33 s; 1.49 ms/mul; 11607 muls
2019-09-05 13:31:18 Round 16 of 18: init 1.31 s; 1.49 ms/mul; 11527 muls
2019-09-05 13:31:36 Round 17 of 18: init 1.34 s; 1.49 ms/mul; 11631 muls
2019-09-05 13:31:55 Round 18 of 18: init 1.33 s; 1.49 ms/mul; 11716 muls
2019-09-05 13:31:55 worktodo.txt: ":B1=460000,B2=8740000;PFactor=0,1,2,51558151,-1,73,2" ignored
2019-09-05 13:31:55 100000081 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.34 bits/word
2019-09-05 13:31:56 using short carry kernels
2019-09-05 13:31:56 OpenCL args "-DEXP=100000081u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xc.a5067a8c5cb2p-3 -DIWEIGHT
_STEP=0xa.1f74af2719fap-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DORIG_X2=1  -I. -cl-fast-relaxed-mat
h -cl-std=CL2.0"
2019-09-05 13:32:00

2019-09-05 13:32:00 OpenCL compilation in 3650 ms
2019-09-05 13:32:01 100000081 P-1 starting stage1
2019-09-05 13:32:21 100000081 P-1 final GCD: no factor
2019-09-05 13:32:21 {"exponent":"100000081", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.7-3-g745faae"}, "tim
estamp":"2019-09-05 18:32:21 UTC", "aid":"0", "fft-length":1310720, "B1":780000, "B2":17380000}
2019-09-05 13:32:39 100000081       10000  0.89%; 3804 us/sq; ETA 0d 01:11; 3201fb08e5351d29
 2019-09-05 13:33:17 100000081       20000  1.78%; 3799 us/sq; ETA 0d 01:10; 195100e088564810
FYI on v6.7-2-7cf95d0 I was able to duplicate the stage2 crash on a 4GB AMD RX480 with
Code:
gpuowl-win -device 1 -user kriesel -cpu condorella/rx550 -use ORIG_X2 -pm1 24000577 -B1 220000 -B2 3960000
I also note that P-1 stage 2 lacks ETA and res64 values in the console output and the log. One of the ways to tell in CUDAPm1 that something has gone awry in stage 2 is that the res64 values are bad, repeating, or cycling among very few values.

Last fiddled with by kriesel on 2019-09-05 at 19:26
Old 2019-09-05, 20:04   #1327
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by kriesel View Post
Better on v6.7-3-745faae; running multiple test cases from worktodo now on GTX1080Ti.
It's hit an odd case: when a worktodo line is ignored, it gets confused about the exponent, reporting a result for an exponent it completed but substituting the 10M exponent value of one it's still running, at the console, in the log, and in results.txt, in a record with the fft length, B1, and B2 of the 24M exponent (a Mersenne prime used as a high-confidence no-factor test case)
Actually, the 24M exponent NF result appeared with the 24M fft length but the 10M exponent, B1, and B2 values. And it appears to happen whether there was an ignored worktodo line or a normal one.
Stage 1 of the current line runs in parallel with the stage 2 GCD of the previous line, and can have a different exponent, B1, B2, and fft length. The result record, console, and log output get a mix of variables when the stage 2 GCD completes; what is output contains the unfortunate blend:

current exponent, previous fft length, previous factor & factor-state, current B1, current B2.

A temporary workaround, I think, is to use command-line options and not worktodo.txt.
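The blend described above is the classic symptom of a pending result aliasing live task state that the next worktodo line then mutates; the usual fix is to snapshot the parameters when the GCD is launched. A minimal sketch of the bug and the fix (illustrative field names and values, not gpuowl's code):

```python
import copy
from dataclasses import dataclass

@dataclass
class Task:              # illustrative, not gpuowl's actual state
    exponent: int
    B1: int
    B2: int

current = Task(24036583, 220000, 3960000)

pending_buggy = current                 # aliases the live task state
pending_fixed = copy.deepcopy(current)  # snapshot taken at GCD launch

# The next worktodo line then overwrites the live state:
current.exponent, current.B1, current.B2 = 100000081, 780000, 17380000

print(pending_buggy.exponent)  # 100000081, the mixed-up record
print(pending_fixed.exponent)  # 24036583, the correct one
```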

Last fiddled with by kriesel on 2019-09-05 at 21:00
Old 2019-09-05, 20:29   #1328
Uncwilly
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts

23132₈ Posts

Quote:
Originally Posted by kriesel View Post
Perhaps he and uncwilly could put together a thread for cleaning that up.
If Aaron will provide the list, I will tend it over at the DC & TC thread, or another in the Marin's Mersenne-aries sub-forum.
Old 2019-09-05, 21:24   #1329
preda
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by kriesel View Post
it gets confused about exponent, reporting a result for an exponent it completed but substituting the 10M exponent value of one it's still running
Yes, thanks, hopefully fixed now.
Old 2019-09-06, 14:32   #1330
AlsXZ
 
Oct 2009
Ukraine

32 Posts

Hello! I just started playing with gpuOwL and have some questions. As I understand it, this software is optimized for AMD GPUs. I'm trying to run it on an Nvidia GeForce GTX 1050 Ti on Linux. It works only with -use NO_ASM (without it I get errors), and I got these results for a 92M exponent: ETA 16.5 days and 15590 us/sq. What do you think: is that an OK speed for this GPU, or is something going wrong?
Old 2019-09-06, 14:36   #1331
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,419 Posts

Quote:
Originally Posted by preda View Post
Yes, thanks, hopefully fixed now.
Testing v6.7-4-g278407a P-1; it looks promising so far.
Sure would be nice if it also had these features (without breaking anything for PRP of course):
  • save files for P-1 and the ability to resume from them. Higher exponents or slower GPUs can take days or weeks. Probably save under different extensions than PRP uses, and per stage.
  • By default, set P-1 bounds automatically, fitted to gputo72 bounds based on the exponent, rather than the current fixed B1 and B2 defaults. See https://www.mersenneforum.org/showpo...postcount=1306 The pdf attachment at https://www.mersenneforum.org/showpo...7&postcount=23 gives simple power fits for B1 and B2. Rounding (probably up) would be good. Exponent-scaled bounds seem straightforward to implement and would make running P-1 more automatic, with less user involvement.
  • less CPU usage on NVIDIA in P-1. See near the end of https://www.mersenneforum.org/showpo...postcount=1325 for a description of considerably higher CPU usage on NVIDIA than on AMD in gpuowl P-1 (an additional core, continuously). I have no idea why it does that.
  • If there's a constraint on the feasible stage 1 or stage 2 exponent, due to program limits, GPU memory, or whatever, log what the issue is to the console and log file, skip the worktodo item that exceeds the limit, and try to continue with the next worktodo entry. It would in some cases be possible to run stage 1 but not stage 2.
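The exponent-scaled-bounds suggestion could be as small as a power fit plus round-up. A sketch with placeholder coefficients (a1, e1, a2, e2 below are made up for shape only; the real fitted values are in the pdf linked above):

```python
import math

def suggested_bounds(p, a1=1.0e-2, e1=1.05, a2=0.18, e2=1.05):
    """Exponent-scaled P-1 bounds of the form B = a * p**e, rounded up.

    The default coefficients are placeholders, NOT the fitted values
    from the referenced pdf; they only illustrate the shape of the idea.
    """
    round_to = 10_000
    B1 = math.ceil(a1 * p**e1 / round_to) * round_to
    B2 = math.ceil(a2 * p**e2 / round_to) * round_to
    return B1, B2

print(suggested_bounds(50_001_781))
```

The point is only that defaults would then grow with the exponent instead of staying fixed; picking the fit is the easy part, and rounding up keeps the bounds tidy.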

Last fiddled with by kriesel on 2019-09-06 at 14:37

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.