mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

kriesel 2017-08-30 13:27

CUDALucas infinite loop; anyone else seen similar?
 
Has anyone else seen this? I've run gpu-years by now, and don't recall seeing another instance of this. (That doesn't necessarily mean it hasn't happened.)

| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 28 07:04:43 | M75556297 39500000 0xfd71092a79b5e89b | 4096K 0.24219 4.9563 247.81s | 2:01:40:09 52.27% |
| Aug 28 07:08:50 | M75556297 39550000 0x0a21c1ab52fdbbf3 | 4096K 0.23438 4.9660 248.30s | 2:01:36:01 52.34% |
batch wrapper reports (re)launch at Mon 08/28/2017 8:54:34.63

CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 12:59:32
...

Using threads: square 256, splice 128.

Continuing M75556297 @ iteration 39500001 with fft length 4096K, 52.28% done

(no progress in two days! GPU-Z monitoring looked normal; 100% busy)

SIGINT caught, writing checkpoint. Estimated time spent so far: 100:33:54

batch wrapper reports CUDALucas2.06beta-CUDA6.0-Windows-x64 -d 0(re)launch at Wed 08/30/2017 7:07:48.84

CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 12:59:32
...

You may experience a small delay on 1st startup to due to Just-in-Time Compilation

Using threads: square 256, splice 128.

Continuing M75556297 @ iteration 39514211 with fft length 4096K, 52.30% done

| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 30 07:10:43 | M75556297 39550000 0x0a21c1ab52fdbbf3 | 4096K 0.22656 4.8179 172.43s | 3:19:35:53 52.34% |
SIGINT caught, writing checkpoint. Estimated time spent so far: 100:37:49

batch wrapper reports CUDALucas2.06beta-CUDA6.0-Windows-x64 -d 0(re)launch at Wed 08/30/2017 7:12:06.45

CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 12:59:32
...

You may experience a small delay on 1st startup to due to Just-in-Time Compilation

Using threads: square 256, splice 128.

Continuing M75556297 @ iteration 39562502 with fft length 4096K, 52.36% done

| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 30 07:15:12 | M75556297 39600000 0x5518207e34330659 | 4096K 0.23438 4.9058 183.96s | 3:19:25:02 52.41% |
| Aug 30 07:19:21 | M75556297 39650000 0x2bad9f5347c15e3f | 4096K 0.25391 4.9974 249.87s | 3:19:14:17 52.47% |
| Aug 30 07:23:30 | M75556297 39700000 0x409714b1fb6de406 | 4096K 0.24219 4.9680 248.40s | 3:19:03:30 52.54% |

From the directory listing of the savefile subdirectory:
08/28/2017 06:06 AM 9,444,580 s75556297.38800000.2589cc1517d6bf02.cls
08/28/2017 06:15 AM 9,444,580 s75556297.38900000.77d2868b541f949b.cls
08/28/2017 06:23 AM 9,444,580 s75556297.39000000.369fe67ddaa37664.cls
08/28/2017 06:31 AM 9,444,580 s75556297.39100000.c51042a328070f8e.cls
08/28/2017 06:39 AM 9,444,580 s75556297.39200000.080e6383149b4ee6.cls
08/28/2017 06:48 AM 9,444,580 s75556297.39300000.e29f1559f91747be.cls
08/28/2017 06:56 AM 9,444,580 s75556297.39400000.013a1dbe94811c4f.cls
08/28/2017 07:04 AM 9,444,580 s75556297.39500000.fd71092a79b5e89b.cls <---2 day gap in progress while gpu works at 100% per GPU-Z
08/30/2017 07:03 AM 9,444,580 s75556297.39514210.d3573505f18befcc.cls <---first ctl-c & restart 8/30
08/30/2017 07:11 AM 9,444,580 s75556297.39562501.d81f1aa9f965799c.cls <---second ctl-c & restart 8/30
08/30/2017 07:15 AM 9,444,580 s75556297.39600000.5518207e34330659.cls <---progress has resumed
08/30/2017 07:23 AM 9,444,580 s75556297.39700000.409714b1fb6de406.cls

(I'm increasing the savefile interval; the above is too frequent.)
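A stall like this one could be spotted automatically from the savefile timestamps. The following is only a minimal sketch: it assumes the savefile names follow the `s<exponent>.<iteration>.<residue>.cls` pattern seen in the listing, that the listing comes from a Windows `dir` command, and that a one-hour gap is a sensible alarm threshold. The function names `parse_listing` and `find_stalls` are hypothetical and not part of CUDALucas.

```python
import re
from datetime import datetime

# Assumed name pattern, taken from the listing above:
# s<exponent>.<iteration>.<16 hex digit residue>.cls
SAVEFILE_RE = re.compile(r"s(\d+)\.(\d+)\.([0-9a-f]{16})\.cls$")
# Assumed Windows `dir` line layout: date, time, AM/PM, size, file name.
LISTING_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2} [AP]M)\s+[\d,]+\s+(\S+)")

SAMPLE_LISTING = """\
08/28/2017 06:56 AM 9,444,580 s75556297.39400000.013a1dbe94811c4f.cls
08/28/2017 07:04 AM 9,444,580 s75556297.39500000.fd71092a79b5e89b.cls
08/30/2017 07:03 AM 9,444,580 s75556297.39514210.d3573505f18befcc.cls
08/30/2017 07:11 AM 9,444,580 s75556297.39562501.d81f1aa9f965799c.cls
08/30/2017 07:15 AM 9,444,580 s75556297.39600000.5518207e34330659.cls
"""


def parse_listing(text):
    """Return sorted (timestamp, iteration) pairs for every savefile line."""
    entries = []
    for line in text.splitlines():
        listed = LISTING_RE.match(line.strip())
        if not listed:
            continue
        name = SAVEFILE_RE.match(listed.group(2))
        if not name:
            continue
        when = datetime.strptime(listed.group(1), "%m/%d/%Y %I:%M %p")
        entries.append((when, int(name.group(2))))
    return sorted(entries)


def find_stalls(entries, max_gap_hours=1.0):
    """Return (from_iter, to_iter, gap_hours) for gaps above the threshold."""
    stalls = []
    for (t0, it0), (t1, it1) in zip(entries, entries[1:]):
        gap_hours = (t1 - t0).total_seconds() / 3600
        if gap_hours > max_gap_hours:
            stalls.append((it0, it1, gap_hours))
    return stalls


if __name__ == "__main__":
    for it0, it1, hours in find_stalls(parse_listing(SAMPLE_LISTING)):
        print(f"stall between iteration {it0} and {it1}: {hours:.1f} h")
```

Run against the five sample lines, this flags only the gap between the 39500000 and 39514210 savefiles. A wrapper script could poll the directory this way and restart the program, though whether a restart is the right reaction is a separate question.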

storm5510 2017-08-30 16:11

[QUOTE=kriesel]| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Aug 28 07:04:43 | M75556297 39500000 0xfd71092a79b5e89b | 4096K [B]0.24219[/B] 4.9563 247.81s | 2:01:40:09 52.27% |
| Aug 28 07:08:50 | M75556297 39550000 0x0a21c1ab52fdbbf3 | 4096K [B]0.23438[/B] 4.9660 248.30s | 2:01:36:01 52.34% |[/QUOTE]

If the numbers in bold above are the error rates then you are right on the edge. In the past, I have solved this by using a larger FFT. Try 8192K and see what happens.

kriesel 2017-08-30 19:34

[QUOTE=storm5510;466688]If the numbers in bold above are the error rates then you are right on the edge. In the past, I have solved this by using a larger FFT. Try 8192K and see what happens.[/QUOTE]

Thanks, that's an interesting idea.

I'd expect an excessive round-off error to produce console output and, under program control, trigger a switch to the next fft length up, which would be 4608K. Otherwise the output would look something like the following:

| Jul 08 09:34:57 | M43162057 34300000 0x0f7a084c7086eb32 | 2304K 0.28125 2.9831 149.15s | 7:24:04 79.46% |
| Jul 08 09:37:27 | M43162057 34350000 0xce13ce4d53e1aed8 | 2304K 0.29688 2.9920 149.60s | 7:21:33 79.58% |
Round off error at iteration = 34372600, err = 0.35938 > 0.35, fft = 2304K.
Restarting from last checkpoint to see if the error is repeatable.

Using threads: square 256, splice 128.

Continuing M43162057 @ iteration 34350001 with fft length 2304K, 79.58% done

| Jul 08 09:41:03 | M43162057 34400000 0x3c181e61808c70ec | 2304K 0.31250 2.9630 148.14s | 7:19:02 79.69% |
Looks like the error went away, continuing.

| Jul 08 09:43:31 | M43162057 34450000 0x6a69bb9057f0c09c | 2304K 0.28125 2.9751 148.75s | 7:16:32 79.81% |
| Jul 08 09:46:00 | M43162057 34500000 0x487d782e6829d566 | 2304K 0.27686 2.9697 148.48s | 7:14:01 79.93% |

(And, that one with larger round-off error went ok, since it matched one computed by someone else almost 9 years earlier)
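To compare the error column against the limit printed in the round-off message (0.35 in that run), the status lines can be parsed directly. This is a rough sketch that assumes the column layout shown in the console output quoted in this thread; the names `parse_status` and `error_margin` are made up for illustration, and the limit may well differ in other configurations.

```python
import re

# Assumed layout, copied from the console output in this thread:
# | Mon DD HH:MM:SS | M<exp> <iter> 0x<residue> | <fft>K <err> <ms/it> <time>s | ETA pct |
STATUS_RE = re.compile(
    r"\|\s*\w{3} \d{2} \d{2}:\d{2}:\d{2}\s*\|"
    r"\s*M(\d+) (\d+) 0x([0-9a-f]{16})\s*\|"
    r"\s*(\d+)K ([\d.]+) ([\d.]+) ([\d.]+)s\s*\|"
)

SAMPLE_LINE = (
    "| Aug 28 07:04:43 | M75556297 39500000 0xfd71092a79b5e89b "
    "| 4096K 0.24219 4.9563 247.81s | 2:01:40:09 52.27% |"
)


def parse_status(line):
    """Return the fields of one status line as a dict, or None if no match."""
    m = STATUS_RE.search(line)
    if not m:
        return None
    return {
        "exponent": int(m.group(1)),
        "iteration": int(m.group(2)),
        "residue": m.group(3),
        "fft_k": int(m.group(4)),
        "error": float(m.group(5)),
        "ms_per_iter": float(m.group(6)),
        "seconds": float(m.group(7)),
    }


def error_margin(status, limit=0.35):
    """Distance between the reported round-off error and the assumed limit."""
    return limit - status["error"]


if __name__ == "__main__":
    status = parse_status(SAMPLE_LINE)
    print(f"error {status['error']}, margin {error_margin(status):.5f}")
```

For the Aug 28 line the margin to 0.35 is a little over 0.1, compared with the Jul 08 run, where the error actually crossed the limit.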

The relevant part of the fft file for the recent event is:
4096 75846319 4.4721
4608 85111207 5.3082
5184 95507747 5.9772
5292 97454309 6.7933
5600 103000823 6.9233
5832 107174381 6.9765
6144 112781477 7.4773
6272 115080019 7.5848
6480 118813021 7.9218
6561 120266023 8.0761
6912 126558077 8.0778
7168 131142761 8.4497
7200 131715607 8.8208
8192 149447533 9.2877
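For reference, the lookup described above can be sketched in a few lines. This assumes the three columns of the fft file are fft length in K, the largest exponent that length is used for, and the benchmarked ms per iteration; that reading of the columns is an assumption here, and the helper names are hypothetical rather than CUDALucas internals.

```python
FFT_FILE_EXCERPT = """\
4096 75846319 4.4721
4608 85111207 5.3082
5184 95507747 5.9772
5292 97454309 6.7933
5600 103000823 6.9233
5832 107174381 6.9765
6144 112781477 7.4773
6272 115080019 7.5848
6480 118813021 7.9218
6561 120266023 8.0761
6912 126558077 8.0778
7168 131142761 8.4497
7200 131715607 8.8208
8192 149447533 9.2877
"""


def parse_fft_file(text):
    """Return sorted (fft_k, max_exponent, ms_per_iter) rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), int(parts[1]), float(parts[2])))
    return sorted(rows)


def smallest_fft_for(rows, exponent):
    """Smallest listed fft length whose exponent limit covers `exponent`."""
    for fft_k, max_exponent, _ in rows:
        if exponent <= max_exponent:
            return fft_k
    return None


def next_fft_up(rows, current_k):
    """Next listed fft length above `current_k`, or None at the top."""
    for fft_k, _, _ in rows:
        if fft_k > current_k:
            return fft_k
    return None


if __name__ == "__main__":
    rows = parse_fft_file(FFT_FILE_EXCERPT)
    print(smallest_fft_for(rows, 75556297), next_fft_up(rows, 4096))
```

With this excerpt, M75556297 lands on 4096K and the next length up is 4608K. Going straight to 8192K would roughly double the listed time per iteration (9.2877 ms against 4.4721 ms).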

storm5510 2017-08-31 04:34

[QUOTE=kriesel]...And, that one with larger round-off error went ok, since it matched one computed by someone else almost 9 years earlier....[/QUOTE]

My fft file for a GTX 480 is somewhat different. I have some sizes you do not have, and vice-versa. Also, I cannot run threads square splice at the default values. The most I can go is 64 and 64. Beyond that, I get lots of resets. I'm told it is a "Compute 2.0" issue. Either way, I plan on replacing it sometime soon.

kriesel 2017-08-31 14:38

[QUOTE=storm5510;466725]My fft file for a GTX 480 is somewhat different. I have some sizes you do not have, and vice-versa. Also, I cannot run threads square splice at the default values. The most I can go is 64 and 64. Beyond that, I get lots of resets. I'm told it is a "Compute 2.0" issue. Either way, I plan on replacing it sometime soon.[/QUOTE]

Yes; the same version of CUDALucas will produce a different fft file when run on different gpu models.

The same gpu model will produce different fft files when run with different CUDALucas executables, linked for different CUDA levels or 32 vs. 64 bit cpu code.

In my experience, performance on the exact same piece of hardware varies by about 10% depending on CUDA level and 32- vs. 64-bit build. Which level is best for one card model does not seem to predict which will be best for another. Which build is fastest also varies with fft length! There's some statistical variation mixed in too, to keep it interesting: benchmarking twice does not produce quite the same results.

There are also some CUDA levels that produce bad results; 4.0 and 4.1 are to be avoided. The max fft length can cause benchmarking to go badly wrong.
Some card & software combinations should not be run with 1024 threads even in benchmarking. These distinctions are not present in the current software documentation.

Specific to your 480, not being able to run above 64 threads is odd. On one of my 480s, fft file excerpt:
4096 75846319 7.2006
4320 79902611 7.9204
4374 80879779 8.5764
4608 85111207 8.7498
4704 86845813 9.0271
5120 94353877 9.3045
5184 95507747 9.7606
5600 103000823 10.6475
6048 111056879 11.0352
6144 112781477 11.2654
6480 118813021 12.4754
7168 131142761 12.9233
7200 131715607 13.6493
7840 143161159 14.9880
8064 147162241 15.7425
8192 149447533 16.2815

From a threadbench run with 1024 squaring threads prohibited, the results are ok:
fft = 4608K, ave time = 8.7540 ms, square: 32, splice: 128
fft = 4608K, ave time = 8.7602 ms, square: 64, splice: 128
fft = 4608K, ave time = 8.8904 ms, square: 128, splice: 128
fft = 4608K, ave time = 8.9752 ms, square: 256, splice: 128
fft = 4608K, ave time = 9.0668 ms, square: 512, splice: 128
fft = 4608K, ave time = 8.7557 ms, square: 32, splice: 32
fft = 4608K, ave time = 8.7544 ms, square: 32, splice: 64
fft = 4608K, ave time = 8.7549 ms, square: 32, splice: 128
fft = 4608K, ave time = 8.7565 ms, square: 32, splice: 256
fft = 4608K, ave time = 8.7575 ms, square: 32, splice: 512
fft = 4608K, ave time = 8.7593 ms, square: 32, splice: 1024
fft = 4608K, min time = 8.7544 ms, square: 32, splice: 64

If 1024 squaring threads are allowed, implausibly short timings are obtained and the problem thread count is selected:

fft = 4608K, ave time = 1.3619 ms, square: 32, splice: 128
fft = 4608K, ave time = 1.3639 ms, square: 64, splice: 128
fft = 4608K, ave time = 1.4984 ms, square: 128, splice: 128
fft = 4608K, ave time = 1.6014 ms, square: 256, splice: 128
fft = 4608K, ave time = 1.6759 ms, square: 512, splice: 128<-up to here ok
fft = 4608K, ave time = 0.6920 ms, square: 1024, splice: 128<-all 1024-square-threads timings in this set on cc 2.x are too short. probably a failed call somewhere.
fft = 4608K, ave time = 0.6934 ms, square: 1024, splice: 32
fft = 4608K, ave time = 0.6922 ms, square: 1024, splice: 64
fft = 4608K, ave time = 0.6921 ms, square: 1024, splice: 128
fft = 4608K, ave time = 0.6923 ms, square: 1024, splice: 256
fft = 4608K, ave time = 0.6920 ms, square: 1024, splice: 512
fft = 4608K, ave time = 0.6953 ms, square: 1024, splice: 1024
fft = 4608K, min time = 0.6920 ms, square: 1024, splice: 512 <-- this sort of thing will produce all 0xff..fd residues if I recall correctly.

A runtime check in the code for drastic speed differences would stop users who don't catch this issue from adopting the bad thread counts and then quickly producing bad residues.
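Such a check might look roughly like the sketch below, written as an offline filter over threadbench output rather than as a patch to the program. The rule it uses (flag any timing below 60% of the slowest one in the set) and the names `parse_threadbench` and `suspicious_timings` are assumptions chosen for illustration; the slowest timing serves as the baseline because the failure seen here makes runs faster, not slower.

```python
import re

# Matches threadbench lines as printed above; trailing annotations are ignored.
BENCH_RE = re.compile(
    r"fft = (\d+)K, ave time = ([\d.]+) ms, square: (\d+), splice: (\d+)"
)

OK_RUN = """\
fft = 4608K, ave time = 8.7540 ms, square: 32, splice: 128
fft = 4608K, ave time = 8.7602 ms, square: 64, splice: 128
fft = 4608K, ave time = 8.8904 ms, square: 128, splice: 128
fft = 4608K, ave time = 8.9752 ms, square: 256, splice: 128
fft = 4608K, ave time = 9.0668 ms, square: 512, splice: 128
"""

BAD_RUN = """\
fft = 4608K, ave time = 1.3619 ms, square: 32, splice: 128
fft = 4608K, ave time = 1.3639 ms, square: 64, splice: 128
fft = 4608K, ave time = 1.4984 ms, square: 128, splice: 128
fft = 4608K, ave time = 1.6014 ms, square: 256, splice: 128
fft = 4608K, ave time = 1.6759 ms, square: 512, splice: 128
fft = 4608K, ave time = 0.6920 ms, square: 1024, splice: 128
fft = 4608K, ave time = 0.6934 ms, square: 1024, splice: 32
fft = 4608K, ave time = 0.6953 ms, square: 1024, splice: 1024
"""


def parse_threadbench(text):
    """Return (fft_k, ms, square, splice) for every 'ave time' line."""
    results = []
    for line in text.splitlines():
        m = BENCH_RE.search(line)
        if m:
            results.append(
                (int(m.group(1)), float(m.group(2)), int(m.group(3)), int(m.group(4)))
            )
    return results


def suspicious_timings(results, min_ratio=0.6):
    """Return entries whose time is below min_ratio times the slowest time."""
    if not results:
        return []
    slowest = max(ms for _, ms, _, _ in results)
    return [r for r in results if r[1] < min_ratio * slowest]


if __name__ == "__main__":
    for fft_k, ms, square, splice in suspicious_timings(parse_threadbench(BAD_RUN)):
        print(f"suspect: {fft_k}K {ms} ms square={square} splice={splice}")
```

On the sample data, nothing is flagged in the run without 1024 squaring threads, and only the 1024-square-thread lines are flagged in the other. The 0.6 ratio is a guess; a real check would need tuning against cards where thread count legitimately matters more.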

Similar issues appear in CUDAPm1.

Are you eying a nice shiny new fast GTX1080 Ti?

storm5510 2017-08-31 17:57

[QUOTE=kriesel;466758]Are you eying a nice shiny new fast GTX1080 Ti?[/QUOTE]

A GTX 1080, yes. As for the "Ti" variant, well, that is a bit out of my budget range, at least for now. I've seen price drops on 1080's recently. Some are in the mid $550 range now.

Here is the one I am looking at. Its clock speeds are higher than some others:

[URL="https://www.amazon.com/dp/B01GAI64GO/_encoding=UTF8?coliid=I34G2U9SILZLM5&colid=1NMURU3BH2FCN"]https://www.amazon.com/dp/B01GAI64GO/_encoding=UTF8?coliid=I34G2U9SILZLM5&colid=1NMURU3BH2FCN[/URL]

I have a space consideration also. The one above is 10.5" in length. Others are over 12", and I doubt I could fit them in this case. Before I order anything, I will actually measure to see what I have.

kladner 2017-08-31 19:55

Did you consider this Gigabyte model? 11", $549.99
[LIST]
[*] Boost: 1860 MHz / base: 1721 MHz in OC mode
[*] Boost: 1835 MHz / base: 1695 MHz in gaming mode
[*] Card size: H=41 L=280 W=114 mm (280 mm = 11.02 in)
[/LIST]
I am a big Gigabyte fanboy. I think their coolers are outstanding. I have 2 bought new: a 640 and a 1060. I also have an eBay 570. The first 2 are running at this moment.

The 570 ran for quite a while, and still would if I replaced the power connectors. These turned out to have so much wear and such poor contact that I realized one day that the PSU connectors were melting. I found replacements online, but never followed through.

I really like the 3-fan coolers on the high-end GPUs. I know firsthand that they are quieter at max RPM than an Asus GTX 580 and an MSI 580. The Gigabyte 570 ran cool at a fairly high OC running mfaktc and was much quieter than the 580s.

storm5510 2017-09-01 01:14

[QUOTE=kladner;466791]Did you consider this Gigabyte model? 11", $549.99
[LIST]
[*] Boost: 1860 MHz / base: 1721 MHz in OC mode
[*] Boost: 1835 MHz / base: 1695 MHz in gaming mode
[*] Card size: H=41 L=280 W=114 mm (280 mm = 11.02 in)
[/LIST][/QUOTE]

I believe this is the one you are referring to:

[URL="https://www.amazon.com/Gigabyte-GeForce-Gaming-Graphics-GV-N1080G1/dp/B01GJEE9BG/ref=sr_1_3?s=pc&ie=UTF8&qid=1504227434&sr=1-3&keywords=gtx+1080"]https://www.amazon.com/Gigabyte-GeForce-Gaming-Graphics-GV-N1080G1/dp/B01GJEE9BG/ref=sr_1_3?s=pc&ie=UTF8&qid=1504227434&sr=1-3&keywords=gtx+1080[/URL]

As I wrote, I will have to get in there and do some measuring. There are two fans just behind the face covering. Either could be moved to another location. That would add an extra inch.

The one I was looking at is $84 more. Thanks for the reference! :smile:

kladner 2017-09-01 01:24

Eek! No link. Yes, that is the one. Sorry.

EDIT: Further "I have 2 bought new: a 640 and a 1060" should read ..."a 460 and a 1060." :redface:

storm5510 2017-09-01 06:26

[QUOTE=kladner;466819]Eek! No link. Yes, that is the one. Sorry.

EDIT: Further "I have 2 bought new: a 640 and a 1060" should read ..."a 460 and a 1060." :redface:[/QUOTE]

Try this one:

[URL="https://www.amazon.com/dp/B01GJEE9BG/_encoding=UTF8?coliid=IRF6EYBTMSNOT&colid=1NMURU3BH2FCN"]https://www.amazon.com/dp/B01GJEE9BG/_encoding=UTF8?coliid=IRF6EYBTMSNOT&colid=1NMURU3BH2FCN[/URL]

It's the same as you describe.

kriesel 2017-09-01 07:25

[QUOTE=storm5510;466775]A GTX 1080, yes. As for the "Ti" variant, well, that is a bit out of my budget range, at least for now. I've seen price drops on 1080's recently. Some are in the mid $550 range now.

Here is the one I am looking at. Its clock speeds are higher than some others:

[URL]https://www.amazon.com/dp/B01GAI64GO/_encoding=UTF8?coliid=I34G2U9SILZLM5&colid=1NMURU3BH2FCN[/URL]

I have a space consideration also. The one above is 10.5" in length. Others are over 12", and I doubt I could fit them in this case. Before I order anything, I will actually measure to see what I have.[/QUOTE]

Try Newegg, either on their site or eBay. I've seen specials down to $500.


All times are UTC.