mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Manpowre 2013-05-10 09:17

[QUOTE=NBtarheel_33;339938]Thanks! I am interested, as I am sure all the GPU code authors are, in seeing if we get a match.

How many cores of the i5 are you using? 11 days is actually faster than CUDALucas! Must be better FFT size selection on Prime95.[/QUOTE]

I guess it depends on which Nvidia card you run CudaLucas on. Only the GK110 chip has great FFT performance (if I understand it right), so it's going to be interesting when I start this number on one of the Titans today. I'm picking them up later.

Karl M Johnson 2013-05-10 13:11

[QUOTE=frmky;339933]Here's a trial Windows x64 binary:
[URL]https://www.dropbox.com/s/4lh34niqddm5tf8/CUDAmemtest_20130509.zip[/URL][/QUOTE]
[URL="https://pastee.org/tahpx"]Interesting[/URL]
That was the famous sand titan at stock memory clock (6 GHz).

And this is CUDALucas right now (same clocks): [CODE]
Continuing work from a partial result of M53988731 fft length = 3145728 iteration = 42964071
Iteration 42970000 M( 53988731 )C, 0xd0a32b33d0542180, n = 3145728, CUDALucas v2.03 err = 0.0840 (0:20 real, 1.9940 ms/iter, ETA 6:05:53)
Iteration 42980000 M( 53988731 )C, 0xe31d0672684cb622, n = 3145728, CUDALucas v2.03 err = 0.0859 (0:34 real, 3.3769 ms/iter, ETA 10:19:05)
Iteration 42990000 M( 53988731 )C, 0x0ee40a6432691ff6, n = 3145728, CUDALucas v2.03 err = 0.0869 (0:34 real, 3.3679 ms/iter, ETA 10:16:52)
Iteration 43000000 M( 53988731 )C, 0x54e37345d6613daf, n = 3145728, CUDALucas v2.03 err = 0.0869 (0:33 real, 3.3192 ms/iter, ETA 10:07:24)
Iteration 43010000 M( 53988731 )C, 0x13f54f8841c01dfa, n = 3145728, CUDALucas v2.03 err = 0.0869 (0:32 real, 3.2240 ms/iter, ETA 9:49:26)
iteration = 43010501 >= 1000 && err = 0.5 >= 0.35, fft length = 3145728, writing checkpoint file (because -t is enabled) and exiting.

Continuing work from a partial result of M53988731 fft length = 3145728 iteration = 43010402
Iteration 43020000 M( 53988731 )C, 0x115b0365163e0b69, n = 3145728, CUDALucas v2.03 err = 0.0830 (0:31 real, 3.1710 ms/iter, ETA 9:39:13)
iteration = 43026901 >= 1000 && err = 0.5 >= 0.35, fft length = 3145728, writing checkpoint file (because -t is enabled) and exiting.

Continuing work from a partial result of M53988731 fft length = 3145728 iteration = 43026802
Iteration 43030000 M( 53988731 )C, 0xb80b52bcce9110fe, n = 3145728, CUDALucas v2.03 err = 0.0820 (0:11 real, 1.1023 ms/iter, ETA 3:21:09)
Iteration 43040000 M( 53988731 )C, 0xc80fce5467cd62e8, n = 3145728, CUDALucas v2.03 err = 0.0820 (0:33 real, 3.3450 ms/iter, ETA 10:09:54)
Iteration 43050000 M( 53988731 )C, 0x27893285255058a0, n = 3145728, CUDALucas v2.03 err = 0.0859 (0:34 real, 3.3506 ms/iter, ETA 10:10:21)
Iteration 43060000 M( 53988731 )C, 0xe2588f231af43e8c, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.2605 ms/iter, ETA 9:53:24)
Iteration 43070000 M( 53988731 )C, 0x0f006ea53af12c03, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3543 ms/iter, ETA 10:09:55)
Iteration 43080000 M( 53988731 )C, 0x56a7a9d2693abbca, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.2534 ms/iter, ETA 9:51:02)
Iteration 43090000 M( 53988731 )C, 0x77749b0f6f221371, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:32 real, 3.2034 ms/iter, ETA 9:41:25)
Iteration 43100000 M( 53988731 )C, 0x9ed7b5ca32a464a6, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.2563 ms/iter, ETA 9:50:28)
Iteration 43110000 M( 53988731 )C, 0x8497d31db8621538, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:32 real, 3.2765 ms/iter, ETA 9:53:35)
Iteration 43120000 M( 53988731 )C, 0xd90c203344268bc0, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.2987 ms/iter, ETA 9:57:03)
Iteration 43130000 M( 53988731 )C, 0xe2e8bdb7c2e04b8a, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:34 real, 3.3487 ms/iter, ETA 10:05:32)
Iteration 43140000 M( 53988731 )C, 0xc66ce771642d1044, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:34 real, 3.3608 ms/iter, ETA 10:07:10)
Iteration 43150000 M( 53988731 )C, 0xf0098612e91fa014, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3550 ms/iter, ETA 10:05:34)
Iteration 43160000 M( 53988731 )C, 0x69390b6d9c644e2d, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3053 ms/iter, ETA 9:56:03)
Iteration 43170000 M( 53988731 )C, 0xfb77cbb27e4a5cb8, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.2925 ms/iter, ETA 9:53:11)
iteration = 43175501 >= 1000 && err = 0.5 >= 0.35, fft length = 3145728, writing checkpoint file (because -t is enabled) and exiting.

Continuing work from a partial result of M53988731 fft length = 3145728 iteration = 43175402
Iteration 43180000 M( 53988731 )C, 0x3b1b57b0b16e2b96, n = 3145728, CUDALucas v2.03 err = 0.0820 (0:15 real, 1.5490 ms/iter, ETA 4:38:49)
Iteration 43190000 M( 53988731 )C, 0xe662ce8bf9221fd1, n = 3145728, CUDALucas v2.03 err = 0.0840 (0:32 real, 3.2411 ms/iter, ETA 9:42:51)
Iteration 43200000 M( 53988731 )C, 0xef6cbde69ab2165a, n = 3145728, CUDALucas v2.03 err = 0.0868 (0:33 real, 3.3382 ms/iter, ETA 9:59:45)
Iteration 43210000 M( 53988731 )C, 0xc9432422a09d65ec, n = 3145728, CUDALucas v2.03 err = 0.0868 (0:34 real, 3.3202 ms/iter, ETA 9:55:58)
Iteration 43220000 M( 53988731 )C, 0xd5daa75a96232c05, n = 3145728, CUDALucas v2.03 err = 0.0868 (0:33 real, 3.3503 ms/iter, ETA 10:00:49)
Iteration 43230000 M( 53988731 )C, 0x8f160ae9d4490cc8, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:34 real, 3.3508 ms/iter, ETA 10:00:21)
Iteration 43240000 M( 53988731 )C, 0x084c0440c9f1a1e1, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3625 ms/iter, ETA 10:01:53)
Iteration 43250000 M( 53988731 )C, 0xad42703256c2c238, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3433 ms/iter, ETA 9:57:53)
Iteration 43260000 M( 53988731 )C, 0x91e07bfbc1fa095e, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3392 ms/iter, ETA 9:56:36)
Iteration 43270000 M( 53988731 )C, 0x57aaf145da36d83c, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3599 ms/iter, ETA 9:59:44)
Iteration 43280000 M( 53988731 )C, 0x77a9f70113de0680, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:34 real, 3.3484 ms/iter, ETA 9:57:08)
Iteration 43290000 M( 53988731 )C, 0x2a4477ce9b0e246f, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3111 ms/iter, ETA 9:49:55)
Iteration 43300000 M( 53988731 )C, 0x5e5d3b5801ccce4a, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3335 ms/iter, ETA 9:53:21)
Iteration 43310000 M( 53988731 )C, 0xf474556799ff5b03, n = 3145728, CUDALucas v2.03 err = 0.0938 (0:33 real, 3.3038 ms/iter, ETA 9:47:31)
[/CODE]
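The exit messages in the log above follow a simple threshold rule on the reported round-off error. Here is a minimal sketch of that rule, assuming only what the log messages in this thread print: a 0.35 limit once 1000 iterations have passed (with `-t` enabled), and a 0.25 limit before that, as in the "increasing n" message that appears later in the thread. The function name and return strings are hypothetical, and how CUDALucas implements the check internally is not shown here.

```python
# Thresholds copied from log messages in this thread. How CUDALucas applies
# them internally is an assumption of this sketch.
EARLY_ITERATIONS = 1000
EARLY_ERR_LIMIT = 0.25  # "iteration = 26 < 1000 && err = 0.359375 >= 0.25"
LATE_ERR_LIMIT = 0.35   # "iteration = 43010501 >= 1000 && err = 0.5 >= 0.35"


def roundoff_action(iteration: int, err: float) -> str:
    """Return the action the log messages suggest for a given error value.

    Hypothetical helper: the names and return strings are illustrative only.
    """
    if iteration < EARLY_ITERATIONS:
        # Early in the run, a large error leads to a bigger FFT length.
        return "increase_fft" if err >= EARLY_ERR_LIMIT else "continue"
    # Later on (with -t enabled), a large error writes a checkpoint and exits.
    return "checkpoint_and_exit" if err >= LATE_ERR_LIMIT else "continue"


if __name__ == "__main__":
    print(roundoff_action(43010501, 0.5))     # values from the log above
    print(roundoff_action(43020000, 0.0830))  # a normal iteration
```

In the run above the error sits around 0.08 to 0.09 and then jumps straight to 0.5, which looks more like a one-off corrupted value than an FFT length that is too small.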

owftheevil 2013-05-10 14:04

D%#!. Back to the drawing board.

TObject 2013-05-10 18:19

Does the video card get as hot during the memory test as when CudaLucas is running?

owftheevil 2013-05-10 18:27

Just from casual observation, it seems that it does. But that's an interesting thought. The way it works, with lots of memory it would tend not to.

Karl M Johnson 2013-05-10 18:34

[URL="http://i.imgur.com/2ZapHWl.jpg"]Temperature[/URL] [URL="http://i.imgur.com/dxXqvQE.jpg"]is not[/URL] [URL="http://i.imgur.com/dlPi1YL.jpg"]the problem![/URL]
7 fans should be enough to cool one GPU, especially at stock clocks.
The problem is probably with VRAM voltage (too goddamn low), since GDDR5 should be able to run at 6 GHz.

owftheevil 2013-05-10 18:54

It's interesting in that, in my test, the memory accesses are spread out, whereas in CuLu one chunk gets most of the load. I am going to alter the memtest to focus on one chunk at a time and see if the results are different.

Manpowre 2013-05-10 22:44

So, my Titans are set up: full development environment with VS2010, AMD x64, CUDA 5.0 with the latest Nsight.
I finally managed to build the CudaLucas code in my own project with platform toolset v100, awesome. Now I can use the CUDA 5.0 libraries fully instead of platform toolset v90.

The 48th Mersenne prime is estimated to run 60 h 19 min with the Titan set to double precision:

Starting M57885161 fft length = 3145728
iteration = 26 < 1000 && err = 0.359375 >= 0.25, increasing n from 3145728
Starting M57885161 fft length = 3670016
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3670016, CUDALucas v2.03 err = 0.0117 (0:38 real, 3.7500 ms/iter, ETA 60:16:51)

The prime NBtarheel_33 came up with, M82090249, is estimated to run 104 hours.

Starting M82090249 fft length = 4194304
iteration = 23 < 1000 && err = 0.327148 >= 0.25, increasing n from 4194304
Starting M82090249 fft length = 4718592
Iteration 10000 M( 82090249 )C, 0x2b2f46c90b703416, n = 4718592, CUDALucas v2.03 err = 0.1211 (0:45 real, 4.5735 ms/iter, ETA 104:16:35)
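The ETA figures in these logs can be roughly reproduced by hand: a Lucas-Lehmer test of M(p) needs p - 2 squarings, so the time left is about the remaining iterations times the reported ms/iter. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the exact formula CUDALucas uses for its ETA is an assumption.

```python
def eta_seconds(exponent: int, iteration: int, ms_per_iter: float) -> float:
    """Estimate seconds left in a Lucas-Lehmer test of M(exponent).

    Assumes p - 2 iterations in total and a constant per-iteration time.
    """
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0


def format_eta(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS, like the log lines."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Values taken from the two log excerpts above.
    print(format_eta(eta_seconds(57885161, 10000, 3.7500)))
    print(format_eta(eta_seconds(82090249, 10000, 4.5735)))
```

Both results land close to the ETAs in the logs (60:16:51 and 104:16:35). Small differences are expected, since the printed ms/iter is rounded.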

kracker 2013-05-10 22:59

[QUOTE=Manpowre;340005]
The prime NBtarheel_33 came up with, M82090249, is estimated to run 104 hours.

Starting M82090249 fft length = 4194304
iteration = 23 < 1000 && err = 0.327148 >= 0.25, increasing n from 4194304
Starting M82090249 fft length = 4718592
Iteration 10000 M( 82090249 )C, 0x2b2f46c90b703416, n = 4718592, CUDALucas v2.03 err = 0.1211 (0:45 real, 4.5735 ms/iter, ETA 104:16:35)[/QUOTE]

M82090249 is not prime, by the way.

Manpowre 2013-05-10 23:02

[QUOTE=kracker;340009]M82090249 is not prime, by the way.[/QUOTE]

Yeah, I figured. Someone wanted to run a test on it, and I just wanted to see what kind of performance I could get by comparison.

owftheevil 2013-05-12 03:35

1 Attachment(s)
New and improved version of the memory test. I had to give up the ability to distinguish read and write errors to more closely mimic CuLu and CPm1's memory use patterns. My bad card gave 1555 errors in a 45-minute test; the good card was again error-free in the same test.
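One way to picture a chunk-at-a-time test is the CPU-side sketch below. It is only an illustration of the access pattern, not the attached CUDA tool, and every name in it is hypothetical. It hammers one chunk with repeated read-modify-write passes before moving on to the next. Because each value is read and rewritten many times before it is checked, a mismatch cannot be pinned on a bad read or a bad write.

```python
def hammer_chunk(memory: bytearray, start: int, size: int, passes: int) -> int:
    """Run read-modify-write passes over one chunk and count bad bytes."""
    for i in range(start, start + size):
        memory[i] = 0  # known starting value
    for _ in range(passes):
        for i in range(start, start + size):
            # Read the byte, modify it, write it back.
            memory[i] = (memory[i] + 1) & 0xFF
    expected = passes & 0xFF
    # Verify only at the end: a wrong byte may come from any read or write.
    return sum(1 for i in range(start, start + size) if memory[i] != expected)


def run_memtest(total_bytes: int, chunk_bytes: int, passes: int) -> list:
    """Test the buffer one chunk at a time; return error counts per chunk."""
    memory = bytearray(total_bytes)
    results = []
    for start in range(0, total_bytes, chunk_bytes):
        size = min(chunk_bytes, total_bytes - start)
        results.append(hammer_chunk(memory, start, size, passes))
    return results


if __name__ == "__main__":
    # On healthy memory every chunk should report zero errors.
    print(run_memtest(total_bytes=4096, chunk_bytes=1024, passes=3))
```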

