mersenneforum.org  

Old 2012-03-01, 20:21   #859
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

2143₈ Posts
Default I wasn't so lucky...

So I finished M26026433 with 1.49 and 1.61. The residues matched through iteration 20260000 and first differed at iteration 20270000:

1.49 -
Code:
Iteration 20260000 6.5 msec/Iter ETA:37481.8 sec M( 26026433 )C, 0xc33c4e6850d51e00, n = 1572864, CUDALucas v1.49 
Iteration 20270000 6.5 msec/Iter ETA:37416.8 sec M( 26026433 )C, 0x20a4fdbbb8670afe, n = 1572864, CUDALucas v1.49
1.61 -
Code:
 
Iteration 20260000 M( 26026433 )C, 0xc33c4e6850d51e00, n = 1572864, CUDALucas v1.61 (0:58 real, 5.8703 ms/iter, ETA 9:23:32)
Iteration 20270000 M( 26026433 )C, 0xaf1bbcd34aa82da2, n = 1572864, CUDALucas v1.61 (0:59 real, 5.8674 ms/iter, ETA 9:22:17)
It's driving me crazy...

I attached the full run of both 1.49 and 1.61 just in case anyone else wants to do some testing.
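For anyone who wants to work through the attached logs, here is a minimal sketch (Python, not part of CUDALucas) of how the first diverging iteration could be located automatically. It assumes only the two progress-line layouts quoted above; the function names are made up for illustration.

```python
import re

# Matches both log layouts quoted above: the iteration number, the exponent,
# and the 64-bit hex residue. Timing fields between them are skipped.
LINE_RE = re.compile(
    r"Iteration\s+(\d+)\b.*?M\(\s*(\d+)\s*\)\w?,\s*0x([0-9a-fA-F]{16})"
)


def parse_residues(lines):
    """Return {iteration: residue} for every progress line found."""
    residues = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            residues[int(m.group(1))] = m.group(3).lower()
    return residues


def first_divergence(log_a, log_b):
    """Return the lowest shared iteration whose residues differ, else None."""
    a, b = parse_residues(log_a), parse_residues(log_b)
    for iteration in sorted(a.keys() & b.keys()):
        if a[iteration] != b[iteration]:
            return iteration
    return None


# The four lines posted above, used as sample input.
log_149 = [
    "Iteration 20260000 6.5 msec/Iter ETA:37481.8 sec M( 26026433 )C, 0xc33c4e6850d51e00, n = 1572864, CUDALucas v1.49",
    "Iteration 20270000 6.5 msec/Iter ETA:37416.8 sec M( 26026433 )C, 0x20a4fdbbb8670afe, n = 1572864, CUDALucas v1.49",
]
log_161 = [
    "Iteration 20260000 M( 26026433 )C, 0xc33c4e6850d51e00, n = 1572864, CUDALucas v1.61 (0:58 real, 5.8703 ms/iter, ETA 9:23:32)",
    "Iteration 20270000 M( 26026433 )C, 0xaf1bbcd34aa82da2, n = 1572864, CUDALucas v1.61 (0:59 real, 5.8674 ms/iter, ETA 9:22:17)",
]

print(first_divergence(log_149, log_161))  # prints 20270000
```

With full log files, pass `open(path)` for each argument instead of the sample lists.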

Everyone else is having good luck with 1.58, so I think it must be OK. I probably have bad video card memory. I have a few more papers to write and then I can concentrate on compiling the memory test. If anyone else beats me to it, no hard feelings.

BTW - 1.61 is a lot faster, so hopefully it will work. I'm going to start a few tests on 1.61 all by itself with a couple of exponents that have been posted since M26026433. Maybe my video card just doesn't like that exponent?

PS - Also accepting more 1.58 and 1.61 results posted here so we can continue to troubleshoot and gain confidence.
Attached Files
File Type: zip 149 & 161 results.zip (105.4 KB, 69 views)
Old 2012-03-01, 21:07   #860
apsen
 
Jun 2011

203₈ Posts
Default

BTW, under Windows we need to put back the flush after each output line (the half-written lines and delays are driving me crazy).
Old 2012-03-02, 02:18   #861
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

3·3,221 Posts
Default

This morning I finished testing another two exponents; a few others still had some hours to go when I left the house for work, one hour ago.

Both were run with v1.61, intentionally on the same card (GTX580, 3GB version, overclocked to 822MHz from the factory 791MHz).

Strangely, there was a match for 26166407 and a mismatch for 26177689.

I have the screen output saved in text files, for both the good and the bad, every 30k iterations, if someone needs them for something (retest, comparisons, etc).

Quote:
Originally Posted by flashjh View Post
So I finished M26026433 with 1.49 and 1.61. The residues matched through iteration 20260000 and first differed at iteration 20270000:
How about the final result? Did it match? I want to distinguish between the case where only the screen output is wrong and the case where the calculation itself is wrong. For example, if you run 756839 or 859433 with v1.3 and v1.48, about half of the interim residues differ, but both versions report the numbers as prime at the end. With 240007 some interim residues also differ, but the final residues match. Both v1.3 and v1.47 had a problem writing the output to the screen (maybe some mis-synchronization; I am not making this up, you can test them against each other and against P95 and see), but they did the calculation right. I did not have any mismatch with v1.3alpha_eoc, 4.0/cc2.0, and when I did have a mismatch, it later proved that my result was good and the original result was bad, as I said in this thread long ago.
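To make the distinction concrete: in the Lucas-Lehmer test each term is computed from the previous one, so a genuine arithmetic error at one iteration changes every later residue and the final result, while a display-only error leaves the final residue intact. A minimal sketch in Python with plain big integers follows. CUDALucas itself uses FFT-based squaring on the GPU; treating the printed residue as the low 64 bits of the current term is an assumption here, and the function name is hypothetical.

```python
def lucas_lehmer(p):
    """Run the Lucas-Lehmer test on M(p) = 2**p - 1 for an odd prime p.

    Returns (is_prime, residue64), where residue64 is the low 64 bits of
    the final term as 16 hex digits. M(p) is prime exactly when the final
    term is zero.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # Each term depends only on the previous one, so an error made
        # here would propagate to every later term.
        s = (s * s - 2) % m
    return s == 0, format(s & 0xFFFFFFFFFFFFFFFF, "016x")


for p in (7, 11, 13):
    is_prime, residue = lucas_lehmer(p)
    print(p, "prime" if is_prime else "composite", "0x" + residue)
```

M(7) and M(13) are prime and end with an all-zero residue; M(11) = 2047 = 23 × 89 does not.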

But since we started with non-power-of-2-FFT optimization, everything got insane...

I like v1.61: much faster, nice output format. I will help with testing and want to help make it more stable, but I don't have the time or the knowledge to help with programming right now.

If you want me to do something special related to that, just say.

I will redo your exponent (26026433) together with my own mismatched one from above (26177689), using the same idea (test them on the same card in parallel; if the card gets nuts/hot/bored, both expos should be fk'd'up) over the weekend, and I will let you know. This time I will save all the partial files (by copying them every 30 minutes from c_xxxxxxx or t_xxxxxxx into c/t_xxxxxx_001/002/etc; I already wrote a batch file last night to help me with this). So in case we get a mismatch, we can redo only the bad part. This will help to isolate the problem.
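The snapshot idea could look roughly like this in Python rather than a batch file. This is only a sketch: the checkpoint names c<exponent> and t<exponent> are taken from the v1.62 directory listing posted later in this thread, while the _001/_002 suffix format and the function name are assumptions.

```python
import shutil
from pathlib import Path


def snapshot_checkpoints(work_dir, exponent, index):
    """Copy the current checkpoint files to numbered backups.

    Looks for c<exponent> and t<exponent> in work_dir and copies each one
    that exists to <name>_<index as three digits>. Returns the names of
    the copies that were made.
    """
    work_dir = Path(work_dir)
    copied = []
    for prefix in ("c", "t"):
        src = work_dir / f"{prefix}{exponent}"
        if src.exists():
            dst = work_dir / f"{prefix}{exponent}_{index:03d}"
            shutil.copy2(src, dst)  # copy2 also preserves the timestamp
            copied.append(dst.name)
    return copied
```

Calling it once every 30 minutes with an increasing index (for example from a loop around `time.sleep(30 * 60)`) gives a series of restart points, so that after a mismatch only the interval between the last good and the first bad snapshot has to be rerun.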

Last fiddled with by LaurV on 2012-03-02 at 02:34
Old 2012-03-02, 03:28   #862
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts
Default

Quote:
Originally Posted by LaurV View Post
Strangely, there was a match for 26166407 and a mismatch for 26177689.
Same here.

Quote:
I have the screen output saved in text files, for both the good and the bad, every 30k iterations, if someone needs them for something (retest, comparisons, etc).
Yes, I did the same thing (attached to my initial post)

Quote:
How about the final result? Did it match? I want to distinguish between the case where only the screen output is wrong and the case where the calculation itself is wrong. For example, if you run 756839 or 859433 with v1.3 and v1.48, about half of the interim residues differ, but both versions report the numbers as prime at the end. With 240007 some interim residues also differ, but the final residues match. Both v1.3 and v1.47 had a problem writing the output to the screen (maybe some mis-synchronization; I am not making this up, you can test them against each other and against P95 and see), but they did the calculation right. I did not have any mismatch with v1.3alpha_eoc, 4.0/cc2.0, and when I did have a mismatch, it later proved that my result was good and the original result was bad, as I said in this thread long ago.
The final results did not match. After 20270000 they were different until completion.

Quote:
But since we started with non-power-of-2-FFT optimization, everything got insane...
I like v1.61: much faster, nice output format. I will help with testing and want to help make it more stable, but I don't have the time or the knowledge to help with programming right now.

If you want me to do something special related to that, just say.
Much appreciated! I have a few more papers to write and then I can concentrate on getting the memtest working and helping to fix the problem.

Quote:
I will redo your exponent (26026433) together with my own mismatched one from above (26177689), using the same idea (test them on the same card in parallel; if the card gets nuts/hot/bored, both expos should be fk'd'up) over the weekend, and I will let you know. This time I will save all the partial files (by copying them every 30 minutes from c_xxxxxxx or t_xxxxxxx into c/t_xxxxxx_001/002/etc; I already wrote a batch file last night to help me with this). So in case we get a mismatch, we can redo only the bad part. This will help to isolate the problem.
Thanks. I haven't had the time to automate it yet. I still don't quite get it, but I'm sure we'll figure it out.

I wonder if my video card is getting too hot? Doesn't seem like the problem though because 1.49 has matched like 4 or 5 times now while the rest match sometimes and not other times.
Old 2012-03-02, 04:45   #863
msft
 
 
Jul 2009
Tokyo

2×5×61 Posts
Default

Hi,
Ver 1.62
Fix inv warning.
Fix dirty print when exponent is prime.
Add fflush (Thanks apsen).
Add device information print.
Change residue test exponent.
Add save all check point file option.
Add Set fft length option.
Code Cleanup.
Thank you for lots of help.
Code:
$ ./CUDALucas -r
DEVICE:0------------------------
name                GeForce GTX 460
totalGlobalMem      804454400
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.1
clockRate           1350000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 7
Iteration 10000 M( 756893 )C, 0xb94c673f25fe7ded, n = 65536, CUDALucas v1.62 (0:04 real, 0.3873 ms/iter, ETA 4:46)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 65536, CUDALucas v1.62 (0:04 real, 0.3737 ms/iter, ETA 5:13)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 98304, CUDALucas v1.62 (0:05 real, 0.5869 ms/iter, ETA 12:07)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 98304, CUDALucas v1.62 (0:07 real, 0.6078 ms/iter, ETA 13:58)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.62 (0:09 real, 0.9124 ms/iter, ETA 45:00)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.62 (0:09 real, 0.9012 ms/iter, ETA 45:12)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.62 (0:21 real, 2.1014 ms/iter, ETA 4:03:45)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.62 (0:42 real, 4.1782 ms/iter, ETA 15:36:37)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.62 (0:58 real, 5.7841 ms/iter, ETA 33:42:29)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.62 (1:07 real, 6.7199 ms/iter, ETA 44:50:12)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.62 (1:22 real, 8.1972 ms/iter, ETA 59:05:16)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.62 (1:32 real, 9.2709 ms/iter, ETA 78:15:42)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.62 (1:32 real, 9.2723 ms/iter, ETA 83:53:17)
err = 0.411325, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.62 (1:30 real, 8.9835 ms/iter, ETA 92:40:47)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.62 (1:58 real, 11.7533 ms/iter, ETA 139:10:43)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.62 (1:58 real, 11.7494 ms/iter, ETA 140:39:59)

$ ./CUDALucas -c 1000 -s 756893
DEVICE:0------------------------
name                GeForce GTX 460
~~~
start M756893 fft length = 65536
Iteration 1000 M( 756893 )C, 0x615ea033a371ca9a, n = 65536, CUDALucas v1.62 (0:01 real, 0.5074 ms/iter, ETA 6:23)
Iteration 2000 M( 756893 )C, 0x76f26a440d5ccbf0, n = 65536, CUDALucas v1.62 (0:00 real, 0.3674 ms/iter, ETA 4:37)
Iteration 3000 M( 756893 )C, 0x09ce424e95d1537d, n = 65536, CUDALucas v1.62 (0:01 real, 0.3669 ms/iter, ETA 4:36)
Iteration 4000 M( 756893 )C, 0x8d8b29a43e8bda9e, n = 65536, CUDALucas v1.62 (0:00 real, 0.3797 ms/iter, ETA 4:45)
Iteration 5000 M( 756893 )C, 0x4ed704ca77266721, n = 65536, CUDALucas v1.62 (0:00 real, 0.3876 ms/iter, ETA 4:51)
Iteration 6000 M( 756893 )C, 0x7c8272c8bdd405cb, n = 65536, CUDALucas v1.62 (0:01 real, 0.3858 ms/iter, ETA 4:49)
^C caught.  Writing checkpoint.
$ ls
c756893    CUDALucas.cu  cuda_safecalls.h  s756893.1001  s756893.3001  s756893.5001  s756893.6452  timeval.c
CUDALucas  CUDALucas.o   Makefile          s756893.2001  s756893.4001  s756893.6001  t756893
Attached Files
File Type: bz2 CUDALucas.1.62.tar.bz2 (11.7 KB, 57 views)
Old 2012-03-02, 07:23   #864
msft
 
 
Jul 2009
Tokyo

1142₈ Posts
Default

I made a mismatch score sheet.
Code:
M26176441,1.49,19283A19B247BA__ by "LaurV" on 2012-02-12(#733)
M26176441,Prime95,C7949A2F450242__ by "Laurent Deniel"

M26026433,1.50,190df3dc67d21885,flashjh(#762)
M26026433,1.50,ee1b55e2b3e0c8b5,flashjh(#762)
M26026433,1.49,457f73d49f90b822,flashjh(#806)
M26026433,1.58,457f73d49f90b822,frmky(#852)
M26026433,1.61,8785a2a489e2a25b,flashjh(#859)
M26026433,1.49,457F73D49F90B822 by "LaurV" on 2012-02-12
M26026433,Prime95,457F73D49F90B822 by "Phantomas" on 2009-04-09

M26166389,1.58,CA3209D01495D1__ by "LaurV" on 2012-02-29(#836)
M26166389,Prime95,074DDCFBF8F78__ by "Ahmer Ali"

M26176597,1.58,AD22DF54329720__ by "LaurV" on 2012-02-29(#836)
M26176597,Prime95,458D46360B696D__ by "Alessandro Polverini"

M26177689,1.61,4D941722F9216A__ by "LaurV" on 2012-03-02(#861)
M26177689,Prime95,943F047277037E__ by "C. Cooper / S. Boone"
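The same bookkeeping can be done mechanically. A small sketch in Python, using a few rows copied from the table above. It treats the Prime95 residue as the reference, which is an assumption (the Prime95 run could itself be the bad one), and it compares only the hex digits that both rows report, because some residues are truncated with "__". The function names are hypothetical.

```python
# (exponent, program or CUDALucas version, residue) rows from the table above.
ROWS = [
    ("M26026433", "1.50", "190df3dc67d21885"),
    ("M26026433", "1.49", "457f73d49f90b822"),
    ("M26026433", "1.58", "457f73d49f90b822"),
    ("M26026433", "1.61", "8785a2a489e2a25b"),
    ("M26026433", "Prime95", "457F73D49F90B822"),
    ("M26177689", "1.61", "4D941722F9216A__"),
    ("M26177689", "Prime95", "943F047277037E__"),
]


def same_residue(a, b):
    """Compare two residues on the leading hex digits both of them report."""
    a, b = a.rstrip("_").lower(), b.rstrip("_").lower()
    n = min(len(a), len(b))
    return a[:n] == b[:n]


def mismatches(rows, reference="Prime95"):
    """Return {exponent: [versions whose residue differs from the reference]}."""
    ref = {exp: res for exp, ver, res in rows if ver == reference}
    bad = {}
    for exp, ver, res in rows:
        if ver != reference and exp in ref and not same_residue(res, ref[exp]):
            bad.setdefault(exp, []).append(ver)
    return bad


print(mismatches(ROWS))
```

For these rows it flags 1.50 and 1.61 on M26026433 and 1.61 on M26177689, while the 1.49 and 1.58 runs of M26026433 agree with Prime95.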
Old 2012-03-02, 08:17   #865
msft
 
 
Jul 2009
Tokyo

2·5·61 Posts
Default

I have an enhancement. Please wait for the next version.
Old 2012-03-02, 08:32   #866
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

3×3,221 Posts
Default

Quote:
Originally Posted by msft View Post
I made a mismatch score sheet.
We have many others unreported. I can give you a list, but I don't think it would help. Let's concentrate on the one for which we already have the residue lists and try to see where the error is coming from. I will do this over the weekend with v1.61.

Quote:
Originally Posted by msft View Post
I have an enhancement. Please wait for the next version.
Eagerly waiting for next version! KUTGW!
kudos!

Last fiddled with by LaurV on 2012-03-02 at 08:34
Old 2012-03-02, 09:06   #867
msft
 
 
Jul 2009
Tokyo

1001100010₂ Posts
Default

Quote:
Originally Posted by LaurV View Post
We have many others unreported. I can give you a list, but I don't think it would help. Let's concentrate on the one for which we already have the residue lists and try to see where the error is coming from. I will do this over the weekend with v1.61.
I think:
step 1)
Retry Prime95 (or gpulucas!).
step 2)
Retry CUDALucas (use the version that reported the mismatch).

I have now started M26166409 on Prime95.
Old 2012-03-02, 10:36   #868
msft
 
 
Jul 2009
Tokyo

2×5×61 Posts
Default

Quote:
Originally Posted by msft View Post
I have now started M26166409 on Prime95.
I searched in the wrong place.

I have started M26176441 on Prime95, and M26166389 on gpulucas.

I need to reconfirm the mismatch on my GTX-460 with the version that mismatched.
Old 2012-03-02, 10:41   #869
msft
 
 
Jul 2009
Tokyo

2×5×61 Posts
Default

Ver 1.63
Only use complex-to-complex FFT.
Code:
$ ./CUDALucas -r
DEVICE:0------------------------
name                GeForce GTX 460
totalGlobalMem      804454400
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.1
clockRate           1350000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 7
Iteration 10000 M( 756893 )C, 0xb94c673f25fe7ded, n = 65536, CUDALucas v1.63 (0:04 real, 0.3923 ms/iter, ETA 4:50)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 65536, CUDALucas v1.63 (0:04 real, 0.3830 ms/iter, ETA 5:21)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 98304, CUDALucas v1.63 (0:05 real, 0.5458 ms/iter, ETA 11:16)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 98304, CUDALucas v1.63 (0:06 real, 0.5427 ms/iter, ETA 12:28)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.63 (0:08 real, 0.7994 ms/iter, ETA 39:26)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.63 (0:08 real, 0.7788 ms/iter, ETA 39:04)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.63 (0:19 real, 1.8962 ms/iter, ETA 3:39:57)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.63 (0:37 real, 3.7644 ms/iter, ETA 14:03:50)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.63 (0:51 real, 5.1375 ms/iter, ETA 29:56:25)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.63 (1:00 real, 6.0356 ms/iter, ETA 40:16:16)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.63 (1:14 real, 7.4174 ms/iter, ETA 53:28:01)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2723 ms/iter, ETA 69:49:53)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2783 ms/iter, ETA 74:53:45)
err = 0.367069, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.63 (1:30 real, 9.0108 ms/iter, ETA 92:57:40)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.63 (1:46 real, 10.5272 ms/iter, ETA 124:39:33)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.63 (1:45 real, 10.5222 ms/iter, ETA 125:58:25)
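A note on the "err = ..., increasing n" line: the p bits of the number are spread over n FFT words, so each word carries about p / n bits on average, and round-off error grows as that figure rises. A rough sketch of the arithmetic in Python, using (exponent, n) pairs from the self-test output above. Reading 1966080 as the length first tried for M( 37156667 ) is an inference from where the message appears, and the function name is hypothetical.

```python
# (exponent, FFT length n) pairs taken from the self-test output above.
RUNS = [
    (756893, 65536),
    (25964951, 1572864),
    (32582657, 1835008),
    (37156667, 1966080),  # assumed: the length abandoned after the err message
    (37156667, 2097152),  # the length the run was reported with
]


def bits_per_word(exponent, fft_length):
    """Average number of bits of M(exponent) carried by each FFT word."""
    return exponent / fft_length


for p, n in RUNS:
    print(f"M( {p} )  n = {n:>7}  {bits_per_word(p, n):.3f} bits/word")
```

On these figures the abandoned length would have carried roughly 18.9 bits per word, more than any length the self-test completed with, which fits the program backing off to the next larger n.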
Attached Files
File Type: bz2 CUDALucas.1.63.tar.bz2 (10.2 KB, 59 views)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.