[QUOTE=Dubslow;290170]Well okay, even ignoring the advertising, the fact remains that many different GT*s have turned in many different successful double checks with various versions of CUDALucas.[/QUOTE]
And many haven't... more than we should be comfortable with. I emphatically second George's (Prime95's) suggestion of a CUDALucas torture test.
I do appreciate the discussion on the subject... I am going to swap 580s to see if I can get a good LL DC test on the card that has been doing TFing since December.
If that doesn't work, I'm going to go back to 1.49 and see if that helps. I haven't had a good LL DC check with the current card in quite a while, so I hope that is the problem. And I agree with the general point here: it's not that I expect hardware never to fail, but I would never have suspected that it was bad from the factory. If that turns out to be the problem, I'll be disappointed. At this point, though, who would know whether the card was bad when new or failed because of the 100% load from LL testing? I can't imagine nVidia runs these things at 100% for weeks at a time before release. Looking forward to a CUDALucas torture test.

Edit: By the way... the torture test is going to have to be robust, because I [B]can[/B] run smaller tests on prime and non-prime exponents and the card works fine. I have done this multiple times to troubleshoot. That's why I'm not 100% sure it's the card's problem. Right now I've been running LL DC tests that take about 24 hours, and they have all failed. So short tests may not be the answer, unless someone can find a better way?
[QUOTE=Dubslow;290156]Well, consider that the majority of tests have turned out well so far, and certainly nVidia wouldn't advertise the Compute Capability and overall CUDA-ness of the GTXs if they weren't capable of doing accurate math. Having lightly followed this, it seems the only major difference is the CL version (meaning that previous versions at the same hardware settings have been successful, correct, LaurV?).[/QUOTE]
No reason to be offended, or to be cross with me :D. The fact that your personal GTXs are running well "sometimes" does not mean they are perfect. For example, all of the cards I tested gave me [B][U]correct results[/U][/B] for ANY overclock I tried (close to 950MHz, which is 20%-30% more than their 750-781MHz factory settings) when I was testing M216091, M756839 and M859433. These tests take a couple of minutes, at most half an hour, they stress the card a lot, and their results are easy to remember (all are prime, so there is no need to remember residues). I could say these are the "most prime" numbers in the world, because I have proved them prime hundreds of times, for every card I had in my hand, for every overclock I had in my head, for every version of CUDALucas I found on the forum, and for every burp that came from my stomach. But despite all the effort, ALL (!!) versions of CL gave false residues from time to time, even on Tesla boards (I own two).

So, of course, I am still missing a reliable torture test! Or [B]at least an option (a switch) to instruct CL to run a FIXED number of iterations for a given exponent[/B]. With this I could make my own "test files" with residues, plus a small batch file to compare the results, and still avoid running for endless hours a test that I won't know is good or not. Implementing the "randomized" shift would also be brilliant, especially if it is done with a k given as a command-line parameter. As it is now, there is no control over this; even the -c switch does not work for the screen (it prints every 10k iterations regardless of the -c value I give it). This was once fixed in the past (version 1.3 alpha, I think). Don't get me wrong, CUDALucas is a good program, and I like it. That is the reason why I want it to be better; otherwise I would not care... :P
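To make the two requests above concrete (a fixed iteration count and a user-chosen shift k), here is a minimal Python sketch. It uses plain big-integer arithmetic instead of the GPU FFT squaring CUDALucas does, so it is only practical for small exponents, and the function name `ll_residue` and its `shift` parameter are assumptions for illustration, not CUDALucas options. The shift idea: keep the value multiplied by 2^k mod M_p; squaring doubles k (mod p, because 2^p = 1 mod M_p), and the residue recovered at the end must not depend on k.

```python
MASK64 = (1 << 64) - 1


def ll_residue(p: int, iterations: int, shift: int = 0) -> int:
    """Low 64 bits of S_n mod M_p after `iterations` Lucas-Lehmer steps,
    with S_0 = 4 and S_{i+1} = S_i^2 - 2.

    Hypothetical sketch: with a nonzero `shift`, the working value is kept as
    S * 2^k mod M_p. Since 2^p == 1 (mod M_p), multiplying by 2^k is a cyclic
    rotation of p bits, so k only needs to be tracked modulo p.
    """
    m = (1 << p) - 1              # M_p = 2^p - 1
    k = shift % p
    x = (4 << k) % m              # S_0 * 2^k mod M_p
    for _ in range(iterations):
        k2 = (2 * k) % p          # (S * 2^k)^2 = S^2 * 2^(2k): the offset doubles
        x = (x * x - (2 << k2)) % m   # subtract 2 * 2^(2k), scaled like S^2
        k = k2
    # Undo the offset: 2^(-k) == 2^(p - k) (mod M_p).
    x = (x << ((p - k) % p)) % m
    return x & MASK64


if __name__ == "__main__":
    # A tiny "test file": exponent, iteration count, 64-bit residue.
    for p in (127, 521, 607):
        for n in (10, 100):
            print(p, n, format(ll_residue(p, n), "016x"))
```

Running the same (p, n) pair with different shifts should give identical residues; a mismatch points at the arithmetic rather than the input, which is what makes a randomized shift useful for catching hardware errors.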
Hi,
I cannot update rw.cu, so now I am trying to delete rw.cu. Thank you,
Hi,
Version 1.52 removes rw.cu. [code]
cudalucas.1.52$ make
/usr/local/cuda/bin/nvcc -O3 -I/usr/local/include -I/home/msft/NVIDIA_GPU_Computing_SDK/C/common/inc CUDALucas.cu -arch=sm_13 -c
g++ -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
cudalucas.1.52$ cat input
216091
216093
cudalucas.1.52$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] -r | exponent | input_filename
cudalucas.1.52$ ./CUDALucas input
Start test of file 'input'
Iteration 10000 2.6 msec/Iter ETA 6:52 M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.52
Iteration 20000 2.5 msec/Iter ETA 6:32 M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.52
Iteration 30000 2.6 msec/Iter ETA 6:12 M( 216091 )C, 0x540772c2abb7833a, n = 524288, CUDALucas v1.52
^C
cudalucas.1.52$ ./CUDALucas input
Start test of file 'input'
continuing work from a partial result
Iteration 40000 2.6 msec/Iter ETA 5:52 M( 216091 )C, 0xc26da9695ac418c1, n = 524288, CUDALucas v1.52
Iteration 50000 2.5 msec/Iter ETA 5:32 M( 216091 )C, 0x95ce3ff44abdd1e5, n = 524288, CUDALucas v1.52
Iteration 60000 2.6 msec/Iter ETA 5:12 M( 216091 )C, 0x99aa87c495daffe7, n = 524288, CUDALucas v1.52
^C
cudalucas.1.52$ ./CUDALucas input
Start test of file 'input'
continuing work from a partial result
Iteration 70000 2.6 msec/Iter ETA 4:52 M( 216091 )C, 0x505d249be3145893, n = 524288, CUDALucas v1.52
Iteration 80000 2.6 msec/Iter ETA 4:32 M( 216091 )C, 0xddf612c72037b8a1, n = 524288, CUDALucas v1.52
Iteration 90000 2.5 msec/Iter ETA 4:12 M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 524288, CUDALucas v1.52
Iteration 100000 2.6 msec/Iter ETA 3:52 M( 216091 )C, 0x4de7f101ee1cb7a5, n = 524288, CUDALucas v1.52
Iteration 110000 2.6 msec/Iter ETA 3:32 M( 216091 )C, 0x10aa3286c0b03369, n = 524288, CUDALucas v1.52
Iteration 120000 2.5 msec/Iter ETA 3:12 M( 216091 )C, 0x3981b56788b529e2, n = 524288, CUDALucas v1.52
Iteration 130000 2.6 msec/Iter ETA 2:52 M( 216091 )C, 0x80438af231f8fccd, n = 524288, CUDALucas v1.52
Iteration 140000 2.5 msec/Iter ETA 2:32 M( 216091 )C, 0x669382faea06df89, n = 524288, CUDALucas v1.52
Iteration 150000 2.6 msec/Iter ETA 2:12 M( 216091 )C, 0x1b73cb121df7d6fa, n = 524288, CUDALucas v1.52
Iteration 160000 2.5 msec/Iter ETA 1:52 M( 216091 )C, 0xb391010f29c70ee1, n = 524288, CUDALucas v1.52
Iteration 170000 2.6 msec/Iter ETA 1:32 M( 216091 )C, 0x04055d84a77be1d8, n = 524288, CUDALucas v1.52
Iteration 180000 2.5 msec/Iter ETA 1:12 M( 216091 )C, 0xe3d74c104f02967d, n = 524288, CUDALucas v1.52
Iteration 190000 2.6 msec/Iter ETA 0:52 M( 216091 )C, 0x54b2a8b9cb149f9f, n = 524288, CUDALucas v1.52
Iteration 200000 2.7 msec/Iter ETA 0:32 M( 216091 )C, 0xf433496947b7b103, n = 524288, CUDALucas v1.52
Iteration 210000 2.6 msec/Iter ETA 0:12 M( 216091 )C, 0xcfe091c8f59f8a7b, n = 524288, CUDALucas v1.52
M( 216091 )P, n = 524288, CUDALucas v1.52
The checkpoint doesn't match current test. Current test will be restarted
Iteration 10000 2.6 msec/Iter ETA 6:52 M( 216093 )C, 0x2161ee10fc60c1d0, n = 524288, CUDALucas v1.52
^C
cudalucas.1.52$ ./CUDALucas input
Continue test of file 'input' at line 1
continuing work from a partial result
Iteration 20000 2.5 msec/Iter ETA 6:32 M( 216093 )C, 0xa6efa5c6d1e423cb, n = 524288, CUDALucas v1.52
^C
cudalucas.1.52$ ./CUDALucas 216091
The checkpoint doesn't match current test. Current test will be restarted
Iteration 10000 2.6 msec/Iter ETA 6:52 M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.52
^C
cudalucas.1.52$ ./CUDALucas -d 1 216091
device_number >= device_count ... exiting
[/code]
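The session above exercises checkpoint and resume: an interrupted run continues from its saved state, and a checkpoint belonging to a different exponent is discarded and the test restarted. A minimal Python sketch of that logic follows, assuming a JSON checkpoint file that stores the exponent, the iteration count and the current LL value; the file format and the names `save_checkpoint`, `load_checkpoint` and `run_ll` are hypothetical, since the log does not show CUDALucas's actual checkpoint format.

```python
import json
import os


def save_checkpoint(path: str, p: int, iteration: int, x: int) -> None:
    """Store the exponent, iteration count and current LL value (as hex)."""
    with open(path, "w") as f:
        json.dump({"p": p, "iteration": iteration, "x": format(x, "x")}, f)


def load_checkpoint(path: str, p: int):
    """Return (iteration, x) for exponent p, or None to start from scratch."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        data = json.load(f)
    if data.get("p") != p:
        # Same situation as the restart message in the log above.
        print("The checkpoint doesn't match current test. Current test will be restarted")
        return None
    return data["iteration"], int(data["x"], 16)


def run_ll(p: int, path: str, stop_after=None):
    """Lucas-Lehmer test of M_p that resumes from `path` when possible.

    `stop_after` simulates an interrupt (^C): the state is saved and None is
    returned. Otherwise returns True if M_p is prime, False if composite.
    """
    m = (1 << p) - 1
    state = load_checkpoint(path, p)
    i, x = state if state is not None else (0, 4)
    while i < p - 2:
        x = (x * x - 2) % m
        i += 1
        if stop_after is not None and i >= stop_after:
            save_checkpoint(path, p, i, x)
            return None
    return x == 0
```

Keying the checkpoint on the exponent is what makes the "doesn't match" case safe: a stale state file from another exponent is never mixed into the current test.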
[QUOTE=msft;290397]Hi,
Ver 1.52 delete rw.cu. [/QUOTE]
Thank you. I see some major changes: only one .cu file is left. Is the read/write code now integrated into CUDALucas.cu? I will look into it later; I have to go to work now. A quick compile is not possible for me:
[CODE]F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.52\src>make
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0/bin/nvcc" -c CUDALucas.cu -o CUDALucas.cuda4.0.sm_20.WIN64.obj -m64 --ptxas-options=-v "-ccbin=C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\/bin" -DWIN64 -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=sm_20 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0/include" "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0/include/cudart" "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.0/C/common/inc" -D__x86_64__ -O3 CUDALucas.cu
CUDALucas.cu(10) : fatal error C1083: Cannot open include file: 'unistd.h': No such file or directory
make: *** [CUDALucas.cuda4.0.sm_20.WIN64.obj] Fehler 2[/CODE]
There is definitely a problem with the higher versions of CL (starting from 1.4x, where the FFT size was reduced). Because I keep getting mismatched residues, I decided to do the following test:
I launched two copies of CUDALucas on the same GTX 580 card, overclocked to 820MHz (stock 781MHz). The first is CL [B]v1.3alpha_eoc[/B], with the fat FFT (powers of 2), and the second is [B]v1.5[/B], posted above (with the "optimized" FFT). Using the same card ensures that the test is not influenced by the different temperature, memory, etc. of different cards. The residues appear on screen at (almost) the same time, and they stop matching quite early; no test needed more than 100k iterations. I repeated the test many times: v1.3 always gave the same sequence, but the sequence of outputs from v1.5 changed once. After lowering the clock to the stock 781MHz and repeating the test, each version output the same sequence of residues on every repeat, but the two sequences still stop matching quite early. This cannot be detected with the Mersenne prime exponents mentioned in the previous posts (mine, and msft's too), because both versions use the same FFT sizes for such small exponents. [edit: I forgot to mention, though it is obvious: both copies were testing the same exponent, in the 26.02M range, and output was set to every 1k iterations, but only v1.3 takes -c into account; v1.5 still outputs every 10k iterations, and I believe there is an offset difference between the two numberings.]

There is also a "resume" issue. I tested v1.50 on many exponents from 25M to 38M, evenly distributed (about 5000 apart), let each exponent run for a while, and watched the size of the selected FFTs. For some exponents I got the message "rounding error is 0.48xxx, resuming with a larger fft", or similar, but the test STARTED FROM THE BEGINNING (from iteration 0, which is not quite normal!) and also... it started with the SAME FFT size. Curiously, the error did not repeat the second time; sometimes it repeated at a different iteration. Is there any "randomness" implemented there (in the suminp/error checking, etc.)?
Question: does version 1.52 solve the other reported issues, like the "-c" switch for the screen, more precision for the displayed ms per iteration, etc.? Or should I rather not waste my time upgrading? For now I have decided to return to v1.3 and stay there until I read good reports about the latest versions...
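The side-by-side run described above comes down to comparing the residues two builds print for the same exponent and finding where they first diverge. A small Python sketch for doing that on saved screen logs, assuming the two line formats shown in this thread (v1.5x and v1.3alpha_eoc); `offset_b` is a hypothetical knob for the iteration-numbering offset suspected between versions, and none of these names are part of CUDALucas.

```python
import re

# Matches both screen formats seen in this thread, e.g.
#   "Iteration 10000 2.6 msec/Iter ETA 6:52 M( 216091 )C, 0x3024..., n = 524288, ..."
#   "Iteration 24030000 M( 26116141 )C, 0x307f..., n = 2097152, ..."
LINE = re.compile(r"Iteration\s+(\d+)\b.*?M\(\s*(\d+)\s*\)C,\s*0x([0-9a-fA-F]{1,16})")


def parse_residues(text: str, offset: int = 0) -> dict:
    """Map (exponent, iteration + offset) -> 64-bit residue as an int."""
    out = {}
    for m in LINE.finditer(text):
        iteration = int(m.group(1)) + offset
        out[(int(m.group(2)), iteration)] = int(m.group(3), 16)
    return out


def first_mismatch(log_a: str, log_b: str, offset_b: int = 0):
    """First (exponent, iteration) present in both logs whose residues differ,
    returned as ((p, iteration), res_a, res_b); None if all shared points agree."""
    a = parse_residues(log_a)
    b = parse_residues(log_b, offset_b)
    for key in sorted(a.keys() & b.keys()):
        if a[key] != b[key]:
            return key, a[key], b[key]
    return None
```

Only iterations that appear in both logs are compared, so a build that ignores -c and prints every 10k iterations can still be checked against one printing every 1k.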
Hi,
Version 1.53 supports a residue test (-r). [code]
cudalucas.1.53$ ./CUDALucas -r
Iteration 10000 2.6 msec/Iter ETA 2:46:19 M( 5000000 )C, 0x8c6648628dd918a6, n = 524288, CUDALucas v1.53
Iteration 10000 4.3 msec/Iter ETA 11:05:59 M( 10000000 )C, 0x55318a84ffd14bc7, n = 786432, CUDALucas v1.53
Iteration 10000 4.9 msec/Iter ETA 16:39:19 M( 15000000 )C, 0x7a34e75acea86da1, n = 1048576, CUDALucas v1.53
Iteration 10000 6.9 msec/Iter ETA 33:18:59 M( 20000000 )C, 0xb6475f8cb0888740, n = 1310720, CUDALucas v1.53
Iteration 10000 8.5 msec/Iter ETA 55:31:59 M( 25000000 )C, 0x667565040b5b7aa3, n = 1572864, CUDALucas v1.53
Iteration 10000 9.5 msec/Iter ETA 74:58:29 M( 30000000 )C, 0xbf70feed29774eba, n = 1835008, CUDALucas v1.53
Iteration 10000 9.5 msec/Iter ETA 87:28:29 M( 35000000 )C, 0x1f1fb94d69da44f8, n = 2097152, CUDALucas v1.53
Iteration 10000 12.3 msec/Iter ETA 133:17:59 M( 40000000 )C, 0x2318fe9e59886055, n = 2359296, CUDALucas v1.53
Iteration 10000 14 msec/Iter ETA 174:57:39 M( 45000000 )C, 0x59b7ec093d4375da, n = 2621440, CUDALucas v1.53
Iteration 10000 17.2 msec/Iter ETA 259:40:29 M( 55000000 )C, 0x78f7b93265e51196, n = 3145728, CUDALucas v1.53
Iteration 10000 19.5 msec/Iter ETA 343:00:09 M( 65000000 )C, 0x679258d6765d8e5b, n = 3670016, CUDALucas v1.53
Iteration 10000 22.4 msec/Iter ETA 427:42:59 M( 70000000 )C, 0x652d4a670f44317e, n = 3932160, CUDALucas v1.53
Iteration 10000 19.5 msec/Iter ETA 395:46:49 M( 75000000 )C, 0xded1605fcc4a0f88, n = 4194304, CUDALucas v1.53
Iteration 10000 24.9 msec/Iter ETA 566:35:59 M( 85000000 )C, 0xf3211d616c78bc9f, n = 4718592, CUDALucas v1.53
Iteration 10000 28.6 msec/Iter ETA 738:48:39 M( 95000000 )C, 0x99f8f6024ac3c4d6, n = 5242880, CUDALucas v1.53
Iteration 10000 38.9 msec/Iter ETA 1055:26:59 M( 100000000 )C, 0x92ce4f1ec07668a3, n = 5767168, CUDALucas v1.53
[/code]
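The -r run above prints residues but has no reference values to check them against. One way to get a self-checking test without storing residues is the approach LaurV describes earlier in the thread: run complete tests on exponents whose outcome is already known. A Python sketch of such a harness follows; the two lists are known Mersenne prime exponents and prime exponents with composite M_p, while the harness itself (`lucas_lehmer`, `self_test`) is hypothetical and not how CUDALucas implements -r. As noted in the thread, exponents this small only exercise small transform sizes, so a pass here says nothing about the larger FFT lengths where mismatches were reported.

```python
# Exponents p for which M_p = 2^p - 1 is a known Mersenne prime (p <= 1279).
KNOWN_PRIME = [3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279]
# Prime exponents p for which M_p is known to be composite.
KNOWN_COMPOSITE = [11, 23, 29, 37, 41, 43, 47, 53, 59, 67, 71, 73, 79, 83, 97]


def lucas_lehmer(p: int) -> bool:
    """Plain big-integer Lucas-Lehmer test for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def self_test() -> list:
    """Return the exponents whose outcome disagrees with the known answer."""
    bad = [p for p in KNOWN_PRIME if not lucas_lehmer(p)]
    bad += [p for p in KNOWN_COMPOSITE if lucas_lehmer(p)]
    return bad


if __name__ == "__main__":
    failures = self_test()
    print("self-test passed" if not failures else f"self-test FAILED for {failures}")
```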
Version 1.54:
Fixes the #776 issue.
[QUOTE=msft;290432]Hi ,
Ver 1.53 support residue test. [code] <chop-chop> [/code][/QUOTE]
Got the same residues with v1.50 (I gave the exponents one by one and waited for the first screen output). Unfortunately there is no way to check whether they are right. As proof, they are all different if I use version 1.3... Could you at least use the first prime AFTER each of the numbers you use, for example 240007 instead of 240000? In that case (a prime exponent) we could use P95 with InterimResidues=10000 in prime.txt to check whether your residues are right. By the way, they are not: all are shifted by 2 compared with P95. For example, CUDALucas v1.50 says the residue at iteration 10000 of M(240007)C is 49C47F679CDBAD76, but Prime95 says:
M240007 interim We4 residue 2C63EF13C3A62352 at iteration 10000
M240007 interim We4 residue FA7B1F9E5B420748 at iteration 10001
M240007 interim We4 residue 49C47F679CDBAD76 at iteration 10002
You see, they are all shifted by 2. Some of the screen outputs match P95's and others are garbage, but curiously, the final residue is right:
M( 240007 )C, 0xdedd56684fe9f928, n = 524288, CUDALucas v1.50
no more input
Prime95:
UID: LaurV/xxxxx, M240007 is not prime. Res64: DEDD56684FE9F928. We4: xxxxxxxx,28156,00000000
Could you dig into this first? The cosmetics can come afterwards, like arranging the output in the v1.3 format:
[CODE]Iteration 24030000 M( 26116141 )C, 0x307ff04e86d033d7, n = 2097152, CUDALucas v1.3alpha_eoc (1:46 real, 3.5199 ms/iter, ETA 2:01:26)
Iteration 24060000 M( 26116141 )C, 0xc275891c5f3f0fba, n = 2097152, CUDALucas v1.3alpha_eoc (1:50 real, 3.6570 ms/iter, ETA 2:04:20)[/CODE]
and taking care of the -c parameter (as you can see above, there is a difference of 30k between screen outputs, which lets me keep the whole test within the 1000-line screen buffer).
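The suggestion above of replacing round exponents such as 240000 with the next prime (240007) is easy to automate. A minimal Python sketch, assuming plain trial division is fast enough at exponent sizes like these; `is_prime` and `next_prime` are illustrative names. Since M_n is composite whenever n is composite, only prime exponents give a Lucas-Lehmer residue worth cross-checking against Prime95.

```python
def is_prime(n: int) -> bool:
    """Trial division; quick enough for exponents up to roughly 10^8."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


if __name__ == "__main__":
    # Turn round-number test exponents into prime ones.
    for n in (240000, 5000000, 10000000):
        print(n, "->", next_prime(n))
```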
The off-by-2 is because the LL test runs p-2 iterations, but Prime95 fudges it to make it look like there are p iterations. If you had it print out every residue, the first one would be "Iteration 3/p".
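The off-by-two is easy to see on a small exponent. A Python sketch that lists interim residues under both numberings, assuming (as explained above) that Prime95 labels the value after i squarings as iteration i + 2, so its last residue is "iteration p"; judging from the M240007 numbers earlier in the thread, CUDALucas appears to count squarings directly. `interim_residues` is a made-up name for illustration.

```python
MASK64 = (1 << 64) - 1


def interim_residues(p: int, every: int):
    """Yield (squarings_done, prime95_label, res64) every `every` squarings,
    plus the final one.

    The LL test performs p - 2 squarings (S_0 = 4, S_{i+1} = S_i^2 - 2).
    Assumed labelling: the value after i squarings is Prime95's
    "iteration i + 2", so a plain squaring counter lags it by 2.
    """
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):          # i = squarings performed, 1 .. p-2
        s = (s * s - 2) % m
        if i % every == 0 or i == p - 2:
            yield i, i + 2, s & MASK64


if __name__ == "__main__":
    for done, label, res in interim_residues(127, 25):
        print(f"after {done:3d} squarings (Prime95 iteration {label:3d}) {res:016X}")
```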