[QUOTE=flashjh;360493]Yes, I'll need to talk with James to have the PHP code updated to recognize the 2.05 format.[/QUOTE]Yes, you will. :smile:
If you can keep the [b]M( <exponent> )C[/b] style that would be useful. As in, something like:[code]M( 10061 )C, 0x56eb9bb91825b188, offset = 4054, n = 1K, CUDALucas v2.05 Beta[/code] Other than that, the only change is the addition of the "offset" parameter? If so I can add support for that relatively easily. |
That's no problem, do you want the rest of the line to stay the same?
Edit: keep the AID at the end, if it is there? BTW - Thanks! |
These lines are now handled by mersenne.ca, and will (soon, hopefully) be handled by near-identical code on mersenne.org when I finish debugging the new manual-results parser there:[code]M( 10061 )C, 0x56eb9bb91825b188, offset = 9029, n = 1K, CUDALucas v2.05 Beta
M( 216091 )P, offset = 1234, n = 12K, CUDALucas v2.05 Beta[/code]I don't know if offset is relevant or would be printed in case of a prime, but it's handled anyways. Lines can also be prefixed by userID/compID if known:[code]UID: flashjh/Server, M( 25928543 )C, 0x24b8387cb9765463, n = 1572864, CUDALucas v1.46[/code]And yes, keep the AID at the end if available. So, with everything in:[code]UID: flashjh/Server, M( 10061 )C, 0x56eb9bb91825b188, offset = 4054, n = 1K, CUDALucas v2.05 Beta, AID: DD556623539A3B33B816E3C5F77D1D97[/code] |
[QUOTE=James Heinrich;360500]<>I don't know if offset is relevant or would be printed in case of a prime, but it's handled anyways.<>[/QUOTE]
You are correct, this is the line:[CODE]M( 57885161 )P, n = 3136K, CUDALucas v2.05 Beta[/CODE] |
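For anyone curious what parsing these lines involves on the server side, here is a rough sketch in Python (the real parser is PHP on mersenne.ca/mersenne.org and is not shown here; the field names are my own):

```python
import re

# One pattern covering the variants quoted above: optional UID prefix,
# C(omposite)/P(rime) flag, optional 64-bit residue (absent for a prime),
# optional offset (new in 2.05), FFT length, program string, optional AID.
RESULT_LINE = re.compile(
    r'^(?:UID:\s*(?P<uid>[^,]+),\s*)?'
    r'M\(\s*(?P<exponent>\d+)\s*\)(?P<flag>[CP]),'
    r'(?:\s*(?P<res64>0x[0-9a-fA-F]{16}),)?'
    r'(?:\s*offset\s*=\s*(?P<offset>\d+),)?'
    r'\s*n\s*=\s*(?P<fft>\d+K?),'
    r'\s*(?P<program>CUDALucas v[^,]+)'
    r'(?:,\s*AID:\s*(?P<aid>[0-9A-F]{32}))?\s*$'
)

m = RESULT_LINE.match(
    'UID: flashjh/Server, M( 10061 )C, 0x56eb9bb91825b188, offset = 4054, '
    'n = 1K, CUDALucas v2.05 Beta, AID: DD556623539A3B33B816E3C5F77D1D97'
)
```

The same pattern accepts the prime-case line (no res64) and the older v1.46 line (no offset, FFT length without the K suffix).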
Memtest results GTX 570, 844 core, 1600 vram
Last few lines of:
E:\CUDA\2.05-BETA>CudaLucas.exe -memtest 56 1 -d 1 [QUOTE]Position 22, Data Type 0, Iteration 1110000, Errors: 0, completed 47.23%, Read 82.60GB/s, Write 27.53GB/s, ETA 18:20)
Position 22, Data Type 1, Iteration 1120000, Errors: 0, completed 47.66%, Read 82.44GB/s, Write 27.48GB/s, ETA 18:11)
Position 22, Data Type 2, Iteration 1130000, Errors: 0, completed 48.09%, Read 82.32GB/s, Write 27.44GB/s, ETA 18:02)
Position 22, Data Type 3, Iteration 1140000, Errors: 0, completed 48.51%, Read 82.41GB/s, Write 27.47GB/s, ETA 17:53)
Position 22, Data Type 4, Iteration 1150000, Errors: 0, completed 48.94%, Read 82.67GB/s, Write 27.56GB/s, ETA 17:44)
Position 23, Data Type 0, Iteration 1160000, Errors: 0, completed 49.36%, Read 82.30GB/s, Write 27.43GB/s, ETA 17:35)
Position 23, Data Type 1, Iteration 1170000, Errors: 0, completed 49.79%, Read 82.50GB/s, Write 27.50GB/s, ETA 17:27)
Position 23, Data Type 2, Iteration 1180000, Errors: 0, completed 50.21%, Read 82.60GB/s, Write 27.53GB/s, ETA 17:18)
Position 23, Data Type 3, Iteration 1190000, Errors: 0, completed 50.64%, Read 82.67GB/s, Write 27.56GB/s, ETA 17:09)
Position 23, Data Type 4, Iteration 1200000, Errors: 0, completed 51.06%, Read 82.28GB/s, Write 27.43GB/s, ETA 17:00)
C:/CUDA/CuLu/src/CUDALucas.cu(1438) : cudaSafeCall() Runtime API error 2: out of memory.[/QUOTE] EDIT: The GTX 570 is a secondary GPU which does not drive a display, FWIW. I am now running CUDALucas -memtest 35 10 -d 1, 4.11% complete, ETA: 04:09. |
1 Attachment(s)
[QUOTE=kladner;360682].....
I am now running CUDALucas -memtest 35 10 -d 1 4.11% complete, ETA: 04:09.[/QUOTE] The above completed successfully. GTX 570, 844 core, 1600 VRAM Attached is the latter part of the run; the buffer for cmd was not large enough. |
[QUOTE=LaurV;360489]:shock: They are allowed for ages, since 1.48 (the first stable one), few years ago. <...> Edit: sorry, let me be stupid few minutes each day... No coffee yet, this morning.<...> Beside of "shifts", any reasons to switch?[/QUOTE]
I read your original post the other day, and came back today since I finally have time, and saw that you edited your post. :smile: I am, indeed, talking about being able to do both tests with CUDALucas. There have been a significant number of changes besides the shifting. A lot of work was done to eliminate errors from bad FFT selection. Primarily, now, CUDALucas handles FFT errors by reverting to the last save and increasing the FFT appropriately. In fact, the initial FFT selection is much better too. It makes sense to run the -cufftbench as you discussed to generate a good FFT file for your card. Memtest is also incorporated into this version now. [QUOTE]Edit 2: some simple mechanism to protect against fraud is still missing, I would[U] vote against[/U] accepting "first-time LL" [B][U]and[/U][/B] "DC" from cudaLucas, for the same exponent. What stops me to edit the "offset" parameter, to get the credit two times? You will find after 20 years that we missed a prime because some idiot credit-whore (I learned the word here on the forum, as someone called it, sorry). At least, with P95 is not so easy for childish individuals to fake a report, due to the we1 checksum, etc. Some simple security mechanism should be implemented, beside of shifting, to make it safer. Don't get me wrong, no disrespect for your work, shifting is an [B][U]immense[/U][/B] improvement to guard against software (FFT bugs), for which I am very grateful.[/QUOTE] I agree, but was falsely under the impression that the shift was what was missing to allow 1st-time and DCs on the same exponent. Your suggestion is heard loud and clear, but we need to know what ideas can be implemented to allow for both checks. Ideas? 
[QUOTE]Edit 3: (BTW, after updating the drivers, I am also getting negative iteration times and negative ETA's too, which are very accurate if you multiply them with (about) minus 28 (!?!??!), and consider them in minutes, not in hours :smile:, using the "old good version" 2.04, untouched since Dubslow made it. But the residues are right, and it is about 1% faster, so I let it be).[/QUOTE] If I may, I request that you please download and test 2.05Beta after your next exponent completes (there is a debug version if you need it). -t is always enabled now, but it is quite fast. I compile it with CUDA 5.5 using sm 13,20,30,35. I have actually tried to 'break' it messing with FFT sizes, etc. With rare exception it handles everything I throw at it and despite all my testing the checksums have all matched (so far). I've even had two that matched with two bad checksums ([URL="http://www.mersenne.org/report_exponent/?exp_lo=30424021&exp_hi=10000&B1=Get+status"]30424021[/URL] & [URL="http://www.mersenne.org/report_exponent/?exp_lo=30793229&exp_hi=10000&B1=Get+status"]30793229[/URL]). I also ran M[URL="http://www.mersenne.org/report_exponent/?exp_lo=62807803&exp_hi=62807803&B1=Get+status"]62807803[/URL] that [URL="http://www.mersenneforum.org/showthread.php?p=359101#post359101"]Lan_party[/URL] had trouble with and got a match, though I can't submit the result :rolleyes: If you (or anyone else) start testing, please test the keyboard interaction on Windows as I have been unable to get it to work. If it does work for you then I'll know it's my system. [QUOTE=James Heinrich;360500]These lines are now handled by mersenne.ca, and will (soon, hopefully) be handled by near-identical code on mersenne.org<>[/QUOTE] [QUOTE=Prime95;359314]Can we change the intermediate output (example below) so that it does not look very much like the final result lines?<>[/QUOTE] The changes to the output of 2.05Beta are done and I'll upload them shortly. 
The results file now outputs the format above, but the screen output is simplified for better formatting, and so that GIMPS cannot read a result from the screen output unless someone deliberately reformats it. In the future I'm sure we can implement output formatting similar to mfaktX. |
I ran the r47 version (not debug) overnight on both cards: 570 and 580. Event viewer shows two display driver restarts.
>>I experimented last night with the interactive feature. These are some great enhancements! Increase/decrease checkpoint interval works fine, as does +/- FFT. Auto FFT seems to make the best choices, so far. Decreasing always provoked a "restart with larger" response. Increasing several steps gave progressively slower performance. Toggle Polite works, as it always has.<< I have up to date FFT and Threads files, generated yesterday. Stability seems to have improved, as I got multiple successful -r selftest runs on both cards. -memtest 35 10 completed on the 570. It ran for about two hours on the 580 without errors, but I got impatient and did not let it finish. Let me know if there is any other info I might be able to provide. The two assignments are still running, so I don't know about residue matches, yet. I have 13 and 17 hours to go on those. |
1 Attachment(s)
[URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]r48[/URL] is up with the updated display and results file output
So the Windows version allows you to use the interactive mode? Edit: Attached a zip file with a .bat of [URL="http://www.mersenneforum.org/showthread.php?p=359102#post359102"]LaurV's post[/URL] (corrected) :smile: |
[QUOTE=flashjh;360733]r48 is up with the updated display and results file output
So the Windows version allows you to use the interactive mode?[/QUOTE] Yes. Win 7 Pro, 64 bit. 331.82 drivers. I'll get r48 if that is on Sourceforge now. EDIT: Got r48. Should have refreshed the page the last time I looked. |
[QUOTE=kladner;360732]I ran the r47 version (not debug) overnight on both cards: 570 and 580. Event viewer shows two display driver restarts.
[/QUOTE] I get device driver restarts with 2.03 and the latest drivers too, on both 580 and 590 boards. I found 306.23 to be the most stable driver for 5xx boards. |
What FFT lengths are you using when the restarts happen?
On my current 580 system with the latest drivers (and several of the last few) I get restarts on the following FFT lengths:[CODE] CUDALucas -cufftbench 3360 3360 6 CUDALucas -cufftbench 5040 5040 6 CUDALucas -cufftbench 5670 5670 6 CUDALucas -cufftbench 6720 6720 6[/CODE] I discovered these when I ran the .bat file I attached above. I have more 580s to test this on, but I haven't yet. I have tried updating and downgrading drivers with no luck. I have not rolled back to 306.23, I may try and see if it helps. Anyone notice if these match the ones you're having trouble with or are they different? [STRIKE]So the interactive does not work on my system. Anyone have any ideas as to what setting could be causing that? If I press a key with interactive enabled, I can see it (them) on the screen, but the program pauses and the only way to restart is to ^c and restart. [/STRIKE] I realized you have to press ENTER after the key press :redface: EDIT:[QUOTE=kladner;360735]EDIT: Got r48. Should have refreshed the page the last time I looked.[/QUOTE] I just put it there a while ago, it was probably after you looked the first time. |
[QUOTE=flashjh;360724]I agree, but was falsely under the impression that the shift was what was missing to allow 1st-time and DCs on the same exponent. Your suggestion is heard loud and clear, but we need to know what ideas can be implemented to allow for both checks. Ideas?[/QUOTE]
We have 3 options that I see: 1) Disallow CUDALucas from double-checking CUDALucas results (the status quo). 2) Allow double-checks as long as the shift counts are different. The downside: it is real easy to forge a double-check. 3) Add a security code to the CUDALucas final result output (a simple hash of the exponent and shift count). This code could be secret, which isn't allowed if CUDALucas is GPL. Those building executables would be entrusted with the secret code. Or, the code can be public. At least a forger of results has to go to the trouble of reading C code. There is an optional add-on to options 2 and 3. The Primenet server and/or GPU72 server can be upgraded to only give double-check exponents that were first tested by prime95. |
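A minimal illustration of option 3 follows. This is purely a hypothetical Python sketch — it is neither Prime95's actual checksum nor anything currently in CUDALucas, and SECRET is a placeholder:

```python
import hashlib

# Placeholder key; in the 'secret' variant only trusted builders would have it,
# in the public variant it would simply live in the source.
SECRET = b'not-the-real-key'

def security_code(exponent, shift, res64):
    """Hash the fields a forger would want to alter; the server recomputes and compares."""
    msg = '{},{},{}'.format(exponent, shift, res64).encode()
    return hashlib.sha256(SECRET + msg).hexdigest()[:16].upper()

code = security_code(10061, 4054, '0x56eb9bb91825b188')
```

Changing any reported field (e.g. just the offset) produces a different code, so a copy-pasted result line with an edited shift count would fail server-side verification.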
[QUOTE=Prime95;360756]1) Disallow CUDALucas from double-checking CUDALucas results (the status quo).
There is an optional add-on to options 2 and 3. The Primenet server and/or GPU72 server can be upgraded to only give double-check exponents that were first tested by prime95.[/QUOTE] I would like to argue that Option #1 is the only one which guarantees the integrity of the GIMPS knowledge of the status of MPs. Or, at least, doesn't leave it wide-open to attack. There are rarely times where security through obscurity is warranted, but this might be one of them. [COLOR="White"](Anyone good with a de-compiler / dis-assembler could still break this thing, though...)[/COLOR] Towards the end of not wasting resources, I will add to the top of my todo list to implement an additional option on the GPU72 manual assignment page: For CPU, For GPU. |
[QUOTE=chalsall;360759]I would like to argue that Option #1 is the only one which guarantees the security of the GIMPS knowledge of the status of MPs.[/QUOTE]
I don't disagree, except that we are already vulnerable. Prime95 uses option 3 with "secret" code. Will we make matters any worse by giving CUDALucas the exact same vulnerability? |
[QUOTE=Prime95;360761]Will we make matters any worse by giving CUDALucas the exact same vulnerability?[/QUOTE]
Perhaps this vulnerability should be closed. |
[QUOTE=kladner;360735]Yes. Win 7 Pro, 64 bit. 331.82 drivers.
[/QUOTE] Updated driver to 331.93 BETA. No noticeable difference. r48 seems determined to output each checkpoint on two lines, no matter how wide the CMD box. |
[QUOTE=kladner;360775]Updated driver to 331.93 BETA. No noticeable difference.
r48 seems determined to output each checkpoint on two lines, no matter how wide the CMD box.[/QUOTE] It always scrolled over two lines anyway, so I added a newline to make it easier to read. [STRIKE]Are you on Windows or Linux?[/STRIKE] Edit: I'll remove the newline on the next commit. |
[QUOTE=flashjh;360776]It always scrolled over two lines anyway, so I added a newline to make it easier to read. [STRIKE]Are you on windows or linux?[/STRIKE]
Edit: I'll remove the newline on the next commit.[/QUOTE] No biggy. I just wanted to confirm that it is not a malfunction. EDIT: Can it be made an option, to split or not to split? |
It's no problem. Changes are posted on sourceforge. If anything else needs updating, etc. just post it and I'll include it in a future commit.
I want to bring in the custom output formatting from mfaktX, so I'll look at line breaks as well. That will also allow for adding username and computer ID to the results file line. |
[QUOTE=flashjh;360752]What FFT lengths are you using when the restarts happen?
....... [/QUOTE] Got the data, but forgot to post it, till now-[INDENT]30.8M exponent, 1728K, GTX 570 37.5M exponent, 2048K, GTX 580 [/INDENT] |
306.23 gives this-
[CODE]E:\CUDA\2.05-BETA\CL_2.05_A>cudalucas -r device_number >= device_count ... exiting (This is probably a driver problem)[/CODE]I'll move back up to something a bit more recent and see what happens. EDIT: Shoot! 314.22 gives the same result with CUDALucas.......R49. |
Back on driver 331.82. Seeming to run pretty well, again.
I have noticed that CUDALucas does not load the GPU's as heavily as mfaktc. My line-measured power consumption is down ~80 W with CL running on both cards. This is with nearly the OC core settings that mfaktc will run at. I still feel better about turning down the VRAM even from stock speeds to run CL, and it does affect the iteration time and the power consumption. Regardless, with CL running on both cards, the whole system is pulling ~720 W with P95 running all eight cores of an FX-8350 on P-1, with 24 GB of RAM allowed. If the GPU's were running mfaktc, the power draw would be a bit over 800 W. |
May I gently ask that you also post your GPU type and OS name and release when you test new drivers?
Thanks :bow: Luigi |
[QUOTE=kladner;360788]306.23 gives this-
[CODE]E:\CUDA\2.05-BETA\CL_2.05_A>cudalucas -r device_number >= device_count ... exiting (This is probably a driver problem)[/CODE]I'll move back up to something a bit more recent and see what happens. EDIT: Shoot! 314.22 gives the same result with CUDALucas.......R49.[/QUOTE] 320.18 is the first WHQL built on CUDA 5.5, which should be the earliest release driver that would work with this build of CUDALucas. I have all the CUDA installs from 3.2 and up. I could try building earlier versions if you're interested. |
[QUOTE=kladner;360792]Back on driver 331.82. Seeming to run pretty well, again.
I have noticed that CUDALucas does not load the GPU's as heavily as mfaktc. My line-measured power consumption is down ~80 W with CL running on both cards. This is with nearly the OC core settings that mfaktc will run at. I still feel better about turning down the VRAM even from stock speeds to run CL, and it does affect the iteration time and the power consumption. Regardless, with CL running on both cards, the whole system is pulling ~720 W with P95 running all eight cores of an FX-8350 on P-1, with 24 GB of RAM allowed. If the GPU's were running mfaktc, the power draw would be a bit over 800 W.[/QUOTE] This is probably due to the amount of memcopy done back and forth on each iteration, host->device->host. As far as I understand mfaktc, it keeps its data in device memory, therefore driving the card a lot harder. |
[QUOTE=ET_;360795]May I gently ask to also post GPU type, OS source name and release when you test new drivers?
Thanks :bow: Luigi[/QUOTE] Sorry. I was running in sloppy late-night mode. Driver 331.82, latest WHQL The cards are a Gigabyte GTX 570, and an Asus GTX 580. Windows 7 Pro 64 bit, SP 1, all current Windows updates. More on request if I missed something. EDIT: Completed a DC on each card, matched residues on both. Before completion, the 580 log showed three batch file starts, or two restarts. This is an incomplete picture as it restarted several times in the previous evening. Some of these were spontaneous, while others had to do with switching out drivers. |
[QUOTE=flashjh;360781]It's no problem. Changes are posted on sourceforge. If anything else needs updating, etc. just post it and I'll include it in a future commit.
I want to bring in the custom output formatting from mfactx, so I'll also look at line breaks, also. That will also allow for adding username and computer id to the results file line.[/QUOTE] Thanks, Jerry. For some reason, I can sort out the lines more easily without the break, in spite of the rather wide box that requires. EDIT: Another display driver restart, GTX 570 running CUDALucas_BETA_2.05_r49, 580 running mfaktc. [CODE]C:/CUDA/CuLu/src/CUDALucas.cu(372) : cudaSafeCall() Runtime API error 30: unknown error.[/CODE]Aside from the CL batch file loop restart, this did not seem to cause any disruption. mfaktc appeared to be unaffected. |
Well, it took me a while to uncover this bug... I was beginning to think I am stupid :smile:, because for all of you it was working, but not for me...
Then suddenly it came... (I had to take the options one by one and play with them!) I got so many errors about my cards not having enough memory, registers, wheels, purple lights, whatever, it even said I have minus a few terabytes of RAM (!?!?), I was ready to give up... Then I tried to use the -info switch to see what freaking card it believes I have... ... And with the -info switch it worked! That is when it hit me! I have "PrintDeviceInfo=0" in the ini file ("who the heck needs that? I know what kind of card I have!"). If you have "PrintDeviceInfo=0" in the ini file, then the program not only skips printing the device info on screen, but also skips reading it for itself... :razz: [CODE]
e:\CudaLucas\CL0>cl205b_x64r49 -info
------- DEVICE 0 -------
name                GeForce GTX 580
Compatibility       2.0
clockRate (MHz)     1564
memClockRate (MHz)  2004
totalGlobalMem      1610612736
totalConstMem       65536
l2CacheSize         786432
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 16
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1
mkdir: cannot create directory `backup0': File exists
Using threads: norm1 256, mult 128, norm2 128.
Starting M37500769 fft length = 2048K
SIGINT caught, writing checkpoint.
Estimated time spent so far: 0:39
[COLOR=Red]<it works perfectly>[/COLOR]

e:\CudaLucas\CL0>cl205b_x64r49
mkdir: cannot create directory `backup0': File exists
Using threads: norm1 256, mult 128, norm2 128.
over specifications Grid = 4096
try increasing norm1 threads (256) or decreasing FFT length (2048K)
[COLOR=Red]<freaks out>[/COLOR]

e:\CudaLucas\CL0>
[/CODE] |
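The failure mode described above — device properties only being queried on the code path that prints them — is a classic pattern. A hypothetical sketch (Python stand-ins, not the actual init_device code):

```python
def query_properties():
    # stand-in for cudaGetDeviceProperties()
    return {'name': 'GeForce GTX 580', 'maxThreadsPerBlock': 1024}

def init_device_buggy(print_device_info):
    props = None
    if print_device_info:          # properties fetched only on the print path...
        props = query_properties()
        print(props)
    return props                   # ...so PrintDeviceInfo=0 leaves them unset

def init_device_fixed(print_device_info):
    props = query_properties()     # always fetch; only the printing is optional
    if print_device_info:
        print(props)
    return props
```

With the buggy version, every later check against the (missing) limits — threads per block, memory, grid sizes — fails in confusing ways, which matches the "over specifications" errors above.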
LaurV, Good find!
Found the problem in the init_device function. I tested it, but please test again, thanks! :smile: Committed the change and updated the .exe [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]files[/URL]. Edit: [QUOTE=Prime95;360761]I don't disagree, except that we are already vulnerable. Prime95 uses option 3 with "secret" code. Will we make matters any worse by giving CUDALucas the exact same vulnerability?[/QUOTE] [QUOTE=chalsall;360762]Perhaps this vulnerability should be closed.[/QUOTE] I like the idea of keeping the code open. I think the changes to Primenet/G72 are a great option for now, but they require a reasonable amount of work, right? (And there is no guarantee that someone getting assignments would use the right option anyway). As such, I think leaving things as they are, may work best and once CUDALucas is stable and produces reliable results we can readdress the need for the secret code. Thoughts? Also, CUDA 6 is going to (potentially significantly) change CUDALucas. This is one reason I don't think making big changes right now is a good idea. |
Just for the record, and as a guy who makes a living writing code, I have nothing against "secret" CRCs. A small function in a DLL that cudaLucas can call into to generate some key, which may also depend on the assignment key (if the work was "legally" reserved). It could call the function every 1M iterations, and each time add a few characters to the key string. At the end, the keys would be easy to verify without redoing the whole work. We should not be afraid of "vulnerabilities", and it does not need to be anything very complicated. Prime95 is fine as it is.
My point is that people who [U]know[/U] how to exploit the vulnerability are too clever and too mature to use the exploit; they are "above" the "credit hunting fever". You don't get money for it (you can not "fake" a prime, for example - it will be verified by others immediately), and you don't even get "fame"; on the contrary, someone may realize you are cheating the system and you will have more to lose and suffer from the community. The "guarding" has to be against "childish" and "cmd*-like" stuff, like editing a text line and reporting two times, which anybody could do. (I wanted to write "any kid", but realized that kids today are so clever... hehe...) (* for the new users here, "cmd" is a mersenneforum user who liked to do this kind of stupid thing, like adding all numbers with 37 digits to factorDB) |
Did [URL="http://www.mersenne.org/report_exponent/?exp_lo=37500769&exp_hi=&B1=Get+status"]37500769[/URL] a few times; it is already "multiple" checked :smile: (I changed the version of cL to 2.05 on the way to a triple test; the first was a mismatch, then I realized that it didn't resume the work, but started from scratch. As the other test was almost finished, I decided to go back to 2.04 and finish both of them.) So now I got the same residue 3 times, so mine is good, for sure! Bookmarked, waiting for some P95 checkers to confirm...
Release 50 seems to work OK. I like the interactive options, and the thread tuning. The new drivers are a bit slower for older cards (GTX 580), but this is compensated by the tuning features and other small nice things... Good job! |
When I get some time, I'm going to try and compile from other CUDA versions to see if that helps with speed at all. I've gotten away from tracking the exact iteration times because I want to get the program stable. Maybe I'll worry about the speed when I'm racing to beat you all for DCing the next MP :razz:
So far when testing 2.05, the mismatches already had two LLs done, so my DC matched one. I completed [URL="http://www.mersenne.org/report_exponent/?exp_lo=30612941&exp_hi=&B1=Get+status"]30612941[/URL] yesterday and it was a mismatch. petrw1 picked up the assignment for DC, so I'll have to wait until it's done. I could run it for a TC, maybe tomorrow? For 2.05's sake, hopefully the DC matches mine. |
[QUOTE=Prime95;360756]We have 3 options that I see:
1) Disallow CUDALucas from double-checking CUDALucas results (the status quo). 2) Allow double-checks as long as the shift counts are different. The downside: it is real easy to forge a double-check. 3) Add a security code to the CUDALucas final result output (a simple hash of the exponent and shift count). This code could be secret, which isn't allowed if CUDALucas is GPL. Those building executables would be entrusted with the secret code. Or, the code can be public. At least a forger of results has to go to the trouble of reading C code. There is an optional add-on to options 2 and 3. The Primenet server and/or GPU72 server can be upgraded to only give double-check exponents that were first tested by prime95.[/QUOTE] The code that generates the security code could be a separate application. |
[QUOTE=LaurV;360902]Did [URL="http://www.mersenne.org/report_exponent/?exp_lo=37500769&exp_hi=&B1=Get+status"]37500769[/URL] few times, it is already "multiple" checked :smile: [/QUOTE]
Yay... I somehow got it. I'll probably toss it on my 7770. |
[QUOTE=kracker;360921]Yay... I somehow got it. I'll probably toss it on my 7770.[/QUOTE]
If you do that, you will get no credit. PrimeNet will not accept a result for this expo if it does not come from P95. That is the reason for all the discussions about shifts and secret CRCs. In the past we had a thread for this type of exponent (ones which were DC-ed and TC-ed [U]with CudaLucas[/U], and still gave different residues from the original test), to warn potential crunchers that they must use P95 for them, otherwise they waste their time. I was even the moderator and did the maintenance for that thread, but since the rights were restricted, the thread is forgotten and I can not find it. |
[QUOTE=LaurV;360925]If you do that, you will get no credit. PrimeNet will not accept a result for this expo, if it does not come from P95. That is why all the discussions about shifts and secret CRCs. In the past we had a thread for this type of exponents (which were DC-ed and TC-ed [U]with CudaLucas[/U], and still gave different residues from the original test), to warn the potential crunchers that they must use P95 for them, otherwise they waste the time. I even was moderator and did the maintenance for that thread, but since the rights were restricted, the thread is forgotten and I can not find it.[/QUOTE]
I use clLucas. I guess it is the same as CudaLucas, or is it considered different? EDIT: "if it does not come from P95" Sorry, didn't see that. Tossing the exponent on a CPU. EDIT2: [URL="http://mersenneforum.org/showthread.php?t=16281"]This?[/URL] |
[QUOTE=kracker;360926]EDIT: "if it does not come from P95" Sorry didn't see that. Tossing the exponent on a CPU.
[/QUOTE] Yes, one of the checks, either the first LL or the DC, must come from P95; there is no other way, because until now (i.e. before CUDALucas 2.05, done last week) only P95 implemented the random shift. All other programs used shift zero (clLucas included), and in case of an FFT error they would all get the same (wrong) residue. [QUOTE] EDIT2: [URL="http://mersenneforum.org/showthread.php?t=16281"]This?[/URL][/QUOTE]Yes, thanks, didn't really look (1.55 A.M. here!), had a bookmark, which "expired", hehe. I will update the bookmark. You should read it, it has some basic theory which will help you understand the shifting process :razz: |
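For readers wondering what the shift actually buys: the LL iteration is carried on a bit-rotated copy of the usual values, so an FFT bug is exercised on different data, yet the final residue is unchanged after un-rotating. A toy model with plain Python integers (nothing like the actual balanced-digit FFT arithmetic in CUDALucas or Prime95):

```python
def ll_residue(p, shift=0):
    """Lucas-Lehmer test of M_p = 2^p - 1, seeding with 4 rotated left by `shift` bits."""
    M = (1 << p) - 1
    c = shift % p
    s = (4 << c) % M                      # shifted seed: 2^c * 4 (mod M_p)
    for _ in range(p - 2):
        # (2^c * s_k)^2 - 2^(2c+1) = 2^(2c) * (s_k^2 - 2), so the shift
        # doubles every iteration (mod p, since 2^p = 1 mod M_p)
        s = (s * s - (1 << ((2 * c + 1) % p))) % M
        c = (2 * c) % p
    return (s * pow(2, p - c, M)) % M     # rotate right by c to undo the shift
```

Any two runs with different shifts must agree after un-rotation, which is exactly why a matched DC done at a different offset is meaningful, while a forged report could just copy the residue and edit the offset.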
[QUOTE=owftheevil;360911]The code that generates the security code could be a separate application.[/QUOTE]
Would it be a program called by CUDALucas or could it be in a .dll file? Do you have something in mind? |
[QUOTE=flashjh;360932]Would it be a program called by CUDALucas or could it be in a .dll file? Do you have something in mind?[/QUOTE]
Could be either, or something else, and yes I have something in mind. |
FWIW, I have run one DC on my GTX 580, and two on the 570. All have matched. All were single first-time LL's. These runs included both intentional restarts, and program crash restarts. Various interactive features were invoked, and I played with clock speeds for both core and VRAM in the course of the tests.
|
I haven't been around for a while, but I recently started TF'ing with mfaktc on a 580, 670 and C2075. Now, with the latest updates to CUDALucas 2.05, I am actually able to run it on these GPUs (although the 580 has random runtime API errors if FFT length is not tweaked).
Are results from CUDALucas 2.05 accepted at PrimeNet? I completed an exponent in a few days, and the system didn't find any CUDALucas lines in the results.txt, but there was a line there indicating version 2.05. Also, are there any tests I should run and report here? |
1 Attachment(s)
Welcome back!
A few of us have some particular FFTs that won't work, and the issue with the program stopping will get addressed. We're using a [URL="http://www.mersenneforum.org/showpost.php?p=360417&postcount=2030"][COLOR=#0066cc]simple loop[/COLOR][/URL] to keep CUDALucas going if it stops until the code is fixed. As for tests: 1) For your cards, you should run the batch file attached for each card. It will take a while and some of the FFTs may fail as you've experienced, but it will create two files that help fine-tune CUDALucas for each card. 2) Run the built-in [URL="http://www.mersenneforum.org/showpost.php?p=359754&postcount=2003"][COLOR=#0066cc]memtest[/COLOR][/URL]: CUDALucas -memtest k n. Read the threads from mid-November until now for more info. 3) Run the built-in test CUDALucas -r. Make sure all residues match. The results are accepted as long as the exponent(s) don't already have a CUDALucas/mlucas residue. Download the latest version from [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"][COLOR=#0066cc]sourceforge[/COLOR][/URL] and it will format the results.txt file correctly. Use that format to fix up any previous results. If you have any bugs/suggestions, let us know. Thanks for testing and your contribution. |
[QUOTE=flashjh;360992]The results are accepted as long as the exponent(s) don't already have a CUDALucas/mlucas [U]same[/U] residue [U](i.e. different residues are accepted, the server can't know which one is good, until DC-ed) and as long as you don't use the "user/computer/timestamp" option of cudaLucas. You can use the manual report form to report the results[/U].[/QUOTE]
The underlined text is mine. The rest is as Jerry said. |
Said much better, thanks.:smile:
|
[QUOTE=owftheevil;360911]The code that generates the security code could be a separate application.[/QUOTE]
The API to call that program wouldn't be secret though and that could probably be abused. |
Thanks for the advice. I was running r47, so the formatting changes were not included. Once I reformatted the results.txt, [URL="http://www.mersenne.org/report_exponent/?exp_lo=56803127&exp_hi=&B1=Get+status"]PrimeNet recognized it[/URL]. I left the "AID" part at the end of the line. Should I run the same exponent again just to verify? This result is from the 670. I'll have another result in just 92 hours! :smile:
All cards pass all residue tests (CUDALucas -r). I ran a few very short memory tests (e.g. -memtest 6 2), and a longer one is presently running on the 580. One thing to note about the 580 that always has runtime API errors is that it is also the display card. Often the driver stops responding and recovers (331.82). The other two cards on which I have never seen a runtime error (yet) are on two different machines and are not the display cards. I ran the batch script, which generated the fft and threads .txt files, but some of the results are surprising to me. At 2592K, the optimal thread counts drop off: [CODE]...
2048  512  512  256   2.8779
2240  512  512  256   3.3209
2304  512  512  128   3.3607
2352  512  512  1024  3.8242
2592  64   32   32    3.9552
2688  64   64   32    4.6925
2880  64   32   32    4.6117
3024  64   32   32    5.1544
3136  64   32   32    4.9940
...[/CODE] That probably makes sense for a 580 with 3GB, but I just wanted to make sure. |
[QUOTE=henryzz;361055]The API to call that program wouldn't be secret though and that could probably be abused.[/QUOTE]
You are right. I thought more about it last night and came to the same conclusion. Personally, I have no problem with changing the license to account for a closed source authenticator. |
[QUOTE=chappjc;361056]I ran the batch script, which generated the fft and threads .txt files, but some of the results are surprising to me. At 2592k, the optimal threads drops off: [CODE]...
2048  512  512   256  2.8779
2240  512  512   256  3.3209
2304  512  512   128  3.3607
2352  512  512  1024  3.8242
2592   64   32    32  3.9552
2688   64   64    32  4.6925
2880   64   32    32  4.6117
3024   64   32    32  5.1544
3136   64   32    32  4.9940
...[/CODE]That probably makes sense for a 580 with 3GB, but I just wanted to make sure.[/QUOTE] That looks fishy to me. The third thread parameter flops around a lot, but the first two are usually pretty stable. How do the timings compare to the fft bench test? I'd like to see the corresponding section of <gpu> fft.txt. |
[QUOTE=chappjc;361056]<>Should I run the same exponent again just to verify? This result is from the 670. I'll have another result in just 92 hours! :smile:[/QUOTE]You can run it again, but use P95. Otherwise, just let the natural DC process test it (whenever that happens). Also, if you're in the process of 'verifying' that your cards are stable, I recommend you pull DCs from Primenet or GPU72; that way you will know whether your card is producing good results. If it mismatches, you can post it [URL="http://www.mersenneforum.org/showthread.php?p=333819&goto=newpost"]here[/URL], which will tell others not to use CUDALucas to DC/TC the exponent. Sometimes folks will do a quick run on it for you so you can see which one (or both) was wrong. You can always do another run on the GPU; Primenet won't accept the run unless the residue is different.
[QUOTE]One thing to note about the 580 that always has runtime API errors, is that it is also display card. Often the driver stops responding and recovers (331.82). The other two cards on which I have never seen a runtime error (yet) are on two different machines and are not the display cards.[/QUOTE]I have a 580 with the same issue, and others have this problem with other cards. owftheevil said it's caused by the drivers, but it will get fixed. My 580 is not the display card and it still happens. |
[QUOTE=flashjh;361062]<snip> owftheevil said it's caused by the drivers, but it will get fixed. My 580 is not the display card and it still happens.[/QUOTE] Maybe I misunderstand you, but the problem won't be fixed until Nvidia does something about their drivers. All I'm trying to do is make the batch files unnecessary for restarting CL when the error does occur. It won't take away the fft hangs, driver resets, etc. By the way, I have it working on Linux, but Windows is again another story. |
[QUOTE=owftheevil;361069]<snip> All I'm trying to do is make the batch files unnecessary for restarting CL when the error does occur. <snip> By the way I have it working on Linux, but Windows is again another story.[/QUOTE]
Ok, so you can detect and restart, but the 'real' problem is the drivers? I thought it was a good fix. Sorry for the confusion. If you have the code working for Linux, can you commit/merge it with the changes on SourceForge so I can take a look at it on Windows? |
[QUOTE=owftheevil;361060]That looks fishy to me. The third thread parameter flops around a lot, but the first two are usually pretty stable. How are the timings as compared to the fft bench test? I'd like to see the corresponding section of <gpu> fft.txt.[/QUOTE]
From "GeForce GTX 580 fft.txt":[CODE]2048  38492887  2.9761
2160  40551479  3.5742
2240  42020509  3.6679
2304  43194913  3.6846
2592  48471289  3.9861
2880  53735041  4.6150
3072  57237889  4.9730
3136  58404433  4.9740[/CODE]Do you want to see the full output from -cufftbench 2592 2592 6? |
[QUOTE=flashjh;361070]If you have the code working for Linux, can you commit/merge it with the changes on SourceForge so I can take a look at it on Windows?[/QUOTE]
Just putting this out there for thought... If you're not having fun, perhaps you should be doing something different. Clearly we're having fun here, even if some don't understand the interest, the work, or the humor.... |
[QUOTE=chappjc;361071]Do you want to see the full output from -cufftbench 2592 2592 6?[/QUOTE] Yes, that would be useful. |
1 Attachment(s)
Attached is the output of [FONT="Courier New"]CUDALucas -cufftbench 2592 2592 6[/FONT]. I don't get what it means by "best time" as it seems unrelated to the "ave time" values reported for the different threads.
|
Thanks for those results. I'm still perplexed.
The first 36 lines are only timing the two normalization kernels which are the only things that depend on the thread values being varied. The last six lines are testing a full LL iteration with the two normalization kernels, the multiplication kernel, and two ffts. |
CUDALucas 2.05 Beta r52 posted to sourceforge. New Windows executables are [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]here[/URL].
The code to exit when one of those fft hangs occurs has been deleted. The problem is that Windows resets the driver after the timeout error, and the code needs to wait and then check whether everything is ready to go. [B]Just a heads-up: the timing is now handled a little differently, so tests resumed from old savefiles will give incorrect ETAs.[/B] There is also now a simple checksum to verify the disk data, rather than the old "does the save file have the prime q in the correct location" method of verification. So that old savefiles can still be used with this new format, the check isn't enforced yet, but a warning is given if the checksums don't match. Other changes:
1. overflow error checking
2. consolidated device memory allocations, reducing the amount used slightly
3. tighter fft selection
4. better error handling
5. method for thread-testing a range instead of just a single fft (e.g. ./CUDALucas -cufftbench 8192 1 5, end of range first)
6. put the ffts back into the threads test; slower, but much more accurate results on cards used for display
Please test and post results. If anyone has verified mismatches with r50, please post, and if you get any with this version, let us know. Thanks! |
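The thread doesn't show the actual checksum algorithm r52 uses, but the idea — store a summary of the residue buffer in the savefile, recompute it on resume, and warn rather than fail on mismatch so old savefiles still load — can be sketched like this (the algorithm and all names here are illustrative assumptions, not from the real source):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative savefile checksum: a plain 32-bit sum over the residue
   words, wrapping modulo 2^32. Not the algorithm actually used by r52. */
uint32_t savefile_checksum(const uint32_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return sum;
}

/* On resume: returns 1 if the data matches the stored checksum, 0 on
   mismatch. The caller would print a warning but continue, matching
   r52's non-enforcing behavior with old savefiles. */
int savefile_check(const uint32_t *words, size_t n, uint32_t stored)
{
    return savefile_checksum(words, n) == stored;
}
```

Compared with the old "is the prime q at the right offset" test, a checksum over the whole buffer also catches corrupted residue data, not just a truncated or misaligned file.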
[QUOTE=flashjh;361490]CUDALucas 2.05 Beta r52 posted to sourceforge. New Windows executables are [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]here[/URL]. <snip>[/QUOTE] Is this a Windows-only update?
Luigi |
No, it applies to Linux also. I requested a Linux file for SourceForge, but if you can compile it yourself, the updates need testing. Thanks.
|
I'm still getting the stop error and the batch file needs to keep CUDALucas going.
This is the code identified for the error:[CODE]void reset_err (float *maxerr, float value)
{
  cutilSafeCall (cudaMemset (g_err, 0, sizeof (float)));
  *maxerr *= value;
}[/CODE] This is the screen output:[CODE]Using threads: norm1 128, mult 256, norm2 128.
C:/CUDA/CuLu/src/CUDALucas.cu(543) : cufftSafeCall() CUFFT error 9999: CUFFT Unknown error code[/CODE] |
[QUOTE][CODE]Using threads: norm1 128, mult 256, norm2 128.
C:/CUDA/CuLu/src/CUDALucas.cu(543) : cufftSafeCall() CUFFT error 9999: CUFFT Unknown error code[/CODE][/QUOTE] Interesting. The error has changed, at least from the one I saw when the program quit for me. |
It's different now because owftheevil made changes to the code. We're still seeing if we can get the program to catch and clear the fault without exiting on Windows.
|
I'm getting this error, if it's of any use to anybody:
[CODE]D:\Cuda\CUDALucas>CUDALucas_205Beta_x64_r52.exe -cufftbench 1 8192 5
------- DEVICE 0 -------
name                GeForce GTX 570
Compatibility       2.0
clockRate (MHz)     1464
memClockRate (MHz)  1900
totalGlobalMem      1342177280
totalConstMem       65536
l2CacheSize         655360
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1
CUDA bench, testing reasonable fft sizes 1K to 8192K, doing 5 passes.
fft size = 1K, ave time = 0.0273 msec, max-ave = 0.00060
fft size = 2K, ave time = 0.0329 msec, max-ave = 0.00684
fft size = 3K, ave time = 0.0716 msec, max-ave = 0.00737
fft size = 4K, ave time = 0.0540 msec, max-ave = 0.00778
fft size = 5K, ave time = 0.0529 msec, max-ave = 0.00765
fft size = 6K, ave time = 0.0525 msec, max-ave = 0.00007
fft size = 7K, ave time = 0.1198 msec, max-ave = 0.00315
fft size = 8K, ave time = 0.0514 msec, max-ave = 0.00005
fft size = 9K, ave time = 0.0540 msec, max-ave = 0.00015
fft size = 10K, ave time = 0.0605 msec, max-ave = 0.00526
fft size = 12K, ave time = 0.0639 msec, max-ave = 0.00018
fft size = 14K, ave time = 0.0599 msec, max-ave = 0.00308
fft size = 15K, ave time = 0.1332 msec, max-ave = 0.00249
fft size = 16K, ave time = 0.0682 msec, max-ave = 0.00304
fft size = 18K, ave time = 0.0624 msec, max-ave = 0.00113
fft size = 20K, ave time = 0.0726 msec, max-ave = 0.00299
fft size = 21K, ave time = 0.0738 msec, max-ave = 0.00237
fft size = 24K, ave time = 0.0891 msec, max-ave = 0.00284
fft size = 25K, ave time = 0.1378 msec, max-ave = 0.00328
fft size = 27K, ave time = 0.1417 msec, max-ave = 0.00018
fft size = 28K, ave time = 0.0928 msec, max-ave = 0.00369
fft size = 30K, ave time = 0.0948 msec, max-ave = 0.00302
fft size = 32K, ave time = 0.0824 msec, max-ave = 0.00008
fft size = 35K, ave time = 0.1550 msec, max-ave = 0.00241
fft size = 36K, ave time = 0.0995 msec, max-ave = 0.00247
fft size = 40K, ave time = 0.1051 msec, max-ave = 0.00247
fft size = 42K, ave time = 0.1085 msec, max-ave = 0.00206
fft size = 45K, ave time = 0.1684 msec, max-ave = 0.00200
fft size = 48K, ave time = 0.1081 msec, max-ave = 0.00262
fft size = 49K, ave time = 0.1212 msec, max-ave = 0.00285
fft size = 50K, ave time = 0.1188 msec, max-ave = 0.00167
fft size = 54K, ave time = 0.1316 msec, max-ave = 0.00317
fft size = 56K, ave time = 0.1183 msec, max-ave = 0.00104
fft size = 60K, ave time = 0.1417 msec, max-ave = 0.00308
fft size = 63K, ave time = 0.1869 msec, max-ave = 0.00165
fft size = 64K, ave time = 0.1429 msec, max-ave = 0.00278
fft size = 70K, ave time = 0.1678 msec, max-ave = 0.00228
fft size = 72K, ave time = 0.1714 msec, max-ave = 0.00192
fft size = 75K, ave time = 0.2364 msec, max-ave = 0.00292
fft size = 80K, ave time = 0.1697 msec, max-ave = 0.00258
fft size = 81K, ave time = 0.1969 msec, max-ave = 0.00242
fft size = 84K, ave time = 0.1873 msec, max-ave = 0.00353
fft size = 90K, ave time = 0.1956 msec, max-ave = 0.00296
fft size = 96K, ave time = 0.1912 msec, max-ave = 0.00261
fft size = 98K, ave time = 0.2060 msec, max-ave = 0.00251
fft size = 100K, ave time = 0.2082 msec, max-ave = 0.00247
fft size = 105K, ave time = 0.2809 msec, max-ave = 0.01314
fft size = 108K, ave time = 0.2220 msec, max-ave = 0.00268
fft size = 112K, ave time = 0.2066 msec, max-ave = 0.00269
fft size = 120K, ave time = 0.2396 msec, max-ave = 0.00223
fft size = 125K, ave time = 0.3224 msec, max-ave = 0.00303
fft size = 126K, ave time = 0.2600 msec, max-ave = 0.00272
fft size = 128K, ave time = 0.2473 msec, max-ave = 0.00267
fft size = 135K, ave time = 0.3416 msec, max-ave = 0.00243
fft size = 140K, ave time = 0.2864 msec, max-ave = 0.00152
fft size = 144K, ave time = 0.2645 msec, max-ave = 0.00264
fft size = 147K, ave time = 0.3659 msec, max-ave = 0.00392
fft size = 150K, ave time = 0.3193 msec, max-ave = 0.00333
fft size = 160K, ave time = 0.2903 msec, max-ave = 0.00386
fft size = 162K, ave time = 0.3330 msec, max-ave = 0.00242
fft size = 168K, ave time = 0.3331 msec, max-ave = 0.00439
fft size = 175K, ave time = 0.4022 msec, max-ave = 0.00344
fft size = 180K, ave time = 0.3385 msec, max-ave = 0.00578
fft size = 189K, ave time = 0.4385 msec, max-ave = 0.00424
fft size = 192K, ave time = 0.3540 msec, max-ave = 0.00371
fft size = 196K, ave time = 0.3763 msec, max-ave = 0.00530
fft size = 200K, ave time = 0.3905 msec, max-ave = 0.00511
fft size = 210K, ave time = 0.4171 msec, max-ave = 0.00389
fft size = 216K, ave time = 0.4135 msec, max-ave = 0.00383
fft size = 224K, ave time = 0.3805 msec, max-ave = 0.00748
fft size = 225K, ave time = 0.4789 msec, max-ave = 0.00789
fft size = 240K, ave time = 0.4466 msec, max-ave = 0.01557
fft size = 243K, ave time = 0.4917 msec, max-ave = 0.00647
fft size = 245K, ave time = 0.5389 msec, max-ave = 0.00815
fft size = 250K, ave time = 0.4767 msec, max-ave = 0.00844
fft size = 252K, ave time = 0.4824 msec, max-ave = 0.00267
fft size = 256K, ave time = 0.4456 msec, max-ave = 0.00454
fft size = 270K, ave time = 0.5332 msec, max-ave = 0.00474
fft size = 280K, ave time = 0.5253 msec, max-ave = 0.00931
fft size = 288K, ave time = 0.4752 msec, max-ave = 0.01467
fft size = 294K, ave time = 0.5797 msec, max-ave = 0.01844
fft size = 300K, ave time = 0.5838 msec, max-ave = 0.01188
fft size = 315K, ave time = 0.6671 msec, max-ave = 0.00862
fft size = 320K, ave time = 0.5398 msec, max-ave = 0.00571
fft size = 324K, ave time = 0.6093 msec, max-ave = 0.00350
fft size = 336K, ave time = 0.6200 msec, max-ave = 0.00447
fft size = 343K, ave time = 0.6894 msec, max-ave = 0.00486
fft size = 350K, ave time = 0.6783 msec, max-ave = 0.00658
fft size = 360K, ave time = 0.6460 msec, max-ave = 0.00605
fft size = 375K, ave time = 0.8148 msec, max-ave = 0.00743
fft size = 378K, ave time = 0.7359 msec, max-ave = 0.00680
fft size = 384K, ave time = 0.6703 msec, max-ave = 0.00187
fft size = 392K, ave time = 0.7014 msec, max-ave = 0.00381
fft size = 400K, ave time = 0.7023 msec, max-ave = 0.00418
fft size = 405K, ave time = 0.8098 msec, max-ave = 0.00206
C:/CUDA/CuLu/src/CUDALucas.cu(1877) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
C:/CUDA/CuLu/src/CUDALucas.cu(1886) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.[/CODE] |
I just tried r52 - no luck. I get the error "device_number >= device_count". I'm presently running CUDALucas 2.00 without problems on this Windows 7 box with a GTX 460.
|
[QUOTE=Prime95;362071]I just tried r52 - no luck. I get the error "device_number >= device_count". I'm presently running CUDALucas 2.00 without problems on this Windows 7 box with a GTX 460.[/QUOTE]I get that error if I use version 2xx.xx drivers with r52. Upgrading the drivers solved this for me.
|
I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.
[CODE]C:\Users\John\Desktop\cudalucas>CUDALucas_205Beta_x64_r52.exe -r
------- DEVICE 0 -------
name                GeForce GTX 460
Compatibility       2.1
clockRate (MHz)     1430
memClockRate (MHz)  1800
totalGlobalMem      1073741824
totalConstMem       65536
l2CacheSize         524288
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 7
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M86243 fft length = 4K
Iteration 10000 / 86243, 0x23992ccd735a03d9, 4K, CUDALucas v2.05 Beta err = 0.26563 (0:01 real, 0.0651 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M132049 fft length = 8K
Iteration 10000 / 132049, 0x4c52a92b54635f9e, 8K, CUDALucas v2.05 Beta err = 0.00046 (0:01 real, 0.0709 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M216091 fft length = 16K
Iteration 10000 / 216091, 0x30247786758b8792, 16K, CUDALucas v2.05 Beta err = 0.00001 (0:00 real, 0.0884 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M756839 fft length = 40K
Iteration 10000 / 756839, 0x5d2cbe7cb24a109a, 40K, CUDALucas v2.05 Beta err = 0.03320 (0:02 real, 0.1868 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M859433 fft length = 48K
Iteration 10000 / 859433, 0x3c4ad525c2d0aed0, 48K, CUDALucas v2.05 Beta err = 0.01074 (0:02 real, 0.1988 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M1257787 fft length = 64K
Iteration 10000 / 1257787, 0x3f45bf9bea7213ea, 64K, CUDALucas v2.05 Beta err = 0.10938 (0:03 real, 0.2440 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M1398269 fft length = 128K
Iteration 10000 / 1398269, 0xa4a6d2f0e34629db, 128K, CUDALucas v2.05 Beta err = 0.00000 (0:04 real, 0.4409 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M2976221 fft length = 256K
Iteration 10000 / 2976221, 0x2a7111b7f70fea2f, 256K, CUDALucas v2.05 Beta err = 0.00001 (0:09 real, 0.8995 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M3021377 fft length = 256K
Iteration 10000 / 3021377, 0x6387a70a85d46baf, 256K, CUDALucas v2.05 Beta err = 0.00001 (0:09 real, 0.8994 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M6972593 fft length = 512K
Iteration 10000 / 6972593, 0x88f1d2640adb89e1, 512K, CUDALucas v2.05 Beta err = 0.00011 (0:18 real, 1.7766 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M13466917 fft length = 1024K
Iteration 10000 / 13466917, 0x9fdc1f4092b15d69, 1024K, CUDALucas v2.05 Beta err = 0.00009 (0:37 real, 3.6937 ms/iter)
This residue is correct.
The fft length 2048K is too large for exponent 20996011, decreasing to 1024K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M20996011 fft length = 1024K
Iteration 10000 / 20996011, 0x2a354d3a0f96e64e, 1024K, CUDALucas v2.05 Beta err = 0.50000 (0:37 real, 3.6876 ms/iter)
[COLOR=red]Expected residue [5fc58920a821da11] does not match actual residue [2a354d3a0f96e64e][/COLOR]
The fft length 2048K is too large for exponent 24036583, decreasing to 1024K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M24036583 fft length = 1024K
Iteration 10000 / 24036583, 0x47fba1785d32a924, 1024K, CUDALucas v2.05 Beta err = 1.00000 (0:51 real, 5.1785 ms/iter)
[COLOR=red]Expected residue [cbdef38a0bdc4f00] does not match actual residue [47fba1785d32a924][/COLOR]
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M25964951 fft length = 2048K
Iteration 10000 / 25964951, 0x62eb3ff0a5f6237c, 2048K, CUDALucas v2.05 Beta err = 0.00008 (1:14 real, 7.4363 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M30402457 fft length = 2048K
Iteration 10000 / 30402457, 0x0b8600ef47e69d27, 2048K, CUDALucas v2.05 Beta err = 0.00131 (1:15 real, 7.4195 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M32582657 fft length = 2048K
Iteration 10000 / 32582657, 0x02751b7fcec76bb1, 2048K, CUDALucas v2.05 Beta err = 0.00537 (1:14 real, 7.4358 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M37156667 fft length = 2048K
Iteration 10000 / 37156667, 0x67ad7646a1fad514, 2048K, CUDALucas v2.05 Beta err = 0.11719 (1:14 real, 7.4356 ms/iter)
This residue is correct.
The fft length 4096K is too large for exponent 42643801, decreasing to 2048K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M42643801 fft length = 2048K
Iteration 10000 / 42643801, 0x93ec1e0141513b57, 2048K, CUDALucas v2.05 Beta err = 1.00000 (1:15 real, 7.4357 ms/iter)
[COLOR=red]Expected residue [8f90d78d5007bba7] does not match actual residue [93ec1e0141513b57][/COLOR]
The fft length 4096K is too large for exponent 43112609, decreasing to 2048K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M43112609 fft length = 2048K
Iteration 10000 / 43112609, 0x93f526f2d01c1686, 2048K, CUDALucas v2.05 Beta err = 1.00000 (1:14 real, 7.4352 ms/iter)
[COLOR=red]Expected residue [e86891ebf6cd70c4] does not match actual residue [93f526f2d01c1686][/COLOR]
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M57885161 fft length = 4096K
Iteration 10000 / 57885161, 0x76c27556683cd84d, 4096K, CUDALucas v2.05 Beta err = 0.00076 (2:37 real, 15.7022 ms/iter)
This residue is correct.
[COLOR=red]Error: There were 4 bad selftests![/COLOR]
C:\Users\John\Desktop\cudalucas>pause
Press any key to continue . . . [/CODE] |
I can't speak for the bad self test yet, but the other problems are probably from the driver version, as stated above. I build with CUDA 5.5 now. If you need a different version let me know and I'll try to build one. Otherwise, updating to the newest drivers should fix the problem.
The bad self test may have something to do with FFT selection. We'll look at it. |
[QUOTE=mognuts;362083]I get that error if I use version 2xx.xx drivers with r52. Upgrading the drivers solved this for me.[/QUOTE]
I'm using driver 311.06. I'll try a newer one. |
[QUOTE=mognuts;362084]I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.[/QUOTE]
FWIW, my GTX460 passes the selftest. I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K. |
[QUOTE=Prime95;362117]FWIW, my GTX460 passes the selftest.
I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K.[/QUOTE] -cufftbench is broken for me with r52. It crashes but doesn't bring down the driver. Makes no difference if I'm benchmarking a range of FFTs, or threads for a given FFT. r50 was fine. |
A lot of code was rewritten for r52, so it will need some debugging. Keep posting errors and bugs, thanks :smile:
|
[QUOTE=mognuts;362084]I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions. <snip>[/QUOTE] This should be fixed with r53. I forgot to reinitialize a pointer after freeing the memory. |
[QUOTE=Prime95;362117]FWIW, my GTX460 passes the selftest.
I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K.[/QUOTE] Found the problem. I was making the silly assumption that limits would always be powers of 2. I should have the time to fix it tonight. |
[QUOTE=mognuts;362127]-cufftbench is broken for me with r52. It crashes but doesn't bring down the driver. Makes no difference if I'm benchmarking a range of FFTs, or threads for a given FFT. r50 was fine.[/QUOTE]
Crashes how? |
[QUOTE=owftheevil;362199]Crashes how?[/QUOTE]
This is the console output:
[CODE]C:\Users\John\Desktop\cudalucas>CUDALucas_205Beta_x64_r52 -cufftbench 2048 1 5
------- DEVICE 0 -------
name                GeForce GTX 460
Compatibility       2.1
clockRate (MHz)     1430
memClockRate (MHz)  1800
totalGlobalMem      1073741824
totalConstMem       65536
l2CacheSize         524288
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 7
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    512
deviceOverlap       1

Thread bench, testing various thread sizes for ffts 1K to 2048K, doing 5 passes.
fft size = 1K, ave time = 6.5100 msec, Norm1 threads 32, Norm2 threads 32
fft size = 1K, ave time = 6.5094 msec, Norm1 threads 32, Norm2 threads 64
fft size = 1K, ave time = 6.5098 msec, Norm1 threads 32, Norm2 threads 128
fft size = 1K, ave time = 6.5084 msec, Norm1 threads 32, Norm2 threads 256
fft size = 1K, ave time = 6.5089 msec, Norm1 threads 32, Norm2 threads 512
fft size = 1K, ave time = 6.5085 msec, Norm1 threads 32, Norm2 threads 1024
fft size = 1K, ave time = 6.5084 msec, Norm1 threads 64, Norm2 threads 32
fft size = 1K, ave time = 6.5088 msec, Norm1 threads 64, Norm2 threads 64
fft size = 1K, ave time = 6.5085 msec, Norm1 threads 64, Norm2 threads 128
fft size = 1K, ave time = 6.5087 msec, Norm1 threads 64, Norm2 threads 256
fft size = 1K, ave time = 6.5080 msec, Norm1 threads 64, Norm2 threads 512
fft size = 1K, ave time = 6.5087 msec, Norm1 threads 64, Norm2 threads 1024
fft size = 1K, ave time = 6.5084 msec, Norm1 threads 128, Norm2 threads 32
fft size = 1K, ave time = 6.5080 msec, Norm1 threads 128, Norm2 threads 64
fft size = 1K, ave time = 6.5082 msec, Norm1 threads 128, Norm2 threads 128
fft size = 1K, ave time = 6.5079 msec, Norm1 threads 128, Norm2 threads 256
fft size = 1K, ave time = 6.5082 msec, Norm1 threads 128, Norm2 threads 512
fft size = 1K, ave time = 6.5072 msec, Norm1 threads 128, Norm2 threads 1024
fft size = 1K, ave time = 6.5090 msec, Norm1 threads 256, Norm2 threads 32
fft size = 1K, ave time = 6.5091 msec, Norm1 threads 256, Norm2 threads 64
fft size = 1K, ave time = 6.5085 msec, Norm1 threads 256, Norm2 threads 128
fft size = 1K, ave time = 6.5088 msec, Norm1 threads 256, Norm2 threads 256
fft size = 1K, ave time = 6.5078 msec, Norm1 threads 256, Norm2 threads 512
fft size = 1K, ave time = 6.5085 msec, Norm1 threads 256, Norm2 threads 1024
fft size = 1K, ave time = 6.5099 msec, Norm1 threads 512, Norm2 threads 32
fft size = 1K, ave time = 6.5098 msec, Norm1 threads 512, Norm2 threads 64
fft size = 1K, ave time = 6.5093 msec, Norm1 threads 512, Norm2 threads 128
fft size = 1K, ave time = 6.5098 msec, Norm1 threads 512, Norm2 threads 256
fft size = 1K, ave time = 6.5096 msec, Norm1 threads 512, Norm2 threads 512
fft size = 1K, ave time = 6.5099 msec, Norm1 threads 512, Norm2 threads 1024
fft size = 1K, ave time = 5.9309 msec, Norm1 threads 128, Mult threads 32, Norm2 threads 1024
fft size = 1K, ave time = 5.9307 msec, Norm1 threads 128, Mult threads 64, Norm2 threads 1024
fft size = 1K, ave time = 5.9311 msec, Norm1 threads 128, Mult threads 128, Norm2 threads 1024
fft size = 1K, ave time = 5.9318 msec, Norm1 threads 128, Mult threads 256, Norm2 threads 1024
Best time for fft = 1K, time: 5.9307, t0 = 128, t1 = 64, t2 = 1024
[/CODE]
Followed by a dialogue box containing the following text:
[B]CUDALucas_205Beta_x64_r52.exe has stopped working[/B]
A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.
This happens regardless of the parameters used for -cufftbench. |
On a more positive note, r52 correctly found 3 known primes.:showoff:
M( 11213 )P, n = 1K, CUDALucas v2.05 Beta
M( 1257787 )P, n = 64K, CUDALucas v2.05 Beta
M( 2976221 )P, n = 256K, CUDALucas v2.05 Beta |
R53 is up, fixing the sparse <gpu> fft.txt file issue, the uninitialized pointer causing mismatched residues in the self-test, an incorrect fft length in the threads bench, and a bad boundary-case condition in the fft initialization.
@mognuts: I could not get the behaviour your 460 showed to happen, so I don't know if the problem is fixed or not. Windows version is not up yet. |
[QUOTE=owftheevil;362287]R53 is up, fixing the sparse <gpu> fft.txt file issue, the uninitialized pointer causing mismatched residues in the self-test, an incorrect fft length in the threads bench, and a bad boundary-case condition in the fft initialization.
@mognuts: I could not get the behaviour your 460 showed to happen, so I don't know if the problem is fixed or not. Windows version is not up yet.[/QUOTE] You are referring to CUDALucas, not CUDAPm1 issues, aren't you? Luigi |
r53 is on [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL].
[B].ini file is updated, please re-download.[/B] Output formatting can be customized now. Please run the tests in this [URL="http://www.mersenneforum.org/showthread.php?p=360992#post360992"][COLOR=#0066cc]post[/COLOR][/URL] and continue to post any issues or bugs. Thanks!
[QUOTE=ET_;362291]You are referring to CUDALucas, not CUDAPm1 issues, aren't you? Luigi[/QUOTE]
Yes |
Posted Win32 .exe files on SourceForge - first time I've built Win32 with 2.05 Beta, please test accordingly.
|
[URL="http://www.mersenneforum.org/26926727"]Successful[/URL] test of Win32 version of r53
I am now able to build CUDA version 4.0 and up, 64 bit only; if anyone needs a version, let me know. |
[URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL] updated with latest commit, currently r55. Minor formatting changes and updated makefile.win file to allow for Win32 or x64 compiles with CUDA 4.0 up to 5.5.
If anyone wants help compiling with make or in MSVS, let me know. Had another successful DC with Win32 version. With the help of petrw1 I have 23/24 good DCs. The bad one was probably caused by all my stopping/starting while compiling, etc. None the less, that's why we DC. |
1 Attachment(s)
[QUOTE=mognuts;362210]This is the console output:[CODE]<snip>
[/CODE]
Followed by a dialogue box containing the following text:
[B]CUDALucas_205Beta_x64_r52.exe has stopped working[/B]
A problem caused the program to stop working correctly. Windows will close the program and notify you if a solution is available.
This happens regardless of the parameters used for -cufftbench.[/QUOTE]
I am running tests to cause the NVIDIA Windows Kernel Mode Driver failure. Testing all versions of NVidia WHQL drivers since 296.10. Those results later...
@mognuts, I was able to (accidentally) reproduce the results you experienced.
@owftheevil
- Anytime I run -cufftbench fft# [B]smallerfft#[/B] 1 it causes CUDALucas to crash like mognuts experienced
- When I run -cufftbench fft# fft# any# it skips [U]some[/U] of the fft tests completely
See the attached file for a screenshot and the bench.txt output for the skipped tests. I included the .exe file I'm using for testing. I'm currently on 314.22, but it doesn't seem to matter what driver I use. |
I'll take a look.
New commit r56 fixes a regression concerning command-line input. Try to specify a nonstandard fft like 3150k and you'll see what I'm talking about. |
Windows r56 executables posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL]
|
[QUOTE=flashjh;362676]Windows r56 executables posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL][/QUOTE]
r56 just successfully completed double check of [URL="http://www.mersenne.org/report_exponent/?exp_lo=31010747&exp_hi=31010747&B1=Get+status"]31010747[/URL]. |
[QUOTE=mognuts;362958]r56 just successfully completed double check of [URL="http://www.mersenne.org/report_exponent/?exp_lo=31010747&exp_hi=31010747&B1=Get+status"]31010747[/URL].[/QUOTE]
Through the run I had a couple of these, but it didn't affect the result, or cause the drivers to stop working.
[CODE]| Dec 26 22:33:05 | M 54297883 2820000 0xa5d98db4daef2036 | 3136K 0.06641 5.3687 53.68s | 3:04:59:19 5.19% |
| Dec 26 22:33:59 | M 54297883 2830000 0x4a7f94d8efb62886 | 3136K 0.06641 5.3695 53.69s | 3:04:58:22 5.21% |
| Date    Time    | Test Num  Iter    Residue            | FFT   Error  ms/It  Time   | ETA        Done  |
| Dec 26 22:34:53 | M 54297883 2840000 0xa2c3927d7aefb869 | 3136K 0.06641 5.3689 53.68s | 3:04:57:25 5.23% |
| Dec 26 22:35:47 | M 54297883 2850000 0xf5b7f62de86145e4 | 3136K 0.06641 5.3705 53.70s | 3:04:56:28 5.24% |
C:/CUDA/CuLu/src/CUDALucas.cu(1509) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
Resetting device and restarting from last checkpoint.
Using threads: norm1 256, mult 256, norm2 1024.
C:/CUDA/CuLu/src/CUDALucas.cu(891) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.[/CODE] |
In 64-bit Linux, r56 segfaults when I run with -r, but has correctly completed a number of double-checks. r59 simply exits without starting a test from worktodo.txt.
|
Does r59 work ok with -r?
Edit: For the exiting without starting a test in r59, on line 3346 in CUDALucas.cu, take the negation off of get_next_assignment. That should do it, although I can't check it myself right now. |
No, r59 crashes as well.
[CODE]Program received signal SIGSEGV, Segmentation fault.
0x000000000040420f in init_lucas (x_packed=0x6ddba0, q=86243, n=0x7fffffffe044, j=0x7fffffffe040,
    offset=0x7fffffffe03c, total_time=0x0, time_adj=0x0, iter_adj=0x0) at CUDALucas.cu:1317
1317        *time_adj = *total_time;
(gdb) bt
#0  0x000000000040420f in init_lucas (x_packed=0x6ddba0, q=86243, n=0x7fffffffe044, j=0x7fffffffe040,
    offset=0x7fffffffe03c, total_time=0x0, time_adj=0x0, iter_adj=0x0) at CUDALucas.cu:1317
#1  0x000000000040bb65 in check_residue (ls=0) at CUDALucas.cu:2624
#2  0x000000000040df57 in main (argc=2, argv=0x7fffffffe1f8) at CUDALucas.cu:3334
(gdb) print total_time
$1 = (unsigned long long *) 0x0
(gdb) print time_adj
$2 = (unsigned long long *) 0x0
[/CODE]
So time_adj is a null pointer. |
Thanks frmky, your information was very useful. r60 should fix these bugs.
|
Getting close to release!
r60 compiled and tested (still needs more). CUDA 4.2 up to 5.5 all working, release and debug. All posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL]
This version (and r57 and up) includes new rcb code from Prime95 that gives about a 1% speed improvement! Exciting for CUDALucas, but it does need testing, please. In my testing, CUDA 5.5 and Win32 are slightly faster than earlier versions or x64 (but you may need a batch file to keep it going, see below).

What works: -cufftbench, -r, normal testing
What doesn't: -threadbench
Didn't test: -memtest

[U][B]For those experiencing stops: this is an nVidia driver issue. Here is some info, and I included some workarounds.[/B][/U]
<=306.97 work with x86/x64 CUDA 4.2 and CUDA 5.0 builds perfectly fine and produce no restarts (at least none from my testing over several days). >=310.70 have resets no matter what platform/CUDA version, including 5.5 with >=320.18. There are two workarounds for anyone experiencing a similar problem described by [URL="http://www.mersenneforum.org/showthread.php?p=362968#post362968"]mognuts[/URL]:
1) The best way to fix the error is to downgrade your driver to one of the versions <=306.97 as mentioned above. CUDA driver versions:
[CODE]CUDA 5.5:           CUDA 5.0            CUDA 4.2
331.82  19-Nov-13   314.22  25-Mar-13   301.42  22-May-12
331.65  07-Nov-13   314.07  18-Feb-13   296.10  13-Mar-12
331.58  21-Oct-13   310.90  05-Jan-13   295.73  21-Feb-12
327.23  19-Sep-13   310.70  17-Dec-12   285.62  24-Oct-11
320.49  01-Jul-13   [B]306.97  10-Oct-12[/B]   280.26  09-Aug-11
320.18  23-May-13   306.23  13-Sep-12   275.33  01-Jun-11
[/CODE]
I did not actually test below 296.10, so I don't know where the CUDA changes over to < CUDA 4.2, but I figure most will be on 296.10 by now. Windows CUDALucas builds from CUDA 4.0 up to 5.5, 32 or 64 bit, are on SourceForge.
Request: I need to know who else is having the *stop* issue and what driver and video card you have. I'm working with NVidia to try and get the drivers fixed, so it will be helpful to know what other cards have this issue.
2) The other 'fix' for this issue is to use a batch file similar to this:
[CODE]@echo off
Set count=0
Set program=CUDALucas2.05Beta-CUDA5.0-Win32-r60
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
rem echo %count% >> log.txt
rem echo %count%
%program%.exe
GOTO loop[/CODE]
This will restart CUDALucas each time it stops and allow you to see how many resets have occurred, if you care.
I have not been able to thoroughly test speeds yet; I know that CUDA 5.5 is usually faster, but at the cost of having the driver lock up. Combined with the batch file, there really is no issue other than if the restarts bother you, as I've run many good DCs with the batch file. With <=306.97 you don't need the batch file and there are no restarts, but it could potentially be *slightly* slower. I would love to see actual test data from everyone. Also, if anyone does experience the *stop* while on <=306.97, please let me know ASAP so I can update this info and nVidia.
As for reliability, I have completed many successful tests with 2.05 Beta, CUDA 4.0 up to 5.5, 32 and 64 bit. Many with a lot of stopping/restarting and forced FFT size changes for testing the code. :smile:
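For anyone on Linux builds where a batch file won't help, the same keep-it-alive workaround can be sketched in Python. This is only an illustration, not part of CUDALucas; the executable path in the commented call is an assumption you should point at your own build:

```python
import subprocess

def keep_alive(cmd, max_restarts=None):
    """Re-launch `cmd` every time it exits, mirroring the batch-file loop.

    Returns the number of launches. `max_restarts` bounds the loop
    (None = keep going until Ctrl-C), which also makes it easy to test.
    """
    count = 0
    while max_restarts is None or count < max_restarts:
        count += 1
        print(f"launch #{count}: {' '.join(cmd)}", flush=True)
        try:
            subprocess.run(cmd, check=False)  # blocks until the run stops or crashes
        except KeyboardInterrupt:
            break  # let Ctrl-C stop the wrapper instead of relaunching
        except FileNotFoundError:
            break  # nothing to restart if the binary isn't found
    return count

# keep_alive(["./CUDALucas", "-d", "0"])  # assumed path and arguments
```

The launch counter printed on each pass plays the same role as the TITLE reset count in the batch file; like the batch file, this only papers over the driver resets rather than fixing them.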
[URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]r62 posted[/URL] to fix the -threadbench problem
Usage for testing:
[B]CUDALucas -cufftbench lb ub p (e.g. CUDALucas -cufftbench 1 8192 6)[/B]
It gives a warning if either lb or ub is not a power of two. It works when they are not, but non-optimal lengths near the edges of the range are likely to be included in <gpu> fft.txt.
[B]CUDALucas -threadbench lb ub p m (e.g. CUDALucas -threadbench 1 8192 6 1)[/B]
The new parameter m (usually 0 or 1) controls a little bit of the behavior of the test: m = 0 causes all reasonable fft lengths (n a multiple of 1K, largest prime factor of n is 7) between lb * 1K and ub * 1K to be tested; m = 1 tests only the lengths in <gpu> fft.txt and the table in init_ffts.
When testing the new versions, run:
CUDALucas -r
CUDALucas -cufftbench 1 8192 6
CUDALucas -threadbench 1 8192 6 1
You can also run a [URL="http://www.mersenneforum.org/showthread.php?p=359754#post359754"]memtest[/URL]:
CUDALucas -memtest k n
where k * 25 MB of memory are tested and n * 10000 iterations are done for each of 5 data types at each of the k positions |
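A note on the m = 0 "reasonable fft lengths" rule above: reading "largest prime factor of n is 7" as at most 7 (i.e. n is 7-smooth, which matches lengths like 1K being benched), the test can be sketched in Python like this. This is an illustration only, not code from CUDALucas:

```python
def is_reasonable_fft(n_k):
    """True if an fft length of n_k * 1K is 'reasonable' in the m = 0 sense:
    n_k must be 7-smooth, i.e. have no prime factor larger than 7."""
    if n_k < 1:
        return False
    for p in (2, 3, 5, 7):
        while n_k % p == 0:
            n_k //= p
    return n_k == 1  # fully factored over {2, 3, 5, 7}

# Candidate lengths (in K) that a "-threadbench 1 8192 6 0" run would cover:
candidates = [n for n in range(1, 8193) if is_reasonable_fft(n)]
```

For example, the nonstandard length 3150K mentioned earlier in the thread qualifies (3150 = 2 x 3^2 x 5^2 x 7), while any multiple of 11 or a larger prime does not.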
I got the following output:
[CODE] -- polite interval increased to 2 -- error_reset increased to 95 [/CODE] What does that mean? |
Can you post your Cudalucas.ini and the command line you're using to run the program?
|
1 Attachment(s)
I am running it as
[CODE] CUDALucas -d 0 [/CODE] |