[QUOTE=flashjh;364575][URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]r62 posted[/URL] to fix the -threadbench problem[/QUOTE]
threadbench fails on my cards under Linux (64-bit). My maxGridSize[0] is 2[SUP]31[/SUP]-1, and 1024*maxGridSize[0] overflows the 32-bit integer type. [CODE]--- CUDALucas.cu.orig
+++ CUDALucas.cu
@@ -2098,7 +2098,7 @@
   fft = choose_fft_length(0, &j);
   while(fft <= n) {
-    if(isReasonable(fft) <= 1 && fft <= 1024 * g_dev.maxGridSize[0] && fft % 1024 == 0) {
+    if(isReasonable(fft) <= 1 && fft / 1024 <= g_dev.maxGridSize[0] && fft % 1024 == 0) {
       cufftSafeCall (cufftPlan1d (&g_plan, fft / 2, CUFFT_Z2Z, 1));
       for(k = 0; k < 2; k++) {
         for (t = s[k]; t < e[k]; t++) {[/CODE]If Nvidia wants to raise the grid sizes any higher in future hardware, they'll need to change the API (cuda_runtime_api.h declares maxGridSize as int).
Thanks for pointing that out. It was fixed in r62 but, through some inattention, found its way back into r63.
Edit: r64 is up, refixing the issue.
[QUOTE=patrik;314706]I just completed a successful double-check of M33273391 on my GPU with its memory downclocked to 1800 MHz. This seems to be the solution for this GPU.
The main problem for me was that I had never underclocked (or overclocked) a GPU before, so I had to learn that nvidia had some tools that I could download.[/QUOTE] This card has now been producing results for almost one and a half years, mostly DC-ing in the 10M digit range (exponents just above 33.2M). However, a few weeks ago Primenet was out of exponents in that range, so I got exponents around 34.4M. This made CudaLucas (2.03) select a different FFT length. All eight tests I ran for that range resulted in a non-matching residue, four of which were later confirmed bad by mprime. (The other four have tests by mprime in progress.)

Could it be that the underclock was stable only for certain FFT lengths, but not for others? I don't remember what self tests I ran 17 months ago.

My card is a Gigabyte GTX 570: GV-N570OC-13I V2.0
Windows 7 Home Premium, 64 bit (SP1). Nvidia driver 306.97

One nearby exponent, same FFT, completed a verified test on another card and machine. I underclocked the memory of the card producing bad tests another 100 MHz down to 1700 MHz and have started another test in the range where I got the bad ones.
List of exponents completed by this card in 2014: [CODE]M( 33336041 )C, 0xa1ba4b60955e507b, n = 1835008, CUDALucas v2.03 Verified
M( 33336091 )C, 0xfeb4555212d4f4ee, n = 1835008, CUDALucas v2.03 Verified
M( 33336169 )C, 0xdbd40eaf554b2675, n = 1835008, CUDALucas v2.03 Verified
M( 33336197 )C, 0xbed813ed9c9d16dc, n = 1835008, CUDALucas v2.03 Verified
M( 33336221 )C, 0xc9af6488b22ce5da, n = 1835008, CUDALucas v2.03 Verified
M( 33336227 )C, 0xeed747f63850cd__, n = 1835008, CUDALucas v2.03 Unverified
M( 33338551 )C, 0xaa883a35b9a57740, n = 1835008, CUDALucas v2.03 Verified
M( 33338687 )C, 0x2aff2f81dea81d04, n = 1835008, CUDALucas v2.03 Verified
M( 33338759 )C, 0x15ba6062f66d7f3c, n = 1835008, CUDALucas v2.03 Verified
M( 33338801 )C, 0xd43485f4feddbec9, n = 1835008, CUDALucas v2.03 Verified
M( 33338861 )C, 0xb4e5931bfa1a4323, n = 1835008, CUDALucas v2.03 Verified
M( 33338911 )C, 0x2f179cdf89f39c41, n = 1835008, CUDALucas v2.03 Verified
M( 33338941 )C, 0xc9654436bc93afec, n = 1835008, CUDALucas v2.03 Verified
M( 33339049 )C, 0xefc01fd742be9490, n = 1835008, CUDALucas v2.03 Verified
M( 33339079 )C, 0x0f5a27109253cba5, n = 1835008, CUDALucas v2.03 Verified
M( 33340217 )C, 0xfebe365cd4d8262b, n = 1835008, CUDALucas v2.03 Verified
M( 33340259 )C, 0x803db62a1d4878c4, n = 1835008, CUDALucas v2.03 Verified
M( 33340273 )C, 0xbf815f87e5a244cd, n = 1835008, CUDALucas v2.03 Verified
M( 33340331 )C, 0x5b8b46fd69e1cf3f, n = 1835008, CUDALucas v2.03 Verified
M( 33340399 )C, 0x6055989608290c9f, n = 1835008, CUDALucas v2.03 Verified
M( 33338969 )C, 0xac6b1e36d116665e, n = 1835008, CUDALucas v2.03 Verified
M( 33340603 )C, 0x9a47d238a35be14b, n = 1835008, CUDALucas v2.03 Verified
M( 33340693 )C, 0xc8a725591822a7__, n = 1835008, CUDALucas v2.03 Unverified
M( 33340729 )C, 0x13115d90972b9106, n = 1835008, CUDALucas v2.03 Verified
M( 33340751 )C, 0x61fd1ed6ada34567, n = 1835008, CUDALucas v2.03 Verified
M( 33340969 )C, 0x12228227855bafd1, n = 1835008, CUDALucas v2.03 Verified
M( 33340973 )C, 0xf6bd7103e731cf__, n = 1835008, CUDALucas v2.03 Unverified
M( 33341003 )C, 0xe5da51c09e871f95, n = 1835008, CUDALucas v2.03 Verified
M( 33340999 )C, 0x2bb766951d08d1b6, n = 1835008, CUDALucas v2.03 Verified
M( 33341023 )C, 0x9bef45d1fac75275, n = 1835008, CUDALucas v2.03 Verified
M( 33341047 )C, 0x9350917fded037b7, n = 1835008, CUDALucas v2.03 Verified
M( 33341069 )C, 0x5ad9645e801a3b16, n = 1835008, CUDALucas v2.03 Verified
[COLOR="Red"]M( 34440337 )C, 0x8c07d05f66675a37, n = 1966080, CUDALucas v2.03 Unverified, later bad
M( 34440383 )C, 0xa5e3277786752569, n = 1966080, CUDALucas v2.03 Unverified, later bad
M( 34440391 )C, 0xac5826c2d7e1caf5, n = 1966080, CUDALucas v2.03 Unverified, later bad
M( 34440491 )C, 0x4752da9b55ebaea5, n = 1966080, CUDALucas v2.03 Unverified, later bad
M( 34440643 )C, 0xa9e459ad5338c6__, n = 1966080, CUDALucas v2.03 Unverified
M( 34440647 )C, 0x5c2e2245038073__, n = 1966080, CUDALucas v2.03 Unverified
M( 34440649 )C, 0x266f082ba2d23f__, n = 1966080, CUDALucas v2.03 Unverified
M( 34440739 )C, 0x0624bcde0c1aeb__, n = 1966080, CUDALucas v2.03 Unverified[/COLOR]
M( 33407953 )C, 0x3b0945f27b1c7100, n = 1835008, CUDALucas v2.03 Verified
M( 33408797 )C, 0x5c92f5d23fb209b9, n = 1835008, CUDALucas v2.03 Verified
[/CODE]
SaveAllCheckpoints
Somebody said: CudaLucas.ini SaveAllCheckpoints=1. A very important feature to enable if you want to roll back to a "good" checkpoint.
But I don't understand how to roll back. Does anybody have an idea? Thanks.
That feature saves a new file every time a checkpoint is written. It takes up a lot of disk space, but it lets you go back to one of the saved files. To roll back, rename one of those save files so that CUDALucas uses it when restarted.
[QUOTE=flashjh;369306]To roll back you use one of those save files by renaming it...[/QUOTE]Thanks for the quick reply, but renaming it to what?
Rename it to cxxxxxxx or txxxxxxx, overwriting the existing last checkpoints. Look into the folder and see how the files are created and you will understand.
Caution (for the attention of [B]owftheevil[/B]): the latest version of CUDALucas has a bug (not present in 2.04): if you have a mismatched residue, you have to resume not from the last file with the same residue, but from the one before it. This is because the residues in the names of the files are [U]shifted[/U] by one compared with the real residues written on screen. Dubslow's version didn't make this mess, so it is something newly introduced. If you resume from the last saved file having the same residue, it is possible that that file is already corrupted. To understand what I mean, compare the names of the checkpoint files with the residues written on screen.

edit: I think this affects CudaPM1 too
This is in 2.05?
BTW - Thanks LaurV for the help
[QUOTE=LaurV;369312]Caution: ... last version of cudalucas has a bug (nonexistent in 2.04) ... the residues in the name of the file are [U]shifted[/U] by one compared with real residues written on screen ...[/QUOTE] I think I fixed this as of r63, but I'll check tonight to make sure. Thanks.
I haven't posted a Windows executable since r62. I'll get updates done once you confirm it's fixed.
[QUOTE=LaurV;369312]... and you will understand...[/QUOTE]YES, it works! :smile: (98.66 % done)
[QUOTE=owftheevil;369317]I think I fixed this as of r63, but I'll check tonight to make sure. Thanks.[/QUOTE]
[QUOTE=flashjh;369318]I haven't posted Windows executable since r62. I'll get updates done once you confirm it's fixed.[/QUOTE] I am using 2.05, Jerry's compilations, and probably older than r62 (I didn't update for a long time; if it worked, why should I fix it? :smile:). You know I was really missing it, and I was [U]very upset[/U] when I found out (hey! I am the dude who argued hard about having the same naming scheme for checkpoints in cudaPM1, remember? I even got a few punches from Batalov for that :razz:). But I didn't go back to 2.04, because I also like the new keyboard options in 2.05 (increase, decrease things interactively: brilliant!) and the size/threads tuning mechanism; it saves me tons of manual work which I used to do with 2.04 to tune the ranges. So, don't get me wrong, 2.05 is a wonderful upgrade! kotgw guys!
LaurV, your issue was fixed in r59. However, I did find a minor bug in threadbench, which is fixed in r65. I'm sure Jerry will have Windows versions soon.
r65 Windows binaries posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL]
Downloaded. Thanks. Some things could be better...
- The interactive t and T look very nice, but don't work. The iteration counter refuses to go under 10000, for example. This was working in the former version.
- The file names are missing the ".txt" part, so some browsers will not show the residues, because they treat that part as the "file extension" now.
- The "tune" mechanism is gone, or I am stupid enough to have failed for almost an hour to convince it to do some tuning. I tried -cufftbench x y z with x=y (it only does the fft test, [B][U]overwriting[/U][/B] the file. C'mon! I am going berserk over this; luckily I have a backup in the other folder. I worked one full day to make that file with the older version!) or with y<x as specified in the ini (it crashes completely!).
Most of your problems are because I need to get off my ass and write some documentation.
-threadbench 1 8192 5 1, for example, benchmarks fft lengths from 1 to 8192 found in <gpu> fft.txt. I have debated back and forth with myself whether to overwrite or append when doing a new cufftbench. The checkpoint interval cannot be less than the screen report interval. I'm open to suggestions for better ways to do these things.
I've started an updated README. I'll email it to you later so you can use it.
[QUOTE=LaurV;369493]
. . . - the file names are missing the ".txt" part, so some browser will not show the residues, which they consider is the "file extension" now. . . . [/QUOTE] Isn't there a way to have all the files in the directory shown? I think it's silly to call something a text file when it is not a file with text in it. Maybe use extension .cls?
[QUOTE=owftheevil;369505]Isn't there a way to have all the files in the directory shown? I think its silly to call something a text file when it is not a file with text in it. Maybe use extension .cls?[/QUOTE]
Agree that it is stupid to call them ".txt"; I use totalcmd anyhow as a file browser, so this does not affect me personally. Some ".ckp" (checkpoint), ".cls", or whatever, would be perfect.

Agree that saving files should not happen too often. In fact, what would work perfectly for me would be to have files saved every 200k, or 500k, or 1M iterations, but to be able to print on screen every 2k, 5k, 10k. That is because it gives you the feeling that the program is doing something, that the code is not "in the woods"; but at the same time, saving files is a compromise between speed and hard disk space. Ideally we would save every iteration, so we could properly resume the DCs without wasting time redoing the last 1M iterations every time it crashes or misses the partial residues (if you have them). But of course, this idea is not only stupid, but also absurd. Saving every 1M or 500k iterations (or 100k for huge expos) should be quite OK, and it would be quite OK to have this "non-changeable": you put it in the ini file, or on the command line, and [U]it stays there![/U] If you need a new interval, you can change the ini file and restart (or restart with different command line parameters). This would be perfect.

So, the t/T key should not affect the number of iterations after which the checkpoint is saved. The t/T is only for the screen, and my complaint was exactly that: pressing t/T has no effect on screen; the output is every 10k iterations, no matter what. In the beginning I want to evaluate an ETA for an expo, and I can't wait until it does 100k iterations, so I press "t" a few times. Nothing happens, besides the screen messages saying that the number was decreased.
For the older versions, a checkpoint file was written [U]every time a screen line was written[/U], and this is [U]still ok[/U], with the observation that it [U]is limiting[/U]: you can't go too low with the number without quickly filling your HDD with garbage checkpoints and slowing things down (writing to disk is slow). But it was still ok; if there is no other way, we'd better have it as it was.

It would be wonderful to have the syntax of the thread benching switch (why are there 4 numbers? what is the fourth?). This I am going to experiment with, immediately and urgently, when I reach home in the evening (~6 hours left, lunch break now).

And don't get angry with me when I am grumpy; it is not disrespect. I respect you all for the good things you do here. BTW, I went last night to CUDALucas' home page and saw the list of "contributors", starting with Dubslow. All of you did a wonderful job, but people's memory is short, and many, reading that page, may not remember that those changes from Dubslow until today are more or less "cosmetic", and that actually it is [U]msft[/U] who did CUDALucas...

[FONT=Verdana][SIZE=1](Sorry if I look grumpy; I mean, I [U]am[/U] usually grumpy, but today more than other days. As I told a few people in private, I had a small car accident. Nothing serious, only a few scratches on my car; a guy with a truck hit me from the back-right (in Thailand "right" is the driver's side). My car insurance expired on March 11, and I didn't have time to renew it because I was too busy at work; and of course the truck guy didn't stop, which is quite common for Thailand! I have the truck's number, but people say it is not very useful; squeezing money from transportation companies, usually owned by some local lords, is impossible if you don't have insurance. In the other case it is not your business anymore, the insurance company will take care of it.)[/SIZE][/FONT]
I am not at all angry with you; in fact I value your input highly.
Y and y switches in interactive mode increase or decrease the screen report interval. The fourth parameter in the -threadbench option affects which fft lengths are tested and what is printed on the screen: 1 tests every fft length in the <gpu> fft.txt file, 0 tests all reasonable fft lengths (greatest prime factor <= 7), higher numbers affect screen output and some exclude 32 and 1024 as tested thread values. I can't remember the exact details at the moment.

Not all changes since msft quit working on CUDALucas have been cosmetic. In fact, most users won't even notice many of the changes.

1. Random bit shift.
2. Change fft length during the test.
3. Better fft and threads optimization.
4. Less memory usage, on device, host, and disk.
5. Smaller data transfers between device and host, and between host and disk.
6. Faster kernels (thanks again George)
7. Ability to change most basic settings without restarting.
[QUOTE=owftheevil;369548]
1. Random bit shift.
2. Change fft length during the test.
3. Better fft and threads optimization.
4. Less memory usage, on device, host, and disk.
5. Smaller data transfers between device and host, and between host and disk.
6. Faster kernels (thanks again George)
7. Ability to change most basic settings without restarting.[/QUOTE] Wow! Does cudaLucas have so many features? :razz: (you know I am teasing you, right? hehe)

I have added these lines to the ini file:
[CODE]# Y -- increase screen output interval.
# y -- decrease screen output interval.[/CODE]
to the list of keyboard options. Tested and working. I have tested different screen output formats: working properly, no bug found. Currently testing the thread optimization; up to now it looks ok. I have changed the ini file to say "-threadbench" instead of "-cufftbench" where appropriate. I could attach the ini file to replace the one on SourceForge (unchanged for years! I have added different comments to it), but I have the feeling that Jerry will go through it anyhow, so I leave him the pleasure :razz: Thanks a billion!

Please do not overwrite the fft file. Adding to it would be better. Or give an option. I like to collect all the info and eventually sort it (manually) in increasing order. I have a file with all sizes for "now I am not overclocking", or "now P95 is not running, but Aliqueit is running instead", and so on. The results are different, and to get maximum performance I have to use the right size. I know this sounds nitpicking; well, it comes with the age... :blush:
How about backing up the old fft.txt file instead of overwriting it? Other routines depend on fft.txt being in increasing order.
[QUOTE=owftheevil;369553]How about backing up the old fft.txt file? (I mean instead of overwriting it) Other routines depend on fft.txt being in increasing order.[/QUOTE]
Perfect for me. Rename it like "(chip)fft_0.txt", "..._1.txt", on the same idea as [URL="http://www.mersenneforum.org/showpost.php?p=369388&postcount=27"]here[/URL] (second code box). 3 copies are enough; if the guy does not realize after 2 times that the file is [STRIKE]overwritten[/STRIKE] renamed, then he is either stupid or he does not care. Then I/he can manually interleave and sort if I/he want(s).

(edit: the only point is to not lose a LONG fft file with all reasonable sizes inside, [U]without notification[/U], like it is happening now. Maybe I worked one full day to get that file and I don't have a backup! I would be very angry then! :smile: Luckily I had more folders with the same content, having more of the same cards, and I had copies of the file in those folders; it may not always be the case.)

(edit 2: optimization of threads works very nicely, and faster than the older version. The only unchanged thing is that the work is saved at the end, which may result in trouble if there is a crash; but here it is no problem, since this optimization is only done once in a lifetime, and it can be split into a few consecutive small jobs. I mean, I don't need to use "-threadbench 1 20480 6 0", but can use 3-4 "splits". Which I was stupid enough not to think of, and the job has taken since the last post. Fortunately it finished with success :smile:)
Happening to me with GTX 570
Thanks for the restart batch file.
I am getting the API runtime errors, even with the latest beta build r65 (running toolkit 5.0 and the latest 335.23 nvidia drivers). However, this is only happening on my GTX 570, not my 280. I have noticed that the 570 will run stable until I stop the job, go to mfaktc, and then switch back to the LL job. It'll continue happening until I reboot my box. So far that seems to be what triggers the API errors for me. I have never seen this behavior on my 280, even when switching between CUDALucas and mfaktc.

[QUOTE=flashjh;364436]r60 compiled and tested (still needs more). CUDA 4.2 up to 5.5 all working, release and debug. All posted to [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]SourceForge[/URL]

This version (and r57 and up) includes new rcb code from Prime95 that gives about a 1% speed improvement! Exciting for CUDALucas, but it does need testing, please. In my testing CUDA 5.5 and Win32 are slightly faster than earlier versions or x64 (but you may need a batch file to keep it going, see below).

What works: -cufftbench, -r, normal testing
What doesn't: -threadbench
Didn't test: -memtest

[U][B]For those experiencing stops: This is an nVidia driver issue. Here is some info, and I included some workarounds[/B][/U]

Drivers <=306.97 work with x86/x64 CUDA 4.2 and CUDA 5.0 builds perfectly fine and produce no restarts (at least none from my testing over several days). Drivers >=310.70 have resets no matter what platform/CUDA version, including 5.5 with >=320.18. There are two workarounds for anyone experiencing a problem similar to the one described by [URL="http://www.mersenneforum.org/showthread.php?p=362968#post362968"]mognuts[/URL]:

1) The best way to fix the error is to downgrade your driver to one of the versions <=306.97, as mentioned above.

CUDA Driver Versions: [CODE]CUDA 5.5            CUDA 5.0                CUDA 4.2
331.82 19-Nov-13    314.22 25-Mar-13        301.42 22-May-12
331.65 07-Nov-13    314.07 18-Feb-13        296.10 13-Mar-12
331.58 21-Oct-13    310.90 05-Jan-13        295.73 21-Feb-12
327.23 19-Sep-13    310.70 17-Dec-12        285.62 24-Oct-11
320.49 01-Jul-13    [B]306.97 10-Oct-12[/B]    280.26 09-Aug-11
320.18 23-May-13    306.23 13-Sep-12        275.33 01-Jun-11
[/CODE] I did not actually test below 296.10, so I don't know where the CUDA changes over to < CUDA 4.2, but I figure most will be on 296.10 by now. Windows CUDALucas builds from CUDA 4.0 up to 5.5, 32 or 64 bit, are on SourceForge.

Request: I need to know who else is having the *stop* issue and what driver and video card you have. I'm working with NVidia to try and get the drivers fixed, so it will be helpful to know what other cards have this issue.

2) The other 'fix' for this issue is to use a batch file similar to this: [CODE]@echo off
Set count=0
Set program=CUDALucas2.05Beta-CUDA5.0-Win32-r60
:loop
TITLE %program% Current Reset Count = %count%
Set /A count+=1
rem echo %count% >> log.txt
rem echo %count%
%program%.exe
GOTO loop
[/CODE] This will restart CUDALucas each time it stops and lets you see how many resets have occurred, if you care.

I have not been able to thoroughly test speeds yet; I know that CUDA 5.5 is usually faster, but at the cost of having the driver lock up. Combined with the batch file there really is no issue, other than the restarts bothering you, as I've run many good DCs with the batch file. With <=306.97 you don't need the batch file and there are no restarts, but it could potentially be *slightly* slower. I would love to see actual test data from everyone. Also, if anyone does experience the *stop* while on <=306.97, please let me know ASAP so I can update this info and nVidia.

As for reliability, I have completed many successful tests with 2.05 Beta, CUDA 4.0 up to 5.5, 32 and 64 bit, many with a lot of stops and restarts and forced FFT size changes for testing the code.

:smile:[/QUOTE]
CudaLucas doesn't work anymore
[CODE]Round off error at iteration = 21463800, err = 0.5 > 0.40, fft = 3584K.
Increasing fft and restarting from last checkpoint.
Using threads: square 128, splice 256.
Continuing M62494429 @ iteration 21460001 with fft length 4096K, 34.34% done[/CODE]
After a few more errors, the program stops. If I restart it, it prints at the end (example): Processing result: M( x )C, 0xy, offset = 6684, n = 4096K, CUDALucas v2.05 Beta, g_AID: A6ACCD2C719C7543871E42683998589C. I think the result is bad.
Can you recall what the "more errors" were?
The root problem is most likely memory; at least, that's the only time I see a roundoff error like that. But I don't know what's going on with the apparent output of a result after the errors.
[QUOTE=owftheevil;370257]Can you recall what the "more errors" were?[/QUOTE]No, I can't; the log file doesn't exist anymore. I tried the next one with the "savefile" option. Only if I reduce the memory speed (-500 MHz) and the "Power Limit" (57%) -> GPU = 692 MHz do I get fewer errors. But what does "g_AID" mean? (last result: M( 62494429 )C, 0x7191357b114a13__, offset = 31262106, n = 4096K, CUDALucas v2.05 Beta, g_AID: EEFBC9895C77C54B1AC676621FFA____)

How can I see how my "computer must be proven reliable"? Ohhhh, I forgot to log in to GIMPS and lost my result. 157 GHz-days! Now I found the same exponent in my assignments as a double check!
[QUOTE=MikeBerlin;370653]... Ohhhh, I forget to log in in GIMPS and lost my result. 157 GHz-days![/QUOTE] PM user Prime95 and you will be helped.

[QUOTE]Now I found in my assginments the same exponent as double check![/QUOTE] That's not useful to you. The second GPU result will be given no credit.
[QUOTE=Batalov;370656]PM user Prime95 and you will be helped.[/QUOTE]
Thanks very much for this hint. Maybe he will also solve my old problem with M332,224,379. [QUOTE]That's not useful to you. The second GPU result will be given no credit.[/QUOTE]Yes, and that one has gone away on its own.
[QUOTE=Batalov;370656]PM user Prime95 and you will be helped.[/QUOTE]YES, he did it! :smile: Many thanks again. :smile:
Hi! Sorry to jump in on this thread.
How efficiently is the code running? [url]https://developer.nvidia.com/cuFFT[/url] I see there that at somewhat larger transforms, cuFFT gets under 100 Gflop/s on an M2090 Tesla. I haven't checked out the code yet; I will soon. This Tesla delivers 666 Gflop/s. Not counting fused multiply-adds (I didn't check yet whether their code uses them; assuming not), it's 333 Gflop/s. So an efficiency of around 30%. How is the efficiency for CUDALucas at somewhat larger transforms? Interested in gpgpu fft for Riesel :)
CUDALucas 2.05Beta r67 is [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]posted[/URL] for Windows. CUDA 4.2, 5.0, 5.5 and 6.0
CUDA 6.0 Libs are [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL] [QUOTE]r67, just uploaded, includes a facility for backing up fft.txt files with a timestamp. I've included a README and CUDALucas.ini with a few updates. The README has a rough draft of a new section on command line options and tuning. There are still many additions and other changes to be made...[/QUOTE]
[code]
CUDALucas_205Beta_CUDA6.0-x64_r67.exe -cufftbench 1 4096 1

------- DEVICE 0 -------
name                GeForce GTX 750 Ti
Compatibility       5.0
clockRate (MHz)     1110
memClockRate (MHz)  2700
totalGlobalMem      2147483648
totalConstMem       65536
l2CacheSize         2097152
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 5
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    512
deviceOverlap       1

Using threads: square 256, splice 128.
[/code]
CUDALucas 2.05Beta r68 is [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]posted[/URL]
Includes CUDA 4.2, 5.0, 5.5 & 6.0. The CUDA 6.0 version also has the new SM 3.2, though I don't know what card it's for yet.
Following up: I know someone said there's an nvidia bug that causes this API reset issue, but I'm also starting to wonder if it could be heat related. I pulled up EVGA Precision X while running CUDALucas and noticed the card was in the upper 80s Celsius, with the fan speed set to auto and the fan going at just 30-60%. I've statically set my fan speed to around 70%, which has the card running much cooler, in the upper 70s Celsius, and I'm not seeing the card reset so far.
I also noticed when running mfaktc that the fan on the card kicks into high gear right away at startup; I wonder if that's something that could be done in CUDALucas as well.

[QUOTE=pdazzl;370117]I am getting the API runtime errors, even with the latest beta build r65 ... So far that seems to be what triggers the API errors for me.[/QUOTE]
[QUOTE=pdazzl;373362]... I pulled up EVGA Precision X while running cuda lucas and [B]noticed the card was in the upper 80 degree celsius with the fan speed set to auto and fan just going around 30-60%.[/B] ...[/QUOTE]
In my experience, the nvidia default auto fan speeds are grossly low. It is possible that this began many driver versions back, when there was a flurry of reports that one version was a card-killer. I use MSI Afterburner and set up a custom fan curve, which maintains healthier temperatures.
[QUOTE=flashjh;372389]CUDALucas 2.05Beta r68 is [URL="https://sourceforge.net/projects/cudalucas/files/2.05%20Beta/"]posted[/URL]
Includes CUDA 4.2, 5.0, 5.5 & 6.0. The CUDA 6.0 version also has the new SM 3.2, though I don't know what card it's for yet.[/QUOTE] Hi, does anybody know when a stable version of CUDALucas 2.05 will be available?
[QUOTE=HHfromG;374240]Hi, does anybody know when there will be a stable version of CUDALucas 2.05 available?[/QUOTE]
Take the beta, it works great. Be careful with the CUDA version; the wrong one will cost a few percent in speed.
[QUOTE=LaurV;374287]Take the beta, it works great. Be careful with the cuda version, the wrong one will result in few percents penalty in speed.[/QUOTE]
Hi, I have already used the beta version together with the CUDA 6.0 Toolkit. The performance increase was about 8% compared with the same calculation using the stable CUDALucas 2.03 version. That leads me to the following questions:

1) Does PrimeNet and/or GIMPS accept results produced by a "beta" version?
2) Who decides when a beta version becomes a stable version, and what are the criteria for this decision?

Regards...
There are a couple of bugs that affect compute 3.0 and 3.5 cards with large (>4M) ffts and two short sections of the documentation I want to get fixed before 2.05 is released. I will actually have time to work on it starting the second week of June.
GIMPS does accept results from 2.05 beta.
[QUOTE=owftheevil;374380]There are a couple of bugs that affect compute 3.0 and 3.5 cards with large (>4M) ffts and two short sections of the documentation I want to get fixed before 2.05 is released. I will actually have time to work on it starting the second week of June.
GIMPS does accept results from 2.05 beta.[/QUOTE] Hi, thank you for this information. Will the new 2.05 version also support the new features of the CUDA 6.0 Toolkit, especially the concept of "Unified Memory" and "cuFFT as a drop-in library"? And, because I use two NVidia GTX 690 cards for CUDALucas, I would be very interested in a version that supports "Multi-GPU Scaling", which is also a new feature of the CUDA 6.0 Toolkit.
HHfromG, none of the new 6.0 features seem to be particularly useful for CUDALucas. The unified memory would make some of the code simpler, but would not otherwise give any improvements; there are very few host<->device memory transfers going on. CUDALucas already uses CUFFT for all the ffts, and the slowness of device<->device memory transfers makes multi-gpu ffts impractical.
|
I'm highly confused about which version of CUDALucas I should be using. I have a GTX 295 (Tesla-based dual GT200b chips). The PDF guide shows only CL v2.03 CUDA 3.2 & SM 13 (Shader Model 1.3?) for GPUs older than GF110 Fermi chips. The readme also states not to use Alpha or Beta releases. I have not seen a CL v2.05 with CUDA 3.2 & SM 13 at all, v2.04 is nowhere to be found online, and there is no list of supported hardware per version either. I would appreciate some advice, thank you.
|
Welcome to the forum. You can use the current version of 2.05. You won't find CUDA 3.2 builds because the current code uses functions that are not available in CUDA 3.2. Use the version that matches your driver to get started; if your driver is up-to-date, that's the CUDA 6 build.
If something is preventing you from using a driver that supports CUDA 4 or higher, let me know and I'll see what I can do. Let us know if you have any other questions. |
Could anyone tell me why I was receiving negative ETA times for first time LL testing on version 2.03? I have not tested with the 2.05beta yet as I'm still prepping to launch it but I am curious what negative times actually mean. I don't recall observing this phenomenon for LL Double Checking.
While Device 0 was testing I had Device 1 double checking a different exponent with a 2nd instance of CL 2.03 [Code]
------- DEVICE 0 -------
name                GeForce GTX 295
totalGlobalMem      939524096
sharedMemPerBlock   16384
regsPerBlock        16384
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  512
maxThreadsDim[3]    512,512,64
maxGridSize[3]      65535,65535,1
totalConstMem       65536
Compatibility       1.3
clockRate (MHz)     1242
textureAlignment    256
deviceOverlap       1
multiProcessorCount 30

Continuing work from a partial result of M61421179 fft length = 3670016 iteration = 5647384

Iteration 5700000 M( 61421179 )C, 0xe98ce4744ffa0fe4, n = 3670016, CUDALucas v2.03 err = 0.0898 (20:41 real, 12.4082 ms/iter, ETA 191:58:57)
Iteration 5800000 M( 61421179 )C, 0xbe9a561ea78e8a99, n = 3670016, CUDALucas v2.03 err = 0.0898 (38:17 real, -19.9787 ms/iter, ETA -18513:-37)
Iteration 5900000 M( 61421179 )C, 0xf0b569485eb57f09, n = 3670016, CUDALucas v2.03 err = 0.0898 (38:39 real, -19.7642 ms/iter, ETA -18281:-55)
Iteration 6000000 M( 61421179 )C, 0x751a61c23ceecb0a, n = 3670016, CUDALucas v2.03 err = 0.0898 (38:41 real, -19.7384 ms/iter, ETA -18225:-8)
Iteration 6100000 M( 61421179 )C, 0xbeffdc9a52d574ea, n = 3670016, CUDALucas v2.03 err = 0.0898 (38:42 real, -19.7337 ms/iter, ETA -18187:-55)
Iteration 6200000 M( 61421179 )C, 0xa1578637f58a58b0, n = 3670016, CUDALucas v2.03 err = 0.0898 (38:43 real, -19.7219 ms/iter, ETA -18144:-11)
Iteration 6300000 M( 61421179 )C, 0x9898ce608d2536fc, n = 3670016, CUDALucas v2.03 err = 0.0938 (39:03 real, -19.5186 ms/iter, ETA -17924:-36)
Iteration 6400000 M( 61421179 )C, 0x5cdf0c2904e43abd, n = 3670016, CUDALucas v2.03 err = 0.0938 (1:19:06 real, 4.5072 ms/iter, ETA 68:51:35)
Iteration 6500000 M( 61421179 )C, 0x86dd1dad33362fcb, n = 3670016, CUDALucas v2.03 err = 0.0938 (52:37 real, -11.3751 ms/iter, ETA -10408:-11)
Iteration 6600000 M( 61421179 )C, 0x843e48dc6e7c1cc6, n = 3670016, CUDALucas v2.03 err = 0.0938 (39:57 real, -18.9785 ms/iter, ETA -17333:-41)
Iteration 6700000 M( 61421179 )C, 0x23f5bcc088e1b383, n = 3670016, CUDALucas v2.03 err = 0.0938 (39:44 real, -19.1004 ms/iter, ETA -17413:-12)
[/Code] |
kinda old bug.. we thought it was solved...
edit: it does not affect your results. |
Are you using the newest version of 2.03? I thought we fixed that also?
Anyway, use the new 2.05; just make sure you finish your current exponent first, as the savefiles/checkpoints are not compatible. [FONT=Calibri][SIZE=3]owftheevil, ready to get 2.05 out of beta?[/SIZE][/FONT] |
Yes, although the current form of the README file is not entirely accurate.
|
[QUOTE=owftheevil;378139]Yes, although the current form of the README file is not entirely accurate.[/QUOTE]
I can work on the README file; did you have any more changes to incorporate before it's out of beta? |
No, except to take the "beta" out of the program variable.
|
[QUOTE=owftheevil;378139]Yes, although the current form of the README file is not entirely accurate.[/QUOTE]
+1, found out the hard way when the settings didn't all work. Also, most parameters inside CUDALucas.ini have no effect for v2.05beta; some require a different syntax than what is instructed inside the ini file. |
Recent versions of the README and CUDALucas.ini are in the code directory on SourceForge. The README inaccuracies I'm referring to are more of a historical nature.
|
So now am I supposed to compile all the files in the code directory to get v2.05 for Windows, or is v2.05Beta identical to v2.05 (as in, no bugs were discovered in the Beta, so no changes were made for the release)?
|
The windows versions are identical, the linux version is a little out of date.
|
well for starters v2.05Beta does not respond to the DeviceNumber= parameter inside ANY version of the CUDALucas.ini file.
|
Thanks for pointing that out. Now fixed with r71. You notice any other problems?
|
Sure, probably quite a few. I see ErrorIterations= near the top and RoundOffTest= near the bottom; the descriptions for both seem to mean the same setting, so are these two redundant, or is one an ignored parameter from a previous version and the other the currently used one?
Also in that same upper section the following:
[QUOTE]# ErrorIterations tells how often the roundoff error is checked. Larger values
# give shorter iteration times, but introduce some uncertainty as to the actual
# maximum roundoff error that occurs during the test. Default is 100.
# ReportIterations is the same as the -x option; it determines how often
# screen output is written. Default is 10000.
# CheckpointIterations is the same as the -c option; it determines how often
# checkpoints are written. Default is 100000.
# Each of these values should be of the form [U][B]k * 10^n with k = 1, 2, or 5.[/B][/U]
ErrorIterations=100
ReportIterations=10000
CheckpointIterations=100000[/QUOTE]
The underlined & bold portion, is that syntax correct? It seems to be instructing that the values entered should be single-digit integers instead of the multi-digit values already entered by default. I find it misleading. |
Those control different features. ErrorIterations tells how often the roundoff errors are checked. RoundOffTest determines if an initial roundoff test is done at the beginning of each test.
Edit: would "... of the form p=k*10^n ..." be clearer? |
Unfortunately no it doesn't make things clearer. When I see that syntax it is prompting me to type a value matching that entire syntax structure; a formula instead of just an integer. The default value tells me I should only put an integer. Then there is nothing indicating what the different variables in the formula represent (what "k" is substituting, what "p" is substituting, what "n" is substituting). So I don't know what letter is meant to be the 100 value or the 10000 value or the 100000 value? Or if I'm being instructed to type a value into each parameter that looks like the structure of "p=k*10^n". Maybe it means (p)rime=(k)onstant*10^(n)umber? Then what is "n" supposed to be? Ever confusing to me.
I took a quick look inside the r71 ini file; looks like the new default is devicenumber=1, not 0 anymore? Well, I know in the beta setting it to 0 or 1 or 2 doesn't change the device. Maybe in a few days I'll be prepped to try out the non-beta and see if 0 and 1 work. Not asking to make any more changes to that parameter, just want to know if you meant for the default to now be 1 instead of the former 0. Thanks |
Thanks again for your input. No, the default device is not supposed to be device 1; that's just what I left it at after testing the fix.
So does k*10^n with k = 1, 2, or 5 and n a non-negative integer make more sense? |
[QUOTE]
# Threads is the same as the -threads option. This sets the number of threads
# used in the multiplication and splice kernels. Each of the two values must
# be 32, 64, 128, 256, 512, or 1024. (Some FFT lengths have a higher minimum than 32.)
# These vaules will be used only if no <gpu> threads.txt file is present or no entry
# for the current exponent is in that file. The file is generated by running
#
# ./CUDALucas -threadbench s e i [B][U]m[/U][/B]
#
# This will time i repetitions of a 50 ll iteration loop, for certain fft
# lengths between s * 1024 and e * 1024. The parameter m gives some control
# over which fft lengths are tested, which thread values are tested,
# and screen output:
# bit 0: if set, only fft values from <gpu> fft.txt will be tested,
# otherwise, all reasonable fft lengths will be tested.
# bit 1: if set, skips thread value 32.
# bit 2: if set, skips thread value 1024.
[B][U]# bit 3: if set, supresses intermediate output: only the optimal
# thread values for each fft will be printed to the screen.[/U][/B]
# E.g.
#
# ./CUDALucas -threadbench 1 8192 5 [B][U]10[/U][/B]
#
# tests all reasonable (7-smooth multiples of 1024) fft lengths from 1k to 8192k
# using thread values 64, 128, 256, 512, and 1024, [B][U]supressing intermediate output.[/U][/B]
[/QUOTE]
The underlined & bold looks like inaccurate description to me. Sorry if I seem like an annoying stickler, & thanks for all your hard work.

[QUOTE=owftheevil;378537]Thanks again for you input. No the default device is not supposed to be device 1, thats just what I left it at after testing the fix.

So does k*10^n with k = 1, 2, or 5 and n a non-negative integer make more sense?[/QUOTE]
I'm sorry I'm having trouble explaining myself clearly. The syntax in itself is clear or understandable, but what I find unclear is what the syntax/formula, especially the letter variables, have to do with the 3 parameters/parameter values. What is K? What is n (aside from its requirement to be a non-negative integer)?
Maybe another way of stating my question is, I understand what values or type of values are allowed in those letter variables; and I understand how they fit into the stated formula. I just do not know what each of those letter values are representing with respect to parameters in the ini file, or where that formula fits into this whole system. Could "k" be a substitute for [I]each[/I] parameter (ErrorIterations=k [I]then[/I] ReportIterations=k [I]then[/I] CheckpointIterations=k)? |
10 = 1010 in binary. Bit 0 is not set, so all reasonable lengths are tested. Bit 1 is set, so thread value 32 is skipped. Bit 2 is not set, so thread value 1024 is included. Bit 3 is set, so intermediate output is suppressed.
|
AHHH, that clears up the threadbench parameters; I was reading "10" as the two-bit binary number 10 (= 2 in decimal).
|
:redface:
[QUOTE]# ErrorIterations tells how often the roundoff error is checked. Larger values
# give shorter iteration times, but introduce some uncertainty as to the actual
# maximum roundoff error that occurs during the test. Default is 100.
# ReportIterations is the same as the -x option; it determines how often
# screen output is written. Default is 10000.
# CheckpointIterations is the same as the -c option; it determines how often
# checkpoints are written. Default is 100000.
# Each of these values should be of the form [B][U]k * 10^n with k = 1, 2, or 5[/U][/B].
ErrorIterations=100
ReportIterations=10000
CheckpointIterations=100000
[/QUOTE]
[QUOTE=owftheevil;378537]... So does k*10^n with k = 1, 2, or 5 and n a non-negative integer make more sense?[/QUOTE]
the formula could read:
ParameterValue = k * 10^n or
ParameterValue = k * 10ⁿ or
ParameterValue = (1, 2, or 5 only) * 10^n or
ParameterValue = (1, 2, or 5 only) * 10ⁿ,
with k = 1, 2, or 5 [I][U]only[/U][/I], and n = a non-negative integer (like you stated).

I never could see how the formula related to anything until I tried changing the values and then I could see those 3 parameters only accepted values of 1, 2, or 5 followed by any amount of zeros. |
Possible bug. I got a used GTX 580. Installed it and it was running a 1792K FFT double-check using 256 / 128 threads. Then I saw your cool new feature and ran:
./CUDALucas -threadbench 1536 2048 5 10

This created a file suggesting 1792K run with 64 / 64 threads. This combination turns out to be 10% slower than the 256 / 128 combination I was using. Needless to say, I deleted the threadbench output and resumed work. |
Another bug? The ETA is meaningless (to me). I have 29 million iterations left at 2.47 ms/iter. It says the ETA is in 42 hours. Is this because I did the first 8% of the test on a GTX 460?
It also appears that my iteration times go from 2.44ms to 2.85ms when the display shuts off. Does anyone have advice on how to keep the GPU going full throttle (Windows 7)? |
[QUOTE=Prime95;379011]Another bug? The ETA is meaningless (to me). I have 29 million iterations left at 2.47 ms/iter. It says the ETA is in 42 hours. Is this because I did the first 8% of the test on a GTX 460?
It also appears that my iteration times go from 2.44ms to 2.85ms when the display shuts off. Does anyone have advice on how to keep the GPU going full throttle (Windows 7)?[/QUOTE] You could have a look at the Power settings. Control Panel\All Control Panel Items\Power Options It defaults to "Balanced" but this may include things which happen when the display goes to sleep. Go into 'Change Plan Settings' and pick your way through all the layers. I am not sure of system specifics on Intel, but that's where to look. |
[QUOTE=Prime95;379010]Possible bug. I got a used GTX 580. Installed it and it was running a 1792K FFT double-check using 256 / 128 threads. Then I saw your cool new feature and ran:
./CUDALucas -threadbench 1536 2048 5 10

This created a file suggesting 1792K run with 64 / 64 threads. This combination turns out to be 10% slower than the 256 / 128 combination I was using. Needless to say, I deleted the threadbench output and resumed work.[/QUOTE] Try

./CUDALucas -threadbench 1792 1792 50 2

This will test only fft 1792k and show the timings for each thread combination. If the card is running a display, the 50 iterations will help smooth things out from the display's interruptions. The results of the threadbench do vary from run to run, but I haven't seen them come up that far off before.

In interactive mode, entering the character "n" will reset the timer so that the ETA is meaningful for the new card. |
Thanks. The card is running a display and things have settled down now. Windows must have been doing some strange bookkeeping due to the new card and new driver. CUDALucas ran overnight at full speed and this morning threadbench produced more meaningful results.
|
I don't see why it is required to download a [B][U][I]1.1GB[/I][/U][/B] installation file (CUDA v6 toolkit) just to create the CUDALucas v2.05 non-beta? This seems very discouraging to non-developers. As if the extra download & file size weren't discouraging enough, I also need to compile this software using a program I've never heard of. I would understand needing to do this as a performance tweak maximizer, but I think the programmers and developers should provide a compiled, stable, compatible binary (even if it's not the most efficient assembly possible) that gets all users up and running ASAP. Not-so-savvy users, and users that can't spare the time to download extra files and compile everything, only care about the quickest/simplest way to participate.
I've finally fully migrated to v2.05Beta, but each time I try to complete one more step toward getting a v2.05 non-beta ready & running, it seems each step takes me a few days to find time for. |
OK, having read this page ([url]http://gnuwin32.sourceforge.net/packages/make.htm[/url]) from the v2.05 readme file, I am completely lost; I don't know what I'm supposed to be doing to create the non-beta program. Can someone provide some direction? I just see lots of references to things I may need, or steps that might need to be taken, but nothing that clearly states exactly what I should do and which files or packages I specifically need. I'm not a developer and never programmed anything this complex before.
|
I've discovered another bug in v2.05Beta r68. When the display driver stops responding and resets/recovers, CL restarts testing from the last report iteration, not from the last checkpoint iteration. This was causing me to lose far too many hours of processing because I had set my screen report iteration to 10x the checkpoint iteration number. The checkpoints absolutely work when you close CL and relaunch, but the CL code is not using the checkpoint when the driver stops responding and recovers. I've already lost 24 hrs' worth of processing across a few exponent tests with this bug.
|
Thanks for pointing this out. I had assumed that checkpoints would be less frequent than screen reports, so this was intended to save time in case the cufft hang bug manifested. I'll set that to go with whichever is more frequent.
|
I have seen an unexpected format of CUDALucas results submitted to the new manual_results form:[quote][color=red]M66612345[/color], 0xf14280335dcba098, offset = 32109876, n = 3584K, CUDALucas v2.05 Beta, AID: B858C7F634B865CB75587ADCFFFFFFFF[/quote]Generally looks "normal", except the first part of the line shows "[b]M66612345[/b]" instead of the expected "[b]M( 66612345 )C[/b]"
Is this an accepted variant output of any version of CUDALucas, or was this just something the user edited before submitting? |
No, it is not; all versions of cudaLucas I know of will show parentheses. Someone played with the results (?!)
If you didn't mask the exponent, it ends in 5 :razz: |
That's what I figured. And yes, I masked the exponent, residue, offset and AID :smile:
|
I haven't made any changes to the code.
|
There was a version available for a few days about a year ago that used that format. People preferred the M (xxxxxxxx) C format for various reasons so it got changed back.
|
I've a request:
A user recently submitted the final line of screen output and then later submitted the line from results.txt. The screen output did not contain the offset=XXX value. Consequently, the server thought this was 2 separate CUDALUCAS runs (one with offset zero, one with a non-zero offset) and marked the exponent verified. Could the screen output be changed to match the results.txt output? |
It could, but exactly what the output is now depends on how the user sets it up. I haven't talked with Jerry about this, but personally I would prefer to see something more along the lines of what mprime does for verification of results.
|
I'm with owftheevil here.
That [edit: I think he talks about the CRC/secret stuff] would also allow us to "more accurately" validate LL and DC from the same machine, assuming the shift is not the same. The user must still be able to adjust the format of the screen output. Not all people like 150-character-wide cmd prompt windows. For right now (James?), I would just reject lines that come without a shift value, or not consider them a valid DC; all current versions of cudaLucas include the shift value in the reports. |
Hi,
I upgraded from [i]CUDALucas-2.04 Beta-4.1-sm_21-x64[/i] to [i]Release_CUDALucas_205Beta_CUDA4.2-x64_r68[/i] and now my display driver bombs after a few hours under CudaLUCAS. I get “Display driver nvlddmkm stopped responding and has successfully recovered” message in the events log. Perhaps I use wrong combination of libraries? Would you please recommend the exact version numbers of CudaLUCAS and CUDA dlls for [i]GTX580[/i] under [i]64-bit Windows 7[/i]? Thank you. |
This is an old issue, the "stability" depends on your card and driver versions.
It was discussed in the past in this thread or related ones. No matter which drivers and compute capability you use, it will still crash occasionally, especially if you have "mixtures" of CC in your system (like I have, gtx580 and Titan, CC 2.0 and 3.x), or when some other application makes a large video memory request, regardless of what you do. My solution is to pick the drivers and libraries which give me the fastest speed, and launch cudaLucas from a batch like
[CODE]
:label
start cudalucas /low /blahblah /alltheotherparameters
goto label
[/CODE]
In this way, when it crashes, it only loses the work done since the last checkpoint. The loop ensures that the work is resumed according to the worktodo and the ini file, and the card doesn't stay idle until you have time to attend to it. Of course, if it crashes too often (I would say two times per 65M LL test is "too often"), then you have to look for better drivers. You can stop it with ctrl/c as usual. |
1 Attachment(s)
Is there a common benchmark for CUDALucas?
I just ran './CUDALucas -d 0 -threadbench 1024 8192 1 0' on my reference/stock GTX 980.
[LIST]
[*]CUDALucas r71
[*]NVidia driver 343.22
[*]CUDA toolkit V6.5.12
[/LIST]
Attached is the full screen output. Summarized output (FFT sizes {5..8} * 2[SUP]n[/SUP]):
[CODE]
------- DEVICE 0 -------
name                GeForce GTX 980
Compatibility       5.2
clockRate (MHz)     1215
memClockRate (MHz)  3505
[...]
fft = 1024K, min time =  1.7851 ms, square: 128, splice: 512
fft = 1280K, min time =  2.3391 ms, square:  32, splice: 128
fft = 1536K, min time =  2.7773 ms, square:  32, splice:  64
fft = 1792K, min time =  3.2179 ms, square:  32, splice:  64
fft = 2048K, min time =  3.3182 ms, square:  32, splice: 128
fft = 2560K, min time =  4.5310 ms, square:  32, splice: 128
fft = 3072K, min time =  5.4528 ms, square: 128, splice:  32
fft = 3584K, min time =  6.2656 ms, square:  64, splice: 128
fft = 4096K, min time =  6.6292 ms, square: 128, splice:  64
fft = 5120K, min time =  8.9248 ms, square:  32, splice: 128
fft = 6144K, min time = 10.7820 ms, square:  32, splice: 256
fft = 7168K, min time = 12.4605 ms, square: 256, splice: 512
fft = 8192K, min time = 13.4926 ms, square:  64, splice:  64
[/CODE]
Oliver |
[QUOTE=TheJudger;383727]Is there a common benchmark for CUDALucas?
[/QUOTE] Might not be the most thorough of benchmarks (since the FFT benchmark will show timings for different FFT sizes), but may I suggest running CUDALucas against M(6,972,593)? It's small, so the prime verification process will not take long. |
Is there a way to, in essence, back off the speed of CUDALucas? I get crashes pretty constantly using the latest NVIDIA drivers, and my GPU temperatures venture into the upper 80s. I would like to run CUDALucas during the day when I'm not using the computer, but I don't want to be pushing my GPU into high temperatures for hours at a time. Thanks!
|
Run with -polite [I]n[/I] where [I]n[/I] is an integer to 'delay' CUDALucas. Try 50 to start and adjust from there :smile:
|
Does an increase in the "polite" setting equal a longer delay?
Edit: Well, looks like regardless of the answer, everything is much more stable. Thanks for the help! |
QuickStart for Win version of CUDALucas/CUDA MP1 ??
I've downloaded several versions of CUDA MP1 and a couple of versions of CUDALucas -- Debug_CUDALucas_205Beta_CUDA4.2-Win32_r68.exe and Release_CUDALucas_205Beta_CUDA4.2-Win32_r68 -- but when I try to run any of these I just get the console window for a split second and no output, either files or text in the console. Earlier, I was getting a message that a .dll file was missing, but I downloaded the .dll's and that seemed to fix that. But now I can't get the programs to run for more than a split second.
Almost all of the posts about CUDA programs I see are for running under UNIX/LINUX. I happen to have a Wintel box in my office with a GeForce 7300 LE GPU which appears underutilized ... can anyone give me a "GPU computing under Win7 for Dummies"-style instruction sheet for getting LL, P-1, or trial factoring started on this machine ? (Windows 7 64-bit, Core2 6320) Please keep in mind you're talking to a non-programmer (OK, I studied a little C, I use Linux at home, but *not* a blooded programmer). GPU-Z gives me some info about this card, but doesn't indicate it's CUDA-capable. I tried installing the drivers from nVidia but this didn't seem to change anything. :( thanks, MF |
Open a command prompt first in the location of your executable and then run it from the command-line, that way you'll see what error message is spit out without the window disappearing.
Your 7300 LE won't be any use unfortunately, if I recall correctly nothing lower than GeForce 8600 supports any form of CUDA (and some things won't play nice before the 200-series). |
Still not stable
I've been trying to get CUDALucas running on my GTX 570 on and off for a while now. mfaktc works perfectly (and even with a slight overclock). I've tried various settings for -polite (from 0 to 1000). I got a new power supply to ensure that it wasn't a power draw issue. Temperatures while running CUDALucas aren't any higher than those for mfaktc, so I don't think it's that.
I've also compiled CUDALucas myself (under Windows 7 with VS2012) using CUDA 7.0. Neither it, nor the binary that uses CUDA 5.0 (the most recent release 2.05), works consistently. After a variable amount of time (sometimes < 10,000 iterations, sometimes > 200,000 iterations), CUDALucas errors with:
[CODE]
D:/CUDA/CUDALucas/SourceForge/CUDALucas.cu(1878) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
Resetting device and restarting from last checkpoint.
Using threads: square 512, splice 1024.
D:/CUDA/CUDALucas/SourceForge/CUDALucas.cu(1049) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
[/CODE]
I don't think it's the card itself having issues either--I used the EVGA Overclock Testing Utility and ran both the GPU core burner and the GPU memory (1024 MB) burner. Only ran them for a few minutes, but there were no artifacts and no driver crashes. Any ideas on what I could look at? |
What are your core and RAM clocks?
Brand and model? The latter questions are less important, but the speed the memory is running at can make all the difference. If I am repeating myself, sorry. Can I assume that CuLu can complete -st2? |
[QUOTE=kladner;392732]What are your core and RAM clocks?
Brand and model? The latter questions are less important, but the speed the memory is running at can make all the difference. If I am repeating myself, sorry.[/QUOTE] That would be my guess, too. mfaktc doesn't use the memory at all really. I might try clocking down the memory and see what happens. |
Core and RAM are at stock values (797 Mhz and 975 Mhz, respectively, according to GPU-Z). It's an EVGA 570 GTX base model (so no factory overclocking or anything like that).
I'm not sure what -st2 is, but I usually (though not always) am unable to complete a full -threadbench or -cufftbench. Full being from 1 to 8192 with 2-5 repetitions. I'll try downclocking the memory by 50 and see if that helps. |
Sounds a lot like the driver bug that showed up shortly after the 300.?? driver and affects 570s, 580s, and, I presume, 590s. I don't think Nvidia has any inclination to fix it. The only solution is to use old drivers and thus old cuda libraries. If it is worth your effort, I know the 295 drivers with Cuda 4.2 work. flashjh knows a more precise cutoff point.
|
That seems like it might be it. Dropping the memory by 50 didn't help. It's a shame, really.
|
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.