Hello!
I'm doing this test for the first time and I've run into a problem. After completing the main test with 800k+ iterations, the program (cudapm1_win64_20130902) stopped for a while at 'starting stage 1' and then the window disappeared. There is nothing in 'results', and 'Pfactor=N/A,1,2,61262347,-1,73,2' is still in 'worktodo'. I ran the test again and took a screenshot. Here it is: [URL="http://radikal.ru/fp/c58ac17661a3439a8740472689016ec5"][img]http://s001.radikal.ru/i196/1309/02/4fcaedfc97ac.jpg[/img][/URL] There is [B]still[/B] nothing in 'results', and 'Pfactor=N/A,1,2,61262347,-1,73,2' is [B]still[/B] in 'worktodo'. I also have the files 'c61262347s1' and 't61262347s1' in 'D:\CUDA_P-1', each 7479 KB. System: Win7 x64, 6 GB RAM, Nvidia GeForce GTX 780 with a 980 MHz core and 3 GB of memory. CUDA 5.5 is also installed. What did I do wrong? Or is it a program bug? What should I do to fix it? UPD: It was the wrong exponent. I'll try again with the right one (62980369) and then reply.
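For reference, the worktodo entry follows Prime95's Pfactor layout, Pfactor=[assignment ID,]k,b,n,c,how_far_factored,tests_saved. In 'Pfactor=N/A,1,2,61262347,-1,73,2', the number is therefore 1*2^61262347-1, trial-factored to 2^73, with 2 tests saved.

A quick way to catch a wrong exponent before starting is to pull n out of the line and compare it with the one you meant to run. Below is a minimal bash sketch that assumes plain comma-separated fields exactly as above. `pfactor_exponent` is a hypothetical helper, not part of CUDAPm1.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CUDAPm1): print the exponent n from a
# worktodo line of the form Pfactor=[AID,]k,b,n,c,how_far_factored,tests_saved.
# Assumes plain comma-separated fields, as in the post above.
pfactor_exponent() {
  local line=${1#Pfactor=}        # drop the "Pfactor=" prefix
  local -a fields
  IFS=',' read -r -a fields <<< "$line"
  case ${#fields[@]} in
    7) echo "${fields[3]}" ;;     # with assignment ID (e.g. N/A): AID,k,b,n,...
    6) echo "${fields[2]}" ;;     # without assignment ID: k,b,n,...
    *) return 1 ;;                # not a line this sketch understands
  esac
}

# Usage: compare the queued exponent with the one you intended to run.
line='Pfactor=N/A,1,2,61262347,-1,73,2'
want=62980369
got=$(pfactor_exponent "$line")
if [[ $got != "$want" ]]; then
  echo "worktodo has exponent $got, expected $want" >&2
fi
```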
[QUOTE]One more question: is "SaveAllCheckpoints" still disabled? I did a trial run of a live assignment, hoping to compare checkpoints with P95 on the same exponent, but no checkpoints were saved.[/QUOTE]
Yes, it's still disabled. In any event, I would be surprised if anything matched between the checkpoints. [QUOTE]Hello! I'm doing this test for the first time and I've run into a problem. After completing the main test with 800k+ iterations, the program (cudapm1_win64_20130902) stopped for a while at 'starting stage 1' and then the window disappeared. There is nothing in 'results', and 'Pfactor=N/A,1,2,61262347,-1,73,2' is still in 'worktodo'. I ran the test again and took a screenshot. Here it is: [URL]http://s001.radikal.ru/i196/1309/02/4fcaedfc97ac.jpg[/URL] There is [B]still[/B] nothing in 'results', and 'Pfactor=N/A,1,2,61262347,-1,73,2' is [B]still[/B] in 'worktodo'. I also have the files 'c61262347s1' and 't61262347s1' in 'D:\CUDA_P-1', each 7479 KB. System: Win7 x64, 6 GB RAM, Nvidia GeForce GTX 780 with a 980 MHz core and 3 GB of memory. CUDA 5.5 is also installed. What did I do wrong? Or is it a program bug? What should I do to fix it? UPD: It was the wrong exponent. I'll try again with the right one (62980369) and then reply. [/QUOTE] The version of the program I posted was compiled with CUDA toolkit 5.0. That might be causing the problem. I'll try to get a 5.5 version up soon. What iteration times are you getting with the 780?
[QUOTE]Yes, it's still disabled. In any event, I would be surprised if anything matched between the checkpoints.
[/QUOTE] OK. Thanks very much. I am "clearly" unclear on the details. :smile: I'm glad to have information that keeps me from pursuing spurious correlations.
[QUOTE=owftheevil;352439]The version of the program I posted was compiled with CUDA toolkit 5.0. That might be causing the problem. I'll try to get a 5.5 version up soon. What iteration times are you getting with the 780?[/QUOTE]
Last night I successfully completed this test [U]with the default settings in the ini file[/U] and with the correct exponent, [B]BUT[/B] the program quit unexpectedly 5 or 6 times along the way [maybe because of overheating, but I'm not sure (t=80°C)]. I also think I know what the problem was yesterday: I had Prime95 running on all 4 cores of my CPU, so the 'stage 1 gcd' couldn't get enough CPU time. When I started the second test with the correct exponent, I switched off 2 cores in Prime95. But, as I said, the program still quit a few times (possibly caused by overheating, or by my having CUDA 5.5 instead of 5.0). One more thing that could be useful: in the second test, the program quit only during the first part of the test, the one with many iterations.
More error 30s
I have been experimenting with CUDAPm1 on a GTX 570. I had it throttled back from the factory OC of 845 MHz core / 1900 MHz VRAM to 830 MHz core / 1700 MHz VRAM. At least twice I have gotten this error:
[QUOTE]C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1131) : cudaSafeCall() Runtime API error 30: unknown error. [/QUOTE] I then decided to go back to CuLu (CUDALucas), since I have never found the stable speed point for this card. Overnight it quit. Unfortunately, it was running from a batch file, so the prompt window closed along with the program. I restarted it manually at the same clocks, and after a while got this error: [CODE]CUDALucas.cu(693) : cudaSafeCall() Runtime API error 30: unknown error. [/CODE] I have now turned the card down to 810 MHz core, 1600 MHz VRAM, and restarted CuLu to see if the error still happens. Any thoughts or suggestions? I have searched out and read parts of threads that discuss 'error 30 unknown', but I could not tell whether they reached a conclusion about the cause or a remedy.
I haven't run the diagnostics in Windows, but the related error in Linux is caused by the driver stepping on cuFFT's toes. Nvidia is aware of this; several other programs have been seeing similar errors. What they have in common is double-precision FFTs repeated a large number of times. Hopefully there will eventually (soon?) be a fix from Nvidia.
To work around the problem, I've been running CUDALucas and CUDAPm1 from a shell script that loops on a non-zero exit value.
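A restart loop of the kind described can be sketched in bash as follows. The actual script was not posted, so this is only an illustration under stated assumptions:

- `run_until_success`, the `RESTART_DELAY` variable and the restart cap are hypothetical additions. The cap keeps a persistent failure from looping forever.
- The sketch relies on the program resuming from its own save files each time it is relaunched.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: rerun a command until it exits with status 0,
# e.g. after "cudaSafeCall() Runtime API error 30". Assumes the program
# picks up from its checkpoint files on each restart.
# Usage: run_until_success MAX_RESTARTS command [args...]
run_until_success() {
  local max_restarts=$1; shift
  local attempt=0 status
  while true; do
    "$@" && return 0            # clean exit: stop looping
    status=$?                   # non-zero exit status of the command
    if (( attempt >= max_restarts )); then
      echo "giving up after $attempt restarts (last status $status)" >&2
      return "$status"
    fi
    attempt=$((attempt + 1))
    echo "exit status $status; restart $attempt of $max_restarts" >&2
    sleep "${RESTART_DELAY:-5}" # brief pause before relaunching
  done
}

# Example (binary name as used in this thread):
#   run_until_success 100 ./CUDAPm1
```

A Windows batch file would need the equivalent check on `%ERRORLEVEL%`. The bash version is shown because the original workaround was a shell script.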
[QUOTE=owftheevil;352526]I haven't run the diagnostics in Windows, but the related error in Linux is caused by the driver stepping on cuFFT's toes. Nvidia is aware of this; several other programs have been seeing similar errors. What they have in common is double-precision FFTs repeated a large number of times. Hopefully there will eventually (soon?) be a fix from Nvidia.
[U]To work around the problem, I've been running CUDALucas and CUDAPm1 from a shell script that loops on a non-zero exit value.[/U][/QUOTE] OK, thanks. I remember that approach now that you mention it. I have currently rolled back the Windows graphics driver to 314.22. I wonder whether one of the earlier versions I have would have better odds of stability.
1 Attachment(s)
You have to go back to drivers older than version 300 to avoid this problem.
I just spent a couple of hours messing around in Windows. The error (unknown error 30, or whatever it was) showed up under the same circumstances as the timeout errors in Linux. This is a version of CUDAPm1 built with CUDA toolkit 5.5. It has a cufftbench option that provokes the error fairly often. Run cufftbench with, e.g.: [CODE]CUDAPm1.exe -cufftbench 2 8192 5 [/CODE] The first argument is the starting FFT length, the second is the ending length, and the 5 is the number of passes it will make. I've never made it through 20 passes without the error occurring.
[QUOTE=owftheevil;352538]You have to go back to < 300 drivers to avoid this problem.[/QUOTE]
Rats. I have versions going back to the 280s, but I don't think the current mfaktc will run on those. As I remember, the 290s had problems. I have 301.42 in right now. I guess I'll stick with it for the time being, since I just did the clean-and-reinstall routine to put it there. [QUOTE]I just spent a couple of hours messing around in Windows. The error (unknown error 30, or whatever it was) showed up under the same circumstances as the timeout errors in Linux. This is a version of CUDAPm1 built with CUDA toolkit 5.5. It has a cufftbench option that provokes the error fairly often. Run cufftbench with, e.g.: [CODE]CUDAPm1.exe -cufftbench 2 8192 5 [/CODE]The first argument is the starting FFT length, the second is the ending length, and the 5 is the number of passes it will make. I've never made it through 20 passes without the error occurring.[/QUOTE] It's good to know the cufftbench parameters. Thanks.
1 Attachment(s)
The latest version of CUDAPm1, now up on SourceForge, has optimizations for FFT selection. To use them, you first need to run
[CODE]./CUDAPm1 -cufftbench n1 n2 p[/CODE] where n1 is the starting FFT length (in K), n2 is the ending length, and p is the number of times it will repeat the test for each length. I usually run [CODE]./CUDAPm1 1 8196 1 [/CODE] It is important that n1 and n2 be powers of 2. It will run and give results otherwise, but the lengths in the output file are unlikely to all be optimal. What this does is generate a list of optimal FFT lengths for your card, which will be used in any subsequent tests instead of the default lengths.
If a particular FFT length is going to be used often, it is a good idea to also run [CODE]./CUDAPm1 -cufftbench n n p [/CODE] where n is the FFT length you will be using. This finds optimal thread values and can improve iteration times by a few percent. Once this is set up, you shouldn't have to specify any FFT length on the command line or in the ini file unless you have a particular need to run with other FFT lengths.
As far as I know, there are no major problems with this version. However, three known issues remain:
[LIST]
[*]I haven't yet looked into the occasional inability to write save files that Kladner reported.
[*]It sometimes writes meaningless information to the screen.
[*]Excessive stage 2 round-off errors simply halt the program without error messages.
[/LIST]
Something not thoroughly tested yet is the selection of FFT lengths for particular exponents. I have been slightly conservative in the selection mechanism, but there could be some inefficient FFT lengths that I haven't looked at yet. These will cause a test to terminate with an excessive round-off error.
The next feature I will work on is the ability to extend B1. Here is a Win64 version compiled with CUDA toolkit 5.5.
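The argument rules above can be checked before launching a long benchmark. The bash sketch below validates n1, n2 and p and prints the resulting command. `is_posint`, `is_pow2` and `cufftbench_cmd` are hypothetical helpers. One further assumption: the single-length run (n1 == n2) is exempted from the power-of-2 rule, because it tunes threads for whatever length you actually use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building a "CUDAPm1 -cufftbench n1 n2 p" command.
is_posint() { [[ $1 =~ ^[1-9][0-9]*$ ]]; }  # no leading zeros (bash reads those as octal)

# True if $1 is a positive power of 2 (1, 2, 4, ..., 8192, ...).
is_pow2() {
  is_posint "$1" || return 1
  local n=$1
  (( (n & (n - 1)) == 0 ))
}

cufftbench_cmd() {
  local n1=$1 n2=$2 p=$3
  if ! { is_posint "$n1" && is_posint "$n2" && is_posint "$p"; }; then
    echo "n1, n2 and p must be positive integers" >&2; return 1
  fi
  if (( n1 > n2 )); then
    echo "n1 ($n1) must not exceed n2 ($n2)" >&2; return 1
  fi
  # Range run (n1 < n2) builds the FFT-length list: both ends should be powers of 2.
  # Assumption: a single-length thread-tuning run (n1 == n2) may use any length.
  if (( n1 < n2 )) && ! { is_pow2 "$n1" && is_pow2 "$n2"; }; then
    echo "n1 and n2 should be powers of 2 (got $n1, $n2)" >&2; return 1
  fi
  echo "./CUDAPm1 -cufftbench $n1 $n2 $p"
}

cufftbench_cmd 1 8192 1   # prints: ./CUDAPm1 -cufftbench 1 8192 1
```

With these checks, `cufftbench_cmd 1 8196 1` is rejected because 8196 is not a power of 2.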
[QUOTE=owftheevil;353933][CODE]./CUDAPm1 1 8196 1 [/CODE]It is important that n1 and n2 be powers of 2.[/QUOTE]8196 isn't a power of 2 :smile:
edit: also, your example doesn't include -cufftbench