mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

wombatman 2015-09-22 03:38

I would appreciate it if you do find it, but don't waste too much time on it! :smile:

kladner 2015-09-22 05:36

[QUOTE=wombatman;410995]It hits the TDRdelay limit, Windows kills the driver, and the driver is restarted. So far, I just deal with it via a looping shell file as advised by someone on here. That way the program is repeatedly run until the exponent is completed.

I also can't really identify a rhyme or reason for the crashes (i.e., it's not heat or anything like that).[/QUOTE]
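The "looping shell file" approach mentioned in the quote can be sketched roughly like this (a minimal sketch, not wombatman's actual script; the binary name and the restart delay are assumptions):

```shell
#!/bin/sh
# Minimal retry wrapper: keep re-running a command until it exits cleanly,
# so a TDR driver reset that kills the program simply restarts it.
# CUDALucas then resumes from its checkpoint file on the next run.
keep_running() {
    until "$@"; do
        echo "exited with status $?; restarting..." >&2
        sleep 1    # brief pause so the driver can finish recovering
    done
}

# Example invocation (hypothetical binary name):
# keep_running ./CUDALucas
```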

Typically, on my MSI 580, over an 8 hour run (like when I am at work) it might restart once. Today, I left it running on a DC, and was pleasantly surprised that it was still at Count 0 (sorry, Mr Gibson) when I came home from work. When I first started testing this card with CUDALucas, it would not complete the short test at anything like the clocks it could run with MFAKTC. Not only that, but it failed at the "stock" OC of 833 MHz and default voltage of 1.006 v, even with the memory clocked at 1600 MHz.

The short of it is that I now run it at the core speed that is in the BIOS: 833. I turn the Vcore up to 1.025. The memory is running at 1800 MHz, with a 10 mv increase over whatever the base voltage is. (There are advantages to running an MSI card with Afterburner. :smile: I don't have the Memory and Auxiliary voltage settings on my EVGA 580. I am clueless about the Aux, but it was also set for +10 mv when I got the no-restart in 8 hours result.)

An advantage, beyond possible stability, to running at the default frequency is that if the driver restarts, the clock stays the same. However, voltage and memory settings are not affected by a restart. That is a very good thing, since CuLu would almost certainly give bad results with the RAM at over 2 GHz. The self-tests also gave bad results with stock voltage, so I'm glad those settings stick.

In spite of the increased voltages, the card runs cooler than it would with MFAKTC at that speed/voltage combination. Perhaps DP calculations require more voltage for stability, while the DP throttling results in lower temps.

Try setting the GPU core at or below stock frequency, with higher than normal voltage. Set memory low: maybe like 1500 MHz.

The range of voltages I consider using comes from the wildly different "stock" voltages of the two 580s. For the MSI, it is 1.006 v; for the EVGA, it is 1.088 v. (And this insane voltage is to run at an 'oc' of 797 MHz!) I have never touched the higher voltage in real operation, even when running the EVGA at 882 MHz, as it is now, being fed 1.050 v. Meanwhile, the MSI card will run at its stock OC of 833 MHz, on 1.000 v, when running MFAKTC. It is right now running MFAKTC at 911 MHz, on 1.056 v.

I never got the EVGA card to pass self-testing, but I never gave it the same amount of adjustment and testing as I did the MSI. My feeling is that the MSI is just a better card. If nothing else, it has twice as many voltage and frequency steps as the EVGA, which makes fine tuning much more effective. It also runs faster and cooler 'out of the box.' I suspect higher-binned parts. EDIT: .....and a better cooler than the Zalman aftermarket on the EVGA.

kladner 2015-09-22 06:11

[QUOTE=kladner;409447]Sorry if this has been addressed, but I have a puzzling response from CuLu 2.05.1. On my first DC run of a 36M assignment, the FFT selected was 2048K. This produced errors in the 0.05 range. I now have a 34M exponent to DC with the GTX580. Thinking I could get better times with a smaller FFT, I tried inserting 1728K and 1792K both in the worktodo.txt and on the command line. These were either ignored, or the program stated that they are too small and put 2048K back in. I vaguely remember a command "FFT2=" rather than just ",1792K,". Is this what I need to use to get CuLu to test a smaller FFT than 2048?

EDIT: I have deleted the checkpoint files between tests.[/QUOTE]

I am motivated to revive this question. I currently have two 34.7M DCs running on P95. Both are using 1792K FFT, and reporting 0.375 roundoff error. Are error tolerances set more strictly in CuLu?

EDIT: Would the 'I' interactive command (capital I increases the error threshold; lowercase i decreases it) allow a smaller FFT?

wombatman 2015-09-22 14:33

[QUOTE=kladner;411025]Typically, on my MSI 580, over an 8 hour run (like when I am at work) it might restart once. [...] Try setting the GPU core at or below stock frequency, with higher than normal voltage. Set memory low: maybe like 1500 MHz.[/QUOTE]

This is good info. I have an EVGA, so that may be it, but I'll try your suggested tweaks and see if it does anything. Thanks!

kladner 2015-09-22 17:34

I see the difference between the cards as marketing related. MSI went for a reasonably nice overclock, at a reasonable voltage out of the box. It still has [U]lots[/U] of headroom, but one can only go up a step or two on stock voltage. The finer resolutions for voltage and frequency are great for someone with a feel for tweaking, but might confuse a beginning overclocker.

EVGA, on the other hand, set the voltage really high in BIOS, so that a beginner could just crank the clock up and say, "I got an amazing OC on STOCK VOLTAGE!" Of course, it would be running hot as hell, though maybe not so much in a game setting if the load is intermittent. I have not tried to see how high the EVGA will go at 1.088 v, because the system could not handle the heat. I start backing off the settings when temps reach around 78 C.

TheJudger 2015-09-22 18:12

Keep in mind that modern CPUs/GPUs/whatever have individual voltage and clock settings.
While, e.g., all GTX 970s are programmed with the same base clock, each specific GPU has an individual voltage for that clock rate, an individual power consumption, an individual max clock and, of course, an individual clock rate under a specific load.

These are 12 ASUS GTX 970s (STRIX-GTX970-DC2OC-4GD5), factory OCed. All GPUs are running the same load (mfaktc in this case); GPU temperature is well below the temperature target, and the power limit is not (yet) reached.
[CODE]
GPU #1
GPU Current Temp : 71 C
Power Draw : 178.41 W
Graphics : 1290 MHz
GPU #2
GPU Current Temp : 72 C
Power Draw : 175.58 W
Graphics : 1290 MHz
GPU #3
GPU Current Temp : 73 C
Power Draw : 185.42 W
Graphics : 1303 MHz
GPU #4
GPU Current Temp : 73 C
Power Draw : 177.85 W
Graphics : 1303 MHz
GPU #5
GPU Current Temp : 73 C
Power Draw : 193.13 W
Graphics : 1328 MHz
GPU #6
GPU Current Temp : 70 C
Power Draw : 178.00 W
Graphics : 1315 MHz
GPU #7
GPU Current Temp : 73 C
Power Draw : 179.40 W
Graphics : 1328 MHz
GPU #8
GPU Current Temp : 70 C
Power Draw : 174.76 W
Graphics : 1290 MHz
GPU #9
GPU Current Temp : 71 C
Power Draw : 177.80 W
Graphics : 1303 MHz
GPU #10
GPU Current Temp : 71 C
Power Draw : 177.25 W
Graphics : 1278 MHz
GPU #11
GPU Current Temp : 72 C
Power Draw : 181.74 W
Graphics : 1303 MHz
GPU #12
GPU Current Temp : 71 C
Power Draw : 184.82 W
Graphics : 1316 MHz
[/CODE]
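As an aside, per-GPU figures like these can be pulled out of the full 'nvidia-smi -a' style dump with a simple awk filter. A sketch, using a captured two-GPU snippet in place of the live command output (on a real system you would pipe 'nvidia-smi -a' straight into the awk command):

```shell
# Extract the per-GPU power figures from 'nvidia-smi -a'-style output.
# 'sample' stands in here for the live output of the command.
sample='GPU #1
GPU Current Temp : 71 C
Power Draw : 178.41 W
GPU #2
GPU Current Temp : 72 C
Power Draw : 175.58 W'

# Split on " : " and print the value side of every "Power Draw" line.
powers=$(printf '%s\n' "$sample" | awk -F' : ' '/Power Draw/ {print $2}')
echo "$powers"
```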

Oliver

blip 2015-09-23 08:43

How do you get power draw?

TheJudger 2015-09-23 09:56

Just 'nvidia-smi -a' (Linux) with a recent driver. But this depends on the GPU(-BIOS) itself.

Teslas are fully supported by 'nvidia-smi' all the time, and most Quadros, too. Recently(?) they added full support for the GeForce TITAN (all flavours). From time to time they enable or disable features for "regular" GeForce cards. It depends on the GPU BIOS/card, too; I *guess* not all GeForce 970s will report this stuff.

Oliver

blip 2015-09-23 21:56

Must be the newer cards. On my oldish 590s, almost all entries show N/A.
Or could it be the driver? (currently 343.36)

kladner 2015-09-24 06:28

[QUOTE=blip;411136]Must be the newer cards. On my oldish 590s, almost all entries show N/A.
Or could it be the driver? (currently 343.36)[/QUOTE]

500 series and earlier did not do that fancy stuff.

TheJudger 2015-09-24 08:44

I saw a lot of those "N/A" going away after installing 352 series driver (in my case 352.41).

This is a cheap GTX 750 (non-Ti) with 352.41:
[CODE]# nvidia-smi -a | grep -c " : "
105
# nvidia-smi -a | grep -c "N/A"
42
[/CODE]

Oliver

