mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

swl551 2013-01-11 12:44

[QUOTE=Dubslow;324367]Possibly simply because the GPU is under more stress now. Depending on the CPU behind those 4 instances, that might not have been enough to truly saturate the card, where now 0.20 can do that thanks to the GPU sieving. What's the Eq. GHz with 0.20 at factory clock, vs. the Eq. GHz with 0.19 at factory clock/4 instances?[/QUOTE]

At factory GPU defaults, with my i7-3770K @ 4.2 GHz (unchanged throughout all my work):
0.19 with 4 instances = 103 GHz-days/day per instance = 412 GHz-days/day total
0.20 with 1 instance = 368 GHz-days/day

Based on the GPU processor % utilization it is not under more stress:

With 4 instances of 0.19 it remains at a constant 99%.
With 0.20 it moves between 97% and 99%.

Remember one of the symptoms here is that after the crash the GPU cannot be returned to factory clock speeds until a reboot is performed. It always gets "stuck" at a dismal 405mhz.

Based on my three different GTX 570s (all different manufacturers/models) across three PCs, I cannot believe I am the only person seeing this. I'm confident I'm not the only person to use overclocking either.

Also of note: in 0.19, if I overclocked past acceptable limits, the instances would hang but the processes were still in memory and visible on the screen. With 0.20 they just disappear, the process is no longer in memory, and I don't get the Windows "the display driver has stopped responding" message.



Thanks
Scott

James Heinrich 2013-01-11 13:11

[QUOTE=swl551;324383]0.19 with 4 instances = 103 GHz-days/day per instance = 412 GHz-days/day total
0.20 with 1 instance = 368 GHz-days/day[/QUOTE]
Don't forget that your i7 contributes about 10 GHd/d per core to the v0.19 numbers, so with that factored in, throughput is close to the same.
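
The arithmetic behind that comparison can be sketched in a few lines of C. This is only an illustration using the figures quoted in this thread: the function names are hypothetical, the 10 GHd/d per core figure is a rough estimate, and subtracting the CPU share from the v0.19 total (one core per instance) is an assumed reading of "factored in".

```c
#include <stdio.h>

/* Hypothetical helper (not part of mfaktc): estimate the GPU-only share
 * of a CPU-sieved setup by subtracting the work attributed to the CPU
 * cores that feed the GPU. All values are in GHz-days per day. */
double gpu_only_throughput(double per_instance, int instances,
                           double cpu_per_core)
{
    double total = per_instance * instances;  /* e.g. 103 * 4 = 412 */
    return total - cpu_per_core * instances;  /* assumes one core per instance */
}

/* Print the comparison using the numbers reported in the thread. */
void print_comparison(void)
{
    double v019 = gpu_only_throughput(103.0, 4, 10.0); /* 412 - 40 = 372 */
    double v020 = 368.0;              /* one instance with GPU sieving */

    printf("v0.19 GPU-only estimate: %.0f GHd/d\n", v019);
    printf("v0.20 reported:          %.0f GHd/d\n", v020);
    printf("difference:              %.1f%%\n", 100.0 * (v019 - v020) / v019);
}
```

Under these assumptions the two versions differ by roughly 1%, which is consistent with "close to the same".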

I think in your case you've been providing ample CPU support to the GPU and so throughput won't increase all that much with v0.20. What was your SievePrimes value on v0.19? I'd suspect above 100000.

Are you running 0.20 32-bit or 64-bit? 32-bit has higher performance for GPU-sieving.

What assignment are you getting 368Ghd/d on? Your GPU clocks seem pretty high (I believe you said stock for the card is 845MHz; stock for the base GTX 570 is 732MHz), and yet at "only" 800MHz on my GTX 570 I easily get 420GHd/d.

swl551 2013-01-11 14:00

[QUOTE=James Heinrich;324389]Don't forget that your i7 contributes about 10GHd/d per core to the v0.19 numbers, so with that factored in throughput is close to the same.

I think in your case you've been providing ample CPU support to the GPU and so throughput won't increase all that much with v0.20. What was your SievePrimes value on v0.19? I'd suspect above 100000.

Are you running 0.20 32-bit or 64-bit? 32-bit has higher performance for GPU-sieving.

What assignment are you getting 368Ghd/d on? Your GPU clocks seem pretty high (I believe you said stock for the card is 845MHz; stock for the base GTX 570 is 732MHz), and yet at "only" 800MHz on my GTX 570 I easily get 420GHd/d.[/QUOTE]

Remember, I'm trying to get an evaluation of [U]why 0.20 crashes at the same clock speeds that 0.19 worked fine at[/U]. I'm not trying to compare performance; I'm documenting it here just in case it is useful.

Aramis Wyler 2013-01-11 14:12

Maybe sieving just generates more heat than factoring. Another possibility is that the sieving creates a fluctuation in usage that didn't exist when the GPU was being fed by the CPU, and the fluctuating load is a little less stable than the consistent feed.

I suspect that unless your throughput on 0.20 plus the new throughput on the CPU is less than the throughput from 0.19, though, it's a fairly moot point.

TheJudger 2013-01-11 15:11

Hi Scott,

[QUOTE=swl551;324383]Remember one of the symptoms here is that after the crash the GPU cannot be returned to factory clock speeds until a reboot is performed. It always gets "stuck" at a dismal 405mhz.

Based on my three different GTX 570s (all different manufacturers/models) across three PCs, I cannot believe I am the only person seeing this. I'm confident I'm not the only person to use overclocking either.[/QUOTE]

When sieving is done on the [B]C[/B]PU, mfaktc uses only the GPU-internal registers most of the time. When sieving is done on the [B]G[/B]PU, mfaktc also puts some stress on the shared memory inside the GPU.
I'm not sure if this is related to the issues you're seeing.

Can you try CUDALucas and/or other applications at your desired OC speeds/voltages?

Oliver

Ralf Recker 2013-01-11 15:16

[QUOTE=swl551;324366]The oddest thing is that after mfaktc crashes the GPU core clock will NOT go over 405mhz regardless of what I do with afterBurner. I have to reboot to allow the card to return to factory clock speed. This is a condition I have never seen before (below factory clocks)
[/QUOTE]

This is the standard behaviour of recent NVIDIA drivers.

[QUOTE=swl551;324383]Remember one of the symptoms here is that after the crash the GPU cannot be returned to factory clock speeds until a reboot is performed. It always gets "stuck" at a dismal 405mhz.[/QUOTE]

Ditto.

IIRC this behaviour was introduced with the 260.xy or 270.xy driver versions.

apsen 2013-01-11 15:16

[QUOTE=swl551;324366]With a GTX 570 and 0.19 I could run 4 instances on one card clocked at 1000 mV, 900 MHz core. Average combined throughput was 480 GHz-days per day. It never crashed...

0.20 has forced a drop down to 988 mV (default) and 845 MHz core to stay reliable, reducing throughput to only 420 GHz-days per day. Confirmed on 3 different 570s on different PCs. The oddest thing is that after mfaktc crashes the GPU core clock will NOT go over 405 MHz regardless of what I do with Afterburner. I have to reboot to allow the card to return to factory clock speed. This is a condition I have never seen before (below factory clocks).

I recognize all the benefits of 0.20 so this is not a 0.20 vs 0.19. The question is specifically why is 0.20 showing instability where 0.19 did not.

thanks

Scott[/QUOTE]

0.20 uses additional parts of the GPU. As far as I could tell, with 3 instances of 0.19 saturating the GPU, the games I play were not affected at all. I could run TF and play games at the same time. But with 0.20 that is impossible: my fps goes down to something like 5.

Process Explorer shows each GPU engine's load separately, and I could see that 0.19 was stressing one GPU engine while my game was stressing a different one. The new 0.20 stresses both of those. Unfortunately, Process Explorer only identifies engines by number, so I can't figure out what those engines are... But since 0.20 requires a minimum of CC 2.0, I guess it's some block that was added in that architecture.

Thanks,
Andriy

TheJudger 2013-01-11 15:59

[QUOTE=apsen;324404][...]
But since 0.20 requires a minimum of CC 2.0, I guess it's some block that was added in that architecture.[/QUOTE]

I can enable sieving on CC 1.x GPU easily... but the performance is horrible.
What prevents GPU sieving from running there is this code in src/mfaktc.c:
[CODE]
if((mystuff.compcapa_major == 1) && mystuff.gpu_sieving)
{
  printf("Sorry, GPU sieving is not supported on devices with compute capability 1.x!\n");
  printf("disable GPU sieving in mfaktc.ini (set SieveOnGPU to 0).\n");
  return 1;
}
[/CODE]

The code is nearly identical for CC 1.x and CC 2.0; the differences are in src/tf_barrett96_gs.cu, in the functions ___clz() and ___popcnt(). For CC 2.0 there is a simple PTX instruction for each of clz and popcnt, but for CC 1.x they need to be emulated.
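
For readers unfamiliar with these two operations, here is a minimal sketch in plain C of what emulating count-leading-zeros and population count can look like when no single hardware instruction is available. It is not the code from src/tf_barrett96_gs.cu: the function names emulated_clz32 and emulated_popcount32 are hypothetical, and the sketch only illustrates why an emulated path costs several instructions per call.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit value using a binary search
 * over the upper bits. Returns 32 for x == 0.
 * Illustrative only; not mfaktc's implementation. */
unsigned emulated_clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0) return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* Count set bits by summing adjacent bit fields in parallel:
 * 2-bit sums, then 4-bit sums, then byte sums, then add the bytes. */
unsigned emulated_popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;  /* top byte holds the total */
}
```

Each emulated call takes on the order of ten or more arithmetic and logic operations, which gives a rough feel for why a sieve that calls them frequently would slow down when a single instruction is not available.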

Oliver

James Heinrich 2013-01-11 16:03

[QUOTE=TheJudger;324407]I can enable sieving on CC 1.x GPU easily... but the performance is horrible.[/QUOTE]I'm just curious if you can quantify "horrible"? Presumably it works, but is slower than CPU-sieving even on a slow CPU? By how much?

swl551 2013-01-11 16:14

[QUOTE=TheJudger;324402]Hi Scott,



When sieving is done on the [B]C[/B]PU, mfaktc uses only the GPU-internal registers most of the time. When sieving is done on the [B]G[/B]PU, mfaktc also puts some stress on the shared memory inside the GPU.
I'm not sure if this is related to the issues you're seeing.

Can you try CUDALucas and/or other applications at your desired OC speeds/voltages?

Oliver[/QUOTE]
CUDALucas will NOT run at the high OC rates I ran 0.19 at. I learned that instantly with CUDALucas. The answers regarding CUDALucas's sensitivity to GPU execution errors made sense. No one has stated that 0.20 has similar constraints. Maybe that is what we are uncovering here.

kladner 2013-01-11 16:31

I can't address the current situation directly, except to say that I throttled back a bit on both the 570 and the 460 with 0.20. I had a few signs of instability, but some of that may have been related to the CPU now running 6x P-1 workers. Since I have made multiple adjustments without fully evaluating each one, I can't say for sure. (I can't be sure if things are fully stable now, but I did not wake up to a BSOD this morning as I did yesterday.)

As to the nVidia driver getting stuck at 405 MHz, I saw that when I first started running mfaktc on the 460. I think that was with V 0.17. While experimenting with batch files to get things going, I stopped and started mfaktc repeatedly. After 2-3 restarts the GPU clock would hang at 405 MHz, and I would have to reboot to clear it.

