mersenneforum.org  

Old 2013-01-11, 12:44   #2080
swl551
 
Aug 2012
New Hampshire

23·101 Posts

Quote:
Originally Posted by Dubslow
Possibly simply because the GPU is under more stress now. Depending on the CPU behind those 4 instances, that might not have been enough to truly saturate the card, where now 0.20 can do that thanks to the GPU sieving. What's the Eq. GHz with 0.20 at factory clock, vs. the Eq. GHz with 0.19 at factory clock/4 instances?
At factory GPU defaults, with my i7-3770K @ 4.2 GHz (unchanged throughout all my work):
0.19 with 4 instances = 103 GHz-days/day each = 412 GHz-days/day
0.20 with 1 instance = 368 GHz-days/day





Based on the GPU processor % utilization it is not under more stress:

With 4 instances of 0.19 it remains at an immutable 99%.
With 0.20 it moves between 97% and 99%.

Remember one of the symptoms here is that after the crash the GPU cannot be returned to factory clock speeds until a reboot is performed. It always gets "stuck" at a dismal 405mhz.

Based on my 3 different 570s (all different manufacturers/models) across 3 PCs I cannot believe I am the only person seeing this. I'm confident I'm not the only person to use overclocking either.


Also of note: in 0.19, if I OC'd past acceptable limits the instances would hang, but the processes were still in memory and visible on the screen. With 0.20 they just disappear, the process is not in memory, and I don't get the Windows "the display driver has stopped responding" message.



Thanks
Scott

Last fiddled with by swl551 on 2013-01-11 at 12:52
Old 2013-01-11, 13:11   #2081
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

7×13×47 Posts

Quote:
Originally Posted by swl551
0.19 with 4 instances = 103 GHz-days/day each = 412 GHz-days/day
0.20 with 1 instance = 368 GHz-days/day
Don't forget that your i7 contributes about 10GHd/d per core to the v0.19 numbers, so with that factored in throughput is close to the same.
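
A rough sketch of that accounting, assuming each v0.19 instance is fed by one CPU core and that the core's share is about 10 GHd/d. The function name and the one-core-per-instance assumption are illustrative only and are not taken from mfaktc:

```c
/* Hypothetical helper (not part of mfaktc): estimate the GPU-only share of a
 * CPU-sieved run by subtracting the CPU's assumed contribution per instance. */
double gpu_only_ghd_per_day(int instances, double ghd_per_instance,
                            double cpu_ghd_per_core)
{
    /* Assumption: exactly one CPU core is tied up per instance. */
    return instances * (ghd_per_instance - cpu_ghd_per_core);
}
```

With the figures quoted above, gpu_only_ghd_per_day(4, 103.0, 10.0) gives 372, which is within about 1% of the 368 GHd/d reported for the single v0.20 instance.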

I think in your case you've been providing ample CPU support to the GPU and so throughput won't increase all that much with v0.20. What was your SievePrimes value on v0.19? I'd suspect above 100000.

Are you running 0.20 32-bit or 64-bit? 32-bit has higher performance for GPU-sieving.

What assignment are you getting 368Ghd/d on? Your GPU clocks seem pretty high (I believe you said stock for the card is 845MHz; stock for the base GTX 570 is 732MHz), and yet at "only" 800MHz on my GTX 570 I easily get 420GHd/d.

Last fiddled with by James Heinrich on 2013-01-11 at 13:25 Reason: stock clocks
Old 2013-01-11, 14:00   #2082
swl551
 
Aug 2012
New Hampshire

23·101 Posts

Quote:
Originally Posted by James Heinrich
Don't forget that your i7 contributes about 10GHd/d per core to the v0.19 numbers, so with that factored in throughput is close to the same.

I think in your case you've been providing ample CPU support to the GPU and so throughput won't increase all that much with v0.20. What was your SievePrimes value on v0.19? I'd suspect above 100000.

Are you running 0.20 32-bit or 64-bit? 32-bit has higher performance for GPU-sieving.

What assignment are you getting 368Ghd/d on? Your GPU clocks seem pretty high (I believe you said stock for the card is 845MHz; stock for the base GTX 570 is 732MHz), and yet at "only" 800MHz on my GTX 570 I easily get 420GHd/d.
Remember, I'm trying to find out why 0.20 crashes at the same clock speeds that 0.19 worked fine at. I'm not trying to compare performance, but I'm documenting it here just in case it is useful.

Last fiddled with by swl551 on 2013-01-11 at 14:06
Old 2013-01-11, 14:12   #2083
Aramis Wyler
 
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA

23×53 Posts

Maybe sieving just generates more heat than factoring. Another possibility is that the sieving creates a fluctuation in usage that didn't exist when the GPU was being fed by the CPU, and the resulting waveform is a little less stable than the consistent feed.

I suspect, though, that unless your throughput on .20 plus the new throughput on the CPU is less than the throughput from .19, it's a fairly moot point.
Old 2013-01-11, 15:11   #2084
TheJudger
 
"Oliver"
Mar 2005
Germany

10001011011₂ Posts

Hi Scott,

Quote:
Originally Posted by swl551
Remember one of the symptoms here is that after the crash the GPU cannot be returned to factory clock speeds until a reboot is performed. It always gets "stuck" at a dismal 405mhz.

Based on my 3 different 570s (all different manufacturers/models) across 3 PCs I cannot believe I am the only person seeing this. I'm confident I'm not the only person to use overclocking either.
When sieving is done on the CPU, mfaktc mostly uses only the GPU's internal registers. When sieving is done on the GPU, mfaktc puts some stress on the shared memory inside the GPU, too.
I'm not sure if this is related to the issues you're seeing.

Can you try cudalucas and/or other applications at your desired OC speeds/voltages?

Oliver
Old 2013-01-11, 15:16   #2085
Ralf Recker
 
Oct 2010

191 Posts

Quote:
Originally Posted by swl551
The oddest thing is that after mfaktc crashes the GPU core clock will NOT go over 405mhz regardless of what I do with afterBurner. I have to reboot to allow the card to return to factory clock speed. This is a condition I have never seen before (below factory clocks)
This is the standard behaviour of recent NVIDIA drivers.

Quote:
Originally Posted by swl551
Remember one of the symptoms here is that after the crash the GPU cannot be returned to factory clock speeds until a reboot is performed. It always gets "stuck" at a dismal 405mhz.
Ditto.

IIRC this behaviour was introduced with the 260.xy or 270.xy driver versions.

Last fiddled with by Ralf Recker on 2013-01-11 at 15:23
Old 2013-01-11, 15:16   #2086
apsen
 
Jun 2011

131 Posts

Quote:
Originally Posted by swl551
With the GTX 570 and 0.19 I could run 4 instances on one card clocked at 1000mv, 900mhz core. Average combined throughput was 480 ghz per day. Never crashed...

0.20 has forced a drop down to 988mv (default) and 845mhz core to stay reliable, reducing throughput to only 420 ghz per day. Confirmed on 3 different 570s on different PCs. The oddest thing is that after mfaktc crashes the GPU core clock will NOT go over 405mhz regardless of what I do with afterBurner. I have to reboot to allow the card to return to factory clock speed. This is a condition I have never seen before (below factory clocks)

I recognize all the benefits of 0.20 so this is not a 0.20 vs 0.19. The question is specifically why is 0.20 showing instability where 0.19 did not.

thanks

Scott
0.20 uses additional parts of the GPU. What I could tell is that with 3 instances of 0.19 saturating the GPU, the games I play were not affected at all. I could run TF and play games at the same time. But with 0.20 it is impossible: my fps goes down to something like 5.

Process Explorer shows you each GPU engine's load separately, and I could see that 0.19 was stressing one GPU engine and my game was stressing a different GPU engine. The new 0.20 stresses both of those. Unfortunately Process Explorer only shows engines by their numbers, so I can't figure out what those engines are... But with 0.20 requiring a minimum of CC 2.0, I guess it's some block that was added in that architecture.

Thanks,
Andriy
Old 2013-01-11, 15:59   #2087
TheJudger
 
"Oliver"
Mar 2005
Germany

45B₁₆ Posts

Quote:
Originally Posted by apsen
[...]
But with 0.20 requiring a minimum of CC 2.0, I guess it's some block that was added in that architecture.
I can enable sieving on CC 1.x GPUs easily... but the performance is horrible.
What prevents GPU sieving from running there is this code in src/mfaktc.c:
Code:
  if((mystuff.compcapa_major == 1) && mystuff.gpu_sieving)
  {
    printf("Sorry, GPU sieving is not supported on devices with compute capability 1.x!\n");
    printf("disable GPU sieving in mfaktc.ini (set SieveOnGPU to 0).\n");
    return 1;
  }
The code is nearly identical for CC 1.x and CC 2.0. The differences are in the functions ___clz() and ___popcnt() in src/tf_barrett96_gs.cu: for CC 2.0 there is a simple PTX instruction for each of clz and popcnt, but for CC 1.x they need to be emulated.
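
To illustrate what "emulated" means here, a minimal sketch in plain C of software count-leading-zeros and population-count routines. These are generic, well-known bit tricks with hypothetical names; they are not the actual mfaktc implementations from src/tf_barrett96_gs.cu, which may differ.

```c
#include <stdint.h>

/* Hypothetical software clz: number of leading zero bits in a 32-bit word,
 * found by binary search over halves. Returns 32 for x == 0. */
unsigned clz32_emulated(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* Hypothetical software popcount: number of set bits in a 32-bit word,
 * computed by summing bit counts in parallel within 2-, 4- and 8-bit fields. */
unsigned popcnt32_emulated(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit field sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit field sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit field sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}
```

Each routine takes several dependent shift, mask and add steps, where hardware with native support does the same work in one instruction. That extra work gives a feel for why emulation costs throughput when these operations sit in a hot loop.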

Oliver
Old 2013-01-11, 16:03   #2088
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

7·13·47 Posts

Quote:
Originally Posted by TheJudger
I can enable sieving on CC 1.x GPUs easily... but the performance is horrible.
I'm just curious if you can quantify "horrible"? Presumably it works, but is slower than CPU-sieving even on a slow CPU? By how much?
Old 2013-01-11, 16:14   #2089
swl551
 
Aug 2012
New Hampshire

23·101 Posts
Default

Quote:
Originally Posted by TheJudger
Hi Scott,



When sieving is done on the CPU, mfaktc mostly uses only the GPU's internal registers. When sieving is done on the GPU, mfaktc puts some stress on the shared memory inside the GPU, too.
I'm not sure if this is related to the issues you're seeing.

Can you try cudalucas and/or other applications at your desired OC speeds/voltages?

Oliver
CudaLucas will NOT run at the high OC rates I ran 0.19 at. I learned that instantly with CuLu. The answers regarding CuLu's sensitivity to GPU execution errors made sense. No one has stated that 0.20 has similar constraints. Maybe that is what we are uncovering here.
Old 2013-01-11, 16:31   #2090
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts
Default

I can't address the current situation directly, except to say that I throttled back a bit on both the 570 and the 460 with 0.20. I had a few signs of instability, but some of that may have related to the CPU now running 6x P-1 workers. Since I have made multiple adjustments without fully evaluating each one, I can't say for sure. (I can't be sure if things are fully stable now, but I did not wake up to a BSOD this morning as I did yesterday.)

As to the nVidia driver getting stuck at 405 MHz, I saw that when I first started running mfaktc on the 460. I think that was with V 0.17. While experimenting with batch files to get things going, I stopped and started mfaktc repeatedly. After 2-3 restarts the GPU clock would hang at 405 MHz, and I would have to reboot to clear it.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
