mersenneforum.org  

Old 2010-10-07, 16:52   #34
frmky
 
Jul 2003
So Cal

83A₁₆ Posts

For both a higher-end GTX 480 and a lower-end C1060, I got the same speed with the precompiled 2.3 binary and a 3.1 binary I compiled myself. 2.8M p/s is closer to what one would expect simply in terms of CUDA cores: 5.1M p/s × (336/480) ≈ 3.6M p/s.
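
The core-count scaling above can be written out as a quick check. This is only a sketch: it assumes the 5.1M p/s figure is the 480-core GTX 480 rate and that 336 is the core count of the GTX 460 being compared, and it ignores clock speed differences. The function name `scale_by_cores` is made up for illustration.

```python
def scale_by_cores(rate_mps, cores_from, cores_to):
    """Scale a sieve rate (in M p/s) linearly with CUDA core count.

    Clock speeds and architecture differences are ignored on purpose;
    this is only the naive "cores only" estimate from the post.
    """
    return rate_mps * cores_to / cores_from


# Assumed inputs: 5.1M p/s on a 480-core card, scaled to a 336-core card.
estimate = scale_by_cores(5.1, cores_from=480, cores_to=336)
print(f"{estimate:.2f}M p/s")  # prints 3.57M p/s
```

So the "cores only" expectation is about 3.6M p/s, and anything below that has to come from something other than the raw core count.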

Last fiddled with by frmky on 2010-10-07 at 16:58
Old 2010-10-07, 17:37   #35
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by Ken_g6 View Post
Edit: Also, have you tested that it's outputting correct results? Testing thread here, and/or test it against the version I compiled.
Confirmed, it does produce all the expected factors in the PPS test range. Should be good to go.
Old 2010-10-07, 18:47   #36
Karl M Johnson
 
Mar 2010

110011011₂ Posts

Quote:
Originally Posted by Ken_g6 View Post
Can someone with a GTX 470 or 480 (frmky?) confirm whether or not this applies to the higher-end Fermis? I'm guessing that it does, but to a lesser degree.
Since the GTX 460 is sm_21 rather than sm_20 like the GTX 465, 470, and 480, it has a somewhat different architecture.
From what I've heard from OpenCL coders, an sm_21 GPU's shaders can't be fully used the way they could on all previous GeForce cards. 32 SPs per SM work as they should, yet the other 16 will execute an instruction only if it doesn't depend on the result of a calculation still in progress. It's like the U & V pipes of the first Pentium.
Also, sm_21 GPUs benefit more from vectorized code than sm_20 GPUs do.
And vectorized code compiled by toolkit 3.2 seems to be 5% faster than code compiled by toolkit 3.1, which suggests NVIDIA is improving the compiler for sm_21 GPUs.
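
To put a rough number on the 32 + 16 split, here is a toy model, not a description of the real scheduler. It assumes the 32 "base" SPs are always busy and the other 16 are busy only for the share of issue slots where an independent instruction happens to be available. The function name and the `independent_share` parameter are hypothetical.

```python
def usable_fraction(independent_share, base_sps=32, extra_sps=16):
    """Fraction of an SM's SPs kept busy in a deliberately simple model.

    independent_share: share of issue slots (0.0 to 1.0) in which an
    instruction that does not depend on a pending result is available
    for the extra SPs. This parameter is an assumption of the model.
    """
    if not 0.0 <= independent_share <= 1.0:
        raise ValueError("independent_share must be between 0 and 1")
    total_sps = base_sps + extra_sps
    return (base_sps + extra_sps * independent_share) / total_sps


print(f"{usable_fraction(0.0):.3f}")  # fully dependent code: 0.667
print(f"{usable_fraction(0.5):.3f}")  # half the slots usable: 0.833
print(f"{usable_fraction(1.0):.3f}")  # ideal case: 1.000
```

In this model, code with no independent instructions at all would use two thirds of the SPs, which is why vectorized code (more independent work per thread) would be expected to help more on sm_21.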
Old 2010-10-07, 19:01   #37
Ralf Recker
 
Oct 2010

BF₁₆ Posts

Quote:
Originally Posted by Karl M Johnson View Post
Since the GTX 460 is sm_21 rather than sm_20 like the GTX 465, 470, and 480, it has a somewhat different architecture.
From what I've heard from OpenCL coders, an sm_21 GPU's shaders can't be fully used the way they could on all previous GeForce cards. 32 SPs per SM work as they should, yet the other 16 will execute an instruction only if it doesn't depend on the result of a calculation still in progress. It's like the U & V pipes of the first Pentium.
Also, sm_21 GPUs benefit more from vectorized code than sm_20 GPUs do.
And vectorized code compiled by toolkit 3.2 seems to be 5% faster than code compiled by toolkit 3.1, which suggests NVIDIA is improving the compiler for sm_21 GPUs.
That would explain the results of a run on a GTX 460 using NVIDIA's profiler:

Occupancy = 0.666667 ( 32 / 48 ) Achieved occupancy = 0.666667 (on 7 SMs)
Occupancy limiting factor = Block-Size
Old 2010-10-07, 21:46   #38
Ken_g6
 
Jan 2005
Caught in a sieve

5×79 Posts

No, occupancy refers to how many blocks are queued up in registers at one time, not how many are being executed at one time. Occupancy only matters to hide latency. Local instruction latency is very low, so it's mostly used to hide memory access latency. There isn't a lot of memory access in this kernel, so it doesn't matter that much.

This is the issue Karl was referring to.
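
For anyone wondering where a 32 / 48 occupancy with "Block-Size" as the limiting factor can come from, here is a rough sketch. It uses the per-SM limits for compute capability 2.x (48 resident warps, 8 resident blocks, 32 threads per warp) and ignores register and shared memory limits entirely. The 128-thread block size is an assumption chosen because it reproduces the profiler figure; it is not taken from the ppsieve-cuda source.

```python
import math


def occupancy(threads_per_block, max_warps_per_sm=48,
              max_blocks_per_sm=8, warp_size=32):
    """Return (resident_warps, occupancy) for one SM.

    Only the warp and block count limits are modelled; register and
    shared memory usage are ignored in this sketch.
    """
    warps_per_block = math.ceil(threads_per_block / warp_size)
    # How many blocks fit if only the warp limit mattered.
    blocks_by_warps = max_warps_per_sm // warps_per_block
    resident_blocks = min(max_blocks_per_sm, blocks_by_warps)
    resident_warps = resident_blocks * warps_per_block
    return resident_warps, resident_warps / max_warps_per_sm


# Assumed block size of 128 threads: 8 blocks x 4 warps = 32 of 48 warps.
warps, occ = occupancy(128)
print(warps, round(occ, 6))  # prints 32 0.666667
```

With these limits, a block size of 192 or 256 threads would fill all 48 warp slots. The 32 / 48 here counts resident warps, so it only looks like the 32-of-48 SP split by coincidence.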
Old 2010-10-08, 00:26   #39
Ken_g6
 
Jan 2005
Caught in a sieve

5·79 Posts

FYI, the latest OpenCL version seems to work fine for most. To use it, you first need to get the ATI Stream SDK and follow these instructions (PDF) to install it. Yes, you have to read that PDF, particularly if you have Linux!

Let me know if you get computation errors or have other problems.
Old 2010-10-08, 03:50   #40
Ralf Recker
 
Oct 2010

191 Posts

Quote:
Originally Posted by Ken_g6 View Post
No, occupancy refers to how many blocks are queued up in registers at one time, not how many are being executed at one time. Occupancy only matters to hide latency. Local instruction latency is very low, so it's mostly used to hide memory access latency. There isn't a lot of memory access in this kernel, so it doesn't matter that much.

This is the issue Karl was referring to.
Thanks for the explanation. I remember reading this article a few weeks ago, and it seems I've forgotten some details. When I first tested ppsieve-cuda 0.2.0 on the GF104 chip, increasing the memory clocks without changing the core/shader clocks had no effect on the runtimes, which indicated that memory isn't a limiting factor.

Last fiddled with by Ralf Recker on 2010-10-08 at 04:30
Old 2010-10-08, 06:35   #41
vaughan
 
Jan 2005
Sydney, Australia

5×67 Posts

How do you get the CUDA client's DOS box to show the speed it is crunching at, i.e. factors/sec or an equivalent measure?
Old 2010-10-08, 06:50   #42
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by vaughan View Post
How do you get the CUDA client's DOS box to show the speed it is crunching at, i.e. factors/sec or an equivalent measure?
It outputs a line like this, which is updated every 60 seconds (by default--this can be changed in ppconfig.txt or on the command line):
Code:
p=11233184952321, 2.818M p/sec, 0.05 CPU cores, 23.3% done. ETA 11 Oct 05:10
The p/sec. figure is the most accurate measure of speed (since factors/sec. becomes less useful as factors begin to thin out with the progression of the sieve).
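
If you want to log the speed rather than watch the window, a line in that format can be pulled apart with a small script. This is a sketch based only on the single sample line above; the K and G unit suffixes and the `parse_status` name are assumptions, and other ppsieve versions may print the line differently.

```python
import re

# Pattern written against the one sample line shown above.
STATUS_RE = re.compile(
    r"p=(?P<p>\d+), (?P<rate>[\d.]+)(?P<unit>[KMG]?) p/sec, "
    r"(?P<cpu>[\d.]+) CPU cores, (?P<done>[\d.]+)% done\. ETA (?P<eta>.+)"
)
# Only "M" appears in the sample; "K" and "G" are guesses.
UNIT_SCALE = {"": 1, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_status(line):
    """Return the fields of a status line as a dict, or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {
        "p": int(m["p"]),
        "p_per_sec": float(m["rate"]) * UNIT_SCALE[m["unit"]],
        "cpu_cores": float(m["cpu"]),
        "percent_done": float(m["done"]),
        "eta": m["eta"].strip(),
    }


sample = ("p=11233184952321, 2.818M p/sec, 0.05 CPU cores, "
          "23.3% done. ETA 11 Oct 05:10")
info = parse_status(sample)
print(info["p_per_sec"], info["percent_done"], info["eta"])
```

Lines that don't match (factor output, startup messages) simply return `None`, so the function can be run over everything the client prints.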
Old 2010-10-08, 08:58   #43
vaughan
 
Jan 2005
Sydney, Australia

14F₁₆ Posts

Interesting, I watched it for 10 minutes and no sign of a speed indication. So I modified the ppconfig.txt using Notepad++ and changed the "Time between status reports" parameter from 60 to 30 then saved. Restarted the application and still no speed displays.

Never mind. GPU-Z shows the GTX 460 is running at 97-99 percent, and I used nTune to raise the fan setting from 9 percent to 100 percent, which brought the GPU's temperature down from 65C to 55C. The factors file is growing nicely: it's at 2331 KB so far and we're up to 120233 (range is 120000 - 130000), so that's nearly 25 percent completed in 2.5 hours.
Old 2010-10-08, 09:18   #44
gd_barnes
 
May 2007
Kansas; USA

3³·5·7·11 Posts

Quote:
Originally Posted by vaughan View Post
Interesting, I watched it for 10 minutes and no sign of a speed indication. So I modified the ppconfig.txt using Notepad++ and changed the "Time between status reports" parameter from 60 to 30 then saved. Restarted the application and still no speed displays.

Never mind. GPU-Z shows the GTX 460 is running at 97-99 percent, and I used nTune to raise the fan setting from 9 percent to 100 percent, which brought the GPU's temperature down from 65C to 55C. The factors file is growing nicely: it's at 2331 KB so far and we're up to 120233 (range is 120000 - 130000), so that's nearly 25 percent completed in 2.5 hours.
What range exactly is that? It can't be 12T-13T. It's taking my GPU about 4 days to do a P=1T range. Do you have the correct range plugged in or are you using several GPUs or is my GPU just really slow? :-)
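
As a sanity check on the "4 days per 1T" figure, the conversion between range size, rate, and time is sketched below. It assumes 1T means 10^12 and that the rate stays constant over the whole range; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86400


def implied_rate(p_range, days):
    """Average p/sec needed to cover p_range in the given number of days."""
    return p_range / (days * SECONDS_PER_DAY)


def days_for_range(p_range, p_per_sec):
    """Days needed to cover p_range at a constant p/sec rate."""
    return p_range / p_per_sec / SECONDS_PER_DAY


rate = implied_rate(1e12, 4)
print(f"{rate / 1e6:.2f}M p/sec")                 # prints 2.89M p/sec
print(f"{days_for_range(1e12, 2.818e6):.1f} days")  # prints 4.1 days
```

So about 4 days per 1T works out to roughly 2.9M p/sec, in line with the 2.8M p/sec figures quoted earlier in the thread.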

Max, is there a way to cool my GPU? The CPU temps are running kinda high at 75C. I think I'll put an external fan on it.

Last fiddled with by gd_barnes on 2010-10-08 at 09:28
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.