#34
Jul 2003
So Cal
2·3⁴·13 Posts
For both a higher-end GTX 480 and a lower-end C1060, I got the same speed from the precompiled 2.3 binary and a 3.1 binary I compiled myself. 2.8M p/s is closer to what one would expect simply in terms of CUDA cores: 5.1M p/s × (336/480) ≈ 3.6M p/s.
Last fiddled with by frmky on 2010-10-07 at 16:58
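The core-count estimate in this post can be checked with a few lines of Python. This is only a sketch: the rates and core counts (5.1M p/s, 480 and 336 CUDA cores, 2.8M p/s observed) are taken from the post, while the function name and the assumption that throughput scales linearly with core count are illustrative.

```python
def scale_by_cores(ref_rate, ref_cores, target_cores):
    """Naive estimate: throughput is proportional to CUDA core count.

    This ignores clock speed and architecture differences (assumption).
    """
    return ref_rate * target_cores / ref_cores


# Figures quoted in the post above.
expected = scale_by_cores(5.1e6, ref_cores=480, target_cores=336)
observed = 2.8e6

print(f"expected: {expected / 1e6:.2f}M p/s")
print(f"observed: {observed / 1e6:.2f}M p/s")
print(f"observed / expected: {observed / expected:.2f}")
```

The gap between the two numbers is what the rest of the thread tries to explain.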
#35
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts
Quote:
#36
Mar 2010
110011011₂ Posts
Quote:
From what I've heard from OpenCL coders, sm_21 GPUs' shaders can't be fully used like those of all previous GeForce cards. 32 SPs per SM work as they should, but the other 16 will execute an instruction only if it doesn't depend on the result of a pending calculation. It's like the U and V pipes of the first Pentium. Also, sm_21 GPUs benefit more from vectorized code than sm_20 GPUs do. And vectorized code compiled by toolkit 3.2 seems to be 5% faster than code compiled by toolkit 3.1, which means NV is improving the compiler for sm_21 GPUs.
#37
Oct 2010
10111111₂ Posts
Quote:
Occupancy = 0.666667 ( 32 / 48 )
Achieved occupancy = 0.666667 (on 7 SMs)
Occupancy limiting factor = Block-Size
#38
Jan 2005
Caught in a sieve
5·79 Posts
No, occupancy refers to how many blocks are queued up in registers at one time, not how many are being executed at one time. Occupancy only matters to hide latency. Local instruction latency is very low, so it's mostly used to hide memory access latency. There isn't a lot of memory access in this kernel, so it doesn't matter that much.
This is the issue Karl was referring to.
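To see how a block-size limit can produce the 32 / 48 figure quoted earlier, here is a rough Python sketch of a theoretical occupancy calculation. The per-SM limits (48 resident warps, 8 resident blocks) are the ones I'd assume for compute capability 2.x, and the 128-thread block size is a guess that happens to reproduce 32 / 48; it is not taken from the actual kernel. Register and shared-memory limits are ignored.

```python
WARP_SIZE = 32


def theoretical_occupancy(block_size, max_warps_per_sm=48, max_blocks_per_sm=8):
    """Return (resident warps per SM, occupancy) limited only by block size.

    Assumed limits: 48 warps and 8 blocks per SM (compute capability 2.x).
    Register and shared-memory pressure are not modeled here.
    """
    warps_per_block = -(-block_size // WARP_SIZE)  # ceiling division
    blocks = min(max_blocks_per_sm, max_warps_per_sm // warps_per_block)
    resident_warps = blocks * warps_per_block
    return resident_warps, resident_warps / max_warps_per_sm


# Hypothetical block sizes; 128 is a guess, not read from the kernel source.
for block_size in (128, 192, 256):
    warps, occ = theoretical_occupancy(block_size)
    print(f"block size {block_size}: {warps} / 48 warps, occupancy {occ:.6f}")
```

With small blocks the 8-blocks-per-SM cap is hit before the warp cap, which is what "limiting factor = Block-Size" would mean under these assumptions.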
#39
Jan 2005
Caught in a sieve
5·79 Posts
FYI, the latest OpenCL version seems to work fine for most. To use it, you first need to get the ATI Stream SDK and follow these instructions (PDF) to install it. Yes, you have to read that PDF, particularly if you have Linux!
Let me know if you get computation errors or have other problems.
#40
Oct 2010
191 Posts
Quote:
Last fiddled with by Ralf Recker on 2010-10-08 at 04:30
#41
Jan 2005
Sydney, Australia
14F₁₆ Posts
How do you get the CUDA client's DOS box to show the speed it is crunching at, i.e. factors/sec or the equivalent?
#42
A Sunny Moo
Aug 2007
USA (GMT-5)
1869₁₆ Posts
Quote:
Code:
p=11233184952321, 2.818M p/sec, 0.05 CPU cores, 23.3% done. ETA 11 Oct 05:10
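If you want to log that status line rather than watch the console, something like the following Python sketch could pull the numbers out of it. The regular expression is written against the single sample line above only; the `K` and `G` rate suffixes and the names `parse_status` and `STATUS_RE` are my own assumptions, not part of the client.

```python
import re

# Pattern fitted to the one sample line in this thread (assumption: the
# client always uses this layout; only the "M" suffix is actually attested).
STATUS_RE = re.compile(
    r"p=(?P<p>\d+), (?P<rate>[\d.]+)(?P<unit>[KMG]?) p/sec, "
    r"(?P<cpu>[\d.]+) CPU cores, (?P<pct>[\d.]+)% done\. ETA (?P<eta>.+)"
)
UNIT = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_status(line):
    """Parse one status line into a dict, or return None if it doesn't match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {
        "p": int(m["p"]),
        "rate": float(m["rate"]) * UNIT[m["unit"]],  # p/sec as a plain number
        "cpu_cores": float(m["cpu"]),
        "percent_done": float(m["pct"]),
        "eta": m["eta"].strip(),
    }


line = "p=11233184952321, 2.818M p/sec, 0.05 CPU cores, 23.3% done. ETA 11 Oct 05:10"
status = parse_status(line)
print(status)
```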
#43
Jan 2005
Sydney, Australia
5×67 Posts
Interesting, I watched it for 10 minutes and saw no sign of a speed indication. So I modified ppconfig.txt using Notepad++, changed the "Time between status reports" parameter from 60 to 30, and saved. I restarted the application and it still shows no speed.
Never mind, GPU-Z shows the GTX460 is running at 97-99 percent, and I used nTune to adjust the fan setting from 9 percent to 100 percent, which brought the GPU's temps down from 65C to 55C. The factors file is growing nicely; it's at 2331 KB so far and we're up to 120233 (range is 120000 - 130000), so that's nearly 25 percent completed in 2.5 hours.
#44
May 2007
Kansas; USA
3³·5·7·11 Posts
Quote:
Max, is there a way to cool my GPU? The CPU temps are running kinda high at 75C. I think I'll put an external fan on it.
Last fiddled with by gd_barnes on 2010-10-08 at 09:28