#155
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
@Robish: Try -aggressive and see if that does anything for you.
(6 hours until my second DC completes.)

#156
Nov 2010
Germany
3×199 Posts

#157
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts

#159
Mar 2010
Jyvaskyla, Finland
2²×3² Posts
This did fix my issue, and the speed on a 5870 is similar to that of the other Cypress posted earlier.
Code:
Adapter 0 - ATI Radeon HD 5800 Series
New Core Peak : 900
New Memory Peak : 1200
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress
Device 1 : Cypress
start M32163559 fft length = 2097152
Iteration 10000 M( 32163559 )C, 0x4fcb7c91ec898d35, n = 2097152, clLucas v1.01 err = 0.003906 (1:32 real, 9.2052 ms/iter, ETA 82:12:26)
Iteration 20000 M( 32163559 )C, 0xf7db0862f3ce666d, n = 2097152, clLucas v1.01 err = 0.003906 (1:31 real, 9.1093 ms/iter, ETA 81:19:31)
Iteration 30000 M( 32163559 )C, 0xc6ee677eb0f7dab6, n = 2097152, clLucas v1.01 err = 0.003906 (1:33 real, 9.2079 ms/iter, ETA 82:10:50)
Iteration 40000 M( 32163559 )C, 0x6090de301e3b5f00, n = 2097152, clLucas v1.01 err = 0.003906 (1:31 real, 9.1100 ms/iter, ETA 81:16:53)
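As a sanity check on the log above, the ETA column can be roughly reproduced from the ms/iter figure. This is a minimal sketch, not clLucas's own formula: it assumes an LL test of M(p) takes about p iterations and that the per-iteration time stays constant for the rest of the run. The function names are hypothetical. With the first line's numbers it lands within about a minute of the logged 82:12:26, so clLucas's exact bookkeeping evidently differs slightly; 10000 iterations at 9.2052 ms also come to about 92 s, matching the "1:32 real" column.

```python
# Rough ETA check for a clLucas progress line (a sketch, not clLucas's own formula).
# Assumptions: an LL test of M(p) takes about p iterations in total, and the
# ms/iter figure stays constant for the rest of the run.


def eta_seconds(p: int, done: int, ms_per_iter: float) -> float:
    """Estimated seconds left after `done` iterations of an LL test on M(p)."""
    remaining = p - done
    return remaining * ms_per_iter / 1000.0


def fmt_hms(seconds: float) -> str:
    """Format seconds as H:MM:SS, like the ETA column of the log."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Numbers copied from the first progress line (iteration 10000).
    p, done, ms = 32163559, 10000, 9.2052
    print("ETA estimate:", fmt_hms(eta_seconds(p, done, ms)))
    # Wall time for one 10000-iteration block (compare "1:32 real").
    print("per 10000 iterations:", round(10000 * ms / 1000.0, 1), "s")
```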

#160
Jul 2009
Tokyo
2·5·61 Posts
Hi,
1.01 source code:
1) Fix TeknoHog's issue.
2) Change from clAmdFft.h to clfft.h.
3) Fix the "over specifications Grid = 65536" issue.
4) Change from cudalucas.ini to cllucas.ini.

#161
"Rob Gahan"
Aug 2013
Ireland
2²·3² Posts

#162
Romulan Interpreter
Jun 2011
Thailand
22665₈ Posts
@kracker:
Back from my trip, and I'm thinking of taking your challenge seriously, hehe, but I just realized you might be talking about a different challenge. Looking here, which at the moment looks like this:
Code:
..... 4 kracker 43 307 426 13,040 ..... 6 LaurV 374 9,172 .....
But I just looked at my other stats, the GIMPS Lifetime ones, and thought that you may be talking about this: which should be a trifle, just a few days of full throttle to put you far behind... Still thinking about it...

But for me, right now it seems more important to do some P-1, because I will soon be pushed out of the Lifetime Top 100, where I am trying to stay now that I have managed to reach it... hehe. So my cards will do some P-1 for the time being. I'll let you get a bit further ahead, hehe... You know what the problem with roosters is? (A quote from the web; I spent some time searching for this old joke. I can never tell it properly in English, but I used to tell it to every young engineer who came to work for the company when he hit his head on the top of the door frame... buddy, you are the young rooster here!)

Joking apart, I really like the new clLucas! I certainly have to play more with it! I think you guys did a wonderful job! kotgw and kudos!
Last fiddled with by LaurV on 2013-09-23 at 06:45 Reason: forgot the primenet snip

#163
"Rob Gahan"
Aug 2013
Ireland
2²×3² Posts
Quick question: what do FFT lengths need to be multiples of? I'm attempting a few 100-million-digit jobs with -f 20971520 (ETA 190 days), but since -f makes such a difference, I would like to try a range of values to find the optimum. I saw it somewhere but can't find it now.

#164
Romulan Interpreter
Jun 2011
Thailand
25B5₁₆ Posts
It depends on your number of threads. An 8K (i.e. 8*1024) multiple is OK for most cases, but you have to run a few DCs first to see if you get the right residues. For paranoids (like me), you can go as low as 1K multiples (1024), which is the minimum allowed (it cannot be lower without producing crap). With many threads (like 512 or 1024, for GTX cards), a granularity of 1K will still give you errors. Msft said somewhere that you need something like a 32-64K FFT for 1024 threads, a 16K FFT for 256-512 threads, etc., but in practice you need to test YOUR card. You can run "-cufftbench" to benchmark your card and then, for each range of exponents you test, select the FFT length (size) that is faster AND at the same time gives you reported errors between 0.1 and 0.23 (otherwise you risk rounding/summation errors during the run).
Last fiddled with by LaurV on 2013-09-23 at 10:26
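The selection rule LaurV describes (among lengths that respect the thread granularity, take the fastest one whose reported error sits between 0.1 and 0.23) can be sketched as a small filter. This is a hypothetical illustration: the BenchResult record and the sample numbers are made up, and this is not the output format of -cufftbench. Real timings and errors have to come from benchmarking your own card.

```python
# Pick an FFT length from benchmark results using the rule of thumb above:
# the fastest length that is a multiple of the chosen granularity and whose
# reported round-off error lies between 0.1 and 0.23.
# The BenchResult record and sample numbers are hypothetical; this is not
# the output format of -cufftbench.
from typing import NamedTuple, Optional, Sequence


class BenchResult(NamedTuple):
    fft_length: int      # FFT length that was tried
    ms_per_iter: float   # measured time per iteration
    max_err: float       # largest round-off error seen during the trial


def pick_fft(results: Sequence[BenchResult],
             granularity: int = 8192,
             err_lo: float = 0.1,
             err_hi: float = 0.23) -> Optional[BenchResult]:
    """Return the fastest acceptable result, or None if nothing qualifies."""
    acceptable = [
        r for r in results
        if r.fft_length % granularity == 0 and err_lo <= r.max_err <= err_hi
    ]
    return min(acceptable, key=lambda r: r.ms_per_iter, default=None)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    trials = [
        BenchResult(20971520, 50.0, 0.05),  # error below 0.1: FFT likely larger than needed
        BenchResult(18874368, 45.0, 0.20),  # inside the 0.1-0.23 window
        BenchResult(18878464, 44.0, 0.15),  # multiple of 4096 but not of 8192
        BenchResult(17825792, 40.0, 0.30),  # error above 0.23: risky
    ]
    best = pick_fft(trials)
    print(best.fft_length if best else "nothing acceptable")
```

With these invented numbers the 8K rule picks 18874368; rerunning with granularity=4096 would let the slightly faster 18878464 through, which is exactly the kind of trade-off that has to be checked with a few DCs first.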

#165
"Rob Gahan"
Aug 2013
Ireland
2²×3² Posts
It'll take a while for all that to sink in, I'm afraid ;-) I'll try -cufftbench first. Sorry, I probably shouldn't be asking here anyway, but I'm using a GTX 690 with CUDALucas:
Code:
CUDALucas-2.03-cuda4.2-sm_30-x86-64 -threads 512 -f 20971520 -t 332233123
So if I am reading this right, 8 * 1024 = 8192, so multiples of 8192? i.e. I'm using 8192 * 2560 = 20971520.
Cheers,
Rob.
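Rob's arithmetic checks out: 20971520 = 8192 × 2560 (that is, 20 × 2^20). A small helper for checking a length against the 8K rule of thumb and listing neighbouring multiples to feed into a benchmark could look like the sketch below. The function names are hypothetical, and the 8192 default only encodes the granularity suggested in the thread; it says nothing about whether a given length is actually fast or accepted by CUDALucas on a particular card.

```python
# Check an FFT length against a granularity and list neighbouring candidates
# worth benchmarking. The 8192 (8K) default follows the rule of thumb in the
# thread; whether a length is actually fast on a given card still has to be
# measured.
from typing import List


def is_multiple(length: int, granularity: int = 8192) -> bool:
    """True if `length` is an exact multiple of `granularity`."""
    return length % granularity == 0


def nearby_lengths(center: int, granularity: int = 8192, count: int = 3) -> List[int]:
    """Multiples of `granularity` from `count` steps below to `count` steps above `center`."""
    base = center // granularity
    return [k * granularity for k in range(base - count, base + count + 1) if k > 0]


if __name__ == "__main__":
    f = 20971520
    print(f // 8192, is_multiple(f))  # 2560 True
    print(nearby_lengths(f, count=1))
```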