mersenneforum.org  

Old 2013-09-22, 14:41   #155
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2^3·271 Posts
Default

@Robish: Try -aggressive if that does anything for you.

(6 hours till my second DC completes)
Old 2013-09-22, 14:59   #156
Bdot
 
Nov 2010
Germany

1001010101_2 Posts
Default

Quote:
Originally Posted by Robish
Still 12 days though, I'll see if I can tweak it a bit more with the settings
For me (rather for a 7850), -threads 64 was fastest. Slightly behind was 128, and a lot slower: 256.
Old 2013-09-22, 15:22   #157
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2^3·271 Posts
Default

Quote:
Originally Posted by Bdot
For me (rather for a 7850), -threads 64 was fastest. Slightly behind was 128, and a lot slower: 256.
My 7770:
64=12.4 ms
128=12.0 ms
256=12.2 ms
Old 2013-09-22, 18:26   #158
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2^3×271 Posts
Default

My second DC is done.
M( 30822937 )C, 0x1c656da41a256c21, n = 2097152, clLucas v1.01

M( 30766511 )C, 0x1ff14c8237b5e935, n = 2097152, clLucas v1.00
Old 2013-09-22, 18:28   #159
TeknoHog
 
Mar 2010
Jyvaskyla, Finland

100100_2 Posts
Default

Quote:
Originally Posted by kracker
clLucas 1.01 out.
This did indeed fix my issue, and the speed on a 5870 is similar to that of the other Cypress posted earlier.

Code:
Adapter 0 - ATI Radeon HD 5800 Series
            New Core Peak   : 900
            New Memory Peak : 1200
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress
Device 1 : Cypress


start M32163559 fft length = 2097152
Iteration 10000 M( 32163559 )C, 0x4fcb7c91ec898d35, n = 2097152, clLucas v1.01 err = 0.003906 (1:32 real, 9.2052 ms/iter, ETA 82:12:26)
Iteration 20000 M( 32163559 )C, 0xf7db0862f3ce666d, n = 2097152, clLucas v1.01 err = 0.003906 (1:31 real, 9.1093 ms/iter, ETA 81:19:31)
Iteration 30000 M( 32163559 )C, 0xc6ee677eb0f7dab6, n = 2097152, clLucas v1.01 err = 0.003906 (1:33 real, 9.2079 ms/iter, ETA 82:10:50)
Iteration 40000 M( 32163559 )C, 0x6090de301e3b5f00, n = 2097152, clLucas v1.01 err = 0.003906 (1:31 real, 9.1100 ms/iter, ETA 81:16:53)
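
As a rough cross-check of the ETA column in the log above, the sketch below assumes the ETA is simply the remaining Lucas-Lehmer iterations (p - 2 in total) multiplied by the reported ms/iter. That formula is an assumption, not something taken from the clLucas source, and the helper names `estimate_eta_seconds` and `format_hms` are made up for illustration.

```python
def estimate_eta_seconds(exponent, iteration, ms_per_iter):
    """Estimate remaining time, assuming p - 2 total iterations (assumption)."""
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Values copied from the first two log lines above.
eta_10k = estimate_eta_seconds(32163559, 10000, 9.2052)
eta_20k = estimate_eta_seconds(32163559, 20000, 9.1093)
print(format_hms(eta_10k), format_hms(eta_20k))
```

Under that assumption both estimates land within a minute of the logged ETAs (82:12:26 and 81:19:31), so the ETA swings between lines are consistent with the ms/iter figure changing rather than with anything else.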
Old 2013-09-22, 21:05   #160
msft
 
Jul 2009
Tokyo

2×5×61 Posts
Default

Hi,
1.01 source code.
1) Fix TeknoHog issue.
2) Change from clAmdFft.h to clfft.h.
3) Fix "over specifications Grid = 65536" issue.
4) Change from cudalucas.ini to cllucas.ini.
Attached Files
File Type: bz2 clLucas.1.01.tar.bz2 (16.1 KB, 74 views)
Old 2013-09-22, 21:44   #161
Robish
 
"Rob Gahan"
Aug 2013
Ireland

36_10 Posts
Default

Quote:
Originally Posted by kracker
@Robish: Try -aggressive if that does anything for you.

(6 hours till my second DC completes)
Will do, kracker. I'll report back as soon as I can.
Old 2013-09-23, 06:23   #162
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

9610_10 Posts
Default

@kracker:
I'm back from my trip and was thinking of taking your challenge seriously, hehe, but I just realized you might be talking about a different challenge.

Looking here, which at the moment looks like this:

Code:
.....
4 kracker 43 307 426 13,040
.....
6 LaurV          374  9,172
.....
it would be a difference of about 4 THzD, which might take a while to close (I'm talking about DC, not TF, where thousands of GHzD/D are possible), even if I work at full power while you are sleeping...

But I just looked at my other stats, the GIMPS lifetime ones, and I think you may be talking about this:
[Attached image: top.PNG]
which should be a trifle: just a few days at full throttle would put you far behind...

Still thinking about it... But right now it seems more important for me to do some P-1, because I will soon be pushed out of the lifetime Top 100, where I am trying to stay now that I have managed to reach it... hehe. So my cards will do some P-1 for the time being. I'll let you get a bit further ahead, hehe... You know what the problem with roosters is?

(A quote from the web. I spent some time searching for this old joke because I can never tell it properly in English, but I tell it to every young engineer who comes to work for the company when he hits his head on the top of the door frame... Buddy, you are the young rooster here!)

Quote:
A farmer goes out one day and buys a brand new stud rooster for his chicken coop. The young rooster walks over to the old rooster and says "Ok, old fellow, time to retire." The old rooster says, "You can't handle all these chickens....look at what it did to me!"
The young rooster replies, "Now, don't give me a hassle about this. Time for the old to step aside and the young to take over, so take a hike."
The old rooster says, "Aw, c'mon.....just let me have the two old hens over in the corner. I won't bother you."
The young rooster says, "Scram! Beat it! You're washed up! I'm taking over!"
So, the old rooster thinks for a minute and then says to the young rooster, "I'll tell you what, young fellow, I'll have a race with you around the farmhouse. Whoever wins the race gets domain of the chicken coop."
The young rooster says, "You know I'm going to beat you, old man, so just to be fair, I'm even going to give you a head start."
They line up in back of the farm house, get a chicken to cluck "Go!" and the old rooster takes off running. About 15 seconds later, the young rooster takes off after him. They round the front of the farmhouse and the young rooster is only about 5 inches behind the old rooster and gaining fast.
The farmer, sitting on the porch, looks up, sees what's going on, grabs his shotgun and BOOM - he blows the young rooster to bits. He sadly shakes his head and says, "Dammit, third gay rooster I bought this week!"
(No disrespect to the gay community; this joke is just funny!)

Joking apart, I really like the new clLucas! I certainly have to play more with it! I think you guys did a wonderful job! kotgw and kudos!

Last fiddled with by LaurV on 2013-09-23 at 06:45 Reason: forgot the primenet snip
Old 2013-09-23, 09:48   #163
Robish
 
"Rob Gahan"
Aug 2013
Ireland

2^2·3^2 Posts
Unhappy

Quote:
Originally Posted by kracker
@Robish: Try -aggressive if that does anything for you.

(6 hours till my second DC completes)
Hi kracker,

Quick question, what are the fft length multiples of?

I'm attempting a few 100-million-digit jobs with -f 20971520 (ETA 190 days), but since -f makes such a difference I would like to try a range of values to find the optimum.

I saw it somewhere but can't find it now.
Old 2013-09-23, 10:23   #164
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

2·5·31^2 Posts
Default

Quote:
Originally Posted by Robish
Quick question, what are the fft length multiples of?
It depends on your number of threads. A multiple of 8k (i.e. 8*1024) is OK in most cases, but you have to try a few DCs first to see whether you get the right residues. Paranoids (like me) can go as low as multiples of 1k (1024), which is the minimum allowed (anything lower produces crap). With many threads (like 512 or 1024, for GTX cards), a granularity of 1k will still give you errors. Msft said somewhere that you need something like 32-64k FFT for 1024 threads, 16k FFT for 256-512 threads, etc., but in practice you need to test for YOUR card. You can run "-cufftbench" to test your card, and then, for each range of exponents you test, select the FFT length (size) that is fastest AND at the same time gives you errors between 0.1 and 0.23 (otherwise you risk rounding/summing errors during the run).

Last fiddled with by LaurV on 2013-09-23 at 10:26
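
The selection rule described in the post above can be sketched as follows. The benchmark rows are invented placeholders, not real measurements, and `candidate_lengths` and `pick_fft_length` are hypothetical helpers; only the 1k/8k granularity and the 0.1 to 0.23 error window come from the post.

```python
def candidate_lengths(start, stop, multiple=8 * 1024):
    """List FFT lengths in [start, stop] that are multiples of `multiple`."""
    first = -(-start // multiple) * multiple  # round start up to a multiple
    return list(range(first, stop + 1, multiple))


def pick_fft_length(results, err_min=0.1, err_max=0.23):
    """Return the fastest length whose error lies in [err_min, err_max].

    `results` is a list of (fft_length, ms_per_iter, max_error) tuples.
    Returns None when no length falls inside the error window.
    """
    usable = [r for r in results if err_min <= r[2] <= err_max]
    if not usable:
        return None
    return min(usable, key=lambda r: r[1])[0]


# Hypothetical benchmark rows (made-up numbers for illustration only).
bench = [
    (1572864, 6.9, 0.31),   # fastest, but error above the window: rejected
    (1835008, 7.8, 0.18),   # inside the window
    (2097152, 8.4, 0.004),  # error below the window: excluded by the rule
]
print(pick_fft_length(bench))
print(candidate_lengths(2000000, 2030000))
```

With these placeholder rows the middle length is chosen, because it is the only one inside the error window. In practice the rows would come from benchmarking your own card, as the post says.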
Old 2013-09-23, 11:00   #165
Robish
 
"Rob Gahan"
Aug 2013
Ireland

2^2×3^2 Posts
Thumbs up

Quote:
Originally Posted by LaurV
It depends on your number of threads. A multiple of 8k (i.e. 8*1024) is OK in most cases, but you have to try a few DCs first to see whether you get the right residues. Paranoids (like me) can go as low as multiples of 1k (1024), which is the minimum allowed (anything lower produces crap). With many threads (like 512 or 1024, for GTX cards), a granularity of 1k will still give you errors. Msft said somewhere that you need something like 32-64k FFT for 1024 threads, 16k FFT for 256-512 threads, etc., but in practice you need to test for YOUR card. You can run "-cufftbench" to test your card, and then, for each range of exponents you test, select the FFT length (size) that is fastest AND at the same time gives you errors between 0.1 and 0.23 (otherwise you risk rounding/summing errors during the run).
Thanks LaurV

It'll take a while for all that to sink in, I'm afraid ;-) I'll try cufftbench first.

Sorry, I probably shouldn't be asking here anyway, but I'm using CUDALucas on a GTX 690:

CUDALucas-2.03-cuda4.2-sm_30-x86-64 -threads 512 -f 20971520 -t 332233123

So if I am reading this right, 8 * 1024 = 8192, so multiples of 8192?

I.e. I'm using 8192 * 2560 = 20971520.

Cheers

Rob.
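
The arithmetic in this post can be checked quickly. This is only a sanity check of the numbers quoted above; `is_multiple` is an illustrative helper, not part of CUDALucas.

```python
def is_multiple(fft_length, granularity):
    """True when fft_length is an exact multiple of granularity."""
    return fft_length % granularity == 0


fft_length = 20971520     # value passed to -f in the command above
granularity = 8 * 1024    # the 8k multiple suggested in the quoted post

print(8192 * 2560 == fft_length)             # checks the product quoted above
print(is_multiple(fft_length, granularity))  # is it a multiple of 8192?
print(fft_length // granularity)             # how many multiples of 8192
```

The same length is also an exact multiple of 64*1024, the coarsest granularity mentioned in the quoted post, so it satisfies every multiple listed there. Whether it is the fastest choice still has to be benchmarked.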

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.