mersenneforum.org  

2013-10-09, 16:51   #243
Robish
 
"Rob Gahan"
Aug 2013
Ireland

2²·3² Posts

Quote:
Originally Posted by LaurV
Well, man, you are serious about it!

If you want to use cudaLucas, then you can do better. With a GTX Titan, even better... Do you have one? (If yes, send it to me!)

Start P95 (or switch to it if it is already running) and use the "Advanced -> Time..." option to time your exponent: enter the exponent in the box, select 100 iterations and click OK. See which FFT length is used and write it down.

Play with the "cudalucas -cufftbench x y z" option, where x is a bit smaller than the FFT length chosen by P95, y is about 20% higher, and z is a multiple of 1024 (see the discussions in the cudaLucas thread).

With the last command you have "tuned" the FFT size for your card. Select an FFT length that is close to the one chosen by P95 (it may be a bit smaller, or up to 10-20% larger) AND that is the fastest on your card. Then run a few iterations with the command you used above, but with the FFT length you found with cufftbench.

Try both with -t and without it. Try with "-polite 0" and without. (edit: use -k; then you can toggle "polite" on/off interactively from the keyboard: just press "p" and "enter" and see the speed difference. Watch the GPU occupancy meanwhile; it should go down a bit when polite is active.)

My estimate (from a superficial mental calculation and from the error your test shows) is that an FFT close to 18M would be much faster and keep the error under 0.21 (I may be wrong). 2097152*3^2 is better than 2097152*2*5 when splitting the "butterflies" of the FFT multiplication, so if the error at the ~18M8 FFT size is not too big, that size would be much faster on your card.

If my estimate is right, you may finish a few weeks sooner than you would now.

Keep us informed.

edit: if you can save the residues along the way, it would be even better, in case someone wants to DC later.
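The FFT-length choice suggested above (close to Prime95's length, a bit smaller or up to ~20% larger, and fastest on the card) can be sketched in Python as follows. This is an illustration under assumptions: `pick_fft` and `factor` are hypothetical helper names, the 5%/20% window is an assumed default, and the timings are made-up placeholders you would replace with numbers from your own cufftbench run. The factorization shows the two sizes compared in the post: 2097152·3² = 18874368 and 2097152·2·5 = 20971520.

```python
# Sketch of the FFT-length selection rule (assumed workflow, hypothetical names):
# keep only benchmarked lengths near Prime95's choice, then take the fastest.

def factor(n: int) -> dict[int, int]:
    """Prime factorization of n as {prime: exponent}, by trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def pick_fft(timings_ms: dict[int, float], p95_fft: int,
             below: float = 0.05, above: float = 0.20) -> int:
    """Return the fastest FFT length between p95_fft*(1 - below) and
    p95_fft*(1 + above). The 5% / 20% bounds are assumed defaults."""
    lo, hi = p95_fft * (1 - below), p95_fft * (1 + above)
    window = {n: t for n, t in timings_ms.items() if lo <= n <= hi}
    if not window:
        raise ValueError("no benchmarked FFT length inside the window")
    return min(window, key=lambda n: window[n])


# Hypothetical ms/iteration figures, for illustration only (not measurements).
bench = {18874368: 30.0, 19660800: 34.0, 20971520: 33.0}
best = pick_fft(bench, p95_fft=19660800)   # hypothetical Prime95 choice
print(best, factor(best))                  # 18874368 {2: 21, 3: 2}
print(factor(20971520))                    # {2: 22, 5: 1}
```

Which factorization actually runs faster depends on the FFT library and the card, so the benchmark numbers, not the factorization, should make the final call.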

Thanks again, LaurV.

This is exactly what I need, it'll keep me busy for a while but I'll report back with the results.

Second clLucas result:
M( 63083221 )C, 0x3c3b3e3282ec79__, n = 4194304, clLucas v1.01

Last fiddled with by chalsall on 2013-10-14 at 15:51 Reason: Masked residue.
2013-10-11, 00:09   #244
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

Makefile for MinGW (Windows):
Code:
DEPTH = ../../../../..

include $(DEPTH)/make/openclsdkdefs.mk

# Root path of clFFT (adjust to where clFFT is installed)
CLFFT = /opt/clFFT-2.0

####
#
#  Targets
#
####

OPENCL             = 1
SAMPLE_EXE         = 1
EXE_TARGET         = clLucas
EXE_TARGET_INSTALL = clLucas

####
#
#  C/CPP files
#
####

FILES   = clLucas
CLFILES = Kernels.cl

# Include paths: local dir, AMD APP SDK headers, clFFT include directories
CFLAGS  += -O3 -Wno-conversion-null -Wno-write-strings -Wno-pointer-arith -I . -I /opt/AMDAPP/include/ -I $(CLFFT)/include -I $(CLFFT)/src/include

LLIBS   += SDKUtil
# Link the clFFT MinGW import library (.dll.a) and the OpenCL runtime
LDFLAGS += -O3 $(CLFFT)/library/libclFFT.dll.a -lOpenCL

include $(DEPTH)/make/openclsdkrules.mk
2013-10-13, 17:11   #245
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts

My first first-time LL test has been verified:
M58191149
2013-10-14, 05:56   #246
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

7²×197 Posts

BTW, a supermod could mask the residue two posts above... Just in case...

@Robish: please do not post full residues of unverified LL tests. Some "credit hunters" will be tempted to "verify" them using your residue, and if that residue is wrong, 20 years later we may find out we missed a prime.
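For readers wondering what the residue is: as far as I know, the 64-bit residue in a result line is the low 64 bits of the last Lucas-Lehmer value, S(p-2) mod 2^p - 1, which is zero only if M(p) is prime. A toy version with Python big integers (no FFT, so only practical for tiny exponents) is sketched below; `lucas_lehmer`, `res64` and `mask` are hypothetical helper names, and `mask` simply mirrors how the residue in post #243 was masked.

```python
# Toy Lucas-Lehmer test with Python big integers (no FFT), to show what the
# "residue" in a result line is. Real LL programs run the same recurrence with
# FFT-based squaring on multi-million-bit numbers.

def lucas_lehmer(p: int) -> int:
    """Return S(p-2) mod M(p) for M(p) = 2**p - 1, p an odd prime.
    S(0) = 4, S(k+1) = S(k)**2 - 2; the result is 0 iff M(p) is prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def res64(s: int) -> str:
    """Low 64 bits of the final value as 16 hex digits."""
    return f"{s & 0xFFFFFFFFFFFFFFFF:016x}"


def mask(residue: str, hidden: int = 2) -> str:
    """Hypothetical helper: hide the last `hidden` hex digits before posting."""
    return residue[:-hidden] + "_" * hidden


for p in (7, 11, 13):
    s = lucas_lehmer(p)
    print(p, "prime" if s == 0 else "composite", mask(res64(s)))
```

A masked residue still lets others see that a test was done without handing out the full value that a later double check should reproduce independently.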
2013-10-15, 14:47   #247
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³×271 Posts

Just finished my ninth DC (38M): no mismatches, all matching, with one card's memory overclocked all the way to the max.

38198291 looks oddly familiar.

Also, after upgrading the driver from 13.9 to 13.11, I had 13.4 ms vs 12.0 ms per iteration.
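The effect of the driver change on a whole test is easy to estimate, since an LL test of M(p) takes p - 2 iterations. The sketch below assumes 38198291 is the exponent of that DC and reads 13.4 ms and 12.0 ms as per-iteration times on 13.9 and 13.11 respectively; both readings are assumptions, and `eta_hours` is a hypothetical helper.

```python
# Back-of-the-envelope ETA: an LL test of M(p) takes p - 2 squaring iterations.
# Assumptions: 38198291 is the DC exponent; 13.4 ms / 12.0 ms are per-iteration
# times before / after the driver change.

def eta_hours(p: int, ms_per_iter: float) -> float:
    """Total run time in hours for p - 2 iterations at ms_per_iter each."""
    return (p - 2) * ms_per_iter / 1000 / 3600


p = 38198291
before, after = eta_hours(p, 13.4), eta_hours(p, 12.0)
print(f"{before:.1f} h -> {after:.1f} h, saving {before - after:.1f} h")
# prints: 142.2 h -> 127.3 h, saving 14.9 h
```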

Last fiddled with by kracker on 2013-10-15 at 15:10
2013-10-16, 10:02   #248
Manpowre
 
"Svein Johansen"
May 2013
Norway

201₁₀ Posts

Quote:
Originally Posted by kracker
Just finished my ninth DC (38M): no mismatches, all matching, with one card's memory overclocked all the way to the max.

38198291 looks oddly familiar.

Also, after upgrading the driver from 13.9 to 13.11, I had 13.4 ms vs 12.0 ms per iteration.
Which board are you using?

I am considering getting an R9 290X, which has almost the same number of compute cores and transistors as the GK110 on a Titan. I'm wondering if we would see similar iteration times. What do you think, kracker?
2013-10-16, 11:00   #249
Karl M Johnson
 
Mar 2010

19B₁₆ Posts

Quote:
Originally Posted by Manpowre
Which board are you using?

I am considering getting an R9 290X, which has almost the same number of compute cores and transistors as the GK110 on a Titan. I'm wondering if we would see similar iteration times. What do you think, kracker?
It can be very roughly estimated from the 7970's timings, since we know the number of shaders and their clocks for both cards.
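A naive version of that estimate is sketched below: scale a 7970 iteration time by the ratio of shaders × clock or, if the FFT is limited by memory bandwidth instead, by the bandwidth ratio. The card specs are reference figures as I understand them (HD 7970: 2048 stream processors at 925 MHz, 264 GB/s; R9 290X: 2816 stream processors at up to 1000 MHz, 320 GB/s), the 7970 timing is a placeholder, and neither ratio accounts for architectural changes, so treat the result as a rough bracket at best. The helper names are hypothetical.

```python
# Naive scaling estimate for a new card from a 7970 timing. Assumed reference
# specs: HD 7970 = 2048 stream processors @ 925 MHz, 264 GB/s; R9 290X = 2816
# stream processors @ up to 1000 MHz, 320 GB/s. Architecture changes are ignored.

def scale_by_compute(t_ms: float, sp_ref: int, mhz_ref: float,
                     sp_new: int, mhz_new: float) -> float:
    """Iteration time if it scales inversely with shaders x clock."""
    return t_ms * (sp_ref * mhz_ref) / (sp_new * mhz_new)


def scale_by_bandwidth(t_ms: float, gbs_ref: float, gbs_new: float) -> float:
    """Iteration time if it scales inversely with memory bandwidth."""
    return t_ms * gbs_ref / gbs_new


t_7970 = 10.0  # hypothetical ms/iteration on a 7970, not a measured value
fast = scale_by_compute(t_7970, 2048, 925, 2816, 1000)
slow = scale_by_bandwidth(t_7970, 264, 320)
print(f"R9 290X estimate: {fast:.2f} to {slow:.2f} ms/iteration")
# prints: R9 290X estimate: 6.73 to 8.25 ms/iteration
```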

Last fiddled with by Karl M Johnson on 2013-10-16 at 11:01 Reason: yes
2013-10-16, 15:23   #250
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

Quote:
Originally Posted by Karl M Johnson
It can be very roughly estimated from the 7970's timings, since we know the number of shaders and their clocks for both cards.
Hate to disagree, but I don't think so. We do know the number of stream processors and the clocks, but the R9 290(X) is a new architecture, and that can (and will) change things. I have my own ideas about its performance, but...
2013-10-16, 16:02   #251
chris2be8
 
Sep 2009

2²×523 Posts

Would there be a range around 76M where the next larger power-of-2 FFT would be optimal for first-time checks?

Chris
2013-10-16, 16:35   #252
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

72×197 Posts
Default

Yes, up to a maximum of 77M, where the error stays at 0.25 (so a bit risky: if it increases in the middle of the run, a hundred hours are gone...).

It is safe to use from about 68M to about 76M5, for which a Tahiti gets an ETA of 155-168 hours with the CPU idle and 230-260 hours with the CPU busy (P95 running on all cores). With these numbers, the question is whether it would be better to let the Tahiti do TF or coin mining.

The 2^x size (i.e. 4194304 in our case) is indeed much faster. For comparison, the preceding FFT size (3932160) takes about twice as long (330 hours at a 70M exponent), and the next size after the power of two (4718592) needs even more (650+ hours at a 78M exponent). Interestingly, the non-power-of-two sizes are not affected by whether P95 is running or not (!?!?!?). (For the record, I use driver 13.1; all the later ones are either slower or have other problems.)
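Two quick numbers help put these ranges in context: the average number of exponent bits packed into each FFT word (p divided by the FFT length), and the per-iteration time implied by a quoted ETA. The sketch below uses only figures from this post plus the 63083221 exponent from post #243; the helper names are hypothetical. `odd_part` strips the factors of 2, showing 3932160 = 2^18·15 and 4718592 = 2^19·9 against the pure power of two 4194304 = 2^22.

```python
# Two quick numbers behind the FFT-size discussion (a sketch, not from any tool):
# exponent bits packed into each FFT word, and ms/iteration implied by an ETA.

def bits_per_word(p: int, fft_len: int) -> float:
    """Average number of exponent bits stored per FFT element."""
    return p / fft_len


def ms_per_iter(eta_hours: float, p: int) -> float:
    """Per-iteration time implied by an ETA for the p - 2 LL iterations."""
    return eta_hours * 3600 * 1000 / (p - 2)


def odd_part(n: int) -> int:
    """n with all factors of 2 removed (n & -n is the lowest set bit)."""
    return n // (n & -n)


# Bits per word on the 4194304 FFT: about 15.04, 16.21, 18.24, 18.36
for p in (63_083_221, 68_000_000, 76_500_000, 77_000_000):
    print(p, round(bits_per_word(p, 4194304), 2))

# ETAs quoted above for the neighboring FFT sizes
print(3932160, odd_part(3932160), round(ms_per_iter(330, 70_000_000), 1))
print(4718592, odd_part(4718592), round(ms_per_iter(650, 78_000_000), 1))
```

Across the 68M to 76M5 range the 4194304 FFT goes from about 16.2 to about 18.2 bits per word, which is consistent with the error creeping toward 0.25 near the top of the range.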
2013-10-16, 17:52   #253
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

878₁₆ Posts

Quote:
Originally Posted by LaurV
Yes, up to a maximum of 77M, where the error stays at 0.25 (so a bit risky: if it increases in the middle of the run, a hundred hours are gone...).

It is safe to use from about 68M to about 76M5, for which a Tahiti gets an ETA of 155-168 hours with the CPU idle and 230-260 hours with the CPU busy (P95 running on all cores). With these numbers, the question is whether it would be better to let the Tahiti do TF or coin mining.

The 2^x size (i.e. 4194304 in our case) is indeed much faster. For comparison, the preceding FFT size (3932160) takes about twice as long (330 hours at a 70M exponent), and the next size after the power of two (4718592) needs even more (650+ hours at a 78M exponent). Interestingly, the non-power-of-two sizes are not affected by whether P95 is running or not (!?!?!?). (For the record, I use driver 13.1; all the later ones are either slower or have other problems.)
On my 38M DC the error was originally 0.13, then increased to 0.2, and the result was fine.

Frankly, I'm waiting for a mismatch on my heavily overclocked card, but none yet.
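To illustrate what the "error" figure measures: after a floating-point FFT squaring, every output coefficient should be an integer, and the round-off error is how far the worst one lands from the nearest integer. If it reaches 0.5, rounding can pick the wrong value and the result is silently wrong, which is why values near 0.25 are treated as risky in this thread. The sketch below is a plain zero-padded FFT convolution on ordinary digits, not the weighted transform (IBDWT) with balanced digits that Prime95 and similar LL programs use, and `fft`, `square_via_fft` and `from_digits` are hypothetical helper names; it only shows that packing more bits into each digit raises the error.

```python
import cmath
import random


def fft(a: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def square_via_fft(digits: list[int]) -> tuple[list[int], float]:
    """Square a number given as little-endian digits with a floating-point
    FFT convolution. Returns the rounded convolution and the round-off error,
    i.e. the largest distance of any raw coefficient from the nearest integer."""
    n = 1
    while n < 2 * len(digits):  # zero-pad so the cyclic convolution cannot wrap
        n *= 2
    spectrum = fft([complex(d) for d in digits] + [0j] * (n - len(digits)))
    raw = [z.real / n for z in fft([z * z for z in spectrum], invert=True)]
    rounded = [round(x) for x in raw]
    error = max(abs(x - r) for x, r in zip(raw, rounded))
    return rounded, error


def from_digits(coeffs: list[int], bits: int) -> int:
    """Recombine base-2**bits coefficients (carries included) into an integer."""
    return sum(c << (bits * i) for i, c in enumerate(coeffs))


random.seed(1)
for bits in (8, 16):
    digits = [random.getrandbits(bits) for _ in range(128)]
    x = from_digits(digits, bits)
    coeffs, err = square_via_fft(digits)
    print(bits, f"max error {err:.2e}", from_digits(coeffs, bits) == x * x)
```

Raising `bits` makes the coefficients larger and the error with them; push it far enough (or lengthen the number) and the product stops matching `x * x`.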

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.