mersenneforum.org  

2013-09-14, 21:05   #122
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts

Quote:
Originally Posted by kracker
Ok, well frankly, 3-4 DC's(all cpu) a day is my max at the moment. So not much.

That sounds like a good bit to me. My GPUs take between 20 and 22 hours per DC (~28M).

2013-09-15, 18:19   #123
TeknoHog
Mar 2010
Jyvaskyla, Finland
36₁₀ Posts

FWIW:

Code:
Adapter 0 - AMD Radeon HD 7900 Series
            New Core Peak   : 1000
            New Memory Peak : 1500
Platform :Advanced Micro Devices, Inc.
Device 0 : Tahiti


start M57885161 fft length = 3145728
err = 0.352051, increasing n from 3145728

start M57885161 fft length = 3276800
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3276800, clLucas v1.00 err = 0.1289 (4:01 real, 24.1701 ms/iter, ETA 388:32:03)
Iteration 20000 M( 57885161 )C, 0xfd8e311d20ffe6ab, n = 3276800, clLucas v1.00 err = 0.1289 (4:00 real, 23.9985 ms/iter, ETA 385:42:34)
I'd expect more with this 7970, but the 1x PCIe slot may be a bottleneck. GPU load is 99%, using the -aggressive option.
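
For reference, the ETA column in the log above can be roughly reproduced from the ms/iter figure. The sketch below assumes that a Lucas-Lehmer test of M(p) needs p - 2 squarings in total and that the ETA is simply the remaining iterations times the current ms/iter. How clLucas actually computes its ETA is not stated in this thread, and the names eta_seconds and format_hms are made up for illustration.

```c
#include <stdio.h>

/* Estimated seconds remaining, assuming p - 2 total iterations
 * (an assumption about the test, not taken from clLucas itself). */
static double eta_seconds(unsigned long p, unsigned long iter, double ms_per_iter)
{
    unsigned long total = p - 2;
    unsigned long remaining = (iter < total) ? total - iter : 0;
    return (double)remaining * ms_per_iter / 1000.0;
}

/* Format a duration in seconds as H:MM:SS, rounded to the nearest second. */
static void format_hms(double seconds, char *buf, size_t len)
{
    unsigned long t = (unsigned long)(seconds + 0.5);
    snprintf(buf, len, "%lu:%02lu:%02lu", t / 3600, (t / 60) % 60, t % 60);
}
```

With p = 57885161, iteration 10000 and 24.1701 ms/iter this gives roughly 388.6 hours, within a few minutes of the logged 388:32:03.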

I also tested a couple of 5870s, but they keep throwing this error:

Code:
Error: CommandQueue::enqueueNDRangeKernel() failed. Error code : CL_INVALID_WORK_GROUP_SIZE
Location : Kernels.cpp:425
and tests with known primes fail.
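
The thread does not say what triggers this error on the 5870s. As general background, OpenCL 1.x reports CL_INVALID_WORK_GROUP_SIZE when an explicit local work size does not evenly divide the global work size, or when the local size exceeds what the device or kernel allows. The sketch below checks those two rules for a one-dimensional launch. The function name and parameters are hypothetical and are not taken from clLucas's Kernels.cpp.

```c
#include <stddef.h>

/*
 * Returns 1 if a 1-D launch satisfies two OpenCL 1.x rules:
 *   - the local size does not exceed the maximum work-group size, and
 *   - the global size is an exact multiple of the local size.
 * max_work_group_size stands in for the limit queried from the device
 * or kernel; this helper does not call the OpenCL API itself.
 */
static int work_group_size_ok(size_t global_size, size_t local_size,
                              size_t max_work_group_size)
{
    if (local_size == 0)
        return 0;
    if (local_size > max_work_group_size)
        return 0;
    return global_size % local_size == 0;
}
```

A host program could run a check like this before enqueueing a kernel and fall back to a smaller local size when it fails.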

However, I'm quite happy with the overall development so far. Nice work, msft.

2013-09-15, 19:35   #124
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts

See this:

FFT lengths of 2097152 and 4194304 work best (-f).

Also, you might want to stick with DCs (for now) until we know clLucas can handle those ranges.
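
Both recommended lengths are powers of two (2^21 and 2^22). The post does not say that this is the reason they run fastest, so treat the connection as an assumption. A minimal sketch for checking a candidate length before passing it with -f, using hypothetical helper names:

```c
#include <stdint.h>

/* True if n is a power of two (exactly one bit set). */
static int is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two >= n, capped at 2^31 for 32-bit inputs. */
static uint32_t next_power_of_two(uint32_t n)
{
    uint32_t p = 1;
    while (p < n && p < 0x80000000u)
        p <<= 1;
    return p;
}
```

For example, is_power_of_two(3276800) is false (3276800 = 25 · 2^17), and next_power_of_two(3276800) returns 4194304.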

Last fiddled with by kracker on 2013-09-15 at 19:38
2013-09-15, 21:16   #125
TeknoHog
Mar 2010
Jyvaskyla, Finland
2²×3² Posts

Quote:
Originally Posted by kracker
See this:

2097152 and 4194304 fft work best.(-f)

Also, you might want to stick with DC's(for now) until we know clLucas can do those ranges.

Thanks :) Also, that 57885161 was just a speed test (M#48); I'm already working on a DC.

2013-09-16, 00:23   #126
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
100001111000₂ Posts

HD 7770
Code:
Iteration 10000 M( 22256453 )C, 0x3d9450d492b7e880, n = 1179648, clLucas v1.00 err = 0.2656 (2:39 real, 15.9270 ms/iter, ETA 98:23:35)
Iteration 10000 M( 24732709 )C, 0x81a12a304a754572, n = 1310720, clLucas v1.00 err = 0.2813 (3:04 real, 18.4028 ms/iter, ETA 126:21:56)
Iteration 10000 M( 29412433 )C, 0x27d7d112a73aa203, n = 1572864, clLucas v1.00 err = 0.25 (4:02 real, 24.1517 ms/iter, ETA 197:14:18)
Iteration 10000 M( 30620113 )C, 0x212dca3cec0acde2, n = 1638400, clLucas v1.00 err = 0.25 (5:11 real, 31.0753 ms/iter, ETA 264:13:34)
Iteration 10000 M( 32993419 )C, 0xcf86a69b844e35c0, n = 1769472, clLucas v1.00 err = 0.2813 (7:18 real, 43.7117 ms/iter, ETA 400:26:52)
Iteration 10000 M( 36418493 )C, 0x2f1388379572d5b4, n = 1966080, clLucas v1.00 err = 0.25 (5:30 real, 33.0295 ms/iter, ETA 333:57:52)
Iteration 10000 M( 38955173 )C, 0x8a45e3bbd4e4fc9b, n = 2097152, clLucas v1.00 err = 0.25 (2:03 real, 12.2586 ms/iter, ETA 132:35:49)
Iteration 10000 M( 43792559 )C, 0x7048d84bbfb0f810, n = 2359296, clLucas v1.00 err = 0.2813 (5:54 real, 35.3544 ms/iter, ETA 429:56:54)
Iteration 10000 M( 48375209 )C, 0xf957e240d591a99e, n = 2621440, clLucas v1.00 err = 0.2188 (6:33 real, 39.2538 ms/iter, ETA 527:18:36)
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3145728, clLucas v1.00 err = 0.25 (9:00 real, 53.9709 ms/iter, ETA 867:43:57)
Iteration 10000 M( 60622229 )C, 0xd81c849f11fd1054, n = 3276800, clLucas v1.00 err = 0.2813 (11:06 real, 66.5953 ms/iter, ETA 1121:12:22)
Iteration 10000 M( 65066623 )C, 0xde7aeb8cc7a2a826, n = 3538944, clLucas v1.00 err = 0.2539 (16:00 real, 96.0663 ms/iter, ETA 1735:51:51)
Iteration 10000 M( 67662869 )C, 0xf854d1dee3fbb5d7, n = 3932160, clLucas v1.00 err = 0.05078 (11:59 real, 71.8933 ms/iter, ETA 1350:59:44)
Iteration 10000 M( 76722161 )C, 0x4b6ba0a6078e4bbb, n = 4194304, clLucas v1.00 err = 0.25 (4:14 real, 24.9663 ms/iter, ETA 540:43:00)
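
To compare the timings above across FFT lengths, it can help to normalize them. The sketch below (helper names are hypothetical) computes the average number of exponent bits carried per FFT word, p / n, and the time spent per FFT word per iteration.

```c
/* Average exponent bits per FFT word for exponent p and FFT length n. */
static double bits_per_word(unsigned long p, unsigned long n)
{
    return (double)p / (double)n;
}

/* Nanoseconds per FFT word per iteration, from the logged ms/iter. */
static double ns_per_word(double ms_per_iter, unsigned long n)
{
    return ms_per_iter * 1.0e6 / (double)n;
}
```

For the rows above, n = 2097152 at 12.2586 ms/iter works out to about 5.8 ns per word, while n = 3538944 at 96.0663 ms/iter is about 27.1 ns per word.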
2013-09-16, 00:56   #127
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts

The errors are nicely low. Is this using the double-double sin and cos data for the FFTs?

2013-09-16, 01:41   #128
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts

Quote:
Originally Posted by owftheevil
The errors are nicely low. Is this using the double double sin and cos data for the ffts?

You're going to have to ask msft about that. I only compiled it... that's all.

EDIT: Just realized I won't be able to submit my DC to PrimeNet (yet) when it is done.

Last fiddled with by kracker on 2013-09-16 at 02:04

2013-09-17, 15:39   #129
msft
Jul 2009
Tokyo
2×5×61 Posts

Quote:
Originally Posted by LaurV
Ok, played with it, but the result is very odd!

Hi,
Can you check with the -threads option?
Code:
$ ./clLucas -threads 64 36666666
$ ./clLucas -threads 128 36666666

2013-09-17, 15:44   #130
msft
Jul 2009
Tokyo
2×5×61 Posts

Quote:
Originally Posted by owftheevil
The errors are nicely low. Is this using the double double sin and cos data for the ffts?

Hi,
Plain double sin and cos. I wrote an FFTW-like function.
Code:
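#include <math.h>   /* sin, cos */

/*
 * Compute sin and cos of 2 * PI * m / n, assuming 0 <= m <= n.
 * Improve accuracy by reducing x = m/n (a fraction of a full turn)
 * to the range [0..1/8] before multiplication by 2 * PI.
 * Note: 8 * m and 7 * n must fit in an int.
 */
#define K2PI 6.2831853071795864769252867665590057683943388
#define m2pi(m, n) ((K2PI * (m)) / (n))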
static void ap_sincos(int m, int n, double * si, double * co)
{
        double s,c,theta;
        int n14,n24,n34,n44;
        int m14,m44;
        int n18,n28,n38,n48,n58,n68,n78;
        int m88;

        n14 = n;
        n24 = n14 + n14;
        n34 = n14 + n24;
        n44 = n24 + n24;

        m14 = m;
        m44 = m14 + m14;
        m44 = m44 + m44;

        n18 = n;
        n28 = n18 + n18;
        n38 = n18 + n28;
        n48 = n28 + n28;
        n58 = n28 + n38;
        n68 = n38 + n38;
        n78 = n38 + n48;

        m88 = m + m;
        m88 = m88 + m88;
        m88 = m88 + m88;

        if(n18 > m88)
        {
                theta = m2pi(m44,n44);
                s = sin(theta);
                c = cos(theta);
        }
        else if(n28 > m88)
        {
                theta = m2pi(n14-m44,n44);
                s = cos(theta);
                c = sin(theta);
        } else if(n38 > m88)
        {
                theta = m2pi(m44-n14,n44);
                s = cos(theta);
                c = -sin(theta);
        } else if(n48 > m88)
        {
                theta = m2pi(n24-m44,n44);
                s = sin(theta);
                c = -cos(theta);
        } else if(n58 > m88)
        {
                theta = m2pi(m44-n24,n44);
                s = -sin(theta);
                c = -cos(theta);
        } else if(n68 > m88)
        {
                theta = m2pi(n34-m44,n44);
                s = -cos(theta);
                c = -sin(theta);
        } else if(n78 > m88)
        {
                theta = m2pi(m44-n34,n44);
                s = -cos(theta);
                c = sin(theta);
        } else
        {
                theta = m2pi(n44-m44,n44);
                s = -sin(theta);
                c = cos(theta);
        }
        *si = s;
        *co = c;
}
2013-09-17, 15:46   #131
msft
Jul 2009
Tokyo
2·5·61 Posts

Quote:
Originally Posted by TeknoHog
I also tested a couple of 5870s, but they keep throwing this error

Code:
Error: CommandQueue::enqueueNDRangeKernel() failed. Error code : CL_INVALID_WORK_GROUP_SIZE
Location : Kernels.cpp:425
and tests with known primes fail.

Hi,
I found the bug behind this issue. It will be fixed in the next version.

2013-09-17, 15:53   #132
msft
Jul 2009
Tokyo
2·5·61 Posts

Quote:
Originally Posted by LaurV
Trying to squeeze some more juice from the Tahiti, I believe this is what msft is after, isn't he?

Hi,
The error occurs just at 30000.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.