mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   LL with OpenCL (https://www.mersenneforum.org/showthread.php?t=18297)

kladner 2013-09-14 21:05

[QUOTE=kracker;353006]Ok, well frankly, 3-4 DC's(all cpu) a day is my max at the moment. So not much.[/QUOTE]

That sounds like a good bit to me. My GPUs take between 20 and 22 hours per DC (~28M).

TeknoHog 2013-09-15 18:19

FWIW:

[code]
Adapter 0 - AMD Radeon HD 7900 Series
New Core Peak : 1000
New Memory Peak : 1500
Platform :Advanced Micro Devices, Inc.
Device 0 : Tahiti


start M57885161 fft length = 3145728
err = 0.352051, increasing n from 3145728

start M57885161 fft length = 3276800
Iteration 10000 M( 57885161 )C, 0x76c27556683cd84d, n = 3276800, clLucas v1.00 err = 0.1289 (4:01 real, 24.1701 ms/iter, ETA 388:32:03)
Iteration 20000 M( 57885161 )C, 0xfd8e311d20ffe6ab, n = 3276800, clLucas v1.00 err = 0.1289 (4:00 real, 23.9985 ms/iter, ETA 385:42:34)
[/code]I'd expect more with this 7970, but the 1x PCIe slot may be a bottleneck. GPU load is 99%, using the -aggressive option.

I also tested a couple of 5870s, but they keep throwing this error

[code]
Error: CommandQueue::enqueueNDRangeKernel() failed. Error code : CL_INVALID_WORK_GROUP_SIZE
Location : Kernels.cpp:425
[/code]and tests with known primes fail.

However, I'm quite happy with the overall development so far, nice work msft :cool:
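
For reference, the ETA figures in the 7970 log above follow from simple arithmetic: a Lucas-Lehmer test of M(p) takes p - 2 squarings, so the remaining time is roughly (p - 2 - iterations done) times ms/iter. A minimal C sketch, assuming that formula. `ll_eta_seconds` is a hypothetical helper, not a clLucas function, and clLucas may compute its ETA slightly differently.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of clLucas): estimated remaining wall
 * time, in seconds, for a Lucas-Lehmer test of M(exponent).
 * Assumes the test needs exponent - 2 squarings in total and that
 * ms_per_iter stays constant for the rest of the run.
 */
double ll_eta_seconds(long exponent, long iter_done, double ms_per_iter)
{
    assert(exponent > 2 && iter_done >= 0 && ms_per_iter > 0.0);

    long total = exponent - 2;          /* squarings needed for M(p) */
    long remaining = total - iter_done; /* squarings still to do */
    if (remaining < 0)
        remaining = 0;

    return (double)remaining * ms_per_iter / 1000.0;
}
```

Calling `ll_eta_seconds(57885161, 10000, 24.1701)` gives about 388.6 hours, within a few minutes of the logged ETA of 388:32:03.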

kracker 2013-09-15 19:35

See [URL="http://mersenneforum.org/showpost.php?p=350303&postcount=64"]this[/URL]:

FFT lengths 2097152 and 4194304 work best (the -f option).

Also, you might want to stick with DCs (for now) until we know clLucas can handle those ranges.

TeknoHog 2013-09-15 21:16

[QUOTE=kracker;353065]See [URL="http://mersenneforum.org/showpost.php?p=350303&postcount=64"]this[/URL]:

FFT lengths 2097152 and 4194304 work best (the -f option).

Also, you might want to stick with DCs (for now) until we know clLucas can handle those ranges.[/QUOTE]

Thanks :) Also, that 57885161 was just a speed test (M#48), I'm already working on a DC.

kracker 2013-09-16 00:23

HD 7770
[code]
Iteration 10000 M( 22256453 )C, 0x3d9450d492b7e880, n = 1179648, clLucas v1.00 err = 0.2656 (2:39 real, 15.9270 ms/iter, ETA 98:23:35)
Iteration 10000 M( 24732709 )C, 0x81a12a304a754572, n = 1310720, clLucas v1.00 err = 0.2813 (3:04 real, 18.4028 ms/iter, ETA 126:21:56)
Iteration 10000 M( 29412433 )C, 0x27d7d112a73aa203, n = 1572864, clLucas v1.00 err = 0.25 (4:02 real, 24.1517 ms/iter, ETA 197:14:18)
Iteration 10000 M( 30620113 )C, 0x212dca3cec0acde2, n = 1638400, clLucas v1.00 err = 0.25 (5:11 real, 31.0753 ms/iter, ETA 264:13:34)
Iteration 10000 M( 32993419 )C, 0xcf86a69b844e35c0, n = 1769472, clLucas v1.00 err = 0.2813 (7:18 real, 43.7117 ms/iter, ETA 400:26:52)
Iteration 10000 M( 36418493 )C, 0x2f1388379572d5b4, n = 1966080, clLucas v1.00 err = 0.25 (5:30 real, 33.0295 ms/iter, ETA 333:57:52)
Iteration 10000 M( 38955173 )C, 0x8a45e3bbd4e4fc9b, n = 2097152, clLucas v1.00 err = 0.25 (2:03 real, 12.2586 ms/iter, ETA 132:35:49)
Iteration 10000 M( 43792559 )C, 0x7048d84bbfb0f810, n = 2359296, clLucas v1.00 err = 0.2813 (5:54 real, 35.3544 ms/iter, ETA 429:56:54)
Iteration 10000 M( 48375209 )C, 0xf957e240d591a99e, n = 2621440, clLucas v1.00 err = 0.2188 (6:33 real, 39.2538 ms/iter, ETA 527:18:36)
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3145728, clLucas v1.00 err = 0.25 (9:00 real, 53.9709 ms/iter, ETA 867:43:57)
Iteration 10000 M( 60622229 )C, 0xd81c849f11fd1054, n = 3276800, clLucas v1.00 err = 0.2813 (11:06 real, 66.5953 ms/iter, ETA 1121:12:22)
Iteration 10000 M( 65066623 )C, 0xde7aeb8cc7a2a826, n = 3538944, clLucas v1.00 err = 0.2539 (16:00 real, 96.0663 ms/iter, ETA 1735:51:51)
Iteration 10000 M( 67662869 )C, 0xf854d1dee3fbb5d7, n = 3932160, clLucas v1.00 err = 0.05078 (11:59 real, 71.8933 ms/iter, ETA 1350:59:44)
Iteration 10000 M( 76722161 )C, 0x4b6ba0a6078e4bbb, n = 4194304, clLucas v1.00 err = 0.25 (4:14 real, 24.9663 ms/iter, ETA 540:43:00)
[/code]
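
Every FFT length in the HD 7770 table above factors as 2^a * 3^b * 5^c, and the two lengths recommended earlier in the thread (2097152 and 4194304) are the only pure powers of two in it. Both are much faster than the non-power-of-two lengths around them. A small C sketch for checking this on other lengths follows. All names are hypothetical and none of it is clLucas code. `bits_per_word` is simply exponent / n, the average number of bits of the Mersenne number carried per FFT element; reading it as a rough indicator of whether a length is large enough is my assumption, not something stated in the thread.

```c
#include <assert.h>

/* Exponents of 2, 3 and 5 in n, plus whatever cofactor is left over. */
typedef struct {
    int e2, e3, e5;
    long rest;
} fft_factors;

/* Split n into 2^e2 * 3^e3 * 5^e5 * rest (rest == 1 for a "smooth" length). */
fft_factors factor_fft_length(long n)
{
    assert(n > 0);
    fft_factors f = {0, 0, 0, n};

    while (f.rest % 2 == 0) { f.rest /= 2; f.e2++; }
    while (f.rest % 3 == 0) { f.rest /= 3; f.e3++; }
    while (f.rest % 5 == 0) { f.rest /= 5; f.e5++; }
    return f;
}

/* Nonzero when n is an exact power of two. */
int is_power_of_two(long n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

/* Average bits of M(exponent) stored per FFT element of length n. */
double bits_per_word(long exponent, long n)
{
    assert(exponent > 0 && n > 0);
    return (double)exponent / (double)n;
}
```

For example, `factor_fft_length(3276800)` yields 2^17 * 5^2, while `is_power_of_two(2097152)` and `is_power_of_two(4194304)` are both true.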

owftheevil 2013-09-16 00:56

The errors are nicely low. Is this using the double double sin and cos data for the ffts?

kracker 2013-09-16 01:41

[QUOTE=owftheevil;353091]The errors are nicely low. Is this using the double double sin and cos data for the ffts?[/QUOTE]

You're going to have to ask msft about that, I only compiled it... that's all. :razz:

EDIT: Just realized I won't be able to submit my DC to Primenet (yet) when it is done.

msft 2013-09-17 15:39

[QUOTE=LaurV;352974]Ok, played with it, but the result is very odd![/QUOTE]
Hi,
Can you check with the -threads option?
[code]
$ ./clLucas -threads 64 36666666
$ ./clLucas -threads 128 36666666
[/code]

msft 2013-09-17 15:44

[QUOTE=owftheevil;353091]The errors are nicely low. Is this using the double double sin and cos data for the ffts?[/QUOTE]
Hi,
Double sin and cos. I made an FFTW-like function:
[code]
/*
 * Improve accuracy by reducing x to range [0..1/8] (of a full turn)
 * before multiplication by 2 * PI.
 */
#include <math.h>

#define K2PI 6.2831853071795864769252867665590057683943388
#define m2pi(m, n) ((K2PI * (m)) / (n))

/* Computes *si = sin(2*PI*m/n) and *co = cos(2*PI*m/n). */
static void ap_sincos(int m, int n, double * si, double * co)
{
    double s,c,theta;
    int n14,n24,n34,n44;
    int m14,m44;
    int n18,n28,n38,n48,n58,n68,n78;
    int m88;

    /* Multiples of n in quarter-turn units (n44 = 4n is a full turn). */
    n14 = n;
    n24 = n14 + n14;
    n34 = n14 + n24;
    n44 = n24 + n24;

    /* m44 = 4m, so m44/n44 = m/n. */
    m14 = m;
    m44 = m14 + m14;
    m44 = m44 + m44;

    /* Octant boundaries: k*n for k = 1..7, compared against m88 = 8m. */
    n18 = n;
    n28 = n18 + n18;
    n38 = n18 + n28;
    n48 = n28 + n28;
    n58 = n28 + n38;
    n68 = n38 + n38;
    n78 = n38 + n48;

    m88 = m + m;
    m88 = m88 + m88;
    m88 = m88 + m88;

    /* Pick the octant, evaluate at the reduced angle, then fix signs. */
    if(n18 > m88) {
        theta = m2pi(m44,n44);
        s = sin(theta);
        c = cos(theta);
    } else if(n28 > m88) {
        theta = m2pi(n14-m44,n44);
        s = cos(theta);
        c = sin(theta);
    } else if(n38 > m88) {
        theta = m2pi(m44-n14,n44);
        s = cos(theta);
        c = -sin(theta);
    } else if(n48 > m88) {
        theta = m2pi(n24-m44,n44);
        s = sin(theta);
        c = -cos(theta);
    } else if(n58 > m88) {
        theta = m2pi(m44-n24,n44);
        s = -sin(theta);
        c = -cos(theta);
    } else if(n68 > m88) {
        theta = m2pi(n34-m44,n44);
        s = -cos(theta);
        c = -sin(theta);
    } else if(n78 > m88) {
        theta = m2pi(m44-n34,n44);
        s = -cos(theta);
        c = sin(theta);
    } else {
        theta = m2pi(n44-m44,n44);
        s = -sin(theta);
        c = cos(theta);
    }
    *si = s;
    *co = c;
}
[/code]

msft 2013-09-17 15:46

[QUOTE=TeknoHog;353061]I also tested a couple of 5870s, but they keep throwing this error

[code]
Error: CommandQueue::enqueueNDRangeKernel() failed. Error code : CL_INVALID_WORK_GROUP_SIZE
Location : Kernels.cpp:425
[/code]and tests with known primes fail.
[/QUOTE]

Hi,
I found the bug behind this issue. It will be fixed in the next version.
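
The post does not say what the bug was. In general, under OpenCL 1.x, `clEnqueueNDRangeKernel` reports `CL_INVALID_WORK_GROUP_SIZE` when an explicit local work size does not evenly divide the global work size, or when it exceeds what the device or kernel supports (`CL_DEVICE_MAX_WORK_GROUP_SIZE`, `CL_KERNEL_WORK_GROUP_SIZE`). Those are the most common causes, not necessarily the one here. A plain-C sketch of that check follows. The function names are hypothetical, nothing here is clLucas code, and `max_local` stands for a limit that would be queried from the OpenCL runtime.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical check (not clLucas code): nonzero when a one-dimensional
 * launch passes the two most common work-group size rules of OpenCL 1.x.
 */
int work_group_size_ok(size_t global, size_t local, size_t max_local)
{
    if (local == 0 || local > max_local)
        return 0;               /* exceeds the device/kernel limit */
    return global % local == 0; /* must divide the global size evenly */
}

/*
 * Largest power-of-two local size that is <= requested, <= max_local,
 * and divides global. Falls back to 1 if nothing larger fits.
 */
size_t pick_local_size(size_t global, size_t requested, size_t max_local)
{
    assert(global > 0 && requested > 0 && max_local > 0);

    size_t local = 1;
    while (local * 2 <= requested &&
           local * 2 <= max_local &&
           global % (local * 2) == 0)
        local *= 2;
    return local;
}
```

With an assumed device limit of 256, a requested size of 512 fails `work_group_size_ok`, and `pick_local_size(1048576, 512, 256)` falls back to 256.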

msft 2013-09-17 15:53

[QUOTE=LaurV;352976]Trying to squeeze some more juice from the Tahiti, I believe this is what msft is after, isn't he? :smile:[/QUOTE]
Hi,
The error occurs just at 30000.


All times are UTC.