mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   LL with OpenCL (https://www.mersenneforum.org/showthread.php?t=18297)

kracker 2013-09-20 16:58

You may want to compile clLucas manually. Frankly, I've never tried much with MSVC, but I've had better results with MinGW x64 on Windows; I have an executable and just finished a DC with it. If you want, I can give it to you.

EDIT: Manual as in running cl alone.

sanaris 2013-09-20 17:50

It works! Device: FireStream 9350.
It was just essential to set the right kind of "item", because MSVC tries to "compile" it into different objs.
[CODE]
c:\Users\yury\My Documents\AMD APP\samples\opencl\bin\release\x86_64>clLucas.exe 36666666
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress


start M36666666 fft length = 1966080
err = 0.359375, increasing n from 1966080

start M36666666 fft length = 2097152
Iteration 10000 M( 36666666 )C, 0xded2eec2ad4c020b, n = 2097152, clLucas v1.00 err = 0.08594 (1:32 real, 9.2722 ms/iter, ETA 94:23:44)
Iteration 20000 M( 36666666 )C, 0x8c022e364d0eac22, n = 2097152, clLucas v1.00 err = 0.08594 (1:35 real, 9.4378 ms/iter, ETA 96:03:21)
Iteration 30000 M( 36666666 )C, 0x581cb1c8d6065b84, n = 2097152, clLucas v1.00 err = 0.08594 (1:48 real, 10.7739 ms/iter, ETA 109:37:28)
Iteration 40000 M( 36666666 )C, 0x21b58443efd8f52f, n = 2097152, clLucas v1.00 err = 0.08594 (1:37 real, 9.7251 ms/iter, ETA 98:55:34)
[/CODE]
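
For anyone who wants to collect these numbers from a saved console log, here is a minimal Python sketch that pulls the fields out of each progress line. The pattern is inferred only from the output shown above, not from the clLucas source, so treat the field names (`iteration`, `residue`, `ms_per_iter`, and so on) and the function name `parse_progress` as assumptions made for illustration. It also undoes the 80-column console wrap that splits "err" across two lines.

```python
import re

# Pattern inferred from the progress lines in this thread, not from the clLucas source.
PROGRESS_RE = re.compile(
    r"Iteration\s+(?P<iteration>\d+)\s+"
    r"M\(\s*(?P<exponent>\d+)\s*\)C,\s+"
    r"0x(?P<residue>[0-9a-f]{16}),\s+"
    r"n\s*=\s*(?P<fft_length>\d+),\s+"
    r"clLucas v(?P<version>[\d.]+)\s+"
    r"err\s*=\s*(?P<err>[\d.]+)\s+"
    r"\((?P<real>[\d:]+) real,\s+"
    r"(?P<ms_per_iter>[\d.]+) ms/iter,\s+"
    r"ETA (?P<eta>[\d:]+)\)"
)


def parse_progress(text):
    """Return one dict per progress line found in a console log."""
    # Dropping line breaks rejoins words split by the 80-column console wrap.
    joined = text.replace("\r", "").replace("\n", "")
    rows = []
    for m in PROGRESS_RE.finditer(joined):
        rows.append({
            "iteration": int(m["iteration"]),
            "exponent": int(m["exponent"]),
            "residue": m["residue"],
            "fft_length": int(m["fft_length"]),
            "err": float(m["err"]),
            "ms_per_iter": float(m["ms_per_iter"]),
            "eta": m["eta"],
        })
    return rows


# Two progress lines copied from the log above, still wrapped at 80 columns.
sample = (
    "Iteration 10000 M( 36666666 )C, 0xded2eec2ad4c020b, n = 2097152, clLucas v1.00 e\n"
    "rr = 0.08594 (1:32 real, 9.2722 ms/iter, ETA 94:23:44)\n"
    "Iteration 20000 M( 36666666 )C, 0x8c022e364d0eac22, n = 2097152, clLucas v1.00 e\n"
    "rr = 0.08594 (1:35 real, 9.4378 ms/iter, ETA 96:03:21)\n"
)

for row in parse_progress(sample):
    print(row["iteration"], row["residue"], row["ms_per_iter"])
```

Comparing the 64-bit residues at matching iterations is the quick way to check that two runs of the same exponent agree.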

kracker 2013-09-20 18:10

[QUOTE=sanaris;353613]It works! Device: firestream9350.[/QUOTE]

Nice. :smile: Is the FireStream 9350 a "pro"-type card?

My Radeon HD 7770 gets 12 ms/iter; a 7970 gets ~3.7-4 ms/iter.

sanaris 2013-09-20 19:45

[QUOTE=kracker;353620]Nice. :smile: Is the FireStream 9350 a "pro"-type card?

My Radeon HD 7770 gets 12 ms/iter; a 7970 gets ~3.7-4 ms/iter.[/QUOTE]

Yes, that was kind of the AMD FireStream direction: "pro-type" builds with a single output port, for HPC/server-rendering workloads. But for some reason that direction was closed, and they merged it all into the "standard FirePro" line. Sadly, HPC and CAD are completely different tasks that require different approaches, but the AMD guys didn't understand that...

The 9350/9370 were the latest FireStreams, with declared DP GFLOPS of around 500/700 respectively.
The 9350 has a declared TDP of 150 W, which is pretty good for its throughput.

kracker 2013-09-20 20:07

[QUOTE=sanaris;353628]Yes, that was kind of the AMD FireStream direction: "pro-type" builds with a single output port, for HPC/server-rendering workloads. But for some reason that direction was closed, and they merged it all into the "standard FirePro" line. Sadly, HPC and CAD are completely different tasks that require different approaches, but the AMD guys didn't understand that...

The 9350/9370 were the latest FireStreams, with declared DP GFLOPS of around 500/700 respectively.
The 9350 has a declared TDP of 150 W, which is pretty good for its throughput.[/QUOTE]

Well, it is Cypress (a few years old); a newer gaming GCN* card is probably more efficient. A 200 W GCN card does 4.5 ms/iter (FFT length 2097152).

* Graphics Core Next
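
To put a rough number on the efficiency comparison, energy per iteration is just power times time per iteration. The Python sketch below uses only figures quoted in this thread (150 W declared TDP and the four ms/iter readings for the FireStream 9350, 200 W and 4.5 ms/iter for the GCN card, both at FFT length 2097152). It assumes each card draws its full rated power during the run, which is an upper bound rather than a measurement; the helper name `joules_per_iteration` is made up for this example.

```python
from statistics import mean


def joules_per_iteration(power_watts, ms_per_iter):
    """Energy for one LL iteration, assuming a constant draw of power_watts."""
    return power_watts * ms_per_iter / 1000.0


# ms/iter readings reported for the FireStream 9350 at FFT length 2097152.
firestream_ms = mean([9.2722, 9.4378, 10.7739, 9.7251])

# Declared TDP, not measured draw (assumption).
firestream_j = joules_per_iteration(150.0, firestream_ms)
# 200 W and 4.5 ms/iter as quoted in the post above.
gcn_j = joules_per_iteration(200.0, 4.5)

print(f"FireStream 9350: {firestream_j:.2f} J/iter")
print(f"200 W GCN card:  {gcn_j:.2f} J/iter")
print(f"ratio: {firestream_j / gcn_j:.2f}x")
```

Under those assumptions the GCN card comes out ahead per iteration even though its rated power is higher, because it finishes each iteration in less than half the time.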

msft 2013-09-20 22:39

[QUOTE=Bdot;353327]Hi,
I could not spend a lot of time, but a few observations:
[/QUOTE]
Hi,
Thank you for the observations.

TeknoHog 2013-09-21 07:46

As some of you already know, Primenet now accepts clLucas results without any fuss. To make things even smoother, try my [URL="https://github.com/teknohog/primetools"]scripts[/URL] for automatic work assignment and submission.

kracker 2013-09-22 02:32

clLucas 1.01
 
clLucas 1.01 out. :smile:

[QUOTE=msft]
1) Fix TeknoHog issue.
2) Change from clAmdFft.h to clfft.h.
3) Fix "over specifications Grid = 65536" issue.
[/QUOTE]

[URL="http://www.mediafire.com/download/aarhg64vzz2364z/clLucas_x64_1.01.zip"]Windows x86_64[/URL]

Does anyone even use 32 bit machines?

kladner 2013-09-22 06:17

[QUOTE=kracker;353757]
Does anyone even use 32 bit machines?[/QUOTE]

Hardware or software? There's still lots of 32-bit Windows XP out there. There's some about 20 feet from me right now, though I think Dan is finally ready to take on Win 7-64...

EDIT: And the 32-bit version of mfaktc 0.2 is a bit faster than the 64-bit version.

Robish 2013-09-22 13:26

[QUOTE=kracker;353757]clLucas 1.01 out. :smile:



[URL="http://www.mediafire.com/download/aarhg64vzz2364z/clLucas_x64_1.01.zip"]Windows x86_64[/URL]

Does anyone even use 32 bit machines?[/QUOTE]

Dramatic difference here, guys: 1175 hours vs 284 hours!!! Wow, that's some difference, all because of lil old -f 4194304.

Run without -f

C:\Users\ati2\Desktop\clLucas_x64_1.01>clLucas_x64_1.01 62868347 -threads 256
Platform :Advanced Micro Devices, Inc.
Device 0 : Pitcairn


start M62868347 fft length = 3145728
err = 0.40625, increasing n from 3145728

start M62868347 fft length = 3276800
err = 0.484375, increasing n from 3276800

start M62868347 fft length = 3538944
Iteration 10000 M( 62868347 )C, 0x2fead152a6afa7d8, n = 3538944, clLucas v1.01 err = 0.125 (11:13 real, 67.3594 ms/iter, ETA 1175:58:59)


Run with -f 4194304

C:\Users\ati2\Desktop\clLucas_x64_1.01>clLucas_x64_1.01 62868347 -f 4194304 -threads 256
Platform :Advanced Micro Devices, Inc.
Device 0 : Pitcairn


start M62868347 fft length = 4194304
Iteration 10000 M( 62868347 )C, 0x2fead152a6afa7d8, n = 4194304, clLucas v1.01 err = 0.002441 (2:43 real, 16.2876 ms/iter, ETA 284:21:16)
Iteration 20000 M( 62868347 )C, 0x06a9133da73deab9, n = 4194304, clLucas v1.01 err = 0.002441 (2:42 real, 16.2534 ms/iter, ETA 283:42:42)
Iteration 30000 M( 62868347 )C, 0x130b4bbd5e6fd089, n = 4194304, clLucas v1.01 err = 0.002441 (2:42 real, 16.2618 ms/iter, ETA 283:48:46)
Iteration 40000 M( 62868347 )C, 0x71bf6180dbb3ab34, n = 4194304, clLucas v1.01 err = 0.002441 (2:43 real, 16.2489 ms/iter, ETA 283:32:36)
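
One way to look at the two runs above is to factor the FFT lengths and see how many bits of the exponent each FFT word has to carry. The Python sketch below does only that arithmetic on the numbers from the logs. It does not show why the forced length is faster; that the pure power-of-two length runs quicker here is an observation from these logs, not something verified against the FFT library. The helper name `factor_small` is made up for this example.

```python
def factor_small(n, primes=(2, 3, 5, 7)):
    """Split n into powers of the given small primes; return (exponents, leftover)."""
    exps = {}
    for p in primes:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
    return exps, n  # leftover is 1 when n has no other prime factors


EXPONENT = 62868347

# (FFT length, reported err, ms/iter or None if the length was rejected), from the logs above.
RUNS = [
    (3145728, 0.40625, None),
    (3276800, 0.484375, None),
    (3538944, 0.125, 67.3594),
    (4194304, 0.002441, 16.2876),
]

for n, err, ms in RUNS:
    exps, leftover = factor_small(n)
    shape = " * ".join(f"{p}^{e}" for p, e in sorted(exps.items()))
    bits_per_word = EXPONENT / n
    speed = f"{ms} ms/iter" if ms is not None else "rejected"
    print(f"n = {n:>8} = {shape:<12} {bits_per_word:6.2f} bits/word  err = {err:<9} {speed}")
```

The length picked automatically, 3538944, factors as 2^17 * 3^3, while the forced 4194304 is 2^22 and carries fewer bits per word, which fits the much smaller err value in the second run.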

Robish 2013-09-22 13:29

[QUOTE=Robish;353805]Dramatic difference here guys 1175 hours vs 284 hours!!! Wow thats some difference ? all because of lil old -f 4194304[/QUOTE]


Still 12 days though. I'll see if I can tweak it a bit more with the settings.
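
For reference, the "12 days" figure can be reproduced from the ms/iter readings. A Lucas-Lehmer test of M(p) takes p - 2 squarings, so the remaining time is roughly the remaining iterations times the time per iteration. The Python sketch below assumes that simple model; whether clLucas computes its ETA exactly this way is not confirmed here, and the ETAs printed in the logs differ from this estimate by a couple of minutes. The function names are made up for this example.

```python
def ll_eta_seconds(exponent, ms_per_iter, done=0):
    """Estimated seconds left in an LL test of M(exponent) after `done` iterations."""
    remaining = (exponent - 2) - done  # an LL test needs exponent - 2 squarings
    return remaining * ms_per_iter / 1000.0


def fmt_hms(seconds):
    """Format seconds as H:MM:SS, the style used in the clLucas output."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


P = 62868347

slow = ll_eta_seconds(P, 67.3594, done=10000)  # automatic FFT length 3538944
fast = ll_eta_seconds(P, 16.2876, done=10000)  # forced with -f 4194304

print("auto length:  ", fmt_hms(slow), f"({slow / 86400:.1f} days)")
print("-f 4194304:   ", fmt_hms(fast), f"({fast / 86400:.1f} days)")
print(f"speedup: {slow / fast:.2f}x")
```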


All times are UTC.