mersenneforum.org  

Old 2013-09-20, 16:58   #144
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

You may want to compile clLucas manually. Frankly, I've never tried much with MSVC, but I've had better results with MinGW x64 on Windows. I have an executable and just finished a DC (double check) with it. If you want, I can give it to you.

EDIT: Manual as in running cl alone.

Last fiddled with by kracker on 2013-09-20 at 17:00
Old 2013-09-20, 17:50   #145
sanaris
 
"Yury Vorobyov"
Jul 2013
Chelyabinsk

13₁₆ Posts

It works! Device: FireStream 9350.
It was just essential to set the right kind of "item" for the files, because otherwise MSVC tries to "compile" them into separate objs.
Code:
c:\Users\yury\My Documents\AMD APP\samples\opencl\bin\release\x86_64>clLucas.exe 36666666
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress

start M36666666 fft length = 1966080
err = 0.359375, increasing n from 1966080

start M36666666 fft length = 2097152
Iteration 10000 M( 36666666 )C, 0xded2eec2ad4c020b, n = 2097152, clLucas v1.00 err = 0.08594 (1:32 real, 9.2722 ms/iter, ETA 94:23:44)
Iteration 20000 M( 36666666 )C, 0x8c022e364d0eac22, n = 2097152, clLucas v1.00 err = 0.08594 (1:35 real, 9.4378 ms/iter, ETA 96:03:21)
Iteration 30000 M( 36666666 )C, 0x581cb1c8d6065b84, n = 2097152, clLucas v1.00 err = 0.08594 (1:48 real, 10.7739 ms/iter, ETA 109:37:28)
Iteration 40000 M( 36666666 )C, 0x21b58443efd8f52f, n = 2097152, clLucas v1.00 err = 0.08594 (1:37 real, 9.7251 ms/iter, ETA 98:55:34)
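
For anyone wondering how the ETA column relates to the ms/iter figure, here is a rough sketch in Python. It is not taken from the clLucas source. It assumes the run needs about as many iterations as the exponent, and the helper names eta_seconds and format_hms are made up for illustration. With the numbers from the first progress line it lands within about a minute of the reported ETA, so the program's real formula is probably similar but not identical.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Rough time left: remaining iterations times the current cost per iteration.

    Assumption: the total iteration count is approximately the exponent.
    """
    remaining = exponent - iteration
    return remaining * ms_per_iter / 1000.0


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS, like the ETA column in the log."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Values copied from the "Iteration 10000" line of the log.
eta = eta_seconds(36666666, 10000, 9.2722)
print(format_hms(eta))
```

The ETA jumps around between lines (94 h, 96 h, 109 h) simply because it is recomputed from the latest ms/iter, which itself varies from one 10000-iteration block to the next.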
Old 2013-09-20, 18:10   #146
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2³·271 Posts

Quote:
Originally Posted by sanaris View Post
It works! Device: FireStream 9350.
It was just essential to set the right kind of "item" for the files, because otherwise MSVC tries to "compile" them into separate objs.
Nice. firestream9350, is that a "pro" type card?

My Radeon HD 7770 gets 12 ms, a 7970 gets ~3.7-4 ms.
Old 2013-09-20, 19:45   #147
sanaris
 
"Yury Vorobyov"
Jul 2013
Chelyabinsk

19₁₀ Posts

Quote:
Originally Posted by kracker View Post
Nice. firestream9350, is that a "pro" type card?

My Radeon HD 7770 gets 12 ms, a 7970 gets ~3.7-4 ms.
Yes, that was roughly the direction of AMD's FireStream line. They combined "pro-type" builds with a single display output, aimed at HPC/server-rendering workloads. But for some reason that line was discontinued and merged into the "standard FirePro" line. Sadly, HPC and CAD are completely different tasks that require different approaches, but the AMD guys didn't understand that...

The 9350/9370 were the last FireStream cards, with declared double-precision throughput of around 500/700 GFLOPS respectively.
The 9350 has a declared TDP of 150 W, which is pretty good for its throughput.

Last fiddled with by sanaris on 2013-09-20 at 19:48
Old 2013-09-20, 20:07   #148
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

100001111000₂ Posts

Quote:
Originally Posted by sanaris View Post
Yes, that was roughly the direction of AMD's FireStream line. They combined "pro-type" builds with a single display output, aimed at HPC/server-rendering workloads. But for some reason that line was discontinued and merged into the "standard FirePro" line. Sadly, HPC and CAD are completely different tasks that require different approaches, but the AMD guys didn't understand that...

The 9350/9370 were the last FireStream cards, with declared double-precision throughput of around 500/700 GFLOPS respectively.
The 9350 has a declared TDP of 150 W, which is pretty good for its throughput.
Well, it is Cypress (a few years old); a newer gaming GCN* card is probably more efficient. A 200 W GCN card does 4.5 ms/iter (FFT length 2097152).

* Graphics Core Next
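
A back-of-the-envelope way to compare the two cards is energy per iteration, i.e. board power times time per iteration. The Python sketch below uses only figures quoted in this thread (150 W TDP and roughly 9.3 ms/iter for the FireStream 9350, 200 W and 4.5 ms/iter for the GCN card). It assumes the card actually draws its rated power, which is an upper bound rather than a measurement, and the function name joules_per_iteration is made up for illustration.

```python
def joules_per_iteration(watts, ms_per_iter):
    """Energy spent on one iteration, assuming the card draws `watts` throughout."""
    return watts * ms_per_iter / 1000.0


# Figures quoted in the thread; TDP stands in for real power draw (assumption).
firestream_9350 = joules_per_iteration(150, 9.3)
gcn_200w = joules_per_iteration(200, 4.5)

print(f"FireStream 9350: {firestream_9350:.2f} J/iter")
print(f"200 W GCN card:  {gcn_200w:.2f} J/iter")
```

Under those assumptions the newer card uses noticeably less energy per iteration despite the higher rated power, which matches the "probably more efficient" guess.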
Old 2013-09-20, 22:39   #149
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Quote:
Originally Posted by Bdot View Post
Hi,
I could not spend a lot of time, but a few observations:
Hi,
Thank you for the observations.
Old 2013-09-21, 07:46   #150
TeknoHog
 
Mar 2010
Jyvaskyla, Finland

2²·3² Posts

As some of you already know, Primenet now accepts clLucas results without any fuss. To make things even smoother, try my scripts for automatic work assignment and submission.
Old 2013-09-22, 02:32   #151
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

2168₁₀ Posts
clLucas 1.01

clLucas 1.01 out.

Quote:
Originally Posted by msft
1) Fix TeknoHog issue.
2) Change from clAmdFft.h to clfft.h.
3) Fix "over specifications Grid = 65536" issue.
Windows x86_64

Does anyone even use 32 bit machines?
Old 2013-09-22, 06:17   #152
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

23656₈ Posts

Quote:
Originally Posted by kracker View Post
Does anyone even use 32 bit machines?
Hardware or software? There's still lots of 32-bit Windows XP out there. There's some about 20 feet from me right now, though I think Dan is finally ready to take on Win 7-64...

EDIT: and the 32-bit version of mfaktc 0.2 is a bit faster than the 64-bit version.

Last fiddled with by kladner on 2013-09-22 at 06:18
Old 2013-09-22, 13:26   #153
Robish
 
"Rob Gahan"
Aug 2013
Ireland

2²×3² Posts

Quote:
Originally Posted by kracker View Post
clLucas 1.01 out.



Windows x86_64

Does anyone even use 32 bit machines?
Dramatic difference here, guys: 1175 hours vs 284 hours!!! Wow, that's some difference, all because of lil old -f 4194304.

Run without -f

C:\Users\ati2\Desktop\clLucas_x64_1.01>clLucas_x64_1.01 62868347 -threads 256
Platform :Advanced Micro Devices, Inc.
Device 0 : Pitcairn


start M62868347 fft length = 3145728
err = 0.40625, increasing n from 3145728

start M62868347 fft length = 3276800
err = 0.484375, increasing n from 3276800

start M62868347 fft length = 3538944
Iteration 10000 M( 62868347 )C, 0x2fead152a6afa7d8, n = 3538944, clLucas v1.01 err = 0.125 (11:13 real, 67.3594 ms/iter, ETA 1175:58:59)


Run with -f 4194304

C:\Users\ati2\Desktop\clLucas_x64_1.01>clLucas_x64_1.01 62868347 -f 4194304 -threads 256
Platform :Advanced Micro Devices, Inc.
Device 0 : Pitcairn


start M62868347 fft length = 4194304
Iteration 10000 M( 62868347 )C, 0x2fead152a6afa7d8, n = 4194304, clLucas v1.01 err = 0.002441 (2:43 real, 16.2876 ms/iter, ETA 284:21:16)
Iteration 20000 M( 62868347 )C, 0x06a9133da73deab9, n = 4194304, clLucas v1.01 err = 0.002441 (2:42 real, 16.2534 ms/iter, ETA 283:42:42)
Iteration 30000 M( 62868347 )C, 0x130b4bbd5e6fd089, n = 4194304, clLucas v1.01 err = 0.002441 (2:42 real, 16.2618 ms/iter, ETA 283:48:46)
Iteration 40000 M( 62868347 )C, 0x71bf6180dbb3ab34, n = 4194304, clLucas v1.01 err = 0.002441 (2:43 real, 16.2489 ms/iter, ETA 283:32:36)
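
To make the difference between the two runs easier to see, the FFT lengths from the logs can be split into an odd factor times a power of two. The Python sketch below is not part of clLucas, and the helper name split_power_of_two is made up for illustration. Only 4194304 turns out to be a pure power of two. That such lengths run much faster in the underlying FFT library is an assumption suggested by these timings, not something the logs prove.

```python
def split_power_of_two(n):
    """Return (odd_part, k) such that n == odd_part * 2**k."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return n, k


# FFT lengths that appear in the two runs above.
for length in (3145728, 3276800, 3538944, 4194304):
    odd, k = split_power_of_two(length)
    print(f"{length} = {odd} * 2^{k}")

# Per-iteration times from the two "Iteration 10000" lines.
speedup = 67.3594 / 16.2876
print(f"speedup with -f 4194304: about {speedup:.2f}x")
```

The auto-selected length 3538944 has an odd factor of 27, while the forced length is exactly 2^22. The forced run also shows a far smaller roundoff error (0.002441 vs 0.125), so the larger length costs nothing in safety here.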
Old 2013-09-22, 13:29   #154
Robish
 
"Rob Gahan"
Aug 2013
Ireland

2²·3² Posts

Quote:
Originally Posted by Robish View Post
Dramatic difference here, guys: 1175 hours vs 284 hours!!! Wow, that's some difference, all because of lil old -f 4194304.

Still 12 days, though. I'll see if I can tweak it a bit more with the settings.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.