mersenneforum.org  

2016-01-12, 07:16   #386
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000110011₂ Posts

Quote:
Originally Posted by kracker View Post
Well well... never thought I'd see this day.

EDIT: I wonder what kind of performance you can get out of something like a FirePro W8100. It's basically an R9 290 with 1/2-rate DP instead of the R9 290's 1/8.

Good job! I will give it a try tonight, too.

(About the edit: buy one and try it, then tell us so everybody knows!)
2016-01-12, 10:50   #387
airsquirrels
"David"
Jul 2015
Ohio
517₁₀ Posts

Curiously, my W9100 and W8100s actually give pretty subpar performance for having such fast DP units.

My Fury X is still the fastest: at a 4K FFT it runs 3.7 ms/iteration vs. 4.9 ms/iteration on the FirePro. I wonder whether something is not being used quite right in this case. A Titan Black with cuFFT is doing 2.65 ms/iteration.
2016-01-12, 18:11   #388
frmky
Jul 2003
So Cal
2,663 Posts

clLucas is limited by memory bandwidth on the R9 290. The HBM in the Fury X wins. Adding extra DP units doesn't help.
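
A back-of-the-envelope way to see why bandwidth, not DP rate, can set the floor is sketched below. The 4096K transform length and the number of full sweeps over the data per iteration (`PASSES`) are assumptions for illustration, not values measured from clLucas. The bandwidth figures are spec-sheet numbers (320 GB/s for Hawaii, 512 GB/s for the Fury X).

```python
def bandwidth_bound_ms(fft_points, passes, bandwidth_gb_s, bytes_per_value=8):
    """Lower bound on ms/iteration if each pass reads and writes the whole array once."""
    bytes_moved = fft_points * bytes_per_value * 2 * passes  # factor 2 = read + write
    return bytes_moved / (bandwidth_gb_s * 1e9) * 1e3


FFT_POINTS = 4096 * 1024  # assumption: a 4096K-point double-precision transform
PASSES = 6                # hypothetical count of full read+write sweeps per iteration

for name, gb_s in [("Hawaii, 320 GB/s", 320.0), ("Fury X, 512 GB/s", 512.0)]:
    bound = bandwidth_bound_ms(FFT_POINTS, PASSES, gb_s)
    print(f"{name}: at least {bound:.2f} ms/iteration")
```

Under this model the bound scales only with bandwidth, so the ratio between the two cards is 512/320 = 1.6 regardless of how many DP units either one has.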
2016-01-12, 18:24   #389
airsquirrels
"David"
Jul 2015
Ohio
11×47 Posts

Quote:
Originally Posted by frmky View Post
clLucas is limited by memory bandwidth on the R9 290. The HBM in the Fury X wins. Adding extra DP units doesn't help.

Is the Titan's memory bandwidth really that much higher than Hawaii's (FirePro/290)? The specs say 288 GB/s for the Titan vs. 320 GB/s for the FirePro.

That means either clFFT has much, much worse memory performance than cuFFT, which isn't supported by AMD's one-to-one comparison at these FFT sizes (http://developer.amd.com/community/b...clfft-library/), or there is something off in clLucas itself.

My guess is that the other kernels in clLucas cause the trouble; the pointwise multiplication in particular does not look very optimized for GCN.

When I get time I plan to pre-bake FFT plans of a certain size and embed the multiply step and 2nd FFT + normalize kernels into one call. I believe that will give a substantial uplift.

clFFT added preCallback support for exactly this reason.
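
For readers who have not seen the per-iteration pipeline being discussed (1st FFT, pointwise multiply, 2nd FFT, normalize), here is a minimal CPU sketch in plain Python. It is an illustration under stated assumptions, not clLucas code: the names `fft`, `square_via_fft` and `lucas_lehmer` are made up for this example, the transform is a toy recursive FFT, and the reduction mod 2^p - 1 is done with zero-padding and Python's `%` rather than on the GPU.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out


def square_via_fft(x, bits_per_word=8):
    """Square a non-negative integer with the four steps named in the post."""
    base = 1 << bits_per_word
    words = []
    while x:                                  # split into little-endian words
        words.append(x & (base - 1))
        x >>= bits_per_word
    n = 2
    while n < 2 * len(words):                 # zero-pad so the product does not wrap
        n *= 2
    data = [complex(w) for w in words] + [0j] * (n - len(words))
    spectrum = fft(data)                      # 1st FFT
    spectrum = [v * v for v in spectrum]      # pointwise multiply (the step to fuse)
    conv = fft(spectrum, inverse=True)        # 2nd FFT
    result, carry = 0, 0
    for i, v in enumerate(conv):              # normalize: scale, round, carry
        total = round(v.real / n) + carry
        result += (total % base) << (i * bits_per_word)
        carry = total // base
    return result + (carry << (n * bits_per_word))


def lucas_lehmer(p):
    """Lucas-Lehmer test for M(p) = 2**p - 1, p prime, using the FFT squaring above."""
    if p == 2:
        return True
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (square_via_fft(s) - 2) % m
    return s == 0
```

Each of the four commented steps is a separate sweep over the data here. Fusing the pointwise multiply into an FFT callback would remove one of those sweeps, which matters if memory bandwidth is the limit.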

Last fiddled with by airsquirrels on 2016-01-12 at 18:29
2016-01-14, 02:51   #390
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
37·59 Posts

clFFT 2.10.0 is out.. no idea if this will/could affect clLucas in any way.

Code:
This clFFT release tagged as v2.10.0 is part of AMD Compute Libraries (ACL) 1.0 GA.
clFFT - Release Notes - version 2.10.0:
  
  • Post-callback feature that enables custom post-processing of output data directly by the library with user callback function
  • Support for in-place transposes for power-of-2 sizes enables really large 1D transforms as well as supporting no additional memory allocation, by library, for a range of problem sizes
2016-01-14, 03:01   #391
airsquirrels
"David"
Jul 2015
Ohio
1005₈ Posts

Quote:
Originally Posted by kracker View Post
clFFT 2.10.0 is out.. no idea if this will/could affect clLucas in any way.

Code:
This clFFT release tagged as v2.10.0 is part of AMD Compute Libraries (ACL) 1.0 GA.
clFFT - Release Notes - version 2.10.0:
  
  • Post-callback feature that enables custom post-processing of output data directly by the library with user callback function
  • Support for in-place transposes for power-of-2 sizes enables really large 1D transforms as well as supporting no additional memory allocation, by library, for a range of problem sizes
I pulled it down a few days ago, before the final release, and did a clLucas build against it. Nothing earth-shattering in the default configuration.

I do believe using the pre/post callback feature will offer nice boosts.
2016-01-14, 04:43   #392
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts

You guys are amazing! (With complete respect.)
2016-01-17, 16:23   #393
lalera
Jul 2003
2⁷·5 Posts

Hi,
I did a short test with v1.04 on an R9 390X (Hawaii XT):
Code:
C:\1cllucas104>clLucas_x64 -sixstepfft -clfftbench 1048576 8388608 1048576

Platform 0 : Advanced Micro Devices, Inc.
Platform :Advanced Micro Devices, Inc.
Device 0 : Hawaii

Build Options are : -D KHR_DP_EXTENSION

CL_DEVICE_NAME                          Hawaii
CL_DEVICE_VENDOR                        Advanced Micro Devices, Inc.
CL_DEVICE_VERSION                       OpenCL 2.0 AMD-APP (1800.8)
CL_DRIVER_VERSION                       1800.8 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS             44
CL_DEVICE_MAX_CLOCK_FREQUENCY           1080
CL_DEVICE_GLOBAL_MEM_SIZE               0
CL_DEVICE_MAX_WORK_GROUP_SIZE           256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 1

clFFT bench start = 1048576 end = 8388608 distance = 1048576
clFFT size= 1048576 time= 0.624990 msec
clFFT size= 2097152 time= 0.781230 msec
clFFT size= 3145728 time= 1.093740 msec
clFFT size= 4194304 time= 1.517150 msec
clFFT size= 5242880 time= 3.749980 msec
clFFT size= 6291456 time= 4.094270 msec
clFFT size= 7340032 time= 4.843740 msec
clFFT size= 8388608 time= 3.497480 msec

C:\1cllucas104>
--

C:\1cllucas104>clLucas_x64 -sixstepfft -f 2048k 36976267

Platform 0 : Advanced Micro Devices, Inc.
Platform :Advanced Micro Devices, Inc.
Device 0 : Hawaii

Build Options are : -D KHR_DP_EXTENSION

CL_DEVICE_NAME                          Hawaii
CL_DEVICE_VENDOR                        Advanced Micro Devices, Inc.
CL_DEVICE_VERSION                       OpenCL 2.0 AMD-APP (1800.8)
CL_DRIVER_VERSION                       1800.8 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS             44
CL_DEVICE_MAX_CLOCK_FREQUENCY           1080
CL_DEVICE_GLOBAL_MEM_SIZE               0
CL_DEVICE_MAX_WORK_GROUP_SIZE           256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 1

mkdir: cannot create directory `': File exists
Starting M36976267 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration  100, average error = 0.04351, max error = 0.05859
Iteration  200, average error = 0.05244, max error = 0.06250
Iteration  300, average error = 0.05579, max error = 0.06250
Iteration  400, average error = 0.05747, max error = 0.06250
Iteration  500, average error = 0.05998, max error = 0.07031
Iteration  600, average error = 0.06170, max error = 0.07031
Iteration  700, average error = 0.06293, max error = 0.07031
Iteration  800, average error = 0.06385, max error = 0.07031
Iteration  900, average error = 0.06457, max error = 0.07031
Iteration 1000, average error = 0.06514 < 0.25 (max error = 0.07031), continuing test.
Iteration 20000 M( 36976267 )C, 0x8c1da923bc3ab356, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2464 ms/iter, ETA 23:03:03)
Iteration 40000 M( 36976267 )C, 0x91133c8d2a727523, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2344 ms/iter, ETA 22:54:55)
Iteration 60000 M( 36976267 )C, 0x5a30204bb64469fa, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2363 ms/iter, ETA 22:55:18)
Iteration 80000 M( 36976267 )C, 0x8a8bd90aeb56e711, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2361 ms/iter, ETA 22:54:28)
Iteration 100000 M( 36976267 )C, 0x53d5d66e91a7cc1e, n = 2048K, clLucas v1.04 err = 0.0703 (0:44 real, 2.2354 ms/iter, ETA 22:53:16)
Iteration 120000 M( 36976267 )C, 0xfc64c4c3b6ec8934, n = 2048K, clLucas v1.04 err = 0.0703 (0:44 real, 2.2350 ms/iter, ETA 22:52:16)
Iteration 140000 M( 36976267 )C, 0xd3509068b42c84c5, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2362 ms/iter, ETA 22:52:15)
Iteration 160000 M( 36976267 )C, 0xe640c2979b7861bc, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2344 ms/iter, ETA 22:50:25)
Iteration 180000 M( 36976267 )C, 0x3e91782572e7b52c, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2364 ms/iter, ETA 22:50:55)
Iteration 200000 M( 36976267 )C, 0x2df9fc599a8ad81d, n = 2048K, clLucas v1.04 err = 0.0703 (0:44 real, 2.2370 ms/iter, ETA 22:50:32)
Iteration 220000 M( 36976267 )C, 0x3a49b542818203ef, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2361 ms/iter, ETA 22:49:14)
Iteration 240000 M( 36976267 )C, 0x2378291204a357e6, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2397 ms/iter, ETA 22:50:43)
Iteration 260000 M( 36976267 )C, 0x20a525306e49c305, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2367 ms/iter, ETA 22:48:05)
Iteration 280000 M( 36976267 )C, 0xeb7dea574bc026be, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 2.2369 ms/iter, ETA 22:47:30)
        Unknown signal caught, writing checkpoint. Estimated time spent so far: 10:31


C:\1cllucas104>
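
The ETA column in the log above can be roughly reproduced from the reported ms/iter. The sketch below assumes the run needs p - 2 Lucas-Lehmer iterations in total and that the ETA is simply remaining iterations times the current rate. That is my reading of the output, not something taken from the clLucas source, and the function names are made up for this example.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Remaining time in seconds, assuming p - 2 iterations in total."""
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0


def format_hms(seconds):
    """Format seconds as h:mm:ss, truncating fractions of a second."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# First log line: iteration 20000 of M36976267 at 2.2464 ms/iter.
print(format_hms(eta_seconds(36976267, 20000, 2.2464)))
```

This lands within a minute of the logged ETA of 23:03:03. The small gap presumably comes from the program working with an unrounded timing.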
2016-01-19, 17:39   #394
lalera
Jul 2003
2⁷·5 Posts

Hi,
I did a double-check test with v1.04 on an R9 390X (Hawaii XT)
with the GPU power limit set to -40 percent.
M43656857: around 3 ms per iteration, estimated total time = 37:23:36

Code:
C:\1cllucas104>clLucas_x64 -sixstepfft

Platform 0 : Advanced Micro Devices, Inc.
Platform :Advanced Micro Devices, Inc.
Device 0 : Hawaii

Build Options are : -D KHR_DP_EXTENSION

CL_DEVICE_NAME                          Hawaii
CL_DEVICE_VENDOR                        Advanced Micro Devices, Inc.
CL_DEVICE_VERSION                       OpenCL 2.0 AMD-APP (1800.8)
CL_DRIVER_VERSION                       1800.8 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS             44
CL_DEVICE_MAX_CLOCK_FREQUENCY           1080
CL_DEVICE_GLOBAL_MEM_SIZE               0
CL_DEVICE_MAX_WORK_GROUP_SIZE           256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 1

mkdir: cannot create directory `': File exists
Starting M43656857 fft length = 2240K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 0.43750 >= 0.35, increasing n from 2240K
Starting M43656857 fft length = 2304K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration  100, average error = 0.16664, max error = 0.22266
Iteration  200, average error = 0.19465, max error = 0.22266
Iteration  300, average error = 0.21301, max error = 0.25000
Iteration  400, average error = 0.22226, max error = 0.25000
Iteration  500, average error = 0.22780, max error = 0.25000
Iteration  600, average error = 0.23150, max error = 0.25000
Iteration  700, average error = 0.23415, max error = 0.25000
Iteration  800, average error = 0.23613, max error = 0.25000
Iteration  900, average error = 0.23767, max error = 0.25000
Iteration 1000, average error = 0.23889 < 0.25 (max error = 0.25000), continuing test.
Iteration 20000 M( 43656857 )C, 0x93ce78fa2fef13c3, n = 2304K, clLucas v1.04 err = 0.2500 (1:01 real, 3.0184 ms/iter, ETA 36:34:24)
Iteration 40000 M( 43656857 )C, 0xa7c5eb7734d6a507, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0075 ms/iter, ETA 36:25:26)
Iteration 60000 M( 43656857 )C, 0x1adcc85304d23fe2, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0059 ms/iter, ETA 36:23:16)
Iteration 80000 M( 43656857 )C, 0xb7e7a01734335a19, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0117 ms/iter, ETA 36:26:31)
Iteration 100000 M( 43656857 )C, 0xd96857efa86503ac, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0065 ms/iter, ETA 36:21:41)
Iteration 120000 M( 43656857 )C, 0xe5c88868a0f71263, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0056 ms/iter, ETA 36:20:03)
Iteration 140000 M( 43656857 )C, 0xd55a8ea7e5995672, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0070 ms/iter, ETA 36:20:03)
Iteration 160000 M( 43656857 )C, 0xfdf27393cff1d8e1, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0075 ms/iter, ETA 36:19:25)
Iteration 180000 M( 43656857 )C, 0xd2383a2759ace039, n = 2304K, clLucas v1.04 err = 0.2500 (1:01 real, 3.0054 ms/iter, ETA 36:16:56)
Iteration 200000 M( 43656857 )C, 0x19ffa4ce2c4039fd, n = 2304K, clLucas v1.04 err = 0.2500 (1:00 real, 3.0073 ms/iter, ETA 36:17:17)

...

Iteration 43460000 M( 43656857 )C, 0xfed9a59e92e8b13c, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.1015 ms/iter, ETA 9:18)
Iteration 43480000 M( 43656857 )C, 0x2b6ad4f57568dd67, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.0936 ms/iter, ETA 8:14)
Iteration 43500000 M( 43656857 )C, 0xc1efde761cd7e7ae, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.0916 ms/iter, ETA 7:12)
Iteration 43520000 M( 43656857 )C, 0x0acd599af4479068, n = 2304K, clLucas v1.04 err = 0.3125 (1:01 real, 3.0883 ms/iter, ETA 6:10)
Iteration 43540000 M( 43656857 )C, 0x33512556c84a656e, n = 2304K, clLucas v1.04 err = 0.3125 (1:01 real, 3.0929 ms/iter, ETA 5:09)
Iteration 43560000 M( 43656857 )C, 0x88c796f90f2780a0, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.0888 ms/iter, ETA 4:07)
Iteration 43580000 M( 43656857 )C, 0xc494e7dd7d84bc48, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.0892 ms/iter, ETA 3:05)
Iteration 43600000 M( 43656857 )C, 0x503c1fe26490b85b, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.0873 ms/iter, ETA 2:03)
Iteration 43620000 M( 43656857 )C, 0xc9ba71029f42e911, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.1016 ms/iter, ETA 1:02)
Iteration 43640000 M( 43656857 )C, 0xd945e5cbd2bc21ee, n = 2304K, clLucas v1.04 err = 0.3125 (1:02 real, 3.0932 ms/iter, ETA 0:00)
M( 43656857 )C, 0x10b5c3b8fad63572, n = 2304K, clLucas v1.04, estimated total time = 37:23:36

No valid assignment found.


C:\1cllucas104>
Nice program!
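
The round-off messages in the log above (the jump from 2240K to 2304K, then the 1000-iteration careful test) can be sketched as follows. The thresholds 0.35 and 0.25 are the ones printed in the log. The decision logic is only my reading of those messages, and the function names are hypothetical, not taken from clLucas.

```python
def roundoff_error(values):
    """Largest distance of any inverse-FFT output from the nearest integer."""
    return max(abs(v - round(v)) for v in values)


def needs_longer_fft(iteration, max_err, avg_err,
                     careful_iters=1000, max_limit=0.35, avg_limit=0.25):
    """Decide whether to restart with a larger FFT length (assumed logic)."""
    if iteration < careful_iters and max_err >= max_limit:
        return True   # e.g. "Iteration = 32 < 1000 && err = 0.43750 >= 0.35"
    if iteration == careful_iters and avg_err >= avg_limit:
        return True   # average error too high at the end of the careful test
    return False


print(needs_longer_fft(32, 0.43750, 0.43750))    # the 2240K attempt
print(needs_longer_fft(1000, 0.25000, 0.23889))  # the 2304K run at iteration 1000
```

An error near 0.5 means the rounded value could be off by one, so a result computed at that FFT length cannot be trusted. That is why the 2240K attempt was abandoned after 32 iterations.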
2016-01-19, 18:02   #395
Batalov
"Serge"
Mar 2008
San Diego, Calif.
281D₁₆ Posts

clLucas is now one of the verification programs for M49*.
Congrats and kudos to the author!
2016-01-20, 00:30   #396
msft
Jul 2009
Tokyo
2·5·61 Posts

Quote:
Originally Posted by lalera View Post
Nice program!

Thank you for your support.

clLucas.ini supports the six-step FFT.
Code:
# SixStepFFT is the same as the -sixstepfft option (1 = on, 0 = off).
SixStepFFT=1
Attached Files
File Type: ini clLucas.ini (3.9 KB, 228 views)
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.