LL with OpenCL
I tried porting LL to OpenCL.
|
1 Attachment(s)
Sometimes clAmdFft-1.10.321 has a precision problem.
7750:
[QUOTE]
$ sh -x ./run.sh
+ rm *.o a.out
+ g++ -c main.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.10.321/include/
+ g++ -c clFFTPlans.cpp -I /opt/AMDAPP/include/ -I /opt/clAmdFft-1.10.321/include/
+ g++ main.o clFFTPlans.o /opt/clAmdFft-1.10.321/lib64/libclAmdFft.Runtime.so -lOpenCL -lfftw3
+ export LD_LIBRARY_PATH=:/opt/clAmdFft-1.10.321/lib64/
+ time ./a.out
Using device: Capeverde
AmdFFT_Z2Z size= 2048 time= 0.080000 msec
Everything went fine!
3.52user 7.27system 0:09.48elapsed 113%CPU (0avgtext+0avgdata 381360maxresident)k
0inputs+1680outputs (0major+38531minor)pagefaults 0swaps
+ diff fft_fftw.dat fft_cl.dat
+ head -n 40
2,16c2,16
< 1 -1.673975300084231e+05 1.663735300084231e+05
< 2 -8.395397960358397e+04 8.292997960358398e+04
< 3 -5.613911373064458e+04 5.511511373064457e+04
< 4 -4.223141898575165e+04 4.120741898575165e+04
< 5 -3.388659268655724e+04 3.286259268655724e+04
< 6 -2.832320060429759e+04 2.729920060429759e+04
< 7 -2.434919949694704e+04 2.332519949694705e+04
< 8 -2.136856774250665e+04 2.034456774250665e+04
< 9 -1.905018221676692e+04 1.802618221676692e+04
< 10 -1.719536904441296e+04 1.617136904441295e+04
< 11 -1.567769939498128e+04 1.465369939498128e+04
< 12 -1.441288738083825e+04 1.338888738083825e+04
< 13 -1.334258123301454e+04 1.231858123301454e+04
< 14 -1.242510111596777e+04 1.140110111596777e+04
< 15 -1.162988181643994e+04 1.060588181643995e+04
---
> 1 -1.673975291656877e+05 1.663735279914048e+05
> 2 -8.395397960358400e+04 8.292997960358398e+04
> 3 -5.613911438886409e+04 5.511511209925302e+04
> 4 -4.223141898575164e+04 4.120741898575164e+04
> 5 -3.388659171768492e+04 3.286259310540232e+04
> 6 -2.832320060429760e+04 2.729920060429759e+04
> 7 -2.434919920591193e+04 2.332519938365545e+04
> 8 -2.136856763176386e+04 2.034456749267493e+04
> 9 -1.905018202758640e+04 1.802618176363859e+04
> 10 -1.719536897123582e+04 1.617136882738273e+04
> 11 -1.567769950455412e+04 1.465369873990205e+04
> 12 -1.441288733405879e+04 1.338888718385923e+04
> 13 -1.334258082835818e+04 1.231858121315468e+04
> 14 -1.242510108971055e+04 1.140110093114974e+04
> 15 -1.162988166206707e+04 1.060588158595162e+04
18c18
< 17 -1.031992388900484e+04 9.295923889004827e+03
---
> 17 -1.031992383364508e+04 9.295923772879578e+03
[/QUOTE]
Compare LINE 3. |
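To put a number on the mismatch, here is a minimal sketch that computes the relative error for two rows of the diff above (the rows labelled 2 and 3 in the first column). The values are copied from the quoted output; the helper names `relative_error` and `worst_relative_error` are made up for this illustration, and the two value columns are treated as plain numbers without assuming what they represent.

```python
import sys

def relative_error(reference: float, value: float) -> float:
    """Relative difference of value with respect to reference."""
    return abs(value - reference) / abs(reference)

# (label, fftw_col1, fftw_col2, cl_col1, cl_col2), copied from the diff above.
ROWS = [
    (2, -8.395397960358397e+04, 8.292997960358398e+04,
        -8.395397960358400e+04, 8.292997960358398e+04),
    (3, -5.613911373064458e+04, 5.511511373064457e+04,
        -5.613911438886409e+04, 5.511511209925302e+04),
]

def worst_relative_error(row) -> float:
    """Largest relative error over the two value columns of one row."""
    _, ref1, ref2, cl1, cl2 = row
    return max(relative_error(ref1, cl1), relative_error(ref2, cl2))

for row in ROWS:
    print(f"row {row[0]}: worst relative error {worst_relative_error(row):.2e}")
print(f"double-precision epsilon: {sys.float_info.epsilon:.2e}")
```

Row 2 agrees with FFTW to within a few units of double-precision rounding, while row 3 is off by roughly 1e-8 in relative terms, far more than rounding in double precision alone would explain.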
I hear the AMD dev forums are a nice place to ask if you get stuck, [URL="http://devgurus.amd.com/welcome"]here[/URL].
|
[QUOTE=kracker;343516]I hear the AMD dev forums are a nice place to ask if you get stuck, [URL="http://devgurus.amd.com/welcome"]here[/URL].[/QUOTE]
Thank you for the information. The addition-order problem is subtle. I'll try to investigate a little more. |
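As a generic illustration of why addition order matters in floating point (this is not the clAmdFft code path, which the thread does not show), the sketch below sums the same three doubles in two different orders. The function name `sum_left_to_right` and the sample values are chosen only for this example.

```python
import math

def sum_left_to_right(values):
    """Accumulate in the given order, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [1e16, 1.0, -1e16]      # the 1.0 is absorbed when added to 1e16
reordered = [1e16, -1e16, 1.0]   # the large terms cancel first, so 1.0 survives

print(sum_left_to_right(values))     # 0.0
print(sum_left_to_right(reordered))  # 1.0
print(math.fsum(values))             # 1.0, correctly rounded sum
```

Two FFT implementations that add the same terms in a different order can therefore produce slightly different results even when both are correct.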
I just noticed something... The GTX 780 has a DP ratio of 1/24, GCN has a ratio of around 1/4. Not sure if it will mean anything, just something I read. :popcorn:
|
1 Attachment(s)
Mul loop on GPU.
[QUOTE]
$ pwd
/opt/AMDAPP/samples/opencl/cl/0.19/0.19
msft@msft-desktop:/opt/AMDAPP/samples/opencl/cl/0.19/0.19$ sh -x ./run.sh
+ export LD_LIBRARY_PATH=:/opt/clAmdFft-1.10.321/lib64/
$ time ./a.out 216091
Platform :Advanced Micro Devices, Inc.
Device 0 : Capeverde
Build Options are : -D KHR_DP_EXTENSION
--- 216001 32768
M( 216091 )P, n = 32768, MacLucasFFTW v8.1 Ballester
real 12m43.947s
user 6m19.212s
sys 3m8.580s
[/QUOTE] |
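For readers unfamiliar with FFT-based multiplication, here is a minimal sketch of squaring a big integer with a plain zero-padded FFT. It is an assumption-laden illustration only: the posted program's actual transform and kernels are not shown in the thread, and `fft`, `square_via_fft` and the 8-bit digit size are hypothetical choices for this example.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Unnormalised."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def square_via_fft(x: int, base_bits: int = 8) -> int:
    """Square a non-negative integer through a floating-point FFT."""
    if x == 0:
        return 0
    mask = (1 << base_bits) - 1
    digits = []
    while x:
        digits.append(x & mask)   # little-endian digits in base 2**base_bits
        x >>= base_bits
    n = 1
    while n < 2 * len(digits):    # zero-pad so the convolution does not wrap
        n *= 2
    spectrum = fft([complex(d) for d in digits] + [0j] * (n - len(digits)))
    conv = fft([v * v for v in spectrum], invert=True)
    result = 0
    for i, v in enumerate(conv):
        # Divide by n to finish the inverse FFT, then round to the nearest integer.
        result += int(round(v.real / n)) << (i * base_bits)
    return result

x = (1 << 127) - 1
print(square_via_fft(x) == x * x)
```

The rounding step is where FFT precision matters: if the accumulated floating-point error in any element reaches 0.5, the rounded digit is wrong and the square is silently corrupted.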
1 Attachment(s)
All loops on GPU. :smile:
7750:
[QUOTE]
$ pwd
/opt/AMDAPP/samples/opencl/cl/0.27/0.27
$ sh -x ./run.sh
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/clAmdFft-1.10.321/lib64/
$ time ./a.out 216091
Platform :Advanced Micro Devices, Inc.
Device 0 : Capeverde
---- 216001 32768
M( 216091 )P, n = 32768, MacLucasFFTW v8.1 Ballester
real 16m51.839s
user 1m26.733s
sys 3m35.473s
[/QUOTE]
Caution: very slow, and the system crashes suddenly. |
Is there any way you can give me binaries for linux or even windows for me to tinker around with? :max: I tried compiling, but it just doesn't work for me here...
|
[QUOTE=kracker;343740]Is there any way you can give me binaries for linux or even windows for me to tinker around with? :max: I tried compiling, but it just doesn't work for me here...[/QUOTE]
Can you compile MatrixMulImage on Linux?
[QUOTE]
desktop:/opt/AMDAPP/samples/opencl/cl/app/MatrixMulImage$ make
mkdir -p depends/x86_64
perl ../../../../../make/fastdep.pl -I. -I../../../../../include -I../../../../../samples/opencl/SDKUtil/include -I../../../../../samples/bolt/BoltUtil/include -I../../../../../samples/C++Amp/AmpUtil/include --obj-suffix='.o' --obj-prefix='build/debug/x86_64//' MatrixMulImage.cpp > depends/x86_64/MatrixMulImage.depend
mkdir -p build/debug/x86_64/
Building build/debug/x86_64//MatrixMulImage.o
g++ -Wpointer-arith -Wfloat-equal -g3 -ffor-scope -I ../../../../../samples/opencl/SDKUtil/include -I ../../../../../samples/bolt/BoltUtil/include -I ../../../../../samples/C++Amp/AmpUtil/include -I "/opt/AMDAPP/include" -I ../../../../../include -o build/debug/x86_64//MatrixMulImage.o -c MatrixMulImage.cpp
Building build/debug/x86_64/MatrixMulImage
g++ -o build/debug/x86_64/MatrixMulImage build/debug/x86_64//MatrixMulImage.o -lpthread -ldl -L/usr/X11R6/lib -lSDKUtil -lOpenCL -L../../../../../lib/x86_64 -L../../../../../TempSDKUtil/lib/x86_64 -L"/opt/AMDAPP/lib/x86_64"
install -D build/debug/x86_64/MatrixMulImage ../../../../../samples/opencl/bin/x86_64/MatrixMulImage
for f in MatrixMulImage_Kernels.cl; do \
 install -D $f ../../../../../samples/opencl/bin/x86_64/$f; \
done
[/QUOTE] |
1 Attachment(s)
0.27 a.out on Ubuntu 12.04 LTS
[QUOTE]
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/clAmdFft-1.10.321/lib64/
time ./a.out 607
[/QUOTE] |
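For anyone who wants to sanity-check a small exponent such as 607 without a GPU, here is a minimal CPU-only sketch of the Lucas-Lehmer (LL) test using Python's built-in big integers. It is a reference check under the usual LL definition, not the OpenCL program from this thread, and the function name `lucas_lehmer` is chosen for this example.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime. Assumes p is itself a prime."""
    if p == 2:
        return True            # 2**2 - 1 = 3 is prime; the loop below needs p > 2
    m = (1 << p) - 1           # the Mersenne number being tested
    s = 4                      # standard LL starting value
    for _ in range(p - 2):
        s = (s * s - 2) % m    # one squaring per iteration, reduced mod 2**p - 1
    return s == 0

print(lucas_lehmer(607))  # True
print(lucas_lehmer(11))   # False, since 2**11 - 1 = 2047 = 23 * 89
```

The squaring inside the loop is the step that an FFT-based implementation accelerates; for an exponent like 216091 the loop runs 216089 times, so plain Python is slow there and only useful as a cross-check.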
I tried booting Linux after not using it for around a month, I think... for some reason it wouldn't boot, so I tried reinstalling both Ubuntu and Fedora. No luck: it boots to a black screen, and the fallback graphics mode and nomodeset didn't work either... damn Linux :P
|