[QUOTE=Prime95;535288]I think you'll get an error message. Try it.
You could also do "-use FANCY_MIDDLEMUL1,ORIGINAL_TWEAKED" to get fancy middlemul1 for middle=10,11 and original tweaked middle mul1 otherwise.[/QUOTE]
Yup. Instant trouble, a real showstopper.
[CODE]2020-01-23 16:54:19 condorella/rx480 82053239 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 17.39 bits/word
2020-01-23 16:54:19 condorella/rx480 OpenCL args "-DEXP=82053239u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -DWEIGHT_STEP=0xc.373107b1f3e78p-3 -DIWEIGHT_STEP=0xa.7a792f1683b7p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DAMDGPU=1 -DCARRY32=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_SQUARES_MIDDLEMUL1=1 -DNEW_SLOWTRIG=1 -DNO_ASM=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_WIDTH=1 -DUNROLL_HEIGHT=1 -DUNROLL_WIDTH=1 -DWORKINGIN1=1 -DWORKINGOUT1=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-01-23 16:54:22 condorella/rx480 OpenCL compilation in 2.65 s
2020-01-23 16:54:24 condorella/rx480 82053239 EE 0 loaded: blockSize 400, a8c3b11429b46cbf (expected 0000000000000003)
2020-01-23 16:54:24 condorella/rx480 Exiting because "error on load"
2020-01-23 16:54:24 condorella/rx480 Bye[/CODE]
Next step: code in gpuowl that applies -use options only when they are legal and nonfatal for the applicable fft length? Or support for fft-length conditionals in config.txt?
This was a PRP-DC at 4608K fft length, to which the -use options that are optimal for 5M fft length were applied from config.txt, with fatal result. I am not looking forward to tuning a long list of -use options on an fft-length-by-fft-length basis for numerous gpu models and swapping them out manually when exponents change, or to having such crashes discard 18 hours of gpu time instead of making progress. At 3-5% speedup on many models, it takes a long time to pay that back. |
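As a stopgap for the missing fft-length conditionals, the selection could live in a small launcher script outside gpuowl. The sketch below is only an illustration under stated assumptions: `USE_BY_FFT_K`, `use_args` and `payback_hours` are hypothetical names, nothing here is a gpuowl feature, and the 5120K entry is simply the set of -D flags visible in the log above (AMDGPU is left out on the assumption that gpuowl defines it itself). Untuned lengths get no -use argument at all, so gpuowl's defaults apply.

```python
# Hypothetical launcher-side lookup; gpuowl itself is not known to support this.
# Keys are FFT lengths in K; values are -use option lists tuned for that length.
USE_BY_FFT_K = {
    # 5M (5120K): option set taken from the OpenCL args in the failed run above.
    5120: [
        "CARRY32", "CHEBYSHEV_MIDDLEMUL2", "MERGED_MIDDLE",
        "MORE_SQUARES_MIDDLEMUL1", "NEW_SLOWTRIG", "NO_ASM",
        "T2_SHUFFLE_HEIGHT", "T2_SHUFFLE_WIDTH", "UNROLL_HEIGHT",
        "UNROLL_WIDTH", "WORKINGIN1", "WORKINGOUT1",
    ],
}


def use_args(fft_k):
    """Return the extra command-line arguments for a given FFT length in K.

    An untuned length yields an empty list, i.e. no -use argument at all.
    """
    options = USE_BY_FFT_K.get(fft_k)
    if not options:
        return []
    return ["-use", ",".join(options)]


def payback_hours(lost_hours, speedup_fraction):
    """Rough hours of tuned running needed to win back lost_hours of work.

    Assumes each tuned hour gains speedup_fraction hours of baseline work.
    """
    return lost_hours / speedup_fraction


print(use_args(4608))            # untuned length: no -use argument
print(payback_hours(18, 0.03))   # 18 lost hours at a 3% speedup
print(payback_hours(18, 0.05))   # 18 lost hours at a 5% speedup
```

Under that rough model, losing 18 hours costs on the order of 360 to 600 hours of tuned running to recover at a 3-5% speedup.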
Gpuowl -use options tune on RX480 for 4.5M fft length
Likes a somewhat different combination than for 5M.[QUOTE]gpuowl v6.11-134-g1e0ce1d
RX480 8GB Win7 x64
exponent 82053239 PRP 4.5M fft
-iters 10000 -time
all timings below are in microsec/iteration

NO_ASM 3021
NO_ASM 3022
NO_ASM,UNROLL_ALL 3010 *
NO_ASM,UNROLL_NONE 3039
NO_ASM,UNROLL_WIDTH 3035
NO_ASM,UNROLL_HEIGHT 3038
NO_ASM,UNROLL_MIDDLEMUL1 3036
NO_ASM,UNROLL_MIDDLEMUL2 3027
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL1 3028
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL2 3019, 3028
NO_ASM,NO_ASM,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1 2989 *
NO_ASM,UNROLL_WIDTH,UNROLL_MIDDLEMUL1,UNROLL_MIDDLEMUL2 2996

NO_ASM,MERGED_MIDDLE,WORKINGIN 5309
NO_ASM,MERGED_MIDDLE,WORKINGIN 5306
NO_ASM,MERGED_MIDDLE,WORKINGIN1 3032
NO_ASM,MERGED_MIDDLE,WORKINGIN1A 3052
NO_ASM,MERGED_MIDDLE,WORKINGIN2 3111
NO_ASM,MERGED_MIDDLE,WORKINGIN3 3133
NO_ASM,MERGED_MIDDLE,WORKINGIN4 3454
NO_ASM,MERGED_MIDDLE,WORKINGIN5 2995 *

NO_ASM,MERGED_MIDDLE,WORKINGOUT 5224
NO_ASM,MERGED_MIDDLE,WORKINGOUT0 4036
NO_ASM,MERGED_MIDDLE,WORKINGOUT1 2984 *
NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 3012/2982
NO_ASM,MERGED_MIDDLE,WORKINGOUT2 3353
NO_ASM,MERGED_MIDDLE,WORKINGOUT3 2986
NO_ASM,MERGED_MIDDLE,WORKINGOUT4 3137
NO_ASM,MERGED_MIDDLE,WORKINGOUT5 2995

NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout% 2973
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_WIDTH 2957 *
NO_ASM,MERGED_MIDDLE,%wgkin%,%wkgout%,T2_SHUFFLE_MIDDLE 3026
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_HEIGHT 2966
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE_REVERSELINE 2972
NO_ASM,MERGED_MIDDLE,%wkgin%,%wkgout%,T2_SHUFFLE 2992

set allotheroptions=NO_ASM,WORKINGIN5,WORKINGOUT1,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1
%allotheroptions%,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT 2938 *
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH 2989
%allotheroptions%,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE 2987

set allotheroptions=NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,UNROLL_WIDTH,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT
%allotheroptions%,CARRY32 2940 *
%allotheroptions%,CARRY64 3054

set allotheroptions=NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1,CARRY32
%allotheroptions%,FANCY_MIDDLEMUL1 "error: Clang front-end compilation failed!"
%allotheroptions%,MORE_SQUARES_MIDDLEMUL1 2985
%allotheroptions%,CHEBYSHEV_METHOD 2919
%allotheroptions%,CHEBYSHEV_METHOD_FMA 2911 *
%allotheroptions%,ORIGINAL_METHOD 2942
%allotheroptions%,ORIGINAL_TWEAKED 2937

set allotheroptions=NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1,CARRY32,CHEBYSHEV_METHOD_FMA
%allotheroptions%,ORIG_MIDDLEMUL2 2926
%allotheroptions%,CHEBYSHEV_MIDDLEMUL2 2916 *
%allotheroptions%,ORIG_SLOWTRIG 3058
%allotheroptions%,NEW_SLOWTRIG 2910
%allotheroptions%,MORE_ACCURATE 2921
%allotheroptions%,LESS_ACCURATE 2909 *

NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1,CARRY32,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,LESS_ACCURATE
base 3021.5
repeatability +-1.5/5307.5 =~ +-0.028%
best 2909
ratio 3021.5/2909 = 1.039[/QUOTE] |
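The summary figures at the end of the table can be reproduced with a few lines of arithmetic. This is a minimal sketch, not part of the original batch file: the function names are made up for illustration, and the timings are copied from the MIDDLEMUL1 group and the repeated-run pairs in the table above.

```python
# Timings in microseconds per iteration, copied from the table above.
middlemul1_group = {
    "MORE_SQUARES_MIDDLEMUL1": 2985,
    "CHEBYSHEV_METHOD": 2919,
    "CHEBYSHEV_METHOD_FMA": 2911,
    "ORIGINAL_METHOD": 2942,
    "ORIGINAL_TWEAKED": 2937,
}


def fastest(timings):
    """Return the option name with the lowest time per iteration."""
    return min(timings, key=timings.get)


def repeatability(run_a, run_b):
    """Half the spread of two repeated runs, as a fraction of their mean."""
    mean = (run_a + run_b) / 2
    return abs(run_a - run_b) / 2 / mean


baseline = (3021 + 3022) / 2   # the two plain NO_ASM runs
best = 2909                    # final combination ending in LESS_ACCURATE

print(fastest(middlemul1_group))
print(round(baseline / best, 3))                    # overall speedup ratio
print(round(100 * repeatability(5309, 5306), 3))    # percent, WORKINGIN pair
```

Since the run-to-run spread is around 0.03%, differences of only a few microseconds between options are near the noise floor and may not be meaningful.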
GTX1660 -use options
Hi!
Can someone help me? I have an Nvidia GTX1660 running gpuowl at around 8250 us/it (FFT 5632K). With some overclocking I can get less than 8000 us/it, but I'm not sure how best to test GPUs for errors or how to tune with -use options. Can someone help me out?
More questions:
1. I'm considering buying 2x Radeon VII, or should I wait for Big Navi?
2. Does anyone have AMD 5700 XT benchmarks to compare with Radeon VII?
3. CUDALucas seems to run slower than gpuowl. Are there any other options?
Thanks! |
[QUOTE=JCoveiro;536407]
1. I'm considering buying 2x Radeon VII, or should I wait for Big Navi?[/QUOTE]
My expectation is that Radeon VII will still be better than "big navi" because it has such good DP (FP64) throughput. Also, the memory is both large and fast. In addition, the prices for Radeon VII have moved down a bit. |
[QUOTE=JCoveiro;536407]Hi!
Can someone help me? I have an Nvidia GTX1660 running gpuowl at around 8250 us/it (FFT 5632K). With some overclocking I can get less than 8000 us/it, but I'm not sure how best to test GPUs for errors or how to tune with -use options. Can someone help me out?
More questions:
1. I'm considering buying 2x Radeon VII, or should I wait for Big Navi?
2. Does anyone have AMD 5700 XT benchmarks to compare with Radeon VII?
3. CUDALucas seems to run slower than gpuowl. Are there any other options?
Thanks![/QUOTE]
You have an Nvidia Turing GPU, which is amazing for trial factoring, and the 1660 is particularly efficient in that workload thanks to its 1080 Ti-like performance at significantly lower power draw. In that case, overclocking the core will help trial factoring, but overclocking the memory won't change anything except waste more power. Go ahead and try that out if you want to factor some numbers.
A1: Definitely buy 2 Radeon VII over Big Navi. I seriously doubt AMD will put strong FP64 performance on Big Navi, since the norm right now for gaming GPUs is to cut down FP64 as much as possible to save die space for ray tracing or shaders.
A2: I think OpenCL is still broken on Navi GPUs and runs much more stably on GCN GPUs. Even if it's not broken, I assume the 5700 XT would perform slightly better than a stock Vega 56 in PRP, so around 3000 us/it for a 5632K FFT. But a Radeon VII should get close to 1000 us/it (I don't own one personally, but that is what I remember from other owners' benchmarks).
A3: gpuowl is already the fastest option for primality tests. Maybe future optimizations will make it even faster, but for now it is way faster than CUDALucas on memory-bound GPUs such as the Titan V or Radeon VII (the latter doesn't run CUDALucas at all; on the Titan V gpuowl is 2x faster).
Regardless of whether you own a modern Nvidia GPU (supporting OpenCL 2.0 and above) or an AMD GPU, you should always run gpuowl over CUDALucas or CLLucas because of its superior error-checking algorithm, which could potentially eliminate the need for double checking. |
[QUOTE=preda;536412]My expectation is that Radeon VII will still be better than "big navi" because it has such good DP (FP64) throughput. Also, the memory is both large and fast. In addition, the prices for Radeon VII have moved down a bit.[/QUOTE]
The prices of the Radeon VII are still high here, around 800€ each. But I think they're a good investment anyway (for this kind of project). I hope they come down a bit more, since AMD discontinued them. |
[QUOTE=xx005fs;536419]You have an Nvidia Turing GPU, which is amazing for trial factoring, and the 1660 is particularly efficient in that workload thanks to its 1080 Ti-like performance at significantly lower power draw. In that case, overclocking the core will help trial factoring, but overclocking the memory won't change anything except waste more power. Go ahead and try that out if you want to factor some numbers.
A1: Definitely buy 2 Radeon VII over Big Navi. I seriously doubt AMD will put strong FP64 performance on Big Navi, since the norm right now for gaming GPUs is to cut down FP64 as much as possible to save die space for ray tracing or shaders.
A2: I think OpenCL is still broken on Navi GPUs and runs much more stably on GCN GPUs. Even if it's not broken, I assume the 5700 XT would perform slightly better than a stock Vega 56 in PRP, so around 3000 us/it for a 5632K FFT. But a Radeon VII should get close to 1000 us/it (I don't own one personally, but that is what I remember from other owners' benchmarks).
A3: gpuowl is already the fastest option for primality tests. Maybe future optimizations will make it even faster, but for now it is way faster than CUDALucas on memory-bound GPUs such as the Titan V or Radeon VII (the latter doesn't run CUDALucas at all; on the Titan V gpuowl is 2x faster).
Regardless of whether you own a modern Nvidia GPU (supporting OpenCL 2.0 and above) or an AMD GPU, you should always run gpuowl over CUDALucas or CLLucas because of its superior error-checking algorithm, which could potentially eliminate the need for double checking.[/QUOTE]
Thanks for the answers!
Well... the AMD 5700 XT is a lot cheaper than the Radeon VII, almost half the price. Also, 3000 us/it for the 5700 XT is still good, but 1000 us/it for the Radeon VII is awesome! |
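To put those per-iteration figures in perspective, they can be converted into wall-clock time per test. This is a rough sketch under two stated assumptions: a PRP test of 2^p - 1 takes about p iterations (error-check overhead is ignored), and the exponent is 99753809, the one shown at 5632K FFT in the GTX 1660 log elsewhere in this thread. The function name `days_per_test` is made up for the example.

```python
def days_per_test(exponent, us_per_iteration):
    """Approximate days for one PRP test, assuming ~exponent iterations."""
    seconds = exponent * us_per_iteration * 1e-6
    return seconds / 86400


EXPONENT = 99753809  # assumed: the 5632K-FFT exponent from the GTX 1660 log

for label, us_per_it in [("GTX 1660", 8250),
                         ("5700 XT (estimate)", 3000),
                         ("Radeon VII (estimate)", 1000)]:
    print(f"{label}: {days_per_test(EXPONENT, us_per_it):.2f} days")
```

That works out to roughly nine and a half days per test on the GTX 1660, against a little over one day at 1000 us/it.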
1 Attachment(s)
[QUOTE=JCoveiro;536407]Hi!
Can someone help me? I have an Nvidia GTX1660 running gpuowl at around 8250 us/it (FFT 5632K). With some overclocking I can get less than 8000 us/it, but I'm not sure how best to test GPUs for errors or how to tune with -use options. Can someone help me out?
More questions:
1. I'm considering buying 2x Radeon VII, or should I wait for Big Navi?
2. Does anyone have AMD 5700 XT benchmarks to compare with Radeon VII?
3. CUDALucas seems to run slower than gpuowl. Are there any other options?
Thanks![/QUOTE]
A GTX1660 is so much better at TF, relatively speaking, that it's probably a waste to run it on gpuowl instead, even though gpuowl is excellent. But your kit, your choice. CUDALucas has not had any significant development in years, so it has naturally fallen behind. Preda, Prime95, and others have done a great job on gpuowl speed and other improvements.
"More questions" has been covered pretty well already by others.
For gpuowl -use option timing and tuning, I use the attached Windows batch file. Passes zero and one run together; the other passes run individually. Edit the gotos and sets from one pass to the next to change the control flow and the -use options in effect, respectively. That is what I did to produce my previous posts of tuning results. See the comments at both ends of the file for more info. (I had to zip it; the forum won't accept a .bat file.) Please post your tuning results. |
Bug!!!
[QUOTE=kriesel;536423]A GTX1660 is so much better at TF, relatively speaking, that it's probably a waste to run it on gpuowl instead, even though gpuowl is excellent. But your kit, your choice. CUDALucas has not had any significant development in years, so it has naturally fallen behind. Preda, Prime95, and others have done a great job on gpuowl speed and other improvements.
"More questions" has been covered pretty well already by others.
For gpuowl -use option timing and tuning, I use the attached Windows batch file. Passes zero and one run together; the other passes run individually. Edit the gotos and sets from one pass to the next to change the control flow and the -use options in effect, respectively. That is what I did to produce my previous posts of tuning results. See the comments at both ends of the file for more info. (I had to zip it; the forum won't accept a .bat file.) Please post your tuning results.[/QUOTE]
Thanks! But first, I just want to say that there is a bug in the program.
[B]I'm using gpuowl v6.11-134-g1e0ce1d.[/B]
#####################################
Running the batch file outputs the following errors:

[B]Error#1[/B]
Running the Windows batch file at:
2020-02-01 23:55:14 config: -time -iters 10000 -use NO_ASM,UNROLL_NONE
outputs some errors, followed by:
2020-02-01 23:55:14 GeForce GTX 1660-0 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build

[B]Error#2[/B]
Running the Windows batch file at:
2020-02-01 23:55:14 config: -time -iters 10000 -use NO_ASM,UNROLL_WIDTH
outputs some errors, followed by:
2020-02-01 23:55:15 GeForce GTX 1660-0 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build

[B]Error#3[/B]
Running the Windows batch file at:
2020-02-01 23:55:15 config: -time -iters 10000 -use NO_ASM,UNROLL_HEIGHT
outputs some errors, followed by:
2020-02-01 23:55:15 GeForce GTX 1660-0 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build

[B]Error#4[/B]
Running the Windows batch file at:
2020-02-01 23:55:15 config: -time -iters 10000 -use NO_ASM,UNROLL_MIDDLEMUL1
outputs some errors, followed by:
2020-02-01 23:55:16 GeForce GTX 1660-0 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build

[B]Error#5[/B]
Running the Windows batch file at:
2020-02-01 23:55:16 config: -time -iters 10000 -use NO_ASM,UNROLL_MIDDLEMUL2
outputs some errors, followed by:
2020-02-01 23:55:16 GeForce GTX 1660-0 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build
#####################################
[B]Here are some more details on Error#1:[/B]
[CODE]2020-02-01 23:55:14 config: -time -iters 10000 -use NO_ASM,UNROLL_NONE
2020-02-01 23:55:14 device 0, unique id ''
2020-02-01 23:55:14 GeForce GTX 1660-0 99753809 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.30 bits/word
2020-02-01 23:55:14 GeForce GTX 1660-0 OpenCL args "-DEXP=99753809u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xd.064531a6f6b48p-3 -DIWEIGHT_STEP=0x9.d3e00e7c301p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DNO_ASM=1 -DUNROLL_NONE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-01 23:55:14 GeForce GTX 1660-0 OpenCL compilation error -11 (args -DEXP=99753809u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0xd.064531a6f6b48p-3 -DIWEIGHT_STEP=0x9.d3e00e7c301p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DNO_ASM=1 -DUNROLL_NONE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DNO_ASM=1)
2020-02-01 23:55:14 GeForce GTX 1660-0 <kernel>:1386:3: error: expected identifier or '('
  for (i32 s = 4; s >= 0; s -= 2) {
  ^
<kernel>:1394:3: error: expected identifier or '('
  for (i32 s = 4; s >= 0; s -= 2) {
  ^
<kernel>:1404:3: error: expected identifier or '('
  for (i32 s = 3; s >= 0; s -= 3) {
  ^
<kernel>:1412:3: error: expected identifier or '('
  for (i32 s = 3; s >= 0; s -= 3) {
  ^
<kernel>:1422:3: error: expected identifier or '('
  for (i32 s = 6; s >= 0; s -= 2) {
  ^
<kernel>:1430:3: error: expected identifier or '('
  for (i32 s = 6; s >= 0; s -= 2) {
  ^
<kernel>:1440:3: error: expected identifier or '('
  for (i32 s = 6; s >= 0; s -= 3) {
  ^
<kernel>:1448:3: error: expected identifier or '('
  for (i32 s = 6; s >= 0; s -= 3) {
  ^
<kernel>:1458:3: error: expected identifier or '('
  for (i32 s = 5; s >= 2; s -= 3) {
  ^
<kernel>:1502:3: error: expected identifier or '('
  for (i32 s = 5; s >= 2; s -= 3) {
  ^
<kernel>:2478:3: error: expected identifier or '('
  for (i32 i = 0; i < MIDDLE; ++i) {
  ^
2020-02-01 23:55:14 GeForce GTX 1660-0 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build
2020-02-01 23:55:14 GeForce GTX 1660-0 Bye
[/CODE] |
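When a tuning script cycles through many -use combinations, it helps to flag runs like these automatically so a failed build is not mistaken for a timing. The sketch below is a hypothetical helper, not part of gpuowl or of the batch file: it only searches captured log text for strings that appear verbatim in the logs in this thread, and the absence of those strings does not prove a run succeeded.

```python
import re

# Marker strings copied from the logs in this thread.
FAILURE_MARKERS = ("BUILD_PROGRAM_FAILURE", "OpenCL compilation error")
LOAD_ERROR_MARKER = 'Exiting because "error on load"'


def classify_run(log_text):
    """Label a captured gpuowl log by the known failure strings it contains."""
    if any(marker in log_text for marker in FAILURE_MARKERS):
        return "build_failed"
    if LOAD_ERROR_MARKER in log_text:
        return "load_error"
    return "no_known_error"


def compiler_errors(log_text, limit=3):
    """Return up to `limit` (line, column, message) tuples from kernel errors."""
    pattern = r"<kernel>:(\d+):(\d+): error: ([^\n]*)"
    found = re.findall(pattern, log_text)[:limit]
    return [(int(line), int(col), msg.strip()) for line, col, msg in found]


sample = (
    "2020-02-01 23:55:14 GeForce GTX 1660-0 <kernel>:1386:3: error: "
    "expected identifier or '('\n"
    "  for (i32 s = 4; s >= 0; s -= 2) {\n"
    "  ^\n"
    "2020-02-01 23:55:14 GeForce GTX 1660-0 Exception gpu_error: "
    "BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build\n"
)
print(classify_run(sample))
print(compiler_errors(sample))
```

A tuning loop could then skip any combination labelled `build_failed` and record the first compiler message alongside it.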
[QUOTE=kriesel;536423]Please post your tuning results.[/QUOTE]
I ran this file on my Titan V to try out the most recent update, but I got consistently slower results (657 us/it vs 632 us/it) compared to version 6.11-113-g6ecd9a2, which I am currently running. It seems that the default Nvidia optimization settings don't play well with the Titan V. |
Bug#2
I have found another bug, while trying to test M47 (a lower exponent).
[CODE]2020-02-02 01:36:38 gpuowl v6.11-134-g1e0ce1d
2020-02-02 01:36:38 Note: not found 'config.txt'
2020-02-02 01:36:38 config: -use UNROLL_ALL,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE,CARRY64,FANCYMIDDLEMUL1,LESS_ACCURATE
2020-02-02 01:36:38 device 0, unique id ''
2020-02-02 01:36:38 GeForce GTX 1660-0 43112609 FFT 2304K: Width 8x8, Height 256x8, Middle 9; 18.27 bits/word
2020-02-02 01:36:39 GeForce GTX 1660-0 OpenCL args "-DEXP=43112609u -DWIDTH=64u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.3ca600d8f455p-3 -DIWEIGHT_STEP=0x9.ab80a96f8aeap-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DCARRY64=1 -DFANCYMIDDLEMUL1=1 -DLESS_ACCURATE=1 -DT2_SHUFFLE=1 -DUNROLL_ALL=1 -DWORKINGIN4=1 -DWORKINGOUT4=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-02 01:36:39 GeForce GTX 1660-0 OpenCL compilation error -11 (args -DEXP=43112609u -DWIDTH=64u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.3ca600d8f455p-3 -DIWEIGHT_STEP=0x9.ab80a96f8aeap-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DCARRY64=1 -DFANCYMIDDLEMUL1=1 -DLESS_ACCURATE=1 -DT2_SHUFFLE=1 -DUNROLL_ALL=1 -DWORKINGIN4=1 -DWORKINGOUT4=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DNO_ASM=1)
2020-02-02 01:36:39 GeForce GTX 1660-0 <kernel>:2009:2: error: WORKINGOUT4 not compatible with this FFT size
#error WORKINGOUT4 not compatible with this FFT size
 ^
2020-02-02 01:36:39 GeForce GTX 1660-0 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build
2020-02-02 01:36:39 GeForce GTX 1660-0 Bye[/CODE] |
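Note that the log above shows `-DFANCYMIDDLEMUL1=1` being passed straight through, although every other mention in this thread spells the option `FANCY_MIDDLEMUL1`. A misspelled name like that can be caught before launch by checking against a list of known names. The sketch below is an illustration only: `KNOWN_OPTIONS` contains just the names that appear in this thread, not an authoritative list from the gpuowl source, and `check_use_options` is a made-up helper. It does not catch the actual failure here, which is WORKINGOUT4 being incompatible with this FFT size.

```python
import difflib

# Option names seen in this thread only; NOT an authoritative gpuowl list.
KNOWN_OPTIONS = sorted({
    "NO_ASM", "MERGED_MIDDLE", "CARRY32", "CARRY64",
    "UNROLL_ALL", "UNROLL_NONE", "UNROLL_WIDTH", "UNROLL_HEIGHT",
    "UNROLL_MIDDLEMUL1", "UNROLL_MIDDLEMUL2",
    "WORKINGIN", "WORKINGIN1", "WORKINGIN1A", "WORKINGIN2", "WORKINGIN3",
    "WORKINGIN4", "WORKINGIN5",
    "WORKINGOUT", "WORKINGOUT0", "WORKINGOUT1", "WORKINGOUT1A", "WORKINGOUT2",
    "WORKINGOUT3", "WORKINGOUT4", "WORKINGOUT5",
    "T2_SHUFFLE", "T2_SHUFFLE_WIDTH", "T2_SHUFFLE_MIDDLE", "T2_SHUFFLE_HEIGHT",
    "T2_SHUFFLE_REVERSELINE",
    "FANCY_MIDDLEMUL1", "MORE_SQUARES_MIDDLEMUL1", "CHEBYSHEV_METHOD",
    "CHEBYSHEV_METHOD_FMA", "ORIGINAL_METHOD", "ORIGINAL_TWEAKED",
    "ORIG_MIDDLEMUL2", "CHEBYSHEV_MIDDLEMUL2", "ORIG_SLOWTRIG", "NEW_SLOWTRIG",
    "MORE_ACCURATE", "LESS_ACCURATE",
})


def check_use_options(use_string):
    """Map each unrecognized option name to a list of close known names."""
    unknown = {}
    for name in use_string.split(","):
        name = name.strip()
        if name and name not in KNOWN_OPTIONS:
            unknown[name] = difflib.get_close_matches(name, KNOWN_OPTIONS)
    return unknown


problems = check_use_options(
    "UNROLL_ALL,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE,CARRY64,"
    "FANCYMIDDLEMUL1,LESS_ACCURATE"
)
print(problems)
```

A launcher could refuse to start, or at least print a warning, whenever this returns a non-empty result.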