#1552
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
Quote:
I think you left "quick test" territory a while ago. Wow, that's thorough. I'm generally running single 10,000-iteration timings. Re corporate: condolences. Scheduled virus scans and backups may be dodged, but not all aspects.
Last fiddled with by kriesel on 2019-12-10 at 14:06
#1553
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
gpuowl-v6.11-79-g0c139c4
Win7 Pro x64, AMD RX550 4GB (fixed 1203 MHz gpu clock by design)
89796247 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word
config -device 1 -user kriesel -cpu condorella/rx550
Timings in us/sq:
15919 NO_ASM (warmup & user interaction)
15915 NO_ASM (baseline)
20500 NO_ASM,MERGED_MIDDLE,WORKINGIN
20498 NO_ASM,MERGED_MIDDLE,WORKINGIN (repeatability)
15585 NO_ASM,MERGED_MIDDLE,WORKINGIN1
15589 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
15751 NO_ASM,MERGED_MIDDLE,WORKINGIN2
15990 NO_ASM,MERGED_MIDDLE,WORKINGIN3
18175 NO_ASM,MERGED_MIDDLE,WORKINGIN4
15568 NO_ASM,MERGED_MIDDLE,WORKINGIN5
16065 NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT4
33707 NO_ASM,MERGED_MIDDLE,WORKINGOUT
19353 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
16301 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
16284 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
15945 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
16002 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
16484 NO_ASM,MERGED_MIDDLE,WORKINGOUT4
17037 NO_ASM,MERGED_MIDDLE,WORKINGOUT5
15869 NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT1
15917 NO_ASM
15373 NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2
repeatability +-1/20499 = +-0.005%
best 15373 base 15915 ratio 1.0353
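For anyone wanting to redo the arithmetic on their own card, here is a minimal sketch of how the "ratio" and "bits/word" figures above can be recomputed. The dictionary holds only a few of the timings copied from the table; the names `timings` and `bits_per_word` are made up for this illustration and are not part of gpuowl.

```python
# Timings in microseconds per squaring (us/sq), copied from the table above.
# Only a subset is listed; add the remaining rows the same way.
timings = {
    "NO_ASM": 15915,  # baseline
    "NO_ASM,MERGED_MIDDLE,WORKINGIN5": 15568,
    "NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2": 15373,
}

baseline = timings["NO_ASM"]
best_flags = min(timings, key=timings.get)  # configuration with the lowest time
best = timings[best_flags]
speedup = baseline / best


def bits_per_word(exponent, fft_k):
    """Average bits per FFT word; fft_k is the FFT size in units of 1024 words."""
    return exponent / (fft_k * 1024)


print(best_flags, best, round(speedup, 4))
print(round(bits_per_word(89796247, 5120), 2))
```

The speedup here is simply baseline time divided by best time, so anything above 1.0 means the tuned options are faster than plain NO_ASM.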
#1554
"Mr. Meeseeks"
Jan 2012
California, USA
100001111000₂ Posts
Latest git commit is slightly slower on a P100 (754 vs 751 compared to 0c139c4, 836 vs 821 for P1).
By the way... how is P1 currently for gpuowl?
#1555
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts
Quote:
It is interesting that the 2080 and P100 show little difference among the choices. On the Radeon VII, there can be a 100+ us difference (15+%).
#1556
P90 years forever!
Aug 2002
Yeehaw, FL
16567₈ Posts
Try -use T2_SHUFFLE. AFAICT that is the most likely culprit for any slowdown from the last commit. The other possibility is a denser packing of a bit array. It does not seem likely that reducing the amount of memory read would increase iteration times.
#1557
"Mr. Meeseeks"
Jan 2012
California, USA
878₁₆ Posts
Quote:
We may need a place where we can look up and submit the best settings for the various GPUs running gpuowl...
#1558
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts
Interesting. There are several other places in the code that could shuffle T values (a double) rather than T2 values (2 doubles - a complex number). It would double the amount of local storage required, which could negatively impact occupancy....
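The occupancy concern can be made concrete with some rough arithmetic. This is only a sketch: the 64 KiB of local memory per compute unit and the 1024 values per workgroup are assumptions chosen for illustration, not numbers taken from gpuowl, and real occupancy also depends on register use and wavefront limits.

```python
DOUBLE_BYTES = 8
LDS_BYTES_PER_CU = 64 * 1024   # assumption: 64 KiB of local memory per compute unit
VALUES_PER_WORKGROUP = 1024    # hypothetical number of values one workgroup shuffles


def workgroups_by_local_memory(doubles_per_value):
    """Workgroups that fit on one compute unit if local memory is the only limit."""
    bytes_per_workgroup = VALUES_PER_WORKGROUP * doubles_per_value * DOUBLE_BYTES
    return LDS_BYTES_PER_CU // bytes_per_workgroup


print(workgroups_by_local_memory(1))  # buffer of T values (one double each)
print(workgroups_by_local_memory(2))  # buffer of T2 values (two doubles each)
```

Under these assumptions, doubling the local buffer halves the number of workgroups that can be resident at once, which is the kind of trade-off that can outweigh the saving from a single shuffle pass.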
#1559
"Eric"
Jan 2018
USA
11010100₂ Posts
Got the following error with the newest commit, despite having OpenCL 2.0 on my Vega. Works fine with Nvidia driver though.
Code:
2019-12-10 14:39:00 gpuowl v6.11-82-gdb9ce44-dirty
2019-12-10 14:39:00 Note: no config.txt file found
2019-12-10 14:39:00 config: -device 0 -carry short -nospin -use MERGED_MIDDLE,ORIG_X2,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE -block 500
2019-12-10 14:39:00 94204153 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.97 bits/word
2019-12-10 14:39:01 OpenCL args "-DEXP=94204153u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x8.2de8f968e724p-3 -DIWEIGHT_STEP=0xf.a6316e77270fp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DMERGED_MIDDLE=1 -DORIG_X2=1 -DWORKINGIN5=1 -DWORKINGOUT2=1 -DT2_SHUFFLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-10 14:39:01 OpenCL compilation error -11 (args -DEXP=94204153u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x8.2de8f968e724p-3 -DIWEIGHT_STEP=0xf.a6316e77270fp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DMERGED_MIDDLE=1 -DORIG_X2=1 -DWORKINGIN5=1 -DWORKINGOUT2=1 -DT2_SHUFFLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0)
2019-12-10 14:39:01 C:\Users\Admin\AppData\Local\Temp\\OCL6712T0.cl:13:9: warning: GpuOwl requires OpenCL 200, found 200
#pragma message "GpuOwl requires OpenCL 200, found " STR(__OPENCL_VERSION__)
^
C:\Users\Admin\AppData\Local\Temp\\OCL6712T0.cl:14:2: error: OpenCL >= 2.0 required
#error OpenCL >= 2.0 required
^
1 warning and 1 error generated.
error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR
2019-12-10 14:39:01 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:234 build
2019-12-10 14:39:01 Bye
Last fiddled with by xx005fs on 2019-12-10 at 22:45
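Comparing the "config" line with the "OpenCL args" line in the log, each name given to -use shows up as a -DNAME=1 define passed to the OpenCL compiler. A small sketch of that mapping, inferred only from the log above (the function name `use_flags_to_defines` is made up here and is not gpuowl's code):

```python
def use_flags_to_defines(use_arg):
    """Turn a comma-separated -use value into -DNAME=1 compiler arguments."""
    names = [name.strip() for name in use_arg.split(",") if name.strip()]
    return [f"-D{name}=1" for name in names]


defines = use_flags_to_defines("MERGED_MIDDLE,ORIG_X2,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE")
print(" ".join(defines))
```

This makes it easier to check that a given -use option actually reached the compiler when reading a build log.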
#1560
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts
Quote:
I had just hacked in the new shuffle. Now I'll go back and code it up proper (with -use switches) so we can turn the feature on and off as needed on different GPUs. Thanks for prompting me to try this!
#1561
Aug 2006
3·1,993 Posts
Quote:
There is such a wealth of knowledge on these boards, I find myself constantly in awe.
#1562
"Mihai Preda"
Apr 2015
3·457 Posts
The OpenCL version check should be fixed now (recent commit).
Quote: