#1519
"Eric"
Jan 2018
USA
2^2×53 Posts
Quote:

#1520
"Mr. Meeseeks"
Jan 2012
California, USA
2^3·271 Posts
Tried on a P100 in Colab with a 4608K FFT (PRP)... I'm getting 766 us/it with the switch, compared to 1064 us/it without it (!!).
Last fiddled with by kracker on 2019-12-09 at 00:34 Reason: can't read
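For anyone who wants the two timings above as a ratio, here is a minimal Python sketch. The variable names are mine, and the only inputs are the two us/it figures quoted in the post.

```python
# Figures quoted in the post above (P100 on Colab, 4608K FFT, PRP).
us_per_it_without = 1064  # us/it without the switch
us_per_it_with = 766      # us/it with the switch

# Speedup is the ratio of old time to new time per iteration.
speedup = us_per_it_without / us_per_it_with
# Fraction of the per-iteration time that was saved.
time_saved = 1 - us_per_it_with / us_per_it_without

print(f"speedup: {speedup:.3f}x")
print(f"time per iteration reduced by {time_saved:.1%}")
```

That works out to roughly 1.39x, or about 28% less time per iteration.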

#1521
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1010100111101_2 Posts

#1522
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
Quote:
Have fun!
Last fiddled with by kriesel on 2019-12-09 at 00:49

#1523
"Mr. Meeseeks"
Jan 2012
California, USA
2^3×271 Posts
Quote:
EDIT: Looks like it has semi-stabilized... ~180 W without, ~190 W with.
Last fiddled with by kracker on 2019-12-09 at 01:17

#1524
"Eric"
Jan 2018
USA
D4_16 Posts
Quote:
More updates: Preda's updated source code works on Windows now, and I'm seeing almost exactly a 33% speedup on my Titan V, and much less on a regular Vega. One thing I found very strange: I don't know whether the progress graphic that changes from . to o to 0 and then to * is intentional, but it seems to slow down my Colab console and leaves a symbol in front of every line in the log. Is there an option to disable it?
Last fiddled with by xx005fs on 2019-12-09 at 02:04 Reason: update

#1525
"Mr. Meeseeks"
Jan 2012
California, USA
2168_10 Posts

#1526
Jan 2019
Tallahassee, FL
35 Posts
I tried this on one of my Radeon VII cards, the one that has not given me any errors in the last 4-5 PRP tests (while the other returned too many, lol). This card sits in the second slot with no display attached to it.
Code:
2019-12-08 23:47:19 config.txt: -user dcheuk/gpu01 -use ORIG_X2 -device 1 -log 100000 -use MERGED_MIDDLE
2019-12-08 23:47:19 config.txt:
2019-12-08 23:47:19 gfx906-0 94607437 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.04 bits/word
2019-12-08 23:47:20 gfx906-0 OpenCL args "-DEXP=94607437u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.8262bb7326f28p-3 -DIWEIGHT_STEP=0x8.40cb53a4a1fd8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DORIG_X2=1 -DMERGED_MIDDLE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-12-08 23:47:21 gfx906-0 OpenCL compilation in 1.31 s
2019-12-08 23:47:22 gfx906-0 94607437 OK 2071500 loaded: blockSize 500, 132c5e1692604fd6
2019-12-08 23:47:23 gfx906-0 94607437 OK 2072500 2.19%; 891 us/it (min 885 885); ETA 0d 22:54; 8d4ac7f8617372d8 (check 0.53s)
2019-12-08 23:47:48 gfx906-0 94607437 OK 2100000 2.22%; 887 us/it (min 884 884); ETA 0d 22:48; f8d6a63b03cfa32a (check 0.53s)
2019-12-08 23:49:17 gfx906-0 94607437 OK 2200000 2.33%; 887 us/it (min 884 884); ETA 0d 22:47; 42044b1ea9fb8b01 (check 0.53s)
2019-12-08 23:50:47 gfx906-0 94607437 OK 2300000 2.43%; 887 us/it (min 884 884); ETA 0d 22:45; fcd02bb8420d5ba7 (check 0.53s)
2019-12-08 23:52:17 gfx906-0 94607437 OK 2400000 2.54%; 887 us/it (min 884 884); ETA 0d 22:43; d784ed68cfa19bd7 (check 0.53s)
2019-12-08 23:53:46 gfx906-0 94607437 OK 2500000 2.64%; 887 us/it (min 884 884); ETA 0d 22:42; 79d614fc892e7a5a (check 0.53s)
Last fiddled with by dcheuk on 2019-12-09 at 05:58
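As a sanity check on the log above, the ETA on a progress line can be reproduced from the us/it figure. This is only a rough sketch: it assumes the test runs for about as many iterations as the exponent, and the regular expression is my guess at the line layout from this one log, not a documented gpuowl format.

```python
import re

# One progress line copied from the log above.
line = ("2019-12-08 23:53:46 gfx906-0 94607437 OK 2500000 2.64%; "
        "887 us/it (min 884 884); ETA 0d 22:42; 79d614fc892e7a5a (check 0.53s)")

# Assumed layout: "<exponent> OK <iteration> <pct>%; <n> us/it".
m = re.search(r"(\d+) OK (\d+)\s+[\d.]+%; (\d+) us/it", line)
exponent, iteration, us_per_it = (int(g) for g in m.groups())

# Assumption: roughly `exponent` iterations in total.
remaining_s = (exponent - iteration) * us_per_it / 1e6

# Format as "<days>d hh:mm", rounded to the nearest minute.
days, rem = divmod(round(remaining_s / 60), 24 * 60)
hours, minutes = divmod(rem, 60)
eta = f"{days}d {hours:02d}:{minutes:02d}"
print(eta)
```

Under those assumptions the estimate agrees with the ETA printed on that log line.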

#1527
P90 years forever!
Aug 2002
Yeehaw, FL
7543_10 Posts
Preliminary results from Ken suggested WORKINGOUT4 is better than WORKINGOUT. Of course, that was from a huge sample size of 1 nVidia card.

#1528
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
Quote:
obtained with -time -iters 10000
Code:
ms/it -use options
5124 no_asm
5120 no_asm
4868 no_asm,merged_middle,workingin
4873 no_asm,merged_middle,workingin
4873 no_asm,merged_middle,workingin1
4951 no_asm,merged_middle,workingin1a
4876 no_asm,merged_middle,workingin2
4874 no_asm,merged_middle,workingin3
4865 no_asm,merged_middle,workingin5
4878 no_asm,merged_middle,workingout
4911 no_asm,merged_middle,workingout0
4872 no_asm,merged_middle,workingout1
4950 no_asm,merged_middle,workingout1a
4881 no_asm,merged_middle,workingout2
4875 no_asm,merged_middle,workingout3
4836 no_asm,merged_middle,workingout4
4876 no_asm,merged_middle,workingout5
5122/4836= 1.059

obtained with this batch file derived from a list of cases George requested:
Code:
:iter count is required to be multiple of 10000
set iters=10000
:first one was there just to ensure the gpu is warmed up and clock-stable somewhat, ignore its timing, use the second, but maybe the first 800 iters block does that
gpuowl-win -time -iters %iters% -use NO_ASM
gpuowl-win -time -iters %iters% -use NO_ASM
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN
:repeated, let's see reproducibility once; then onward through the list
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGIN5
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT0
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT2
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT3
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT4
gpuowl-win -time -iters %iters% -use NO_ASM,MERGED_MIDDLE,WORKINGOUT5
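The 5122/4836 figure in the table above can be reproduced with a few lines of Python. This is a sketch, not part of the original test procedure: the `results` list is typed in by hand from the table, and taking the mean of the two no_asm runs as the baseline is my reading of where 5122 comes from.

```python
from statistics import mean

# (time per iteration as listed in the table, -use options), copied by hand.
results = [
    (5124, "no_asm"),
    (5120, "no_asm"),
    (4868, "no_asm,merged_middle,workingin"),
    (4873, "no_asm,merged_middle,workingin"),
    (4873, "no_asm,merged_middle,workingin1"),
    (4951, "no_asm,merged_middle,workingin1a"),
    (4876, "no_asm,merged_middle,workingin2"),
    (4874, "no_asm,merged_middle,workingin3"),
    (4865, "no_asm,merged_middle,workingin5"),
    (4878, "no_asm,merged_middle,workingout"),
    (4911, "no_asm,merged_middle,workingout0"),
    (4872, "no_asm,merged_middle,workingout1"),
    (4950, "no_asm,merged_middle,workingout1a"),
    (4881, "no_asm,merged_middle,workingout2"),
    (4875, "no_asm,merged_middle,workingout3"),
    (4836, "no_asm,merged_middle,workingout4"),
    (4876, "no_asm,merged_middle,workingout5"),
]

# Baseline: average of the plain no_asm runs (assumed source of the 5122).
baseline = mean(t for t, opts in results if opts == "no_asm")

# Fastest case: tuples compare by time first, so min() picks the lowest time.
best_time, best_opts = min(results)
speedup = baseline / best_time

print(f"baseline {baseline}, best {best_time} ({best_opts}), speedup {speedup:.3f}")
```

On these numbers the fastest case is workingout4, and the ratio matches the 1.059 quoted in the post.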

#1529
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
gtx1080 again
usec/iter; -use case
5055 no_asm
5104 no_asm
4848 NO_ASM,MERGED_MIDDLE,WORKINGIN
4863 NO_ASM,MERGED_MIDDLE,WORKINGIN
4851 NO_ASM,MERGED_MIDDLE,WORKINGIN4
4859 NO_ASM,MERGED_MIDDLE,WORKINGIN5
4873 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT5

retest with minimal user interaction:
5058 no_asm
5091 no_asm
4837 NO_ASM,MERGED_MIDDLE,WORKINGIN
4836 NO_ASM,MERGED_MIDDLE,WORKINGIN
4836 NO_ASM,MERGED_MIDDLE,WORKINGIN4
4833 NO_ASM,MERGED_MIDDLE,WORKINGIN5
4835 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT5
5091/4833 =~ 1.053
Last fiddled with by kriesel on 2019-12-09 at 10:17
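One way to read the retest numbers above is to compare the gain over no_asm with the spread among the MERGED_MIDDLE variants. The Python sketch below does that arithmetic. The `retest` list is typed in by hand from the post, and using the second no_asm run as the baseline follows the post's own 5091/4833 ratio.

```python
# Retest timings copied by hand from the post above: (usec/iter, -use case).
retest = [
    (5058, "no_asm"),
    (5091, "no_asm"),
    (4837, "NO_ASM,MERGED_MIDDLE,WORKINGIN"),
    (4836, "NO_ASM,MERGED_MIDDLE,WORKINGIN"),
    (4836, "NO_ASM,MERGED_MIDDLE,WORKINGIN4"),
    (4833, "NO_ASM,MERGED_MIDDLE,WORKINGIN5"),
    (4835, "NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT5"),
]

# Baseline: the second no_asm run, as used in the post's ratio.
baseline = [t for t, case in retest if case == "no_asm"][-1]
merged = [t for t, case in retest if "MERGED_MIDDLE" in case]

# Gain of the fastest MERGED_MIDDLE case over the baseline.
ratio = baseline / min(merged)
# Relative spread between the fastest and slowest MERGED_MIDDLE cases.
variant_spread = (max(merged) - min(merged)) / min(merged)

print(f"ratio vs no_asm: {ratio:.3f}")
print(f"spread among MERGED_MIDDLE cases: {variant_spread:.2%}")
```

On this single retest the MERGED_MIDDLE cases sit within about 0.1% of each other, while the gain over no_asm is about 5%, so ranking the WORKINGIN variants from these runs alone looks risky.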