mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2020-03-17 22:36

gpuowl-win v6.11-198-g build and initial speed checks
 
2 Attachment(s)
The usual warning shower, see build log, but it runs.
Win7 64 Pro, dual E5645, prime95 maxed, 12GB ram
RX550: V6.11-134[CODE]2020-03-17 13:25:39 condorella/rx550 93873049 OK 60600000 64.56%; 14442 us/it; ETA 5d 13:29; 7c9ef8f79b678f5e (check 5.82s)
2020-03-17 14:13:57 condorella/rx550 93873049 OK 60800000 64.77%; 14458 us/it; ETA 5d 12:50; cf3a3470216c1801 (check 5.86s)
2020-03-17 15:02:12 condorella/rx550 93873049 OK 61000000 64.98%; 14448 us/it; ETA 5d 11:56; 975f4c7dd6bd8513 (check 5.81s)
2020-03-17 15:50:28 condorella/rx550 93873049 OK 61200000 65.19%; 14452 us/it; ETA 5d 11:10; f2e39e36a1a45ad8 (check 5.83s)
2020-03-17 15:53:43 condorella/rx550 Stopping, please wait..
2020-03-17 15:53:55 condorella/rx550 93873049 OK 61214000 65.21%; 14379 us/it; ETA 5d 10:27; cb89cd1c515d4a12 (check 5.82s)
2020-03-17 15:53:55 condorella/rx550 Exiting because "stop requested"
2020-03-17 15:53:55 condorella/rx550 Bye[/CODE][CODE]C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-198-g628f3cd\rx550>gpuowl-win
2020-03-17 15:54:11 gpuowl v6.11-198-g628f3cd
2020-03-17 15:54:11 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM,UNROLL_HEIGHT,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32,ORIGINAL_METHOD,LESS_ACCURATE
2020-03-17 15:54:11 device 1, unique id ''
2020-03-17 15:54:11 condorella/rx550 93873049 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.90 bits/word
2020-03-17 15:54:13 condorella/rx550 Warning: -use LESS_ACCURATE has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use MERGED_MIDDLE has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use ORIGINAL_METHOD has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use T2_SHUFFLE_HEIGHT has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use T2_SHUFFLE_MIDDLE has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use WORKINGOUT2 has no effect
2020-03-17 15:54:13 condorella/rx550 OpenCL args "-DEXP=93873049u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x8.8b9afd7da35e8p-3 -DIWEIGHT_STEP=0xe.fa9b7f6844848p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY32=1 -DLESS_ACCURATE=1 -DMERGED_MIDDLE=1 -DNO_ASM=1 -DORIGINAL_METHOD=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DUNROLL_HEIGHT=1 -DWORKINGIN5=1 -DWORKINGOUT2=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-03-17 15:54:17 condorella/rx550 OpenCL compilation in 4.21 s
2020-03-17 15:54:24 condorella/rx550 93873049 OK 61200000 loaded: blockSize 400, f2e39e36a1a45ad8
2020-03-17 15:54:42 condorella/rx550 93873049 OK 61200800 65.20%; 15423 us/it; ETA 5d 19:58; 978b866258bcc6ff (check 6.02s)
2020-03-17 16:44:59 condorella/rx550 93873049 OK 61400000 65.41%; 15117 us/it; ETA 5d 16:22; bba7f7db066343fb (check 6.13s)[/CODE]Same exponent, v6.11-198 is slower than v6.11-134 on RX550: 1-14453/15117 = 4.4% slower.
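The percentage above is computed as 1 - old/new, i.e. the loss in iteration rate. A minimal Python sketch of that arithmetic, using the us/it averages quoted in this post (the function names are made up for illustration). It also shows the other common convention, extra time per iteration, which comes out slightly larger (about 4.6% versus 4.4% for the RX550):

```python
def throughput_drop(old_us_per_it, new_us_per_it):
    """Fractional loss in iterations per second: 1 - old/new."""
    return 1.0 - old_us_per_it / new_us_per_it


def time_increase(old_us_per_it, new_us_per_it):
    """Fractional increase in time per iteration: new/old - 1."""
    return new_us_per_it / old_us_per_it - 1.0


# RX550 averages quoted above: v6.11-134 vs v6.11-198
old, new = 14453, 15117
print(f"throughput drop: {throughput_drop(old, new):.2%}")
print(f"time increase:   {time_increase(old, new):.2%}")
```

Either convention is fine as long as the same one is used when comparing cards.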

RX480: V6.11-134[CODE]2020-03-17 14:49:29 condorella/rx480 94073297 OK 36600000 38.91%; 3580 us/it; ETA 2d 09:10; 8fe30d16593f9dd3 (check 1.51s)
2020-03-17 15:01:47 condorella/rx480 94073297 OK 36800000 39.12%; 3685 us/it; ETA 2d 10:37; d3eed3f44f9e2a97 (check 1.47s)
2020-03-17 15:14:07 condorella/rx480 94073297 OK 37000000 39.33%; 3691 us/it; ETA 2d 10:31; 416999f765247350 (check 1.50s)
2020-03-17 15:26:15 condorella/rx480 94073297 OK 37200000 39.54%; 3631 us/it; ETA 2d 09:22; c8d58ee219203f21 (check 1.53s)
2020-03-17 15:38:16 condorella/rx480 94073297 OK 37400000 39.76%; 3600 us/it; ETA 2d 08:41; 94e4dcc24e50b2bb (check 1.50s)
2020-03-17 15:50:12 condorella/rx480 94073297 OK 37600000 39.97%; 3574 us/it; ETA 2d 08:04; 062e7035793975f0 (check 1.56s)
2020-03-17 16:02:07 condorella/rx480 94073297 OK 37800000 40.18%; 3570 us/it; ETA 2d 07:48; 30335ecba13bf8e9 (check 1.47s)
2020-03-17 16:14:16 condorella/rx480 94073297 OK 38000000 40.39%; 3637 us/it; ETA 2d 08:39; 650cb74a2a044221 (check 1.48s)
2020-03-17 16:15:34 condorella/rx480 Stopping, please wait..
2020-03-17 16:15:37 condorella/rx480 94073297 OK 38022400 40.42%; 3536 us/it; ETA 2d 07:03; 5f6a5d73eb842b21 (check 1.47s)
2020-03-17 16:15:37 condorella/rx480 Exiting because "stop requested"
2020-03-17 16:15:37 condorella/rx480 Bye[/CODE]V6.11-198:[CODE]2020-03-17 16:17:37 gpuowl v6.11-198-g628f3cd
2020-03-17 16:17:37 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
2020-03-17 16:17:37 config:
2020-03-17 16:17:37 config: :4.5m fft NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT1,T2_SHUFFLE_WIDTH,T2_SHUFFLE_HEIGHT,UNROLL_MIDDLEMUL2,UNROLL_MIDDLEMUL1,CARRY32,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,LESS_ACCURATE
2020-03-17 16:17:37 config: :5m fft NO_ASM,UNROLL_HEIGHT,UNROLL_WIDTH,MERGED_MIDDLE,WORKINGIN1,WORKINGOUT1,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH,CARRY32,MORE_SQUARES_MIDDLEMUL1,CHEBYSHEV_MIDDLEMUL2,NEW_SLOWTRIG
2020-03-17 16:17:37 device 0, unique id ''
2020-03-17 16:17:37 condorella/rx480 94073297 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.94 bits/word
2020-03-17 16:17:40 condorella/rx480 Warning: -use CHEBYSHEV_MIDDLEMUL2 has no effect
2020-03-17 16:17:40 condorella/rx480 Warning: -use MERGED_MIDDLE has no effect
2020-03-17 16:17:40 condorella/rx480 Warning: -use MORE_SQUARES_MIDDLEMUL1 has no effect
2020-03-17 16:17:40 condorella/rx480 Warning: -use T2_SHUFFLE_HEIGHT has no effect
2020-03-17 16:17:40 condorella/rx480 Warning: -use T2_SHUFFLE_WIDTH has no effect
2020-03-17 16:17:40 condorella/rx480 Warning: -use WORKINGOUT1 has no effect
2020-03-17 16:17:40 condorella/rx480 OpenCL args "-DEXP=94073297u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0x8.52733b011536p-3 -DIWEIGHT_STEP=0xf.617b45b852608p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DCARRY32=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_SQUARES_MIDDLEMUL1=1 -DNEW_SLOWTRIG=1 -DNO_ASM=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_WIDTH=1 -DUNROLL_HEIGHT=1 -DUNROLL_WIDTH=1 -DWORKINGIN1=1 -DWORKINGOUT1=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-03-17 16:17:44 condorella/rx480 OpenCL compilation in 4.10 s
2020-03-17 16:17:46 condorella/rx480 94073297 OK 37826000 loaded: blockSize 400, febfc14d0f24e5fa
2020-03-17 16:17:50 condorella/rx480 94073297 OK 37826800 40.21%; 3461 us/it; ETA 2d 06:04; b5a114da61bfa21e (check 1.55s)
2020-03-17 16:28:31 condorella/rx480 94073297 OK 38000000 40.39%; 3692 us/it; ETA 2d 09:31; 650cb74a2a044221 (check 1.51s)
2020-03-17 16:40:53 condorella/rx480 94073297 OK 38200000 40.61%; 3703 us/it; ETA 2d 09:28; 39ce25c654678cec (check 1.51s)[/CODE]1-3617/3697 = .0216 = 2.16% slower v6.11-198 than v6.11-134 on RX480
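As a cross-check on the log format, the ETA column appears to be consistent with (exponent - iteration) x us/it. That relation, the assumption that the test runs for roughly `exponent` iterations, and the rounding to the nearest minute are all inferred from the numbers in the logs above, not taken from the gpuowl source. A sketch in Python:

```python
def eta_string(exponent, iteration, us_per_it):
    """Format remaining time as 'Nd HH:MM', assuming `exponent` total iterations."""
    remaining_us = (exponent - iteration) * us_per_it
    total_minutes = round(remaining_us / 60e6)  # rounding mode is a guess
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}"


# First v6.11-198 progress line in the RX480 log above
print(eta_string(94073297, 37826800, 3461))
```

For that line the sketch gives the same "2d 06:04" that the log shows. Because the ETA is recomputed from the most recent us/it value, it moves around from line to line even when nothing else changes.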


Downclocked radeon VII:
v6.11-134[CODE]2020-03-17 17:43:24 roa/radeonvii 655685803 OK 626640000 95.57%; 10363 us/it; ETA 3d 11:37; 6fcd5d380e08a691 (check 5.99s) 20 errors
2020-03-17 17:46:57 roa/radeonvii 655685803 OK 626660000 95.57%; 10378 us/it; ETA 3d 11:40; 08e9287564d158a0 (check 5.94s) 20 errors
2020-03-17 17:50:30 roa/radeonvii 655685803 OK 626680000 95.58%; 10363 us/it; ETA 3d 11:30; cc909f1a064cff84 (check 5.88s) 20 errors
2020-03-17 17:51:49 roa/radeonvii Stopping, please wait..
2020-03-17 17:51:59 roa/radeonvii 655685803 OK 626688000 95.58%; 10369 us/it; ETA 3d 11:31; ceaa28d44748f0e7 (check 5.89s) 20 errors
2020-03-17 17:51:59 roa/radeonvii Exiting because "stop requested"
2020-03-17 17:51:59 roa/radeonvii Bye[/CODE]V6.11-198-g628f3cd[CODE]C:\Users\ken\Documents\gpuowl-v6.11-198-g628f3cd>gpuowl-win
2020-03-17 17:54:57 gpuowl v6.11-198-g628f3cd
2020-03-17 17:54:57 config: -device 1 -user kriesel -cpu roa/radeonvii -yield -maxAlloc 16000 -use -device 1 -user kriesel -cpu roa/radeonvii -use NO_ASM,UNROLL_MIDDLEMUL2,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT3,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_HEIGHT,CARRY32,CHEBYSHEV_METHOD,ORIG_MIDDLEMUL2,LESS_ACCURATE
2020-03-17 17:54:57 config: ;NO_ASM,ORIG_SLOWTRIG
2020-03-17 17:54:57 device 1, unique id ''
2020-03-17 17:54:57 roa/radeonvii 655685803 FFT 40960K: Width 256x4, Height 256x8, Middle 10; 15.63 bits/word
2020-03-17 17:55:07 roa/radeonvii Warning: -use CHEBYSHEV_METHOD has no effect
2020-03-17 17:55:07 roa/radeonvii Warning: -use LESS_ACCURATE has no effect
2020-03-17 17:55:07 roa/radeonvii Warning: -use MERGED_MIDDLE has no effect
2020-03-17 17:55:07 roa/radeonvii Warning: -use ORIG_MIDDLEMUL2 has no effect
2020-03-17 17:55:07 roa/radeonvii Warning: -use T2_SHUFFLE_HEIGHT has no effect
2020-03-17 17:55:07 roa/radeonvii Warning: -use T2_SHUFFLE_REVERSELINE has no effect
2020-03-17 17:55:07 roa/radeonvii Warning: -use UNROLL_MIDDLEMUL2 has no effect
2020-03-17 17:55:07 roa/radeonvii OpenCL args "-DEXP=655685803u -DWIDTH=1024u -DSMALL_HEIGHT=2048u -DMIDDLE=10u -DWEIGHT_STEP=0xa.51aa7280d93dp-3 -DIWEIGHT_STEP=0xc.677fd3dfd408p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DCARRY32=1 -DCHEBYSHEV_METHOD=1 -DLESS_ACCURATE=1 -DMERGED_MIDDLE=1 -DNO_ASM=1 -DORIG_MIDDLEMUL2=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_REVERSELINE=1 -DUNROLL_MIDDLEMUL2=1 -DWORKINGIN5=1 -DWORKINGOUT3=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-03-17 17:55:22 roa/radeonvii OpenCL compilation in 15.11 s
2020-03-17 17:55:31 roa/radeonvii 655685803 OK 626680000 loaded: blockSize 400, cc909f1a064cff84
2020-03-17 17:55:45 roa/radeonvii 655685803 OK 626680800 95.58%; 10753 us/it; ETA 3d 14:38; 47b82db06405ee2a (check 6.07s) 20 errors
2020-03-17 17:59:18 roa/radeonvii 655685803 OK 626700000 95.58%; 10769 us/it; ETA 3d 14:42; 810fe9130840b14d (check 6.08s) 20 errors
2020-03-17 18:02:59 roa/radeonvii 655685803 OK 626720000 95.58%; 10757 us/it; ETA 3d 14:33; 3294d8e8770b2911 (check 6.22s) 20 errors
[/CODE]1- 10370/10763 = 3.65% slower v6.11-198 than v6.11-134 on the same Radeon VII and exponent, same clock settings.
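For anyone repeating this kind of comparison, a small Python sketch of pulling the us/it values out of pasted log lines so they can be averaged. The regular expression is only fitted to the progress-line format seen in this post, and the helper name is made up. The averages quoted in the post may have been formed differently (for instance by leaving out the short first interval after a restart), so this is not expected to reproduce them exactly.

```python
import re
from statistics import mean

US_PER_IT = re.compile(r"(\d+) us/it")


def iteration_times(log_text):
    """Return the us/it value from every progress line in a gpuowl log."""
    return [int(m.group(1)) for m in US_PER_IT.finditer(log_text)]


log = """\
2020-03-17 17:55:45 roa/radeonvii 655685803 OK 626680800 95.58%; 10753 us/it; ETA 3d 14:38; 47b82db06405ee2a (check 6.07s) 20 errors
2020-03-17 17:59:18 roa/radeonvii 655685803 OK 626700000 95.58%; 10769 us/it; ETA 3d 14:42; 810fe9130840b14d (check 6.08s) 20 errors
2020-03-17 18:02:59 roa/radeonvii 655685803 OK 626720000 95.58%; 10757 us/it; ETA 3d 14:33; 3294d8e8770b2911 (check 6.22s) 20 errors
"""

times = iteration_times(log)
print(times, round(mean(times), 1))
```

Lines without a us/it field (startup, warnings, "Stopping") are simply ignored by the pattern.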

PhilF 2020-03-17 22:43

[QUOTE=ewmayer;539987]Your mem-downclock is likely the reason you both run slower and at significantly lower power than I, at the same sclk and 1-worker setting. But why not fire up a second worker?[/QUOTE]

I will, once I can make time for it. :smile:

ewmayer 2020-03-17 22:56

[QUOTE=PhilF;539992]I will, once I can make time for it. :smile:[/QUOTE]

I used the low-tech method:

1. open 2nd term;
2. create 2nd rundir under gpuowl-dir, cd into same;
3. populate worktodo by running primenet.py (which I see is based on the same-name py-script of another GIMPS coder who shall remain nameless, but who restored to the gpuowl version the option from his original script to run just once and quit, rather than recurring, by adding '-t 0' :);
4. ../gpuowl

I use same system sclk and fan settings for 2-worker-running as for 1.

preda 2020-03-18 09:59

[QUOTE=ewmayer;539996]I used the low-tech method:

1. open 2nd term;
2. create 2nd rundir under gpuowl-dir, cd into same;
3. populate worktodo by running primenet.py (which I see is based on the same-name py-script of another GIMPS coder who shall remain nameless, but who restored to the gpuowl version the option from his original script to run just once and quit, rather than recurring, by adding '-t 0' :);
4. ../gpuowl

I use same system sclk and fan settings for 2-worker-running as for 1.[/QUOTE]

Do you know about the -pool option? You create a single folder where you put the output of primenet.py, and run multiple instances, each in its own folder (indicated with -dir), all feeding from the common pool (indicated with -pool). You can also put config files in the pool dir, for shared config. The next logical step is to put the -pool option in the config file of each individual instance, so that you can start it with only -dir.

I'm afraid all that is not very clear, so let's see an example config:

~/gpuowl-xfx/config.txt contains:
-cpu XFX -uid 780c28cffffffeee -pool /home/user/pool

(Note that the GPU is indicated not by -d <position> but by UID, which is very useful when shuffling GPUs around. A symbolic name, "XFX", is associated with it.)

~/pool/config.txt contains:
-user name

(because the user is shared by all instances)

primenet.py only knows about the pool dir:
~/gpuowl/tools/primenet.py -u <user> -p <password> --dirs ~/pool --tasks 4 -w PRP &
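The layout described in this post can be sketched as a small Python helper that writes the two kinds of config.txt. The directory names, user name and UID are the ones from the example; the helper itself and the use of a temporary directory are only for illustration, and the final "gpuowl -dir" line is printed, not executed.

```python
import tempfile
from pathlib import Path


def write_pool_layout(root, user, instances):
    """Create a shared pool dir plus one run dir per GPU, each with config.txt.

    `instances` maps a run-dir name to (symbolic name, GPU UID).
    """
    root = Path(root)
    pool = root / "pool"
    pool.mkdir(parents=True, exist_ok=True)
    # Shared settings live in the pool's config file.
    (pool / "config.txt").write_text(f"-user {user}\n")

    run_dirs = []
    for dir_name, (cpu_name, uid) in instances.items():
        run_dir = root / dir_name
        run_dir.mkdir(parents=True, exist_ok=True)
        # Per-instance settings, including the pointer to the shared pool.
        (run_dir / "config.txt").write_text(
            f"-cpu {cpu_name} -uid {uid} -pool {pool}\n"
        )
        run_dirs.append(run_dir)
    return pool, run_dirs


with tempfile.TemporaryDirectory() as tmp:
    pool, run_dirs = write_pool_layout(
        tmp, "name", {"gpuowl-xfx": ("XFX", "780c28cffffffeee")}
    )
    for d in run_dirs:
        print((d / "config.txt").read_text().strip())
        # Each instance would then be started with only -dir, e.g.:
        print("gpuowl -dir", d)
```

Adding a second GPU is then one more entry in `instances`, while primenet.py keeps writing only to the pool dir.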

preda 2020-03-18 10:14

[QUOTE=kriesel;539991]1- 10370/10763 = 3.65% slower v6.11-198 than v6.11-134 on the same Radeon VII and exponent, same clock settings.[/QUOTE]

There are warnings there telling you that you are passing -use options that don't exist (because they have been removed). Some have been replaced by more general ones, e.g. there is still a T2_SHUFFLE that you may want to try. But a bigger benefit would be to move to ROCm (either 2.10 or 3.1).
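One way to act on those warnings is to read them back out of the log and drop the named options from the -use list. A minimal Python sketch, using the RX550 config and warning lines from kriesel's post; the helper names are made up, and it only removes what the log reports, without suggesting replacements such as T2_SHUFFLE.

```python
import re

NO_EFFECT = re.compile(r"Warning: -use (\w+) has no effect")


def ineffective_options(log_text):
    """Collect option names that the log reports as having no effect."""
    return set(NO_EFFECT.findall(log_text))


def prune_use_list(use_list, log_text):
    """Drop the reported options from a comma-separated -use list, keeping order."""
    dead = ineffective_options(log_text)
    return ",".join(opt for opt in use_list.split(",") if opt not in dead)


use = ("NO_ASM,UNROLL_HEIGHT,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,"
       "T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CARRY32,ORIGINAL_METHOD,LESS_ACCURATE")
log = """\
2020-03-17 15:54:13 condorella/rx550 Warning: -use LESS_ACCURATE has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use MERGED_MIDDLE has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use ORIGINAL_METHOD has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use T2_SHUFFLE_HEIGHT has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use T2_SHUFFLE_MIDDLE has no effect
2020-03-17 15:54:13 condorella/rx550 Warning: -use WORKINGOUT2 has no effect
"""
print(prune_use_list(use, log))
```

For this log that leaves NO_ASM, UNROLL_HEIGHT, WORKINGIN5 and CARRY32. Whether those four are still the best choice in the newer build would need retuning.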

kriesel 2020-03-18 12:27

[QUOTE=preda;540029]There are warnings there telling you that you are passing -use options that don't exist (because they have been removed). Some have been replaced by more general ones, e.g. there is still a T2_SHUFFLE that you may want to try. But a bigger benefit would be to move to ROCm (either 2.10 or 3.1).[/QUOTE]Moving to ROCm isn't an option on Windows. [url]https://rocm.github.io/[/url]
Documentation of what -use options are available would be helpful.

ewmayer 2020-03-18 19:29

[QUOTE=preda;540027]Do you know about the -pool option? You create a single folder where you put the output of primenet.py, and run multiple instances, each in its own folder (indicated with -dir), all feeding from the common pool (indicated with -pool). You can also put config files in the pool dir, for shared config. The next logical step is to put the -pool option in the config file of each individual instance, so that you can start it with only -dir.
[snip][/QUOTE]

Thanks, Mihai, but after a few more lines your recipe was already more complex than my neoluddite recipe. :) Perhaps I'll find it useful when I build my multi-GPU dream system later this year.

To paraphrase the late great SNL comedian Phil Hartman by way of his recurring [i]Unfrozen Caveman Lawyer[/i] sketches: "I'm just a *caveman* - the ways of you modern human are strange and unfathomable to me. (But what I do know is that my client deserves at least a $5 million triple-damages settlement for injuries and psychological trauma resulting from the defendant's spilling ketchup on him.)"

paulunderwood 2020-03-18 21:54

[QUOTE=ewmayer;539872]Are those with 1 worker or 2? Also, what OS distro are you guys running? As I noted, I am not allowed, even as su, to fiddle the mem-clock settings in my ROCm 2.10 setup under Ubuntu 19.10.[/QUOTE]

See posts in [url]https://mersenneforum.org/showthread.php?t=22204&page=149[/url]

I am running at sclk 3 and have just bumped up my ASUS's memory to 1200 and voltage at 830. 187W and no errors yet... I will lower the voltage if I can over time. 2 instances @ 1467 us/it each for FFT 5632K

ewmayer 2020-03-18 22:06

[QUOTE=paulunderwood;540097]See posts in [url]https://mersenneforum.org/showthread.php?t=22204&page=149[/url]

I am running at sclk 3 and have just bumped up my ASUS's memory to 1200 and voltage at 830. 187W and no errors yet... I will lower the voltage if I can over time. 2 instances @ 1467 us/it each for FFT 5632K[/QUOTE]

Thanks - I long ago did the featuremask fiddle. My issue is not a missing pp_od_clk_voltage entry; it is that Ubuntu is not allowing me to modify it. Not a huge deal, I'm just eating the extra watts and running with mclk at stock. It could become an issue when I build that multi-GPU dream system later this year, though - I have an 850W[sup]*[/sup] PS laid in, intended to drive something along the lines of an 8-core AMD CPU plus 3 Radeon VIIs. That will likely need sclk = 3 undervolting of the latter to get the wattage within what the PS can stably handle; it would be nice to be able to tune mclk as another "tuning dial" to help maximize throughput of the setup.

-------------

[sup]*[/sup]That seemed to be the sweet spot in terms of $/watt at $120; all the >= 1 kW PSs I looked at cost well over $200. Plus I don't want to be running a system needing more than a kW: our household circuit breakers start tripping at that level when anything else power-hungry (e.g. toaster, hair dryer) running off the same part of the "household grid" gets turned on.
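A rough Python sketch of the power budget behind that plan. The 850 W supply and the three GPUs are from this post, and 187 W per GPU is the sclk 3 figure reported earlier in the thread for a different card. The 105 W CPU figure is purely a placeholder assumption, and software-reported GPU power may not match what the supply actually has to deliver, so treat the result as an upper bound on headroom at best.

```python
def psu_headroom(psu_watts, gpu_watts, n_gpus, cpu_watts, other_watts=0):
    """Watts left over after the listed loads; negative means over budget."""
    return psu_watts - (gpu_watts * n_gpus + cpu_watts + other_watts)


# 850 W PSU and 3 GPUs are from the post; 187 W/GPU is another user's
# sclk 3 report; 105 W for the CPU is an assumed placeholder, not a measurement.
print(psu_headroom(850, 187, 3, cpu_watts=105))

# Same budget with the 200 W per GPU reported elsewhere in the thread at sclk 4.
print(psu_headroom(850, 200, 3, cpu_watts=105))
```

With these inputs the margin shrinks quickly as per-GPU power rises, which is why a second tuning dial such as mclk matters for a three-card build.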

preda 2020-03-19 13:37

[QUOTE=kriesel;540034]Moving to ROCm isn't an option on Windows. [url]https://rocm.github.io/[/url]
Documentation of what -use options are available would be helpful.[/QUOTE]

There is some brief documentation at the top of gpuowl.cl, which we try to keep in sync with the code as it changes:
[QUOTE]
DEBUG : enable asserts. Slow, but allows to verify that all asserts hold.

NO_ASM : request to not use any inline __asm()
NO_OMOD: do not use GCN output modifiers in __asm()

OUT_WG,OUT_SIZEX,OUT_SPACING <AMD default is 256,32,4> <nVidia default is 256,4,1 but needs testing>
IN_WG,IN_SIZEX,IN_SPACING <AMD default is 256,32,1> <nVidia default is 256,4,1 but needs testing>

ORIG_X2 <nVidia default>
INLINE_X2 <AMD default>

UNROLL_ALL <nVidia default>
UNROLL_NONE
UNROLL_WIDTH
UNROLL_HEIGHT <AMD default>

T2_SHUFFLE <nVidia default>
NO_T2_SHUFFLE <AMD default>

OLD_FFT8 <default>
NEWEST_FFT8
NEW_FFT8

OLD_FFT5
NEW_FFT5 <default>
NEWEST_FFT5

NEW_FFT10 <default>
OLD_FFT10

CARRY32 <AMD default> // This is a potentially dangerous option for large FFTs. Carry may not fit in 31 bits.
CARRY64 <nVidia default>

ORIG_SLOWTRIG // Use the compiler's implementation of sin/cos functions
NEW_SLOWTRIG <default> // Our own sin/cos implementation

---- P-1 below ----

NO_P2_FUSED_TAIL // Do not use the big kernel tailFusedMulDelta
[/QUOTE]

Prime95 2020-03-19 21:36

[QUOTE=Prime95;539912]Using not yet committed code:

Rocm 2.10, sclk 4, mem 1200, FFT 5M; 662us/it.
Running 2 instances: 604us/it (200W measured by rocm-smi)

I love this GPU.[/QUOTE]

Way to go Mihai!! With his latest commit this GPU has broken through the 600us barrier -- 597us.
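For comparing the one-instance and two-instance figures in this thread, a short Python sketch of the arithmetic. It assumes that the 604 us/it quoted for two instances is the combined time per iteration across both (the post does not say so explicitly), whereas paulunderwood's 1467 us/it is stated as per instance. The function names are made up.

```python
def effective_us_per_it(per_instance_us, n_instances):
    """Combined time per iteration when n instances each take per_instance_us."""
    return per_instance_us / n_instances


def gain(single_us, combined_us):
    """Fractional throughput gain of the combined rate over a single instance."""
    return single_us / combined_us - 1.0


# paulunderwood: 2 instances at 1467 us/it each, FFT 5632K
print(effective_us_per_it(1467, 2))

# Prime95: 662 us/it single vs 604 us/it with 2 instances, FFT 5M
print(f"{gain(662, 604):.1%}")
```

Note that the two posts use different FFT sizes, so their combined figures are not directly comparable with each other.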

