mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2019-12-20 21:12

1 Attachment(s)
[QUOTE=preda;533274]I have no idea, sorry. Could the 2x speed-up be real? Or maybe some error situation was reached in which everything is fast? I don't know.[/QUOTE]I believe it's the longer 153 ms/iter timings in stage 2 (nearly all of stage 2's duration) that are anomalous. The stage 1 and stage 2 timings for the 530M exponent both deviate significantly from the usual run-time scaling, but the stage 2 deviation is much more dramatic. See the attached pdf for run times and plots.
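To judge whether a per-iteration time is "off the curve", one can scale a trusted timing from a smaller exponent up to the target exponent and compare. The Python sketch below assumes per-iteration cost grows roughly as p·log2(p) (FFT length taken as proportional to the exponent p); that model, the helper names, and the reference point are all hypothetical, not taken from the attached pdf. Only the 153 ms/iter figure comes from the post above.

```python
import math

def predicted_ms_per_iter(ref_exponent, ref_ms, exponent):
    """Scale a reference per-iteration time to another exponent.

    Assumption: per-iteration cost grows roughly as p * log2(p), i.e. the
    FFT length is taken as proportional to the exponent p. Real timings move
    in steps as the program switches FFT lengths, so this is only a rough guide.
    """
    ref_cost = ref_exponent * math.log2(ref_exponent)
    cost = exponent * math.log2(exponent)
    return ref_ms * cost / ref_cost

def deviation(measured_ms, expected_ms):
    """Measured/expected ratio: ~1.0 is on the curve, ~2.0 is a 2x anomaly."""
    return measured_ms / expected_ms

# HYPOTHETICAL reference point (not from the attached pdf): replace it with
# a trusted timing from a smaller exponent on the same GPU and program version.
ref_p, ref_ms = 100_000_000, 14.0
expected = predicted_ms_per_iter(ref_p, ref_ms, 530_000_000)
print(f"expected ~{expected:.1f} ms/iter; 153 ms/iter is {deviation(153.0, expected):.2f}x that")
```

A ratio near 2 would point at a 2x anomaly like the one discussed; a ratio near 1 would suggest the timing is consistent with normal scaling.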

kriesel 2019-12-21 20:24

Gpuowl v6.11-99-gdd8527b for Windows
 
2 Attachment(s)
This was the current commit on Preda's GitHub as of yesterday. I haven't run it myself yet beyond generating the help output; see the attachments. The recent shower of build warnings persists.

kriesel 2019-12-22 13:32

Gpuowl 6.11-99 tuning on GTX 1060 3GB
 
PRP runs. No P-1 attempts made yet.[CODE]Gpuowl v6.11-99-gdd8527b
GTX1060 3GB
Windows 7 X64
exponent 90507919
fft length 5120K, PRP3

us/it -use
10265 NO_ASM
10173 NO_ASM
10247 NO_ASM,MERGED_MIDDLE,WORKINGIN
10291 NO_ASM,MERGED_MIDDLE,WORKINGIN
10323 NO_ASM,MERGED_MIDDLE,WORKINGIN1
10311 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
10254 NO_ASM,MERGED_MIDDLE,WORKINGIN2
10104 NO_ASM,MERGED_MIDDLE,WORKINGIN3
[B]10063[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4
10102 NO_ASM,MERGED_MIDDLE,WORKINGIN5

10240 NO_ASM,MERGED_MIDDLE,WORKINGOUT
10244 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
10244 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
10219 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
10244 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
10102 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
[B]9973[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT4
10077 NO_ASM,MERGED_MIDDLE,WORKINGOUT5

9938 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
9829 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
9838 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
9836 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
9942 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
9706 NO_ASM,MERGED_MIDDLE,WORKINGIN4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
[B]9622[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
9835 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
9731 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE

base timing 10173 us/iter
repeatability +-22/10269 = +-0.21%
best 9622 us/iter
ratio 10173/9622 = 1.0573[/CODE]
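The summary lines at the end of the table can be reproduced with a little arithmetic. In this minimal Python sketch the repeatability figure is computed as half the spread of two repeat runs about their mean; that the post's ±22/10269 comes from the two WORKINGIN runs (10247 and 10291, mean 10269) is an inference from the numbers, not something the post states. The function names are illustrative only.

```python
def repeatability_pct(a, b):
    """Half the spread of two repeat runs, as a percent of their mean."""
    mean = (a + b) / 2
    half_spread = abs(a - b) / 2
    return 100 * half_spread / mean

def speedup(base_us, best_us):
    """How many times faster the best -use set is than the baseline."""
    return base_us / best_us

# GTX 1060 figures from the table above (us/iter)
print(f"repeatability +-{repeatability_pct(10247, 10291):.2f}%")
print(f"ratio {speedup(10173, 9622):.4f}")
```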

kriesel 2019-12-22 16:29

GTX 1050 Ti gpuowl 6.11-99 tune
 
All the shuffles are beneficial.[CODE]Gpuowl V6.11-99-gdd8527b on Windows 7 X64
GTX 1050Ti timings
5M PRP, exponent 90507919
iters 20000

us/iter -use
15154 NO_ASM
15167 NO_ASM

15389 NO_ASM,MERGED_MIDDLE,WORKINGIN
15389 NO_ASM,MERGED_MIDDLE,WORKINGIN
15392 NO_ASM,MERGED_MIDDLE,WORKINGIN1
15371 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
15395 NO_ASM,MERGED_MIDDLE,WORKINGIN2
15173 NO_ASM,MERGED_MIDDLE,WORKINGIN3
[B]15084[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4
15166 NO_ASM,MERGED_MIDDLE,WORKINGIN5

15387 NO_ASM,MERGED_MIDDLE,WORKINGOUT
15387 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
15386 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
15357 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
15389 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
15169 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
[B]15040[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT4
15166 NO_ASM,MERGED_MIDDLE,WORKINGOUT5

14958 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
14790 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
14840 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
14808 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
14952 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE

14802 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
14684 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE
[B]14520[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE

repeatability 0%
base 15167
best 14520
ratio 15167/14520 = 1.0446[/CODE]
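Picking the winner out of a long tuning table by eye is error-prone. A small Python sketch follows, assuming each timing line has the shape `<us/iter> <comma-separated -use options> [optional note]`, possibly wrapped in the BBCode bold markers used in these posts. `parse_timings` and `fastest` are hypothetical helpers, not part of gpuowl.

```python
def parse_timings(text):
    """Parse lines like '[B]14520[/B] NO_ASM,...' into (us/iter, [options]).

    Header lines, blank lines and summary lines (which do not start with a
    number) are skipped; any trailing note after the option list is ignored.
    """
    rows = []
    for line in text.splitlines():
        line = line.replace("[B]", "").replace("[/B]", "").strip()
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            options = parts[1].split()[0]  # drop notes like "(repeatability)"
            rows.append((int(parts[0]), options.split(",")))
    return rows

def fastest(rows):
    """Return the (us/iter, options) row with the lowest time."""
    return min(rows, key=lambda row: row[0])

sample = """us/iter -use
15167 NO_ASM
[B]15040[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT4
14958 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4"""
us, opts = fastest(parse_timings(sample))
print(us, "-use " + ",".join(opts))
```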

kriesel 2019-12-22 17:42

[QUOTE=kriesel;533378]All the shuffles are beneficial.[/QUOTE]Re the GTX 1050 Ti results, the effects of the T2 shuffle options on top of WORKINGIN4,WORKINGOUT4 (14958 us/iter) are:
width -168, middle -118, height -150, reverseline -6; sum -442.
That predicts 14958 - 442 = 14516 us/iter with all four enabled, vs. 14520 measured: close, only 4 off.
The prediction is built from 6 measurements, so it carries up to about ±6 of digitization noise, and the difference is within the error bars.
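The prediction above treats each option's effect as independent and additive. A minimal Python sketch of that model, using the GTX 1050 Ti numbers from the table (the dictionary layout and function name are just for illustration):

```python
baseline = 14958  # us/iter with NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
single = {        # us/iter with one T2 option added to the baseline
    "T2_SHUFFLE_WIDTH": 14790,
    "T2_SHUFFLE_MIDDLE": 14840,
    "T2_SHUFFLE_HEIGHT": 14808,
    "T2_SHUFFLE_REVERSELINE": 14952,
}

def additive_prediction(baseline, single):
    """Predict the all-options time, assuming independent, additive effects."""
    deltas = {name: t - baseline for name, t in single.items()}
    return baseline + sum(deltas.values()), deltas

predicted, deltas = additive_prediction(baseline, single)
measured = 14520  # all four T2 options together, from the table
print(deltas)
print("predicted", predicted, "measured", measured, "off by", measured - predicted)
```

If the measured combination landed well outside the noise band, that would suggest the options interact rather than add.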

kriesel 2019-12-22 21:48

little gain for RX480 tune on gpuowl v6.11-99-gdd8527b
 
Almost no gain on RX480[CODE]Gpuowl version and commit V6.11-99-gdd8527b
GPU model RX480
GPU clock free running ~
Host OS Windows 7 Pro x64
Notes

Exponent timed 90507919
Computation type (PRP, P-1 stage 1, P-1 stage 2): PRP
FFT length FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word (copy/paste from console or log)
config file entries -time -iters 10000 -device 0 -user kriesel -cpu condorella/rx480

varying tuning -use options, in chronological order
3567 NO_ASM us/sq warmup, end user interaction, stabilize
3575 NO_ASM baseline

In benchmarking (highlight fastest time in bold)
6209 NO_ASM,MERGED_MIDDLE,WORKINGIN
6203 NO_ASM,MERGED_MIDDLE,WORKINGIN (repeatability)
3601 NO_ASM,MERGED_MIDDLE,WORKINGIN1
3591 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
3677 NO_ASM,MERGED_MIDDLE,WORKINGIN2
3715 NO_ASM,MERGED_MIDDLE,WORKINGIN3
4134 NO_ASM,MERGED_MIDDLE,WORKINGIN4
[B]3579[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN5

Out benchmarking (highlight fastest time in bold)
6086 NO_ASM,MERGED_MIDDLE,WORKINGOUT
4797 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
3598 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
3589 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
3941 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
[B]3573[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT3
3646 NO_ASM,MERGED_MIDDLE,WORKINGOUT4
3690 NO_ASM,MERGED_MIDDLE,WORKINGOUT5

baseline WORKINGIN4, WORKINGOUT4 combination:
4227 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4

Shuffle/reverse options:
4239 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
4209 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
4213 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
4225 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
4209 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
4195 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE
4199 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE

3572 NO_ASM
3579 NO_ASM
[B]3566[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT3,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE

repeatability +-4/3571 = +-0.11%
base 3571
best 3566
ratio 3571/3566 = 1.0014[/CODE]I wonder: does the optimal -use option set change much with FFT length for a given GPU? Is there any good reason to believe it does, or that it doesn't?

paulunderwood 2019-12-24 09:00

My card threw another Gerbicz error, this time after a couple of error-free tests. I relaxed the undervolt, bumped setsclk up to 5, and the fan is now at 160:

[CODE]sh pp.sh
1 1160 840 1050 5
[/CODE]

[CODE]amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: +0.97 V
fan1: 2955 RPM (min = 0 RPM, max = 3850 RPM)
edge: +72.0°C (crit = +100.0°C, hyst = -273.1°C)
(emerg = +105.0°C)
junction: +96.0°C (crit = +110.0°C, hyst = -273.1°C)
(emerg = +115.0°C)
mem: +77.0°C (crit = +94.0°C, hyst = -273.1°C)
(emerg = +99.0°C)
power1: 241.00 W (cap = 250.00 W)
[/CODE]

Now 755 us/it

preda 2019-12-25 22:31

ROCm GPU unique_id
 
ROCm exposes a per-GPU unique_id, e.g.:

[CODE]
cat /sys/class/drm/card0/device/unique_id
3044212172dc768c
[/CODE]

This id is a property of the GPU itself and does not depend on the system or PCIe slot, so moving a GPU to a different slot, or to a different system, preserves the UID.

I added a way to specify the GPU to run on by using this unique id:
./gpuowl -uid 3044212172dc768c

This can be used instead of -device (-d), which specifies the device by its position in the list of devices. The advantage is that the GPU's identity is preserved when swapping PCIe slots.

Combining -uid with -cpu makes it possible to associate a stable symbolic name with a physical GPU.

I also added a few small Python scripts (for ROCm) under the tools/ directory in the source code:
- monitor.py : prints general information about all the ROCm GPUs found
- device.py : given a UID, prints the device serial id

The last script, device.py, can be used in user power-play scripts that set GPU parameters (e.g. memory frequency, undervolting, fan), so that they identify GPUs by UID instead of by serial id and always target the correct GPU.
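For readers writing their own power-play scripts, here is a minimal Python sketch of the UID-to-card lookup. It is a hypothetical helper, not the actual tools/device.py, and it only assumes the `/sys/class/drm/cardN/device/unique_id` layout shown in the `cat` example above; connector entries such as `card0-DP-1` are skipped.

```python
import re
from pathlib import Path

def uid_to_card(uid, drm_root="/sys/class/drm"):
    """Return N for the cardN whose device/unique_id matches uid, else None.

    Hypothetical helper (not gpuowl's device.py). Assumes each GPU appears as
    <drm_root>/cardN/device/unique_id holding a hex string.
    """
    root = Path(drm_root)
    if not root.is_dir():
        return None
    wanted = uid.strip().lower()
    for entry in root.iterdir():
        match = re.fullmatch(r"card(\d+)", entry.name)
        if not match:
            continue  # skip connector entries such as card0-DP-1
        uid_file = entry / "device" / "unique_id"
        if uid_file.is_file() and uid_file.read_text().strip().lower() == wanted:
            return int(match.group(1))
    return None
```

On a ROCm machine, `uid_to_card("3044212172dc768c")` would return 0 if that GPU is currently card0, and the result can then be used to build the per-card sysfs paths a power-play script writes to.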

kriesel 2019-12-26 16:40

[QUOTE=preda;533571]ROCm exposes a per-GPU unique_id,[/QUOTE]I wonder how that's done, since GPUs do not have readable serial number registers; the serial number is, at best, a human-readable sticker on the device. The nearest thing we could find for CUDALucas on Windows was the 64-bit-only Windows UUID, which changes when the same piece of hardware is removed and reinstalled. There's also [URL]https://www.mersenneforum.org/showpost.php?p=460426&postcount=2603[/URL]
[CODE]CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 13:00:15

binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 8.0

------- DEVICE 0 -------
name GeForce GTX 1060 3GB
[B]UUID GPU-5e2c5531-4684-57ec-6393-8b762f286c70[/B]
ECC Support? Disabled
Compatibility 6.1
[/CODE]
[url]https://stackoverflow.com/questions/13781738/how-does-cuda-assign-device-ids-to-gpus[/url]

preda 2019-12-26 17:53

I suppose it comes from the GPU ROM or GPU BIOS. It's not the serial number, but it is stable when moving the GPU between systems, which is great. I put stickers with the UID on my GPUs to help identify them :)

[QUOTE=kriesel;533597]I wonder how that's done, since gpus do not have readable serial number registers.[/QUOTE]

paulunderwood 2019-12-27 08:05

I noticed a huge difference in running times on Linux between compiling with [c]make[/c] and [c]make gpuowl[/c].


All times are UTC.