1 Attachment(s)
[QUOTE=preda;533274]I have no idea, sorry. Could the 2x speed-up be real? Or maybe some error situation was reached in which everything is fast? I don't know.[/QUOTE]I believe it's the longer 153 ms/iter timings in stage 2 (nearly all of stage 2's duration) that are anomalous. Both stage 1 and stage 2 on 530M are significant deviations from the usual run time scaling, but stage 2 is much more dramatic. See the attached pdf for run times etc. and plots.
|
Gpuowl v6.11-99-gdd8527b for Windows
2 Attachment(s)
This was the current commit as of yesterday on Preda's github. Haven't run it myself yet, beyond generating the help output. See the attachments. The recent shower of build warnings persists.
|
Gpuowl 6.11-99 tuning on GTX 1060 3GB
PRP runs. No P-1 attempts made yet.
[CODE]Gpuowl v6.11-99-gdd8527b
GTX1060 3GB Windows 7 X64
exponent 90507919 fft length 5120K, PRP3
us/it -use
10265 NO_ASM
10173 NO_ASM
10247 NO_ASM,MERGED_MIDDLE,WORKINGIN
10291 NO_ASM,MERGED_MIDDLE,WORKINGIN
10323 NO_ASM,MERGED_MIDDLE,WORKINGIN1
10311 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
10254 NO_ASM,MERGED_MIDDLE,WORKINGIN2
10104 NO_ASM,MERGED_MIDDLE,WORKINGIN3
[B]10063[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4
10102 NO_ASM,MERGED_MIDDLE,WORKINGIN5
10240 NO_ASM,MERGED_MIDDLE,WORKINGOUT
10244 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
10244 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
10219 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
10244 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
10102 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
[B]9973[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT4
10077 NO_ASM,MERGED_MIDDLE,WORKINGOUT5
9938 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
9829 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
9838 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
9836 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
9942 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
9706 NO_ASM,MERGED_MIDDLE,WORKINGIN4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
[B]9622[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
9835 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
9731 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE
base timing 10173 us/iter
repeatability +-22/10269 = +-0.21%
best 9622 us/iter
ratio 10173/9622 = 1.0573[/CODE] |
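For anyone repeating this kind of tuning run, the bookkeeping at the bottom of the table (pick the fastest option set, divide the base timing by it) can be sketched in a few lines of Python. This is only an illustration: the helper name `best_and_ratio` is made up, and the list holds just a handful of rows copied from the GTX 1060 table above, not the full run.

```python
def best_and_ratio(base_us, timings):
    """Return (best time, its -use string, base/best speed ratio).

    timings is a list of (us_per_iter, use_options) pairs.
    """
    best_us, best_opts = min(timings, key=lambda t: t[0])
    return best_us, best_opts, base_us / best_us


# A few rows copied from the GTX 1060 3GB table (us/it, -use options).
timings = [
    (10063, "NO_ASM,MERGED_MIDDLE,WORKINGIN4"),
    (9973, "NO_ASM,MERGED_MIDDLE,WORKINGOUT4"),
    (9938, "NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4"),
    (9622, "NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,"
           "T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,"
           "T2_SHUFFLE_REVERSELINE"),
]

# 10173 us/iter is the NO_ASM base timing quoted in the table.
best_us, best_opts, ratio = best_and_ratio(10173, timings)
print(f"best {best_us} us/iter, ratio {ratio:.4f}")
```

With these rows it reproduces the 1.0573 ratio quoted in the table.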
GTX 1050 Ti gpuowl 6.11-99 tune
All the shuffles are beneficial.
[CODE]Gpuowl V6.11-99-gdd8527b on Windows 7 X64
GTX 1050Ti timings 5M PRP, exponent 90507919 iters 20000
us/iter -use
15154 NO_ASM
15167 NO_ASM
15389 NO_ASM,MERGED_MIDDLE,WORKINGIN
15389 NO_ASM,MERGED_MIDDLE,WORKINGIN
15392 NO_ASM,MERGED_MIDDLE,WORKINGIN1
15371 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
15395 NO_ASM,MERGED_MIDDLE,WORKINGIN2
15173 NO_ASM,MERGED_MIDDLE,WORKINGIN3
[B]15084[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4
15166 NO_ASM,MERGED_MIDDLE,WORKINGIN5
15387 NO_ASM,MERGED_MIDDLE,WORKINGOUT
15387 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
15386 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
15357 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
15389 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
15169 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
[B]15040[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT4
15166 NO_ASM,MERGED_MIDDLE,WORKINGOUT5
14958 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
14790 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
14840 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
14808 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
14952 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
14802 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
14684 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE
[B]14520[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
repeatability 0%
base 15167
best 14520
ratio 15167/14520 = 1.0446[/CODE] |
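[QUOTE=kriesel;533378]All the shuffles are beneficial.[/QUOTE]
Regarding the GTX1050Ti results, here are the individual T2 shuffle effects relative to the WORKINGIN4,WORKINGOUT4 timing of 14958 us/iter:
width -168, middle -118, height -150, reverseline -6, sum -442.
If the effects are additive, that predicts 14958 - 442 = 14516 with all four enabled, versus 14520 measured. That is close, only 4 off. The comparison rests on 6 measurements, each reported to the nearest microsecond, so there can be up to about +-6 of digitization noise, and the difference is within the error bars. |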
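The additivity check above is easy to redo for other cards. A minimal sketch in Python, using only the GTX 1050 Ti timings from the table; the variable names are arbitrary:

```python
# GTX 1050 Ti timings in us/iter, copied from the tuning table.
baseline = 14958  # NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
single = {
    "T2_SHUFFLE_WIDTH": 14790,
    "T2_SHUFFLE_MIDDLE": 14840,
    "T2_SHUFFLE_HEIGHT": 14808,
    "T2_SHUFFLE_REVERSELINE": 14952,
}
measured_all = 14520  # all four shuffle options enabled together

# Effect of each option on its own, relative to the baseline.
deltas = {opt: us - baseline for opt, us in single.items()}

# If the effects were purely additive, the combined timing would be:
predicted = baseline + sum(deltas.values())
residual = measured_all - predicted

for opt, d in deltas.items():
    print(f"{opt}: {d:+d}")
print(f"predicted {predicted}, measured {measured_all}, residual {residual:+d}")
```

A residual well outside the rounding noise would suggest the options interact rather than simply add.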
little gain for RX480 tune on gpuowl v6.11-99-gdd8527b
Almost no gain on RX480
[CODE]Gpuowl version and commit V6.11-99-gdd8527b
GPU model RX480
GPU clock free running ~
Host OS Windows 7 Pro x64
Notes
Exponent timed 90507919
Computation type (PRP, P-1 stage 1, P-1 stage 2): PRP
FFT length FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.13 bits/word (copy/paste from console or log)
config file entries -time -iters 10000 -device 0 -user kriesel -cpu condorella/rx480
varying tuning -use options, in chronological order
us/sq -use
3567 NO_ASM warmup, end user interaction, stabilize
3575 NO_ASM baseline
In benchmarking (highlight fastest time in bold)
6209 NO_ASM,MERGED_MIDDLE,WORKINGIN
6203 NO_ASM,MERGED_MIDDLE,WORKINGIN (repeatability)
3601 NO_ASM,MERGED_MIDDLE,WORKINGIN1
3591 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
3677 NO_ASM,MERGED_MIDDLE,WORKINGIN2
3715 NO_ASM,MERGED_MIDDLE,WORKINGIN3
4134 NO_ASM,MERGED_MIDDLE,WORKINGIN4
[B]3579[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN5
Out benchmarking (highlight fastest time in bold)
6086 NO_ASM,MERGED_MIDDLE,WORKINGOUT
4797 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
3598 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
3589 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
3941 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
[B]3573[/B] NO_ASM,MERGED_MIDDLE,WORKINGOUT3
3646 NO_ASM,MERGED_MIDDLE,WORKINGOUT4
3690 NO_ASM,MERGED_MIDDLE,WORKINGOUT5
baseline WORKINGIN4, WORKINGOUT4 combination:
4227 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4
Shuffle/reverse options:
4239 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH
4209 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_MIDDLE
4213 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT
4225 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_REVERSELINE
4209 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
4195 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE
4199 NO_ASM,MERGED_MIDDLE,WORKINGIN4,WORKINGOUT4,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE
3572 NO_ASM
3579 NO_ASM
[B]3566[/B] NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT3,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE
repeatability +-4/3571 = +-0.11%
base 3571
best 3566
ratio 3571/3566 = 1.0014[/CODE]
I wonder, does the optimal -use option set change much versus fft length for a given gpu? Is there any good reason to believe it does, or that it doesn't? |
My card threw another Gerbicz error, this time after a couple of error-free tests. I relaxed the undervolt and bumped up the setsclk to 5 and the fan is at 160:
[CODE]sh pp.sh 1 1160 840 1050 5[/CODE]
[CODE]amdgpu-pci-0300
Adapter: PCI adapter
vddgfx: +0.97 V
fan1: 2955 RPM (min = 0 RPM, max = 3850 RPM)
edge: +72.0°C (crit = +100.0°C, hyst = -273.1°C) (emerg = +105.0°C)
junction: +96.0°C (crit = +110.0°C, hyst = -273.1°C) (emerg = +115.0°C)
mem: +77.0°C (crit = +94.0°C, hyst = -273.1°C) (emerg = +99.0°C)
power1: 241.00 W (cap = 250.00 W)[/CODE]
Now 755 us/it |
ROCm GPU unique_id
ROCm exposes a per-GPU unique_id, e.g.:
[CODE]cat /sys/class/drm/card0/device/unique_id
3044212172dc768c[/CODE]
This id is a property of the GPU itself, and does not depend on the system or PCIe slot. So moving a GPU to a different slot, or to a different system, preserves the UID.
I added a way to specify the GPU to run on by using this unique id:
./gpuowl -uid 3044212172dc768c
This can be used instead of -device (-d), which specifies the device by its position in the list of devices. The advantage is that the identity of the GPU is preserved when swapping PCIe slots. Combining -uid with -cpu makes it possible to associate a stable symbolic name with an actual GPU.
I also added a few small python scripts (ROCm) under the tools/ directory in the source code:
- monitor.py : prints general information about all the ROCm GPUs found
- device.py : given a UID, prints the device serial id
The last script, device.py, can be used in user power-play scripts that set GPU parameters (e.g. memory frequency, undervolting, fan etc.) to identify GPUs by UID instead of serial-id, so that the correct GPU is targeted. |
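To see the UIDs of every card in a box at once, something along these lines should work. It is only a sketch and not one of the scripts in tools/: the function name `read_unique_ids` and the `drm_root` parameter are made up for illustration, and it assumes each GPU appears as cardN under /sys/class/drm with a device/unique_id file as in the `cat` example above. Cards that do not expose the file are simply skipped.

```python
import re
from pathlib import Path


def read_unique_ids(drm_root="/sys/class/drm"):
    """Map card name (e.g. 'card0') to the unique_id string found in sysfs.

    Assumes the layout <drm_root>/cardN/device/unique_id. Entries such as
    connector directories (card0-DP-1) and cards without the file are skipped.
    """
    root = Path(drm_root)
    ids = {}
    if not root.is_dir():
        return ids
    for card_dir in sorted(root.iterdir()):
        if not re.fullmatch(r"card\d+", card_dir.name):
            continue
        uid_file = card_dir / "device" / "unique_id"
        try:
            ids[card_dir.name] = uid_file.read_text().strip()
        except OSError:
            # File missing or unreadable: this card has no usable UID here.
            continue
    return ids


if __name__ == "__main__":
    for card, uid in read_unique_ids().items():
        print(card, uid)
```

A UID printed this way is what gets passed to -uid, as in ./gpuowl -uid 3044212172dc768c.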
[QUOTE=preda;533571]ROCm exposes a per-GPU unique_id,[/QUOTE]I wonder how that's done, since gpus do not have readable serial number registers. Serial number is a human-readable sticker on the device, at best. Nearest we could find for CUDALucas use in Windows was the 64-bit-only Windows uuid, which changes upon removal/reinstall of the same piece of hardware. There's also [URL]https://www.mersenneforum.org/showpost.php?p=460426&postcount=2603[/URL]
[CODE]CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 13:00:15
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 8.0
------- DEVICE 0 -------
name GeForce GTX 1060 3GB
[B]UUID GPU-5e2c5531-4684-57ec-6393-8b762f286c70[/B]
ECC Support? Disabled
Compatibility 6.1[/CODE]
[url]https://stackoverflow.com/questions/13781738/how-does-cuda-assign-device-ids-to-gpus[/url] |
I suppose it's coming from the GPU ROM or GPU BIOS. It's not the serial number, but it is stable when moving the GPU between systems which is great. I added stickers on my GPUs with their UID to help identify them :)
|
I noticed a huge difference in running times on Linux between compiling with [c]make[/c] and [c]make gpuowl[/c].
|