[QUOTE=tServo;499671]I just tested V5.0-9c13870 downloaded from post # 869 on a RX 580 and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.[/QUOTE] Marv, I assume you're on Windows, thus not using ROCm? |
[QUOTE=tServo;499671]I just tested V5.0-9c13870 downloaded from post # 869 on a RX 580 and it was 3.7% slower than 3.8.
I will look into overclocking the 580 a little to compensate.[/QUOTE] tServo, What exponent or fft length did you run the comparison on? If you would provide also driver version and ms/sq numbers, and OS, for your recent V5.0-9c13870 test run, that could provide an OS to OS comparison on same gpu model as SELROC, which could be informative and useful. Post 869 is a Windows executable. It's a fat executable, >1.5MB. (I did not apply strip to it like kracker recommended optionally back at v2.0.) Strip gets that commit down under 0.5MB executable size, and only affects file size, not iteration speed. |
possible Windows AMD driver issue affecting GPU-Z
After reporting the following issue to the authors of GPU-Z several times, for ~V2.7.0 through 2.14.0, without resolution, I have submitted it as an issue with the latest available AMD Adrenalin driver for Windows, v18.10.2. With Windows 7 x64 Pro, on a system with one or more RX480 or RX550 gpus installed, run GPU-Z during local console access. All parameters display ok. Switch to accessing that system via Windows Remote Desktop. Upon the switch to remote desktop, in all running sessions of GPU-Z, the GPU Core clock and GPU memory clock both drop to indicated values of zero; gpu temperature drops out to null degrees. Same system type (HP Z600, Windows 7 X64 Pro, same amount of memory etc) but NVIDIA gpus, no such issue. But it was also an issue with earlier AMD drivers.
|
[QUOTE=kriesel;499674]tServo,
What exponent or fft length did you run the comparison on? If you would provide also driver version and ms/sq numbers, and OS, for your recent V5.0-9c13870 test run, that could provide an OS to OS comparison on same gpu model as SELROC, which could be informative and useful. Post 869 is a Windows executable. It's a fat executable, >1.5MB. (I did not apply strip to it like kracker recommended optionally back at v2.0.) Strip gets that commit down under 0.5MB executable size, and only affects file size, not iteration speed.[/QUOTE] Here are the requested data:
OS: Windoze 10, 18.03, current to within a few months
Driver: AMD Adrenalin 17.7 (see below)
Exponent tested: 87,3xxx,xxx
FFT size: 5120K
ms/sq: 4.52 (for 3.8 it is 4.32)
Note that the ms/sq difference is 4.4%, whereas yesterday I reported a 3.7% difference. The 3.7% was based on the ETA difference between the two versions. The AMD driver is old, probably not updated since I got the machine. I will update tomorrow and report the new times, if any. I'm skeptical there will be much difference, because my impression is that both AMD and Nvidia pay lots of attention in their drivers to the performance of the latest and greatest video games, and perhaps to BSOD complaints, but not much else. |
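A quick worked check of the percentages in the post above, as a minimal sketch. The function names are illustrative and not part of gpuowl. The quoted 4.4% corresponds to the ms/sq gap expressed relative to the slower (5.0) time; relative to the 3.8 time the same gap comes out nearer 4.6%.

```python
def gap_relative_to_old(new_ms, old_ms):
    """Percent increase in ms/sq, measured against the older version's time."""
    return (new_ms - old_ms) / old_ms * 100.0


def gap_relative_to_new(new_ms, old_ms):
    """The same gap, measured against the newer version's time."""
    return (new_ms - old_ms) / new_ms * 100.0


# ms/sq figures taken from the post: 3.8 -> 4.32, 5.0 -> 4.52
V38_MS = 4.32
V50_MS = 4.52

print(f"relative to 3.8: {gap_relative_to_old(V50_MS, V38_MS):.1f}%")
print(f"relative to 5.0: {gap_relative_to_new(V50_MS, V38_MS):.1f}%")
```

Which baseline is used matters when comparing against an ETA-based figure, so it is worth stating it alongside the percentage.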
New AMD driver results
Installing the AMD driver 18.10 shows the two versions almost the same:
3.8    4.52 -> 4.54
5.0    4.52 -> 4.53
I will probably apply about a 5% overclock in a week and see how much that improves it. However, RX 580s are notoriously difficult to overclock. If the overclock jacks the power consumption too much, I will back it off because the extra cost for power won't justify a small increase in speed. |
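The power-versus-speed trade-off mentioned above can be put in numbers by comparing energy per iteration. This is only a sketch: the wattages are hypothetical placeholders (no power figures were posted), and it assumes, optimistically, that a 5% overclock gives a full 5% speedup.

```python
def joules_per_iteration(watts, ms_per_iter):
    """Energy spent on one squaring: power (W) times time (s)."""
    return watts * ms_per_iter / 1000.0


# 4.53 ms/sq is the posted 5.0 timing; both wattages are HYPOTHETICAL.
STOCK_WATTS = 150.0
OVERCLOCK_WATTS = 170.0
STOCK_MS = 4.53
OVERCLOCK_MS = STOCK_MS / 1.05  # assumes the overclock yields a 5% speedup

stock_j = joules_per_iteration(STOCK_WATTS, STOCK_MS)
overclock_j = joules_per_iteration(OVERCLOCK_WATTS, OVERCLOCK_MS)

print(f"stock:     {stock_j:.4f} J/iteration")
print(f"overclock: {overclock_j:.4f} J/iteration")
print("overclock costs more energy per iteration:", overclock_j > stock_j)
```

With real readings substituted for the placeholders, the overclock is only worth keeping on energy grounds if the joules per iteration do not rise.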
PRP 89m completed on Win7 x64, gpuowl V5.0-9c13870
[url]https://www.mersenne.org/report_exponent/?exp_lo=89000167&full=1[/url] with the Adrenalin 18.10.2 driver. (Base=3 indicates no P-1 in the PRP run.)
|
[QUOTE=preda;499580]I just added an FFT-3 "middle" step.[/QUOTE]
Here is a Debianized version of gpuowl 5.0: [url]https://drive.google.com/file/d/1MvWBK5ArXDcnEqCDjpa8nDgLJIhCnzWr/view?usp=sharing[/url] To install, issue the following command as root: [CODE]dpkg -i gpuowl.deb[/CODE] |
Not directly related to gpuowl but just a PSA to anyone trying the bleeding edge mainline kernels, ROCm fails to compile on 4.20-rc2 so don't bother trying an rc kernel until ROCm gets updated. It looks like some deprecated timing functions have been removed from the kernel at least and maybe some refactoring needs to be done, hopefully nothing major. If you don't spot the warning to check the error log it looks like ROCm installed but the HelloWorld test from README.md fails and gpuowl will fail as soon as it tries to call OpenCL. I don't think I installed the kernel incorrectly but if anyone is successfully using an rc kernel with ROCm please let me know.
[code]DKMS make.log for amdgpu-1.9-224 for kernel 4.20.0-042000rc2-generic (x86_64)
Fri 16 Nov 13:48:45 GMT 2018
make: Entering directory '/usr/src/linux-headers-4.20.0-042000rc2-generic'
Makefile:968: "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_drm.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/main.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/symbols.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_fence.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_fence_array.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_kthread.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_io.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_module.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_device.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_mn.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_reservation.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/lib/chash.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/scheduler/gpu_scheduler.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_memory.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/scheduler/sched_fence.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_kms.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_drm_global.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_bitmap.o
  LD [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/lib/amdchash.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_pci.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_atombios.o
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_device.c: In function ‘kgd2kfd_interrupt’:
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_device.c:708:2: warning: ISO C90 forbids variable length array ‘patched_ihre’ [-Wvla]
  uint32_t patched_ihre[DIV_ROUND_UP(
  ^~~~~~~~
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_tt.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_prime.o
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.c: In function ‘kfd_ioctl_get_clock_counters’:
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.c:837:2: error: implicit declaration of function ‘getrawmonotonic64’; did you mean ‘getrawmonotonic’? [-Werror=implicit-function-declaration]
  getrawmonotonic64(&time);
  ^~~~~~~~~~~~~~~~~
  getrawmonotonic
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_topology.o
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.c:840:2: error: implicit declaration of function ‘get_monotonic_boottime64’; did you mean ‘getboottime64’? [-Werror=implicit-function-declaration]
  get_monotonic_boottime64(&time);
  ^~~~~~~~~~~~~~~~~~~~~~~~
  getboottime64
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_pasid.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_util.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_doorbell.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_module.o
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.c: In function ‘amdgpu_pmops_runtime_suspend’:
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.c:768:2: error: implicit declaration of function ‘vga_switcheroo_set_dynamic_switch’; did you mean ‘vga_switcheroo_process_delayed_switch’? [-Werror=implicit-function-declaration]
  vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_OFF);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  vga_switcheroo_process_delayed_switch
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_flat_memory.o
  LD [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/amdkcl.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/atombios_crtc.o
cc1: some warnings being treated as errors
scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.o' failed
make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.o] Error 1
make[2]: *** Waiting for unfinished jobs....
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_process.o
  LD [M] /var/lib/dkms/amdgpu/1.9-224/build/scheduler/amd-sched.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_object.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_queue.o
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_get_pcie_info’:
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3350:9: error: implicit declaration of function ‘drm_pcie_get_speed_cap_mask’; did you mean ‘pcie_get_speed_cap’? [-Werror=implicit-function-declaration]
  ret = drm_pcie_get_speed_cap_mask(adev->ddev, &mask);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  pcie_get_speed_cap
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3356:15: error: ‘DRM_PCIE_SPEED_25’ undeclared (first use in this function); did you mean ‘PCIE_SPEED_2_5GT’?
  if (mask & DRM_PCIE_SPEED_25)
  ^~~~~~~~~~~~~~~~~
  PCIE_SPEED_2_5GT
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3356:15: note: each undeclared identifier is reported only once for each function it appears in
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3358:15: error: ‘DRM_PCIE_SPEED_50’ undeclared (first use in this function); did you mean ‘DRM_PCIE_SPEED_25’?
  if (mask & DRM_PCIE_SPEED_50)
  ^~~~~~~~~~~~~~~~~
  DRM_PCIE_SPEED_25
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3360:15: error: ‘DRM_PCIE_SPEED_80’ undeclared (first use in this function); did you mean ‘DRM_PCIE_SPEED_50’?
  if (mask & DRM_PCIE_SPEED_80)
  ^~~~~~~~~~~~~~~~~
  DRM_PCIE_SPEED_50
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_lock.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager.o
/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.c: In function ‘ttm_bo_vm_fault’:
/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.c:314:10: error: implicit declaration of function ‘vm_insert_mixed’; did you mean ‘vmf_insert_mixed’? [-Werror=implicit-function-declaration]
  ret = vm_insert_mixed(&cvma, address,
  ^~~~~~~~~~~~~~~
  vmf_insert_mixed
/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.c:324:10: error: implicit declaration of function ‘vm_insert_pfn’; did you mean ‘vmf_insert_pfn’? [-Werror=implicit-function-declaration]
  ret = vm_insert_pfn(&cvma, address, pfn);
  ^~~~~~~~~~~~~
  vmf_insert_pfn
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3367:9: error: implicit declaration of function ‘drm_pcie_get_max_link_width’; did you mean ‘drm_dp_max_link_rate’? [-Werror=implicit-function-declaration]
  ret = drm_pcie_get_max_link_width(adev->ddev, &mask);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  drm_dp_max_link_rate
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_execbuf_util.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager_cik.o
cc1: some warnings being treated as errors
scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.o' failed
make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.o] Error 1
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_page_alloc.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_manager.o
cc1: some warnings being treated as errors
scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.o' failed
make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.o] Error 1
make[2]: *** Waiting for unfinished jobs....
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager_vi.o
  CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager_v9.o
cc1: some warnings being treated as errors
scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.o' failed
make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.o] Error 1
make[2]: *** Waiting for unfinished jobs....
scripts/Makefile.build:518: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu' failed
make[1]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu] Error 2
make[1]: *** Waiting for unfinished jobs....
scripts/Makefile.build:518: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd' failed
make[1]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd] Error 2
scripts/Makefile.build:518: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/ttm' failed
make[1]: *** [/var/lib/dkms/amdgpu/1.9-224/build/ttm] Error 2
Makefile:1565: recipe for target '_module_/var/lib/dkms/amdgpu/1.9-224/build' failed
make: *** [_module_/var/lib/dkms/amdgpu/1.9-224/build] Error 2
make: Leaving directory '/usr/src/linux-headers-4.20.0-042000rc2-generic'[/code] |
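To see at a glance which kernel symbols the DKMS build could not find, the gcc error lines in a make.log like the one above can be filtered with a short script. This is a sketch only: the regular expression and the function name are assumptions written for this log's format, and `SAMPLE` holds three error lines copied from the log rather than the whole file.

```python
import re

# Three error lines copied from the DKMS make.log in the post above.
SAMPLE = """\
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.c:837:2: error: implicit declaration of function ‘getrawmonotonic64’; did you mean ‘getrawmonotonic’? [-Werror=implicit-function-declaration]
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3356:15: error: ‘DRM_PCIE_SPEED_25’ undeclared (first use in this function); did you mean ‘PCIE_SPEED_2_5GT’?
/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.c:314:10: error: implicit declaration of function ‘vm_insert_mixed’; did you mean ‘vmf_insert_mixed’? [-Werror=implicit-function-declaration]
"""

# Matches "<path>.c:<line>:<col>: error: [implicit declaration of function ]‘<symbol>’".
# gcc prints the symbol between typographic quotes (U+2018 / U+2019).
ERROR_RE = re.compile(
    r"(?P<file>[^\s:]+\.c):(?P<line>\d+):\d+: error: "
    r"(?:implicit declaration of function )?‘(?P<symbol>[^’]+)’"
)


def missing_symbols(log_text):
    """Return sorted, de-duplicated (source file name, symbol) pairs from gcc errors."""
    found = set()
    for match in ERROR_RE.finditer(log_text):
        file_name = match.group("file").rsplit("/", 1)[-1]
        found.add((file_name, match.group("symbol")))
    return sorted(found)


for file_name, symbol in missing_symbols(SAMPLE):
    print(f"{file_name}: {symbol}")
```

Because the pattern is applied with `finditer` over the whole text, it should also cope with a log whose line breaks were lost when it was pasted.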
[QUOTE=preda;499437]Valerio: could you please prepare a speed comparison between "the fastest" (3.5) and "head" (5.0, with B1=0 (default)), on a FFT 5120K exponent (an exponent around 89M), using ROCm 1.9.1 if you can (i.e. not amdgpu-pro), and any GPU (probably RX580). Maybe you can also get GPU power information (reported by rocm-smi) in the two cases. Maybe switch between the different FFT 5120K variants on 5.0 and select the fastest.
Ken, if you have it handy, maybe I could get similar information from you (with these differences: not ROCm, but just specify the driver you use; and different GPU, that's fine; and use your fastest as baseline, not necessarily 3.5). I'm limited in my analysis because right now I have ONLY Vega64 to test on. Thus any perf testing I do of this problem will be partially "in the dark" if it does not manifest in the same way on Vega64. Thanks, Mihai[/QUOTE] [QUOTE=M344587487;500678]Not directly related to gpuowl but just a PSA to anyone trying the bleeding edge mainline kernels, ROCm fails to compile on 4.20-rc2 so don't bother trying an rc kernel until ROCm gets updated. It looks like some deprecated timing functions have been removed from the kernel at least and maybe some refactoring needs to be done, hopefully nothing major. If you don't spot the warning to check the error log it looks like ROCm installed but the HelloWorld test from README.md fails and gpuowl will fail as soon as it tries to call OpenCL. I don't think I installed the kernel incorrectly but if anyone is successfully using an rc kernel with ROCm please let me know. 
[code](DKMS make.log snipped; identical to the log in M344587487's post above)[/code][/QUOTE] Exact. I have compiled kernel 4.20-rc1 successfully, but then it fails to load the ROCm modules. |
GpuOwl cpp sanity check fails
[QUOTE=preda;499563]Yes it makes sense. I'll look into implementing that.[/QUOTE]
[CODE]# cppcheck --enable=all --quiet .
[GCD.cpp:38]: (warning) Assert statement calls a function which may have desired side effects: 'isOngoing'.
[GCD.h:11]: (style) The class 'GCD' does not have a constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::timeSum' is not initialized in the constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::nCalls' is not initialized in the constructor.
[clwrap.h:78]: (style) Class 'Queue' has a constructor with 1 argument that is not explicit.
[Primes.h:18]: (style) Class 'Primes' has a constructor with 1 argument that is not explicit.
[./Result.cpp:24]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[Worktodo.cpp:26]: (warning) %d in format string (no. 2) requires 'int *' but the argument type is 'unsigned int *'.
[common.cpp:16]: (warning) Return value of function fopen() is not used.
[common.cpp:16]: (error) Return value of allocation function 'fopen' is not stored.
[./gpuowl.cpp:13]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
(information) Cppcheck cannot find all the include files (use --check-config for details)
[/CODE] |
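When deciding what to fix first, it can help to count the cppcheck findings by severity. The following is a minimal sketch: the regular expression and function name are assumptions based on the `[file:line]: (severity) message` layout seen in the output above, and `SAMPLE` contains four lines copied from it.

```python
import re
from collections import Counter

# Four findings copied from the cppcheck output in the post above.
SAMPLE = """\
[GCD.cpp:38]: (warning) Assert statement calls a function which may have desired side effects: 'isOngoing'.
[GCD.h:11]: (style) The class 'GCD' does not have a constructor.
[common.cpp:16]: (warning) Return value of function fopen() is not used.
[common.cpp:16]: (error) Return value of allocation function 'fopen' is not stored.
"""

# Matches "[<file>:<line>]: (<severity>)" at the start of a finding.
FINDING_RE = re.compile(r"\[(?P<file>[^\]:]+):(?P<line>\d+)\]: \((?P<severity>\w+)\)")


def count_by_severity(report_text):
    """Count cppcheck findings per severity label (error, warning, style, ...)."""
    return Counter(m.group("severity") for m in FINDING_RE.finditer(report_text))


for severity, count in count_by_severity(SAMPLE).most_common():
    print(f"{severity}: {count}")
```

Lines without a `[file:line]` prefix, such as the final include-files note, are deliberately not counted by this pattern.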
[QUOTE=SELROC;500720][CODE]# cppcheck --enable=all --quiet .
[GCD.cpp:38]: (warning) Assert statement calls a function which may have desired side effects: 'isOngoing'.
[GCD.h:11]: (style) The class 'GCD' does not have a constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::timeSum' is not initialized in the constructor.
[kernel.h:31]: (warning) Member variable 'Kernel::nCalls' is not initialized in the constructor.
[clwrap.h:78]: (style) Class 'Queue' has a constructor with 1 argument that is not explicit.
[Primes.h:18]: (style) Class 'Primes' has a constructor with 1 argument that is not explicit.
[./Result.cpp:24]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
[Worktodo.cpp:26]: (warning) %d in format string (no. 2) requires 'int *' but the argument type is 'unsigned int *'.
[common.cpp:16]: (warning) Return value of function fopen() is not used.
[common.cpp:16]: (error) Return value of allocation function 'fopen' is not stored.
[./gpuowl.cpp:13]: (information) Skipping configuration 'REV' since the value of 'REV' is unknown. Use -D if you want to check it. You can use -U to skip it explicitly.
(information) Cppcheck cannot find all the include files (use --check-config for details)
[/CODE][/QUOTE] Thanks, I suspect this is nothing serious, but worth fixing. Unfortunately I am currently travelling without access to a developer machine. I'll look into it when I get access. |
[QUOTE=preda;500823]Thanks, I suspect this is nothing serious, but worth fixing. Unfortunately I am currently travelling without access to a developer machine. I'll look into it when I get access.[/QUOTE]
I am currently trying to get ROCm and GpuOwl loaded into debian repositories. Running cppcheck is a necessary step for debianization to be successful. |
[QUOTE=SELROC;500973]I am currently trying to get ROCm and GpuOwl loaded into debian repositories.
Running cppcheck is a necessary step for debianization to be successful.[/QUOTE] Wow thanks! I'll look into this as soon as I can. |
[QUOTE=preda;501123]Wow thanks! I'll look into this as soon as I can.[/QUOTE]
It is not without difficulty for me as a new Debian packager: I need to find the exact syntax of the .dsc file in order to build a Debian source package. |
1 Attachment(s)
[QUOTE=preda;501123]Wow thanks! I'll look into this as soon as I can.[/QUOTE]
First attempt to build a Debian source package with [CODE]dpkg-source --build . [/CODE] |
[QUOTE=M344587487;500678]Not directly related to gpuowl but just a PSA to anyone trying the bleeding edge mainline kernels, ROCm fails to compile on 4.20-rc2 so don't bother trying an rc kernel until ROCm gets updated. It looks like some deprecated timing functions have been removed from the kernel at least and maybe some refactoring needs to be done, hopefully nothing major. If you don't spot the warning to check the error log it looks like ROCm installed but the HelloWorld test from README.md fails and gpuowl will fail as soon as it tries to call OpenCL. I don't think I installed the kernel incorrectly but if anyone is successfully using an rc kernel with ROCm please let me know.
[code]DKMS make.log for amdgpu-1.9-224 for kernel 4.20.0-042000rc2-generic (x86_64) Fri 16 Nov 13:48:45 GMT 2018 make: Entering directory '/usr/src/linux-headers-4.20.0-042000rc2-generic' Makefile:968: "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel" CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_drm.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/main.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/symbols.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_fence.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_fence_array.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_kthread.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_io.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_module.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_device.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_mn.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_reservation.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/lib/chash.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/scheduler/gpu_scheduler.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_memory.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/scheduler/sched_fence.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_kms.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_drm_global.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_bitmap.o LD [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/lib/amdchash.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_pci.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_atombios.o /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_device.c: In function ‘kgd2kfd_interrupt’: 
/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_device.c:708:2: warning: ISO C90 forbids variable length array ‘patched_ihre’ [-Wvla] uint32_t patched_ihre[DIV_ROUND_UP( ^~~~~~~~ CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_tt.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/kcl_prime.o /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.c: In function ‘kfd_ioctl_get_clock_counters’: /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.c:837:2: error: implicit declaration of function ‘getrawmonotonic64’; did you mean ‘getrawmonotonic’? [-Werror=implicit-function-declaration] getrawmonotonic64(&time); ^~~~~~~~~~~~~~~~~ getrawmonotonic CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_topology.o /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.c:840:2: error: implicit declaration of function ‘get_monotonic_boottime64’; did you mean ‘getboottime64’? [-Werror=implicit-function-declaration] get_monotonic_boottime64(&time); ^~~~~~~~~~~~~~~~~~~~~~~~ getboottime64 CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_pasid.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_util.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_doorbell.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_module.o /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.c: In function ‘amdgpu_pmops_runtime_suspend’: /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.c:768:2: error: implicit declaration of function ‘vga_switcheroo_set_dynamic_switch’; did you mean ‘vga_switcheroo_process_delayed_switch’? 
[-Werror=implicit-function-declaration] vga_switcheroo_set_dynamic_switch(pdev, VGA_SWITCHEROO_OFF); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ vga_switcheroo_process_delayed_switch CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_flat_memory.o LD [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkcl/amdkcl.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/atombios_crtc.o cc1: some warnings being treated as errors scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.o' failed make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_drv.o] Error 1 make[2]: *** Waiting for unfinished jobs.... CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_process.o LD [M] /var/lib/dkms/amdgpu/1.9-224/build/scheduler/amd-sched.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_object.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_queue.o /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_get_pcie_info’: /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3350:9: error: implicit declaration of function ‘drm_pcie_get_speed_cap_mask’; did you mean ‘pcie_get_speed_cap’? [-Werror=implicit-function-declaration] ret = drm_pcie_get_speed_cap_mask(adev->ddev, &mask); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ pcie_get_speed_cap /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3356:15: error: ‘DRM_PCIE_SPEED_25’ undeclared (first use in this function); did you mean ‘PCIE_SPEED_2_5GT’? if (mask & DRM_PCIE_SPEED_25) ^~~~~~~~~~~~~~~~~ PCIE_SPEED_2_5GT /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3356:15: note: each undeclared identifier is reported only once for each function it appears in /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3358:15: error: ‘DRM_PCIE_SPEED_50’ undeclared (first use in this function); did you mean ‘DRM_PCIE_SPEED_25’? 
if (mask & DRM_PCIE_SPEED_50) ^~~~~~~~~~~~~~~~~ DRM_PCIE_SPEED_25 /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3360:15: error: ‘DRM_PCIE_SPEED_80’ undeclared (first use in this function); did you mean ‘DRM_PCIE_SPEED_50’? if (mask & DRM_PCIE_SPEED_80) ^~~~~~~~~~~~~~~~~ DRM_PCIE_SPEED_50 CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_lock.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager.o /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.c: In function ‘ttm_bo_vm_fault’: /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.c:314:10: error: implicit declaration of function ‘vm_insert_mixed’; did you mean ‘vmf_insert_mixed’? [-Werror=implicit-function-declaration] ret = vm_insert_mixed(&cvma, address, ^~~~~~~~~~~~~~~ vmf_insert_mixed /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.c:324:10: error: implicit declaration of function ‘vm_insert_pfn’; did you mean ‘vmf_insert_pfn’? [-Werror=implicit-function-declaration] ret = vm_insert_pfn(&cvma, address, pfn); ^~~~~~~~~~~~~ vmf_insert_pfn /var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.c:3367:9: error: implicit declaration of function ‘drm_pcie_get_max_link_width’; did you mean ‘drm_dp_max_link_rate’? 
[-Werror=implicit-function-declaration] ret = drm_pcie_get_max_link_width(adev->ddev, &mask); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ drm_dp_max_link_rate CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_execbuf_util.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager_cik.o cc1: some warnings being treated as errors scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.o' failed make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu/amdgpu_device.o] Error 1 CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_page_alloc.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_manager.o cc1: some warnings being treated as errors scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.o' failed make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/ttm/ttm_bo_vm.o] Error 1 make[2]: *** Waiting for unfinished jobs.... CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager_vi.o CC [M] /var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_mqd_manager_v9.o cc1: some warnings being treated as errors scripts/Makefile.build:293: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.o' failed make[2]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd/kfd_chardev.o] Error 1 make[2]: *** Waiting for unfinished jobs.... scripts/Makefile.build:518: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu' failed make[1]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdgpu] Error 2 make[1]: *** Waiting for unfinished jobs.... 
scripts/Makefile.build:518: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd' failed make[1]: *** [/var/lib/dkms/amdgpu/1.9-224/build/amd/amdkfd] Error 2 scripts/Makefile.build:518: recipe for target '/var/lib/dkms/amdgpu/1.9-224/build/ttm' failed make[1]: *** [/var/lib/dkms/amdgpu/1.9-224/build/ttm] Error 2 Makefile:1565: recipe for target '_module_/var/lib/dkms/amdgpu/1.9-224/build' failed make: *** [_module_/var/lib/dkms/amdgpu/1.9-224/build] Error 2 make: Leaving directory '/usr/src/linux-headers-4.20.0-042000rc2-generic'[/code][/QUOTE]
- Reworked STIBP Code Lands In Linux 4.20 To Fix The Performance [url]https://www.phoronix.com/scan.php?page=news_item&px=Fixed-STIBP-Lands-In-Linux-4.20[/url]
- Benchmarking The Work-In-Progress Spectre/STIBP Code On The Way For Linux 4.20 [url]https://www.phoronix.com/scan.php?page=article&item=linux-420wip-stibp&num=1[/url]
With this in mind, I am keeping an eye on ROCm updates. |
First P-1 factor found by GpuOwl
I think this is a first for GpuOwl, a genuine factor found during PRP-1.
[QUOTE]
2018-12-04 05:49:48 vega0 89625761 OK 9120000 10.18%; 2.41 ms/sq, 749 MULs; ETA 2d 10:40; 36dad77fac83843a (check 1.92s)
2018-12-04 05:50:14 vega0 89625761 9130000 10.19%; 2.41 ms/sq, 709 MULs; ETA 2d 10:32; 44880cedb4ce3506
2018-12-04 05:50:40 vega0 89625761 9140000 10.20%; 2.41 ms/sq, 736 MULs; ETA 2d 10:35; 131ff9126585b1de
2018-12-04 05:50:45 vega0 89625761 GCD 353351713683290534092214911 (56.55s)
2018-12-04 05:50:45 vega0 {"exponent":"89625761", "worktype":"PRP,P-1", "status":"F", "program":{"name":"gpuowl", "version":"5.0"}, "timestamp":"2018-12-03 18:50:45 UTC", "user":"preda", "computer":"vega0", "aid":"redacted", "fft-length":5242880, "factors":["353351713683290534092214911"], "b2":"40000000", "base":{"b1":"1000000", "bias":{"2":19}, "res64":"ed2da130bd266660"}}
[/QUOTE]
About 10% of the way through the PRP of 89625761, it found a factor! This factor is about 88.19 bits in size, and can be found by P-1 with B1=234383 B2=17323057. |
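For anyone who wants to sanity-check the factor reported above for 89625761, here is a minimal Python sketch. It is not GpuOwl code; the function names are made up for illustration. It relies only on standard facts: f divides 2^p - 1 exactly when 2^p mod f is 1, and any factor of a Mersenne number with prime exponent p has the form 2kp + 1. P-1 finds f when k has all prime factors up to B1, except possibly one up to B2, so the sketch also pulls the small prime factors out of k (trial division is capped at 1,000,000 so it always finishes quickly).

```python
import math

P = 89625761
FACTOR = 353351713683290534092214911  # copied from the GCD line in the log above


def divides_mersenne(p, f):
    """True if f divides 2^p - 1, i.e. 2^p is congruent to 1 modulo f."""
    return pow(2, p, f) == 1


def size_in_bits(f):
    """Size of f expressed as log2(f), the way factor sizes are usually quoted."""
    return math.log2(f)


def small_factors(n, limit=1_000_000):
    """Trial-divide n by every d <= limit; return (prime factors found, cofactor)."""
    found = []
    d = 2
    while d <= limit and d * d <= n:
        while n % d == 0:
            found.append(d)
            n //= d
        d += 1
    return found, n


if __name__ == "__main__":
    print("divides M(p):", divides_mersenne(P, FACTOR))
    print("size: %.2f bits" % size_in_bits(FACTOR))
    k = (FACTOR - 1) // (2 * P)
    primes, cofactor = small_factors(k)
    # The largest entry of `primes` and the cofactor can be compared by eye
    # with the B1 and B2 values quoted in the post.
    print("k =", k, "small primes:", primes, "cofactor:", cofactor)
```

Running it prints whether the factor divides M(89625761), its size in bits, and the pieces of k to compare against the quoted B1 and B2.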
[QUOTE=preda;501612]I think this is a first for GpuOwl, a genuine factor found during PRP-1.
At 10% of PRP along 89625761 it found a factor! This factor is about 88.19bits in size, and can be found by P-1 with B1=234383 B2=17323057.[/QUOTE] Yay, saved almost 90% of the PRP run time (and 100% of any future PRP DC). The server's parsing of the result looks incomplete to me: [URL]https://www.mersenne.org/report_exponent/?exp_lo=89625761&exp_hi=&full=1[/URL] shows "Factor:" as result for that, while an ordinary P-1 factor shows up as "Factor: 3326174307660372811303879 / (P-1, B1=730000, B2=14782500, E=12)" for [URL]https://www.mersenne.org/report_exponent/?exp_lo=89694721&full=1[/URL] |
[QUOTE=kriesel;501615]Yay, saved almost 90% of the PRP run time (and 100% of any future PRP DC).
The server's parsing of the result looks incomplete to me: [URL]https://www.mersenne.org/report_exponent/?exp_lo=89625761&exp_hi=&full=1[/URL] shows "Factor:" as result for that, while an ordinary P-1 factor shows up as "Factor: 3326174307660372811303879 / (P-1, B1=730000, B2=14782500, E=12)" for [URL]https://www.mersenne.org/report_exponent/?exp_lo=89694721&full=1[/URL][/QUOTE] Yes, James is aware of the parsing problem, will be fixed. |
1 Attachment(s)
[QUOTE=SELROC;501127]First attempt to build a Debian source package with
[CODE]dpkg-source --build . [/CODE][/QUOTE] second attempt to build a Debian source package, with upstream build system "pbuilder". |
[QUOTE=SELROC;501648]second attempt to build a Debian source package, with upstream build system "pbuilder".[/QUOTE]
Debian Request for Sponsorship and bug filed: [url]https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=915704[/url] |
[QUOTE=SELROC;501844]Debian Request for Sponsorship and bug filed:
[URL]https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=915704[/URL][/QUOTE] But I went ahead and uploaded the package myself: [url]https://mentors.debian.net/package/gpuowl[/url] I am now a Debian maintainer |
[QUOTE=SELROC;501886]But I went ahead and uploaded the package myself:
[url]https://mentors.debian.net/package/gpuowl[/url] I am now a Debian maintainer[/QUOTE] Cool, congrats! Let me know if there's anything I can help with. And thanks! |
[QUOTE=preda;501964]Cool, congrats! Let me know if there's anything I can help with. And thanks![/QUOTE]
I don't know if you are "back to home" :-)
I have submitted a pull request for gpuowl that includes all the files needed for Debian.
- Important for you: you need to run cppcheck against the gpuowl source
- Important for me: I need to run Lintian against the gpuowl package |
[QUOTE=SELROC;501965]I don't know if you are "back to home" :-)
I have submitted a pull request for gpuowl that includes all the files needed for debian. - Important for you: you need to run cppcheck against gpuowl source - Important for me: I need to run Lintian against the gpuowl package[/QUOTE] Hi Valerio, I just moved into a new home, and I did turn on the desktop already :) I ran cppcheck and fixed everything except a few "style" issues. |
[QUOTE=preda;502011]Hi Valerio, I just moved into a new home, and I did turn on the desktop already :)
I ran cppcheck and fixed everything except a few "style" issues.[/QUOTE] That's cool, I hope you enjoy your new home. For the moment I am going to sleep, so I will update the package tomorrow. |
[QUOTE=preda;502011]Hi Valerio, I just moved into a new home, and I did turn on the desktop already :)
I ran cppcheck and fixed everything except a few "style" issues.[/QUOTE] Hi, [url]https://github.com/preda/gpuowl/issues/20#issuecomment-445437109[/url] |
[QUOTE=SELROC;502012]That's cool, I hope you enjoy your new home. For the moment I am going to sleep, so I will update the package tomorrow.[/QUOTE]
Hi, look here, [url]https://mentors.debian.net/[/url] |
missing newline case
In v5.0-5c13870, at least if doing PRP (no P-1), result lines output to the results.txt file seem to be lacking the trailing \n and so append onto a single run-on record.
|
[QUOTE=kriesel;502358]In v5.0-5c13870, at least if doing PRP (no P-1), result lines output to the results.txt file seem to be lacking the trailing \n and so append onto a single run-on record.[/QUOTE]
By looking at the source, it seems to me that at least one '\n' should be there. It is possible that this is not working correctly on Windows, where the editor may expect \r\n (depending on editor). Are you sure there is not any '\n'? And, interestingly, I couldn't find commit 5c13870, I don't know why.. |
[QUOTE=preda;502368]By looking at the source, it seems to me that at least one '\n' should be there. It is possible that this is not working correctly on Windows, where the editor may expect \r\n (depending on editor). Are you sure there is not any '\n'?
And, interestingly, I couldn't find commit 5c13870, I don't know why..[/QUOTE] That's a problem at least a decade old, maybe older: the translation of end-of-line characters between Windows and Linux. The standard Windows editor Notepad does not understand the difference, so paragraphs appear as one long continuous line. One editor that understands the difference is Notepad++. |
[QUOTE=preda;502368]By looking at the source, it seems to me that at least one '\n' should be there. It is possible that this is not working correctly on Windows, where the editor may expect \r\n (depending on editor). Are you sure there is not any '\n'?
And, interestingly, I couldn't find commit 5c13870, I don't know why..[/QUOTE] I cannot find it either... |
[QUOTE=preda;502368]By looking at the source, it seems to me that at least one '\n' should be there. It is possible that this is not working correctly on Windows, where the editor may expect \r\n (depending on editor). Are you sure there is not any '\n'?
And, interestingly, I couldn't find commit 5c13870, I don't know why..[/QUOTE] Sorry; [B]9[/B]c13870. Observed on Windows, with default editor Notepad; Wordpad finds a separator. Free hex editor Frhed shows hex 0a (LF). |
[QUOTE=kriesel;502393]Sorry; [B]9[/B]c13870.
Observed on Windows, with default editor Notepad; Wordpad finds a separator. Free hex editor Frhed shows hex 0a (LF).[/QUOTE] LF is \n = Line Feed aka newline CR is \r = Carriage Return [url]https://en.wikipedia.org/wiki/Newline#History[/url] |
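To make the LF versus CR LF point concrete, here is a small Python sketch (the helper names are hypothetical, and this is not part of GpuOwl) that counts which line endings a file's bytes contain and converts bare LF to CR LF. It reports the same thing the hex editor showed: whether the separators are 0a alone or 0d 0a.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LF and bare CR in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,   # CR not followed by LF
    }


def to_crlf(data: bytes) -> bytes:
    """Convert every line ending to CRLF without doubling existing CRLF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


if __name__ == "__main__":
    # Read in binary mode so Python does not translate newlines itself.
    with open("results.txt", "rb") as fh:
        print(count_line_endings(fh.read()))
```

A results.txt written with bare "\n" would show a nonzero LF count and zero CRLF, which is consistent with Notepad displaying it as one run-on record while Wordpad still finds a separator.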
All known Mersenne primes Mp above p=132049 were run on gpuowl v5.0-9c13870, and run times tabulated and graphed. Fit for run time at exponents near the current wavefront is p[SUP]1.82[/SUP], a lower power than expected from asymptotic fft multiplication time n ln n ln ln n times number of iterations n. See [url]https://www.mersenneforum.org/showpost.php?p=502776&postcount=10[/url]
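For comparison with the fitted p[SUP]1.82[/SUP], here is a rough Python sketch of the exponent the asymptotic model would give locally. It assumes the FFT length is proportional to p, so total time goes as p * (p ln p ln ln p); the log-log slope of that expression is 2 + 1/ln p + 1/(ln p * ln ln p). The function name is made up for illustration.

```python
import math


def local_exponent(p):
    """Log-log slope of T(p) = p^2 * ln(p) * ln(ln(p)) at exponent p."""
    ln_p = math.log(p)
    return 2 + 1 / ln_p + 1 / (ln_p * math.log(ln_p))


if __name__ == "__main__":
    for p in (10_000_000, 89_000_000, 332_000_000):
        print(p, round(local_exponent(p), 3))
```

Near p = 89M the model's local slope is a little above 2.07, so the measured 1.82 is noticeably lower than the asymptotic expectation.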
|
[QUOTE=kriesel;504039]All known Mersenne primes Mp above p=132049 were run on gpuowl v5.0-9c13870, and run times tabulated and graphed. Fit for run time at exponents near the current wavefront is p[SUP]1.82[/SUP], a lower power than expected from asymptotic fft multiplication time n ln n ln ln n times number of iterations n. See [url]https://www.mersenneforum.org/showpost.php?p=502776&postcount=10[/url][/QUOTE]
Thank you Ken! I know I didn't get around to applying some of the changes you suggested -- I've been busy with other external things recently. But I still plan to follow through when I get a moment. From what I remember:
- documentation
- better way to specify FFT size
- small makefile changes to better support Windows or MinGW
- display FFT list on -h without starting
and probably a few more. |
[QUOTE=preda;504043]Thank you Ken!
I know I didn't get around to apply some of the changes you suggested -- I've been busy with other external things recently. But I still plan to follow through when I get a moment. From what I remember: - documentation - better way to specify FFT size - makefile small changes to better support windows or mingw - display FFT list on -h without starting and probably a few more.[/QUOTE]You're welcome. The outcome of no false negatives and no logged errors was encouraging. I would add conditional compilation, so \r\n could be there for log and result entries for Windows builds. Beyond that, I'd need to refer to the equivalent of [URL]https://www.mersenneforum.org/showpost.php?p=488537&postcount=3[/URL] myself. A quick look indicates it's gotten multiple versions and months out of date. So see also forum thread content. Something I've considered lately is making a table of interim residues, perhaps approx half way through, per known Mp. I'm keeping the gpuowl logs indefinitely. I have a PRP-1 running on a current wavefront exponent in 5.0-9c13870 currently; ETA late Dec 30. After that I'll probably switch the RX480 back to V3.8 and production PRP. |
[QUOTE=preda;504043]Thank you Ken!
I know I didn't get around to apply some of the changes you suggested -- I've been busy with other external things recently. But I still plan to follow through when I get a moment. From what I remember: - documentation - better way to specify FFT size - makefile small changes to better support windows or mingw - display FFT list on -h without starting and probably a few more.[/QUOTE] I've updated the spreadsheet and posted an updated pdf at [URL]https://www.mersenneforum.org/showpost.php?p=488537&postcount=3[/URL] Happy (and perhaps busy) New Year! |
GpuOwl simple primenet integration
I added a primenet.py script to GpuOwl, inspired from the mlucas-primenet.py of Mlucas.
[url]https://github.com/preda/gpuowl/blob/master/primenet.py[/url]
It's a 100-line Python script intended to run in the background. It expects to run in the same folder as a GpuOwl instance, and works with these files in that folder:
- results.txt: reads new results from this file
- sent.txt: appends processed results here
- retry.txt: appends results that were not accepted by the server here
- worktodo.txt: reads this file to see how many assignments are active. If there are fewer than 2 assignments (i.e. the current and the next), retrieves and appends a new assignment.

How to run, with different ways of passing in the primenet password:
cd gpuowl
./primenet.py -u user -p password
echo password | ./primenet.py -u user
./primenet.py -u user < pass.txt

By default it wakes up every hour to check whether there is work to do. Work can be:
- a new result is present in results.txt that was not previously sent to the server.
- the number of tasks in worktodo.txt dropped below 2, so new tasks need fetching. |
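The bookkeeping described above for primenet.py can be sketched in a few lines of Python. This is not the actual script: the function names are hypothetical, the network calls are left out entirely, and treating lines already in retry.txt as "handled" is an assumption made for this sketch.

```python
def unsent_results(results, sent, retry=()):
    """Results present in results.txt but not yet recorded in sent.txt.

    Assumption: a line already in retry.txt is also treated as handled.
    """
    handled = {line.strip() for line in sent} | {line.strip() for line in retry}
    return [line.strip() for line in results
            if line.strip() and line.strip() not in handled]


def tasks_to_fetch(worktodo, wanted=2):
    """How many assignments to request so that `wanted` tasks are queued."""
    active = sum(1 for line in worktodo if line.strip())
    return max(0, wanted - active)


def read_lines(path):
    """Read a text file as a list of lines; a missing file counts as empty."""
    try:
        with open(path) as fh:
            return fh.readlines()
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    todo = unsent_results(read_lines("results.txt"),
                          read_lines("sent.txt"),
                          read_lines("retry.txt"))
    print("results to send:", len(todo))
    print("assignments to fetch:", tasks_to_fetch(read_lines("worktodo.txt")))
```

A background loop would run this check once per wake-up (hourly by default, per the post), send each unsent result, and append it to sent.txt or retry.txt depending on the server's answer.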
GpuOwl v6
In recent updates to GpuOwl I dropped the PRP-1 feature (which allowed doing a normal P-1 first stage before the PRP, and a P-1 second stage in parallel with the PRP).
This was because:
- most exponents at the wavefront (all?) already have TF to high bits and some P-1 already done.
- even without any P-1 done, the rate of factors found by PRP-1 is not huge, let's say about 5% depending on the level of TF
Thus the benefit of PRP-1 was marginal. To make good use of it, it would need to be run on larger exponents (not at the wavefront) that have no P-1 and lower TF.
Dropping PRP-1 removes the dependency on the GMP library.
If anybody has PRP-1 work ongoing (i.e. with B1 != 0), they should finish it before upgrading because GpuOwl v6 can't do PRP-1. |
[QUOTE=preda;504858]- even without any P-1 done, the rate of factors found by PRP-1 is not huge, let's say about 5% depending on the level of TF[/QUOTE]
But this is certainly no worse than the usual rate for finding factors by ordinary P−1. Remember that as you increase bit-length (for TF) and non-smoothness of k (for P−1), the difficulty goes up exponentially. So there is a relatively narrow transition zone between trivially easy factors that were found long ago and impossibly hard factors that we'll never find. Maybe you got discouraged too easily, and should reconsider. |
[QUOTE=GP2;504861]But this is certainly no worse than the usual rate for finding factors by ordinary P−1.
Remember that as you increase bit-length (for TF) and non-smoothness of k (for P−1), the difficulty goes up exponentially. So there is a relatively narrow transition zone between trivially easy factors that were found long ago and impossibly hard factors that we'll never find. Maybe you got discouraged too easily, and should reconsider.[/QUOTE] Yes, I'm open to reconsider. But, P-1 can and is done anyway. It's done outside of GpuOwl. I think it's most commonly done on CPUs. So not having the P-1 in gpuowl does not mean it does not get done, only that it's done in a different way. |
[QUOTE=preda;504865]Yes, I'm open to reconsider.
But, P-1 can and is done anyway. It's done outside of GpuOwl. I think it's most commonly done on CPUs. So not having the P-1 in gpuowl does not mean it does not get done, only that it's done in a different way.[/QUOTE] A similar argument could be applied to PRP. That wouldn't leave much for gpuOwl to do. gpuowl v4-5 is the only route I know of to do P-1 useful for GIMPS on OpenCl. For NVIDIA there's CUDAPm1, and on Intel cpus, prime95/mprime. |
@Mihai, you could split it in two, if it is not difficult for you to maintain both applications, or keep it as an option/command line switch. GPU P-1 is faster than CPU and some may prefer to do it separately. I use both cudaLucas and cudaPM1, and I am happy with the fact that they are separate applications.
This is by no means intended to push you; it is just that I would feel kinda sorry to see you abandoning the P-1 stuff... In fact it seems that you are one of the very few OpenCL "experts" remaining here, and I would be quite happy to see you taking over mfakto too :razz: (especially now, with Bdot seemingly abandoning it; see the discussion in the mfakto thread). (hehe, you know that saying, "beat the horse that pulls the hardest"?) |
I think we are up to speed with gpu manual testing as primenet.py works pretty well.
|
[QUOTE=LaurV;504885]@Mihai, you could split it in two, if not difficult for you to maintain both applications, or keep it as an option/command line switch. GPU P-1 is faster than CPU and some may prefer to do it separate. I use both cudaLucas and cudaPM1, and I am happy of the fact that they are separate applications.
This is by no means intended to push you; it is just that I would feel kinda sorry to see you abandoning the P-1 stuff... In fact it seems that you are one of the very few OpenCL "experts" remaining here, and I would be quite happy to see you taking over mfakto too :razz: (especially now, with Bdot seemingly abandoning it; see the discussion in the mfakto thread). (hehe, you know that saying, "beat the horse that pulls the hardest"?)[/QUOTE] :)
OK, I would like to have a standalone OpenCL P-1 tester. The problem is that while first-stage P-1 is simple, and I've implemented it already, second-stage P-1 is more complex and I'd need to learn how to do it before I can implement it.

PRP-1 was not a good replacement for P-1, because it did a full, slow PRP even if somebody only wanted a quick P-1. PRP-1 was very good for this use case: somebody actually wanted to do the PRP of an exponent without any P-1 done previously.

Brainstorming, I'm thinking of this: I could do a stand-alone "variant" P-1 that uses the ideas from PRP-1. The first stage is always the same between the two (classic P-1 and "variant P-1"). The difference is in the second stage. The output of the "variant P-1" would be either a factor found (rarely, 5%), or a pair (base, residue) that can be continued into a PRP.

Let's take an example: a 90M exponent, without any P-1 done. Let's say somebody wants to do P-1(B1=1M, B2=20M) on it. Doing it the "variant P-1" way would require:
- about 1.44M squarings for stage 1 (same as classic P-1, ~ B1 * 1.44)
- about (20M squarings + about 1M muls) for stage 2 to B2=20M done the "PRP-1 way" (this, I think, is competitive with classic P-1, although the difference (in stage 2) is small)

Now, for a no-factor-found P-1, the gain might come from saving the final residue of the "variant P-1" and, when a PRP is desired in the future for the same exponent, starting it from there. For the example, the PRP would need only 70M (90 - 20) iterations instead of 90M.

This may be a gain, but it puts a lot of burden on the server, which would need to save the full residue of the P-1 to allow a PRP to continue from there. Thus, there is storage and implementation work to do. |
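The iteration counts in the "variant P-1" example above can be tallied with a short Python sketch. It only restates the post's own estimates (stage 1 at about 1.44 * B1 squarings, stage 2 at B2 squarings plus roughly 1M multiplications, and a follow-up PRP shortened by B2 iterations); the function name and dictionary keys are made up for illustration. The 1.44 factor is log2(e): the product of all prime powers up to B1 has about B1 * 1.44 bits.

```python
import math


def variant_pm1_cost(exponent, b1, b2, stage2_muls=1_000_000):
    """Rough operation counts for the 'variant P-1' idea, per the post's estimates."""
    stage1_squarings = round(math.log2(math.e) * b1)  # ~1.44 * B1
    stage2_squarings = b2                             # stage 2 done the "PRP-1 way"
    remaining_prp = exponent - b2                     # PRP continued from the saved residue
    return {
        "stage1_squarings": stage1_squarings,
        "stage2_squarings": stage2_squarings,
        "stage2_muls": stage2_muls,
        "remaining_prp_iterations": remaining_prp,
        "prp_fraction_saved": b2 / exponent,
    }


if __name__ == "__main__":
    for name, value in variant_pm1_cost(90_000_000, 1_000_000, 20_000_000).items():
        print(name, value)
```

For the 90M example this gives about 1.44M stage-1 squarings and 70M remaining PRP iterations, i.e. roughly 22% of a later PRP would already be done.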
[QUOTE=preda;504907]:)
OK, I would like to have a standalone OpenCL P-1 tester. The problem is that while first-stage P-1 is simple, and I've implemented it already, second-stage of P-1 is more complex and I'd need to learn how to do it before I can implement it. PRP-1 was not a good replacement for P-1, because it was doing a full slow PRP even if somebody only wanted a quick P-1. PRP-1 was very good for this use-case: somebody actually wanted to do the PRP of an exponent without any P-1 done previously. Brainstorming, I'm thinking of this: I could do a stand-alone "variant" P-1 that uses the ideas from PRP-1. The first stage P-1 is always the same between the two (classic P-1 and "variant P-1"). The difference is in the second stage. The output of the "variant P-1" would be either a factor found (rarely, 5%), or a pair (base, residue) that can be continued into a PRP. Let's take an example: for an 90M exponent, without any P-1 done. Let's say somebody wants to do P-1(B1=1M, B2=20M) on it. Doing it in the "variant P-1" way, it would require: - about 1.44M squarings for stage-1 (same as classic P-1, ~ B1 * 1.44) - about (20M squaring + about 1M muls) for stage 2 to B2=20M done "PRP-1 way" (this, I think, is competitive with classic P-1, although the difference (in stage 2) is small) Now, for a no-factor-found P-1, the gain might come by saving the final residue of the "variant P-1" and, when a PRP is desired in the future for the same exponent, start it from there. For the example, the PRP instead of doing 90M iteration would only need 70M (90 - 20) iterations. This may be a gain, but it puts a lot of burden on the server, who would need to save full residue of the P-1 to allow to continue a PRP from there. Thus, storage and implementation to do.[/QUOTE] In the CUDAPm1 source code, there's a notation for where "stage 3" would be added. This is extension of stage 1 to higher bounds than initially run. By analogy, a "stage 4" extension of B2 could also be possible. 
Conventional P-1 does a gcd at the end of stage one and another at the end of stage 2. There are other possibilities. One could go ~halfway through stage one and do a gcd. If a factor is found, early exit. If not, continue to the full B1 and gcd again. Same for stage 2.

As I recall prime95 offers the option to save locally the P-1 full residue files for extension to higher bounds, perhaps by some other program. In the case where an OpenCL P-1-only program exists, but the server does not (yet?) accept the P-1 residue files, or the user hasn't the bandwidth to upload them (still stuck at 128 kbit/sec here, fiber is probably another year out), and primality testing does not make any further use of those computed powers of 3, nor does most extension of P-1 bounds higher, that's about the same as the current prime95 and CUDAPm1 and CUDALucas status quo and typical usage.

Saving the full-residue files should be optional. Those who want to keep them for backups, proof of work, debugging, bounds extension, or for use by other programs can. Those who just want P-1 factoring answers and conserved disk space will probably be the majority. Volunteers working to factor further can share the P-1 full residue files among themselves without necessarily involving the primenet server. (Google drive?)

Making double use of the powering of 3 as in PRP-1 was very creative, and that efficiency is desirable, but it is not an immediate or global requirement. P-1 code that "merely" does P-1 efficiently and reliably on OpenCL is a good thing, an advance from the status quo. Any performance advantage, reusability of the full P-1 residue, etc, is a bonus.

There are possibilities for using the Jacobi symbol as error checking, which to my knowledge no other P-1 code has. It may be too expensive computationally to be a net productivity gain, but might be useful as a user controllable option. People sometimes get no factors for a long time and wonder if something is wrong.
Nobody knows what the error rate of current or past P-1 factoring is. I suppose it could be estimated by linear interpolation from primality test run times assuming errors occur at a rate over time; ~2% x 2.5% = ~0.05%. I think the complexities of CUDAPm1 make its error rate higher; the bug list is nontrivial. |
[QUOTE=preda;504858]In recent updates to GpuOwl I dropped the PRP-1 feature (which allowed to do a normal P-1 first-stage before the PRP, and a P-1 second-stage in parallel with the PRP).
This was because: - most exponents at the wavefront (all?) already have TF to high bits and some P-1 already done. - even without any P-1 done, the rate of factors found by PRP-1 is not huge, let's say about 5% depending on the level of TF Thus the benefit of PRP-1 was marginal. To make useful use of it, it would need to be run on larger exponents (not at wavefront), that have no P-1 and lower TF. Dropping PRP-1 removes the dependency on the GMP library. If anybody has PRP-1 work ongoing (i.e. with B1 != 0), they should finish it before upgrading because GpuOwl v6 can't do PRP-1.[/QUOTE] How does V6 compare in PRP speed to previous versions? Any data, any gpu model, any OS, any fft length, anyone? |
[QUOTE=kriesel;504932]How does V6 compare in PRP speed to previous versions? Any data, any gpu model, any OS, any fft length, anyone?[/QUOTE]
Should be exactly the same speed... |
-time crashes on Windows after collecting data (slowly)
Ran v3.8 on RX480 in Windows 7, AMD Adrenalin driver 18.10.2, with the -time option. The GHzD/day figures and GPU utilization are quite low compared to running without the -time option (~76 GHzD/day). The following is the output from two sessions, both of which terminated in application crashes. Next up, a try of the equivalent in v5.[CODE]2019-01-04 15:32:48 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa
2019-01-04 15:32:48 condorella-rx480 FFT 9216K: Width 1024 (256x4), Height 512 (64x8), Middle 9; 16.27 bits/word 2019-01-04 15:32:48 condorella-rx480 Note: using short carry kernels 2019-01-04 15:32:51 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics 2019-01-04 15:32:55 condorella-rx480 OpenCL compilation in 3921 ms, with "-DEXP=153500033u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 " 2019-01-04 15:32:56 condorella-rx480 PRP M(153500033), FFT 9216K, 16.27 bits/word, 1045 GHz-day 2019-01-04 15:33:55 condorella-rx480 OK loaded: 146771000/153500033, blockSize 500, 6d17e08673529e2f 2019-01-04 15:34:14 condorella-rx480 OK initial check: 6d17e08673529e2f 2019-01-04 15:35:24 condorella-rx480 OK 146772000/153500033 [95.62%], 36.65 ms/it [35.55, 37.74] (16.0 GHz-day/day); ETA 2d 20:30; 8daf006dbb9b251c (check 19.69s) (saved) 2019-01-04 15:35:24 condorella-rx480 15.2% tailFused : 2605 [ 1999, 5101] us/call x 2499 calls 2019-01-04 15:35:24 condorella-rx480 15.1% carryFused : 3225 [ 2686, 7314] us/call x 1996 calls 2019-01-04 15:35:24 condorella-rx480 13.2% transposeW : 1613 [ 999, 4683] us/call x 3503 calls 2019-01-04 15:35:24 condorella-rx480 12.7% fftMiddleIn : 1552 [ 991, 4890] us/call x 3503 calls 2019-01-04 15:35:24 condorella-rx480 11.3% fftMiddleOut : 1608 [ 999, 5096] us/call x 3001 calls 2019-01-04 15:35:24 condorella-rx480 11.2% transposeH : 1590 [ 995, 4536] us/call x 3001 calls 2019-01-04 15:35:24 condorella-rx480 7.0% fftP : 1980 [ 1000, 4885] us/call x 1507 calls 2019-01-04 15:35:24 condorella-rx480 4.3% carryA : 1813 [ 1000, 4295] us/call x 1002 calls 2019-01-04 15:35:24 condorella-rx480 4.1% mulFused : 3452 [ 3000, 6524] us/call x 502 calls 2019-01-04 15:35:24 condorella-rx480 3.8% fftW : 1607 [ 993, 4640] us/call x 1005 calls 2019-01-04 15:35:24 condorella-rx480 2.1% carryB : 887 [ 0, 3332] us/call x 1005 calls 2019-01-04 15:35:24 condorella-rx480 2019-01-04 15:40:12 condorella-rx480 
146780000/153500033 [95.62%], 35.99 ms/it [31.15, 37.90] (16.3 GHz-day/day); ETA 2d 19:12; 3c480a9a9b10ec2a 2019-01-04 15:40:12 condorella-rx480 26.4% carryFused : 3242 [ 2680, 7270] us/call x 7984 calls 2019-01-04 15:40:12 condorella-rx480 21.0% tailFused : 2577 [ 1998, 5352] us/call x 8000 calls 2019-01-04 15:40:12 condorella-rx480 13.7% transposeW : 1672 [ 998, 4898] us/call x 8032 calls 2019-01-04 15:40:12 condorella-rx480 13.3% fftMiddleOut : 1625 [ 993, 5161] us/call x 8016 calls 2019-01-04 15:40:12 condorella-rx480 12.8% transposeH : 1572 [ 994, 4640] us/call x 8016 calls 2019-01-04 15:40:12 condorella-rx480 12.6% fftMiddleIn : 1538 [ 985, 5001] us/call x 8032 calls 2019-01-04 15:40:12 condorella-rx480 0.1% fftP : 2188 [ 1179, 9406] us/call x 48 calls 2019-01-04 15:49:23 condorella-rx480 gpuowl-OpenCL 3.8-91c52fa 2019-01-04 15:49:23 condorella-rx480 FFT 9216K: Width 1024 (256x4), Height 512 (64x8), Middle 9; 16.27 bits/word 2019-01-04 15:49:23 condorella-rx480 Note: using short carry kernels 2019-01-04 15:49:26 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics 2019-01-04 15:49:30 condorella-rx480 OpenCL compilation in 3897 ms, with "-DEXP=153500033u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. 
-cl-fast-relaxed-math -cl-std=CL2.0 " 2019-01-04 15:49:31 condorella-rx480 PRP M(153500033), FFT 9216K, 16.27 bits/word, 1045 GHz-day 2019-01-04 15:50:32 condorella-rx480 OK loaded: 146772000/153500033, blockSize 500, 8daf006dbb9b251c 2019-01-04 15:50:50 condorella-rx480 OK initial check: 8daf006dbb9b251c 2019-01-04 15:51:36 condorella-rx480 OK 146773000/153500033 [95.62%], 27.31 ms/it [18.28, 36.34] (21.5 GHz-day/day); ETA 2d 03:02; a98af4a2b6a710da (check 17.96s) (saved) 2019-01-04 15:51:36 condorella-rx480 15.0% tailFused : 2543 [ 1999, 5639] us/call x 2499 calls 2019-01-04 15:51:36 condorella-rx480 14.9% carryFused : 3159 [ 2624, 8031] us/call x 1996 calls 2019-01-04 15:51:36 condorella-rx480 13.2% transposeW : 1596 [ 996, 4933] us/call x 3503 calls 2019-01-04 15:51:36 condorella-rx480 12.5% fftMiddleIn : 1505 [ 986, 5055] us/call x 3503 calls 2019-01-04 15:51:36 condorella-rx480 11.4% fftMiddleOut : 1601 [ 993, 5165] us/call x 3001 calls 2019-01-04 15:51:36 condorella-rx480 10.8% transposeH : 1530 [ 990, 4827] us/call x 3001 calls 2019-01-04 15:51:36 condorella-rx480 7.1% fftP : 2003 [ 1162, 7116] us/call x 1507 calls 2019-01-04 15:51:36 condorella-rx480 4.7% carryA : 2005 [ 1197, 6959] us/call x 1002 calls 2019-01-04 15:51:36 condorella-rx480 4.2% mulFused : 3544 [ 2986, 7226] us/call x 502 calls 2019-01-04 15:51:36 condorella-rx480 4.0% fftW : 1667 [ 996, 5241] us/call x 1005 calls 2019-01-04 15:51:36 condorella-rx480 2.1% carryB : 874 [ 0, 3383] us/call x 1005 calls 2019-01-04 15:51:36 condorella-rx480 2019-01-04 15:55:50 condorella-rx480 146780000/153500033 [95.62%], 36.41 ms/it [33.72, 38.19] (16.2 GHz-day/day); ETA 2d 19:58; 3c480a9a9b10ec2a [/CODE] |
v5 -time output on RX480, 79M PRP DC, Win
This is a no P-1 run, just PRP. -time output does not handle the 0-calls cases as gracefully as it could. Probably an easy fix there. No crashes in 10 minutes. ETA with -time ~ 3 weeks, without it ~4 days. This is an assigned PRP DC. Unfortunately the first test was also with offset zero.[CODE]C:\msys64\home\ken\gpuowl-compile\v5.0-9c13870>openowl -time
2019-01-04 16:19:46 gpuowl 5.0-9c13870
2019-01-04 16:19:46 -time
2019-01-04 16:19:46 79055077 FFT 4608K: Width 256x4, Height 64x4, Middle 9; 16.75 bits/word
2019-01-04 16:19:46 using short carry kernels
2019-01-04 16:19:49 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2019-01-04 16:19:52 OpenCL compilation in 2627 ms, with "-DEXP=79055077u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2019-01-04 16:19:53 79055077.owl not found, starting from the beginning.
2019-01-04 16:20:40 79055077 OK 800 0.00%; 28.15 ms/sq, 0 MULs; ETA 25d 18:05; 3aa268928b9c975c (check 11.28s)
2019-01-04 16:20:41 nan% carryFused : 2446 us/call x 1568 calls
2019-01-04 16:20:41 nan% carryFusedMul : nan us/call x 0 calls
2019-01-04 16:20:41 nan% fftP : 1755 us/call x 98 calls
2019-01-04 16:20:41 nan% fftW : 953 us/call x 64 calls
2019-01-04 16:20:41 nan% fftH : 1340 us/call x 100 calls
2019-01-04 16:20:41 nan% fftMiddleIn : 1337 us/call x 1666 calls
2019-01-04 16:20:41 nan% fftMiddleOut : 1021 us/call x 1632 calls
2019-01-04 16:20:41 nan% carryA : 1781 us/call x 64 calls
2019-01-04 16:20:41 nan% carryM : nan us/call x 0 calls
2019-01-04 16:20:41 nan% carryB : 438 us/call x 64 calls
2019-01-04 16:20:41 nan% transposeW : 1053 us/call x 1666 calls
2019-01-04 16:20:41 nan% transposeH : 1600 us/call x 1632 calls
2019-01-04 16:20:41 nan% transposeIn : 1000 us/call x 4 calls
2019-01-04 16:20:41 nan% transposeOut : 0 us/call x 1 calls
2019-01-04 16:20:41 nan% square : nan us/call x 0 calls
2019-01-04 16:20:41 nan% multiply : 1939 us/call x 33 calls
2019-01-04 16:20:41 nan% multiplySub : nan us/call x 0 calls
2019-01-04 16:20:41 nan% tailFused : 2114 us/call x 1599 calls
2019-01-04 16:20:41 nan% readResidue : 1000 us/call x 2 calls
2019-01-04 16:20:41 nan% isNotZero : 9000 us/call x 1 calls
2019-01-04 16:20:41 nan% isEqual : 0 us/call x 1 calls
2019-01-04 16:20:41
2019-01-04 16:24:16 79055077 10000 0.01%; 23.40 ms/sq, 0 MULs; ETA 21d 09:47; fa9ad651bc910bc8
2019-01-04 16:24:16 nan% carryFused : 2100 us/call x 9177 calls
2019-01-04 16:24:16 nan% carryFusedMul : nan us/call x 0 calls
2019-01-04 16:24:16 nan% fftP : 1174 us/call x 69 calls
2019-01-04 16:24:16 nan% fftW : 848 us/call x 46 calls
2019-01-04 16:24:16 nan% fftH : 1130 us/call x 69 calls
2019-01-04 16:24:16 nan% fftMiddleIn : 1201 us/call x 9246 calls
2019-01-04 16:24:16 nan% fftMiddleOut : 960 us/call x 9223 calls
2019-01-04 16:24:16 nan% carryA : 1174 us/call x 46 calls
2019-01-04 16:24:16 nan% carryM : nan us/call x 0 calls
2019-01-04 16:24:16 nan% carryB : 435 us/call x 46 calls
2019-01-04 16:24:16 nan% transposeW : 983 us/call x 9246 calls
2019-01-04 16:24:16 nan% transposeH : 1342 us/call x 9223 calls
2019-01-04 16:24:16 nan% transposeIn : nan us/call x 0 calls
2019-01-04 16:24:16 nan% transposeOut : nan us/call x 0 calls
2019-01-04 16:24:16 nan% square : nan us/call x 0 calls
2019-01-04 16:24:16 nan% multiply : 2652 us/call x 23 calls
2019-01-04 16:24:16 nan% multiplySub : nan us/call x 0 calls
2019-01-04 16:24:16 nan% tailFused : 1850 us/call x 9200 calls
2019-01-04 16:24:16 nan% readResidue : 0 us/call x 1 calls
2019-01-04 16:24:16 nan% isNotZero : nan us/call x 0 calls
2019-01-04 16:24:16 nan% isEqual : nan us/call x 0 calls
[/CODE] |
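On the "0-calls cases" in the -time output above: the post does not say where the nan values come from, but a division by a zero call count (or a zero total) is a plausible source. The Python sketch below shows one way a profile summary could guard against that. It is an illustration only; `profile_report` and its input layout are hypothetical and do not reflect gpuowl's actual code.

```python
def profile_report(kernels):
    """Format per-kernel timings; kernels is a list of (name, total_us, calls)."""
    grand_total = sum(total for _, total, _ in kernels)
    lines = []
    # Largest contributors first, as in the v3.8 log format.
    for name, total, calls in sorted(kernels, key=lambda k: -k[1]):
        if calls == 0:
            continue  # nothing was measured, so there is no meaningful average
        share = 100.0 * total / grand_total if grand_total else 0.0
        lines.append(f"{share:5.1f}% {name} : {total // calls} us/call x {calls} calls")
    return lines


# Totals approximated as us/call * calls, using three rows from the log above.
sample = [
    ("carryFused", 2100 * 9177, 9177),
    ("carryM", 0, 0),
    ("tailFused", 1850 * 9200, 9200),
]
for line in profile_report(sample):
    print(line)
```

Skipping zero-call kernels also shortens the report, which helps when the summary is printed at every log interval.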
I don't know much about OpenCL, but according to the Wikipedia page, it can be used not just on GPUs but also on other hardware such as FPGAs.
I don't know much about FPGAs either, but they are available for rent by the hour on AWS cloud. Specifically, Xilinx Virtex UltraScale+ VU9P FPGAs. A bit pricey though. Would it be possible to run gpuOwL on an FPGA? Do FPGAs offer greater flexibility that would enable tuning and better performance than a GPU could achieve? |
[QUOTE=GP2;505068]I don't know much about OpenCL, but according to the Wikipedia page, it can be used not just on GPUs but also on other hardware such as FPGAs.
I don't know much about FPGAs either, but they are available for rent by the hour on AWS cloud. Specifically, Xilinx Virtex UltraScale+ VU9P FPGAs. A bit pricey though. Would it be possible to run gpuOwL on an FPGA? Do FPGAs offer greater flexibility that would enable tuning and better performance than a GPU could achieve?[/QUOTE] Maybe. Not all OpenCLs are created equal. I tried running an early version of gpuOwl on an Intel IGP, and the results were less than promising. NVIDIA OpenCl was a no go at v0.5 and v1.9 and ~v3.8. Search for VCU1525, $3500 on eBay. No refunds. Just the board, not the whole development kit. [URL]https://www.xilinx.com/products/boards-and-kits/vcu1525-a.html#tabAnchor-documentation[/URL] |
[QUOTE=GP2;505068]I don't know much about OpenCL, but according to the Wikipedia page, it can be used not just on GPUs but also on other hardware such as FPGAs.
I don't know much about FPGAs either, but they are available for rent by the hour on AWS cloud. Specifically, Xilinx Virtex UltraScale+ VU9P FPGAs. A bit pricey though. Would it be possible to run gpuOwL on an FPGA? Do FPGAs offer greater flexibility that would enable tuning and better performance than a GPU could achieve?[/QUOTE] While OpenCL *may* be portable to some degree, the performance is not portable (and thus, IMO, the whole point of OpenCL "portability" is moot). I mean that even if it would run on an FPGA, it would probably run extremely slow before perf tuning. In practice, I strongly expect GpuOwl to not run at all on an FPGA. It does use LDS (Local Data Share) which likely is not available on FPGA. It uses DP FP heavily, which may not be present as specialized hardware sub-elements on the FPGA, and thus would be very expensive and rather slow to implement on plain FPGA. For FPGA, I think a different design that plays into FPGA's strengths is needed. And maybe some specialized DP units would help too. |
[QUOTE=kriesel;504925]
P-1 code that "merely" does P-1 efficiently and reliably on OpenCl is a good thing, an advance from the status quo. Any performance advantage, reusability of the full P-1 residue, etc, is a bonus. [/QUOTE] I agree. Unfortunately my PRP-1 implementation is not an efficient P-1 implementation: if used only for P-1, PRP-1 is wasteful. If OTOH the P-1 is continued with the PRP, then PRP-1 becomes an efficient implementation overall. In other words: if one wants to do standalone P-1 in OpenCL, a different implementation is needed, which would probably do "classic" P-1 as in mprime and cudapm1. |
gpuowl for OpenCL 1.2 binary
Hi kriesel and preda,
May I ask you to compile gpuowl for OpenCL 1.2, so that it runs on Windows 10 1803 x64 with the latest AMD OpenCL 15.7.1/15.11.1 driver for the 5870, which doesn't support OpenCL 2.0? Unfortunately, gpuowl 5.0-df2bdf2 fails with an "aclBinary init failure" error: [code]
2019-01-08 08:27:48 Exiting because "OpenCL compilation"
2019-01-08 08:27:48 Bye
2019-01-08 08:34:57 gpuowl 5.0-df2bdf2
2019-01-08 08:34:57
2019-01-08 08:34:57 216091 FFT 128K: Width 64x4, Height 64x4; 1.65 bits/word
2019-01-08 08:34:57 using long carry kernels
2019-01-08 08:34:57 Cypress-20x 850-@1:0.0 AMD Radeon HD 5800 Series
2019-01-08 08:34:57 OpenCL compilation error -11 (args -DEXP=216091u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0 )
2019-01-08 08:34:57 Error: aclBinary init failure
2019-01-08 08:34:57 Exiting because "OpenCL compilation"
2019-01-08 08:34:57 Bye
2019-01-08 08:53:24 gpuowl 5.0-df2bdf2
[/code] I guess it expects OpenCL 2.0 because of the [b]-cl-std=CL2.0[/b] parameter. Perhaps the CL_HPP_CL_1_2_DEFAULT_BUILD macro should be defined before including cl2.hpp: [url]https://github.com/KhronosGroup/OpenCL-CLHPP/issues/27#issuecomment-282794419[/url] |
[QUOTE=clarke;505291]Hi kriesel and preda,
May I ask you to compile gpuowl for OpenCL 1.2, so that it runs on Windows 10 1803 x64 with the latest AMD OpenCL 15.7.1/15.11.1 driver for the 5870, which doesn't support OpenCL 2.0? [/QUOTE] gpuowl at this point requires OpenCL 2.0. The main reason for this was to get access to the new atomics primitives that became available in 2.0. It would be possible to go back and support OpenCL 1.2 again, but that would require some work. Another option would be to get a driver that accepts -cl-std=CL2.0, which is not unusual on AMD at this point (e.g. both ROCm and amdgpu-pro accept -cl-std=CL2.0). I don't know about Windows and your particular GPU. |
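A host program could detect the situation described above before compiling anything, and report it more clearly than "aclBinary init failure". The Python sketch below only parses the device version string, which the OpenCL specification defines as starting with "OpenCL <major>.<minor> ". The helper name `supports_cl20` and the two sample strings are assumptions made for illustration; nothing here is taken from gpuowl.

```python
import re


def supports_cl20(device_version):
    """Return True if a CL_DEVICE_VERSION string reports OpenCL 2.0 or newer."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {device_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 0)


# Hypothetical version strings; the vendor-specific suffix varies by driver.
for version in ("OpenCL 1.2 AMD-APP (1800.11)", "OpenCL 2.0 AMD-APP (2766.5)"):
    verdict = "ok for -cl-std=CL2.0" if supports_cl20(version) else "too old for -cl-std=CL2.0"
    print(f"{version}: {verdict}")
```

Such a check only explains the failure; it does not make kernels that rely on 2.0 atomics run on a 1.2 driver.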
[QUOTE=preda;505296]gpuowl at this point requires OpenCL 2.0. The main reason for this was to get access to the new atomics primitives that became available in 2.0.
It would be possible to go back and support OpenCL 1.2 again, but that would require some work. Another option would be to get a driver that accepts -cl-std=CL2.0, which is not unusual on AMD at this point (e.g. both ROCm and amdgpu-pro accept -cl-std=CL2.0). I don't know about Windows and your particular GPU.[/QUOTE] One advantage of also supporting 1.2 would be NVIDIA support. |
Did gpuowl move to PRP before moving to OpenCL 2.0? Maybe there's an older version of gpuowl that would work.
|
Fairly current driver for RX480 & RX 550 is AMD Adrenalin 18.10.2, supports OpenCL 2.0 and some 2.1 apparently. Latest currently is 18.12.3.
For links to posts of Windows executables for various versions of gpuowl, see bottom of post 4 in [URL]https://www.mersenneforum.org/showthread.php?t=23391[/URL] If the 5870 of interest is an HD5870, that was introduced 9 years ago. [URL]https://www.techpowerup.com/gpu-specs/radeon-hd-5870.c253[/URL] May be worth upgrading the gpu, on a cost of electricity per throughput basis, or for performance. As I recall, the switch to PRP was around V0.7. Early versions had limited fft lengths. The GIMPS wavefront has passed 4M and is close to passing 5M (gpuowl V2.0). gpuowl 1.9 had 4, 8 and 16M. Performance and fft length choice is better in 3.5-3.8 but I think that is already OpenCl 2.0 territory. |
[QUOTE=preda;505296]gpuowl at this point requires OpenCL 2.0. The main reason for this was to get access to the new atomics primitives that become available in 2.0.
It would be possible to move back and support again OpenCL 1.2, but that would require some work. Another option would be to get a driver that accepts -cl-std=2.0, which is not unusual on AMD at this point (e.g. both ROCm and amdgpu-pro accept -cl-std=2.0). I don't know about Windows and your particular GPU.[/QUOTE] Unfortunately, it seems like there is no OpenCL 2.0 for pre-GCN AMD cards. [QUOTE=kriesel;505300]Fairly current driver for RX480 & RX 550 is AMD Adrenalin 18.10.2, supports OpenCL 2.0 and some 2.1 apparently. Latest currently is 18.12.3. For links to posts of Windows executables for various versions of gpuowl, see bottom of post 4 in [URL]https://www.mersenneforum.org/showthread.php?t=23391[/URL] If the 5870 of interest is an HD5870, that was introduced 9 years ago. [URL]https://www.techpowerup.com/gpu-specs/radeon-hd-5870.c253[/URL] May be worth upgrading the gpu, on a cost of electricity per throughput basis, or for performance. As I recall, the switch to PRP was around V0.7. Early versions had limited fft lengths. The GIMPS wavefront has passed 4M and is close to passing 5M (gpuowl V2.0). gpuowl 1.9 had 4, 8 and 16M. Performance and fft length choice is better in 3.5-3.8 but I think that is already OpenCl 2.0 territory.[/QUOTE] Agreed, that card is very old and power-hungry (; but I'm not ready to upgrade it. Yep, I've tried each and every Windows-version at that page and only 1.9 starts running with OCL 1.0/1.2. But it catches some error and doesn't move forward: [code] gpuOwL v1.9- GPU Mersenne primality checker AMD Radeon HD 5800 Series 20 @1:0.0, Cypress 850MHz OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 ) Error: aclBinary init failure ".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core #pragma OPENCL EXTENSION cl_khr_fp64 : enable ^ OpenCL compilation in 2674 ms, with "-I. 
-cl-fast-relaxed-math -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 75002911 (17.88 bits/word) [2019-01-08 09:52:45] Starting at iteration 0 OK 0 / 75002911 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [09:53:01] EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:42; 463de8cc34b3766c [09:53:44] EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 03:55; 463de8cc34b3766c [09:54:27] (1 errors) EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:38; 463de8cc34b3766c [09:55:11] (2 errors) EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 04:01; 463de8cc34b3766c [09:55:54] (3 errors) EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:41; 463de8cc34b3766c [09:56:37] (4 errors) EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:53; 463de8cc34b3766c [09:57:20] (5 errors) [/code] If this was fixed at later gpuowl versions, then I'm out of luck for now. |
[QUOTE=clarke;505337]Unfortunately, it seems like there is no OpenCL 2.0 for pre-GCN AMD cards.
Agreed, that card is very old and power-hungry (; but I'm not ready to upgrade it. Yep, I've tried each and every Windows-version at that page and only 1.9 starts running with OCL 1.0/1.2. But it catches some error and doesn't move forward: [code] gpuOwL v1.9- GPU Mersenne primality checker AMD Radeon HD 5800 Series 20 @1:0.0, Cypress 850MHz OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 ) Error: aclBinary init failure ".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core #pragma OPENCL EXTENSION cl_khr_fp64 : enable ^ OpenCL compilation in 2674 ms, with "-I. -cl-fast-relaxed-math -DEXP=75002911u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 75002911 (17.88 bits/word) [2019-01-08 09:52:45] Starting at iteration 0 OK 0 / 75002911 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [09:53:01] EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:42; 463de8cc34b3766c [09:53:44] EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 03:55; 463de8cc34b3766c [09:54:27] (1 errors) EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:38; 463de8cc34b3766c [09:55:11] (2 errors) EE 1000 / 75002911 [ 0.00%], 27.84 ms/it; ETA 24d 04:01; 463de8cc34b3766c [09:55:54] (3 errors) EE 1000 / 75002911 [ 0.00%], 27.82 ms/it; ETA 24d 03:41; 463de8cc34b3766c [09:56:37] (4 errors) EE 1000 / 75002911 [ 0.00%], 27.83 ms/it; ETA 24d 03:53; 463de8cc34b3766c [09:57:20] (5 errors) [/code]If this was fixed at later gpuowl versions, then I'm out of luck for now.[/QUOTE] It's not broken, it's just your setup is incompatible. [CODE]gpuOwL v1.9- GPU Mersenne primality checker Radeon 500 Series 8 @f:0.0, gfx804 1203MHz OpenCL compilation in 2147 ms, with "-I. 
-cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2018-01-23 12:43:49 Central Standard Time] Starting at iteration 25373000 OK 25373000 / 76812401 [33.03%], 0.00 ms/it; ETA 0d 00:00; 6d6a6ebc97092826 [12:43:57] OK 25374000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:05; bb937b8a48c69d60 [12:44:17] OK 25375000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:04; b81a6f51602c2bd8 [12:44:36] OK 25380000 / 76812401 [33.04%], 11.96 ms/it; ETA 7d 02:50; 60bcb33b85922094 [12:45:44] OK 25390000 / 76812401 [33.05%], 12.00 ms/it; ETA 7d 03:26; 516093b7988f8ac4 [12:47:52] OK 25400000 / 76812401 [33.07%], 12.00 ms/it; ETA 7d 03:23; 5313239afe8bcffe [12:50:00] OK 25420000 / 76812401 [33.09%], 12.00 ms/it; ETA 7d 03:20; d04bc7fd72b07e36 [12:54:07] OK 25440000 / 76812401 [33.12%], 12.00 ms/it; ETA 7d 03:15; 679e6f34ac35a983 [/CODE] |
[QUOTE=preda;505076]While OpenCL *may* be portable to some degree, the performance is not portable (and thus, IMO, the whole point of OpenCL "portability" is moot). I mean that even if it would run on an FPGA, it would probably run extremely slow before perf tuning.
In practice, I strongly expect GpuOwl to not run at all on an FPGA. It does use LDS (Local Data Share) which likely is not available on FPGA. It uses DP FP heavily, which may not be present as specialized hardware sub-elements on the FPGA, and thus would be very expensive and rather slow to implement on plain FPGA. For FPGA, I think a different design that plays into FPGA's strengths is needed. And maybe some specialized DP units would help too.[/QUOTE] I have access to quite a few different FPGAs, and would happily provide them to anyone who wants to develop such a thing for trial factoring or prime searching. Happy to donate a few hundred of our Acorn boards to the cause, or a dozen VU9Ps. I do also have access to HBM FPGAs, but I don’t expect them to beat NVIDIA GPUs with the same memory bandwidth, since bandwidth seems to be the issue. |
[QUOTE=airsquirrels;505341]but I don’t expect them to beat NVIDIA GPUs with the same memory bandwidth, since bandwidth seems to be the issue.[/QUOTE]
OK, for LL testing there would be issues, but what about factoring? Factoring doesn't need memory bandwidth. Doesn't need DP either. Would there be any hope of running mfakto (OpenCL) on an FPGA? |
[QUOTE=GP2;505346]OK, for LL testing there would be issues, but what about factoring?
Factoring doesn't need memory bandwidth. Doesn't need DP either. Would there be any hope of running mfakto (OpenCL) on an FPGA?[/QUOTE] I don't have experience with FPGA development, so I'm not able to help here. But this is the approach I would take:
- extract tiny, streamlined, simplified OpenCL components from a trial-factorer, e.g. a very basic sieve or a simple modular exponentiation
- test and adapt each one for the FPGA in isolation
- repeat with the next component
- when all the basic pieces work, put them together into an FPGA TFer.
Starting with mfakto as a whole may not work as easily. Anyway, somebody with more FPGA experience should try, I guess. |
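To make the "simple modular exponentiation" component mentioned above concrete, here is a minimal trial-factoring sketch in Python. It relies on two standard facts about Mersenne numbers with odd prime exponent p: any factor has the form q = 2kp + 1, and q is congruent to 1 or 7 mod 8. It is not mfakto's algorithm (there is no real sieve and no GPU-style batching), and the function name `trial_factor` is made up for this example.

```python
def trial_factor(p, max_k):
    """Look for a factor q = 2*k*p + 1 of 2**p - 1 with 1 <= k <= max_k."""
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        # Cheap filter standing in for the sieve: factors are 1 or 7 mod 8.
        if q % 8 not in (1, 7):
            continue
        # q divides 2**p - 1 exactly when 2**p == 1 (mod q).
        if pow(2, p, q) == 1:
            return q
    return None


print(trial_factor(11, 10))  # 2**11 - 1 = 2047 = 23 * 89
print(trial_factor(29, 10))  # 2**29 - 1 = 233 * 1103 * 2089
```

Only integer arithmetic is involved, which is why DP floating-point hardware and memory bandwidth matter much less here than in the FFT-based tests.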
[QUOTE=kriesel;505338]It's not broken, it's just your setup is incompatible.
[CODE]gpuOwL v1.9- GPU Mersenne primality checker Radeon 500 Series 8 @f:0.0, gfx804 1203MHz OpenCL compilation in 2147 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2018-01-23 12:43:49 Central Standard Time] Starting at iteration 25373000 OK 25373000 / 76812401 [33.03%], 0.00 ms/it; ETA 0d 00:00; 6d6a6ebc97092826 [12:43:57] OK 25374000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:05; bb937b8a48c69d60 [12:44:17] OK 25375000 / 76812401 [33.03%], 11.97 ms/it; ETA 7d 03:04; b81a6f51602c2bd8 [12:44:36] OK 25380000 / 76812401 [33.04%], 11.96 ms/it; ETA 7d 02:50; 60bcb33b85922094 [12:45:44] OK 25390000 / 76812401 [33.05%], 12.00 ms/it; ETA 7d 03:26; 516093b7988f8ac4 [12:47:52] OK 25400000 / 76812401 [33.07%], 12.00 ms/it; ETA 7d 03:23; 5313239afe8bcffe [12:50:00] OK 25420000 / 76812401 [33.09%], 12.00 ms/it; ETA 7d 03:20; d04bc7fd72b07e36 [12:54:07] OK 25440000 / 76812401 [33.12%], 12.00 ms/it; ETA 7d 03:15; 679e6f34ac35a983 [/CODE][/QUOTE] Your setup is RX5xx OpenCL 2.0. Indeed, something is wrong at my end: [code] gpuOwL v1.9- GPU Mersenne primality checker AMD Radeon HD 5800 Series 20 @1:0.0, Cypress 850MHz OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 ) Error: aclBinary init failure ".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core #pragma OPENCL EXTENSION cl_khr_fp64 : enable ^ OpenCL compilation in 2771 ms, with "-I. 
-cl-fast-relaxed-math -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2019-01-09 07:49:05] Starting at iteration 0 OK 0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [07:49:21] EE 1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:03] EE 1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:45] (1 errors) EE 1000 / 76812401 [ 0.00%], 27.15 ms/it; ETA 24d 03:12; c89d15ae90d209ec [07:51:28] (2 errors) [/code] Wondering if somebody has run 1.9 with 5xxx/6xxx series successfully. |
[QUOTE=clarke;505365]Wondering if somebody has run 1.9 with 5xxx/6xxx series successfully.[/QUOTE]Have you tried mfakto? (Might keep your HD5870 usefully busy while you look for a solution or save for a new card)
|
[QUOTE=kriesel;505390]Have you tried mfakto? (Might keep your HD5870 usefully busy while you look for a solution or save for a new card)[/QUOTE]
Yep, thank you, mfakto works well. I'll try to figure out if different 15.7.1/15.11.1 OpenCL releases make a difference for gpuowl for now. |
[QUOTE=clarke;505365]Your setup is RX5xx OpenCL 2.0. Indeed, something is wrong at my end:
[code] gpuOwL v1.9- GPU Mersenne primality checker AMD Radeon HD 5800 Series 20 @1:0.0, Cypress 850MHz OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 ) Error: aclBinary init failure ".\gpuowl.cl", line 67: warning: OpenCL extension is now part of core #pragma OPENCL EXTENSION cl_khr_fp64 : enable ^ OpenCL compilation in 2771 ms, with "-I. -cl-fast-relaxed-math -DEXP=76812401u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 " PRP-3: FFT 4M (1024 * 2048 * 2) of 76812401 (18.31 bits/word) [2019-01-09 07:49:05] Starting at iteration 0 OK 0 / 76812401 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [07:49:21] EE 1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:03] EE 1000 / 76812401 [ 0.00%], 27.18 ms/it; ETA 24d 03:55; c89d15ae90d209ec [07:50:45] (1 errors) EE 1000 / 76812401 [ 0.00%], 27.15 ms/it; ETA 24d 03:12; c89d15ae90d209ec [07:51:28] (2 errors) [/code]Wondering if somebody has run 1.9 with 5xxx/6xxx series successfully.[/QUOTE] It may be possible that your FFT size is too small for the exponent. Try to specify the argument "-fft 5M". |
[QUOTE=SELROC;505962]It may be possible that your FFT size is too small for the exponent.
Try to specify the argument "-fft 5M".[/QUOTE] Belay that; V1.9 was before Preda implemented a 5M fft in V2.0. The purpose of running V1.9 was to try to get back to a version not requiring OpenCl V2. If fft size is an issue he could try -fft M61 instead of DP. It's slower than 4M DP but gives about 7% higher max exponent for the 4M size, and is faster than 8M DP. But the OpenCl version appears to still be an issue for his old gpu's driver at V1.9. The 4M DP transform in gpuOwL was capable of 78M exponent as I recall. |
[QUOTE=kriesel;505982]Belay that; V1.9 was before Preda implemented a 5M fft in V2.0. The purpose of running V1.9 was to try to get back to a version not requiring OpenCl V2. If fft size is an issue he could try -fft M61 instead of DP. It's slower than 4M DP but gives about 7% higher max exponent for the 4M size, and is faster than 8M DP. But the OpenCl version appears to still be an issue for his old gpu's driver at V1.9. The 4M DP transform in gpuOwL was capable of 78M exponent as I recall.[/QUOTE]
So is there no hope for this version? |
P-1
It is my pleasure to announce.. P-1 in GpuOwl. Good old classic P-1.
1. worktodo.txt
PFactor=90551623
PFactor=AID,1,2,90551623,-1,77,2
PFactor=N/A,1,2,90551623,-1,77,2
(in all the PFactor cases above, only the exponent and the AID are used)
By default the P-1 task is processed with B1=1M and B2=30 * B1. These can be overridden by prepending the limits to any PFactor line above, with this syntax:
B1=2000000;PFactor=90551623
B1=500000,B2=10000000;PFactor=90551623
The P-1 in GpuOwl always has E=2 (a parameter in stage2). The D parameter ("block size") is normally computed automatically based on the amount of memory available on the GPU. It can also be specified on the command line, e.g. -D 210. The block size D must be a multiple of 210. Good values are D=2310 (but that wouldn't fit in a GPU with 8GB RAM), and D=210 or small multiples of 210.
P-1 does not save the work to a savefile. If stopped (crash etc.) the progress is lost.
At this stage I'm very interested in bug reports, most importantly in situations where a factor that should be detected given the B1/B2 is not found. |
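For readers new to the method, stage 1 of textbook P-1 on a Mersenne number fits in a few lines of Python. This is a sketch of the classic algorithm only, not GpuOwl's implementation: stage 2 (with its B2, D and E parameters) is left out, base 3 is a conventional choice that I am assuming, and the names `primes_up_to` and `pm1_stage1` are hypothetical.

```python
from math import gcd


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n (expects n >= 2)."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def pm1_stage1(p, b1, base=3):
    """Stage 1 of P-1 on M = 2**p - 1 with bound B1; returns gcd(base**E - 1, M)."""
    m = (1 << p) - 1
    # Factors of M have the form 2*k*p + 1, so 2*p goes into the exponent for free.
    exponent = 2 * p
    for q in primes_up_to(b1):
        power = q
        while power * q <= b1:  # largest power of q not exceeding B1
            power *= q
        exponent *= power
    x = pow(base, exponent, m)
    return gcd(x - 1, m)


# 2**29 - 1 = 233 * 1103 * 2089, and 233 - 1 = 2**3 * 29 is smooth for B1 = 5.
print(pm1_stage1(29, 5))
```

A result of 1 means no factor was found at these bounds, and a result equal to M means every factor was caught at once, so the bound would have to be lowered to separate them.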
GpuOwl v6.1, just committed to GitHub, has P-1. It needs GMP (for the GCD done on the CPU, as before with PRP-1).
I must say, it was rather hard for me to understand P-1 stage2. (after the fact it doesn't look so terrible, I could explain it simply now I think) I found useful Alexander Kruppa's thesis: [url]https://tel.archives-ouvertes.fr/file/index/docid/477005/filename/thesis.ps[/url] (although even that was not easy reading). |
Does gpuowl accept cofactor work?
PRP cofactor work type 160 PRP cofactor DC type 161 |
[QUOTE=SELROC;506750]Does gpuowl accept cofactor work?
PRP cofactor work type 160 PRP cofactor DC type 161[/QUOTE] No, not now. I guess it could be added, but I don't know exactly how a cofactor test works, so I don't know how much work that'd be. |
[QUOTE=preda;506751]No, not now. I guess it could be added, but I don't know exactly how a cofactor test works, so I don't know how much work that'd be.[/QUOTE]
Right, I have just added a reminder to gpuowl for 100 million digit numbers. Work type 153. |
PRP cofactor work should very rarely be necessary anymore.
Instead, just do a single PRP test of the exponent itself, not taking factors into account even if there are known factors, and retaining a large number of bits in the residue (say, 2048). Then do a Gerbicz cofactor compositeness test for the cofactor. It is much faster than a PRP cofactor test. See [URL="https://mersenneforum.org/showthread.php?t=23462"]the original post[/URL], and here is [URL="http://mprime.s3-website.us-west-1.amazonaws.com/code/gerbicz_prp_cofactor.py"]my Python implementation[/URL] (using the gmpy2 module). Every time new factors are discovered, thereby creating a new cofactor, you just reuse the original 2048-bit PRP residue and re-run the Gerbicz cofactor compositeness test on the new cofactor. It will either tell you that the cofactor is definitely composite, or that it is a possible probable prime. Only in the latter case do you actually need to run a PRP cofactor test to confirm that it actually is a probable prime. However, for non-tiny exponents the chance of a false positive is very small. Note: the Gerbicz cofactor-compositeness test is completely different from Gerbicz error checking. It was just invented by the same guy. |
P-1 speed on Vega64
[QUOTE=preda;506748]P-1 in GpuOwl. Good old classic P-1.
[/QUOTE] As a rough speed indication, for a 90.6M exponent (the "P-1 wavefront"), on my Vega64 it takes about 2h for B1=1M, B2=30M. The time is split about equally between the two stages. The credit for P-1 to those bounds is about 13.55 GHz-days. |
I'm assuming 5b26497 (v6.2) is usable/stable for P-1, or is it still in testing?
Also, do you happen to know the last version that did not require OpenCL 2.x? |
[QUOTE=kracker;506993]Also, do you happen to know the last version that did not require OpenCL 2.x?[/QUOTE]
Why? Some clues that it goes way back, at [URL="https://github.com/preda/gpuowl"]https://github.com/preda/gpuowl[/URL]:
"use opencl 2.0 atomics in carry fused" Jul 27 2018 dd0f2b2
"dont attempt initial CL2.0 compilation anymore" Jan 22 2018 1aee5cc (V1.9?)
"fix opencl 1.x FGT compilation (missing global)" Nov 8 2017 8c2e6d6 (V1.8 or 1.9 time frame)
"add stupid global to pointers everywhere to make it compilable in cl 1.2" Sep 18 2017 d7930ed
"bump version to 1.0; log and result format minor change; persistent c..." Aug 27 2017 676be1c |
[QUOTE=kracker;506993]I'm assuming 5b26497 (v6.2) is usable/stable for P-1, or is it still in testing?
Also, do you happen to know the last version that did not require OpenCL 2.x?[/QUOTE] Should be usable, yes, and I hope it's not buggy ("no known bugs" :), you're welcome to try it out. Should be pretty fast too. Let me know what exponent ranges you test, what FFT is selected, and of course if you find any factors :) You could try it initially with a couple of known factors, ideally in the same exponent range, to verify they're detected properly. About OpenCL 2.x -- as Ken said, it goes a bit back. The problem with OpenCL 1.x is that the kernel "carryFused" does not work without openCL 2.0 atomics, at least it does not work under ROCm which is a major driver for AMD. So maybe you could use a modern driver, such as ROCm or amdgpu-pro, which both support OpenCL 2.x |
GpuOwl 6.2 just gained -0.33 ms/sq on 5M FFT over the previous version for PRP.
|
[QUOTE=SELROC;507011]GpuOwl 6.2 just gained -0.33 ms/sq on 5M FFT over the previous version for PRP.[/QUOTE]
Side gains from P-1 :) I'll try to explain what changed.
The FFT transforms that are power-of-two in size (such as 4M) are split (using a scheme similar to the "matrix FFT algorithm") into two subtransforms of sizes WIDTH and HEIGHT, such that Size = Width * Height, where both W and H are powers of two.
The FFT transforms that are not power-of-two in size (such as 4.6M or 5M) are split into 3 sub-FFTs, of sizes that I call WIDTH, MIDDLE, and HEIGHT, with Size = Width * Middle * Height. Until now, Middle was one of: 3, 5, 9.
In P-1 I found the need to reduce the Height size, and one way to achieve that was by increasing the Middle size. Thus I changed the possible Middle sizes to one of: 6, 9, 10 (by doubling the 3 and 5). As a side effect, PRP 5M now uses Middle=10 instead of the previous 5, and it turns out that this results in better performance. |
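The decomposition described above can be checked against log lines posted earlier in the thread, such as "FFT 9216K: Width 1024 (256x4), Height 512 (64x8), Middle 9; 16.27 bits/word" for exponent 153500033. Those logs suggest that the reported size in words is twice Width * Middle * Height, presumably because two words are packed into each complex point. That factor of 2 is my inference from the logs, not something stated in the post, and the helper names below are made up. A minimal Python sketch:

```python
def fft_words(width, height, middle=1):
    """Words in the transform, assuming two words per complex point (inferred)."""
    return 2 * width * middle * height


def bits_per_word(exponent, width, height, middle=1):
    """Average number of exponent bits stored in each word."""
    return exponent / fft_words(width, height, middle)


def width_times_height(size_words, middle):
    """Power-of-two part left for Width * Height once Middle is chosen."""
    return size_words // (2 * middle)


# 9216K, Width 1024, Height 512, Middle 9, exponent 153500033 (from the log).
print(fft_words(1024, 512, 9) // 1024, "K words")
print(round(bits_per_word(153500033, 1024, 512, 9), 2), "bits/word")

# For a 5M transform, doubling Middle from 5 to 10 halves Width * Height.
five_m = 5 * 1024 * 1024
print(width_times_height(five_m, 5), width_times_height(five_m, 10))
```

How the remaining power of two is divided between Width and Height for 5M is not stated in the post, so the sketch stops at their product.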
[QUOTE=preda;507002]Should be usable, yes, and I hope it's not buggy ("no known bugs" :), you're welcome to try it out. Should be pretty fast too. Let me know what exponent ranges you test, what FFT is selected, and of course if you find any factors :)
You could try it initially with a couple of known factors, ideally in the same exponent range, to verify they're detected properly. About OpenCL 2.x -- as Ken said, it goes a bit back. The problem with OpenCL 1.x is that the kernel "carryFused" does not work without openCL 2.0 atomics, at least it does not work under ROCm which is a major driver for AMD. So maybe you could use a modern driver, such as ROCm or amdgpu-pro, which both support OpenCL 2.x[/QUOTE] There's an assortment of known factor P-1 verification candidates in post 811. What's next? P-1 save files would be good to have for higher exponents. 100Mdigit exponents often don't get P-1 currently before primality testing, which is unfortunate. P-1 run times scale similarly to primality testing (p^2+) so may be a full 24 hour day or more for 100M digit. That or higher exponents are a bit long to go without save files. |
[QUOTE=kriesel;506999]Why?
Some clues that it goes way back, at [URL="https://github.com/preda/gpuowl"]https://github.com/preda/gpuowl[/URL]:
"use opencl 2.0 atomics in carry fused" Jul 27 2018 dd0f2b2
"dont attempt initial CL2.0 compilation anymore" Jan 22 2018 1aee5cc (V1.9?)
"fix opencl 1.x FGT compilation (missing global)" Nov 8 2017 8c2e6d6 (V1.8 or 1.9 time frame)
"add stupid global to pointers everywhere to make it compilable in cl 1.2" Sep 18 2017 d7930ed
"bump version to 1.0; log and result format minor change; persistent c..." Aug 27 2017 676be1c[/QUOTE] Well, I have an older HD7770 card that doesn't support OpenCL 2.x (funnily enough, the rebranded R7 250X does support it; I guess it's just a driver switch). I just remember that I was able to run gpuowl on it before (probably not worth running, though, considering the electricity cost...). |
How difficult would it be to add a version of carryfused which would run on OpenCL 1.x? Old gpus aren't worth much but if that would make it run on Nvidia cards it may be worth doing.
|
[QUOTE=henryzz;507108]How difficult would it be to add a version of carryfused which would run on OpenCL 1.x? Old gpus aren't worth much but if that would make it run on Nvidia cards it may be worth doing.[/QUOTE]
No, that won't make it run on Nvidia. There are deeper problems with Nvidia's OpenCL than just 1.x vs. 2.x. I've tried in the past to run GpuOwl on an Nvidia GPU, and it failed in funky ways. My take from that was that Nvidia is not interested in fixing their (1.x) OpenCL implementation. |
For the past few days I've been running P-1 with GpuOwl (with B1=1M, B2=30M). I'm surprised by the small number of factors found, which seems to be significantly less than the expected 3% - 3.4%. The first suspect is a bug in GpuOwl, particularly in stage2, which would cause it to miss factors. OTOH for all the tests I tried (where I ran GpuOwl on exponents with known factors), the factors were correctly detected in stage2.
Anyway, if anybody finds an exponent with a factor that should be detected (given the bounds) and isn't, that would be proof of a bug and would allow debugging. Until then, I only have a suspicion of a bug in stage2. |
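To judge whether a low factor count is surprising or just bad luck, a binomial tail estimate may help. This is a rough sketch that assumes each exponent independently yields a factor with probability p (taken here as the 3% lower estimate quoted above). The run count and the function name `prob_at_most` are hypothetical, since the post does not say how many exponents were run or how many factors were found.

```python
from math import comb


def prob_at_most(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p): the chance of seeing k or fewer
    # factors in n independent P-1 runs that each succeed with probability p.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical numbers for illustration only: 100 runs, 0 or 1 factors found.
n_runs = 100
for found in (0, 1):
    print(found, prob_at_most(found, n_runs, 0.03))
```

A small tail probability would support the suspicion of a bug; a moderate one would mean the sample is still too small to tell.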
[QUOTE=preda;507389]For the past few days I've been running P-1 with GpuOwl (with B1=1M, B2=30M). I'm surprised by the small number of factors found, which seems to be significantly less than the expected 3% - 3.4%. The first suspect is a bug in GpuOwl, particularly in stage2, which would cause it to miss factors. OTOH for all the tests I tried (where I ran GpuOwl on exponents with known factors), the factors were correctly detected in stage2.
Anyway, if anybody finds an exponent with a factor that should be detected (given the bounds) and isn't, that would be proof of a bug and would allow debugging. Until then, I only have a suspicion of a bug in stage2.[/QUOTE] How many known-factor cases have you run, and what did you find? (See post 811 in this thread for a list of some possible candidates.) |
1 Attachment(s)
[QUOTE=kriesel;507402]How many known-factor cases have you run, and what did you find? (See post 811 in this thread for a list of some possible candidates.)[/QUOTE]
I've run about 10 from the attached list. All that I've tried were detected correctly. Here are the factor found results. The ones without AID are test cases of known factors. [QUOTE]
2019-01-25 22:30:53 vega1 {"exponent":"86014009", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-25 11:30:53 UTC", "user":"preda", "computer":"vega1", "fft-length":4718592, "B1":10000, "B2":1000000, "factors":["262147231459344118478999"]}
2019-01-26 22:46:37 vega1 {"exponent":"86001449", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-26 11:46:37 UTC", "user":"preda", "computer":"vega1", "fft-length":4718592, "B1":10000, "B2":300000, "factors":["64262023024984019615711"]}
2019-01-26 22:53:56 vega1 {"exponent":"86001449", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-26 11:53:56 UTC", "user":"preda", "computer":"vega1", "fft-length":4718592, "B1":20000, "B2":1000000, "factors":["64262023024984019615711"]}
2019-01-26 23:00:41 vega1 {"exponent":"86001449", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-26 12:00:41 UTC", "user":"preda", "computer":"vega1", "fft-length":4718592, "B1":30000, "B2":1000000, "factors":["64262023024984019615711"]}
2019-01-26 23:07:37 vega1 {"exponent":"86001449", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-26 12:07:37 UTC", "user":"preda", "computer":"vega1", "fft-length":4718592, "B1":30000, "B2":1000000, "factors":["64262023024984019615711"]}
2019-01-29 15:00:59 vega1 {"exponent":"90389279", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-29 04:00:59 UTC", "user":"preda", "computer":"vega1", "aid":"446A4C8D876E30929CFF650BC1510296", "fft-length":5242880, "B1":1000000, "B2":30000000, "factors":["596345629997606032958593"]}
2019-01-30 15:27:56 vega1 {"exponent":"90399973", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.2"}, "timestamp":"2019-01-30 04:27:56 UTC", "user":"preda", "computer":"vega1", "aid":"D551E55210FEDE3B539CAC488B140DC0", "fft-length":5242880, "B1":1000000, "B2":30000000, "factors":["2691899164164806875763639"]}
2019-01-30 19:14:29 vega1 {"exponent":"90555943", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.2"}, "timestamp":"2019-01-30 08:14:29 UTC", "user":"preda", "computer":"vega1", "fft-length":5242880, "B1":1000000, "B2":30000000, "factors":["2849951345359023265136617"]}
2019-01-14 23:57:42 vega0 {"exponent":"86005021", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.0"}, "timestamp":"2019-01-14 12:57:42 UTC", "user":"preda", "computer":"vega0", "fft-length":4718592, "B1":"20000, "B2":"600000, "factors":["43592319559794136631809"]}
2019-01-26 23:42:18 vega0 {"exponent":"86896181", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-26 12:42:18 UTC", "user":"preda", "computer":"vega0", "fft-length":5242880, "B1":250000, "B2":1000000, "factors":["27645613040037353343863"]}
2019-01-28 04:01:43 vega0 {"exponent":"86897623", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.1"}, "timestamp":"2019-01-27 17:01:43 UTC", "user":"preda", "computer":"vega0", "fft-length":5242880, "B1":300000, "B2":10000000, "factors":["184649453011014777569639"]}
2019-01-29 01:38:33 vega0 {"exponent":"90547781", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.2"}, "timestamp":"2019-01-28 14:38:33 UTC", "user":"preda", "computer":"vega0", "fft-length":5242880, "B1":500000, "B2":15000000, "factors":["171971868822535851152989810327"]}
2019-01-30 20:38:52 vega0 {"exponent":"90391841", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"6.2"}, "timestamp":"2019-01-30 09:38:52 UTC", "user":"preda", "computer":"vega0", "fft-length":5242880, "B1":1000000, "B2":30000000, "factors":["1704260124325096144766123992129"]}
[/QUOTE] |
GPU OWL crashes upon starting?
I recently updated to the newest Adrenalin 19.2.1 driver, and no matter which version of GpuOwl I use, it crashes and freezes my system, even though my GPU is 100% stable in everything else. What can I do to resolve that?
|
[QUOTE=xx005fs;507489]I recently updated to the newest adrenaline 19.2.1 driver and no matter what version of GPUOWL I use it would just crash and freeze my system when my GPU is 100% stable in anything else. What can I do to resolve that?[/QUOTE]That must be frustrating. Why upgrade to it? What OS version? What clues are present in system logs? Could you downgrade the driver?
I'm running driver version 18.10.2 on Win7 x64 and gpuowl runs for weeks on it. I've found that driver upgrades usually reduce gpuowl performance, so I delay upgrades. |
[QUOTE=kriesel;507496]That must be frustrating. Why upgrade to it? What OS version? What clues are present in system logs? Could you downgrade the driver?
I'm running driver version 18.10.2 on Win7 x64 and gpuowl runs for weeks on it. I've found that usually driver upgrades reduce gpuowl performance, so I delay upgrades.[/QUOTE] I was originally using the 18.9.1 driver, which was perfectly fine. I wanted to update the driver to check out the new features, and that's when I realized it broke. I downgraded the driver and everything was okay again. I am using Windows 10 Home with the newest update, and there is nothing in the logs besides the line saying which FFT is used. It doesn't even get through the initial 800-iteration test before freezing my entire system. FYI, this issue first appeared when I upgraded to Adrenalin 2019; the 2018 versions are perfectly okay. |
gpuowl V6.0-b7bb1c3 Win64 build and first takes
1 Attachment(s)
Help output:
[CODE]C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -h
2019-02-04 23:01:34 gpuowl 6.0-b7bb1c3
Command line options:
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations.
-tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth.
-device <N> : select a specific device:
Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
gfx804-8x1203-@3:0.0 Radeon 550 Series
[/CODE]
fft length list not available without a worktodo file with a valid first entry
[CODE]
C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -list fft
2019-02-04 23:01:53 gpuowl 6.0-b7bb1c3
2019-02-04 23:01:53 -list fft
2019-02-04 23:01:53 Can't open 'worktodo.txt' (mode 'rb')
2019-02-04 23:01:53 Bye[/CODE]
New shorter fft lengths
[CODE]
C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -list fft
2019-02-04 23:05:21 gpuowl 6.0-b7bb1c3
2019-02-04 23:05:21 -list fft
2019-02-04 23:05:21 FFT 8K [ 0.01M - 0.18M] 64-64
2019-02-04 23:05:21 FFT 24K [ 0.04M - 0.51M] 64-64-3
2019-02-04 23:05:21 FFT 32K [ 0.05M - 0.68M] 64-256 256-64
2019-02-04 23:05:21 FFT 40K [ 0.06M - 0.85M] 64-64-5
2019-02-04 23:05:21 FFT 64K [ 0.10M - 1.34M] 64-512 512-64
2019-02-04 23:05:21 FFT 72K [ 0.11M - 1.50M] 64-64-9
2019-02-04 23:05:21 FFT 96K [ 0.15M - 1.99M] 64-256-3 256-64-3
2019-02-04 23:05:21 FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256
2019-02-04 23:05:21 FFT 160K [ 0.25M - 3.27M] 64-256-5 256-64-5
2019-02-04 23:05:21 FFT 192K [ 0.29M - 3.91M] 64-512-3 512-64-3
2019-02-04 23:05:21 FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64
2019-02-04 23:05:21 FFT 288K [ 0.44M - 5.81M] 64-256-9 256-64-9
2019-02-04 23:05:21 FFT 320K [ 0.49M - 6.44M] 64-512-5 512-64-5
2019-02-04 23:05:21 FFT 384K [ 0.59M - 7.69M] 1K-64-3 64-1K-3 256-256-3
2019-02-04 23:05:21 FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64
2019-02-04 23:05:21 FFT 576K [ 0.88M - 11.42M] 64-512-9 512-64-9
2019-02-04 23:05:21 FFT 640K [ 0.98M - 12.66M] 1K-64-5 64-1K-5 256-256-5
2019-02-04 23:05:21 FFT 768K [ 1.18M - 15.12M] 64-2K-3 256-512-3 512-256-3 2K-64-3
2019-02-04 23:05:21 FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256
2019-02-04 23:05:21 FFT 1152K [ 1.77M - 22.45M] 1K-64-9 64-1K-9 256-256-9
2019-02-04 23:05:21 FFT 1280K [ 1.97M - 24.88M] 64-2K-5 256-512-5 512-256-5 2K-64-5
2019-02-04 23:05:21 FFT 1536K [ 2.36M - 29.72M] 1K-256-3 256-1K-3 512-512-3 4K-64-3
2019-02-04 23:05:21 FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256
2019-02-04 23:05:21 FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9 2K-64-9
2019-02-04 23:05:21 FFT 2560K [ 3.93M - 48.90M] 1K-256-5 256-1K-5 512-512-5 4K-64-5
2019-02-04 23:05:21 FFT 3M [ 4.72M - 58.41M] 1K-512-3 256-2K-3 512-1K-3 2K-256-3
2019-02-04 23:05:21 FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512
2019-02-04 23:05:21 FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9 4K-64-9
2019-02-04 23:05:21 FFT 5M [ 7.86M - 96.07M] 1K-512-5 256-2K-5 512-1K-5 2K-256-5
2019-02-04 23:05:21 FFT 6M [ 9.44M - 114.74M] 1K-1K-3 512-2K-3 2K-512-3 4K-256-3
2019-02-04 23:05:21 FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K
2019-02-04 23:05:21 FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
2019-02-04 23:05:21 FFT 10M [ 15.73M - 188.68M] 1K-1K-5 512-2K-5 2K-512-5 4K-256-5
2019-02-04 23:05:21 FFT 12M [ 18.87M - 225.32M] 1K-2K-3 2K-1K-3 4K-512-3
2019-02-04 23:05:21 FFT 16M [ 25.17M - 298.13M] 4K-2K
2019-02-04 23:05:21 FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
2019-02-04 23:05:21 FFT 20M [ 31.46M - 370.44M] 1K-2K-5 2K-1K-5 4K-512-5
2019-02-04 23:05:21 FFT 24M [ 37.75M - 442.34M] 2K-2K-3 4K-1K-3
2019-02-04 23:05:21 FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9
2019-02-04 23:05:21 FFT 40M [ 62.91M - 727.03M] 2K-2K-5 4K-1K-5
2019-02-04 23:05:21 FFT 48M [ 75.50M - 868.07M] 4K-2K-3
2019-02-04 23:05:21 FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9
2019-02-04 23:05:21 FFT 80M [125.83M - 1426.38M] 4K-2K-5
2019-02-04 23:05:21 FFT 144M [226.49M - 2525.23M] 4K-2K-9
2019-02-04 23:05:21 332220523 FFT 18432K: Width 256x4, Height 256x4, Middle 9; 17.60 bits/word
2019-02-04 23:05:21 using short carry kernels
2019-02-04 23:05:21 Exiting because "No OpenCL device"
2019-02-04 23:05:21 Bye[/CODE]
but did not default to device zero and just run afterward. Exits if smallest fft length is too large for first worktodo entry. (why not comment out first entry, and go on to next?)
[CODE]
C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -device 0 -user kriesel -cpu condorella/rx-480
2019-02-04 23:17:18 gpuowl 6.0-b7bb1c3
2019-02-04 23:17:18 condorella/rx-480 -device 0 -user kriesel -cpu condorella/rx-480
2019-02-04 23:17:18 condorella/rx-480 11213 FFT 8K: Width 8x8, Height 8x8; 1.37 bits/word
2019-02-04 23:17:18 condorella/rx-480 FFT size too large for exponent (1.37 bits/word).
2019-02-04 23:17:18 condorella/rx-480 Exiting because "FFT size too large"
2019-02-04 23:17:18 condorella/rx-480 Bye[/CODE]
Had a problem with 216091 (default, long or short carry):
[CODE]
2019-02-04 23:22:47 condorella/rx-480 216091 FFT 24K: Width 8x8, Height 8x8, Middle 3; 8.79 bits/word
2019-02-04 23:22:47 condorella/rx-480 using long carry kernels
2019-02-04 23:22:50 condorella/rx-480 OpenCL compilation in 2510 ms, with "-DEXP=216091u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=3u -cl-std=CL2.0 "
2019-02-04 23:22:50 condorella/rx-480 216091.owl not found, starting from the beginning.
2019-02-04 23:22:50 condorella/rx-480 216091 EE loaded: 0, blockSize 400, 0000000000000000 (expected 0000000000000003)
2019-02-04 23:22:50 condorella/rx-480 Exiting because "error on load"
2019-02-04 23:22:50 condorella/rx-480 Bye[/CODE]
Similar issue with 756839:
[CODE]
C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -device 0 -user kriesel -cpu condorella/rx-480 -carry long
2019-02-04 23:25:27 gpuowl 6.0-b7bb1c3
2019-02-04 23:25:27 condorella/rx-480 -device 0 -user kriesel -cpu condorella/rx-480 -carry long
2019-02-04 23:25:27 condorella/rx-480 756839 FFT 40K: Width 8x8, Height 8x8, Middle 5; 18.48 bits/word
2019-02-04 23:25:27 condorella/rx-480 using long carry kernels
2019-02-04 23:25:34 condorella/rx-480 OpenCL compilation in 3007 ms, with "-DEXP=756839u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=5u -cl-std=CL2.0 "
2019-02-04 23:25:34 condorella/rx-480 756839.owl not found, starting from the beginning.
2019-02-04 23:25:34 condorella/rx-480 756839 EE loaded: 0, blockSize 400, 0000000000000000 (expected 0000000000000003)
2019-02-04 23:25:34 condorella/rx-480 Exiting because "error on load"
2019-02-04 23:25:34 condorella/rx-480 Bye

C:\msys64\home\ken\gpuowl-compile\v6.0-b7bb1c3>openowl -device 0 -user kriesel -cpu condorella/rx-480
2019-02-04 23:25:45 gpuowl 6.0-b7bb1c3
2019-02-04 23:25:45 condorella/rx-480 -device 0 -user kriesel -cpu condorella/rx-480
2019-02-04 23:25:45 condorella/rx-480 756839 FFT 40K: Width 8x8, Height 8x8, Middle 5; 18.48 bits/word
2019-02-04 23:25:45 condorella/rx-480 using short carry kernels
2019-02-04 23:25:51 condorella/rx-480 OpenCL compilation in 2980 ms, with "-DEXP=756839u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=5u -cl-std=CL2.0 "
2019-02-04 23:25:51 condorella/rx-480 756839.owl not found, starting from the beginning.
2019-02-04 23:25:51 condorella/rx-480 756839 EE loaded: 0, blockSize 400, 0000000000000000 (expected 0000000000000003)
2019-02-04 23:25:51 condorella/rx-480 Exiting because "error on load"
2019-02-04 23:25:51 condorella/rx-480 Bye[/CODE]
The 4.5M fft length appeared to be a few percent faster than the previous best (V3.8) on RX-480. |
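The default FFT choices in the logs above are consistent with picking the smallest configuration whose upper exponent limit covers the exponent. The sketch below reproduces that for the three exponents tested, using the first rows of the "-list fft" table. The selection rule is inferred from the output, not taken from the GpuOwl source, and `pick_fft` is a hypothetical helper name.

```python
# (label, upper exponent limit in millions) copied from the first rows of the
# "-list fft" output above; the printed limits are rounded to two decimals.
FFT_TABLE = [
    ("8K", 0.18), ("24K", 0.51), ("32K", 0.68), ("40K", 0.85),
    ("64K", 1.34), ("72K", 1.50), ("96K", 1.99), ("128K", 2.63),
]


def pick_fft(exponent):
    # Assumed rule: the first (smallest) FFT whose upper limit covers the exponent.
    for label, max_millions in FFT_TABLE:
        if exponent <= max_millions * 1_000_000:
            return label
    return None  # exponent is beyond the rows included in this excerpt


for p in (216091, 756839, 1398269):
    print(p, pick_fft(p))  # 24K, 40K and 72K, matching the logs
```

Because the printed limits are rounded, an exponent very close to a boundary could be assigned differently by the real program.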
[QUOTE=kriesel;507684]Help output: [...] The 4.5M fft length appeared to be a few percent faster than the previous best (V3.8) on RX-480.[/QUOTE] That is an old version. Now at version 6.2 ... |
[QUOTE=SELROC;507686]That is an old version. Now at version 6.2 ...[/QUOTE]Yes. I'm playing catchup.
|
gpuowl v6.1 Windows build and first takes
1 Attachment(s)
A few warnings during the make (which also occurred for v6.0):
[CODE]$ make openowl-win
g++ -std=c++17 -O2 -DREV=\"569e6ef\" -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp Primes.cpp state.cpp Signal.cpp FFTConfig.cpp -o openowl-win-569e6ef -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
Gpu.cpp: In member function 'PRPState Gpu::loadPRP(u32, u32, Buffer&, Buffer&, Buffer&)':
Gpu.cpp:470:9: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%u EE loaded: %d, blockSize %d, %016llx (expected %016llx)\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:470:9: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:470:9: warning: too many arguments for format [-Wformat-extra-args]
Gpu.cpp: In member function 'std::pair<bool, long long unsigned int> Gpu::isPrimePRP(u32, const Args&)':
Gpu.cpp:517:11: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s %8d / %d, %016llx\n", isPrime ? "PP" : "CC", kEnd, E, finalRes64);
^~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:517:11: warning: too many arguments for format [-Wformat-extra-args]
checkpoint.cpp: In member function 'void PRPState::loadInt(u32, u32)':
checkpoint.cpp:81:9: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s loaded: k %u, block %u, res64 %016llx\n", name.c_str(), k, blockSize, res64);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
checkpoint.cpp:81:9: warning: too many arguments for format [-Wformat-extra-args]
[/CODE]
Showing devices with preceding device numbers, and combining the list of available fft configurations into -h, are good changes.
[CODE]C:\msys64\home\ken\gpuowl-compile\v6.1-569e6ef>openowl-win-v61-569e6ef -h
2019-02-05 08:48:52 gpuowl 6.1-569e6ef
Command line options:
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-D <value> : P-1 second-stage D block size; multiple of 210; default auto based on GPU available memory.
-device <N> : select a specific device:
0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
1 : gfx804-8x1203-@3:0.0 Radeon 550 Series
FFT Configurations:
FFT 8K [ 0.01M - 0.18M] 64-64
FFT 24K [ 0.04M - 0.51M] 64-64-3
FFT 32K [ 0.05M - 0.68M] 64-256 256-64
FFT 40K [ 0.06M - 0.85M] 64-64-5
FFT 64K [ 0.10M - 1.34M] 64-512 512-64
FFT 72K [ 0.11M - 1.50M] 64-64-9
FFT 96K [ 0.15M - 1.99M] 64-256-3 256-64-3
FFT 128K [ 0.20M - 2.63M] 1K-64 64-1K 256-256
FFT 160K [ 0.25M - 3.27M] 64-256-5 256-64-5
FFT 192K [ 0.29M - 3.91M] 64-512-3 512-64-3
FFT 256K [ 0.39M - 5.18M] 64-2K 256-512 512-256 2K-64
FFT 288K [ 0.44M - 5.81M] 64-256-9 256-64-9
FFT 320K [ 0.49M - 6.44M] 64-512-5 512-64-5
FFT 384K [ 0.59M - 7.69M] 1K-64-3 64-1K-3 256-256-3
FFT 512K [ 0.79M - 10.18M] 1K-256 256-1K 512-512 4K-64
FFT 576K [ 0.88M - 11.42M] 64-512-9 512-64-9
FFT 640K [ 0.98M - 12.66M] 1K-64-5 64-1K-5 256-256-5
FFT 768K [ 1.18M - 15.12M] 64-2K-3 256-512-3 512-256-3 2K-64-3
FFT 1M [ 1.57M - 20.02M] 1K-512 256-2K 512-1K 2K-256
FFT 1152K [ 1.77M - 22.45M] 1K-64-9 64-1K-9 256-256-9
FFT 1280K [ 1.97M - 24.88M] 64-2K-5 256-512-5 512-256-5 2K-64-5
FFT 1536K [ 2.36M - 29.72M] 1K-256-3 256-1K-3 512-512-3 4K-64-3
FFT 2M [ 3.15M - 39.34M] 1K-1K 512-2K 2K-512 4K-256
FFT 2304K [ 3.54M - 44.13M] 64-2K-9 256-512-9 512-256-9 2K-64-9
FFT 2560K [ 3.93M - 48.90M] 1K-256-5 256-1K-5 512-512-5 4K-64-5
FFT 3M [ 4.72M - 58.41M] 1K-512-3 256-2K-3 512-1K-3 2K-256-3
FFT 4M [ 6.29M - 77.30M] 1K-2K 2K-1K 4K-512
FFT 4608K [ 7.08M - 86.70M] 1K-256-9 256-1K-9 512-512-9 4K-64-9
FFT 5M [ 7.86M - 96.07M] 1K-512-5 256-2K-5 512-1K-5 2K-256-5
FFT 6M [ 9.44M - 114.74M] 1K-1K-3 512-2K-3 2K-512-3 4K-256-3
FFT 8M [ 12.58M - 151.83M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 170.28M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT 10M [ 15.73M - 188.68M] 1K-1K-5 512-2K-5 2K-512-5 4K-256-5
FFT 12M [ 18.87M - 225.32M] 1K-2K-3 2K-1K-3 4K-512-3
FFT 16M [ 25.17M - 298.13M] 4K-2K
FFT 18M [ 28.31M - 334.34M] 1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT 20M [ 31.46M - 370.44M] 1K-2K-5 2K-1K-5 4K-512-5
FFT 24M [ 37.75M - 442.34M] 2K-2K-3 4K-1K-3
FFT 36M [ 56.62M - 656.22M] 1K-2K-9 2K-1K-9 4K-512-9
FFT 40M [ 62.91M - 727.03M] 2K-2K-5 4K-1K-5
FFT 48M [ 75.50M - 868.07M] 4K-2K-3
FFT 72M [113.25M - 1287.53M] 2K-2K-9 4K-1K-9
FFT 80M [125.83M - 1426.38M] 4K-2K-5
FFT 144M [226.49M - 2525.23M] 4K-2K-9
[/CODE]
216091 and 756839 ran into trouble like in V6.0. Not tested then was 1398269, which in V6.1 also has the error on load.
[CODE]
2019-02-05 08:45:49 condorella/rx-480 1398269 FFT 72K: Width 8x8, Height 8x8, Middle 9; 18.97 bits/word
2019-02-05 08:45:49 condorella/rx-480 using short carry kernels
2019-02-05 08:45:53 condorella/rx-480 OpenCL compilation in 3968 ms, with "-DEXP=1398269u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-02-05 08:45:53 condorella/rx-480 1398269.owl not found, starting from the beginning.
2019-02-05 08:45:54 condorella/rx-480 1398269 EE loaded: 0, blockSize 400, 0000000000000000 (expected 0000000000000003)
2019-02-05 08:45:54 condorella/rx-480 Exiting because "error on load"
2019-02-05 08:45:54 condorella/rx-480 Bye[/CODE]
[CODE]2019-02-05 09:12:18 condorella/rx-480 6972593 FFT 384K: Width 256x4, Height 8x8, Middle 3; 17.73 bits/word
2019-02-05 09:12:18 condorella/rx-480 using short carry kernels
2019-02-05 09:12:25 condorella/rx-480 OpenCL compilation in 4100 ms, with "-DEXP=6972593u -DWIDTH=1024u -DSMALL_HEIGHT=64u -DMIDDLE=3u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-02-05 09:12:25 condorella/rx-480 6972593.owl not found, starting from the beginning.
2019-02-05 09:12:25 condorella/rx-480 6972593 EE loaded: 0, blockSize 400, 0000000000000000 (expected 0000000000000003)
2019-02-05 09:12:25 condorella/rx-480 Exiting because "error on load"
2019-02-05 09:12:25 condorella/rx-480 Bye[/CODE]
[CODE]2019-02-05 09:14:39 condorella/rx-480 20996011 FFT 1152K: Width 256x4, Height 8x8, Middle 9; 17.80 bits/word
2019-02-05 09:14:39 condorella/rx-480 using short carry kernels
2019-02-05 09:14:46 condorella/rx-480 OpenCL compilation in 4130 ms, with "-DEXP=20996011u -DWIDTH=1024u -DSMALL_HEIGHT=64u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-02-05 09:14:46 condorella/rx-480 20996011.owl not found, starting from the beginning.
2019-02-05 09:14:47 condorella/rx-480 20996011 EE loaded: 0, blockSize 400, ca26e8b69c18204c (expected 0000000000000003)
2019-02-05 09:14:47 condorella/rx-480 Exiting because "error on load"
2019-02-05 09:14:47 condorella/rx-480 Bye[/CODE]
Timing for the 1280K fft, at 2.95 ms/iter, was anomalously higher than for 1536K (1.33 ms/iter). Similarly for 2304K at 5.15 ms/iter vs. 3072K at 2.75 ms/iter. I only tried the fft lengths the program chose for known Mp exponents, so there may be more cases of low speed by default selection. There were cases where the same fft length was a few percent faster or slower in V6.1 than the best previously observed in V3.5 to v5.0. The executable in the attached zip file is considerably smaller than the one for V6.0, because I omitted doing "strip openowl" at V6.0. |