#23
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
mfakto active on the GPU (with prime95 also running); GPU-Z reports: GPU chip 22.5 W, VDDC 17 W, VDDCI 3 W; GPU 1124 MHz at 71 C; CPU 89 C; 7.7 ms/iter at M56610787 LLDC.
CPUID HWMonitor reports 15 W CPU TDP. Stop prime95: CPUID reports ~2 W TDP, CPU drops to ~60 C.
Switch to gpuowl: GPU chip power 20 W, VDDC 15 W, VDDCI 3 W; 43% system memory utilization.
Shut down, insert wattmeter at power plug. Following are plug-draw readings:
~15-20 W draw during boot; 9-12 W editing this text file
prime95 running Jacobi check: 17 W total input; iterating: 33 W
add mfakto: 52 W
stopped prime95, mfakto running: 33 W
stopped mfakto: 9 W
gpuowl resumed: 33 W
prime95 resumed: 57 W input
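For quick reference, the marginal at-the-plug draw of each worker can be backed out of the wattmeter readings above by simple differencing (a sketch; all watt values are copied from the measurements in this post):

```python
# Marginal (at-the-plug) power of each worker, from the wattmeter readings
# above, obtained by differencing total draw with and without the worker.
idle       = 9    # W, nothing running
prime95    = 33   # W, prime95 iterating alone
mfakto     = 33   # W, mfakto running alone
both       = 52   # W, prime95 + mfakto
gpuowl     = 33   # W, gpuowl resumed alone
p95_gpuowl = 57   # W, prime95 + gpuowl

print(prime95 - idle)       # prime95 marginal draw
print(mfakto - idle)        # mfakto marginal draw
print(both - prime95)       # mfakto's increment with prime95 already running
print(p95_gpuowl - gpuowl)  # prime95's increment with gpuowl already running
```

Interestingly, mfakto costs slightly less at the plug when prime95 is already loading the package (19 W vs 24 W standalone).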
Data -> analysis -> sometimes improved understanding -> sometimes higher performance (share, feedback, iterate). "Data scaled as (iters/sec)*(n log n)" -> more columns -> tinier print -> more moaning about readability? ;)
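A sketch of the normalization quoted above: raw iters/sec always favors small FFT lengths, so multiplying by the per-iteration work (~ n log n for an n-point FFT-based squaring) gives a size-independent figure of merit for comparing runs at different FFT lengths. (The function name and the n = k*1024 convention are my own, for illustration.)

```python
# Scale throughput by per-iteration FFT work, per the (iters/sec)*(n log n)
# suggestion above, so runs at different FFT lengths become comparable.
import math

def scaled_throughput(iters_per_sec: float, fft_k: int) -> float:
    """(iters/sec) * n*log2(n), with n = fft_k * 1024 FFT points."""
    n = fft_k * 1024
    return iters_per_sec * n * math.log2(n)
```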
#24
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
My NUC arrived a couple of days ago; I was too busy finishing up my new multi-GPU build to unpack it until late yesterday afternoon. AFAICT the system is not just "like new" but in fact brand-new - the shrink wrap on the box looks like the professional factory variety, and there is no sign of anything inside having been previously touched. First thing was to pop the plastic decorative cap - yep, more or less the same nice-looking but horribly heat-trapping design as my older Broadwell NUC (but see below - it only *looks* the same). Clean Ubuntu 19.10 install (bye-bye, Windows), no problems; Mlucas v19 built in both avx2 and avx512 SIMD mode. Here are the mlucas.cfg-file timings, captured via the standard '-s m [-cpu ...]' self-tests. Without any qualification, 'core' refers to a physical core (pcore), of which there are 2, alongside 2 additional logical cores (lcores) by way of hyperthreading. Thus e.g. '1c2t' in the table means 1 physical core was overloaded with 2 threads by way of assigning 1 thread to each of the 2 logical cores mapping to that physical core. All timings are in ms/iter. The Max-Thruput column is based on the AVX-512 2c4t data immediately to its left:
Code:
            AVX2 build:                 AVX-512 build:             Max Thruput
FFT(k)   1c1t   1c2t   2c2t   2c4t   1c1t   1c2t   2c2t   2c4t     iters/sec
  2048  16.80  15.78   9.10   8.63  13.35  12.60   7.70   7.02       142.5
  2304  19.88  18.11  10.49  11.11  15.76  13.99   8.46   7.71       129.7
  2560  21.65  19.84  11.37  12.10  17.56  15.39   9.52   8.53       117.2
  2816  25.37  22.66  13.18  13.82  20.06  17.96  10.79   9.71       103.0
  3072  26.48  25.09  14.12  15.38  21.10  20.24  11.94  10.68        93.6
  3328  29.56  26.95  15.56  16.34  24.55  21.55  12.92  11.75        85.1
  3584  30.61  28.26  17.27  17.09  25.06  22.01  13.78  13.79        72.5
  3840  34.21  31.85  19.49  19.17  27.50  24.40  15.70  15.17        65.9
  4096  35.53  32.30  19.91  19.88  28.59  26.00  17.48  16.34        61.2
  4608  39.95  37.30  22.61  22.63  33.88  29.65  19.27  17.66        56.6
  5120  44.74  41.29  25.41  25.50  37.35  33.66  21.32  20.89        47.9
  5632  51.99  47.33  29.23  28.74  43.02  39.19  24.62  23.85        41.9
  6144  58.16  54.36  32.45  32.85  45.24  43.59  27.91  26.38        37.9
  6656  62.43  56.27  35.37  34.99  52.49  46.99  29.88  28.85        34.7
  7168  64.12  58.93  36.17  36.23  53.56  48.22  31.48  29.89        33.5
  7680  72.02  65.98  40.89  40.01  58.34  53.55  33.63  32.60        30.7

The gain from using AVX-512 over AVX2 is a modest 1.2-1.4x, depending on FFT length and core|thread count; said modestness likely reflects the half-speed AVX-512 vector-MUL support on this CPU. Per Ken's numbers, Max Thruput for George's code ranges from 210 iters/sec @2048K to 52 iters/sec @7680K, so George is faster, as expected, but not by a huge margin. I fired up a 2c4t job on a 5.5M-FFT exponent and let that run overnight; the timing was rock-steady around 24.7 ms/iter, matching the value in the table. Thus throttling is not an issue despite the fact that it was a warm night, and this a.m. the metal surface under the decorative plastic cap I'd popped off yesterday was merely lukewarm to the touch, suggesting that Intel solved the cap-traps-heat design problem I see in my Broadwell NUC. Gonna leave it off, though, until I get a chance to run gpuOwl on the AMD Radeon 540 GPU, to see if that affects the heat equation.
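For reference, the Max-Thruput column is just the reciprocal of the AVX-512 2c4t per-iteration time; a minimal sketch of the conversion (function name is mine, for illustration):

```python
# Convert per-iteration times (ms/iter) to throughput (iters/sec),
# reproducing the Max-Thruput column from the AVX-512 2c4t timings above.
def throughput_iters_per_sec(ms_per_iter: float, n_jobs: int = 1) -> float:
    """Total iterations/second for n_jobs independent jobs each at ms_per_iter."""
    return n_jobs * 1000.0 / ms_per_iter

print(round(throughput_iters_per_sec(7.02), 1))   # 2048K FFT -> 142.5
print(round(throughput_iters_per_sec(32.60), 1))  # 7680K FFT -> 30.7
```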
I also tried running 2 jobs, each using the 1c2t setup. Using Intel's logical-core numbering convention, lcores 0,2 map to pcore 0 and lcores 1,3 map to pcore 1; thus the Mlucas core affinities for said 2 jobs were -cpu 0,2 and -cpu 1,3 (note: comma, not colon!), respectively. Throughput was indistinguishable from 2c4t, i.e. each of the 2 jobs' per-iter times was 2x that of the 1-job timing. The max throughput of this NUC is ~1.8x that of my older Broadwell NUC running AVX2 code, also in 2c4t mode.

To set up for gpuOwl running, I followed the same recipe as I did on my Ubuntu 19.10 systems hosting Radeon VII cards ... all went smoothly until the link step:
Code:
ewmayer@ewmayer-NUC8i3CYS:~/RUN$ git clone https://github.com/preda/gpuowl && cd gpuowl && make
Cloning into 'gpuowl'...
remote: Enumerating objects: 159, done.
remote: Counting objects: 100% (159/159), done.
remote: Compressing objects: 100% (106/106), done.
remote: Total 5303 (delta 95), reused 95 (delta 53), pack-reused 5144
Receiving objects: 100% (5303/5303), 12.67 MiB | 1.79 MiB/s, done.
Resolving deltas: 100% (3801/3801), done.
./tools/expand.py < gpuowl.cl > gpuowl-expanded.cl
cat head.txt gpuowl-expanded.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-311-gfa76bd9"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17 -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17 -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17 -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17 -c -o Gpu.o Gpu.cpp
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17 -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17 -c -o Task.o Task.cpp
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17 -c -o checkpoint.o checkpoint.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17 -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -O2 -std=c++17 -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -MT sha3.o -MMD -MP -MF .d/sha3.Td -Wall -O2 -std=c++17 -c -o sha3.o sha3.cpp
g++ -o gpuowl Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o sha3.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm-3.3.0/opencl/lib/x86_64 -L/opt/rocm-3.1.0/opencl/lib/x86_64 -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L.
/usr/bin/ld: cannot find -lOpenCL
collect2: error: ld returned 1 exit status
make: *** [Makefile:19: gpuowl] Error 1
Code:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

1349:ocl-icd-libopencl1/eoan,now 2.2.11-1ubuntu1 amd64 [installed,automatic]

Last fiddled with by ewmayer on 2020-06-03 at 22:43
#25
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts
Ernst, run yours 24/7 and let's see how stable it is. Mine's poor. Longest uptime is maybe 2 days between hangs, crashes, bugchecks, and bluescreens, and it sometimes fails to start or restart; I've had it take as little as a minute between stops. It appears to be at least partly OS-independent; I've had multiple POST fails also. Found it this afternoon displaying a solid green screen and completely unresponsive except to pressing the power button for four seconds. Several stops and a little useful work later, it's probably going back soon for a refund.
#26
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
I've been running nearly 24 hours w/o any problems ... suggest you request a return/refund on yours and order a like-new (= brand-new, in my case) one from the same seller I bought from:
https://www.amazon.com/Intel-BOXNUC8.../dp/B07HHB2YLG

That item is listed via Amzn for $343 + free ship; the listing says "Available at a lower price from other sellers that may not offer free Prime shipping" ... but if you click on the 'other sellers' embedded link, at top is OEM XS INC, $255 + free ship. I could not be more pleased with mine; I simply want to double my total throughput by also crunching on the GPU.

Last fiddled with by ewmayer on 2020-06-03 at 23:04
#27
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Thanks. Try running prime95 and gpuowl on it together; that tips mine over, sometimes within minutes. My seller's tech support asked me to run Seagate SSD diagnostics on the rotating HD! I get a variety of Windows stop codes, rarely a solid green screen, and often POST fails on restart. I suspect the system RAM. It's on a UPS. The NUC is on the floor, where ambient is often 80-85F.
Last fiddled with by kriesel on 2020-06-03 at 23:32 |
#28
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts
#29
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7824₁₀ Posts
So presumably mfakto and clLucas would then also have link problems. Try a prebuilt image, maybe something from the mersenne.ca mirror. Or cross-compile on your Haswell? There have been gpuowl builds and other things compiled for Linux and posted on the forum.
Last fiddled with by kriesel on 2020-06-04 at 00:21 |
#30
"Mihai Preda"
Apr 2015
2²·3·11² Posts
First step is to get clinfo to report at least one OpenCL device (other than the CPU). Next, you can search for libOpenCL:

$ sudo updatedb
$ locate OpenCL

If needed, edit the Makefile and add the path containing libOpenCL to -L (the library search path for linking).
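A quick programmatic sanity check on the same point (a sketch, not part of the original instructions): Python's ctypes.util.find_library performs a search over the standard loader paths similar to what ld does for -lOpenCL, so a None result here suggests the ICD loader is not on the default search path and the Makefile needs an explicit -L entry (e.g. -L/opt/rocm-3.5.0/lib, an assumed path — adjust to whatever `locate` reports).

```python
# Check whether the OpenCL ICD loader is resolvable on the default
# library search path, as the linker's -lOpenCL flag requires.
from ctypes.util import find_library

libname = find_library("OpenCL")  # e.g. 'libOpenCL.so.1' if found, else None
print(libname)
```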
#31
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts
Code:
ewmayer@ewmayer-NUC8i3CYS:~$ ll /usr/lib/x86_64-linux-gnu/libOpenCL.so*
lrwxrwxrwx 1 root root    18 Apr  5  2017 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rw-r--r-- 1 root root 43072 Apr  5  2017 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0
Code:
[sudo] password for ewmayer:
2020-06-03 18:26:55 gpuowl v6.11-311-gfa76bd9
2020-06-03 18:26:55 Note: not found 'config.txt'
2020-06-03 18:26:55 device 0, unique id ''
[then nothing, empty space where occasional checkpoint output should be]
Code:
/etc/OpenCL
/etc/OpenCL/vendors
/etc/OpenCL/vendors/amdocl64.icd
/opt/rocm-3.5.0/lib/libOpenCL.so
/opt/rocm-3.5.0/lib/libOpenCL.so.1
/opt/rocm-3.5.0/lib/libOpenCL.so.1.2
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so.1
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so.1.2
/usr/lib/x86_64-linux-gnu/libOpenCL.so
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0
/usr/share/doc/ocl-icd-libopencl1/html/libOpenCL.html
/usr/share/man/man7/libOpenCL.7.gz
/usr/share/man/man7/libOpenCL.so.7.gz

Oh, /opt/rocm/bin/rocm-smi shows
Code:
GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
0    48.0c  3.214W  214Mhz  300Mhz  18.82%  auto  25.0W   1%     0%

Last fiddled with by ewmayer on 2020-06-04 at 01:39
#32
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts
Forgot to answer Mihai's Q re. clinfo - that shows no valid device:
Code:
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (3137.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)      No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)          No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)          No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)       No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)          No devices found in platform
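For context on what that output means (a sketch of the mechanism, not gpuowl-specific): the ocl-icd loader discovers vendor drivers via the registration files under /etc/OpenCL/vendors/, each of which names a vendor library to dlopen. A platform that registers but reports "Number of devices 0", as above, means the loader found the AMD driver but the driver could not enumerate (or does not support) the installed GPU.

```python
# List the OpenCL ICD registration files and the vendor driver library each
# one names; an empty dict means no vendor driver is registered at all.
import glob
import os

def list_icd_drivers(vendor_dir: str = "/etc/OpenCL/vendors") -> dict:
    """Map each .icd registration file to the driver library it names."""
    drivers = {}
    for icd in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(icd) as f:
            drivers[os.path.basename(icd)] = f.read().strip()
    return drivers

print(list_icd_drivers())  # e.g. {'amdocl64.icd': 'libamdocl64.so'} on this system
```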
#33
Sep 2002
Database er0rr
5·937 Posts
I think you are out of luck:
https://github.com/RadeonOpenCompute...ftware-Support
https://en.wikipedia.org/wiki/Radeon_RX_500_series
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| AVX512 performance on new shiny Intel kit | heliosh | Hardware | 19 | 2020-01-18 04:01 |
| 29.5 build 5 beta with AVX512 optimizations shows a 15% speed increase | simon389 | Software | 20 | 2018-12-13 21:01 |
| Hardware recommendations for factoring | Mr. Odd | Hardware | 7 | 2016-06-02 01:07 |
| need recommendations for a PC | ixfd64 | Hardware | 45 | 2012-11-14 01:19 |
| Hardware recommendations | Mr. Odd | Factoring | 12 | 2011-11-19 00:32 |