mersenneforum.org  

Old 2020-05-31, 22:22   #23
kriesel
 

Quote:
Originally Posted by ewmayer View Post
That seems quite promising in terms of getting useful work out of both CPU and GPU ... did you observe the TDP for these 3 states?

1. System powered up but otherwise idle;
2. Prime95 running in max-throughput configuration;
3. Both Prime95 and gpuowl running in max-throughput configuration.
Only informally. More now: floor-level ambient ~81F.
With mfakto active on the GPU (and prime95 on the CPU), GPU-Z reports:
gpu chip 22.5W
vddc 17W
vddci 3W
gpu 1124MHz, 71C
cpu 89C, 7.7 ms/iter at M56610787 LLDC

CPUID HWMonitor reports 15W cpu tdp
prime95 stopped: HWMonitor reports ~2W tdp, cpu drops to ~60C

switch to gpuowl:
gpu chip power 20W, vddc 15W, vddci 3W

43% system memory utilization
Shut down, inserted a wattmeter at the power plug. The following are at-the-plug draws:

~15-20W draw during boot, 9-12W editing this text file
prime95 running jacobi check 17W total input;
iterating, 33W
add mfakto, 52W
stopped prime95, mfakto running, 33W
stopped mfakto, 9W
gpuowl resumed 33W
prime95 resumed 57W input
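
A rough way to read these plug readings is to subtract the idle draw from each and compare the combined loads with the sum of the individual ones. The sketch below does that with the numbers above; taking 9W as the idle baseline is an assumption (the idle readings ranged 9-12W), and the names are made up for illustration.

```python
# Plug-draw readings in watts, copied from the list above.
IDLE_W = 9  # assumption: lowest idle reading (idle ranged 9-12W)

PLUG_W = {
    "prime95": 33,
    "mfakto": 33,
    "gpuowl": 33,
    "prime95+mfakto": 52,
    "prime95+gpuowl": 57,
}


def above_idle(total_w, idle_w=IDLE_W):
    """Wall power attributable to the workload, in watts."""
    return total_w - idle_w


def shortfall(cpu_job, gpu_job, idle_w=IDLE_W):
    """Sum of the two single-job increments minus the combined increment."""
    combined = above_idle(PLUG_W[f"{cpu_job}+{gpu_job}"], idle_w)
    separate = above_idle(PLUG_W[cpu_job], idle_w) + above_idle(PLUG_W[gpu_job], idle_w)
    return separate - combined


if __name__ == "__main__":
    for name, watts in PLUG_W.items():
        print(f"{name:16s} {watts:3d}W total, {above_idle(watts):3d}W above idle")
    print("prime95+mfakto shortfall:", shortfall("prime95", "mfakto"), "W")
    print("prime95+gpuowl shortfall:", shortfall("prime95", "gpuowl"), "W")
```

Wall readings also include power-supply losses, so they are not directly comparable with the GPU-Z or HWMonitor figures.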
Quote:
Re. heat, did you try popping the plastic top panel like I suggested? My i3-NUC has yet to arrive, but if the chassis is similarly designed as my Broadwell NUC, there's a flat sheet-metal panel underneath the plastic cap which can serve as a radiator, and is also a tempting target for affixing a heatsink-possible-with-fan.
No. Hadn't even removed the plastic finish protector film yet. Just took the film off. It was so flaky in the early going I thought I would be returning it shortly. It took ~a dozen restarts from crashes or hangs to get through OS install completion, initial configure and setup, including at least 4 during the prime95 benchmark. Temperatures don't look bad to me today, and I won't be modding it during the returns period.
Quote:
My, you've been busy. :) Not sure what I'm supposed to be seeing in the plots on page 5 and 6 - in 5 you do best-fits using a monomial which gives an x^(-1.08...) best-fit behavior, I would be interested in seeing how that compares to a best-fit of a straight line to the data scaled as (iters/sec)*(n log n), i.e. the expected throughput based on FFT opcount.
Look at the local scatter. What causes the variation from one point to its immediate neighbors? George's rounding and threshold decisions? Or something intrinsic to 2-smooth vs. 3-smooth vs. n-smooth FFT lengths? Added 2 columns, and a graph on page 8.
Quote:
...Sounds like you're having fun, in your own distinctive Krieselian "data ... must have ... more data" fashion. :)
"MORE INPUT!" https://en.wikipedia.org/wiki/Short_Circuit_(1986_film)
Data -> analysis -> sometimes improved understanding -> sometimes higher performance
(share, feedback, iterate)
"data scaled as (iters/sec)*(n log n)" -> more columns -> tinier print -> more moaning about readability? ;)
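
For what it's worth, the scaling ewmayer asks about can be sketched as below. This is only an illustration: the log base (2 here) and the function names are assumptions, and the check uses an idealized cost model rather than any measured timings.

```python
import math


def scaled_throughput(iters_per_sec, fft_len_k):
    """Return (iters/sec) * (n log n) for an FFT length given in K (1024 words)."""
    n = fft_len_k * 1024
    return iters_per_sec * n * math.log2(n)


def ideal_iters_per_sec(fft_len_k, c=1.0e9):
    """Idealized model: time per iteration proportional to n log n (c is arbitrary)."""
    n = fft_len_k * 1024
    return c / (n * math.log2(n))


if __name__ == "__main__":
    # Under the ideal model the scaled value is the same at every FFT length,
    # so measured data would show up as deviations from a flat line.
    for k in (2048, 4096, 7680):
        print(k, scaled_throughput(ideal_iters_per_sec(k), k))
```

A different log base only multiplies every scaled value by the same constant, so it does not change the shape of the plot.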
Old 2020-06-03, 21:53   #24
ewmayer

My NUC arrived a couple of days ago; I was too busy finishing up my new multi-GPU build to unpack it until late yesterday afternoon. AFAICT the system is not just "like new" but in fact brand-new - shrink wrap on the box looks like the professional factory variety, no sign of anything inside having been previously touched. First thing was to pop the plastic decorative cap - yep, more or less the same nice-looking but horribly heat-trapping design as my older Broadwell NUC (but see below - it only *looks* the same).

Clean Ubuntu 19.10 install (bye, bye, Windows) went without problems. Mlucas v19 built in both avx2 and avx512 SIMD mode; here are the mlucas.cfg-file timings captured via the standard '-s m [-cpu ...]' self-tests. Without any qualification, 'core' refers to a physical core (or pcore), of which there are 2, alongside 2 additional logical cores (lcores) by way of hyperthreading. Thus e.g. '1c2t' in the table means 1 physical core was overloaded with 2 threads by way of assigning 1 thread to each of the 2 logical cores mapping to that physical core. All timings are in ms/iter. The Max-Thruput column is based on the AVX-512 2c4t data immediately to its left:
Code:
             AVX2 build:                   AVX-512 build:      Max Thruput
FFT(k) 1c1t   1c2t   2c2t   2c4t     1c1t   1c2t   2c2t   2c4t  iters/sec
2048  16.80  15.78   9.10   8.63    13.35  12.60   7.70   7.02  142.5
2304  19.88  18.11  10.49  11.11    15.76  13.99   8.46   7.71  129.7
2560  21.65  19.84  11.37  12.10    17.56  15.39   9.52   8.53  117.2
2816  25.37  22.66  13.18  13.82    20.06  17.96  10.79   9.71  103.0
3072  26.48  25.09  14.12  15.38    21.10  20.24  11.94  10.68   93.6
3328  29.56  26.95  15.56  16.34    24.55  21.55  12.92  11.75   85.1
3584  30.61  28.26  17.27  17.09    25.06  22.01  13.78  13.79   72.5
3840  34.21  31.85  19.49  19.17    27.50  24.40  15.70  15.17   65.9
4096  35.53  32.30  19.91  19.88    28.59  26.00  17.48  16.34   61.2
4608  39.95  37.30  22.61  22.63    33.88  29.65  19.27  17.66   56.6
5120  44.74  41.29  25.41  25.50    37.35  33.66  21.32  20.89   47.9
5632  51.99  47.33  29.23  28.74    43.02  39.19  24.62  23.85   41.9
6144  58.16  54.36  32.45  32.85    45.24  43.59  27.91  26.38   37.9
6656  62.43  56.27  35.37  34.99    52.49  46.99  29.88  28.85   34.7
7168  64.12  58.93  36.17  36.23    53.56  48.22  31.48  29.89   33.5
7680  72.02  65.98  40.89  40.01    58.34  53.55  33.63  32.60   30.7
Thus for 1-core we see a ~10% gain from 2-threads via the hyperthreading, but for 2-core the boost from running 4 threads is much more modest, and sometimes even negative for the AVX2 build. For the AVX-512 build the 2c2t->2c4t boost is consistently of the desired sign, but highly FFT-length-dependent, ranging from ~10% to 0.
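
The percentages quoted here can be recomputed straight from the table. A minimal sketch using the AVX-512 columns follows; the helper names are illustrative and not part of Mlucas.

```python
# AVX-512 build timings in ms/iter, copied from the table above:
# FFT length (K) -> (1c1t, 1c2t, 2c2t, 2c4t)
AVX512_MS = {
    2048: (13.35, 12.60, 7.70, 7.02),
    2304: (15.76, 13.99, 8.46, 7.71),
    2560: (17.56, 15.39, 9.52, 8.53),
    2816: (20.06, 17.96, 10.79, 9.71),
    3072: (21.10, 20.24, 11.94, 10.68),
    3328: (24.55, 21.55, 12.92, 11.75),
    3584: (25.06, 22.01, 13.78, 13.79),
    3840: (27.50, 24.40, 15.70, 15.17),
    4096: (28.59, 26.00, 17.48, 16.34),
    4608: (33.88, 29.65, 19.27, 17.66),
    5120: (37.35, 33.66, 21.32, 20.89),
    5632: (43.02, 39.19, 24.62, 23.85),
    6144: (45.24, 43.59, 27.91, 26.38),
    6656: (52.49, 46.99, 29.88, 28.85),
    7168: (53.56, 48.22, 31.48, 29.89),
    7680: (58.34, 53.55, 33.63, 32.60),
}


def gain_pct(before_ms, after_ms):
    """Percent reduction in per-iteration time; negative means a slowdown."""
    return 100.0 * (before_ms - after_ms) / before_ms


def ht_gains(table):
    """Return {fft_k: (1c1t->1c2t gain, 2c2t->2c4t gain)} in percent."""
    return {k: (gain_pct(t[0], t[1]), gain_pct(t[2], t[3])) for k, t in table.items()}


def iters_per_sec(ms_per_iter):
    """Convert ms/iter to iters/sec, as in the Max Thruput column."""
    return 1000.0 / ms_per_iter


if __name__ == "__main__":
    for fft_k, (g1, g2) in ht_gains(AVX512_MS).items():
        print(f"{fft_k:5d}K  1c1t->1c2t {g1:5.1f}%   2c2t->2c4t {g2:5.1f}%")
```

Note the 3584K row, where 2c4t (13.79) is a hair slower than 2c2t (13.78), i.e. effectively zero gain.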

The gain from using AVX-512 over AVX2 is a modest 1.2-1.4x, depending on FFT length and core|thread count; said modestness likely reflects the half-speed AVX-512 vector-MUL support on this CPU.
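
As a quick check of the 1.2-1.4x figure, here is a sketch that takes the ratio of the two builds' 2c4t timings from the table. Only the 2c4t columns are used; other thread configurations would need their own columns, and the names are illustrative.

```python
# 2c4t timings in ms/iter from the table above: FFT length (K) -> (AVX2, AVX-512)
T_2C4T = {
    2048: (8.63, 7.02),
    2304: (11.11, 7.71),
    2560: (12.10, 8.53),
    2816: (13.82, 9.71),
    3072: (15.38, 10.68),
    3328: (16.34, 11.75),
    3584: (17.09, 13.79),
    3840: (19.17, 15.17),
    4096: (19.88, 16.34),
    4608: (22.63, 17.66),
    5120: (25.50, 20.89),
    5632: (28.74, 23.85),
    6144: (32.85, 26.38),
    6656: (34.99, 28.85),
    7168: (36.23, 29.89),
    7680: (40.01, 32.60),
}


def speedup(avx2_ms, avx512_ms):
    """How many times faster the AVX-512 build is than the AVX2 build."""
    return avx2_ms / avx512_ms


def speedups(table):
    """Return {fft_k: AVX-512 speedup over AVX2}."""
    return {k: speedup(a, b) for k, (a, b) in table.items()}


if __name__ == "__main__":
    ratios = speedups(T_2C4T)
    for fft_k, r in ratios.items():
        print(f"{fft_k:5d}K  {r:.2f}x")
    print(f"range: {min(ratios.values()):.2f}x to {max(ratios.values()):.2f}x")
```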

Per Ken's numbers, Max Thruput for George's code ranges from 210 iters/sec @2048K to 52 iters/sec @7680K so George is faster, as expected, but not by a huge margin.
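
To put "not by a huge margin" into numbers, a tiny sketch comparing the two endpoints quoted above with the Max Thruput column. Only the 2048K and 7680K points are used, since those are the only Prime95 figures given in this post; the names are illustrative.

```python
# iters/sec at the two FFT lengths quoted above: (Prime95, Mlucas AVX-512 2c4t)
ENDPOINTS = {2048: (210.0, 142.5), 7680: (52.0, 30.7)}


def advantage(prime95_ips, mlucas_ips):
    """Ratio of Prime95 throughput to Mlucas throughput."""
    return prime95_ips / mlucas_ips


if __name__ == "__main__":
    for fft_k, (p95, mlu) in ENDPOINTS.items():
        print(f"{fft_k}K: Prime95 is {advantage(p95, mlu):.2f}x Mlucas")
```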

I fired up a 2c4t job on a 5.5M-FFT exponent and let that run overnight; timing was rock-steady around 24.7 ms/iter, matching the value in the table. Thus throttling is not an issue despite the fact that it was a warm night, and this a.m. the metal surface under the decorative plastic cap I'd popped off yesterday was merely lukewarm to the touch, suggesting that Intel solved the cap-traps-heat design problem I see in my Broadwell NUC. Gonna leave it off, though, until I get a chance to run gpuOwl on the AMD Radeon 540 GPU, to see if that affects the heat equation.

I also tried running 2 jobs, each using the 1c2t setup. Using Intel's logical core numbering convention, lcores 0,2 map to pcore 0 and lcores 1,3 map to pcore 1, thus the Mlucas core affinities for said 2 jobs were -cpu 0,2 and -cpu 1,3 (note: comma, not colon!), respectively. Throughput was indistinguishable from 2c4t, i.e. each of the 2 jobs' per-iter times was 2x that of the 1-job timing. The max. throughput of this NUC is ~1.8x that of my older Broadwell NUC running AVX2 code, also in 2c4t mode.
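
The lcore-to-pcore numbering described above can be sketched as follows. The general rule used here (lcore = pcore + k * number of pcores) is an assumption extrapolated from the 2-core, 4-thread case in this post, and the function names are hypothetical.

```python
def lcores_for_pcore(pcore, n_pcores, threads_per_core=2):
    """Logical cores on one physical core, assuming lcore = pcore + k * n_pcores."""
    return [pcore + k * n_pcores for k in range(threads_per_core)]


def mlucas_cpu_args(n_pcores, threads_per_core=2):
    """One '-cpu' argument per physical core, comma-separated as in the post."""
    args = []
    for p in range(n_pcores):
        cores = lcores_for_pcore(p, n_pcores, threads_per_core)
        args.append("-cpu " + ",".join(str(c) for c in cores))
    return args


if __name__ == "__main__":
    # For this NUC (2 pcores, 2 threads each) this prints the two affinities above.
    for arg in mlucas_cpu_args(2):
        print(arg)
```

On Linux the actual sibling mapping can be read from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list rather than assumed.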

To set up for gpuOwl running, I followed the same recipe as I did on my Ubuntu 19.10 systems hosting Radeon VII cards ... all went smoothly until the link step:
Code:
ewmayer@ewmayer-NUC8i3CYS:~/RUN$ git clone https://github.com/preda/gpuowl && cd gpuowl && make
Cloning into 'gpuowl'...
remote: Enumerating objects: 159, done.
remote: Counting objects: 100% (159/159), done.
remote: Compressing objects: 100% (106/106), done.
remote: Total 5303 (delta 95), reused 95 (delta 53), pack-reused 5144
Receiving objects: 100% (5303/5303), 12.67 MiB | 1.79 MiB/s, done.
Resolving deltas: 100% (3801/3801), done.
./tools/expand.py < gpuowl.cl > gpuowl-expanded.cl
cat head.txt gpuowl-expanded.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-311-gfa76bd9"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17   -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17   -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17   -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17   -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17   -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17   -c -o Gpu.o Gpu.cpp
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17   -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17   -c -o Task.o Task.cpp
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17   -c -o checkpoint.o checkpoint.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17   -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17   -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17   -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17   -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17   -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -O2 -std=c++17   -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17   -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -MT sha3.o -MMD -MP -MF .d/sha3.Td -Wall -O2 -std=c++17   -c -o sha3.o sha3.cpp
g++ -o gpuowl Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o sha3.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm-3.3.0/opencl/lib/x86_64 -L/opt/rocm-3.1.0/opencl/lib/x86_64 -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L.
/usr/bin/ld: cannot find -lOpenCL
collect2: error: ld returned 1 exit status
make: *** [Makefile:19: gpuowl] Error 1
'apt list --installed | grep libopencl1' shows this:
Code:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

1349:ocl-icd-libopencl1/eoan,now 2.2.11-1ubuntu1 amd64 [installed,automatic]

Last fiddled with by ewmayer on 2020-06-03 at 22:43
Old 2020-06-03, 22:50   #25
kriesel
 

Ernst, run yours 24/7 and let's see how stable it is. Mine's poor: longest uptime maybe 2 days between hangs, crashes, bugchecks and bluescreens, and it sometimes fails to start/restart; I've had it take as little as a minute between stops. It appears to be at least partly OS-independent; I've had multiple POST fails also. Found it this afternoon displaying a solid green screen and completely unresponsive except to pressing the power button for four seconds. Several stops and a little useful work later, it's probably going back soon for a refund.
Old 2020-06-03, 23:04   #26
ewmayer

I've been running nearly 24 hours w/o any problems ... suggest you request a return/refund on yours and order a like-new (= brand-new in my case) one from the same seller I bought from:

https://www.amazon.com/Intel-BOXNUC8.../dp/B07HHB2YLG

That item is listed via Amzn for $343 + free-ship; the listing says "Available at a lower price from other sellers that may not offer free Prime shipping" ... but if you click on the 'other sellers' embedded link, at top is OEM XS INC, $255 + free-ship. I could not be more pleased with mine; I simply want to double my total throughput by also crunching on the GPU.

Last fiddled with by ewmayer on 2020-06-03 at 23:04
Old 2020-06-03, 23:27   #27
kriesel
 

Quote:
Originally Posted by ewmayer View Post
I've been running nearly 24 hours w/o any problems
Thanks. Try running prime95 and gpuowl on it together. That tips mine over, sometimes within minutes. My seller's tech support asked me to run Seagate SSD diagnostics on the rotating HD! I get a variety of Windows stop codes, rarely a solid green screen, and often POST fails on restart. I suspect system ram. It's on a UPS. The NUC is on the floor and that's often 80-85F ambient.

Last fiddled with by kriesel on 2020-06-03 at 23:32
Old 2020-06-03, 23:39   #28
ewmayer

Quote:
Originally Posted by kriesel View Post
Thanks. Try running prime95 and gpuowl on it together.
I'd like to, but as noted above my attempt to compile gpuOwl fails with an OpenCL-related link error.
Old 2020-06-04, 00:20   #29
kriesel
 

Quote:
Originally Posted by ewmayer View Post
I'd like to, but as noted above my attempt to compile gpuOwl fails with an OpenCL-related link error.
So presumably then mfakto and clLucas also would have link problems. Try a prebuilt image, maybe something from the mersenne.ca mirror. Or cross-compile on your Haswell? There have been gpuowl builds and other things compiled for Linux and posted on the forum.

Last fiddled with by kriesel on 2020-06-04 at 00:21
Old 2020-06-04, 01:05   #30
preda
 

Quote:
Originally Posted by ewmayer View Post
/usr/bin/ld: cannot find -lOpenCL
collect2: error: ld returned 1 exit status
make: *** [Makefile:19: gpuowl] Error 1
'apt list --installed | grep libopencl1' shows this:
Code:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

1349:ocl-icd-libopencl1/eoan,now 2.2.11-1ubuntu1 amd64 [installed,automatic]
Does clinfo work? Which OpenCL provider did you install, e.g. ROCm (does ROCm support that GPU?) or amdgpu-pro?

First step is to get clinfo to report at least one OpenCL device (other than the CPU).
Next you can search for libOpenCL:
$ sudo updatedb
$ locate OpenCL

If needed, edit the Makefile and add the path containing libOpenCL to -L (library search path for linking).
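
A small sketch of that last step, assuming Python is available: it checks a list of candidate directories for an unversioned libOpenCL.so (the name the linker looks for with -lOpenCL) and prints matching -L flags. The candidate directories are guesses and should be replaced with whatever locate reports; the function names are hypothetical.

```python
import os

# Assumption: example candidate directories; substitute the paths locate reports.
CANDIDATE_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/opt/rocm/opencl/lib/x86_64",
    "/opt/amdgpu-pro/lib/x86_64-linux-gnu",
]


def dirs_with_libopencl(candidates):
    """Return the candidate directories that contain a usable libOpenCL.so."""
    # os.path.isfile follows symlinks, so a dangling link does not count.
    return [d for d in candidates if os.path.isfile(os.path.join(d, "libOpenCL.so"))]


def linker_flags(dirs):
    """Format directories as -L search-path flags for the link line."""
    return " ".join("-L" + d for d in dirs)


if __name__ == "__main__":
    found = dirs_with_libopencl(CANDIDATE_DIRS)
    print(linker_flags(found) if found else "no unversioned libOpenCL.so found")
```

A directory holding only libOpenCL.so.1 or libOpenCL.so.1.0.0 is not reported, since -lOpenCL needs the unversioned name at link time.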
Old 2020-06-04, 01:34   #31
ewmayer

Quote:
Originally Posted by paulunderwood
Something like this should fix it:

Code:
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0  /usr/lib/x86_64-linux-gnu/libOpenCL.so
Check that the first file exists -- it might be 1.0 or something else.
You nailed it:
Code:
ewmayer@ewmayer-NUC8i3CYS:~$ ll /usr/lib/x86_64-linux-gnu/libOpenCL.so*
lrwxrwxrwx 1 root root    18 Apr  5  2017 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rw-r--r-- 1 root root 43072 Apr  5  2017 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0
Ensuing git-clone/cd-into-gpuowl/make succeeds ... but after creating a worktodo with a couple of PRP assignments, when I try running it I get the expected startup output but am then left hanging - ctrl-c and ctrl-z have no effect, and 'pidof gpuowl' in a separate window comes up empty:
Code:
[sudo] password for ewmayer: 
2020-06-03 18:26:55 gpuowl v6.11-311-gfa76bd9
2020-06-03 18:26:55 Note: not found 'config.txt'
2020-06-03 18:26:55 device 0, unique id ''
[then nothing, empty space where occasional checkpoint output should be]
Quote:
Originally Posted by preda View Post
Does clinfo work? Which OpenCL provider did you install, e.g. ROCm (does ROCm support that GPU?) or amdgpu-pro?

First step is to get clinfo to report at least one OpenCL device (other than the CPU).
Next you can search for libOpenCL:
$ sudo updatedb
Gives 'sudo: updatedb: command not found'

Quote:
$ locate OpenCL
After installing the locate package, gives
Code:
/etc/OpenCL
/etc/OpenCL/vendors
/etc/OpenCL/vendors/amdocl64.icd
/opt/rocm-3.5.0/lib/libOpenCL.so
/opt/rocm-3.5.0/lib/libOpenCL.so.1
/opt/rocm-3.5.0/lib/libOpenCL.so.1.2
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so.1
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so.1.2
/usr/lib/x86_64-linux-gnu/libOpenCL.so
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0
/usr/share/doc/ocl-icd-libopencl1/html/libOpenCL.html
/usr/share/man/man7/libOpenCL.7.gz
/usr/share/man/man7/libOpenCL.so.7.gz
4th-from-bottom is the ...1.0.0 Paul suggested looking for.

Oh, /opt/rocm/bin/rocm-smi shows
Code:
GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%  
0    48.0c  3.214W  214Mhz  300Mhz  18.82%  auto  25.0W     1%   0%
Is ROCm appropriate for this model AMD GPU?

Last fiddled with by ewmayer on 2020-06-04 at 01:39
Old 2020-06-04, 01:55   #32
ewmayer

Forgot to answer Mihai's Q re. clinfo - that shows no valid device:
Code:
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (3137.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback 
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform
ROCm, OTOH, seems to see a valid device.
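
For scripting this check, a minimal sketch that scans clinfo's text output for its "Number of devices" lines. It assumes the output layout shown above, which may differ between clinfo versions; the function names are hypothetical.

```python
import re

# Excerpt of the clinfo output above, used as sample input.
SAMPLE = """\
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0
"""

DEVICE_LINE = re.compile(r"^[ \t]*Number of devices[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def device_counts(clinfo_text):
    """Return the integer from every 'Number of devices' line, one per platform."""
    return [int(m.group(1)) for m in DEVICE_LINE.finditer(clinfo_text)]


def has_usable_device(clinfo_text):
    """True if at least one platform reports one or more devices."""
    return any(n > 0 for n in device_counts(clinfo_text))


if __name__ == "__main__":
    print("device counts:", device_counts(SAMPLE))
    print("usable device:", has_usable_device(SAMPLE))
```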
Old 2020-06-04, 02:36   #33
paulunderwood
 

I think you are out of luck:

https://github.com/RadeonOpenCompute...ftware-Support

https://en.wikipedia.org/wiki/Radeon_RX_500_series

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.