#1
"Composite as Heck"
Oct 2017
1011111110₂ Posts
The usual benchmarking has been done in the usual threads, but it's been too long since I've properly fondled some hardware, so I plan to bench this to hell and back. The hardware is an Asus PN50 SFF PC paired with 2x16 GB of SO-DIMM DDR4-3200 CL22 (dual rank) and a 1 TB NVMe TLC M.2 SSD. The BIOS is extremely basic, with no overclocking options or much of anything, so the RAM is what it is AFAIK, and the CPU/GPU can only be fiddled with using whatever tools exist for Linux.
ryzenadj ( https://github.com/FlyGoat/RyzenAdj ) seems to work well for setting power targets for the CPU, so that's how I've tested underclocking, with M60721417 as the workload:

Code:
CPU Clock (MHz)  Power Target (W)  Power At Wall (W)  ms/it  it/s   J/it   Command
2825             20                33.5                4.87  205.3  0.163  "ryzenadj --stapm-limit=20000 --fast-limit=20000 --slow-limit=20000"
2400             15                27                  4.93  203.0  0.133  "ryzenadj --stapm-limit=15000 --fast-limit=15000 --slow-limit=15000"
2025             13                24                  5.01  199.6  0.120  "ryzenadj --stapm-limit=13000 --fast-limit=13000 --slow-limit=13000"
1730             12                23                  5.15  194.1  0.118  "ryzenadj --stapm-limit=12000 --fast-limit=12000 --slow-limit=12000"
1450             11.5              21.5                5.52  181.1  0.118  "ryzenadj --stapm-limit=11500 --fast-limit=11500 --slow-limit=11500"
1400             11                21                  5.55  180.1  0.116  "ryzenadj --stapm-limit=11000 --fast-limit=11000 --slow-limit=11000"
1375             10.5              20.5                5.85  170.9  0.119  "ryzenadj --stapm-limit=10500 --fast-limit=10500 --slow-limit=10500"
1350             10                20                  6.14  162.8  0.122  "ryzenadj --stapm-limit=10000 --fast-limit=10000 --slow-limit=10000"
1250             8.5               17.5                7.77  128.7  0.135  "ryzenadj --stapm-limit=8500 --fast-limit=8500 --slow-limit=8500"
630              7                 15                 10.15   98.5  0.152  "ryzenadj --stapm-limit=7000 --fast-limit=7000 --slow-limit=7000"
400              5                 11.5               16.90   59.1  0.194  "ryzenadj --stapm-limit=5000 --fast-limit=5000 --slow-limit=5000"

(A rough sketch of the sweep loop is at the end of this post.)

Tried to run FlopsCL ( http://olab.is.s.u-tokyo.ac.jp/~kami.../projects.html ) to measure the GFlops of the iGPU. It compiled after setting OPENCL_LIBRARY_DIR = /opt/rocm/lib/ and OPENCL_INCLUDE_DIR = /opt/rocm/opencl/include/ in the Makefile, but the kernels fail. It might be because the ROCm install doesn't appear to include OpenCL 2.0? Using /opt/rocm/opencl/lib/ instead made no difference:

Code:
pn50@pn50:~/FlopsCL_src_linux$ ./flops
1 OpenCL platform(s) detected:
Platform 0:
  Advanced Micro Devices, Inc.
  AMD Accelerated Parallel Processing
  OpenCL 2.0 AMD-APP (3186.0), FULL_PROFILE
1 device(s) found supporting OpenCL:
Device 0:
  CL_DEVICE_NAME                     = gfx900
  CL_DEVICE_VENDOR                   = Advanced Micro Devices, Inc.
  CL_DEVICE_VERSION                  = OpenCL 2.0
  CL_DRIVER_VERSION                  = 3186.0 (HSA1.1,LC)
  CL_DEVICE_MAX_COMPUTE_UNITS        = 27
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES      = 1024 / 1024 / 1024
  CL_DEVICE_MAX_WORK_GROUP_SIZE      = 256
  CL_DEVICE_MAX_CLOCK_FREQUENCY      = 1600 MHz
  CL_DEVICE_GLOBAL_MEM_SIZE          = 512 MB
  CL_DEVICE_ERROR_CORRECTION_SUPPORT = NO
  CL_DEVICE_LOCAL_MEM_SIZE           = 64 kB
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 445644 kB
Compiling...
Starting tests...
ERROR: clEnqueueNDRangeKernel failed, cl_invalid_work_group_size
[float  ] Time: 0.016776s, 16385.00 GFLOP/s
ERROR: clEnqueueNDRangeKernel failed, cl_invalid_work_group_size
[float2 ] Time: 0.016776s, 32770.00 GFLOP/s
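For reference, a sweep like the table above boils down to something like this (a rough sketch, not the exact procedure used; it assumes mprime is already running its benchmark in the background and that wall power is read off a plug-in meter):

Code:
#!/bin/bash
# Step through the same power targets as the table; ryzenadj takes milliwatts.
for mw in 20000 15000 13000 12000 11500 11000 10500 10000 8500 7000 5000; do
    sudo ryzenadj --stapm-limit="$mw" --fast-limit="$mw" --slow-limit="$mw"
    echo "power target ${mw} mW set, letting clocks and temperatures settle..."
    sleep 300
    # Note the ms/it from the mprime output and the watts from the meter here.
done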
#2
"Composite as Heck"
Oct 2017
1376₈ Posts
Tried a few power saving tools. tlp on its own gave some nice power savings; because of the way the CPU has been "underclocked" (power-limited), the savings show up as better iteration timings rather than lower power consumption. Adding powertop on top of tlp regresses the savings:
Code:
CPU Clock (MHz)  Power Target (W)  Power At Wall (W)  ms/it  it/s   J/it   Command
1400             11                21                 5.55   180.2  0.117  "ryzenadj --stapm-limit=11000 --fast-limit=11000 --slow-limit=11000"
1400             11                21                 5.18   193.0  0.109  "tlp bat && ryzenadj --stapm-limit=11000 --fast-limit=11000 --slow-limit=11000"
1400             11                21                 5.28   189.4  0.111  "powertop && tlp bat && ryzenadj --stapm-limit=11000 --fast-limit=11000 --slow-limit=11000"
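The J/it column is just wall watts multiplied by seconds per iteration, so the tlp gain can be checked by hand (a quick awk sketch using the numbers above):

Code:
# joules per iteration = wall watts * (ms/it / 1000)
awk 'BEGIN {
    printf "baseline: %.3f J/it\n", 21 * 5.55 / 1000   # ~0.117
    printf "tlp bat : %.3f J/it\n", 21 * 5.18 / 1000   # ~0.109, roughly 7% less energy per iteration
}'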
#3
"Composite as Heck"
Oct 2017
2×383 Posts
Got FlopsCL working by lowering the THREADS_PER_BLOCK variable from the default of 1024 to 256, matching the CL_DEVICE_MAX_WORK_GROUP_SIZE = 256 reported above. The average of a few runs yields this:
Code:
[float   ] 1410.088 GFlop/s
[float2  ] 1416.99  GFlop/s
[float4  ] 1424.106 GFlop/s
[float8  ] 1427.09  GFlop/s
[float16 ] 1427.668 GFlop/s
[double  ]   89.354 GFlop/s
[double2 ]   89.364 GFlop/s
[double4 ]   89.39  GFlop/s
[double8 ]   89.39  GFlop/s
[double16]   89.39  GFlop/s

That is essentially the theoretical fp32 peak:

Code:
shader_units * clock (MHz) * FLOP/clock = 448 * 1600 * 2 = 1,433,600 MFlop/s ≈ 1433.6 GFlop/s
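The doubles come in at about 1/16th of that, which matches Vega's usual fp64 rate. A quick sanity check of both peaks, plus the clinfo query that flags the 256 work-group limit up front (assumes the stock clinfo package is installed):

Code:
# Theoretical peaks for 448 shaders at 1600 MHz, counting FMA as 2 FLOP/clock.
awk 'BEGIN {
    fp32 = 448 * 1600 * 2 / 1000   # ~1433.6 GFlop/s, right where the float results land
    fp64 = fp32 / 16               # ~89.6 GFlop/s, right where the double results land
    printf "fp32 peak: %.1f GFlop/s\nfp64 peak: %.1f GFlop/s\n", fp32, fp64
}'

# The limit that forced THREADS_PER_BLOCK down from 1024 to 256:
clinfo | grep -i "max work group size"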
#4
"Mike"
Aug 2002
1111100110000₂ Posts
Quote:
https://github.com/sbski/Renoir-Mobile-Tuning |
#5
"Composite as Heck"
Oct 2017
2×383 Posts
Thanks, I hadn't come across that one. It's C#, so I probably can't get it to run on Linux; I've always had trouble getting supposedly cross-platform .NET stuff running. It doesn't look like it matters either way, as I'm running headless via SSH and it's a GUI program. This is probably one of the better GUI tools for Windows users; I'm just in exactly the wrong niche.
#6
"Composite as Heck"
Oct 2017
766₁₀ Posts
Tried running an Ubuntu live CD from RAM (by adding toram as a boot parameter so the USB stick can be removed; the entire OS resides in RAM), so that the NVMe could be removed entirely to see how much impact the SSD's power draw has. It was a massive failure. The OS was updated, the power target was set to 11 W, and tlp was not installed as it wouldn't install on the live CD, so the comparison point is in the first table. The no-NVMe timings are just over 8 ms/it at the same wall power draw; compared to the 5.55 ms/it figure above, that's a massive regression. The 5.55 ms/it figure was headless, but I tried again non-headless using both i3 and GNOME and the timings stayed the same within run-to-run variability.

A live CD uses a few extra gigs of RAM as storage, but that RAM shouldn't be in active use, so it shouldn't hinder P95 timings, and I was under the impression that all 32 GB of RAM has to be refreshed regardless of whether it's in use, so occupancy shouldn't matter for power draw either. To rule occupancy out, a ramdisk occupying most of the remaining RAM was created and filled with random data instead of zeroes, just in case that matters. As expected, there was no noticeable difference in timings or power use.
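For reference, the ramdisk part was along these lines (a sketch rather than the exact commands; the 24G figure is just a stand-in for "most of the remaining RAM"):

Code:
# The live session itself was booted with "toram" appended at the boot menu so
# the USB stick could be pulled once everything was copied into RAM.

# Create a tmpfs ramdisk and fill it with random data so the pages are
# genuinely occupied rather than untouched or zero-filled.
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=24G tmpfs /mnt/ramdisk
sudo dd if=/dev/urandom of=/mnt/ramdisk/fill bs=1M count=24000 status=progress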
I can't see how removing the NVMe could be a massive negative; at worst there should be no difference and at best a power saving, so the conclusion is that a live CD just has some undetermined nonsense going on that makes it unsuitable as a compute environment. It doesn't make a lot of sense, as if anything I'd expect a live CD to be lighter on the system than its installed counterpart, but there we are.

Last fiddled with by M344587487 on 2020-10-16 at 13:47 Reason: title
#7
Aug 2020
37 Posts
Quote:
fp32 13481
fp64 3417
#8
Sep 2009
7D4₁₆ Posts
Try running the live CD with the NVMe plugged in but not being used, and compare the power draw to when it's not plugged in. That should isolate its power draw. You probably want to compare both cases at idle and under load (to see whether the idle-to-busy difference changes when the NVMe is present).
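A rough way to structure that comparison (a sketch; stress-ng is just a stand-in for whatever load is preferred, and the wall readings still come from a meter by hand):

Code:
#!/bin/bash
# Idle phase: leave the machine alone and note the wall meter a few times.
echo "idle phase start: $(date)"
sleep 300

# Load phase: pin every core for ten minutes and note the meter again.
echo "load phase start: $(date)"
stress-ng --cpu "$(nproc)" --timeout 600s

# Run once with the NVMe installed and once without; comparing the idle and
# load readings between the two runs isolates the SSD's contribution.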
Chris |
#9
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2²×3×5×97 Posts
Quote:
#10
"Composite as Heck"
Oct 2017
1376₈ Posts
Quote:
Quote:
It probably is; I'm just stubbornly against the entire ecosystem as I've been burned too many times. If I had a penny for every community tool I've encountered that doesn't have source (or does, but it isn't written for .NET Core) and only ships a Windows .NET binary that absolutely fails in Wine even if you use MS's libraries, the ones you have to accept an EULA for, well, I'd have a handful of pennies, but it's very annoying. EXE tools written in C/C++ tend to work flawlessly under Wine unless they interact closely with the hardware. I'd prefer a community tool be written in Java than .NET, which is saying something.

iGPU debugging

Did an mfakto run with stock settings but found that the iGPU endlessly cycles between being occupied for ~10 seconds and then dropping to its minimum frequency for a few seconds; a run completed in 1h37m. mfakto's live output shows 110 GHz-d/day at high frequency and 55 GHz-d/day at low frequency, and wall power is 5-6 W at low frequency, so the chip must be using very little power then. Searching suggests there may be a bug with the VRAM/RAM split: using more than the 512 MiB dedicated to the GPU means relying on the OS to manage memory dynamically, and that apparently can bottleneck somehow. Unfortunately my BIOS doesn't expose the split as a setting, so it's fixed at 512 MiB. To try to confirm that spilling into dynamically allocated memory is what's causing the cycling, I'm setting an upper VRAM limit using amdgpu.vramlimit=x as a kernel boot parameter (the default is vramlimit=0, meaning no limit; a sketch of making this persistent is at the end of the post). Factor=77936863,71,72 tests:

Code:
Boot Parameter         Time (min)
amdgpu.vramlimit=256   83
amdgpu.vramlimit=448   85
amdgpu.vramlimit=512   122
No boot parameter      97

dmesg also shows amdgpu warnings like this:

Code:
[ 3581.454872] ------------[ cut here ]------------
[ 3581.454974] WARNING: CPU: 0 PID: 2178 at /var/lib/dkms/amdgpu/3.8-30/build/amd/amdgpu/amdgpu_gmc.h:265 amdgpu_cs_bo_validate+0x196/0x1c0 [amdgpu]
[ 3581.454977] Modules linked in: ccm nls_iso8859_1 rtsx_usb_ms memstick btusb btrtl btbcm btintel bluetooth joydev input_leds ecdh_generic ecc snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel tps6598x snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm iwlmvm snd_seq_midi snd_seq_midi_event mac80211 edac_mce_amd snd_rawmidi libarc4 kvm_amd ccp kvm snd_seq crct10dif_pclmul ghash_clmulni_intel snd_seq_device aesni_intel ipmi_devintf iwlwifi crypto_simd snd_timer wmi_bmof cryptd k10temp ipmi_msghandler eeepc_wmi glue_helper asus_wmi sparse_keymap snd_rn_pci_acp3x snd cfg80211 soundcore snd_pci_acp3x ite_cir rc_core ucsi_acpi typec_ucsi typec i2c_multi_instantiate mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlua(PO) rtsx_usb_sdmmc rtsx_usb hid_generic usbhid amdgpu(OE) amd_iommu_v2 amd_sched(OE) amdttm(OE) amdkcl(OE) i2c_algo_bit drm_kms_helper
[ 3581.455027]  nvme syscopyarea sysfillrect crc32_pclmul sysimgblt fb_sys_fops ahci drm i2c_piix4 libahci nvme_core r8169 realtek wmi video i2c_hid hid
[ 3581.455038] CPU: 0 PID: 2178 Comm: Xorg:cs0 Tainted: P W OE 5.4.0-51-generic #56-Ubuntu
[ 3581.455040] Hardware name: ASUSTeK COMPUTER INC. MINIPC PN50/PN50, BIOS 0416 08/27/2020
[ 3581.455137] RIP: 0010:amdgpu_cs_bo_validate+0x196/0x1c0 [amdgpu]
[ 3581.455142] Code: ff 77 42 74 1d f6 83 b8 02 00 00 01 74 14 49 8b 86 c0 00 00 00 49 39 86 d0 00 00 00 0f 83 ef fe ff ff 44 8b 03 e9 eb fe ff ff <0f> 0b e9 2f ff ff ff 8b 53 04 44 39 c2 0f 84 77 ff ff ff 41 89 d0
[ 3581.455143] RSP: 0018:ffffad61844c7a20 EFLAGS: 00010206
[ 3581.455144] RAX: 0000000000000000 RBX: ffff9be9a2b1ac00 RCX: 0000000010000000
[ 3581.455146] RDX: 0000000000000001 RSI: 0000000000040002 RDI: 0000000000000000
[ 3581.455147] RBP: ffffad61844c7a78 R08: 0000000000000002 R09: ffff9be9a2b1ac14
[ 3581.455148] R10: ffff9be9ef417848 R11: 0000000000000000 R12: ffff9be9a2b1ac50
[ 3581.455149] R13: ffff9be9a2b1ac30 R14: ffffad61844c7b80 R15: ffff9be9d95e50a0
[ 3581.455150] FS:  00007fc891a42700(0000) GS:ffff9be9ef400000(0000) knlGS:0000000000000000
[ 3581.455152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3581.455155] CR2: 000055ad591d7fe4 CR3: 00000007a467e000 CR4: 0000000000340ef0
[ 3581.455156] Call Trace:
[ 3581.455254]  amdgpu_cs_validate+0x17/0x40 [amdgpu]
[ 3581.455354]  amdgpu_cs_list_validate+0x100/0x140 [amdgpu]
[ 3581.455454]  amdgpu_cs_ioctl+0x1a55/0x1f00 [amdgpu]
[ 3581.455557]  ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[ 3581.455577]  drm_ioctl_kernel+0xae/0xf0 [drm]
[ 3581.455594]  drm_ioctl+0x234/0x3d0 [drm]
[ 3581.455694]  ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[ 3581.455792]  amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
[ 3581.455798]  do_vfs_ioctl+0x407/0x670
[ 3581.455802]  ? do_futex+0x160/0x1e0
[ 3581.455806]  ksys_ioctl+0x67/0x90
[ 3581.455808]  __x64_sys_ioctl+0x1a/0x20
[ 3581.455811]  do_syscall_64+0x57/0x190
[ 3581.455815]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3581.455819] RIP: 0033:0x7fc89834350b
[ 3581.455820] Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
[ 3581.455821] RSP: 002b:00007fc891a41868 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3581.455823] RAX: ffffffffffffffda RBX: 00007fc891a418d0 RCX: 00007fc89834350b
[ 3581.455824] RDX: 00007fc891a418d0 RSI: 00000000c0186444 RDI: 000000000000000d
[ 3581.455825] RBP: 00000000c0186444 R08: 00007fc891a41a20 R09: 0000000000000020
[ 3581.455826] R10: 00007fc891a41a20 R11: 0000000000000246 R12: 00005606f5538940
[ 3581.455827] R13: 000000000000000d R14: 00005606f55f4c2c R15: 00005606f55eca38
[ 3581.455829] ---[ end trace f6219db7c8930d05 ]---
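For anyone poking at the same thing: the clock cycling is easy to watch from the amdgpu sysfs interface, and the vramlimit parameter can be made persistent through GRUB (a sketch; card0, the 448 value, and the Ubuntu paths are assumptions to adjust as needed):

Code:
# Watch the iGPU shader clock step between its DPM states while mfakto runs.
watch -n 1 cat /sys/class/drm/card0/device/pp_dpm_sclk

# Make the VRAM limit persistent: append it to the kernel command line,
# regenerate the GRUB config, then reboot.
sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&amdgpu.vramlimit=448 /' /etc/default/grub
sudo update-grub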