#2157
Feb 2005
Colorado
5×131 Posts
Quote:
#2158
∂²ω=0
Sep 2002
República de California
2×3²×647 Posts
I figured us = microseconds was clear from the context.
@PaulU: Just noticed that the per-iter timings of the 2 jobs on my R7 also went askew early this a.m. As of midnight both were PRPing expos ~103.9M @5.5M FFT, each running at ~1470 us/iter. Around 1am one job's PRP finished and that task started a PRP of an expo ~104.9M, still at 5.5M FFT, but the per-iter time of that job dropped to 1265 us/iter right from the beginning; at the same time the per-iter time of the other ongoing job with p ~ 103.9M jumped to 1664 us/iter. I killed and restarted both jobs first thing this morning by way of daily kworker-task CPU-cycle parasitism control; the timing disparity continued after both were restarted.
Looking closely at the OpenCL args lists for the 2 jobs, the FFT params are the same and the main diffs are the expected ones in the various DWT-weights-associated consts of the 2 expos. The only salient-appearing diff I see is that the p ~ 104.9M job sports an extra "-DMM2_CHAIN=1u" arg which the other one lacks. Whatever that means code-branch and memory-map-wise, it apparently caused the ROCm priority management engine to give a higher priority to that job. Total throughput for 2 jobs running ~1470 us/iter each was ~1360 iter/sec; with the timing disparity it is ~1390 iter/sec, so I've actually gained a few % total throughput.
Last fiddled with by ewmayer on 2020-05-07 at 22:34
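The throughput figures in this post come from converting each job's per-iteration time into a rate and summing over the jobs. A minimal Python sketch of that arithmetic, using only the timings quoted above (the function and variable names are illustrative):

```python
def total_throughput(us_per_iter):
    """Combined rate in iter/sec for jobs given as microseconds per iteration."""
    return sum(1e6 / t for t in us_per_iter)

# Both jobs at ~1470 us/iter, as before the timings diverged.
both_default = total_throughput([1470, 1470])
# One job at 1265 us/iter, the other at 1664 us/iter.
skewed = total_throughput([1265, 1664])

gain_pct = 100 * (skewed / both_default - 1)
print(f"{both_default:.0f} iter/sec -> {skewed:.0f} iter/sec ({gain_pct:+.1f}%)")
```

The same function can be fed any later pair of timings to compare configurations on equal terms.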
#2159
"Mihai Preda"
Apr 2015
3×457 Posts
Quote:
#2160
Romulan Interpreter
Jun 2011
Thailand
10010110110101₂ Posts
I assume that was nitpicking or a joke. If it was not, then you should learn that "u" is the right/standard/accepted abbreviation for "micro" in all domains I ever touched where typing µ or &micro; or \(\mu\) would be tedious, including computer science and software manufacturing (see the famous uVision from Keil, or uTorrent, etc). In my daily work I measure electric potential in uV (microvolts), current in uA (microamperes), and the thickness of bonding wires in um (micrometers, or microns).
Last fiddled with by LaurV on 2020-05-08 at 06:29
#2161
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152B₁₆ Posts
The usual shower of compile warnings, tested only as far as included help output, etc.
#2162
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Third RX550 (a 2GB model) showed a transient EE issue on a different system. This is on an open-frame setup with no known temperature issues.
Code:
2020-05-07 23:10:36 gpuowl v6.11-272-g07718b9
2020-05-07 23:10:36 config: -user kriesel -cpu asr2/rx550 -d 1 -use NO_ASM
2020-05-07 23:10:36 device 1, unique id ''
2020-05-07 23:10:36 asr2/rx550 worktodo.txt line ignored: ""
2020-05-07 23:10:36 asr2/rx550 107000389 FFT: 6M 1K:12:256 (17.01 bpw)
2020-05-07 23:10:36 asr2/rx550 Expected maximum carry32: 25260000
2020-05-07 23:10:37 asr2/rx550 OpenCL args "-DEXP=107000389u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DWEIGHT_STEP=0xf.eb7509fc7be48p-3 -DIWEIGHT_STEP=0x8.0a52bc152d0dp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-07 23:10:39 asr2/rx550 OpenCL compilation in 2.29 s
2020-05-07 23:10:48 asr2/rx550 107000389 OK 0 loaded: blockSize 400, 0000000000000003
2020-05-07 23:11:09 asr2/rx550 107000389 OK 800 0.00%; 17645 us/it; ETA 21d 20:27; 4f39fc137c27de54 (check 7.30s)
2020-05-08 00:13:46 asr2/rx550 107000389 EE 200000 0.19%; 18837 us/it; ETA 23d 06:49; 65fe4f6dd6c92d4e (check 8.97s)
2020-05-08 00:13:55 asr2/rx550 107000389 EE 800 loaded: blockSize 400, 79b18fd6bfda22f9 (expected 4f39fc137c27de54)
2020-05-08 00:13:55 asr2/rx550 Exiting because "error on load"
2020-05-08 00:13:55 asr2/rx550 Bye

C:\Users\ken\Documents\gpuowl-v6.11-272>gpuowl-win
2020-05-08 01:03:03 gpuowl v6.11-272-g07718b9
2020-05-08 01:03:03 config: -user kriesel -cpu asr2/rx550 -d 1 -use NO_ASM
2020-05-08 01:03:03 device 1, unique id ''
2020-05-08 01:03:03 asr2/rx550 worktodo.txt line ignored: ""
2020-05-08 01:03:03 asr2/rx550 107000389 FFT: 6M 1K:12:256 (17.01 bpw)
2020-05-08 01:03:03 asr2/rx550 Expected maximum carry32: 25260000
2020-05-08 01:03:04 asr2/rx550 OpenCL args "-DEXP=107000389u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DWEIGHT_STEP=0xf.eb7509fc7be48p-3 -DIWEIGHT_STEP=0x8.0a52bc152d0dp-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-05-08 01:03:11 asr2/rx550 OpenCL compilation in 7.20 s
2020-05-08 01:03:20 asr2/rx550 107000389 OK 800 loaded: blockSize 400, 4f39fc137c27de54
2020-05-08 01:03:41 asr2/rx550 107000389 OK 1600 0.00%; 17649 us/it; ETA 21d 20:34; 00cff77f1e4010a4 (check 7.32s)
2020-05-08 02:02:08 asr2/rx550 107000389 OK 200000 0.19%; 17639 us/it; ETA 21d 19:18; 65fe4f6dd6c92d4e (check 7.32s)
2020-05-08 03:01:04 asr2/rx550 107000389 OK 400000 0.37%; 17645 us/it; ETA 21d 18:29; bbdb6f6d3790a362 (check 7.32s)
2020-05-08 04:00:00 asr2/rx550 107000389 OK 600000 0.56%; 17639 us/it; ETA 21d 17:20; 902382f6237a1979 (check 7.32s)
2020-05-08 04:58:56 asr2/rx550 107000389 OK 800000 0.75%; 17645 us/it; ETA 21d 16:31; 8087f982145cff93 (check 7.32s)
2020-05-08 05:57:51 asr2/rx550 107000389 OK 1000000 0.93%; 17639 us/it; ETA 21d 15:22; 6d75bf2bfb36a594 (check 7.32s)
2020-05-08 06:56:47 asr2/rx550 107000389 OK 1200000 1.12%; 17645 us/it; ETA 21d 14:34; 48c73046fff69459 (check 7.32s)
2020-05-08 07:55:43 asr2/rx550 107000389 OK 1400000 1.31%; 17639 us/it; ETA 21d 13:25; e84d6918ae180382 (check 7.32s)
2020-05-08 08:54:39 asr2/rx550 107000389 OK 1600000 1.50%; 17645 us/it; ETA 21d 12:37; 0e04b83a1aa1b2f2 (check 7.32s)
2020-05-08 09:53:34 asr2/rx550 107000389 OK 1800000 1.68%; 17639 us/it; ETA 21d 11:28; 8d7a21d8a97a586f (check 7.32s)
2020-05-08 10:52:31 asr2/rx550 107000389 OK 2000000 1.87%; 17645 us/it; ETA 21d 10:39; c213cfc1386c1fca (check 7.32s)
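For anyone wanting to scan longer logs for events like the one above, here is a rough Python sketch that picks out progress lines and reports those marked EE. The regular expression is inferred only from the lines shown in this post, so it is an assumption about the log layout rather than a documented gpuowl format, and the field names are illustrative.

```python
import re

# Progress lines as they appear in the log above, e.g.
# "2020-05-08 00:13:46 asr2/rx550 107000389 EE 200000 0.19%; 18837 us/it; ETA ..."
# Field names are illustrative, not gpuowl's own terminology.
PROGRESS = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<exponent>\d+)\s+(?P<status>OK|EE)\s+"
    r"(?P<iteration>\d+)\s+(?P<percent>[\d.]+)%;\s+(?P<us_per_it>\d+) us/it;"
)

def parse_progress(line):
    """Return a dict for a progress line, or None for any other log line."""
    m = PROGRESS.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    for key in ("exponent", "iteration", "us_per_it"):
        rec[key] = int(rec[key])
    rec["percent"] = float(rec["percent"])
    return rec

def failed_checks(lines):
    """Progress records whose status field is EE."""
    records = (parse_progress(line) for line in lines)
    return [r for r in records if r is not None and r["status"] == "EE"]

# Three lines copied from the log above.
sample = [
    "2020-05-07 23:10:48 asr2/rx550 107000389 OK 0 loaded: blockSize 400, 0000000000000003",
    "2020-05-07 23:11:09 asr2/rx550 107000389 OK 800 0.00%; 17645 us/it; ETA 21d 20:27; 4f39fc137c27de54 (check 7.30s)",
    "2020-05-08 00:13:46 asr2/rx550 107000389 EE 200000 0.19%; 18837 us/it; ETA 23d 06:49; 65fe4f6dd6c92d4e (check 8.97s)",
]

for rec in failed_checks(sample):
    print(rec["date"], rec["time"], "EE at iteration", rec["iteration"],
          "with", rec["us_per_it"], "us/it")
```

The "loaded" lines are deliberately left unmatched, since they carry no timing field.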
#2163
∂²ω=0
Sep 2002
República de California
2D7E₁₆ Posts
Quote:
Preda, I'm guessing -DMM2_CHAIN is an accuracy-related flag, which kicks in at the higher p-ranges of each FFT length? If so, what is the precise breakover point at 5.5M FFT?
Last fiddled with by ewmayer on 2020-05-09 at 20:31
#2164
P90 years forever!
Aug 2002
Yeehaw, FL
1D6F₁₆ Posts
Quote:
From FFTConfig.h: the 5.5M FFT supports a maximum of 18.489 bits per FFT word (bpw), and exponents right at that maximum get the slowest code. From FFTConfig.cpp: {0.06964, 0.14050, 0.03840, 0.02710, 0.01719, 0.00497}, which says that at 0.00497 bpw below the max we ease up a little bit, at 0.00497+0.01719 bpw below the max we ease up a little more, and so forth.
Last fiddled with by Prime95 on 2020-05-09 at 03:49
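Taking the numbers in this post at face value, the ease-up points are running sums of the step list, read from its last entry backwards and subtracted from the maximum bpw. The Python sketch below tabulates them. It assumes that "5.5M" means 5.5 × 2^20 words and that bpw is simply exponent divided by FFT length (consistent with the "6M ... (17.01 bpw)" line logged for 107000389 earlier in the thread). It makes no claim about which step corresponds to which compile flag.

```python
from itertools import accumulate

MAX_BPW_5_5M = 18.489  # maximum bpw for the 5.5M FFT, as quoted in the post
# Step sizes as quoted; per the post they apply from the last entry backwards.
STEPS = [0.06964, 0.14050, 0.03840, 0.02710, 0.01719, 0.00497]
FFT_WORDS = int(5.5 * 2**20)  # assumption: "5.5M" = 5.5 * 2^20 words

def ease_up_thresholds(max_bpw, steps):
    """bpw values at which the code eases up, nearest the maximum first."""
    offsets = accumulate(reversed(steps))
    return [max_bpw - off for off in offsets]

def bits_per_word(exponent, fft_words):
    """Average bits stored per FFT word for a given exponent."""
    return exponent / fft_words

thresholds = ease_up_thresholds(MAX_BPW_5_5M, STEPS)
for level, bpw in enumerate(thresholds, start=1):
    # Exponent at which this bpw is reached for the assumed FFT length.
    print(f"ease-up {level}: bpw {bpw:.5f}, p around {int(bpw * FFT_WORDS)}")
```

Calling `bits_per_word(p, FFT_WORDS)` for a given exponent then shows where it falls relative to these points.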
#2165
∂²ω=0
Sep 2002
República de California
26576₈ Posts
Quote:
And as I noted, on my system having one run at MM2_CHAIN=1 and the other with no ease-up counterintuitively gave me 2% more total throughput than with both runs using expos below the threshold, so I'd like to try forcing both of my current runs (which are below-threshold) to use MM2_CHAIN=1 to see what the resulting total throughput is. May I presume that forcing MM2_CHAIN=1 for an expo that does not need it is safe to do?
Last fiddled with by ewmayer on 2020-05-09 at 19:57
#2166
P90 years forever!
Aug 2002
Yeehaw, FL
5×11×137 Posts
Quote:
#2167
∂²ω=0
Sep 2002
República de California
2×3²×647 Posts
Cool - did this for my run of 104892731; the expected timing skew between the 2 runs resumed, and total throughput again went from ~1360 iter/sec to ~1390 iter/sec.
Then also switched the other run (p = 103923257) to using the flag. Timings again equalize, but at 1410 us/iter, meaning total throughput ~1420 iter/sec, a gain of ~4.5% [!] over both runs using default settings. That's nearly as much gain as I get from upping my sclk setting from 4 to 5, but the latter ups the wattage by a massive 60W, and temps increase proportionally. Wattage currently is a mere 5-10W higher than before the switch to both runs using MM2_CHAIN=1. Say I start making it the default ... if a run hits an expo which needs an even-higher extra-accuracy setting, will that automatically kick in, thus overriding the user's setting of the flag?
Last fiddled with by ewmayer on 2020-05-09 at 21:56