#177
"Robert Gerbicz"
Oct 2005
Hungary
2·7·103 Posts
Quote:
The product of the prime powers up to that limit has roughly 5.5*10^6/ln(2) = 7934822 bits. And in the earlier version there was no P-1 method inside the PRP test; P-1 ran just before(!) the PRP test [and halted if a factor was found]. But it should not be that slow; maybe you could wait longer to see more reasonable times.
Last fiddled with by R. Gerbicz on 2020-11-22 at 16:18 Reason: grammar
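For anyone who wants to check the bit count: the estimate works because the sum of ln(p^k) over the largest prime powers p^k <= B1 is close to B1 itself (prime number theorem), so the product has about B1/ln(2) bits. Below is a rough Python sketch that compares the estimate with the directly computed sum. The function name is made up for illustration and B1 = 5.5M is taken from the formula above; this is not GpuOwl code. A log later in this thread reports "P1(5.5M) 7935851 bits", which is in the same range.

```python
import math

def prime_power_product_bits(limit: int) -> float:
    """log2 of the product of the largest prime powers p^k <= limit."""
    # Plain sieve of Eratosthenes over 0..limit.
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    bits = 0.0
    for p in range(2, limit + 1):
        if sieve[p]:
            pk = p
            while pk * p <= limit:  # raise p to the largest power <= limit
                pk *= p
            bits += math.log2(pk)
    return bits

if __name__ == "__main__":
    B1 = 5_500_000  # assumed limit, from the 5.5*10^6 in the post above
    print(f"estimate B1/ln(2): {B1 / math.log(2):.0f} bits")
    print(f"direct sum       : {prime_power_product_bits(B1):.0f} bits")
```

The sieve takes a second or two in plain Python for this limit; the two numbers should agree to well under one percent.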
#178
"Mihai Preda"
Apr 2015
2·11·61 Posts
#179
"Viliam Furík"
Jul 2018
Martin, Slovakia
2×191 Posts
#180
"Mihai Preda"
Apr 2015
2476₈ Posts
Quote:
-time helps a bit with timing the kernels. Sometimes running with -time old/new and comparing the two may provide a hint as to what's regressed. Another tool is to dump the ISA (using -dump) and compare using the delta.py script from tools/. Differences in occupancy usually have a performance impact.
PS: in the future please provide -maxAlloc when running P1/P2; it will speed things up for those stages.
Last fiddled with by preda on 2020-11-23 at 09:53
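As a rough illustration of the "compare the timings" idea, here is a small Python sketch that parses per-kernel lines in the format the -time output has in the logs posted in this thread and lists kernels whose us/call went up. The function names and the 5% threshold are my own assumptions, and this is not the delta.py script from tools/ (that one compares ISA dumps). The sample lines are copied from the two timing blocks of one run posted later in this thread, used only as stand-ins for an "old" and a "new" profile.

```python
import re

# Matches e.g. "32.86% carryFused : 386 us/call x 771 calls"
KERNEL_LINE = re.compile(r"(\d+(?:\.\d+)?)%\s+(\w+)\s*:\s*(\d+) us/call x (\d+) calls")

def parse_profile(log_text):
    """Map kernel name -> (percent, us_per_call, calls) for one timing block."""
    profile = {}
    for m in KERNEL_LINE.finditer(log_text):
        pct, name, us, calls = m.groups()
        profile[name] = (float(pct), int(us), int(calls))
    return profile

def compare(old, new, min_ratio=1.05):
    """Kernels whose us/call grew by at least min_ratio, worst first."""
    slower = []
    for name, (_, new_us, _) in new.items():
        if name in old:
            old_us = old[name][1]
            if old_us > 0 and new_us / old_us >= min_ratio:
                slower.append((name, old_us, new_us))
    return sorted(slower, key=lambda r: r[2] / r[1], reverse=True)

# Stand-in data: two timing blocks from the same run, copied from the thread.
OLD_BLOCK = """\
2020-11-24 18:39:17 gfx906-1 108850051 32.86% carryFused : 386 us/call x 771 calls
2020-11-24 18:39:17 gfx906-1 108850051 20.78% tailFusedSquare : 236 us/call x 799 calls
2020-11-24 18:39:17 gfx906-1 108850051 2.96% tailFusedMul : 925 us/call x 29 calls
2020-11-24 18:39:17 gfx906-1 108850051 1.36% fftP : 142 us/call x 87 calls
"""
NEW_BLOCK = """\
2020-11-24 18:39:37 gfx906-1 108850051 36.49% carryFused : 386 us/call x 17199 calls
2020-11-24 18:39:37 gfx906-1 108850051 22.27% tailFusedSquare : 235 us/call x 17200 calls
2020-11-24 18:39:37 gfx906-1 108850051 0.24% tailFusedMul : 1060 us/call x 42 calls
2020-11-24 18:39:37 gfx906-1 108850051 0.07% fftP : 280 us/call x 45 calls
"""

if __name__ == "__main__":
    for name, old_us, new_us in compare(parse_profile(OLD_BLOCK), parse_profile(NEW_BLOCK)):
        print(f"{name}: {old_us} -> {new_us} us/call")
```

A kernel flagged here still needs a look at its share of total time and its call count before it counts as a real regression; a kernel with a few dozen calls gives noisy us/call figures.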
#181
Aug 2020
37 Posts
I reverted to v6.380 for my Radeon VII cards because I had to increase the FFT size with v7 (under Windows). I am currently running 110M+ exponents at 907 µs/it with power -20%, mem clock +10% and GPU clock -10%.
#182
Sep 2017
USA
2×5×23 Posts
Can exponents started on GpuOwl be transferred and finished on mprime?
I am GPU poor. I'd like to run PRP on a 332M+ exponent until the massive P-1 has concluded, and then finish the remaining PRP on a CPU while another 332M+ exponent runs TF and PRP/P-1 on the GPU. Is that possible with the current save-file formats? Thank you.
#183
"Mihai Preda"
Apr 2015
2×11×61 Posts
Quote:
OTOH mprime may be offering the merged PRP+P1 at some point in the future.
#184
"Viliam Furík"
Jul 2018
Martin, Slovakia
2·191 Posts
Quote:
Code:
2020-11-24 18:39:05 GpuOwl VERSION v7.2-13-g266aed4
2020-11-24 18:39:05 config: -device 1
2020-11-24 18:39:05 config: -proof 8
2020-11-24 18:39:05 config: -nospin
2020-11-24 18:39:05 config: -maxAlloc 12288
2020-11-24 18:39:05 config: -time new
2020-11-24 18:39:05 device 1, unique id ''
2020-11-24 18:39:05 gfx906-1 108850051 FFT: 6M 1K:12:256 (17.30 bpw)
2020-11-24 18:39:05 gfx906-1 108850051 OpenCL args "-DEXP=108850051u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0.62309825525553619 -DIWEIGHT_STEP_MINUS_1=-0.3838943534305243 -DIWEIGHTS={0,-0.3838943534305243,-0.24082766453041662,-0.064539274795706897,-0.42365736505765839,-0.28982409650658664,-0.12491323160025802,-0.46085410075068395,-0.33565833429543745,-0.18139069701609609,-0.49565018609731404,-0.37853446361658188,-0.23422314777169634,-0.056401114659886273,-0.41864339864529271,-0.28364583046985026,} -cl-std=CL2.0 -cl-finite-math-only "
2020-11-24 18:39:05 gfx906-1 108850051 ASM compilation failed, retrying compilation using NO_ASM
2020-11-24 18:39:10 gfx906-1 108850051 OpenCL compilation in 4.88 s
2020-11-24 18:39:10 gfx906-1 108850051 maxAlloc: 12.0 GB
2020-11-24 18:39:11 gfx906-1 108850051 P1(5.5M) 7935851 bits
2020-11-24 18:39:11 gfx906-1 108850051 OK 23181600 on-load: blockSize 400, 1b5e1ed9eeabd073
2020-11-24 18:39:11 gfx906-1 108850051 validating proof residues for power 8
2020-11-24 18:39:16 gfx906-1 108850051 Proof using power 8
2020-11-24 18:39:17 gfx906-1 108850051 OK 23182400 21.30% b8b6300a79c5adab 1072 us/it + check 0.52s + save 0.20s; ETA 1d 01:30
2020-11-24 18:39:17 gfx906-1 108850051 32.86% carryFused : 386 us/call x 771 calls
2020-11-24 18:39:17 gfx906-1 108850051 20.78% tailFusedSquare : 236 us/call x 799 calls
2020-11-24 18:39:17 gfx906-1 108850051 20.65% fftMiddleOut : 226 us/call x 828 calls
2020-11-24 18:39:17 gfx906-1 108850051 19.39% fftMiddleIn : 205 us/call x 858 calls
2020-11-24 18:39:17 gfx906-1 108850051 2.96% tailFusedMul : 925 us/call x 29 calls
2020-11-24 18:39:17 gfx906-1 108850051 1.36% fftP : 142 us/call x 87 calls
2020-11-24 18:39:17 gfx906-1 108850051 1.18% fftW : 188 us/call x 57 calls
2020-11-24 18:39:17 gfx906-1 108850051 0.65% carryA : 105 us/call x 56 calls
2020-11-24 18:39:17 gfx906-1 108850051 0.15% carryB : 24 us/call x 57 calls
2020-11-24 18:39:17 gfx906-1 108850051 0.01% carryM : 106 us/call x 1 calls
2020-11-24 18:39:17 gfx906-1 108850051 Total time 0.906 s
2020-11-24 18:39:25 gfx906-1 108850051 23190000 21.30% 64c7d926d1d9b5de 1066 us/it
2020-11-24 18:39:37 gfx906-1 108850051 OK 23200000 21.31% e9c259cd41928e74 1067 us/it + check 0.53s + save 0.19s; ETA 1d 01:24
2020-11-24 18:39:37 gfx906-1 108850051 36.49% carryFused : 386 us/call x 17199 calls
2020-11-24 18:39:37 gfx906-1 108850051 22.27% tailFusedSquare : 235 us/call x 17200 calls
2020-11-24 18:39:37 gfx906-1 108850051 21.43% fftMiddleOut : 226 us/call x 17242 calls
2020-11-24 18:39:37 gfx906-1 108850051 19.43% fftMiddleIn : 205 us/call x 17244 calls
2020-11-24 18:39:37 gfx906-1 108850051 0.24% tailFusedMul : 1060 us/call x 42 calls
2020-11-24 18:39:37 gfx906-1 108850051 0.07% fftP : 280 us/call x 45 calls
2020-11-24 18:39:37 gfx906-1 108850051 0.04% fftW : 187 us/call x 43 calls
2020-11-24 18:39:37 gfx906-1 108850051 0.02% carryA : 105 us/call x 43 calls
2020-11-24 18:39:37 gfx906-1 108850051 Total time 18.177 s
2020-11-24 18:39:47 gfx906-1 108850051 23210000 21.32% 97594ec565fc36c5 1066 us/it
2020-11-24 18:39:58 gfx906-1 108850051 23220000 21.33% 40ad02e262e45845 1067 us/it
2020-11-24 18:40:09 gfx906-1 108850051 23230000 21.34% 64fbb613bc7be810 1067 us/it
2020-11-24 18:40:10 gfx906-1 108850051 Stopping, please wait..
BTW, what does this "tailFusedMul" do? It seems to be called rarely and is the slowest of them all. If it were faster, the whole iteration time could drop considerably, no?
#185
Jul 2009
Germany
1000100011₂ Posts
Quote:
You also have a bit of a slowdown at tailFusedMul (maybe because of heat). Here are comparison values from a Vega 64 (6.11.364, Win 10 Pro):
Code:
2020-11-24 19:36:00 config: -user geschwen
2020-11-24 19:36:00 config: -cpu AMD_RXVega64
2020-11-24 19:36:00 config: -carry short
2020-11-24 19:36:00 config: -time new -prp 108850051
2020-11-24 19:36:00 device 0, unique id ''
2020-11-24 19:36:00 AMD_RXVega64 108850051 FFT: 6M 1K:12:256 (17.30 bpw)
2020-11-24 19:36:00 AMD_RXVega64 Expected maximum carry32: 2D8C0000
2020-11-24 19:36:00 AMD_RXVega64 OpenCL args "-DEXP=108850051u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x9.f835e0484667p-4 -DIWEIGHT_STEP_MINUS_1=-0xc.48dccfa34d258p-5 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-11-24 19:36:00 AMD_RXVega64 ASM compilation failed, retrying compilation using NO_ASM
2020-11-24 19:36:03 AMD_RXVega64 OpenCL compilation in 2.61 s
2020-11-24 19:36:04 AMD_RXVega64 108850051 OK 800 loaded: blockSize 400, 2396026441e24dde
2020-11-24 19:36:04 AMD_RXVega64 validating proof residues for power 8
2020-11-24 19:36:04 AMD_RXVega64 Proof using power 8
2020-11-24 19:36:06 AMD_RXVega64 108850051 OK 1600 0.00%; 1772 us/it; ETA 2d 05:34; 7f6b9fd4a86eb8b1 (check 0.78s)
2020-11-24 19:36:06 AMD_RXVega64 33.87% carryFused : 651 us/call x 1169 calls
2020-11-24 19:36:06 AMD_RXVega64 23.91% tailFusedSquare : 448 us/call x 1199 calls
2020-11-24 19:36:06 AMD_RXVega64 19.95% fftMiddleIn : 356 us/call x 1259 calls
2020-11-24 19:36:06 AMD_RXVega64 16.34% fftMiddleOut : 299 us/call x 1229 calls
2020-11-24 19:36:06 AMD_RXVega64 2.53% tailFusedMul : 1898 us/call x 30 calls
2020-11-24 19:36:06 AMD_RXVega64 1.38% fftP : 345 us/call x 90 calls
2020-11-24 19:36:06 AMD_RXVega64 1.00% fftW : 375 us/call x 60 calls
2020-11-24 19:36:06 AMD_RXVega64 0.92% carryA : 350 us/call x 59 calls
2020-11-24 19:36:06 AMD_RXVega64 0.07% carryB : 27 us/call x 60 calls
2020-11-24 19:36:06 AMD_RXVega64 0.02% carryM : 380 us/call x 1 calls
2020-11-24 19:36:06 AMD_RXVega64 Total time 2.246 s
2020-11-24 19:42:03 AMD_RXVega64 108850051 OK 200000 0.18%; 1794 us/it; ETA 2d 06:08; 33200cbce32214be (check 0.80s)
2020-11-24 19:42:03 AMD_RXVega64 36.76% carryFused : 659 us/call x 197904 calls
2020-11-24 19:42:03 AMD_RXVega64 25.53% tailFusedSquare : 457 us/call x 198400 calls
2020-11-24 19:42:03 AMD_RXVega64 20.17% fftMiddleIn : 359 us/call x 199390 calls
2020-11-24 19:42:03 AMD_RXVega64 16.93% fftMiddleOut : 302 us/call x 198895 calls
2020-11-24 19:42:03 AMD_RXVega64 0.26% tailFusedMul : 1847 us/call x 495 calls
2020-11-24 19:42:03 AMD_RXVega64 0.15% fftP : 347 us/call x 1486 calls
2020-11-24 19:42:03 AMD_RXVega64 0.10% fftW : 363 us/call x 991 calls
2020-11-24 19:42:03 AMD_RXVega64 0.10% carryA : 352 us/call x 991 calls
2020-11-24 19:42:03 AMD_RXVega64 Total time 354.767 s
2020-11-24 19:42:39 AMD_RXVega64 Stopping, please wait..
Last fiddled with by moebius on 2020-11-24 at 18:48
#186
"Mihai Preda"
Apr 2015
2×11×61 Posts
Quote:
For this kernel in particular, you can see that it's invoked only about 500 times while the others are called about 200'000 times, so its speed does not matter! This kernel uses 0.26% of the total time, so even if you magically sped it up to take zero time, you'd still gain less than half a percent overall. Also, numbers from a small total number of iterations (e.g. measured over a very short real time) are not very meaningful; you want larger counts, as in the second part of your timing log.
Last fiddled with by preda on 2020-11-24 at 21:39
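The "less than half a percent" point is Amdahl's law applied to a kernel's share of total time. Here is a minimal Python sketch of the arithmetic, using the shares from the second timing block of the Vega 64 log above (tailFusedMul 0.26%, carryFused 36.76%). The function name and the 10% carryFused speedup are illustrative assumptions, not measurements.

```python
def overall_speedup(time_share: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-run speedup when a kernel that takes
    `time_share` of total time becomes `kernel_speedup` times faster."""
    return 1.0 / ((1.0 - time_share) + time_share / kernel_speedup)

if __name__ == "__main__":
    # Shares taken from the 200000-iteration timing block in the log above.
    tail_fused_mul = 0.0026
    carry_fused = 0.3676

    # Even a practically infinite speedup of tailFusedMul barely registers.
    print(f"tailFusedMul for free : x{overall_speedup(tail_fused_mul, 1e12):.4f}")
    # A hypothetical 10% improvement in carryFused matters far more.
    print(f"carryFused 10% faster : x{overall_speedup(carry_fused, 1.1):.4f}")
```

So the kernels worth optimizing are the ones with a large share of total time, whatever their us/call.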
#187
Bemusing Prompter
"Danny"
Dec 2002
California
4474₈ Posts
Has 6.x reached end-of-life, or are you going to continue updating it alongside the 7.x branch?