#78
"Mihai Preda"
Apr 2015
10101011011₂ Posts
Quote:
#79
"David"
Jul 2015
Ohio
11×47 Posts
I could load gpuOwl on all of my AMD systems (with the latest driver) and concentrate on lots of DC in the 4096K range.

How many exponents that need DC would be covered more efficiently by this implementation vs clLucas? That's an interesting question. Personally I'd love a mid-range option to continue working on the small end of the DC backlog, and a big option for 100M digits ;)

Last fiddled with by airsquirrels on 2017-05-01 at 23:38
#80
"/X\(‘-‘)/X\"
Jan 2013
B74₁₆ Posts
Quote:
#81
"David"
Jul 2015
Ohio
11×47 Posts
I have continued to look at the performance discrepancy on my older Debian Jessie systems that are stuck with the fglrx driver. For one, I've noticed that the fglrx driver is advertising OpenCL 2.0?
Testing with a nice low 70000141, all 4096K:
Code:
Fiji; OpenCL 2.0 AMD-APP (1912.5) (Catalyst 15.12) is seeing 2.255 ms/iter <clLucas: 3.713 ms/iter> (gpuOwl 1.6x faster)
Fiji; OpenCL 2.0 AMD-APP (1800.5) (Catalyst 15.7, old bad one) is seeing 5.513 ms/iter <clLucas: 3.972 ms/iter> (gpuOwl at 72% of clLucas speed)
Fiji; OpenCL 1.2 AMD-APP (2348.3) (AMDGPU 17.10) is seeing 2.42 ms/iter <clLucas: 5.16 ms/iter> (gpuOwl 2.1x faster)
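As a sanity check, the speedup figures quoted above fall straight out of the ms/iter pairs. A minimal sketch (values copied from the log above; nothing else assumed):

```python
# Speedup of gpuOwl relative to clLucas, from the ms/iter figures above.
# A ratio above 1 means gpuOwl is faster on that driver.
results = [
    ("Catalyst 15.12", 2.255, 3.713),
    ("Catalyst 15.7 (old bad one)", 5.513, 3.972),
    ("AMDGPU 17.10", 2.42, 5.16),
]
for driver, gpuowl_ms, cllucas_ms in results:
    ratio = cllucas_ms / gpuowl_ms
    print(f"{driver}: gpuOwl at {ratio:.2f}x clLucas speed")
```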
#82
"Mihai Preda"
Apr 2015
3·457 Posts
Quote:
Note that the iteration time *decreases* a bit as the exponent grows (while staying at the same FFT size). This is because the carry propagation step takes longer for smaller exponents (because the word size is smaller and the carry spans more words). But overall the carry propagation time is a small percentage of the total, so the impact is small.
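A rough sketch of why: with the usual IBDWT representation, the average word carries about exponent / FFT-length bits, so a smaller exponent at the same FFT size means smaller words and more words for a carry to span. (This is an approximation, not gpuOwl's exact layout.)

```python
# Average bits per FFT word (approximation: exponent / fft_length).
# Smaller exponents at the same FFT size leave fewer bits per word.
FFT_LENGTH = 4096 * 1024  # the 4096K FFT discussed above

for exponent in (70000141, 76008281):
    print(f"M{exponent}: ~{exponent / FFT_LENGTH:.2f} bits/word")
```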
#83 |
"David"
Jul 2015
Ohio
11·47 Posts
I went through some old conversations I had with Madpoo:
4M FFT is essentially between 73.18M and 77.99M for Prime95 (the sweet spot).

I went through the months of performance logs from my AMD systems to get a good baseline for FFT-size performance (in GhzDay/Day). In my Mersenne.org "Dashboard" I use this generalized formula to quickly estimate GhzDays:
Code:
llcredit  = 0.0246*(($exponent/1000000)-35)^2 + 3.2416*(($exponent/1000000)-35) + 41.369
ghzDayDay = (86400000/($exponent*$msPerIter)) * $llcredit
Code:
        Avg      Mode   Min    Max
2048K)  56.5443  56.47  48.20  60.82
2240K)  10.9852  10.86  10.74  11.36
2304K)  47.5053  47.07  35.99  54.34
2400K)  43.8843  44.25  39.54  62.68
2560K)  51.0052  50.79   2.05  54.06
2688K)  25.1341  25.14  25.02  25.21
2880K)  44.7719  45.83  42.34  46.20
3072K)  58.9334  59.04  56.81  60.31
3200K)  49.976   49.95  49.34  50.46
3360K)  32.7874  32.87  31.96  33.07
3456K)  55.2102  55.22  54.77  55.55
3840K)  44.1085  43.38  42.62  46.06
4000K)  50.9889  51.05  45.95  51.40
4096K)  60.9276  61.06  45.05  61.70
4480K)  30.3824  30.41  29.09  30.52
Code:
         Avg      Mode   Min    Max
2048K)   76.6284  78.29  45.63  89.37
2160K)   66.4163  65.90  49.10  78.65
2304K)   69.0988  71.78  54.58  79.31
2352K)   64.5668  64.64  63.75  64.66
2592K)   76.4363  75.27  59.89  89.20
2880K)   72.355   71.03  69.94  73.97
3024K)   75.0943  78.21  63.86  81.43
3584K)   60.9709  58.75  57.07  65.83
4096K)   80.1439  82.72  67.85  91.17
4320K)   66.5236  63.43  57.70  76.42
19208K)  41.36    41.40  41.02  41.40
Code:
Exp  ms/iter  LL Credit  GhzDay/Day
40   2.25     58.192     55.86432
42   2.25     65.2656    59.67140571
44   2.25     72.536     63.30414545
46   2.25     80.0032    66.78528
48   2.25     87.6672    70.13376
50   2.25     95.528     73.365504
52   2.25     103.5856   76.49398154
54   2.25     111.84     79.53066667
56   2.25     120.2912   82.48539429
58   2.25     128.9392   85.36664276
60   2.25     137.784    88.18176
62   2.25     146.8256   90.93714581
64   2.25     156.064    93.6384
66   2.25     165.4992   96.29044364
68   2.25     175.1312   98.89761882
70   2.25     184.96     101.4637714
72   2.25     194.9856   103.99232
74   2.25     205.208    106.4863135
76   2.25     215.6272   108.94848
78   2.25     226.2432   111.3812677

Last fiddled with by airsquirrels on 2017-05-02 at 01:00
#84
"David"
Jul 2015
Ohio
205₁₆ Posts
Quote:
Using 75000143 with my best 15.12 driver I get 2.36 ms/iter, which is just slightly slower than the 70M range.
#85 |
"David"
Jul 2015
Ohio
517₁₀ Posts
One more reply to myself. I tested this theory on a 7-GPU system with a simple patch to gpuOwl to match the output format of clLucas (so it could drop into my management scripts):
Original workload, GPU 1 using gpuOwl:
Code:
FFTSize: 4096K Exponent: 42424699 (0.31%) Error: 0.0000 ms: 2.2720 eta: 26:41:34
Card 1 (gpuOwl AMD Radeon (TM) R9 Fury Series - 31.00C, 100% Load [1050/1050], M42424699 using 4096K) GhzDay: 59.87
FFTSize: 2304K Exponent: 42446867 (31.35%) Error: 0.1885 ms: 2.4882 eta: 20:08:25
Card 2 (AMD Radeon (TM) R9 Fury Series - 32.00C, 100% Load [1050/1050], M42446867 using 2304K) GhzDay: 54.71
FFTSize: 2304K Exponent: 42495623 (30.44%) Error: 0.1875 ms: 2.4890 eta: 20:26:13
Card 3 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M42495623 using 2304K) GhzDay: 54.77
FFTSize: 2560K Exponent: 42852191 (63.55%) Error: 0.0208 ms: 2.8335 eta: 12:17:38
Card 4 (AMD Radeon (TM) R9 Fury Series - 34.00C, 100% Load [1050/1050], M42852191 using 2560K) GhzDay: 48.63
FFTSize: 4480K Exponent: 78920381 (76.28%) Error: 0.1064 ms: 9.1494 eta: 47:34:36
Card 5 (AMD Radeon (TM) R9 Fury Series - 38.00C, 100% Load [1050/1050], M78920381 using 4480K) GhzDay: 27.66
FFTSize: 4480K Exponent: 78920419 (89.19%) Error: 0.0996 ms: 7.9857 eta: 18:55:17
Card 6 (AMD Radeon (TM) R9 Fury Series - 35.00C, 100% Load [1050/1050], M78920419 using 4480K) GhzDay: 31.69
FFTSize: 4480K Exponent: 78920497 (89.19%) Error: 0.1016 ms: 7.9951 eta: 18:56:38
Card 7 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M78920497 using 4480K) GhzDay: 31.66
Total GhzDay(7 cards): 308.99
Code:
FFTSize: 4096K Exponent: 42424699 (0.26%) Error: 0.0000 ms: 2.2747 eta: 26:44:13
Card 1 (gpuOwl AMD Radeon (TM) R9 Fury Series - 31.00C, 0% Load [1050/1050], M42424699 using 4096K) GhzDay: 59.80
FFTSize: 4096K Exponent: 42446867 (0.07%) Error: 0.0000 ms: 2.2731 eta: 26:46:58
Card 2 (gpuOwl AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M42446867 using 4096K) GhzDay: 59.88
FFTSize: 4096K Exponent: 42495623 (0.07%) Error: 0.0000 ms: 2.2734 eta: 26:49:01
Card 3 (gpuOwl AMD Radeon (TM) R9 Fury Series - 34.00C, 100% Load [1050/1050], M42495623 using 4096K) GhzDay: 59.96
FFTSize: 4096K Exponent: 42852191 (0.07%) Error: 0.0000 ms: 2.2751 eta: 27:03:45
Card 4 (gpuOwl AMD Radeon (TM) R9 Fury Series - 35.00C, 100% Load [1050/1050], M42852191 using 4096K) GhzDay: 60.56
FFTSize: 4480K Exponent: 78920381 (76.34%) Error: 0.0840 ms: 8.0626 eta: 41:48:49
Card 5 (AMD Radeon (TM) R9 Fury Series - 38.00C, 100% Load [1050/1050], M78920381 using 4480K) GhzDay: 31.39
FFTSize: 4480K Exponent: 78920419 (89.26%) Error: 0.0742 ms: 6.9726 eta: 16:25:27
Card 6 (AMD Radeon (TM) R9 Fury Series - 35.00C, 100% Load [1050/1050], M78920419 using 4480K) GhzDay: 36.30
FFTSize: 4480K Exponent: 78920497 (89.26%) Error: 0.1016 ms: 7.9935 eta: 18:49:44
Card 7 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M78920497 using 4480K) GhzDay: 31.66
Total GhzDay(7 cards): 339.55
Code:
FFTSize: 4096K Exponent: 73001809 (0.03%) Error: 0.0625 ms: 2.2571 eta: 45:45:27
Card 1 (gpuOwl AMD Radeon (TM) R9 Fury Series - 31.00C, 100% Load [1050/1050], M73001809 using 4096K) GhzDay: 104.91
FFTSize: 4096K Exponent: 73001989 (0.03%) Error: 0.0625 ms: 2.2615 eta: 45:50:49
Card 2 (gpuOwl AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M73001989 using 4096K) GhzDay: 104.71
FFTSize: 4096K Exponent: 73002113 (0.03%) Error: 0.0703 ms: 2.2623 eta: 45:51:47
Card 3 (gpuOwl AMD Radeon (TM) R9 Fury Series - 34.00C, 100% Load [1050/1050], M73002113 using 4096K) GhzDay: 104.67
FFTSize: 4096K Exponent: 73002211 (0.03%) Error: 0.0625 ms: 2.2624 eta: 45:51:55
Card 4 (gpuOwl AMD Radeon (TM) R9 Fury Series - 36.00C, 0% Load [1050/1050], M73002211 using 4096K) GhzDay: 104.67
FFTSize: 4096K Exponent: 73001413 (0.03%) Error: 0.0625 ms: 2.2628 eta: 45:52:22
Card 5 (gpuOwl AMD Radeon (TM) R9 Fury Series - 40.00C, 100% Load [1050/1050], M73001413 using 4096K) GhzDay: 104.65
FFTSize: 4096K Exponent: 73001603 (0.03%) Error: 0.0625 ms: 2.2595 eta: 45:48:22
Card 6 (gpuOwl AMD Radeon (TM) R9 Fury Series - 36.00C, 100% Load [1050/1050], M73001603 using 4096K) GhzDay: 104.80
FFTSize: 4480K Exponent: 78920497 (89.34%) Error: 0.1016 ms: 8.0108 eta: 18:42:50
Card 7 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M78920497 using 4480K) GhzDay: 31.60
Total GhzDay(7 cards): 660.01
Code:
Card 1 (GeForce GTX TITAN Black - 78.00C, 100% Load [862/1202]@247.13W/250.00W, M73002467 using 4096K) GhzDay: 91.14
FFTSize: 4096K Exponent: 73004003 (0.01%) Error: 0.07422 ms: 2.7951 eta: 2:08:40:28
Card 2 (GeForce GTX TITAN - 79.00C, 100% Load [758/1254]@207.82W/250.00W, M73004003 using 4096K) GhzDay: 84.72
FFTSize: 4096K Exponent: 73002749 (0.02%) Error: 0.07812 ms: 2.5954 eta: 2:04:29:18
Card 3 (GeForce GTX TITAN Black - 86.00C, 100% Load [901/1280]@249.31W/250.00W, M73002749 using 4096K) GhzDay: 91.24
FFTSize: 4096K Exponent: 73003157 (0.02%) Error: 0.07812 ms: 2.6037 eta: 2:04:49:09
Card 4 (GeForce GTX TITAN Black - 79.00C, 100% Load [862/1202]@247.68W/250.00W, M73003157 using 4096K) GhzDay: 90.95
FFTSize: 4096K Exponent: 73003547 (0.02%) Error: 0.07812 ms: 2.6015 eta: 2:04:45:14
Card 5 (GeForce GTX TITAN Black - 77.00C, 99% Load [862/1202]@249.90W/250.00W, M73003547 using 4096K) GhzDay: 91.03
FFTSize: 4096K Exponent: 73003741 (0.02%) Error: 0.07812 ms: 2.6205 eta: 2:04:59:23
Card 6 (GeForce GTX TITAN Black - 76.00C, 100% Load [862/1202]@249.87W/250.00W, M73003741 using 4096K) GhzDay: 90.37
FFTSize: 4096K Exponent: 73003859 (0.02%) Error: 0.07812 ms: 2.5973 eta: 2:04:37:29
Card 7 (GeForce GTX TITAN Black - 77.00C, 100% Load [862/1202]@248.60W/250.00W, M73003859 using 4096K) GhzDay: 91.17
FFTSize: 4096K Exponent: 73003939 (0.01%) Error: 0.07031 ms: 2.6021 eta: 2:04:45:41
Card 8 (GeForce GTX TITAN Black - 75.00C, 100% Load [888/1202]@248.09W/250.00W, M73003939 using 4096K) GhzDay: 91.01
Total GhzDay(8 cards): 721.63
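Reading the first three (Fury) snapshots together, the point of the experiment is the throughput progression as work is consolidated onto the 4096K FFT size, with ~73M exponents (the 4096K sweet spot) roughly doubling the total. A trivial sketch, with the totals copied from the logs above:

```python
# 7-card totals (GhzDay/day) from the three Fury snapshots above.
snapshots = [
    ("original mixed FFT sizes", 308.99),
    ("small exponents forced to 4096K", 339.55),
    ("~73M exponents at 4096K", 660.01),
]
base = snapshots[0][1]
for label, total in snapshots:
    print(f"{label}: {total:.2f} GhzDay/day ({total / base:.2f}x baseline)")
```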
#86 |
Romulan Interpreter
Jun 2011
Thailand
9611₁₀ Posts
Hey, leave the Titans out of it; you are comparing apples and watermelons.

By the way, long ago you said you would send me some damaged Titans which, if I could repair them, I could use for myself. I even offered to pay for shipping. Any news? Could you repair them yourself? Did you give up? Throw them away? (That would be very bad of you!)
#87
"Mihai Preda"
Apr 2015
10101011011₂ Posts
Quote:
54460000 / 76008281 [71.65%], ms/iter: 2.124, ETA: 0d 12:43; 34e5c50b53ce4ce4 error 0.21875 (max 0.21875)
54480000 / 76008281 [71.68%], ms/iter: 2.128, ETA: 0d 12:43; 817ee6b4419a5303 error 0.1875 (max 0.21875)
54500000 / 76008281 [71.70%], ms/iter: 2.126, ETA: 0d 12:42; bb7a5ac252b61a9e error 0.1875 (max 0.21875)
#88 |
"Mihai Preda"
Apr 2015
3·457 Posts
@airsquirrels: Impressive hardware! Do you have a description somewhere of your hardware setup? (e.g. what motherboard, how the GPUs are connected and cooled, pictures, power use, etc.)

30C is such a low temperature; how do you cool? Or was that only at startup?
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23
GPUOWL AMD Windows OpenCL issues | xx005fs | GpuOwl | 0 | 2019-07-26 21:37
Testing an expression for primality | 1260 | Software | 17 | 2015-08-28 01:35
Testing Mersenne cofactors for primality? | CRGreathouse | Computer Science & Computational Number Theory | 18 | 2013-06-08 19:12
Primality-testing program with multiple types of moduli (PFGW-related) | Unregistered | Information & Answers | 4 | 2006-10-04 22:38