mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl
Old 2017-05-01, 17:20   #78
preda
"Mihai Preda"
Apr 2015
10101011011₂ Posts

Quote:
Originally Posted by LaurV
Short question: does the new file format imply that I cannot resume from the old format? (If so, I will have to wait to finish 76453229 first before playing with the new version, sorry. You do not have to do anything in this direction; whatever format you choose for the future is OK with us.)
No, the program still reads the old format from save-N.bin if the new file (cN.ll) is not there. So it should be possible to "switch format" in the middle.
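The fallback preda describes — prefer the new-format checkpoint, otherwise fall back to the old one — could be sketched like this (a hypothetical illustration assuming N in the file names is the exponent; the actual gpuOwl loading code differs):

```python
import os

def pick_checkpoint(exponent):
    """Prefer the new-format checkpoint, fall back to the old one.

    File names follow the thread (cN.ll new, save-N.bin old); the
    selection logic here is an illustrative sketch, not gpuOwl's code.
    """
    new_path = "c%d.ll" % exponent
    old_path = "save-%d.bin" % exponent
    if os.path.exists(new_path):
        return new_path, "new"
    if os.path.exists(old_path):
        return old_path, "old"
    return None, "fresh"  # no checkpoint: start the LL test from scratch
```

Because the new file is checked first but the old one still loads, a test started under the old version resumes under the new one, and subsequent checkpoints are simply written in the new format.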
Old 2017-05-01, 23:37   #79
airsquirrels
"David"
Jul 2015
Ohio
11×47 Posts

I could load gpuOwl on all of my AMD systems (with the latest driver) and concentrate on lots of DC in the 4096K range.

How many exponents that need DC would be covered more efficiently by this implementation vs clLucas? That's an interesting question.

Personally I'd love a mid-range option to continue working on the small end of the DC backlog, and a big option for 100M digits ;)

Last fiddled with by airsquirrels on 2017-05-01 at 23:38
Old 2017-05-02, 00:24   #80
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
B74₁₆ Posts

Quote:
Originally Posted by airsquirrels
I could load gpuOwl on all of my AMD systems (with the latest driver) and concentrate on lots of DC in the 4096K range.

How many exponents that need DC would be covered more efficiently by this implementation vs clLucas? That's an interesting question.

Personally I'd love a mid-range option to continue working on the small end of the DC backlog, and a big option for 100M digits ;)
Between a 3584K and 4096K FFT (68M to 78M), there are approximately 158K exponents needing a DC, 25K awaiting LL, 6.4K assigned LL, and 0.5K assigned DC.
Old 2017-05-02, 00:28   #81
airsquirrels
"David"
Jul 2015
Ohio
11×47 Posts

I have continued to look at the performance discrepancy on my older Debian Jessie systems that are stuck with the fglrx driver. For one thing, I've noticed that the fglrx driver advertises OpenCL 2.0?

Testing with a nice low 70000141, all 4096K:

Code:
Fiji; OpenCL 2.0 AMD-APP (1912.5) (Catalyst 15.12) is seeing 2.255 ms/iter
<clLucas: 3.713 ms/iter> (gpuOwl 1.6x faster)
Fiji; OpenCL 2.0 AMD-APP (1800.5) (Catalyst 15.7, old bad one) is seeing 5.513 ms/iter
<clLucas: 3.972 ms/iter> (gpuOwl at 72% of clLucas speed)
Fiji; OpenCL 1.2 AMD-APP (2348.3) (AMDGPU 17.10) is seeing 2.42ms/iter
<clLucas: 5.16ms/iter> (gpuOwl 2.1x faster)
Sounds like for me I'll be sticking with the fglrx 15.12 / Debian 8 system for the best times on both applications with the Fury X. All residues matched.
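The speedup figures quoted in the benchmark above follow directly from the ms/iter numbers; a quick sanity check of the arithmetic (the driver labels and timings are taken from the table):

```python
# Sanity-check the speedup ratios quoted above:
# ratio = clLucas ms/iter divided by gpuOwl ms/iter.
benchmarks = [
    ("Catalyst 15.12", 2.255, 3.713),  # gpuOwl 1.6x faster
    ("Catalyst 15.7",  5.513, 3.972),  # gpuOwl at 72% of clLucas speed
    ("AMDGPU 17.10",   2.42,  5.16),   # gpuOwl 2.1x faster
]
for driver, gpuowl_ms, cllucas_ms in benchmarks:
    ratio = cllucas_ms / gpuowl_ms
    print("%s: gpuOwl at %.2fx clLucas speed" % (driver, ratio))
```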
Old 2017-05-02, 00:47   #82
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Quote:
Originally Posted by airsquirrels
I have continued to look at the performance discrepancy on my older Debian Jessie systems that are stuck with the fglrx driver. For one thing, I've noticed that the fglrx driver advertises OpenCL 2.0?

Testing with a nice low 70000141, all 4096K:

Code:
Fiji; OpenCL 2.0 AMD-APP (1912.5) (Catalyst 15.12) is seeing 2.255 ms/iter
<clLucas: 3.713 ms/iter> (gpuOwl 1.6x faster)
Fiji; OpenCL 2.0 AMD-APP (1800.5) (Catalyst 15.7, old bad one) is seeing 5.513 ms/iter
<clLucas: 3.972 ms/iter> (gpuOwl at 72% of clLucas speed)
Fiji; OpenCL 1.2 AMD-APP (2348.3) (AMDGPU 17.10) is seeing 2.42ms/iter
<clLucas: 5.16ms/iter> (gpuOwl 2.1x faster)
Sounds like for me I'll be sticking with the fglrx 15.12 / Debian 8 system for the best times on both applications with the Fury X. All residues matched.
Could you please report the iteration time on your best OpenCL setup on a stock (not overclocked) Fury X for something around 76M, e.g. 76008281? For this particular exponent I see 2.125 ms; I'm curious how fglrx 15.12 compares. (I use amdgpu 17.10 on Ubuntu 16.10.)

Note that the iteration time *decreases* a bit as the exponent grows (while staying at the same FFT size). This is because the carry propagation step takes longer for smaller exponents: the word size is smaller, so a carry spans more words. But overall the carry propagation time is a small percentage of the total, so the impact of this is small.
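To make the word-size point concrete: in an irrational-base transform each of the N words carries, on average, exponent/N bits, so at a fixed 4096K FFT a larger exponent packs more bits per word and carries settle sooner. A rough back-of-the-envelope sketch (not gpuOwl's actual carry code):

```python
FFT_WORDS = 4096 * 1024  # 4096K FFT

def bits_per_word(exponent, n_words=FFT_WORDS):
    # Average fractional word size in an irrational-base (IBDWT) transform.
    return exponent / n_words

# At the same FFT size, 70M packs fewer bits per word than 78M,
# so carries span more words and the carry pass costs slightly more.
for p in (70000141, 76008281, 78000000):
    print("%d: %.2f bits/word" % (p, bits_per_word(p)))
```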
Old 2017-05-02, 00:49   #83
airsquirrels
"David"
Jul 2015
Ohio
11·47 Posts

I went through some old conversations I had with Madpoo:

The 4M FFT essentially covers 73.18M to 77.99M for Prime95 (the sweet spot)

I went through the months of performance logs from my AMD systems to get a good baseline for FFT size performance (in GhzDay/Day):

In my Mersenne.org "Dashboard" I use this generalized formula to quickly estimate GhzDays:

Code:
llcredit=0.0246*(($exponent/1000000)-35)^2 + 3.2416*(($exponent/1000000)-35) + 41.369
ghzDayDay=(86400000/($exponent*$msPerIter)) * $llcredit
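The same formula in runnable form (a direct transcription of the estimate above, reading ^ as exponentiation; it reproduces the 74M row of the extrapolation table further down):

```python
def ll_credit(exponent):
    """Approximate GIMPS LL credit in GHz-days (the dashboard's quadratic fit)."""
    x = exponent / 1_000_000 - 35
    return 0.0246 * x**2 + 3.2416 * x + 41.369

def ghzday_per_day(exponent, ms_per_iter):
    """Throughput in GhzDay/Day at a given per-iteration time.

    86400000 is one day in milliseconds, so the first factor is the
    fraction of an LL test completed per day.
    """
    tests_per_day = 86_400_000 / (exponent * ms_per_iter)
    return tests_per_day * ll_credit(exponent)

print(ghzday_per_day(74_000_000, 2.25))  # ≈ 106.49, matching the 74M row below
```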
AMD Cards (clFFT, Fury X):
Code:
FFT	Avg	Mode	Min	Max
2048K)	56.5443	56.47	48.20	60.82
2240K)	10.9852	 10.86	 10.74	 11.36
2304K)	47.5053	 47.07	 35.99	 54.34
2400K)	43.8843	 44.25	 39.54	 62.68
2560K)	51.0052	 50.79	 2.05	 54.06
2688K)	25.1341	 25.14	 25.02	 25.21
2880K)	44.7719	 45.83	 42.34	 46.20
3072K)	58.9334	 59.04	 56.81	 60.31
3200K)	49.976	 49.95	 49.34	 50.46
3360K)	32.7874	 32.87	 31.96	 33.07
3456K)	55.2102	 55.22	 54.77	 55.55
3840K)	44.1085	 43.38	 42.62	 46.06
4000K)	50.9889	 51.05	 45.95	 51.40
4096K)	60.9276	 61.06	 45.05	 61.70
4480K)	30.3824	 30.41	 29.09	 30.52
NVIDIA (cuFFT, Titan)
Code:
FFT	Avg	Mode	Min	Max
2048K)	76.6284	 78.29	 45.63	 89.37
2160K)	66.4163	 65.90	 49.10	 78.65
2304K)	69.0988	 71.78	 54.58	 79.31
2352K)	64.5668	 64.64	 63.75	 64.66
2592K)	76.4363	 75.27	 59.89	 89.20
2880K)	72.355	 71.03	 69.94	 73.97
3024K)	75.0943	 78.21	 63.86	 81.43
3584K)	60.9709	 58.75	 57.07	 65.83
4096K)	80.1439	 82.72	 67.85	 91.17
4320K)	66.5236	 63.43	 57.70	 76.42
19208K) 41.36	 41.40	 41.02	 41.40
Since my Fury X with the fglrx 15.12 driver holds a constant 2.25 ms/iter @ 4096K, I can extrapolate this chart using the formula:
Code:
Exp (M)	ms/iter	LL Credit	GhzDay/Day
40	2.25	58.192		55.86432
42	2.25	65.2656		59.67140571
44	2.25	72.536		63.30414545
46	2.25	80.0032		66.78528
48	2.25	87.6672		70.13376
50	2.25	95.528		73.365504
52	2.25	103.5856	76.49398154
54	2.25	111.84		79.53066667
56	2.25	120.2912	82.48539429
58	2.25	128.9392	85.36664276
60	2.25	137.784		88.18176
62	2.25	146.8256	90.93714581
64	2.25	156.064		93.6384
66	2.25	165.4992	96.29044364
68	2.25	175.1312	98.89761882
70	2.25	184.96		101.4637714
72	2.25	194.9856	103.99232
74	2.25	205.208		106.4863135
76	2.25	215.6272	108.94848
78	2.25	226.2432	111.3812677
From my perspective, that means that even using a 4096K FFT all the way down to the DC line, gpuOwl will outperform clLucas.

Last fiddled with by airsquirrels on 2017-05-02 at 01:00
Old 2017-05-02, 00:59   #84
airsquirrels
"David"
Jul 2015
Ohio
205₁₆ Posts

Quote:
Originally Posted by preda
Could you please report the iteration time on your best OpenCL setup on a stock (not overclocked) Fury X for something around 76M, e.g. 76008281? For this particular exponent I see 2.125 ms; I'm curious how fglrx 15.12 compares. (I use amdgpu 17.10 on Ubuntu 16.10.)

Note that the iteration time *decreases* a bit as the exponent grows (while staying at the same FFT size). This is because the carry propagation step takes longer for smaller exponents: the word size is smaller, so a carry spans more words. But overall the carry propagation time is a small percentage of the total, so the impact of this is small.
Are you able to test 76008281? With the current code I get an error of 0.5 and the program quits.

Using 75000143 with my best 15.12 driver I get 2.36 ms/iter, which is just slightly slower than the 70M range.
Old 2017-05-02, 01:59   #85
airsquirrels
"David"
Jul 2015
Ohio
517₁₀ Posts

One more reply to myself. I tested this theory on a 7-GPU system with a simple patch to gpuOwl to match the output format of clLucas (so it could drop into my management scripts):

Original workload, GPU 1 using gpuOwl
Code:
FFTSize: 4096K Exponent: 42424699 (0.31%) Error: 0.0000 ms: 2.2720 eta: 26:41:34
Card 1 (gpuOwl AMD Radeon (TM) R9 Fury Series - 31.00C, 100% Load [1050/1050], M42424699 using 4096K) GhzDay: 59.87
FFTSize: 2304K Exponent: 42446867 (31.35%) Error: 0.1885 ms: 2.4882 eta: 20:08:25
Card 2 (AMD Radeon (TM) R9 Fury Series - 32.00C, 100% Load [1050/1050], M42446867 using 2304K) GhzDay: 54.71
FFTSize: 2304K Exponent: 42495623 (30.44%) Error: 0.1875 ms: 2.4890 eta: 20:26:13
Card 3 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M42495623 using 2304K) GhzDay: 54.77
FFTSize: 2560K Exponent: 42852191 (63.55%) Error: 0.0208 ms: 2.8335 eta: 12:17:38
Card 4 (AMD Radeon (TM) R9 Fury Series - 34.00C, 100% Load [1050/1050], M42852191 using 2560K) GhzDay: 48.63
FFTSize: 4480K Exponent: 78920381 (76.28%) Error: 0.1064 ms: 9.1494 eta: 47:34:36
Card 5 (AMD Radeon (TM) R9 Fury Series - 38.00C, 100% Load [1050/1050], M78920381 using 4480K) GhzDay: 27.66
FFTSize: 4480K Exponent: 78920419 (89.19%) Error: 0.0996 ms: 7.9857 eta: 18:55:17
Card 6 (AMD Radeon (TM) R9 Fury Series - 35.00C, 100% Load [1050/1050], M78920419 using 4480K) GhzDay: 31.69
FFTSize: 4480K Exponent: 78920497 (89.19%) Error: 0.1016 ms: 7.9951 eta: 18:56:38
Card 7 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M78920497 using 4480K) GhzDay: 31.66
Total GhzDay(7 cards): 308.99
Original workload, GPUs 1-4 using gpuOwl (due to larger LL tests on 5, 6, 7). Note that even using 4096K, times are still better
Code:
FFTSize: 4096K Exponent: 42424699 (0.26%) Error: 0.0000 ms: 2.2747 eta: 26:44:13
Card 1 (gpuOwl AMD Radeon (TM) R9 Fury Series - 31.00C, 0% Load [1050/1050], M42424699 using 4096K) GhzDay: 59.80
FFTSize: 4096K Exponent: 42446867 (0.07%) Error: 0.0000 ms: 2.2731 eta: 26:46:58
Card 2 (gpuOwl AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M42446867 using 4096K) GhzDay: 59.88
FFTSize: 4096K Exponent: 42495623 (0.07%) Error: 0.0000 ms: 2.2734 eta: 26:49:01
Card 3 (gpuOwl AMD Radeon (TM) R9 Fury Series - 34.00C, 100% Load [1050/1050], M42495623 using 4096K) GhzDay: 59.96
FFTSize: 4096K Exponent: 42852191 (0.07%) Error: 0.0000 ms: 2.2751 eta: 27:03:45
Card 4 (gpuOwl AMD Radeon (TM) R9 Fury Series - 35.00C, 100% Load [1050/1050], M42852191 using 4096K) GhzDay: 60.56
FFTSize: 4480K Exponent: 78920381 (76.34%) Error: 0.0840 ms: 8.0626 eta: 41:48:49
Card 5 (AMD Radeon (TM) R9 Fury Series - 38.00C, 100% Load [1050/1050], M78920381 using 4480K) GhzDay: 31.39
FFTSize: 4480K Exponent: 78920419 (89.26%) Error: 0.0742 ms: 6.9726 eta: 16:25:27
Card 6 (AMD Radeon (TM) R9 Fury Series - 35.00C, 100% Load [1050/1050], M78920419 using 4480K) GhzDay: 36.30
FFTSize: 4480K Exponent: 78920497 (89.26%) Error: 0.1016 ms: 7.9935 eta: 18:49:44
Card 7 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M78920497 using 4480K) GhzDay: 31.66
Total GhzDay(7 cards): 339.55
New workload, all assignments in the 73M range, GPUs 1-6 on gpuOwl, and GPU 7 with the remaining 78M assignments:
Code:
FFTSize: 4096K Exponent: 73001809 (0.03%) Error: 0.0625 ms: 2.2571 eta: 45:45:27
Card 1 (gpuOwl AMD Radeon (TM) R9 Fury Series - 31.00C, 100% Load [1050/1050], M73001809 using 4096K) GhzDay: 104.91
FFTSize: 4096K Exponent: 73001989 (0.03%) Error: 0.0625 ms: 2.2615 eta: 45:50:49
Card 2 (gpuOwl AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M73001989 using 4096K) GhzDay: 104.71
FFTSize: 4096K Exponent: 73002113 (0.03%) Error: 0.0703 ms: 2.2623 eta: 45:51:47
Card 3 (gpuOwl AMD Radeon (TM) R9 Fury Series - 34.00C, 100% Load [1050/1050], M73002113 using 4096K) GhzDay: 104.67
FFTSize: 4096K Exponent: 73002211 (0.03%) Error: 0.0625 ms: 2.2624 eta: 45:51:55
Card 4 (gpuOwl AMD Radeon (TM) R9 Fury Series - 36.00C, 0% Load [1050/1050], M73002211 using 4096K) GhzDay: 104.67
FFTSize: 4096K Exponent: 73001413 (0.03%) Error: 0.0625 ms: 2.2628 eta: 45:52:22
Card 5 (gpuOwl AMD Radeon (TM) R9 Fury Series - 40.00C, 100% Load [1050/1050], M73001413 using 4096K) GhzDay: 104.65
FFTSize: 4096K Exponent: 73001603 (0.03%) Error: 0.0625 ms: 2.2595 eta: 45:48:22
Card 6 (gpuOwl AMD Radeon (TM) R9 Fury Series - 36.00C, 100% Load [1050/1050], M73001603 using 4096K) GhzDay: 104.80
FFTSize: 4480K Exponent: 78920497 (89.34%) Error: 0.1016 ms: 8.0108 eta: 18:42:50
Card 7 (AMD Radeon (TM) R9 Fury Series - 33.00C, 100% Load [1050/1050], M78920497 using 4480K) GhzDay: 31.60
Total GhzDay(7 cards): 660.01
Compared to an 8-card Titan Black (air-cooled) system on 4096K 73M exponents:
Code:
Card 1 (GeForce GTX TITAN Black - 78.00C, 100% Load [862/1202]@247.13W/250.00W, M73002467 using 4096K) GhzDay: 91.14
FFTSize: 4096K Exponent: 73004003 (0.01%) Error: 0.07422 ms: 2.7951 eta: 2:08:40:28
Card 2 (GeForce GTX TITAN - 79.00C, 100% Load [758/1254]@207.82W/250.00W, M73004003 using 4096K) GhzDay: 84.72
FFTSize: 4096K Exponent: 73002749 (0.02%) Error: 0.07812 ms: 2.5954 eta: 2:04:29:18
Card 3 (GeForce GTX TITAN Black - 86.00C, 100% Load [901/1280]@249.31W/250.00W, M73002749 using 4096K) GhzDay: 91.24
FFTSize: 4096K Exponent: 73003157 (0.02%) Error: 0.07812 ms: 2.6037 eta: 2:04:49:09
Card 4 (GeForce GTX TITAN Black - 79.00C, 100% Load [862/1202]@247.68W/250.00W, M73003157 using 4096K) GhzDay: 90.95
FFTSize: 4096K Exponent: 73003547 (0.02%) Error: 0.07812 ms: 2.6015 eta: 2:04:45:14
Card 5 (GeForce GTX TITAN Black - 77.00C, 99% Load [862/1202]@249.90W/250.00W, M73003547 using 4096K) GhzDay: 91.03
FFTSize: 4096K Exponent: 73003741 (0.02%) Error: 0.07812 ms: 2.6205 eta: 2:04:59:23
Card 6 (GeForce GTX TITAN Black - 76.00C, 100% Load [862/1202]@249.87W/250.00W, M73003741 using 4096K) GhzDay: 90.37
FFTSize: 4096K Exponent: 73003859 (0.02%) Error: 0.07812 ms: 2.5973 eta: 2:04:37:29
Card 7 (GeForce GTX TITAN Black - 77.00C, 100% Load [862/1202]@248.60W/250.00W, M73003859 using 4096K) GhzDay: 91.17
FFTSize: 4096K Exponent: 73003939 (0.01%) Error: 0.07031 ms: 2.6021 eta: 2:04:45:41
Card 8 (GeForce GTX TITAN Black - 75.00C, 100% Load [888/1202]@248.09W/250.00W, M73003939 using 4096K) GhzDay: 91.01
Total GhzDay(8 cards): 721.63
Old 2017-05-02, 04:23   #86
LaurV
Romulan Interpreter
Jun 2011
Thailand
9611₁₀ Posts

Hey, leave the Titans out of this; you are comparing apples and watermelons.

By the way, long ago you said you would send me some damaged Titans which, if I could repair them, I could use for myself. I even offered to pay for shipping. Any news? Did you repair them yourself? Did you give up? Throw them away? (That would be very bad of you! )
Old 2017-05-02, 07:20   #87
preda
"Mihai Preda"
Apr 2015
10101011011₂ Posts

Quote:
Originally Posted by airsquirrels
Are you able to test 76008281? With the current code I get an error of 0.5 and the program quits.
76M is small enough for the 4096K FFT; an error of 0.5 is not normal. This is what I see:

54460000 / 76008281 [71.65%], ms/iter: 2.124, ETA: 0d 12:43; 34e5c50b53ce4ce4 error 0.21875 (max 0.21875)
54480000 / 76008281 [71.68%], ms/iter: 2.128, ETA: 0d 12:43; 817ee6b4419a5303 error 0.1875 (max 0.21875)
54500000 / 76008281 [71.70%], ms/iter: 2.126, ETA: 0d 12:42; bb7a5ac252b61a9e error 0.1875 (max 0.21875)
Old 2017-05-02, 07:26   #88
preda
"Mihai Preda"
Apr 2015
3·457 Posts

@airsquirrels: Impressive hardware! Do you have a description somewhere of your hardware setup? (e.g. what motherboard, how the GPUs are connected and cooled, pictures, power use, etc.)

30°C is such a low temperature; how do you cool? Or was that only at startup?





Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.