mersenneforum.org  

2015-07-30, 19:54   #320
airsquirrels
"David"
Jul 2015
Ohio

1005₈ Posts

Quote:
Originally Posted by frmky View Post
I expect no faster than an R9 290X. The Fiji GPU runs DP at 1/16 SP, while the Hawaii GPU runs DP at 1/8 SP. So although the Fury X is over 50% faster than the R9 290X at SP, it will likely lag behind the R9 290X at clLucas even with the faster memory.
I have a decent range of AMD GPUs on 64-bit Linux (Debian Jessie). I compiled with APP SDK 3.0 and clFFT 2.6. For those interested, here are the results with no active CPU load for the Fury X, R9 390X, R9 270X, HD7990, and HD7950.

Is anyone making active efforts on this front? I would love to contribute more GPU effort towards LL instead of trial factoring. If this codebase is "orphaned" I'm happy to help add the basics and work on the performance tuning.

Fury X (oc 1100,500) - Fiji
M(1257787) Iteration 360000 0x53cd344786484a70, n = 65536 err = 0.124 (0:03 real, 0.2503 ms/iter, ETA 3:42)
M( 1398269 ) Iteration 100000 0x34bbb2325387a40c, n = 73728 err = 0.099 (0:04 real, 0.4755 ms/iter, ETA 10:13)
M( 2976221 ) Iteration 30000 0xd0ce8dd8970f7920, n = 163840 err = 0.04492 (0:07 real, 0.6295 ms/iter, ETA 30:50)
M( 3021377 ) Iteration 30000 0x49541e08f14df9cb, n = 163840 err = 0.06445 (0:06 real, 0.6357 ms/iter, ETA 31:40)
M( 6972593 ) Iteration 30000 0xce5ba3e233214969, n = 393216 err = 0.0498 (0:07 real, 0.7594 ms/iter, ETA 1:27:50)
M( 13466917 ) Iteration 30000 0x46381cfb4bc15a20, n = 786432 err = 0.03027 (0:13 real, 1.2700 ms/iter, ETA 4:44:16)
M( 20996011 ) Iteration 30000 0xb65e9537585c439b, n = 1179648 err = 0.09766 (0:16 real, 1.6364 ms/iter, ETA 9:31:38)
M( 24036583 ) Iteration 30000 0x83ffcb39e3037f28, n = 1310720 err = 0.1973 (0:19 real, 1.9326 ms/iter, ETA 12:53:02)
M( 25964951 ) Iteration 20000 0x97e817b5d84ee8a8, n = 1572864 err = 0.02051 (0:20 real, 2.0141 ms/iter, ETA 14:30:46)
M( 30402457 ) Iteration 30000 0x40ef1d2c4c90fd11, n = 1638400 err = 0.2734 (0:24 real, 2.4549 ms/iter, ETA 20:42:36)
M( 32582657 ) Iteration 20000 0x477c7088fadfc973, n = 1769472 err = 0.2812 (0:21 real, 2.1716 ms/iter, ETA 19:38:27)
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:21 real, 2.1407 ms/iter, ETA 22:04:22)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (0:29 real, 2.8523 ms/iter, ETA 33:45:37)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (0:29 real, 2.8586 ms/iter, ETA 34:12:27)
M( 75002911 ) Iteration 30000 0x0f0c343e5174fa89, n = 4194304 err = 0.209 (0:39 real, 3.9026 ms/iter, ETA 81:16:15)
M ( 120909799 ) Iteration 30000 0x94559314016d82c3, n = 7077888 err = 0.1016 (1:27 real, 8.6740 ms/iter, ETA 291:13:47)

Fury X (No Overclock) - Fiji
M(1257787) Iteration 30000 0x53fec009337bd2d5, n = 65536 err = 0.1162 (0:03 real, 0.2563 ms/iter, ETA 5:12)
M( 1398269 ) Iteration 30000 0x32e7a8b19576df92, n = 73728 err = 0.09375 (0:05 real, 0.5186 ms/iter, ETA 11:45)
M( 2976221 ) Iteration 30000 0xd0ce8dd8970f7920, n = 163840 err = 0.04492 (0:07 real, 0.6804 ms/iter, ETA 33:20)
M( 3021377 ) Iteration 30000 0x49541e08f14df9cb, n = 163840 err = 0.06445 (0:07 real, 0.6804 ms/iter, ETA 33:54)
M( 6972593 ) Iteration 30000 0xce5ba3e233214969, n = 393216 err = 0.0498 (0:09 real, 0.8297 ms/iter, ETA 1:35:58)
M( 13466917 ) Iteration 30000 0x46381cfb4bc15a20, n = 786432 err = 0.03027 (0:14 real, 1.3449 ms/iter, ETA 5:01:02)
M( 20996011 ) Iteration 30000 0xb65e9537585c439b, n = 1179648 err = 0.09766 (0:18 real, 1.7503 ms/iter, ETA 10:11:26)
M( 24036583 ) Iteration 30000 0x83ffcb39e3037f28, n = 1310720 err = 0.1973 (0:20 real, 2.0487 ms/iter, ETA 13:39:27)
M( 25964951 ) Iteration 30000 0x5f3b108387a1afaf, n = 1572864 err = 0.02051 (0:22 real, 2.1621 ms/iter, ETA 15:34:24)
M( 30402457 ) Iteration 30000 0x40ef1d2c4c90fd11, n = 1638400 err = 0.2734 (0:27 real, 2.6630 ms/iter, ETA 22:27:54)
M( 32582657 ) Iteration 30000 0xcf8a24f368570cdb, n = 1769472 err = 0.2812 (0:23 real, 2.3203 ms/iter, ETA 20:58:46)
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:22 real, 2.2355 ms/iter, ETA 23:02:59)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (0:30 real, 3.0595 ms/iter, ETA 36:12:44)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (0:31 real, 3.0758 ms/iter, ETA 36:48:25)
M( 75002911 ) Iteration 20000 0x777b6635d6b78b75, n = 4194304 err = 0.209 (0:41 real, 4.0754 ms/iter, ETA 84:52:53)
M ( 120909799 ) Iteration 20000 0xca7a93a2afafe4b6, n = 7077888 err = 0.1016 (1:30 real, 9.0156 ms/iter, ETA 302:43:19)

R9 390X - Hawaii
M(1257787) Iteration 30000 0x53fec009337bd2d5, n = 65536 err = 0.1162 (0:02 real, 0.2220 ms/iter, ETA 4:30)
M( 1398269 ) Iteration 30000 0x32e7a8b19576df92, n = 73728 err = 0.09375 (0:04 real, 0.4008 ms/iter, ETA 9:05)
M( 2976221 ) Iteration 30000 0xd0ce8dd8970f7920, n = 163840 err = 0.04492 (0:05 real, 0.5694 ms/iter, ETA 27:53)
M( 3021377 ) Iteration 30000 0x49541e08f14df9cb, n = 163840 err = 0.06445 (0:06 real, 0.5672 ms/iter, ETA 28:15)
M( 6972593 ) Iteration 30000 0xce5ba3e233214969, n = 393216 err = 0.0498 (0:08 real, 0.8740 ms/iter, ETA 1:41:05)
M( 13466917 ) Iteration 30000 0x46381cfb4bc15a20, n = 786432 err = 0.03027 (0:13 real, 1.3209 ms/iter, ETA 4:55:39)
M( 20996011 ) Iteration 30000 0xb65e9537585c439b, n = 1179648 err = 0.09766 (0:18 real, 1.7618 ms/iter, ETA 10:15:27)
M( 24036583 ) Iteration 30000 0x83ffcb39e3037f28, n = 1310720 err = 0.1973 (0:21 real, 2.1362 ms/iter, ETA 14:14:28)
M( 25964951 ) Iteration 30000 0x5f3b108387a1afaf, n = 1572864 err = 0.02051 (0:21 real, 2.1622 ms/iter, ETA 15:34:24)
M( 30402457 ) Iteration 30000 0x40ef1d2c4c90fd11, n = 1638400 err = 0.2734 (0:28 real, 2.7919 ms/iter, ETA 23:33:11)
M( 32582657 ) Iteration 30000 0xcf8a24f368570cdb, n = 1769472 err = 0.2812 (0:23 real, 2.3943 ms/iter, ETA 21:38:55)
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:24 real, 2.3605 ms/iter, ETA 24:20:20)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (0:32 real, 3.1333 ms/iter, ETA 37:05:10)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (0:31 real, 3.1325 ms/iter, ETA 37:29:06)
M( 75002911 ) Iteration 30000 0x0f0c343e5174fa89, n = 4194304 err = 0.209 (0:44 real, 4.3873 ms/iter, ETA 91:21:55)
M ( 120909799 ) Iteration 30000 0x94559314016d82c3, n = 7077888 err = 0.1016 (1:43 real, 10.2857 ms/iter, ETA 345:20:27)

R9 270X (Pitcairn)
M(1257787) Iteration 30000 0x53fec009337bd2d5, n = 65536 err = 0.1162 (0:02 real, 0.2385 ms/iter, ETA 4:50)
M( 1398269 ) Iteration 30000 0x32e7a8b19576df92, n = 73728 err = 0.09375 (0:04 real, 0.4698 ms/iter, ETA 10:38)
M( 2976221 ) Iteration 30000 0xd0ce8dd8970f7920, n = 163840 err = 0.04492 (0:08 real, 0.8608 ms/iter, ETA 42:10)
M( 3021377 ) Iteration 30000 0x49541e08f14df9cb, n = 163840 err = 0.06445 (0:09 real, 0.8303 ms/iter, ETA 41:22)
M( 6972593 ) Iteration 30000 0xce5ba3e233214969, n = 393216 err = 0.0498 (0:14 real, 1.3830 ms/iter, ETA 2:39:58)
M( 13466917 ) Iteration 30000 0x46381cfb4bc15a20, n = 786432 err = 0.03027 (0:25 real, 2.4661 ms/iter, ETA 9:12:00)
M( 20996011 ) Iteration 30000 0xb65e9537585c439b, n = 1179648 err = 0.09766 (0:37 real, 3.7876 ms/iter, ETA 22:03:07)
M( 24036583 ) Iteration 30000 0x83ffcb39e3037f28, n = 1310720 err = 0.1973 (0:43 real, 4.3121 ms/iter, ETA 28:44:50)
M( 25964951 ) Iteration 30000 0x5f3b108387a1afaf, n = 1572864 err = 0.02051 (0:45 real, 4.4400 ms/iter, ETA 31:58:50)
M( 30402457 ) Iteration 30000 0x40ef1d2c4c90fd11, n = 1638400 err = 0.2734 (0:59 real, 5.8370 ms/iter, ETA 49:14:29)
M( 32582657 ) Iteration 30000 0xcf8a24f368570cdb, n = 1769472 err = 0.2812 (0:54 real, 5.3622 ms/iter, ETA 48:28:58)
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:51 real, 5.0492 ms/iter, ETA 52:03:44)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (1:18 real, 7.7441 ms/iter, ETA 91:39:35)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (1:18 real, 7.7393 ms/iter, ETA 92:36:49)
M( 75002911 ) Iteration 30000 0x0f0c343e5174fa89, n = 4194304 err = 0.209 (1:50 real, 10.9624 ms/iter, ETA 228:17:30)
M ( 120909799 ) Iteration 30000 0x94559314016d82c3, n = 7077888 err = 0.1016 (4:19 real, 25.9343 ms/iter, ETA 870:44:41)

HD7990 Tahiti (one chip)
M(1257787) Iteration 30000 0x53fec009337bd2d5, n = 65536 err = 0.1162 (0:03 real, 0.2172 ms/iter, ETA 4:24)
M( 1398269 ) Iteration 30000 0x32e7a8b19576df92, n = 73728 err = 0.09375 (0:04 real, 0.4313 ms/iter, ETA 9:46)
M( 2976221 ) Iteration 30000 0xd0ce8dd8970f7920, n = 163840 err = 0.04492 (0:06 real, 0.6047 ms/iter, ETA 29:37)
M( 3021377 ) Iteration 30000 0x49541e08f14df9cb, n = 163840 err = 0.06445 (0:06 real, 0.5966 ms/iter, ETA 29:43)
M( 6972593 ) Iteration 30000 0xce5ba3e233214969, n = 393216 err = 0.0498 (0:10 real, 0.9579 ms/iter, ETA 1:50:48)
M( 13466917 ) Iteration 30000 0x46381cfb4bc15a20, n = 786432 err = 0.03027 (0:15 real, 1.5262 ms/iter, ETA 5:41:37)
M( 20996011 ) Iteration 30000 0xb65e9537585c439b, n = 1179648 err = 0.09766 (0:22 real, 2.1648 ms/iter, ETA 12:36:13)
M( 24036583 ) Iteration 30000 0x83ffcb39e3037f28, n = 1310720 err = 0.1973 (0:24 real, 2.4468 ms/iter, ETA 16:18:42)
M( 25964951 ) Iteration 30000 0x5f3b108387a1afaf, n = 1572864 err = 0.02051 (0:26 real, 2.6486 ms/iter, ETA 19:04:3
M( 30402457 ) Iteration 30000 0x40ef1d2c4c90fd11, n = 1638400 err = 0.2734 (0:33 real, 3.2330 ms/iter, ETA 27:16:25)
M( 32582657 ) Iteration 30000 0xcf8a24f368570cdb, n = 1769472 err = 0.2812 (0:28 real, 2.8465 ms/iter, ETA 25:44:12)
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:30 real, 3.0113 ms/iter, ETA 31:02:58)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (0:39 real, 3.9148 ms/iter, ETA 46:20:09)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (0:39 real, 3.9034 ms/iter, ETA 46:42:39)
M( 75002911 ) Iteration 30000 0x0f0c343e5174fa89, n = 4194304 err = 0.209 (0:55 real, 5.4708 ms/iter, ETA 113:55:48)
M ( 120909799 ) Iteration 30000 0x94559314016d82c3, n = 7077888 err = 0.1016 (1:57 real, 11.6462 ms/iter, ETA 391:01:17)

HD7950 - Tahiti
M(1257787) Iteration 30000 0x53fec009337bd2d5, n = 65536 err = 0.1162 (0:03 real, 0.2605 ms/iter, ETA 5:17)
M( 1398269 ) Iteration 30000 0x32e7a8b19576df92, n = 73728 err = 0.09375 (0:05 real, 0.4540 ms/iter, ETA 10:17)
M( 2976221 ) Iteration 30000 0xd0ce8dd8970f7920, n = 163840 err = 0.04492 (0:07 real, 0.6601 ms/iter, ETA 32:20)
M( 3021377 ) Iteration 30000 0x49541e08f14df9cb, n = 163840 err = 0.06445 (0:07 real, 0.6672 ms/iter, ETA 33:14)
M( 6972593 ) Iteration 30000 0xce5ba3e233214969, n = 393216 err = 0.0498 (0:11 real, 1.0731 ms/iter, ETA 2:04:07)
M( 13466917 ) Iteration 30000 0x46381cfb4bc15a20, n = 786432 err = 0.03027 (0:18 real, 1.8651 ms/iter, ETA 6:57:28)
M( 20996011 ) Iteration 30000 0xb65e9537585c439b, n = 1179648 err = 0.09766 (0:26 real, 2.6360 ms/iter, ETA 15:20:49)
M( 24036583 ) Iteration 30000 0x83ffcb39e3037f28, n = 1310720 err = 0.1973 (0:29 real, 2.9338 ms/iter, ETA 19:33:31)
M( 25964951 ) Iteration 30000 0x5f3b108387a1afaf, n = 1572864 err = 0.02051 (0:32 real, 3.1916 ms/iter, ETA 22:59:19)
M( 30402457 ) Iteration 30000 0x40ef1d2c4c90fd11, n = 1638400 err = 0.2734 (0:40 real, 3.9598 ms/iter, ETA 33:24:19)
M( 32582657 ) Iteration 30000 0xcf8a24f368570cdb, n = 1769472 err = 0.2812 (0:35 real, 3.5191 ms/iter, ETA 31:49:07)
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:36 real, 3.6034 ms/iter, ETA 37:09:18)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (0:47 real, 4.6846 ms/iter, ETA 55:26:49)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (0:47 real, 4.6869 ms/iter, ETA 56:05:13)
M( 75002911 ) Iteration 30000 0x0f0c343e5174fa89, n = 4194304 err = 0.209 (1:06 real, 6.6431 ms/iter, ETA 138:20:30)
M ( 120909799 ) Iteration 30000 0x94559314016d82c3, n = 7077888 err = 0.1016 (2:25 real, 14.4296 ms/iter, ETA 484:28:21)
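
For anyone who wants to chart these numbers, the lines above can be parsed with a small script. This is only a sketch: the regular expression is inferred from the output as pasted in this post, not taken from the clLucas source, and `parse_line` is a made-up helper name.

```python
import re

# Pattern inferred from the pasted benchmark lines (an assumption, not the
# official clLucas output specification). It tolerates "M(", "M (" and the
# stray " - " that appears before "Iteration" on one line.
LINE_RE = re.compile(
    r"M\s*\(\s*(?P<exponent>\d+)\s*\)\s*-?\s*Iteration\s+(?P<iteration>\d+)\s+"
    r"(?P<residue>0x[0-9a-f]{16}),\s*n\s*=\s*(?P<fft>\d+)\s+"
    r"err\s*=\s*(?P<err>[\d.]+)\s*\(\S+\s+real,\s*(?P<ms>[\d.]+)\s*ms/iter"
)


def parse_line(line):
    """Return the fields of one benchmark line as a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "exponent": int(m["exponent"]),
        "iteration": int(m["iteration"]),
        "residue": m["residue"],
        "fft": int(m["fft"]),
        "err": float(m["err"]),
        "ms_per_iter": float(m["ms"]),
    }


sample = ("M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 "
          "err = 0.1152 (0:22 real, 2.2355 ms/iter, ETA 23:02:59)")
print(parse_line(sample))
```

Feeding each card's block through `parse_line` gives FFT length and ms/iter pairs that can go straight into a plot.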
Attached thumbnail: AMDGPUclLucasPerformance.png (74.2 KB)
2015-07-30, 23:23   #321
frmky
Jul 2003
So Cal

4042₈ Posts

Quote:
Originally Posted by airsquirrels View Post
Fury X (No Overclock) - Fiji
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:22 real, 2.2355 ms/iter, ETA 23:02:59)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (0:30 real, 3.0595 ms/iter, ETA 36:12:44)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (0:31 real, 3.0758 ms/iter, ETA 36:48:25)
M( 75002911 ) Iteration 20000 0x777b6635d6b78b75, n = 4194304 err = 0.209 (0:41 real, 4.0754 ms/iter, ETA 84:52:53)
M ( 120909799 ) Iteration 20000 0xca7a93a2afafe4b6, n = 7077888 err = 0.1016 (1:30 real, 9.0156 ms/iter, ETA 302:43:19)

R9 390X - Hawaii
M( 37156667 ) Iteration 30000 0x5c0e5a643545a36d, n = 2097152 err = 0.1152 (0:24 real, 2.3605 ms/iter, ETA 24:20:20)
M( 42643801 ) Iteration 30000 0xb504bfbc2d2fff6a, n = 2359296 err = 0.2109 (0:32 real, 3.1333 ms/iter, ETA 37:05:10)
M( 43112609 ) Iteration 30000 0xa93787c90f94abf2, n = 2359296 err = 0.2578 (0:31 real, 3.1325 ms/iter, ETA 37:29:06)
M( 75002911 ) Iteration 30000 0x0f0c343e5174fa89, n = 4194304 err = 0.209 (0:44 real, 4.3873 ms/iter, ETA 91:21:55)
M ( 120909799 ) Iteration 30000 0x94559314016d82c3, n = 7077888 err = 0.1016 (1:43 real, 10.2857 ms/iter, ETA 345:20:27)
I was wrong. It still remains memory bound, so the faster memory wins. Cool!

CUDALucas and clLucas both started from the same codebase, so there are a lot of improvements in the CUDALucas code that might be relatively easy to move over to clLucas.

Last fiddled with by frmky on 2015-07-30 at 23:24
2015-08-06, 21:43   #322
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

3²·241 Posts

For anyone who wants to try: x64 Windows binaries compiled with gcc 5.2 and clFFT 2.6.

Yes, I know this is 1.01. I'll post 1.02 once I can get it compiled and working with APP SDK 3.0 (1.02 supports SDK 3.0; that's the only difference).
Attached file: clLucas.zip (765.9 KB)
2015-10-23, 00:45   #323
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

4171₈ Posts

Quote:
Originally Posted by kracker View Post
Nice performance improvements with the latest clFFT library... playing around with it now

clFFT 2.6
Code:
Platform :Advanced Micro Devices, Inc.
Device 0 : Tonga

M( 1257787 )C, 0x3f45bf9bea7213ea, n = 65536, clLucas v1.01 err = 0.1094 (0:03 real, 0.3001 ms/iter, ETA 6:12)
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.09375 (0:05 real, 0.5239 ms/iter, ETA 12:03)
M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, clLucas v1.01 err = 0.04883 (0:09 real, 0.8545 ms/iter, ETA 42:09)
M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, clLucas v1.01 err = 0.06641 (0:08 real, 0.8560 ms/iter, ETA 42:56)
M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, clLucas v1.01 err = 0.05139 (0:14 real, 1.4218 ms/iter, ETA 2:44:55)
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03125 (0:24 real, 2.3861 ms/iter, ETA 8:54:52)
M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, clLucas v1.01 err = 0.09375 (0:36 real, 3.5629 ms/iter, ETA 20:45:50)
M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, clLucas v1.01 err = 0.2031 (0:41 real, 4.1131 ms/iter, ETA 27:26:37)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.02002 (0:42 real, 4.1595 ms/iter, ETA 29:59:00)
M( 30402457 )C, 0x0b8600ef47e69d27, n = 1638400, clLucas v1.01 err = 0.2881 (0:50 real, 5.0494 ms/iter, ETA 42:37:29)
M( 32582657 )C, 0x02751b7fcec76bb1, n = 1769472, clLucas v1.01 err = 0.3125 (0:49 real, 4.8774 ms/iter, ETA 44:07:37)
M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, clLucas v1.01 err = 0.1074 (0:47 real, 4.7492 ms/iter, ETA 48:59:46)
M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, clLucas v1.01 err = 0.209 (1:10 real, 6.9796 ms/iter, ETA 82:39:00)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (1:10 real, 6.9811 ms/iter, ETA 83:34:46)
Looks like more improvements with clFFT 2.8...

Code:
M( 1257787 )C, 0x3f45bf9bea7213ea, n = 65536, clLucas v1.01 err = 0.1094 (0:03 real, 0.2521 ms/iter, ETA 5:27)
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.1016 (0:05 real, 0.4449 ms/iter, ETA 10:40)
M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, clLucas v1.01 err = 0.04883 (0:09 real, 0.8195 ms/iter, ETA 42:09)
M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, clLucas v1.01 err = 0.0625 (0:08 real, 0.8216 ms/iter, ETA 42:26)
M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, clLucas v1.01 err = 0.05078 (0:12 real, 1.2720 ms/iter, ETA 2:28:35)
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03125 (0:21 real, 2.0840 ms/iter, ETA 7:50:28)
M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, clLucas v1.01 err = 0.1016 (0:32 real, 3.1709 ms/iter, ETA 18:34:46)
M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, clLucas v1.01 err = 0.2031 (0:40 real, 3.9753 ms/iter, ETA 26:41:52)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.021 (0:39 real, 3.8986 ms/iter, ETA 28:13:15)
M( 30402457 )C, 0x0b8600ef47e69d27, n = 1638400, clLucas v1.01 err = 0.2813 (0:47 real, 4.6911 ms/iter, ETA 39:48:26)
M( 32582657 )C, 0x02751b7fcec76bb1, n = 1769472, clLucas v1.01 err = 0.3281 (0:45 real, 4.5531 ms/iter, ETA 41:21:59)
M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, clLucas v1.01 err = 0.1074 (0:46 real, 4.5498 ms/iter, ETA 47:09:18)
M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, clLucas v1.01 err = 0.2031 (1:04 real, 6.3128 ms/iter, ETA 75:03:13)
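
To put a number on the clFFT 2.6 to 2.8 difference, the two Tonga runs above can be compared per exponent. A rough sketch in Python, with the ms/iter figures copied from the two listings (the 43112609 line is left out because it only appears in the 2.6 run); `improvement_percent` is just an illustrative helper name.

```python
# ms/iter on Tonga, copied from the two listings above:
# exponent -> (clFFT 2.6, clFFT 2.8)
MS_PER_ITER = {
    1257787: (0.3001, 0.2521),
    1398269: (0.5239, 0.4449),
    2976221: (0.8545, 0.8195),
    3021377: (0.8560, 0.8216),
    6972593: (1.4218, 1.2720),
    13466917: (2.3861, 2.0840),
    20996011: (3.5629, 3.1709),
    24036583: (4.1131, 3.9753),
    25964951: (4.1595, 3.8986),
    30402457: (5.0494, 4.6911),
    32582657: (4.8774, 4.5531),
    37156667: (4.7492, 4.5498),
    42643801: (6.9796, 6.3128),
}


def improvement_percent(old_ms, new_ms):
    """Reduction in time per iteration, as a percentage of the old time."""
    return (old_ms - new_ms) / old_ms * 100.0


for exponent, (old_ms, new_ms) in MS_PER_ITER.items():
    pct = improvement_percent(old_ms, new_ms)
    print(f"M({exponent}): {pct:.1f}% less time per iteration")
```

On these figures the gain is largest at the smallest FFT lengths and shrinks to a few percent around the 2M FFT.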
2015-10-25, 08:33   #324
Jayder
Dec 2012

2·139 Posts
Is clLucas actually usable for testing and submitting results? I will probably be purchasing a video card in the coming months, and the state of clLucas will help me decide between Nvidia or AMD. I want to know if I can run a quick double check if I need to. I have no video card at the moment, so I cannot test it out for myself.

Last fiddled with by Jayder on 2015-10-25 at 08:33
2015-10-25, 17:42   #325
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

3²·241 Posts

Quote:
Originally Posted by Jayder View Post
Is clLucas actually usable for testing and submitting results? I will probably be purchasing a video card in the coming months, and the state of clLucas will help me decide between Nvidia or AMD. I want to know if I can run a quick double check if I need to. I have no video card at the moment, so I cannot test it out for myself.
Yes, it is very usable. However, CUDALucas is more refined, with worktodo support, self-tests and the like, and cuFFT performs much better for non-power-of-two FFTs. That said, looking at the benchmarks, an AMD Fury X is slightly faster than a GTX 980 Ti at the 2M FFT (2.2 vs. 2.8 ms/iter).

Honestly, I wouldn't buy a GPU just for its LL capabilities, since a modern CPU will generally blow them away in efficiency and often in performance too (unless we're talking about those GTX Titans, perhaps; then I'm not sure).

Last fiddled with by kracker on 2015-10-25 at 17:43
2015-10-25, 20:22   #326
airsquirrels
"David"
Jul 2015
Ohio

11·47 Posts

My testing against clFFT 2.8 on a mildly overclocked Fury X has been reliable, but ultimately trial factoring is much more efficient on the GPU than LL tests.

I can do a DC range check fairly quickly, but I can do the same check on one of my CPU cores in 5-6 days. That is about 6x faster than a CPU core for LL, versus order(s) of magnitude faster for TF.

continuing work from a partial result M41783789 fft length = 2359296 iteration = 12112
Iteration 20000 0xfaeba26dafa9c190, n = 2359296 err = 0.1172 (0:21 real, 2.1078 ms/iter, ETA 24:27:01)
Iteration 30000 0x792c2daddb35206b, n = 2359296 err = 0.1172 (0:27 real, 2.6578 ms/iter, ETA 30:49:21)
Iteration 40000 0x34e1e2ca8738e03e, n = 2359296 err = 0.1172 (0:26 real, 2.6495 ms/iter, ETA 30:43:08)
Iteration 50000 0xf5608f00c78edc10, n = 2359296 err = 0.1172 (0:27 real, 2.6803 ms/iter, ETA 31:04:09)
Iteration 60000 0x4de84f8901847e82, n = 2359296 err = 0.1172 (0:27 real, 2.6523 ms/iter, ETA 30:44:15)
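
As a sanity check on those ETAs: an LL test of M(p) takes p - 2 squarings, so the time left is roughly the remaining iterations times ms/iter. A minimal sketch (the function names are illustrative, and clLucas may well compute its ETA differently, for example from a running average):

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Estimate the remaining run time of an LL test in seconds.

    An LL test of M(exponent) needs exponent - 2 squarings in total.
    """
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0


def format_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# First line of the run above: M41783789 at iteration 20000, 2.1078 ms/iter.
print(format_hms(eta_seconds(41783789, 20000, 2.1078)))
```

For that first line the estimate lands within seconds of the reported 24:27:01; the later lines sit around 30-31 hours because ms/iter settled near 2.65.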
2015-10-26, 03:31   #327
LaurV
Romulan Interpreter
Jun 2011
Thailand

3³·347 Posts

Quote:
Originally Posted by airsquirrels View Post
I can do a DC range check fairly quickly, but I can do the same check on one of my CPU cores in 5-6 days. 6x faster than a CPU core for LL vs. order(s) of magnitude faster for TF.
The real question to ask yourself is whether you can do ~75 TF runs in the time it takes the same card to do one clLucas DC test. If you can, then you had better do TF; otherwise you are better off doing LLDC. Each LLDC clears one exponent for sure, whereas a TF run may or may not clear one: it clears the exponent only if it finds a factor, and probabilistically you will find a factor every ~75 runs. In fact, for the DC range the number is not 75 but closer to 90, because of the P-1 already done in that range, and if you count the chance that you may find a prime (small, but it exists), with the glory, money, etc., then you may raise the number to 100. Looking at the GPU72 factor-found history, we ARE finding a factor every 100 trials or so on average (see the last line, the totals line, of the second table).

So you should be able to say, "I can run more than ~90 or ~100 DCTF assignments in the time I can run one DCLL test for the same exponent range on the same card, therefore I do DCTF." Can you?

Otherwise you are comparing apples with oranges and wasting time; better to do DCLL.

To stay on topic: yes, clLucas is reliable, and if you (general you) choose exponents for which a power-of-2 FFT is adequate, then it is as fast and as reliable as CUDALucas. Otherwise it is slower, but still just as reliable, and its results are accepted by the server. And if you (general you) answered "no" to the question above, then your cards are better off doing DC than TF.
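
The comparison can be written down as a one-line rule. A sketch only, with hypothetical names: `tf_runs_per_ll_time` is how many TF assignments the card finishes in the time one LL test takes, and `runs_per_factor` is the ~90 to ~100 figure quoted above for the DC range.

```python
def better_work_type(tf_runs_per_ll_time, runs_per_factor=90.0):
    """Pick the work type that clears more exponents per unit of card time.

    One LL test clears exactly one exponent. In the same time, TF clears
    tf_runs_per_ll_time / runs_per_factor exponents on average.
    """
    expected_tf_clears = tf_runs_per_ll_time / runs_per_factor
    return "TF" if expected_tf_clears > 1.0 else "LL"


# Illustrative inputs, not measurements from any card in this thread.
print(better_work_type(120, runs_per_factor=90))
print(better_work_type(40, runs_per_factor=100))
```

The default of 90 is only the DC-range estimate from this post; plug in 75 or 100 to see how sensitive the answer is.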

Last fiddled with by LaurV on 2015-10-26 at 03:37 Reason: link
2015-10-26, 11:03   #328
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands

2³×3×7² Posts

I just looked at James's mfaktc/mfakto chart (http://www.mersenne.ca/mfaktc.php), and that Fury X is a TF beast!
Does it really do 925 GHz-days per day?
That would be more than my HD7950 and 280X combined!
2015-10-26, 11:44   #329
blip
Jan 2014

2×73 Posts
IMHO the Fury Nano is even more impressive, considering its low TDP.
2015-10-26, 23:03   #330
airsquirrels
"David"
Jul 2015
Ohio

11·47 Posts

Quote:
Originally Posted by VictordeHolland View Post
I just looked at James his mfaktc/mfakto (http://www.mersenne.ca/mfaktc.php) chart, but that Fury X is a TF beast!
Does it really do 925 GHzdays per day ??
That would be more than my HD7950 and 280X combined!
Yep. I run mine at 1100 MHz instead of the stock 1050, which has proven completely stable on my watercooling loop. For DCTF to 72 I'm over 1000 GHz-days/day. 73/74/75 work slows down a bit but still stays in the 900s.

LaurV's statements are correct, and the Fury X also does a pretty decent job at LL. Interestingly, clLucas isn't very effective at utilizing a card to full capacity either. I get better total throughput running two instances on each card, which further improves the numbers for doing LL vs. TF.

Unfortunately, GIMPS still awards 900-1000 GHz-days to the card for a day of TF work and only 60-90 or so for LL work.

Looking at my recent results, for DCTF I can do:
90 TF to 72 for 1000 GHz-days, clearing about 1 DC number (~same as LL)

For LLTF:
38 TF to 74 for 960 GHz-days, clearing a number every 2-3 days (for both TF and LL)
17 TF to 75 for 900 GHz-days, clearing a number every 5-6 days for TF (every 2-3 for LL)

I can otherwise do 1 DC-LL for 60 GHz-days, or 40% of a first-time LL check (85 GHz-days).

Clearly GIMPS penalizes my ranking for doing LL on the GPU. Curiously, it doesn't seem worth it for me to do the 74->75 work vs. just doing the LL.
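
For what it's worth, the "clearing a number every N days" figures follow from the daily run counts. A back-of-the-envelope sketch, assuming one factor per ~90 runs for DCTF and one per ~100 runs for LLTF (round numbers taken from LaurV's post, not measured here); `days_per_clear` is an illustrative name.

```python
def days_per_clear(tf_runs_per_day, runs_per_factor):
    """Expected days of TF work per exponent cleared by finding a factor."""
    return runs_per_factor / tf_runs_per_day


# (label, TF runs per day from the list above, assumed runs per factor found)
cases = [
    ("DCTF to 72", 90, 90),
    ("LLTF to 74", 38, 100),
    ("LLTF to 75", 17, 100),
]

for label, runs_per_day, runs_per_factor in cases:
    days = days_per_clear(runs_per_day, runs_per_factor)
    print(f"{label}: about {days:.1f} days per cleared exponent")
```

Under those assumptions the 74 and 75 levels come out inside the 2-3 and 5-6 day ranges listed above, against roughly 2.5 days per first-time LL at 40% of a test per day.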
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.