1 Attachment(s)
Hi,
New version. Adds a Six Step FFT option.
1.033
[code]
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.033 err = 0.2656 (0:37 real, 3.7065 ms/iter, ETA 59:35:32)
Iteration 10000 M( 60065651 )C, 0xd3a6310e95df143a, n = 3200K, clLucas v1.033 err = 0.2500 (0:44 real, 4.4200 ms/iter, ETA 73:43:41)
Iteration 10000 M( 62005457 )C, 0xf8905d0797009a71, n = 3360K, clLucas v1.033 err = 0.1484 (0:55 real, 5.4793 ms/iter, ETA 94:21:04)
Iteration 10000 M( 64011923 )C, 0xcf51ac344357ee8f, n = 3456K, clLucas v1.033 err = 0.1875 (0:43 real, 4.2137 ms/iter, ETA 74:54:36)
Iteration 10000 M( 66011723 )C, 0xcfbaa213623e0b3f, n = 3584K, clLucas v1.033 err = 0.1719 (0:44 real, 4.4032 ms/iter, ETA 80:43:32)
Iteration 10000 M( 68044853 )C, 0xe962de18ac070a2e, n = 3840K, clLucas v1.033 err = 0.0625 (1:24 real, 8.3692 ms/iter, ETA 158:09:13)
Iteration 10000 M( 72084449 )C, 0xc096a69999b97369, n = 4000K, clLucas v1.033 err = 0.0938 (0:51 real, 5.0920 ms/iter, ETA 101:56:19)
Iteration 10000 M( 73004357 )C, 0x78ffc8be20de4eca, n = 4000K, clLucas v1.033 err = 0.1250 (0:51 real, 5.1066 ms/iter, ETA 103:32:09)
Iteration 10000 M( 76037387 )C, 0x5ba34801f6eb0e6b, n = 4096K, clLucas v1.033 err = 0.2031 (0:43 real, 4.3848 ms/iter, ETA 92:35:30)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.033 err = 0.0840 (3:34 real, 21.4021 ms/iter, ETA 475:36:04)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.033 err = 0.1875 (0:52 real, 5.2564 ms/iter, ETA 124:06:29)
Iteration 10000 M( 89076083 )C, 0xbc86116194e5cf87, n = 4800K, clLucas v1.033 err = 0.2266 (1:11 real, 7.0577 ms/iter, ETA 174:36:01)
Iteration 10000 M( 90035987 )C, 0x31a63dfa13005e2c, n = 5120K, clLucas v1.033 err = 0.0586 (1:00 real, 6.0312 ms/iter, ETA 150:48:52)
Iteration 10000 M( 96242033 )C, 0xc3485297aeff33aa, n = 5376K, clLucas v1.033 err = 0.0938 (2:02 real, 12.2531 ms/iter, ETA 327:31:51)
Iteration 10000 M( 101514839 )C, 0x5b8b157c2a5f8169, n = 5600K, clLucas v1.033 err = 0.1484 (3:47 real, 22.6871 ms/iter, ETA 639:38:57)
[/code]
1.04
[code]
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.04 err = 0.2656 (0:38 real, 3.7605 ms/iter, ETA 60:27:38)
Iteration 10000 M( 60065651 )C, 0xd3a6310e95df143a, n = 3200K, clLucas v1.04 err = 0.2500 (0:44 real, 4.4174 ms/iter, ETA 73:41:07)
Iteration 10000 M( 62005457 )C, 0xf8905d0797009a71, n = 3360K, clLucas v1.04 err = 0.1484 (0:53 real, 5.2781 ms/iter, ETA 90:53:12)
Iteration 10000 M( 64011923 )C, 0xcf51ac344357ee8f, n = 3456K, clLucas v1.04 err = 0.1875 (0:42 real, 4.1872 ms/iter, ETA 74:26:20)
Iteration 10000 M( 66011723 )C, 0xcfbaa213623e0b3f, n = 3584K, clLucas v1.04 err = 0.1719 (0:44 real, 4.4040 ms/iter, ETA 80:44:22)
Iteration 10000 M( 68044853 )C, 0xe962de18ac070a2e, n = 3840K, clLucas v1.04 err = 0.0625 (1:23 real, 8.3278 ms/iter, ETA 157:22:18)
Iteration 10000 M( 72084449 )C, 0xc096a69999b97369, n = 4000K, clLucas v1.04 err = 0.0938 (0:50 real, 5.0914 ms/iter, ETA 101:55:35)
Iteration 10000 M( 73004357 )C, 0x78ffc8be20de4eca, n = 4000K, clLucas v1.04 err = 0.1250 (0:51 real, 5.0961 ms/iter, ETA 103:19:20)
Iteration 10000 M( 76037387 )C, 0x5ba34801f6eb0e6b, n = 4096K, clLucas v1.04 err = 0.2031 (0:44 real, 4.3841 ms/iter, ETA 92:34:38)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.04 err = 0.0840 (3:34 real, 21.3886 ms/iter, ETA 475:18:11)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.04 err = 0.1875 (0:52 real, 5.2455 ms/iter, ETA 123:51:07)
Iteration 10000 M( 89076083 )C, 0xbc86116194e5cf87, n = 4800K, clLucas v1.04 err = 0.2266 (1:11 real, 7.1027 ms/iter, ETA 175:42:45)
Iteration 10000 M( 90035987 )C, 0x31a63dfa13005e2c, n = 5120K, clLucas v1.04 err = 0.0586 (1:01 real, 6.0633 ms/iter, ETA 151:37:01)
Iteration 10000 M( 96242033 )C, 0xc3485297aeff33aa, n = 5376K, clLucas v1.04 err = 0.0938 (2:02 real, 12.2566 ms/iter, ETA 327:37:32)
Iteration 10000 M( 101514839 )C, 0x5b8b157c2a5f8169, n = 5600K, clLucas v1.04 err = 0.1484 (3:47 real, 22.6862 ms/iter, ETA 639:37:31)
[/code]
1.04 -sixstepfft
[code]
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.04 err = 0.2500 (0:37 real, 3.6599 ms/iter, ETA 58:50:32)
Iteration 10000 M( 60065651 )C, 0xd3a6310e95df143a, n = 3200K, clLucas v1.04 err = 0.2188 (0:45 real, 4.5199 ms/iter, ETA 75:23:41)
Iteration 10000 M( 62005457 )C, 0xf8905d0797009a71, n = 3360K, clLucas v1.04 err = 0.1562 (1:10 real, 7.0160 ms/iter, ETA 120:48:38)
Iteration 10000 M( 64011923 )C, 0xcf51ac344357ee8f, n = 3456K, clLucas v1.04 err = 0.1719 (0:44 real, 4.3818 ms/iter, ETA 77:53:52)
Iteration 10000 M( 66011723 )C, 0xcfbaa213623e0b3f, n = 3584K, clLucas v1.04 err = 0.1562 (0:46 real, 4.5533 ms/iter, ETA 83:28:38)
Iteration 10000 M( 68044853 )C, 0xe962de18ac070a2e, n = 3840K, clLucas v1.04 err = 0.0586 (0:56 real, 5.6326 ms/iter, ETA 106:26:27)
Iteration 10000 M( 72084449 )C, 0xc096a69999b97369, n = 3840K, clLucas v1.04 err = 0.2500 (0:56 real, 5.6124 ms/iter, ETA 112:21:27)
Iteration 10000 M( 73004357 )C, 0x78ffc8be20de4eca, n = 4000K, clLucas v1.04 err = 0.1172 (0:54 real, 5.3825 ms/iter, ETA 109:07:51)
Iteration 10000 M( 76037387 )C, 0x5ba34801f6eb0e6b, n = 4096K, clLucas v1.04 err = 0.2188 (0:45 real, 4.4885 ms/iter, ETA 94:46:55)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.04 err = 0.0781 (1:35 real, 9.4992 ms/iter, ETA 211:05:39)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.04 err = 0.1875 (1:20 real, 8.0097 ms/iter, ETA 189:07:03)
Iteration 10000 M( 89076083 )C, 0xbc86116194e5cf87, n = 4800K, clLucas v1.04 err = 0.2344 (1:37 real, 9.6424 ms/iter, ETA 238:32:34)
Iteration 10000 M( 90035987 )C, 0x31a63dfa13005e2c, n = 5120K, clLucas v1.04 err = 0.0586 (1:33 real, 9.2624 ms/iter, ETA 231:36:41)
Iteration 10000 M( 96242033 )C, 0xc3485297aeff33aa, n = 5376K, clLucas v1.04 err = 0.0938 (1:42 real, 10.1184 ms/iter, ETA 270:28:10)
Iteration 10000 M( 101514839 )C, 0x5b8b157c2a5f8169, n = 5600K, clLucas v1.04 err = 0.1406 (1:41 real, 10.0415 ms/iter, ETA 283:06:47)
[/code] |
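The runs above are easier to compare once the ms/iter figures are pulled out of the log text. What follows is a minimal sketch, not part of clLucas: the regular expression is only an assumption about the status-line format as it appears in this thread, and the names `parse_log` and `sixstep_speedup` are made up for illustration. The sample lines are copied from the two 1.04 runs.

```python
import re

# Regex written against the status lines in this thread; it is an
# assumption about the log format, not something taken from clLucas.
STATUS_RE = re.compile(
    r"M\( (\d+) \)C, 0x[0-9a-f]+, n = (\d+)K, clLucas v[\d.]+ "
    r"err = [\d.]+ \(\d+:\d+ real, ([\d.]+) ms/iter"
)

def parse_log(text):
    """Map exponent -> (FFT length in K, ms per iteration)."""
    return {int(p): (int(n), float(ms)) for p, n, ms in STATUS_RE.findall(text)}

def sixstep_speedup(default, sixstep):
    """Ratio default/six-step per exponent; above 1.0 means six-step is faster."""
    return {p: default[p][1] / sixstep[p][1] for p in default if p in sixstep}

# Three status lines from the 1.04 run without -sixstepfft.
DEFAULT_LOG = """
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.04 err = 0.2656 (0:38 real, 3.7605 ms/iter, ETA 60:27:38)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.04 err = 0.0840 (3:34 real, 21.3886 ms/iter, ETA 475:18:11)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.04 err = 0.1875 (0:52 real, 5.2455 ms/iter, ETA 123:51:07)
"""

# The same three exponents from the 1.04 run with -sixstepfft.
SIXSTEP_LOG = """
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.04 err = 0.2500 (0:37 real, 3.6599 ms/iter, ETA 58:50:32)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.04 err = 0.0781 (1:35 real, 9.4992 ms/iter, ETA 211:05:39)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.04 err = 0.1875 (1:20 real, 8.0097 ms/iter, ETA 189:07:03)
"""

default = parse_log(DEFAULT_LOG)
sixstep = parse_log(SIXSTEP_LOG)
ratios = sixstep_speedup(default, sixstep)

for p in sorted(ratios):
    print(f"M{p}  n = {default[p][0]}K  default/sixstep = {ratios[p]:.2f}")
```

On these three lines the picture is mixed: 4480K is more than twice as fast with `-sixstepfft`, 3072K is about even, and 4608K is slower. Going by these numbers, the option is worth benchmarking per FFT length rather than switching on globally.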
Interesting... on my card I'm getting a decent speed improvement.
[code]
C:\clLucas>out 36976267 -f 2048K
...
Starting M36976267 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04904, max error = 0.07031
Iteration 200, average error = 0.05967, max error = 0.07031
Iteration 300, average error = 0.06322, max error = 0.07031
Iteration 400, average error = 0.06499, max error = 0.07031
Iteration 500, average error = 0.06606, max error = 0.07031
Iteration 600, average error = 0.06677, max error = 0.07031
Iteration 700, average error = 0.06727, max error = 0.07031
Iteration 800, average error = 0.06765, max error = 0.07031
Iteration 900, average error = 0.06795, max error = 0.07031
Iteration 1000, average error = 0.06818 < 0.25 (max error = 0.07031), continuing test.
Iteration 10000 M( 36976267 )C, 0x0df928a73e95a858, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5231 ms/iter, ETA 46:26:14)
Iteration 20000 M( 36976267 )C, 0x8c1da923bc3ab356, n = 2048K, clLucas v1.04 err = 0.0703 (0:46 real, 4.5281 ms/iter, ETA 46:28:33)
Iteration 30000 M( 36976267 )C, 0xb5660bf8d8b9b730, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5219 ms/iter, ETA 46:24:00)
Iteration 40000 M( 36976267 )C, 0x91133c8d2a727523, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5262 ms/iter, ETA 46:25:51)
Iteration 50000 M( 36976267 )C, 0x664aa945ccf6fcdd, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5176 ms/iter, ETA 46:19:51)
Iteration 60000 M( 36976267 )C, 0x5a30204bb64469fa, n = 2048K, clLucas v1.04 err = 0.0703 (0:46 real, 4.5265 ms/iter, ETA 46:24:32)

C:\clLucas>out 36976267 -f 2048K -sixstepfft
...
Starting M36976267 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04365, max error = 0.05859
Iteration 200, average error = 0.05251, max error = 0.06250
Iteration 300, average error = 0.05584, max error = 0.06250
Iteration 400, average error = 0.05750, max error = 0.06250
Iteration 500, average error = 0.06000, max error = 0.07031
Iteration 600, average error = 0.06172, max error = 0.07031
Iteration 700, average error = 0.06295, max error = 0.07031
Iteration 800, average error = 0.06387, max error = 0.07031
Iteration 900, average error = 0.06458, max error = 0.07031
Iteration 1000, average error = 0.06515 < 0.25 (max error = 0.07031), continuing test.
Iteration 10000 M( 36976267 )C, 0x0df928a73e95a858, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9784 ms/iter, ETA 40:50:41)
Iteration 20000 M( 36976267 )C, 0x8c1da923bc3ab356, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9604 ms/iter, ETA 40:38:57)
Iteration 30000 M( 36976267 )C, 0xb5660bf8d8b9b730, n = 2048K, clLucas v1.04 err = 0.0703 (0:39 real, 3.9589 ms/iter, ETA 40:37:20)
Iteration 40000 M( 36976267 )C, 0x91133c8d2a727523, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9607 ms/iter, ETA 40:37:50)
[/code]
Also... automatic FFT selection isn't quite working at the moment..(-f works of course) :smile:
[code]
Platform 0 : Advanced Micro Devices, Inc.
Platform 1 : Intel(R) Corporation
Platform :Advanced Micro Devices, Inc.
Device 0 : Tonga
Build Options are : -D KHR_DP_EXTENSION
CL_DEVICE_NAME Tonga
CL_DEVICE_VENDOR Advanced Micro Devices, Inc.
CL_DEVICE_VERSION OpenCL 2.0 AMD-APP (1912.5)
CL_DRIVER_VERSION 1912.5 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS 28
CL_DEVICE_MAX_CLOCK_FREQUENCY 954
CL_DEVICE_GLOBAL_MEM_SIZE 2147483648
CL_DEVICE_MAX_WORK_GROUP_SIZE 256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 1
Starting M36976267 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 32.00000 >= 0.35, increasing n from 4096K
Starting M36976267 fft length = 4480K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 256.00000 >= 0.35, increasing n from 4480K
[/code] |
Hi,
[QUOTE=kracker;421594]Also... automatic FFT selection isn't quite working at the moment..(-f works of course) :smile: [/QUOTE]
On my machine:
[code]
msft@fujic:/tmp/clLucas.1.04$ ./clLucas 36976267
Platform 0 : Advanced Micro Devices, Inc.
Platform :Advanced Micro Devices, Inc.
Device 0 : Fiji
Build Options are : -D KHR_DP_EXTENSION
Starting M36976267 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 80 < 1000 && err = 0.37500 >= 0.35, increasing n from 1920K
Starting M36976267 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04690, max error = 0.06250
Iteration 200, average error = 0.05470, max error = 0.06250
[/code] |
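The log messages above suggest two separate checks during the first 1000 iterations: a single iteration with err >= 0.35 triggers an immediate move to a larger FFT, and an average error >= 0.25 at the end of the test does the same. The sketch below only restates that reading of the messages. It is an assumption, not the clLucas source, and the function name `roundoff_decision` and its parameters are invented for illustration.

```python
def roundoff_decision(errors, avg_limit=0.25, max_limit=0.35, test_iters=1000):
    """Guess at the careful round-off test, inferred from the log text only.

    errors: per-iteration round-off errors, in order.
    Returns ("increase", iteration) when a larger FFT length is needed,
    or ("continue", average_error) when the test passes.
    """
    total = 0.0
    count = 0
    for i, err in enumerate(errors[:test_iters], start=1):
        # A single large error stops the test at once, as in
        # "Iteration = 80 < 1000 && err = 0.37500 >= 0.35".
        if err >= max_limit:
            return ("increase", i)
        total += err
        count = i
    avg = total / count if count else 0.0
    # Otherwise the average over the test window decides.
    if avg >= avg_limit:
        return ("increase", count)
    return ("continue", avg)

# Shaped like the 1920K run above: fine for 79 iterations, then err = 0.375.
print(roundoff_decision([0.0625] * 79 + [0.375]))

# Shaped like the 2048K run: errors stay small for the whole window.
print(roundoff_decision([0.0625] * 1000))
```

Under this reading, the err = 32 and err = 256 values in the 4096K and 4480K runs would trip the per-iteration limit at once, which matches the "increasing n" messages in that log.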
:gah: Sorry... just noticed that I had the 4096K FFT specified in the ini file.
|
Is this developed enough that benchmarks would be able to be added to mersenne.ca?
|
[QUOTE=Jayder;421726]Is this developed enough that benchmarks would be able to be added to mersenne.ca?[/QUOTE]
Yes. |
Does it have support for worktodo.txt and/or output results to results.txt?
|
[QUOTE=Jayder;421726]Is this developed enough that benchmarks would be able to be added to mersenne.ca?[/QUOTE]
This would be a great idea! It would then allow James (if he has time) to do an analysis as to the "optimal cross over points" for TF'ing vs. LL'ing for AMD GPGPUs like he's done for NVidia's offerings. |
[QUOTE=UBR47K;421795]Does it have support for worktodo.txt and/or output results to results.txt?[/QUOTE]
Yes, it supports both. |
1 Attachment(s)
1.04 binaries for Windows. Be sure to try -sixstepfft :smile:
|
Well well... never thought I'd see this day.
[code]
1024K   2.2 ms/iter
1536K   3.6 ms/iter
2048K   4.0 ms/iter
2304K   5.5 ms/iter
2560K   6.2 ms/iter
3072K   6.8 ms/iter
4096K   8.3 ms/iter
[/code]
Decided to compare it with 1.01, the "first" clLucas version:
[code]
1024K   2.8 ms/iter
1536K   8.2 ms/iter
2048K   5.4 ms/iter
2304K  14.9 ms/iter
2560K  18.9 ms/iter
3072K  25.8 ms/iter
4096K  10.8 ms/iter
[/code]
EDIT: I wonder what kind of performance you could get out of something like a FirePro W8100. It's basically an R9 290 with 1/2-rate DP instead of the R9 290's 1/8. |
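The two tables are easier to read as ratios. A small sketch using only the numbers posted above: the post does not say which version produced the first table (presumably 1.04, given the preceding post), so it is labelled `newer` here, and both dictionary names are just for illustration.

```python
# ms/iter by FFT length (in K), copied from the two tables above.
# "newer" is the first table; its exact version is not stated in the post.
newer = {1024: 2.2, 1536: 3.6, 2048: 4.0, 2304: 5.5, 2560: 6.2, 3072: 6.8, 4096: 8.3}
v1_01 = {1024: 2.8, 1536: 8.2, 2048: 5.4, 2304: 14.9, 2560: 18.9, 3072: 25.8, 4096: 10.8}

# Speedup of the newer build over 1.01 at each FFT length.
speedup = {n: v1_01[n] / newer[n] for n in newer}

for n in sorted(speedup):
    print(f"{n:>5}K  {v1_01[n]:5.1f} -> {newer[n]:4.1f} ms/iter  ({speedup[n]:.2f}x)")

best = max(speedup, key=speedup.get)
print(f"largest gain at {best}K")
```

The power-of-two lengths (1024K, 2048K, 4096K) improve by roughly 1.3x, while the other lengths gain between about 2.3x and 3.8x, so most of the improvement since 1.01 is at non-power-of-two FFT sizes.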