mersenneforum.org

GPU Computing > LL with OpenCL (https://www.mersenneforum.org/showthread.php?t=18297)

msft 2016-01-08 08:20

Hi,
New version: adds a six-step FFT option (-sixstepfft).
v1.033:
[code]
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.033 err = 0.2656 (0:37 real, 3.7065 ms/iter, ETA 59:35:32)
Iteration 10000 M( 60065651 )C, 0xd3a6310e95df143a, n = 3200K, clLucas v1.033 err = 0.2500 (0:44 real, 4.4200 ms/iter, ETA 73:43:41)
Iteration 10000 M( 62005457 )C, 0xf8905d0797009a71, n = 3360K, clLucas v1.033 err = 0.1484 (0:55 real, 5.4793 ms/iter, ETA 94:21:04)
Iteration 10000 M( 64011923 )C, 0xcf51ac344357ee8f, n = 3456K, clLucas v1.033 err = 0.1875 (0:43 real, 4.2137 ms/iter, ETA 74:54:36)
Iteration 10000 M( 66011723 )C, 0xcfbaa213623e0b3f, n = 3584K, clLucas v1.033 err = 0.1719 (0:44 real, 4.4032 ms/iter, ETA 80:43:32)
Iteration 10000 M( 68044853 )C, 0xe962de18ac070a2e, n = 3840K, clLucas v1.033 err = 0.0625 (1:24 real, 8.3692 ms/iter, ETA 158:09:13)
Iteration 10000 M( 72084449 )C, 0xc096a69999b97369, n = 4000K, clLucas v1.033 err = 0.0938 (0:51 real, 5.0920 ms/iter, ETA 101:56:19)
Iteration 10000 M( 73004357 )C, 0x78ffc8be20de4eca, n = 4000K, clLucas v1.033 err = 0.1250 (0:51 real, 5.1066 ms/iter, ETA 103:32:09)
Iteration 10000 M( 76037387 )C, 0x5ba34801f6eb0e6b, n = 4096K, clLucas v1.033 err = 0.2031 (0:43 real, 4.3848 ms/iter, ETA 92:35:30)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.033 err = 0.0840 (3:34 real, 21.4021 ms/iter, ETA 475:36:04)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.033 err = 0.1875 (0:52 real, 5.2564 ms/iter, ETA 124:06:29)
Iteration 10000 M( 89076083 )C, 0xbc86116194e5cf87, n = 4800K, clLucas v1.033 err = 0.2266 (1:11 real, 7.0577 ms/iter, ETA 174:36:01)
Iteration 10000 M( 90035987 )C, 0x31a63dfa13005e2c, n = 5120K, clLucas v1.033 err = 0.0586 (1:00 real, 6.0312 ms/iter, ETA 150:48:52)
Iteration 10000 M( 96242033 )C, 0xc3485297aeff33aa, n = 5376K, clLucas v1.033 err = 0.0938 (2:02 real, 12.2531 ms/iter, ETA 327:31:51)
Iteration 10000 M( 101514839 )C, 0x5b8b157c2a5f8169, n = 5600K, clLucas v1.033 err = 0.1484 (3:47 real, 22.6871 ms/iter, ETA 639:38:57)
[/code]
v1.04:
[code]
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.04 err = 0.2656 (0:38 real, 3.7605 ms/iter, ETA 60:27:38)
Iteration 10000 M( 60065651 )C, 0xd3a6310e95df143a, n = 3200K, clLucas v1.04 err = 0.2500 (0:44 real, 4.4174 ms/iter, ETA 73:41:07)
Iteration 10000 M( 62005457 )C, 0xf8905d0797009a71, n = 3360K, clLucas v1.04 err = 0.1484 (0:53 real, 5.2781 ms/iter, ETA 90:53:12)
Iteration 10000 M( 64011923 )C, 0xcf51ac344357ee8f, n = 3456K, clLucas v1.04 err = 0.1875 (0:42 real, 4.1872 ms/iter, ETA 74:26:20)
Iteration 10000 M( 66011723 )C, 0xcfbaa213623e0b3f, n = 3584K, clLucas v1.04 err = 0.1719 (0:44 real, 4.4040 ms/iter, ETA 80:44:22)
Iteration 10000 M( 68044853 )C, 0xe962de18ac070a2e, n = 3840K, clLucas v1.04 err = 0.0625 (1:23 real, 8.3278 ms/iter, ETA 157:22:18)
Iteration 10000 M( 72084449 )C, 0xc096a69999b97369, n = 4000K, clLucas v1.04 err = 0.0938 (0:50 real, 5.0914 ms/iter, ETA 101:55:35)
Iteration 10000 M( 73004357 )C, 0x78ffc8be20de4eca, n = 4000K, clLucas v1.04 err = 0.1250 (0:51 real, 5.0961 ms/iter, ETA 103:19:20)
Iteration 10000 M( 76037387 )C, 0x5ba34801f6eb0e6b, n = 4096K, clLucas v1.04 err = 0.2031 (0:44 real, 4.3841 ms/iter, ETA 92:34:38)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.04 err = 0.0840 (3:34 real, 21.3886 ms/iter, ETA 475:18:11)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.04 err = 0.1875 (0:52 real, 5.2455 ms/iter, ETA 123:51:07)
Iteration 10000 M( 89076083 )C, 0xbc86116194e5cf87, n = 4800K, clLucas v1.04 err = 0.2266 (1:11 real, 7.1027 ms/iter, ETA 175:42:45)
Iteration 10000 M( 90035987 )C, 0x31a63dfa13005e2c, n = 5120K, clLucas v1.04 err = 0.0586 (1:01 real, 6.0633 ms/iter, ETA 151:37:01)
Iteration 10000 M( 96242033 )C, 0xc3485297aeff33aa, n = 5376K, clLucas v1.04 err = 0.0938 (2:02 real, 12.2566 ms/iter, ETA 327:37:32)
Iteration 10000 M( 101514839 )C, 0x5b8b157c2a5f8169, n = 5600K, clLucas v1.04 err = 0.1484 (3:47 real, 22.6862 ms/iter, ETA 639:37:31)
[/code]
v1.04 with -sixstepfft:
[code]
Iteration 10000 M( 57899201 )C, 0xa2ac01bbc76d92ee, n = 3072K, clLucas v1.04 err = 0.2500 (0:37 real, 3.6599 ms/iter, ETA 58:50:32)
Iteration 10000 M( 60065651 )C, 0xd3a6310e95df143a, n = 3200K, clLucas v1.04 err = 0.2188 (0:45 real, 4.5199 ms/iter, ETA 75:23:41)
Iteration 10000 M( 62005457 )C, 0xf8905d0797009a71, n = 3360K, clLucas v1.04 err = 0.1562 (1:10 real, 7.0160 ms/iter, ETA 120:48:38)
Iteration 10000 M( 64011923 )C, 0xcf51ac344357ee8f, n = 3456K, clLucas v1.04 err = 0.1719 (0:44 real, 4.3818 ms/iter, ETA 77:53:52)
Iteration 10000 M( 66011723 )C, 0xcfbaa213623e0b3f, n = 3584K, clLucas v1.04 err = 0.1562 (0:46 real, 4.5533 ms/iter, ETA 83:28:38)
Iteration 10000 M( 68044853 )C, 0xe962de18ac070a2e, n = 3840K, clLucas v1.04 err = 0.0586 (0:56 real, 5.6326 ms/iter, ETA 106:26:27)
Iteration 10000 M( 72084449 )C, 0xc096a69999b97369, n = 3840K, clLucas v1.04 err = 0.2500 (0:56 real, 5.6124 ms/iter, ETA 112:21:27)
Iteration 10000 M( 73004357 )C, 0x78ffc8be20de4eca, n = 4000K, clLucas v1.04 err = 0.1172 (0:54 real, 5.3825 ms/iter, ETA 109:07:51)
Iteration 10000 M( 76037387 )C, 0x5ba34801f6eb0e6b, n = 4096K, clLucas v1.04 err = 0.2188 (0:45 real, 4.4885 ms/iter, ETA 94:46:55)
Iteration 10000 M( 80010323 )C, 0x3ac3a967e67702e8, n = 4480K, clLucas v1.04 err = 0.0781 (1:35 real, 9.4992 ms/iter, ETA 211:05:39)
Iteration 10000 M( 85016489 )C, 0x7dd86242f419ef44, n = 4608K, clLucas v1.04 err = 0.1875 (1:20 real, 8.0097 ms/iter, ETA 189:07:03)
Iteration 10000 M( 89076083 )C, 0xbc86116194e5cf87, n = 4800K, clLucas v1.04 err = 0.2344 (1:37 real, 9.6424 ms/iter, ETA 238:32:34)
Iteration 10000 M( 90035987 )C, 0x31a63dfa13005e2c, n = 5120K, clLucas v1.04 err = 0.0586 (1:33 real, 9.2624 ms/iter, ETA 231:36:41)
Iteration 10000 M( 96242033 )C, 0xc3485297aeff33aa, n = 5376K, clLucas v1.04 err = 0.0938 (1:42 real, 10.1184 ms/iter, ETA 270:28:10)
Iteration 10000 M( 101514839 )C, 0x5b8b157c2a5f8169, n = 5600K, clLucas v1.04 err = 0.1406 (1:41 real, 10.0415 ms/iter, ETA 283:06:47)
[/code]
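For readers unfamiliar with the term, a six-step FFT splits a length n1*n2 transform into short row transforms separated by transposes and one twiddle multiplication. The following is a minimal pure-Python sketch of the textbook decomposition, not clLucas's OpenCL code; the function names `dft` and `six_step_fft` and the naive O(n^2) row transform are assumptions made for illustration.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT; stands in for the short row transforms."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def six_step_fft(x, n1, n2):
    """DFT of length n1*n2 via the textbook six-step decomposition."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: view x as an n1 x n2 row-major matrix and transpose it (n2 x n1).
    rows = [[x[j1 * n2 + j2] for j1 in range(n1)] for j2 in range(n2)]
    # Step 2: n2 independent transforms of length n1.
    rows = [dft(r) for r in rows]
    # Step 3: multiply element (j2, k1) by the twiddle factor exp(-2*pi*i*j2*k1/n).
    rows = [[v * cmath.exp(-2j * cmath.pi * j2 * k1 / n) for k1, v in enumerate(r)]
            for j2, r in enumerate(rows)]
    # Step 4: transpose back to n1 x n2.
    rows = [[rows[j2][k1] for j2 in range(n2)] for k1 in range(n1)]
    # Step 5: n1 independent transforms of length n2.
    rows = [dft(r) for r in rows]
    # Step 6: transpose and flatten; the output index is k = k1 + n1*k2.
    return [rows[k1][k2] for k2 in range(n2) for k1 in range(n1)]

if __name__ == "__main__":
    data = [complex(i * i % 7, i % 3) for i in range(12)]
    direct = dft(data)
    stepped = six_step_fft(data, 3, 4)
    print(max(abs(a - b) for a, b in zip(direct, stepped)))  # should be near zero
```

On a GPU the point of this arrangement is that every row transform reads contiguous memory. Whether that pays off depends on the FFT length and the card, which fits the mixed results in the tables above.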

kracker 2016-01-08 23:40

Interesting... on my card I'm getting a decent speed improvement.

[code]
C:\clLucas>out 36976267 -f 2048K

...

Starting M36976267 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04904, max error = 0.07031
Iteration 200, average error = 0.05967, max error = 0.07031
Iteration 300, average error = 0.06322, max error = 0.07031
Iteration 400, average error = 0.06499, max error = 0.07031
Iteration 500, average error = 0.06606, max error = 0.07031
Iteration 600, average error = 0.06677, max error = 0.07031
Iteration 700, average error = 0.06727, max error = 0.07031
Iteration 800, average error = 0.06765, max error = 0.07031
Iteration 900, average error = 0.06795, max error = 0.07031
Iteration 1000, average error = 0.06818 < 0.25 (max error = 0.07031), continuing test.
Iteration 10000 M( 36976267 )C, 0x0df928a73e95a858, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5231 ms/iter, ETA 46:26:14)
Iteration 20000 M( 36976267 )C, 0x8c1da923bc3ab356, n = 2048K, clLucas v1.04 err = 0.0703 (0:46 real, 4.5281 ms/iter, ETA 46:28:33)
Iteration 30000 M( 36976267 )C, 0xb5660bf8d8b9b730, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5219 ms/iter, ETA 46:24:00)
Iteration 40000 M( 36976267 )C, 0x91133c8d2a727523, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5262 ms/iter, ETA 46:25:51)
Iteration 50000 M( 36976267 )C, 0x664aa945ccf6fcdd, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5176 ms/iter, ETA 46:19:51)
Iteration 60000 M( 36976267 )C, 0x5a30204bb64469fa, n = 2048K, clLucas v1.04 err = 0.0703 (0:46 real, 4.5265 ms/iter, ETA 46:24:32)

C:\clLucas>out 36976267 -f 2048K -sixstepfft

...

Starting M36976267 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04365, max error = 0.05859
Iteration 200, average error = 0.05251, max error = 0.06250
Iteration 300, average error = 0.05584, max error = 0.06250
Iteration 400, average error = 0.05750, max error = 0.06250
Iteration 500, average error = 0.06000, max error = 0.07031
Iteration 600, average error = 0.06172, max error = 0.07031
Iteration 700, average error = 0.06295, max error = 0.07031
Iteration 800, average error = 0.06387, max error = 0.07031
Iteration 900, average error = 0.06458, max error = 0.07031
Iteration 1000, average error = 0.06515 < 0.25 (max error = 0.07031), continuing test.
Iteration 10000 M( 36976267 )C, 0x0df928a73e95a858, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9784 ms/iter, ETA 40:50:41)
Iteration 20000 M( 36976267 )C, 0x8c1da923bc3ab356, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9604 ms/iter, ETA 40:38:57)
Iteration 30000 M( 36976267 )C, 0xb5660bf8d8b9b730, n = 2048K, clLucas v1.04 err = 0.0703 (0:39 real, 3.9589 ms/iter, ETA 40:37:20)
Iteration 40000 M( 36976267 )C, 0x91133c8d2a727523, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9607 ms/iter, ETA 40:37:50)
[/code]
Also, automatic FFT selection isn't quite working at the moment (-f works, of course). :smile:

[code]

Platform 0 : Advanced Micro Devices, Inc.
Platform 1 : Intel(R) Corporation
Platform :Advanced Micro Devices, Inc.
Device 0 : Tonga

Build Options are : -D KHR_DP_EXTENSION

CL_DEVICE_NAME Tonga
CL_DEVICE_VENDOR Advanced Micro Devices, Inc.
CL_DEVICE_VERSION OpenCL 2.0 AMD-APP (1912.5)
CL_DRIVER_VERSION 1912.5 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS 28
CL_DEVICE_MAX_CLOCK_FREQUENCY 954
CL_DEVICE_GLOBAL_MEM_SIZE 2147483648
CL_DEVICE_MAX_WORK_GROUP_SIZE 256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 1

Starting M36976267 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 32.00000 >= 0.35, increasing n from 4096K
Starting M36976267 fft length = 4480K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 32 < 1000 && err = 256.00000 >= 0.35, increasing n from 4480K
[/code]
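To put a number on the improvement, the ms/iter figures from the two 2048K runs at the top of this post can be averaged and compared. A minimal Python sketch, assuming the log line layout shown in this thread; the regular expression and the helper name `mean_ms_per_iter` are illustrative and not part of clLucas.

```python
import re

# Matches the per-iteration timing in lines such as
# "Iteration 10000 M( 36976267 )C, ..., n = 2048K, ... (0:45 real, 4.5231 ms/iter, ETA ...)"
TIMING_RE = re.compile(r"Iteration \d+ M\( \d+ \)C.*?([\d.]+) ms/iter")

def mean_ms_per_iter(log_text):
    """Average the ms/iter values found in a block of clLucas output."""
    times = [float(m.group(1)) for m in TIMING_RE.finditer(log_text)]
    if not times:
        raise ValueError("no timing lines found")
    return sum(times) / len(times)

DEFAULT_RUN = """
Iteration 10000 M( 36976267 )C, 0x0df928a73e95a858, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5231 ms/iter, ETA 46:26:14)
Iteration 20000 M( 36976267 )C, 0x8c1da923bc3ab356, n = 2048K, clLucas v1.04 err = 0.0703 (0:46 real, 4.5281 ms/iter, ETA 46:28:33)
Iteration 30000 M( 36976267 )C, 0xb5660bf8d8b9b730, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5219 ms/iter, ETA 46:24:00)
Iteration 40000 M( 36976267 )C, 0x91133c8d2a727523, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5262 ms/iter, ETA 46:25:51)
Iteration 50000 M( 36976267 )C, 0x664aa945ccf6fcdd, n = 2048K, clLucas v1.04 err = 0.0703 (0:45 real, 4.5176 ms/iter, ETA 46:19:51)
Iteration 60000 M( 36976267 )C, 0x5a30204bb64469fa, n = 2048K, clLucas v1.04 err = 0.0703 (0:46 real, 4.5265 ms/iter, ETA 46:24:32)
"""

SIXSTEP_RUN = """
Iteration 10000 M( 36976267 )C, 0x0df928a73e95a858, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9784 ms/iter, ETA 40:50:41)
Iteration 20000 M( 36976267 )C, 0x8c1da923bc3ab356, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9604 ms/iter, ETA 40:38:57)
Iteration 30000 M( 36976267 )C, 0xb5660bf8d8b9b730, n = 2048K, clLucas v1.04 err = 0.0703 (0:39 real, 3.9589 ms/iter, ETA 40:37:20)
Iteration 40000 M( 36976267 )C, 0x91133c8d2a727523, n = 2048K, clLucas v1.04 err = 0.0703 (0:40 real, 3.9607 ms/iter, ETA 40:37:50)
"""

if __name__ == "__main__":
    ratio = mean_ms_per_iter(DEFAULT_RUN) / mean_ms_per_iter(SIXSTEP_RUN)
    print(f"default / sixstepfft = {ratio:.3f}")
```

On these lines the ratio comes out at roughly 1.14, i.e. about 14% more iterations per second with -sixstepfft at 2048K on this card.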

msft 2016-01-09 12:12

Hi,
[QUOTE=kracker;421594]Also... automatic FFT selection isn't quite working at the moment..(-f works of course) :smile:
[/QUOTE]
On my machine it works:
[code]
msft@fujic:/tmp/clLucas.1.04$ ./clLucas 36976267

Platform 0 : Advanced Micro Devices, Inc.
Platform :Advanced Micro Devices, Inc.
Device 0 : Fiji

Build Options are : -D KHR_DP_EXTENSION
Starting M36976267 fft length = 1920K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration = 80 < 1000 && err = 0.37500 >= 0.35, increasing n from 1920K
Starting M36976267 fft length = 2048K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.04690, max error = 0.06250
Iteration 200, average error = 0.05470, max error = 0.06250
[/code]
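The log messages suggest the selection logic: a single error at or above 0.35 during the first 1000 iterations, or an average at or above 0.25 over them, moves the test to the next larger FFT length. A minimal Python sketch of that behaviour as read from the messages, not taken from the clLucas source; `pick_fft_length`, `fake_errors`, and the candidate list are hypothetical names and values used only for illustration.

```python
from itertools import islice

def pick_fft_length(candidates_k, errors_for, test_iters=1000,
                    avg_limit=0.25, hard_limit=0.35):
    """Return the first FFT length (in K) whose round-off errors pass both checks.

    errors_for(n_k) must return an iterable of per-iteration round-off errors.
    The two limits are the values printed in the log messages.
    """
    for n_k in candidates_k:
        errs = list(islice(errors_for(n_k), test_iters))
        if not errs or any(e >= hard_limit for e in errs):
            continue  # one large error is enough to move to the next length
        if sum(errs) / len(errs) >= avg_limit:
            continue  # average over the test window is too high
        return n_k
    raise RuntimeError("no candidate FFT length passed the round-off test")

def fake_errors(n_k):
    """Stand-in error source (an assumption): 0.375 below 2048K, 0.0625 otherwise."""
    level = 0.375 if n_k < 2048 else 0.0625
    return iter([level] * 1000)

if __name__ == "__main__":
    print(pick_fft_length([1920, 2048, 2304], fake_errors))
```

With the stand-in errors above, 1920K is rejected and 2048K is accepted, matching the sequence in the log.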

kracker 2016-01-10 00:06

:gah: Sorry... just noticed that I had the 4096K FFT specified in the ini file.

Jayder 2016-01-10 00:35

Is this developed enough that benchmarks would be able to be added to mersenne.ca?

msft 2016-01-10 13:44

[QUOTE=Jayder;421726]Is this developed enough that benchmarks would be able to be added to mersenne.ca?[/QUOTE]
Yes.

UBR47K 2016-01-10 14:36

Does it have support for worktodo.txt and/or output results to results.txt?

chalsall 2016-01-10 17:53

[QUOTE=Jayder;421726]Is this developed enough that benchmarks would be able to be added to mersenne.ca?[/QUOTE]

This would be a great idea! It would then allow James (if he has time) to do an analysis as to the "optimal cross over points" for TF'ing vs. LL'ing for AMD GPGPUs like he's done for NVidia's offerings.

msft 2016-01-11 09:42

[QUOTE=UBR47K;421795]Does it have support for worktodo.txt and/or output results to results.txt?[/QUOTE]
Both are supported.

kracker 2016-01-11 15:09

1.04 binaries for Windows. Be sure to try -sixstepfft :smile:

kracker 2016-01-11 21:16

Well well... never thought I'd see this day.

[code]
1024K 2.2 ms/iter
1536K 3.6 ms/iter
2048K 4.0 ms/iter
2304K 5.5 ms/iter
2560K 6.2 ms/iter
3072K 6.8 ms/iter
4096K 8.3 ms/iter
[/code]
Decided to compare it with 1.01, the "first" clLucas version:
[code]
1024K 2.8 ms/iter
1536K 8.2 ms/iter
2048K 5.4 ms/iter
2304K 14.9 ms/iter
2560K 18.9 ms/iter
3072K 25.8 ms/iter
4096K 10.8 ms/iter
[/code]
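The per-length speedup between the two tables is a simple ratio of the ms/iter values. A short Python sketch using the numbers exactly as posted; the post does not say which options the first table was run with, so the dictionary names `new_ms` and `old_ms` are just labels chosen here.

```python
# ms/iter by FFT length in K, copied from the two tables above.
new_ms = {1024: 2.2, 1536: 3.6, 2048: 4.0, 2304: 5.5, 2560: 6.2, 3072: 6.8, 4096: 8.3}
old_ms = {1024: 2.8, 1536: 8.2, 2048: 5.4, 2304: 14.9, 2560: 18.9, 3072: 25.8, 4096: 10.8}

def speedups(old, new):
    """Ratio old/new for every FFT length present in both tables."""
    return {n_k: old[n_k] / new[n_k] for n_k in sorted(old) if n_k in new}

if __name__ == "__main__":
    for n_k, ratio in speedups(old_ms, new_ms).items():
        print(f"{n_k}K: {ratio:.2f}x")
```

The ratios stay around 1.3x at the power-of-two lengths (1024K, 2048K, 4096K) and range from about 2.3x to 3.8x at the other lengths, so most of the gain since 1.01 is at non-power-of-two FFT sizes.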

EDIT: I wonder what kind of performance you could get out of something like a FirePro W8100. It's basically an R9 290 with a 1/2 DP rate instead of the R9 290's 1/8.

