Is there any kind of benchmark (or reference in the code) that gives a list of all suggested FFT sizes for a given exponent size? A quick glance at the source didn't jump out at me where the lookup table was for what FFT size CUDALucas will start with by default. How can I find this (ideally for all possible exponents that it can handle)?
|
1 Attachment(s)
Attached is my cufftbench for a GTX460. I flagged with a "Y" the FFT sizes that make sense.
|
[QUOTE=James Heinrich;294529]Is there any kind of benchmark (or reference in the code) that gives a list of all suggested FFT sizes for a given exponent size? A quick glance at the source didn't jump out at me where the lookup table was for what FFT size CUDALucas will start with by default. How can I find this (ideally for all possible exponents that it can handle)?[/QUOTE]
As far as I know, right now you just pick a range around the exponent and run the cufft test to choose the best FFT for your card/system/exponent. I have not yet tested different thread sizes and the corresponding cufft test for different FFTs. When I get some time, I'll see what I can do. Look back through the thread for more FFT discussions. |
[QUOTE=Prime95;294530]Attached is my cufftbench for a GTX460. I flagged with a "Y" the FFT sizes that make sense.[/QUOTE]
I've had the same question as James about Prime95. Where in the P95 source is the table of possible FFT lengths? 15-20 minutes digging around didn't turn up much. |
[QUOTE=Dubslow;294533]I've had the same question as James about Prime95. Where in the P95 source is the table of possible FFT lengths? 15-20 minutes digging around didn't turn up much.[/QUOTE]
We're getting a little off topic. P95 uses a table in mult.asm. The xjmptable is for SSE2, the yjmptable is for AVX. The table includes FFT size, maximum Mersenne exponent, estimated timing, mem used, which CPU architectures should use it, and some other stuff. |
[QUOTE=Prime95;294530]Attached is my cufftbench for a GTX460. I flagged with a "Y" the FFT sizes that make sense.[/QUOTE]
Found a problem.
[CODE]
CUFFT_Z2Z size= 1146880 time= 1.417851 msec Y
CUFFT_Z2Z size= 1179648 time= 1.390691 msec
[/CODE] |
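The kind of check behind this catch can be sketched mechanically. The snippet below is only an illustration, not anything taken from CUDALucas or the attachment: it assumes that a size "makes sense" when no larger FFT size runs faster, and the names `parse_bench` and `sensible_sizes` are made up for the example. It parses lines shaped like the cufftbench output quoted in this thread.

```python
import re

# Matches lines shaped like the cufftbench output quoted in this thread.
# (Assumed format: "CUFFT_Z2Z size= <n> time= <t> msec", optional flag after.)
LINE_RE = re.compile(r"CUFFT_Z2Z size=\s*(\d+)\s+time=\s*([\d.]+)\s+msec")

def parse_bench(text):
    """Return a list of (fft_size, time_ms) tuples sorted by size."""
    return sorted((int(s), float(t)) for s, t in LINE_RE.findall(text))

def sensible_sizes(results):
    """Keep a size only if it is strictly faster than every larger size.

    This is an assumed reading of "makes sense", not a rule from the source.
    """
    keep = []
    best_larger = float("inf")
    # Walk from the largest size down, tracking the fastest time seen so far.
    for size, ms in sorted(results, reverse=True):
        if ms < best_larger:
            keep.append(size)
            best_larger = ms
    return sorted(keep)

sample = """
CUFFT_Z2Z size= 1048576 time= 1.130540 msec
CUFFT_Z2Z size= 1146880 time= 1.417851 msec
CUFFT_Z2Z size= 1179648 time= 1.390691 msec
"""
print(sensible_sizes(parse_bench(sample)))  # [1048576, 1179648]
```

Under that assumption 1146880 drops out, because the larger 1179648 is faster. Timings this close can vary between runs, so a borderline result is worth re-benchmarking before trusting it.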
Looking at the multipliers, there are definite patterns. The multipliers 1, 3, 5, 7, 9, 21, 27, 45, 49 and 81 are always selected as preferred, except for one instance of 21 vs 45: 1376256 is slower than 1474560. It might be worth re-benchmarking the following four to see if the results are consistent.
[CODE]
CUFFT_Z2Z size= 1376256 time= 1.818975 msec   (21)
CUFFT_Z2Z size= 1474560 time= 1.809079 msec Y (45)
CUFFT_Z2Z size= 2752512 time= 3.812189 msec Y (21)
CUFFT_Z2Z size= 2949120 time= 3.853927 msec Y (45)
[/CODE] |
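The multipliers quoted here appear to be the odd part of each FFT size, i.e. what is left after dividing out every factor of 2. A small sketch of that extraction follows; the function name `odd_multiplier` is made up for illustration and is not from CUDALucas.

```python
def odd_multiplier(fft_size):
    """Strip all factors of 2; return (odd_part, power_of_two_exponent)."""
    if fft_size <= 0:
        raise ValueError("fft_size must be a positive integer")
    exp = 0
    while fft_size % 2 == 0:
        fft_size //= 2
        exp += 1
    return fft_size, exp

for size in (1376256, 1474560, 2752512, 2949120):
    mult, exp = odd_multiplier(size)
    print(f"{size} = {mult} * 2^{exp}")
# 1376256 = 21 * 2^16
# 1474560 = 45 * 2^15
# 2752512 = 21 * 2^17
# 2949120 = 45 * 2^16
```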
[QUOTE=Prime95;294530]Attached is my cufftbench for a GTX460. I flagged with a "Y" the FFT sizes that make sense.[/QUOTE]
Rehashed to show a bit more about the FFT size, as well as axn's correction.
Edit: Whoops, cross post. @axn: msft said CuLu can use any multiple of 32K, that's why I did as such.
Edit2: Redone to show more lengths that are "reasonable", but not best. Those are marked with M.
[code]
CUFFT_Z2Z size= 1048576 = 1024K =  32*32K time= 1.130540 msec Y
CUFFT_Z2Z size= 1146880 = 1120K =  35*32K time= 1.417851 msec M
CUFFT_Z2Z size= 1179648 = 1152K =  36*32K time= 1.390691 msec Y
CUFFT_Z2Z size= 1310720 = 1280K =  40*32K time= 1.533345 msec Y
CUFFT_Z2Z size= 1376256 = 1344K =  42*32K time= 1.818975 msec M
CUFFT_Z2Z size= 1474560 = 1440K =  45*32K time= 1.809079 msec Y
CUFFT_Z2Z size= 1572864 = 1536K =  48*32K time= 1.937807 msec Y
CUFFT_Z2Z size= 1605632 = 1568K =  49*32K time= 2.023415 msec Y
CUFFT_Z2Z size= 1638400 = 1600K =  50*32K time= 2.217558 msec M
CUFFT_Z2Z size= 1769472 = 1728K =  54*32K time= 2.141137 msec Y
CUFFT_Z2Z size= 1835008 = 1792K =  56*32K time= 2.163136 msec Y
CUFFT_Z2Z size= 1966080 = 1920K =  60*32K time= 2.700584 msec M
CUFFT_Z2Z size= 2064384 = 2016K =  63*32K time= 2.551482 msec M
CUFFT_Z2Z size= 2097152 = 2048K =  64*32K time= 2.409963 msec Y
CUFFT_Z2Z size= 2293760 = 2240K =  70*32K time= 3.018234 msec M
CUFFT_Z2Z size= 2359296 = 2304K =  72*32K time= 2.766602 msec Y
CUFFT_Z2Z size= 2457600 = 2400K =  75*32K time= 3.627161 msec M
CUFFT_Z2Z size= 2621440 = 2560K =  80*32K time= 3.239111 msec Y
CUFFT_Z2Z size= 2654208 = 2592K =  81*32K time= 3.409978 msec Y
CUFFT_Z2Z size= 2752512 = 2688K =  84*32K time= 3.812189 msec Y
CUFFT_Z2Z size= 2949120 = 2880K =  90*32K time= 3.853927 msec Y
CUFFT_Z2Z size= 3145728 = 3072K =  96*32K time= 4.029561 msec Y
CUFFT_Z2Z size= 3211264 = 3136K =  98*32K time= 4.324980 msec Y
CUFFT_Z2Z size= 3276800 = 3200K = 100*32K time= 4.702814 msec M
CUFFT_Z2Z size= 3440640 = 3360K = 105*32K time= 4.934543 msec M
CUFFT_Z2Z size= 3538944 = 3456K = 108*32K time= 4.573230 msec Y
CUFFT_Z2Z size= 3670016 = 3584K = 112*32K time= 4.591721 msec Y
CUFFT_Z2Z size= 3932160 = 3840K = 120*32K time= 5.395338 msec M
CUFFT_Z2Z size= 4128768 = 4032K = 126*32K time= 5.436691 msec M
CUFFT_Z2Z size= 4194304 = 4096K = 128*32K time= 5.049356 msec Y
CUFFT_Z2Z size= 4423680 = 4320K = 135*32K time= 5.862155 msec M
CUFFT_Z2Z size= 4587520 = 4480K = 140*32K time= 6.353941 msec M
CUFFT_Z2Z size= 4718592 = 4608K = 144*32K time= 5.858453 msec Y
CUFFT_Z2Z size= 4816896 = 4704K = 147*32K time= 7.085539 msec M
CUFFT_Z2Z size= 4915200 = 4800K = 150*32K time= 7.661496 msec M
[/code]
[QUOTE=Prime95;294535]We're getting a little off topic. P95 uses a table in mult.asm. The xjmptable is for SSE2, the yjmptable is for AVX. The table includes FFT size, maximum Mersenne exponent, estimated timing, mem used, which CPU architectures should use it, and some other stuff.[/QUOTE]
:ouch1: ...The former is 2800 lines. Did you write those all by hand? |
[QUOTE=Dubslow;294539]Rehashed to show a bit more about the FFT size, as well as axn's correction.
Edit: Whoops, cross post. @axn: msft said CuLu can use any multiple of 32K, that's why I did as such. Edit2: Redone to show more lengths that are "reasonable", but not best. Those are marked with M. [/QUOTE]
I did a similar exercise, this time normalizing the time by dividing it by (FFT/1048576). There is a clear pattern. Any multiplier that is 7-smooth yields decent (not necessarily preferred) performance. Anything that is not 7-smooth yields terrible performance. Something like 4x or worse. |
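For anyone who wants to repeat the exercise on their own card, here is a rough sketch of the two calculations described in the post: the normalization (time divided by FFT/1048576) and the 7-smooth test on the multiplier. The function names are hypothetical, chosen only for this example, and the sample timings are copied from the GTX460 benchmark in this thread.

```python
def largest_prime_factor(n):
    """Largest prime factor of n by trial division; returns 1 for n == 1."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    # Whatever remains above 1 is a prime larger than any factor found so far.
    return n if n > 1 else largest

def is_7_smooth(n):
    """True when n has no prime factor larger than 7."""
    return largest_prime_factor(n) <= 7

def normalized_ms(fft_size, time_ms, base=1048576):
    """Scale a timing to what it would cost per 1048576 (1024K) elements."""
    return time_ms / (fft_size / base)

# (fft_size, odd multiplier, time in ms) taken from the benchmark above.
rows = [(1572864, 3, 1.9378), (1441792, 11, 12.0507), (1376256, 21, 1.8189)]
for size, mult, ms in rows:
    print(size, mult, is_7_smooth(mult), round(normalized_ms(size, ms), 3))
```

Testing the multiplier rather than the full FFT size gives the same answer for smoothness, since the remaining factor is a power of 2.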
[QUOTE=axn;294542]I did a similar exercise, this time normalizing the time by dividing it by (FFT/1048576). There is a clear pattern. Any multiplier that is 7-smooth yields decent (not necessarily preferred) performance. Anything that is not 7-smooth yields terrible performance. Something like 4x or worse.[/QUOTE]
Could you post a chart of the multiplier's factorizations or do you want me to do it? |
[QUOTE=Dubslow;294544]Could you post a chart of the multiplier's factorizations or do you want me to do it?[/QUOTE]
[CODE]
    FFT  Pref  Mult  Smooth  Time (ms)  Normalized
1048576  Y        1       1     1.1305       1.130
2097152  Y        1       1     2.4099       1.204
4194304  Y        1       1     5.0493       1.262
1572864  Y        3       3     1.9378       1.291
3145728  Y        3       3     4.0295       1.343
1310720  Y        5       5     1.5333       1.226
2621440  Y        5       5     3.2391       1.295
1835008  Y        7       7     2.1631       1.236
3670016  Y        7       7     4.5917       1.311
1179648  Y        9       3     1.3906       1.236
2359296  Y        9       3     2.7666       1.229
4718592  Y        9       3     5.8584       1.301
1441792          11      11    12.0507       8.764
2883584          11      11    25.2414       9.178
1703936          13      13    15.8089       9.728
3407872          13      13    32.9923      10.151
1966080          15       5     2.7005       1.440
3932160          15       5     5.3953       1.438
1114112          17      17    17.8324      16.783
2228224          17      17    23.8903      11.242
4456448          17      17    54.9814      12.936
1245184          19      19    14.2480      11.998
2490368          19      19    28.5571      12.024
4980736          19      19    65.0263      13.689
1376256  ?       21       7     1.8189       1.385
2752512  Y       21       7     3.8121       1.452
1507328          23      23    19.8118      13.782
3014656          23      23    40.1851      13.977
1638400          25       5     2.2175       1.419
3276800          25       5     4.7028       1.504
1769472  Y       27       3     2.1411       1.268
3538944  Y       27       3     4.5732       1.355
1900544          29      29    30.3831      16.763
3801088          29      29    61.4396      16.948
2031616          31      31    33.3520      17.213
4063232          31      31    67.5301      17.427
1081344          33      11     9.9185       9.618
2162688          33      11    18.5583       8.997
4325376          33      11    40.4085       9.796
1146880          35       7     1.4178       1.296
2293760          35       7     3.0182       1.379
4587520          35       7     6.3539       1.452
1212416          37      37    22.6343      19.575
2424832          37      37    45.7872      19.799
4849664          37      37    99.3098      21.472
1277952          39      13    12.9222      10.602
2555904          39      13    23.8400       9.780
1343488          41      41    27.0680      21.126
2686976          41      41    54.9051      21.426
1409024          43      43    29.3962      21.876
2818048          43      43    59.6049      22.178
1474560  Y       45       5     1.8090       1.286
2949120  Y       45       5     3.8539       1.370
1540096          47      47    33.5578      22.847
3080192          47      47    68.1485      23.199
1605632  Y       49       7     2.0234       1.321
3211264  Y       49       7     4.3249       1.412
1671168          51      17    18.4646      11.585
3342336          51      17    37.9425      11.903
1736704          53      53    13.0645       7.888
3473408          53      53    26.7417       8.072
1802240          55      11    15.3619       8.937
3604480          55      11    33.7740       9.825
1867776          57      19    22.4452      12.600
3735552          57      19    45.0705      12.651
1933312          59      59    15.6682       8.498
3866624          59      59    32.2185       8.737
1998848          61      61    17.2398       9.043
3997696          61      61    36.1076       9.470
2064384          63       7     2.5514       1.295
4128768          63       7     5.4366       1.380
2129920          65      13    20.0319       9.861
4259840          65      13    43.8546      10.794
2195456          67      67    14.6807       7.011
4390912          67      67    31.1684       7.443
2260992          69      23    30.2865      14.045
4521984          69      23    62.9652      14.600
2326528          71      71    17.2002       7.752
4653056          71      71    36.2993       8.180
2392064          73      73    21.2844       9.330
4784128          73      73    44.9508       9.852
2457600          75       5     3.6271       1.547
4915200          75       5     7.6614       1.634
2523136          77      11    21.6817       9.010
2588672          79      79    20.6799       8.376
2654208  Y       81       3     3.4099       1.347
2719744          83      83    19.7181       7.602
2785280          85      17    31.6756      11.924
2850816          87      29    44.9787      16.543
2916352          89      89    20.1506       7.245
2981888          91      13    28.5019      10.022
3047424          93      31    49.9787      17.197
3112960          95      19    37.6290      12.675
3178496          97      97    26.7683       8.830
3244032          99      11    32.7558      10.587
3309568         101     101    23.0312       7.297
3375104         103     103    26.8451       8.340
3440640         105       7     4.9345       1.503
3506176         107     107    23.0966       6.907
3571712         109     109    30.6403       8.995
3637248         111      37    70.8848      20.435
3702784         113     113    26.2526       7.434
3768320         115      23    52.5471      14.621
3833856         117      13    38.0365      10.403
3899392         119      17    44.3925      11.937
3964928         121      11    54.2576      14.349
4030464         123      41    84.7203      22.041
4096000         125       5     6.4623       1.654
4161536         127     127    26.5980       6.701
4227072         129      43    91.9276      22.803
4292608         131     131    36.1097       8.820
4358144         133      19    52.7457      12.690
4423680         135       5     5.8621       1.389
4489216         137     137    40.8194       9.534
4554752         139     139    36.4544       8.392
4620288         141      47   104.9800      23.825
4685824         143      13    68.4353      15.314
4751360         145      29    79.7051      17.590
4816896         147       7     7.0855       1.542
4882432         149     149    38.9179       8.358
4947968         151     151    38.3244       8.121
[/CODE] |