@Robish: Try -aggressive if that does anything for you.
(6 hours till my second DC completes :smile:) |
[QUOTE=Robish;353806]Still 12 days though, I'll see if I can tweak it a bit more with the settings[/QUOTE]
For me (or rather, for a 7850), -threads 64 was fastest. 128 was slightly behind, and 256 was a lot slower. |
[QUOTE=Bdot;353810]For me (rather for a 7850), -threads 64 was fastest. Slightly behind was 128, and a lot slower: 256.[/QUOTE]
My 7770: 64=12.4 ms 128=12.0 ms 256=12.2 ms |
My Second DC done. :smile:
M( [URL="http://mersenne.org/report_exponent/?exp_lo=30822937"]30822937[/URL] )C, 0x1c656da41a256c21, n = 2097152, clLucas v1.01
M( [URL="http://mersenne.org/report_exponent/?exp_lo=30766511"]30766511[/URL] )C, 0x1ff14c8237b5e935, n = 2097152, clLucas v1.00 |
[QUOTE=kracker;353757]clLucas 1.01 out. :smile:
[/QUOTE] This fixed my issue indeed, and the speed on a 5870 is similar to the other Cypress posted earlier.
[code]
Adapter 0 - ATI Radeon HD 5800 Series
New Core Peak : 900
New Memory Peak : 1200
Platform :Advanced Micro Devices, Inc.
Device 0 : Cypress
Device 1 : Cypress
start M32163559 fft length = 2097152
Iteration 10000 M( 32163559 )C, 0x4fcb7c91ec898d35, n = 2097152, clLucas v1.01 err = 0.003906 (1:32 real, 9.2052 ms/iter, ETA 82:12:26)
Iteration 20000 M( 32163559 )C, 0xf7db0862f3ce666d, n = 2097152, clLucas v1.01 err = 0.003906 (1:31 real, 9.1093 ms/iter, ETA 81:19:31)
Iteration 30000 M( 32163559 )C, 0xc6ee677eb0f7dab6, n = 2097152, clLucas v1.01 err = 0.003906 (1:33 real, 9.2079 ms/iter, ETA 82:10:50)
Iteration 40000 M( 32163559 )C, 0x6090de301e3b5f00, n = 2097152, clLucas v1.01 err = 0.003906 (1:31 real, 9.1100 ms/iter, ETA 81:16:53)
[/code] |
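For anyone wondering where the ETA column in a log like this comes from, here is a rough sketch in Python. It assumes the test needs about one iteration per bit of the exponent and that the speed stays constant; `eta_hms` is a made-up helper name, not something clLucas provides, and the program's own calculation may differ slightly.

```python
def eta_hms(exponent, iteration, ms_per_iter):
    """Estimate the remaining run time as (hours, minutes, seconds).

    Assumption: roughly `exponent` iterations in total, so the number
    of iterations left is about exponent - iteration.
    """
    remaining = exponent - iteration
    seconds = round(remaining * ms_per_iter / 1000.0)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


# Values taken from the first log line above (iteration 10000).
h, m, s = eta_hms(32163559, 10000, 9.2052)
print(f"ETA about {h}:{m:02d}:{s:02d}")
```

With the figures from the first iteration line this lands within a minute of the 82:12:26 that the log reports, so the per-iteration time is a fair guide to total run time.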
1 Attachment(s)
Hi,
1.01 source code.
1) Fix TeknoHog issue.
2) Change from clAmdFft.h to clfft.h.
3) Fix "over specifications Grid = 65536" issue.
4) Change from cudalucas.ini to cllucas.ini. |
[QUOTE=kracker;353809]@Robish: Try -aggressive if that does anything for you.
(6 hours till my second DC completes :smile:)[/QUOTE] Will do Kracker, report back soon as |
1 Attachment(s)
@kracker:
Back from my trip, and thinking of taking your challenge seriously, hehe, but I just realized you might be talking about a different challenge :razz:
Looking [URL="http://www.gpu72.com/reports/workers/dc/"]here[/URL], which at the moment looks like this:
[CODE]
.....
4 kracker 43 307 426 [B]13,040[/B]
.....
6 LaurV 374 [B]9,172[/B]
.....
[/CODE]
it would be a 4THzD difference, which might take time (talking about DC, not about TF where thousands of GHzD/D are possible), even if I work at full power and you are sleeping...
But I just looked at my other stats, which are the GIMPS' Lifetime, and I am thinking that you may be talking about this: [ATTACH]10294[/ATTACH] which should be a trifle, just a few days of full throttle to put you far behind... :razz:
Still thinking about it... But for me it now seems more important to do some P-1, because I will soon be pushed out of Lifetime's Top100, where I am trying to stay, now that I was able to reach it... hehe. So, my cards will do some P-1 for the time being. I'll let you get some more of a lead, hehe
... you know what the problem with roosters is? (quote from the web, I spent some time searching for this old joke, I can never tell it properly in English, but I tell it to every young engineer who comes to work for the company, when he hits the top of the door frame with his head... :razz:, buddy, you are the young rooster here!)
[QUOTE] A farmer goes out one day and buys a brand new stud rooster for his chicken coop. The young rooster walks over to the old rooster and says "Ok, old fellow, time to retire." The old rooster says, "You can't handle all these chickens....look at what it did to me!" The young rooster replies, "Now, don't give me a hassle about this. Time for the old to step aside and the young to take over, so take a hike." The old rooster says, "Aw, c'mon.....just let me have the two old hens over in the corner. I won't bother you." The young rooster says, "Scram! Beat it! You're washed up! I'm taking over!" So, the old rooster thinks for a minute and then says to the young rooster, "I'll tell you what, young fellow, I'll have a race with you around the farmhouse. Whoever wins the race gets domain of the chicken coop." The young rooster says, "You know I'm going to beat you, old man, so just to be fair, I'm even going to give you a head start." They line up in back of the farm house, get a chicken to cluck "Go!" and the old rooster takes off running. About 15 seconds later, the young rooster takes off after him. They round the front of the farmhouse and the young rooster is only about 5 inches behind the old rooster and gaining fast. The farmer, sitting on the porch, looks up, sees what's going on, grabs his shotgun and BOOM - he blows the young rooster to bits. He sadly shakes his head and says, "Dammit, third gay rooster I bought this week!" [/QUOTE](no disrespect to the gay community, this joke is just funny!)
Joking apart, I really like the new clLucas! I certainly have to play more with it! I think you guys did a wonderful job! kotgw and kudos! |
[QUOTE=kracker;353809]@Robish: Try -aggressive if that does anything for you.
(6 hours till my second DC completes :smile:)[/QUOTE] Hi Kracker,
Quick question: what are the FFT lengths multiples of? I'm attempting a few 100million jobs with -f 20971520 (ETA 190 days), but since -f makes such a difference I would like to try a range of values to find the optimum. I saw it somewhere but can't find it now. |
[QUOTE=Robish;353861]
Quick question, what are the fft length multiples of? [/QUOTE] It depends on your number of threads. An 8k (i.e. 8*1024) multiple is OK for most cases, but you have to try a few DCs first to see if you get the right residues. For paranoids (like me) you can go as low as 1K multiples (1024), which is the minimum allowed (it cannot be lower without producing crap). With many threads (like 512 or 1024, for GTX cards) a granularity of 1k will still give you errors. Msft said somewhere that you need something like 32-64k FFT for 1024 threads, 16k FFT for 256-512 threads, etc., but in practice you need to test for YOUR card. You can run "-cufftbench" to test your card, and then, for each range of exponents you test, select the FFT length (size) that is fastest AND at the same time gives you errors between 0.1 and 0.23 (otherwise you risk rounding/summing errors during the run). |
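The selection rule described above (fastest FFT length whose error stays between 0.1 and 0.23) could be sketched like this in Python. The benchmark rows are made-up placeholder numbers, not real measurements, and `pick_fft` is a hypothetical helper, not part of CUDALucas or clLucas; you would fill the list by hand from your own benchmark runs.

```python
def pick_fft(results, err_lo=0.1, err_hi=0.23):
    """Return the fastest FFT length whose error is inside [err_lo, err_hi].

    results: list of (fft_length, ms_per_iter, max_err) tuples.
    Returns None when no length falls inside the error window.
    """
    safe = [r for r in results if err_lo <= r[2] <= err_hi]
    if not safe:
        return None
    # Among the acceptable lengths, take the lowest time per iteration.
    return min(safe, key=lambda r: r[1])[0]


# Placeholder numbers for illustration only (NOT real benchmark data).
bench = [
    (1024 * 1000, 5.0, 0.40),  # fastest, but error too high
    (1024 * 1100, 5.5, 0.20),  # inside the window
    (1024 * 1200, 6.0, 0.12),  # inside the window, slower
    (1024 * 1300, 7.0, 0.05),  # error below the window, slowest
]
print(pick_fft(bench))
```

The error window is passed in as parameters because the 0.1 to 0.23 range is the rule of thumb given in the post, and you may want to tighten it for your own card.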
[QUOTE=LaurV;353862]It depends on your number of threads. An 8k (i.e. 8*1024) multiple is ok for most cases, but you have to try few DC first to see if you get the right residues. For paranoids (like me) you can get as low as 1K multiples (1024) which is the minimum allowed (it can not be lower, without producing crap). With many threads (like 512, 1024, for gtx cards) a granulation of 1k will still give you errors. Msft said somewhere that you need something like 32-64k FFT for 1024 threads, 16k FFT for 256-512 threads, etc, but in practice, you need to test for YOUR card. You can do "-cufftbench" to test for your card, and then, for each rang of expos you test, select the FFT length (size) that is faster AND in the same time, gives you errors between 0.1 and 0.23 (otherwise you risk to get rounding/summing errors during running).[/QUOTE]
Thanks LaurV,
It'll take a while for all that to sink in, I'm afraid ;-) I'll try cufftbench first. Sorry, I probably shouldn't be asking here anyway, but I'm using a GTX 690 with CUDALucas:
CUDALucas-2.03-cuda4.2-sm_30-x86-64 -threads 512 -f 20971520 -t 332233123
So if I am reading this right, 8 * 1024 = 8192, so multiples of 8192? I.e. I'm using 8192 * 2560 = 20971520.
Cheers, Rob. |
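The arithmetic in the post above checks out, and a small Python sketch can list neighbouring lengths on the same 8192 grid to try with -f. The helper name `candidates` is made up for illustration, and whether the program actually accepts (or runs well with) each length is an assumption you would have to test on your own card.

```python
STEP = 8 * 1024  # 8192, the "8k multiple" discussed above


def candidates(center, step=STEP, count=2):
    """List multiples of `step` around `center`, `count` on each side."""
    base = (center // step) * step  # snap down to the grid
    values = [base + k * step for k in range(-count, count + 1)]
    return [v for v in values if v > 0]


fft = 20971520
print(fft % STEP == 0)   # is it on the 8192 grid?
print(fft // STEP)       # which multiple it is
print(candidates(fft))   # nearby lengths to benchmark
```

Each candidate would still need a benchmark run and a check of the reported error before trusting it for a long test.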