[QUOTE=TheJudger;324380]Hi,
so you managed to edit the mfaktc.ini, but did you read it? [CODE]# Keep in mind that "number of candidates (M/G)" and "rate (M/s)" are NOT
# comparable between CPU- and GPU-sieving. When sieving is done on the GPU
# those numbers count all factor candidates prior to sieving, while CPU
# sieving counts the numbers after the sieving process.
#
[/CODE][/QUOTE] I'm aware of that; I just didn't think it would make such a big difference. :o
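This is why the two rates look so different: the GPU-sieving number counts every candidate fed into the sieve, while the CPU-sieving number counts only the survivors. A rough way to compare them is to scale the GPU rate by the sieve's survival fraction — a minimal sketch, where the 300 M/s rate and the ~5% survival fraction are assumed illustrative values, not mfaktc's actual figures:

```python
def post_sieve_rate(raw_rate_m_per_s, survival_fraction):
    """Scale a GPU-sieving rate (candidates counted *before* sieving)
    down to the equivalent CPU-sieving rate (counted *after* sieving)."""
    return raw_rate_m_per_s * survival_fraction

# Assumed illustrative value: the sieve removes ~95% of candidates,
# so only ~5% survive to be trial-divided.
equivalent = post_sieve_rate(300.0, 0.05)  # ~15 M/s post-sieve equivalent
```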
[QUOTE=swl551;324410]CudaLucas will NOT run at the high OC rates I ran 0.19 at. I learned that instantly with CuLu. The answers regarding CudaLucas's sensitivity to GPU execution errors made sense. No one has stated that 0.20 has similar constraints. Maybe that is what we are uncovering here.[/QUOTE]
Well, that's what he said: 0.20 stresses memory where 0.19 does not. CUDALucas' sensitivity is in the memory, and that's what's different between the two versions, so if CUDALucas fails at those higher clocks, then there's the issue. Thanks for that information, TheJudger -- very good to know.
[QUOTE=Dubslow;324419]Well, that's what he said: 0.20 stresses memory where 0.19 does not. CUDALucas' sensitivity is in the memory, and that's what's different between the two versions, so if CUDALucas fails at those higher clocks, then there's the issue.
Thanks for that information, TheJudger -- very good to know.[/QUOTE] Yes, I am agreeing with the scenario. :surrender
Scott,
I'm pretty sure that you had problems with mfaktc 0.19 at your OC clock/voltage, too. But I *guess* that mfaktc has a very high chance of silent errors:[LIST]
[*]once started there is no memory allocation, everything is static after startup
[*]it reads 12 bytes per factor candidate from memory and then runs in registers
[*]in very, very rare cases some data is written to memory (only when a factor was found); this can be billions of FCs with no data written to memory
[/LIST]The selftest usually won't catch this: each test case typically tests 10-15M FCs, but at the end it is only checked whether the known factor was found or not; the other millions of results are [B]not[/B] verified. The selftest usually doesn't stress the GPU very hard.

Oliver
[QUOTE=TheJudger;324438]Scott,
I'm pretty sure that you had problems with mfaktc 0.19 at your OC clock/voltage, too. But I *guess* that mfaktc has a very high chance of silent errors:[LIST]
[*]once started there is no memory allocation, everything is static after startup
[*]it reads 12 bytes per factor candidate from memory and then runs in registers
[*]in very, very rare cases some data is written to memory (only when a factor was found); this can be billions of FCs with no data written to memory
[/LIST]The selftest usually won't catch this: each test case typically tests 10-15M FCs, but at the end it is only checked whether the known factor was found or not; the other millions of results are [B]not[/B] verified. The selftest usually doesn't stress the GPU very hard.

Oliver[/QUOTE] Again, thank you. I no longer feel there is a code defect causing the problem.
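Oliver's point about the selftest can be sketched in a few lines: a check of this shape only asserts that the known factor shows up in the reported results, so corrupted or missing results elsewhere pass silently. This is hypothetical toy code to illustrate the limitation, not the actual mfaktc selftest:

```python
def selftest_passes(reported_factors, known_factor):
    """A selftest of this shape only checks that the known factor was
    reported; the millions of other per-candidate results go unverified."""
    return known_factor in reported_factors

# A correct run reports the known factor (12345 is a made-up value)...
ok = selftest_passes({12345, 777}, 12345)       # True
# ...but a run that silently lost other results passes just the same:
still_ok = selftest_passes({12345}, 12345)      # True (errors unnoticed)
```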
Hi James,
[QUOTE=James Heinrich;324408]I'm just curious if you can quantify "horrible"? Presumably it works, but is slower than CPU-sieving even on a slow CPU? By how much?[/QUOTE] Currently I can't give exact numbers because my GTX 275 retired (new GTX 680 for my main rig, GTX 470 moved to the secondary rig replacing the GTX 275). For mfaktc the GTX 680 is not a very smart decision[SUP]*1[/SUP], same speed as the GTX 470 (but less electrical energy and noise), but the main purpose is gaming and in that case the 680 is not the worst decision. :smile:

From memory, GPU sieving on the GTX 275 was half the speed of (GTX 275 + one i7 (Nehalem series) core @ 3.5GHz); the CPU kept the GPU busy easily. I want to set up a test rig with the 275, so I can provide exact numbers someday.

[SUP]*1[/SUP]I now have permanent access to a Kepler-based GPU, so perhaps I can tweak the code a little bit, but this is no promise.

Oliver
[QUOTE=TheJudger;324483]For mfaktc the GTX 680 is not a very smart decision[SUP]*1[/SUP][/QUOTE]I thought I remembered you having a GTX 680... can you run a benchmark and submit it on [url=http://mersenne.ca/mfaktc.php#benchmark]my site[/url], please?
I currently don't have any data on how CC 3.0 performs on v0.20-32 with GPU sieving. I've updated the chart to reflect the new performance level of CC 2.0 and 2.1 (thanks to everyone who submitted benchmarks!) and performance is much more consistent than it was with CPU-sieving before. But nobody with a CC 3.0 card has submitted a benchmark yet. :sad: So until someone does (preferably several someones), the relative performance of all CC 3.0 GPUs (e.g. GTX 6xx) is probably inaccurate.
Hi James,
[LIST]
[*]stock GTX 680 (1008MHz, avg. Boost 1058MHz, actual clock around 1080MHz (this is no OC!))
[*]M66362159 from 2[SUP]70[/SUP] to 2[SUP]71[/SUP]
[*]mfaktc 0.20, Win32, default settings
[/LIST][CODE]Jan 12 15:21 | 4617 100.0% | 1.085 0m00s | 298.90 82485 n.a.%
no factor for M66362159 from 2^70 to 2^71 [mfaktc 0.20 barrett76_mul32_gs]
tf(): total time spent: 17m 21.457s[/CODE]OK?

Oliver
Thanks. I've revised my GFLOPS-GHzd/d ratio from 15.0 in v0.19 to 12.0 as a conservative estimate last week, and now, with your benchmark, down to 11.0 (for comparison: CC 2.0 = 3.6, CC 2.1 = 5.3, CC 1.x = 14.0). I'll refine this as more users (hopefully) submit benchmarks on my site.

According to these numbers, a GTX 470 is actually still about 7% [i]faster[/i] than a GTX 680. But 9% lower power consumption, plus higher gaming performance, still counts for something. :smile:
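The ratio works as a divisor: estimated TF throughput is a card's theoretical GFLOPS divided by the per-CC ratio. A quick sketch of that arithmetic using the ratios above — the GFLOPS figures here are assumed approximate theoretical values for the two cards, not measured numbers:

```python
def tf_throughput(gflops, ratio):
    """Estimated trial-factoring throughput (GHz-days/day) from a card's
    theoretical GFLOPS and the empirical GFLOPS-per-GHzd/d ratio."""
    return gflops / ratio

gtx470 = tf_throughput(1089.0, 3.6)   # CC 2.0 ratio; ~1089 GFLOPS assumed
gtx680 = tf_throughput(3090.0, 11.0)  # CC 3.0 ratio; ~3090 GFLOPS assumed
advantage = gtx470 / gtx680 - 1.0     # the 470 comes out roughly 7-8% ahead
```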
[QUOTE=James Heinrich;324492]
[...] a GTX 470 is actually still about 7% [i]faster[/i] than a GTX 680. But 9% lower power consumption, plus higher gaming performance, still counts for something. :smile:[/QUOTE] Well, 9% lower TDP, but the real power consumption is [B]much[/B] lower. Those Teslas can measure the power consumption directly: comparing a Tesla M2075 (GF110, 448 cores @ 1150MHz) with a Tesla K10 (2x GK104, 1536 cores @ 745MHz), the K10 has ~70% of the throughput per GPU compared to the M2075, but the power consumption is less than half (~70W per GPU vs. 170W).

Oliver
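The perf-per-watt gap follows directly from those figures — a one-line sketch (the function name is mine; the numbers are the ones quoted in the post):

```python
def perf_per_watt_ratio(rel_throughput, watts_new, watts_old):
    """Relative performance per watt of the newer GPU, given its
    throughput relative to the older one and per-GPU power draw."""
    return rel_throughput / (watts_new / watts_old)

# K10 (per GK104 die) vs. M2075, using the figures from the post:
gain = perf_per_watt_ratio(0.70, 70.0, 170.0)  # = 1.7, i.e. ~1.7x perf/W
```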
Had an interesting situation with the new 0.20. I have a GTX 560 in a Core2Quad Q6600. Running the 32-bit program by itself on core 4, it holds a steady 210-220 GHzd/d, but if I start P95 and run DC on cores 1 and 3, mfaktc starts varying wildly between 140 and 200 GHzd/d. I switched to the 64-bit program to see what would happen, and even with P95 running it stays fairly steady at 210-220 GHzd/d.