[QUOTE=James Heinrich;332594]Short story: about the same mfaktc performance as a GTX 570.[/QUOTE]
Without Boost (1.0/2.0) this is correct; with Boost, it's not. |
[QUOTE=ixfd64;332596]Considering that mfaktc was designed to eliminate Mersenne prime candidates, it's probably a bit pointless to make it TF known composite numbers.[/QUOTE]
Correct, this is of no value to GIMPS as a prime search project. Some people, including me, sometimes look for factors of large 2^n-1 numbers, and it would be awesome to be able to use GPUs for the job. If factoring composite exponents is permitted, I think it should be done with minimal developer effort, even if that neglects some relatively easy optimizations, precisely because factoring such numbers is not the main purpose of mfaktc. |
We have had this discussion a few times in the past; I still remember [URL="http://www.mersenneforum.org/showthread.php?p=299505"]the last try[/URL]. It turns out that one would need a good way to eliminate the algebraic and intrinsic factors first, and then do the sieving. It is not difficult to change mfaktc if one doesn't care about missing some factors, and the goal is just to find some other factors, no matter whether they come in order or not. If one is going to make a new mfaktc, I would suggest introducing a new flag, like "-allowcomposite" or whatever, and the program would not check whether the exponent is composite when the flag is present (but still check that it is odd; otherwise we get into trouble with 3,5 (mod 8), more classes to parse, and different logic). Modifying the program to always allow composite exponents could result in futile work when someone makes a typo, for example. What I want to say is that it is better to keep the primality check among the default options of the program, but allow odd composites when some "special" flag is present. For these odd composites, the program would work exactly the same way as it does for prime exponents. Of course, it will miss the algebraic factors.
|
[QUOTE=akruppa;332632]Correct, this is of no value to GIMPS as a prime search project. Some people, including me, sometimes do look for factors of large 2^n-1 numbers, and it would be awesome to be able to use GPUs for the job. If factoring composite exponents is permitted, I think it should be done with minimal developer effort, even if that neglects some relatively easy optimizations that could be done, precisely because factoring such numbers is not the main purpose of mfaktc.[/QUOTE]
True, there is no harm in giving mfaktc the ability to TF composite exponents. Perhaps there could be a switch that makes it skip the primality check. |
[QUOTE=ixfd64;332769]True, there is no harm in giving mfaktc the ability to TF composite exponents. Perhaps there could be a switch that makes it skip the primality check.[/QUOTE]
I'm not familiar with the mfaktc code at all, or I could probably make the changes myself. If the changes to allow composite exponents are implemented, maybe I can add some code to skip useless classes according to quadratic character, if such changes are well-localized. |
Removing the check for a prime exponent is easy, and the classes stuff is easy too, as long as we keep 420/4620 classes. But the (CPU) sieve needs to be reworked: currently there is no code to remove primes from the sieve base, but this would be needed for composite exponents; otherwise there will be an endless loop in the offset calculation. Anyway, it seems feasible if someone wants to do so.
Oliver |
Btw, what is the relationship between mfaktc and mmff? I haven't kept up with developments, and digging through a 100-page thread seems a daunting task... is one a superset of the other?
|
[QUOTE=akruppa;332861]Btw, what is the relationship between mfaktc and mmff? I haven't kept up with developments, and digging through a 100-page thread seems a daunting task... is one a superset of the other?[/QUOTE]
mfaktc is for TF'ing GIMPS-class numbers, and mmff is for double Mersenne and Fermat numbers. The mmff source code is largely based on that of mfaktc. |
[QUOTE=ixfd64;332864]mfaktc is for TF'ing GIMPS-class numbers, and mmff is for double Mersenne and Fermat numbers. The mmff source code is largely based on that of mfaktc.[/QUOTE]
Although I think the gpu sieving code now in mfaktc was originally in mmff. Just to be confusing. |
So finally I got my hands on a Titan (temporary), too.
Seems that I've underestimated the boost clock. So with stock clock rates the Titan is the fastest GPU for mfaktc, but it wins only by a small margin compared to the old GTX 580. There [B][U]might[/U][/B] be a very small performance increase for the Titan once I test the new [I]funnel shift[/I] instruction. The barrett_{76,77,79} kernels don't make use of multiword shifts except for the initialization, but barrett_{87,88,92} do a multiword shift in each iteration. Oliver |
[QUOTE=TheJudger;333722]So finally I got my hands on a Titan (temporary), too.[/QUOTE]Would you mind sending me a benchmark? I'd feel better about my mfaktc performance chart ratios if I had more than 1 benchmark to go on.
|
Yepp, I was right: the funnel shift gives a small advantage.
A quick hack using stock mfaktc 0.20 code and barrett_87 for testing on a Tesla K20 (CUDA 5.0): [CODE] base 300.8 GHzd/d added code generation for sm_35 298.1 GHzd/d using funnel shift in barrett_87 308.9 GHzd/d [/CODE] Using funnel shift in the initialization phase causes a very small slowdown! So barrett_87 now beats barrett_77; only barrett_76 is faster on GK110. For the current TF wavefront the impact is even lower because we do TF to 2[SUP]73[/SUP] there. But hey, it is an improvement... Oliver |
We have been messing around with the extremely confusing "[URL="http://www.evga.com/precision/"]EVGA Precision X[/URL]" software. There are so many options it is ridiculous!
Anyways, we were messing with the memory clock setting. By default on our GTX690 it is 1502.3MHz. We lowered this to 1252.8MHz and the performance did not change! The temperatures and voltages do not change, either. Does this make sense? Would running the memory slower like that make it less likely to have a flipped bit? FWIW, the GPU clock is 1058.2MHz. The performance changes a lot when we mess around with that! :mike: |
I believe mfaktc has very little GPU memory usage. CuLu (CUDALucas), however, of course does.
|
[QUOTE=kracker;334597]I believe mfaktc has very little gpu memory usage. CuLu, however does of course.[/QUOTE]
Quite. If you're running only mfaktc, memory clock will have little to no impact on performance. If you run CuLu, performance should scale mostly linearly with memory clock (up to some upper limit that's probably out of reach; be sure your card is absolutely stable with at least a few double checks). |
[QUOTE=Xyzzy;334596]We have been messing around with the extremely confusing "[URL="http://www.evga.com/precision/"]EVGA Precision X[/URL]" software. There are so many options it is ridiculous!
Anyways, we were messing with the memory clock setting. By default on our GTX690 it is 1502.3MHz. We lowered this to 1252.8MHz and the performance did not change! The temperatures and voltages do not change, either. Does this make sense? Would running the memory slower like that make it less likely to have a flipped bit? FWIW, the GPU clock is 1058.2MHz. The performance changes a lot when we mess around with that! :mike:[/QUOTE] I had my memory clock cranked way down on the 580 and it made no noticeable change on the performance or temperature. |
Running mfaktc, the performance should scale linearly with GPU clock:
GPU clock × factor = GHz-d/day. E.g. 966 MHz × x = 439, so x = 0.454; then 1241 MHz × 0.454 = ? *proven* |
Is there any way to disable the "simple selftest" on startup? Maybe just run it once for each version of mfaktc or once per month? It wastes time that could be spent crunching when you stop and start it often, e.g. [URL="http://www.mersenneforum.org/showthread.php?t=18088"]to switch speeds[/URL].
|
Without changing some code and recompiling, or binary hacking: no, not possible.
And there is (at least) one broken driver version available in the wild, so I don't recommend doing so. How much time does it really take on your system? Oliver |
[QUOTE=TheJudger;337056]Without changing some code and recompile or binary hacking: no, not possible.
And there is (at least) one broken driver version available in the wild so I don't recommend to do so. How much time does it really take on your system? Oliver[/QUOTE] About 6-7 seconds. It seems longer when using my controller app because it doesn't show the output until mfaktc finishes the first class (the first time it senses a newline, I suppose), which means it looks closer to 18 seconds on my current GPU/assignment. I was under the impression that it was really only meant to test mfaktc's internals, but the fact that it can detect a broken driver makes it more useful. I guess it's really not as bad as I thought, all things considered. I wouldn't expect most people to spend more than about a minute (~10 runs) per day running the selftest, really a trivial amount. Thanks for the info. |
Odd, I don't run that selftest more than once every few days. Do you turn the program off and on a lot?
|
[QUOTE=Aramis Wyler;337143]Odd, I don't run that selftest more than once every few days. Do you turn the program off and on a lot?[/QUOTE]
Yes, as part of switching speeds using my [URL="http://www.mersenneforum.org/showthread.php?t=18088"]MfaktX Controller[/URL] tool. Mfaktc doesn't support a nicer way of switching the GPUSieveSize, so the tool stops mfaktc, changes the ini file, and then starts it again. Due to the selftest, there's a few seconds of lost time each time you do this. I have the tool set up to switch automatically when the screensaver goes off or on, and there's a PauseWhileRunning feature, and you can change it any time you want, so it can happen a number of times per day - but at 6-7 seconds per, it's not a significant loss. |
Okay, I have questions that have probably been asked time and time again: what is the minimum exponent that can be tested with mfaktc 0.20? What is the minimum bit depth? Maximum? Is there a maximum exponent testable by design, or just based on memory?
Thanks in advance! |
If memory serves, the smallest exponent is 1M, the maximum 2^32-1 (or was it 2^31-1?); following that, the minimum bit depth should be 20, and the maximum... well... 2^31-1?
|
[QUOTE=c10ck3r;338186]Okay, I have questions that have probably been asked time and time again: what is the minimum exponent that can be tested with mfaktc 0.20? What is the minimum bit depth? Maximum? Is there a maximum exponent testable by design, or just based on memory?
Thanks in advance![/QUOTE][quote]from: [url]http://mersennewiki.org/index.php/Mfaktc[/url] Current limits[list][*]Prime exponents between 1,000,000 and 2[sup]32[/sup]-1[*]Factor sizes <2[sup]95[/sup] and k<2[sup]63.9[/sup][/list][/quote]Also:[list][*]GPU sieving is not available for < 2[sup]64[/sup] (the kernels that support such small factors are old (and notably slower than the more modern kernels available for "normal" GIMPS-range work) and haven't been rewritten to support GPU sieving)[/list] |
Please see [URL]http://mersenneforum.org/showpost.php?p=344999&postcount=852[/URL]
I'm afraid this also applies to mfaktc, though I'm not sure if GPUSieveProcessSize=24 is commonly used here. Oliver, can you check tf_common_gs.cu, the calculation of numblocks? I think the remainder of the division is lost. |
on my little GT 640 I use GPUSieveProcessSize=8 ;)
|
[QUOTE=Bdot;345001]Please see [URL]http://mersenneforum.org/showpost.php?p=344999&postcount=852[/URL]
I'm afraid this also applies to mfaktc, though I'm not sure if GPUSieveProcessSize=24 is commonly used here. Oliver, can you check tf_common_gs.cu, the calculation of numblocks. I think the remainder of the division is lost.[/QUOTE] I guess mfaktc is affected, too. (What about mmff?) For now I recommend setting GPUSieveSize to a multiple of GPUSieveProcessSize. Keep in mind that GPUSieveProcessSize is in kibibits (1024 bits) and GPUSieveSize in mebibits (1048576 bits), thus GPUSieveSize=5 [B][U]is[/U][/B] a multiple of GPUSieveProcessSize=16. [B]The only problematic setting is GPUSieveProcessSize=24 with GPUSieveSize not a multiple of 3[/B]. For all other valid settings of GPUSieveProcessSize (8, 16, 32) it doesn't matter which value you choose for GPUSieveSize, because it will always be a multiple of GPUSieveProcessSize. Bertram: how do we fix this? The easy way, by checking GPUSieveProcessSize and GPUSieveSize when the parameter file is read and auto-adjusting GPUSieveSize? Oliver |
[QUOTE=TheJudger;345194]
Bertram: how do we fix this? The easy way by checking GPUSieveProcessSize and GPUSieveSize when the parameter file is read and autoadjust GPUSieveSize? Oliver[/QUOTE] Yes, that's what I would prefer. I've sent you an email with my fix. Other options include: [LIST][*]run one more block that is only partially filled (disadvantage: for every sieve block we would run a few thousand "useless" TF threads)[*]not advancing k_min at the end of the loop by mystuff->gpu_sieve_size * NUM_CLASSES, but by mystuff->gpu_sieve_processing_size * numblocks (disadvantage: every sieve block would overlap with the previous one and sieve a few thousand FCs again)[*]I don't see a disadvantage in adjusting GPUSieveSize (except that it now may not be a nice power of two :smile:)[/LIST](the above disadvantages apply only if GPUSieveProcessSize=24 && GPUSieveSize%3 > 0) Edit: I also posted a note to the mmff thread now. |
[QUOTE=Xyzzy;325305]FWIW, we played around with the values to find the most productive combo, and for both of our cards that combo was "GPUSieveSize=128" and "GPUSieveProcessSize=8".
YMMV[/QUOTE] Me too, on a GT 430 and a GTX 760. The GTX 760 went from 229 GHz-days/day to 249 GHz-days/day! |
What is your GPUSievePrimes set to? I get best results with (GTX 580):
GPUSievePrimes=70000 GPUSieveSize=128 GPUSieveProcessSize=16 |
[QUOTE=flashjh;354357]What is your GPUSievePrimes set to? I get best results with (GTX 580):
GPUSievePrimes=70000 GPUSieveSize=128 GPUSieveProcessSize=16[/QUOTE] GPUSievePrimes=82486 I haven't changed it from the default yet. |
[QUOTE=Mark Rose;354365]GPUSievePrimes=82486
I haven't changed it from the default yet.[/QUOTE] I get better output with 70K. |
[QUOTE=flashjh;354367]I get better output with 70K.[/QUOTE]
Really! Which chip(s)? Any or all? EDIT: Boosted the 570 by 15-18 GHz-d/d at restart. Trailed off to 5-7 GHz-d/d. Not very noticeable with the 580. Harder to say, as that one drives the display. It fluctuates more. |
I can only speak for GTX 580 but it's not too hard to make changes to the number and see what you get.
|
[QUOTE=flashjh;354372]I can only speak for GTX 580 but it's not too hard to make changes to the number and see what you get.[/QUOTE]
For sure. Thanks for mentioning it. I never messed with that value, though I did with others. |
[QUOTE=flashjh;354357]What is your GPUSievePrimes set to? I get best results with (GTX 580):
GPUSievePrimes=70000 GPUSieveSize=128 GPUSieveProcessSize=16[/QUOTE] I see no difference if set to the default, 50000, 70000, or 100000. |
I think it would be useful if mfaktc had an "auto-configure" feature that tries different parameters and determines the best configuration.
|
I end up with 30K for the 570, and 65K for the 580. The difference is still slight, but more pronounced on the 570.
|
Indeed. I'm not going to remote in to test the differences, and I think tweaking between 30K -100K gives no more than 5 or 10 GHzDays/Day per card, but when you're pushing the systems as hard as you can, everything counts.
|
[QUOTE=Mark Rose;354383]I see no difference if set to the default, 50000, 70000, or 100000.[/QUOTE]
Okay, I take that back. After playing with the values for a while, when factoring a 72 Mbit or 74 Mbit number, these are the best values for both the GTX 760 (260 GHz-d/day) and the GT 430 (50 GHz-d/day): GPUSievePrimes=100000 GPUSieveSize=128 GPUSieveProcessSize=8 |
[QUOTE=Mark Rose;354927]GPUSievePrimes=100000
GPUSieveSize=128 GPUSieveProcessSize=8[/QUOTE]I agree that gives good performance, but a high GPUSieveSize kills my GUI experience -- there is significant lag just minimizing a window for example, and forget about playing a video or a game. I find GPUSieveSize=32 to be acceptable, although still not completely unnoticeable. |
The systems that I do TF on are mostly for TF. When we want to play a game or use it for something else we just stop the TF temporarily.
|
I'm mostly doing low TF (up to 2^64) in the [url=http://www.mersenne.ca/tf1G.php]1G+ range[/url], which necessitates CPU sieving and 4+ instances of mfaktc, but is utterly transparent to whatever else wants to use the GPU.
|
[QUOTE=James Heinrich;354933]I agree that gives good performance, but a high GPUSieveSize kills my GUI experience -- there is significant lag just minimizing a window for example, and forget about playing a video or a game. I find GPUSieveSize=32 to be acceptable, although still not completely unnoticed.[/QUOTE]
Yeah, I just stop the factoring when I'm actually using the system with the GTX 760. Things like watching videos and scrolling are too slow. My GT 430's are not used for displays at all, so they can keep factoring away :) [QUOTE=James Heinrich;354941]I'm mostly doing low TF (up to 2^64) in the [url=http://www.mersenne.ca/tf1G.php]1G+ range[/url], which necessitates CPU sieving and 4+ instances of mfaktc, but is utterly transparent to whatever else wants to use the GPU.[/QUOTE] I did a bunch of that earlier. I noticed the mersenne.ca stats for it are extremely slow to update though :/ |
[QUOTE=James Heinrich;354941]I'm mostly doing low TF (up to 2^64) in the [URL="http://www.mersenne.ca/tf1G.php"]1G+ range[/URL], which necessitates CPU sieving and 4+ instances of mfaktc, but is utterly transparent to whatever else wants to use the GPU.[/QUOTE]
But isn't that a rather inefficient use of GPUs? I suspect nothing beats old Athlons at TF under 64 bit. |
[QUOTE=Mark Rose;354973]I noticed the mersenne.ca stats for it are extremely slow to update though :/[/QUOTE]The [url=http://www.mersenne.ca/tf1G.php?available_assignments=1]stats for >1G TF[/url], such as they are, update nightly.
[QUOTE=garo;354975]But isn't that a rather inefficient use of GPUs? I suspect nothing beats old Athlons at TF under 64 bit.[/QUOTE]It is (much) less efficient than TF in normal ranges, but not [i]too[/i] horrible. My GTX 670, for example, gets about 150GHz-days/day throughput in this range, compared with approx 238GHz-days/day doing TF in normal ranges. By comparison, an [url=http://www.mersenne.ca/throughput.php?cpu1=AMD%20Athlon%28tm%29%2064%20X2%20Dual%20Core%20Processor%206000%2B|1024|0&mhz1=3000]Athlon X2 6000+[/url] can get about 11GHz-days/day out of both cores up to 2[sup]63[/sup] (9/day to 2[sup]64[/sup]), assuming Prime95 efficiency (although the Prime95 application doesn't support exponents beyond PrimeNet range). For what it's worth, I have used CPUs to TF the entire range up to 2[sup]51[/sup], but it's getting to the point where it's no longer practical to take everything up another bitlevel with CPUs. I don't really expect anyone to join me, or even approve of my pet project, but it's what I've chosen to expend my GPU time on for the next few years. :smile: |
[QUOTE=James Heinrich;354982]The [url=http://www.mersenne.ca/tf1G.php?available_assignments=1]stats for >1G TF[/url], such as they are, update nightly.[/quote]
Interesting. I submitted over 1000 TF results in the >1G range yet only [URL="http://www.mersenne.ca/stats.php?showuserstats=shifted"]10 show up[/URL]. :( |
Sorry, yes, that section of the user-stats is known-broken. I have added a warning message to the page to make it clear. I will at some point get around to tracking down where the fault lies.
To be clear: the errors in the user stats pages extend across all ranges, not just the 1G+ range. In the 1G+ range any user-specific factoring effort for factors smaller than 0.1GHz-day effort (roughly 2[sup]67[/sup]) is not recorded (the factor is recorded of course, just not who found it). |
[QUOTE=James Heinrich;355091]Sorry, yes, that section of the user-stats is known-broken. I have added a warning message to the page to make it clear. I will at some point get around to tracking down where the fault lies.
To be clear: the errors in the user stats pages extend across all ranges, not just the 1G+ range. In the 1G+ range any user-specific factoring effort for factors smaller than 0.1GHz-day effort (roughly 2[sup]67[/sup]) is not recorded (the factor is recorded of course, just not who found it).[/QUOTE] Ahh, okay. Thanks for the information. I don't remember the exact level I was factoring to. I think mostly in the 2[sup]66[/sup] to 2[sup]68[/sup] range. |
CUDALucas 2.05 beta and "CUDALucas Road Map"
Wrong forum, meant to go [URL="http://www.mersenneforum.org/showthread.php?p=359150#post359150"]here[/URL]
|
[QUOTE=garo;354975]But isn't that a rather inefficient use of GPUs? I suspect nothing beats old Athlons at TF under 64 bit.[/QUOTE]
Actually, Intel has significantly improved integer-MUL support in their 2 main post-Core 2 chip families - roughly halved the latency, doubled the per-cycle pipelined throughput. [Those 2 are independent, btw.] GMP users may have noticed these speedups, although I have seen no one mention it around here. [Perhaps someone did in the factoring forums]. Here are [url=http://gmplib.org/list-archives/gmp-devel/2013-August/003353.html]comments from early August[/url] by GMP's Torbjorn Granlund: [quote]I got a new Intel Haswell system for the GMP test system array. This CPU line is interesting to GMP because of its improvements in the area of integer arithmetic. The undisputed GMP champion has for years been the now defunct AMD CPUs K8 and K10. The most critical multiplication loops run at between 2.375 and 2.5 cycles per accumulated 64 x 64 -> 128 bit product. No Intel system has come close, and newer AMD systems (Bulldozer, Piledriver) run the loops at between 4.5 and 5.2 cycles per limb. (New GMP code reaches 4.25 cycles.) Haswell adds a new multiply instruction which avoids 2 of 3 fixed-register operands. The old MUL did (rdx,rax) <- rax * regormem, while the new MULX does (reg1,reg2) <- rdx * regormem. I suppose they kept rdx fixed as a concession to the general x86 ugliness. :-) Furthermore, MUL overwrites the carry flag with a useless value, while MULX leaves flags alone. The new instruction is much more suitable for GMP's needs. I have written some preliminary loops using MULX, and optimised them for Haswell. The results are encouraging; this CPU has the potential to outperform all other x86 CPUs. The key multiply loops run at between 1.6 and 2.3 cycles/limb, resulting in about 20% higher performance than on the old K10. Thus far, only mul_1 (1.6 cycles/limb), and addmul_1/submul_1 (2.3 cycles/limb) are in the public repo. I have a 1.75 c/l mul_2 and 2.0 c/l addmul_2 in the assembly works.
I strongly suspect it is possible to do addmul at considerably less than 2.0 c/l. (A caveat about the new system: Perhaps I was unlucky, or perhaps the platform is not yet robust, but the first system I got had a dead CPU, and the second is not 100% stable under GNU/Linux; I get rare spurious non-reproducible segfaults. Neither FreeBSD, Debian, nor Ubuntu would work at all; they crashed in strange ways during install. Finally Gentoo installed, but has the segfault problem.)[/quote] Having been fully occupied with AVX/float code most of this year, I first noticed the impressive IMUL throughput boost a couple of weeks ago, while porting my TF code [which has macros for both IMUL and SSE/AVX-float-based TF beyond 64 bits] to my Haswell. The float-double TF code [up to 78 bits] got a nice boost from AVX, but the pure-int code [which has x86 asm routines for 64 and 96-bit factor candidates] was even better. A little digging through Agner Fog's pre-Haswell instruction timings PDF confirmed the MUL enhancements already on Sandy Bridge - Haswell further adds the MULX instruction, which I will be playing with going forward, as well as using FMA to boost the float-TF routines. |
I just installed NVIDIA drivers v331.82 and now mfaktc doesn't work anymore. Or, more specifically, the 64-bit LessClasses version doesn't work anymore. I tried the 32-bit regular version and it still seems to work OK.
Crash gives me this error dump:[code]Problem signature: Problem Event Name: APPCRASH Application Name: mfaktc-win-64.exe Application Version: 0.0.0.0 Application Timestamp: 50e9bf08 Fault Module Name: nvcuda.dll Fault Module Version: 8.17.13.3182 Fault Module Timestamp: 5280db7b Exception Code: c0000005 Exception Offset: 000000000009b506 OS Version: 6.1.7601.2.1.0.256.48 Locale ID: 1033 Additional Information 1: 0800 Additional Information 2: 08002199d42341871ec210c846947482 Additional Information 3: 915a Additional Information 4: 915a5873c4a2aec8d9ca7379729b85a7[/code] [i]edit: rolling back to v331.65 didn't fix my problem :sad:[/i] |
I'm still on 331.65. I ignored the update for the time being; it's supposed to improve performance in Assassin's Creed: Black Flag and some other game (guess which of the two I am more looking forward to playing... :razz:).
If you manage to get 331.65 to work for you again, I could update as well to check that this isn't just you. If the issue doesn't get solved by the weekend, I'll update my OS SSD image, update the GPU drivers, and restore the entire f*****g image if I get the same problem. You're under Windows 7? Or Linux? Do you have some kind of system restore feature? Windows 7 should have automatically made one before an update of that magnitude. Try restoring from that if it isn't going to hurt anything else of yours. |
I re-updated to 331.82 [i]and rebooted[/i] this time, and now mfaktc is happy again. (I couldn't reboot last night because I was still processing a 45-hour job).
I just found it odd that the LessClasses version wasn't happy but the regular mfaktc worked fine. |
The one time I didn't ask "Have you rebooted"...
|
Any advice to tweak my system for higher output?
[QUOTE]mfaktc v0.20 (64bit built) Compiletime options THREADS_PER_BLOCK 256 SIEVE_SIZE_LIMIT 32kiB SIEVE_SIZE 193154bits SIEVE_SPLIT 250 MORE_CLASSES enabled Runtime options SievePrimes 25000 SievePrimesAdjust 1 SievePrimesMin 5000 SievePrimesMax 100000 NumStreams 3 CPUStreams 3 GridSize 3 GPU Sieving enabled GPUSievePrimes 82486 GPUSieveSize 64Mi bits GPUSieveProcessSize 16Ki bits WorkFile worktodo.txt Checkpoints enabled CheckpointDelay 30s Stages enabled StopAfterFactor bitlevel PrintMode full V5UserID (none) ComputerID (none) AllowSleep no TimeStampInResults no CUDA version info binary compiled for CUDA 4.20 CUDA runtime version 4.20 CUDA driver version 6.0 CUDA device info name GeForce GTX 670 compute capability 3.0 maximum threads per block 1024 number of multiprocessors 7 (1344 shader cores) clock rate 980MHz Automatic parameters threads per grid 917504 running a simple selftest... Selftest statistics number of tests 92 successfull tests 92 selftest PASSED! got assignment: exp=75844001 bit_min=71 bit_max=72 (6.31 GHz-days) Starting trial factoring M75844001 from 2^71 to 2^72 (6.31 GHz-days) k_min = 15566051433240 k_max = 31132102873038 Using GPU kernel "barrett76_mul32_gs" Date Time | class Pct | time ETA | GHz-d/day Sieve Wait Dec 24 20:21 | 0 0.1% | 2.380 38m02s | 238.45 82485 n.a.% Dec 24 20:21 | 3 0.2% | 2.363 37m44s | 240.17 82485 n.a.% Dec 24 20:21 | 4 0.3% | 2.339 37m18s | 242.63 82485 n.a.% Dec 24 20:21 | 15 0.4% | 2.338 37m15s | 242.74 82485 n.a.% Dec 24 20:21 | 16 0.5% | 2.341 37m16s | 242.43 82485 n.a.% [/QUOTE] |
[QUOTE=xtreme2k;362798]Any advice to tweak my system for higher output?[/QUOTE]
According to [url]http://www.mersenne.ca/mfaktc.php?sort=ghdpd&noA=1[/url], you're right where you should be. |
work in progress
Hello!
[LIST][*][B][U]mfaktc v0.20[/U][/B]: ./mfaktc.exe -tf 1000003 64 65 [CODE]mfaktc v0.20 (64bit built) [...] SievePrimes [B][COLOR="Red"]200000[/COLOR][/B] [...] got assignment: exp=1000003 bit_min=64 bit_max=65 (3.74 GHz-days) WARNING: SievePrimes is too big for the current assignment, lowering to [B][COLOR="Red"]78497[/COLOR][/B] It is not allowed to sieve primes which are equal or bigger than the exponent itself! [...] Using GPU kernel "barrett76_mul32" [...][/CODE][*][B][U]mfaktc v0.21-pre5[/U][/B]: ./mfaktc.exe -tf 1000003 64 65 [CODE]mfaktc v0.21-pre5 (64bit built) [...] SievePrimes [B][COLOR="Red"]200000[/COLOR][/B] [...] got assignment: exp=1000003 bit_min=64 bit_max=65 (3.74 GHz-days) [...] Using GPU kernel "barrett76_mul32" [I]sieve_init_class(1000003, 9223344364080, 200000) last prime in sieve: [B][COLOR="Red"]2750161[/COLOR][/B] removing [B][COLOR="Red"]1000003[/COLOR][/B] from sieve adding [B][COLOR="Red"]2750171[/COLOR][/B] to sieve[/I] [...][/CODE] The [I]italic[/I] parts are temporary printf()s in the code; they will be removed in the release version. 2750161 is the 200000th odd prime. 1000003 is removed from the sieving process because factor candidates (FCs) are 2 * k * exp + 1, so they are always 1 mod <exp>. 2750171 is the 200001st odd prime, which takes the place of the removed prime.[*][B][U]mfaktc v0.21-pre5[/U][/B]: ./mfaktc.exe -tf [B][COLOR="Red"]100019[/COLOR][/B] 1 55 [CODE]mfaktc v0.21-pre5 (64bit built) [...] SievePrimes [B][COLOR="Red"]200000[/COLOR][/B] [...] got assignment: exp=100019 bit_min=1 bit_max=55 (0.05 GHz-days) [...] 
Using GPU kernel "71bit_mul24" [I]sieve_init_class(100019, 0, 200000) last prime in sieve: [B][COLOR="Red"]2750161[/COLOR][/B] removing [B][COLOR="Red"]100019[/COLOR][/B] from sieve removing [B][COLOR="Red"]1800343[/COLOR][/B] from sieve adding [B][COLOR="Red"]2750171[/COLOR][/B] to sieve adding [B][COLOR="Red"]2750177[/COLOR][/B] to sieve[/I] [...][/CODE] Again, 2750161 is the 200000th odd prime. 100019 is removed because it is the exponent itself. 1800343 is removed from the sieving process because it is a possible FC: 1800343 = 1 mod (2 * 100019). Removing it allows finding composite factors which contain 1800343. If we ignored composite factors, an offset for the sieving would be enough. 2000381 is [B]not removed[/B]: even though it is prime and 2000381 = 1 mod (2 * 100019), it doesn't satisfy the [URL="http://mersenne.org/various/math.php"]mod 8 rule[/URL], so it isn't an FC and can be used for sieving. 2750171 and 2750177 are the 200001st and 200002nd odd primes, which take the place of the removed primes.[/LIST] Oliver |
Very interesting/nice. Do you also plan to support add files (worktodo.add) as well?
|
I am looking forward to its debut. :tu:
|
Would version 0.21 have any speed improvements over 0.20?
|
Hello!
[QUOTE=kracker;363144]Very interesting/nice. Do you also plan to add add files(worktodo.add) as well?[/QUOTE] Yes, worktodo.add is planned. [QUOTE=ixfd64;363159]Would version 0.21 have any speed improvements over 0.20?[/QUOTE] Yes, for some bitranges if you are running mfaktc[LIST][*]on a CC 1.x GPU by reordering kernel priorities based on recent measurement on my GTX 275[*]on a CC 3.5 GPU for barrett_87/88/92 kernels using funnel shift (see [URL="http://www.mersenneforum.org/showpost.php?p=333722&postcount=2241"]this[/URL] and [URL="http://www.mersenneforum.org/showpost.php?p=334251&postcount=2243"]this[/URL] post) Raw Kernel speeds on a Tesla K20: [CODE]barrett87_mul32 368.01M/s (without funnel-shift 357.09M/s) barrett88_mul32 367.45M/s (without funnel-shift 347.80M/s) barrett92_mul32 306.60M/s (without funnel-shift 293.69M/s)[/CODE][/LIST]So only minor performance improvements in the next release, sorry! Oliver |
It's still a noticeable improvement, though. :smile:
|
work in progress
Happy new year to everyone!
[B]./mfaktc.exe -tf 66362159 68 69[/B] [CODE]mfaktc v0.21-pre6 (64bit built) [...] GPU Sieving [B][COLOR="Red"]enabled[/COLOR][/B] GPUSievePrimes 82486 GPUSieveSize 64Mi bits GPUSieveProcessSize 16Ki bits [...] CUDA device info name [B][COLOR="Red"]GeForce GTX 275[/COLOR][/B] compute capability [B][COLOR="Red"]1[/COLOR][/B].3 [...] clock rate 1404MHz [...] got assignment: exp=66362159 bit_min=68 bit_max=69 (0.90 GHz-days) Starting trial factoring M66362159 from 2^68 to 2^69 (0.90 GHz-days) [...] Using GPU kernel "barrett76_mul32[B][COLOR="Red"]_gs[/COLOR][/B]" Date Time | class Pct | time ETA | GHz-d/day Sieve Wait Jan 02 15:31 | 0 0.1% | 1.272 20m20s | 63.74 82485 n.a.% Jan 02 15:31 | 4 0.2% | 1.259 20m06s | 64.40 82485 n.a.% Jan 02 15:31 | 9 0.3% | 1.260 20m06s | 64.35 82485 n.a.% [...] [/CODE] Compared to the same GPU using [B]CPU[/B] (Core i7 9xx series @3.5GHz) sieving: [CODE][...] Using GPU kernel "barrett76_mul32" [...] Date Time | class Pct | time ETA | GHz-d/day Sieve Wait Jan 02 15:35 | 24 0.6% | 1.232 19m35s | 65.81 25000 29.72% Jan 02 15:35 | 25 0.7% | 1.207 19m10s | 67.17 28125 25.67% Jan 02 15:35 | 37 0.8% | 1.197 19m00s | 67.73 31640 21.82% Jan 02 15:35 | 40 0.9% | 1.186 18m48s | 68.36 35595 17.45% Jan 02 15:35 | 45 1.0% | 1.175 18m36s | 69.00 40044 12.76% Jan 02 15:35 | 49 1.1% | 1.165 18m26s | 69.59 45049 7.69% Jan 02 15:35 | 52 1.2% | 1.154 18m14s | 70.26 50680 2.25% Jan 02 15:35 | 60 1.4% | 1.154 18m13s | 70.26 50680 2.27% Jan 02 15:35 | 61 1.5% | 1.154 18m12s | 70.26 50680 2.29% [...] [/CODE] I've no clue what went wrong when I did the benchmarks prior to the release of mfaktc 0.20 and decided to disable GPU sieving on CC 1.x GPUs; performance was reproducibly horrible (less than half of the CPU sieve performance). I didn't make any changes related to the GPU sieve (except the code which disables GPU sieving for old GPUs). Oliver |
[QUOTE=TheJudger;363558]I've no clue what went wrong when I did the benchmarks prior to the release of mfaktc 0.20 and decided to disable GPU sieving on CC 1.x GPUs[/QUOTE]That's exciting (that GPU-sieving will be available for 1.x) --- I may have to plug my 8800GT back in :smile:
Now if you tell me that you've also enabled GPU sieving for <64-bit target, I'll jump for joy... |
If you have AMD: [url]http://www.mersenneforum.org/showthread.php?p=363545#post363545[/url]
|
[QUOTE=flashjh;363586]If you have AMD: [url]http://www.mersenneforum.org/showthread.php?p=363545#post363545[/url][/QUOTE]Hmm... sounds promising... any chance of it getting ported to mfaktc?
|
Hi James,
[QUOTE=James Heinrich;363688]Hmm... sounds promising... any chance of it getting ported to mfaktc?[/QUOTE] Port what? GPU kernels using only 15 bits per integer? No way (until Nvidia changes the hardware (which I hope never happens)). You need to understand why mfakto has these kernels:[LIST][*]AMD GPUs still [I]prefer[/I] 24 bit integer multiplication over 32 bit[*]AFAIK OpenCL doesn't provide access to the hardware carry[*]I'm unsure about this: is it possible to calculate the top bits of a 24x24 multiplication with OpenCL on AMD GPUs?[/LIST] Oliver |
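For anyone wondering why those top bits matter: a 24x24-bit product is at most 48 bits, so when limbs are only 24 bits wide, the high half of each partial product is exactly the carry into the next limb. A minimal value-level sketch of that arithmetic (plain Python standing in for the GPU integer ops; mfakto's real kernels are OpenCL and organized quite differently):

```python
# Multiply two numbers stored as little-endian 24-bit limbs, propagating
# the high bits of each 48-bit partial product as a carry into the next
# limb. This is the arithmetic a 24-bit-limb GPU kernel must reproduce.

MASK24 = (1 << 24) - 1

def to_limbs(n, count):
    return [(n >> (24 * i)) & MASK24 for i in range(count)]

def from_limbs(limbs):
    return sum(l << (24 * i) for i, l in enumerate(limbs))

def mul_limbs(a, b):
    res = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        carry = 0
        for j, y in enumerate(b):
            t = res[i + j] + x * y + carry  # < 2^49: fits in 64 bits
            res[i + j] = t & MASK24         # low 24 bits stay in place
            carry = t >> 24                 # high bits carry onward
        res[i + len(b)] += carry
    return res

a, b = 0x123456789ABCDE, 0xFEDCBA987654
assert from_limbs(mul_limbs(to_limbs(a, 3), to_limbs(b, 2))) == a * b
```

The point of the sketch: every partial product needs its upper 24 bits, which is why hardware that only exposes the low bits of a 24x24 multiply forces awkward workarounds.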
FWIW, you can install CUDA via a package manager now. We did the Ubuntu 12.04 version and it worked as advertised. This should (?) alleviate problems with updated drivers breaking the system.
[url]http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#install-cuda-software[/url] Edit: Also, with Ubuntu 12.04 we were able to access the fan controller. The default BIOS (?) fan curve allows the card to get really close to 80°C, which is (we think) where it is apt to throttle performance. We set the fan to 70% and the temperature is a stable 65°C under load. Unfortunately there is no option to have the fan speed change based on GPU temperature, like using EVGA's Precision X software in Windows, but since mfaktc runs 24×7 the environment is stable. [url]http://askubuntu.com/questions/42494/how-can-i-change-the-nvidia-gpu-fan-speed/299648[/url] In particular, the answer with "sudo nvidia-xconfig --cool-bits=4" in it. |
Hello,
I've just discovered a bug in mfaktc 0.20. The good news is that I think this bug can never lead to false negatives (a missed factor) because it just crashes mfaktc during the first call to a kernel which uses GPU sieving. The issue is that more shared memory than available is requested (depending on the values of GPUSieveProcessSize and GPUSievePrimes). I noticed this when using GPU sieving on CC 1.x GPUs, which have only 16kiB of shared memory; newer GPUs (2.x and 3.x) have 48kiB, so it takes obscure settings of GPUSieveProcessSize and GPUSievePrimes to trigger the bug on them. mfaktc 0.21 will check the settings: [CODE][...] GPUSievePrimes 50000 [...] GPUSieveProcessSize 32Ki bits [...] CUDA device info name GeForce GTX 275 compute capability 1.3 [...] Using GPU kernel "75bit_mul32_gs" ERROR: Not enough shared memory available! Need 31457 bytes This GPU supports up to 16384 bytes of shared memory. You can lower GPUSieveProcessSize or increase GPUSievePrimes to lower the amount of shared memory needed [...] [/CODE] Oliver |
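The shape of that new guard is simple even though the real shared-memory formula lives inside the kernel launch code. A hypothetical sketch of such a pre-launch check (the function name and message wording here mirror the error output above but are stand-ins, not mfaktc's actual code):

```python
# Hypothetical pre-launch guard: refuse to start a GPU-sieve kernel that
# would request more shared memory than the device provides, instead of
# letting the kernel launch fail (the mfaktc 0.20 crash described above).

def check_shared_mem(needed_bytes, device_limit_bytes):
    if needed_bytes > device_limit_bytes:
        raise RuntimeError(
            f"Not enough shared memory: need {needed_bytes} bytes, "
            f"device supports {device_limit_bytes}. Lower "
            f"GPUSieveProcessSize or increase GPUSievePrimes.")
    return True

# Numbers from the error output above: a CC 1.3 card with 16 KiB of
# shared memory cannot satisfy a 31457-byte request.
assert check_shared_mem(15000, 16384)
try:
    check_shared_mem(31457, 16384)
    launch_refused = False
except RuntimeError:
    launch_refused = True
assert launch_refused
```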
[CODE]top - 14:14:37 up 2 days, 19:12, 5 users, load average: 4.01, 3.94, 3.80
Tasks: 188 total, 6 running, 182 sleeping, 0 stopped, 0 zombie Cpu0 : 2.2%us, 0.5%sy, 96.6%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st Cpu1 : 1.7%us, 0.2%sy, 97.7%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 1.8%us, 0.2%sy, 97.7%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 1.8%us, 0.2%sy, 97.6%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16380940k total, 3959672k used, 12421268k free, 214876k buffers Swap: 0k total, 0k used, 0k free, 1427968k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND 1145 root 20 0 211m 100m 65m S 0 0.6 51:48.99 1 /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch 2631 m 20 0 1213m 61m 29m S 2 0.4 32:25.85 0 /usr/lib/squeak/4.4.7-2357/squeakvm -encoding UTF-8 -vm-display-x11 -xshm -plugins /usr/lib/scratch/plugins/:/usr/lib/squeak/4.4.7-2357/ -vm-sound-oss /usr/share/scratch/Scratch.image [B]2325 m 20 0 [COLOR="Red"]32.1g[/COLOR] 88m 81m S 2 0.6 19:17.68 0 ./mfaktc.exe[/B][/CODE] :confused: |
[QUOTE=Xyzzy;365479][CODE]top - 14:14:37 up 2 days, 19:12, 5 users, load average: 4.01, 3.94, 3.80
Tasks: 188 total, 6 running, 182 sleeping, 0 stopped, 0 zombie Cpu0 : 2.2%us, 0.5%sy, 96.6%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st Cpu1 : 1.7%us, 0.2%sy, 97.7%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 1.8%us, 0.2%sy, 97.7%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 1.8%us, 0.2%sy, 97.6%ni, 0.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16380940k total, 3959672k used, 12421268k free, 214876k buffers Swap: 0k total, 0k used, 0k free, 1427968k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND 1145 root 20 0 211m 100m 65m S 0 0.6 51:48.99 1 /usr/bin/X :0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch 2631 m 20 0 1213m 61m 29m S 2 0.4 32:25.85 0 /usr/lib/squeak/4.4.7-2357/squeakvm -encoding UTF-8 -vm-display-x11 -xshm -plugins /usr/lib/scratch/plugins/:/usr/lib/squeak/4.4.7-2357/ -vm-sound-oss /usr/share/scratch/Scratch.image [B]2325 m 20 0 [COLOR="Red"]32.1g[/COLOR] 88m 81m S 2 0.6 19:17.68 0 ./mfaktc.exe[/B][/CODE] :confused:[/QUOTE] That's normal. It just means it's using 32.1 GB of virtual memory, including mapped files and shared memory with the graphics card. The actual RAM used is 88 MB in that case, of which 81 MB is shared libraries (which may also be used by other programs, and in this case is probably the CUDA libraries). The SWAP column shows the virtual memory space that's not currently using RAM. It does the same thing on my home machine. You'll see I have 16 GB of RAM and no swap used. 
[code]top - 15:37:26 up 7 days, 18:15, 3 users, load average: 4.19, 4.07, 4.06 Tasks: 214 total, 1 running, 212 sleeping, 1 stopped, 0 zombie Cpu(s): 0.2%us, 0.0%sy, 99.3%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st Mem: 16435484k total, 9251140k used, 7184344k free, 347556k buffers Swap: 8384508k total, 0k used, 8384508k free, 7467728k cached PID USER NI VIRT SWAP RES SHR DATA CODE S %CPU P %MEM TIME+ COMMAND 3321 m 10 930m 416m 514m 1824 880m 29m S 397 3 3.2 44381:09 ./mprime -d 3254 m 0 32.1g 32g 47m 42m 31g 604 S 0 2 0.3 16:05.84 ./mfaktc.exe -d 1 3677 m 0 32.1g 32g 43m 37m 31g 604 S 0 3 0.3 71:23.11 ./mfaktc.exe -d 0[/code] |
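For anyone who wants to verify this on their own box, the distinction is visible directly in `/proc` (a Linux-only sketch; `VmSize` is top's VIRT column, `VmRSS` is RES):

```python
# Read a process's virtual size (VIRT) and resident set (RES) from
# /proc/<pid>/status -- here for the current process itself.

def mem_kib(pid="self"):
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value = line.split(":")
                fields[key] = int(value.split()[0])  # reported in kB
    return fields

m = mem_kib()
# VIRT counts every mapping (files, shared libraries, device apertures
# such as the GPU's address space); RES is the part actually backed by
# RAM, so VIRT >= RES always, often by a huge margin.
assert m["VmSize"] >= m["VmRSS"]
```

That huge VIRT/RES gap is exactly what the mfaktc rows in the `top` output above show: 32 GB of address space mapped, under 100 MB of it resident.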
[QUOTE]Edit: Also, with Ubuntu 12.04 we were able to access the fan controller.[/QUOTE]
We decided to add a second video card to our system. After much trial and error we discovered that we cannot control the fan on the second video card unless we have a display attached to it. So, since our monitor has multiple ports, we plugged the second video card into the monitor and set up two displays. We have attached our xorg.conf file because we had to edit that manually. We do not use the second display, and if we switch inputs, it just shows a bright blank screen, so we probably set it up wrong. But the fan control works! :mike: [SIZE="1"]PS - The fan settings are not persistent across reboots.[/SIZE] |
Is it possible to modify the program to trial factor Gaussian Mersenne norms and their quotients? Any help from coders would be appreciated.:bow:
|
CUDA 6.0-rc reveals compute capability 3.[B]2[/B] which supports [URL="http://www.mersenneforum.org/showpost.php?p=363167&postcount=2294"]funnel shift[/URL], too. :smile:
CUDA 5.5 doesn't know 3.2 (while it knows 3.0 and 3.5). I'm curious about Maxwell chips (currently only available on GTX 750 (Ti))... Oliver |
[QUOTE=TheJudger;367461]CUDA 6.0-rc reveals compute capability 3.[B]2[/B] which supports [URL="http://www.mersenneforum.org/showpost.php?p=363167&postcount=2294"]funnel shift[/URL], too. :smile:
CUDA 5.5 doesn't know 3.2 (while it knows 3.0 and 3.5). I'm curious about Maxwell chips (currently only available on GTX 750 (Ti))... Oliver[/QUOTE] Is v0.21 still in beta? Luigi |
[QUOTE=TheJudger;367461]CUDA 6.0-rc reveals compute capability 3.[B]2[/B] which supports [URL="http://www.mersenneforum.org/showpost.php?p=363167&postcount=2294"]funnel shift[/URL], too. :smile:
CUDA 5.5 doesn't know 3.2 (while it knows 3.0 and 3.5). I'm curious about Maxwell chips (currently only available on GTX 750 (Ti))...[/QUOTE]Any idea on the sudden jump from Compute 3.5 to 5.0 for the Titan Black? [url]https://developer.nvidia.com/cuda-gpus[/url] |
[QUOTE=James Heinrich;367472]Any idea on the sudden jump from Compute 3.5 to 5.0 for the Titan Black?
[url]https://developer.nvidia.com/cuda-gpus[/url][/QUOTE] I *guess* it is a typo... |
[QUOTE=ET_;367470]Is v0.21 still in beta?
Luigi[/QUOTE] Yes... sorry! |
[QUOTE=TheJudger;367461]CUDA 6.0-rc reveals compute capability 3.[B]2[/B] which supports [URL="http://www.mersenneforum.org/showpost.php?p=363167&postcount=2294"]funnel shift[/URL], too. :smile:
CUDA 5.5 doesn't know 3.2 (while it knows 3.0 and 3.5). [/QUOTE] I find it interesting that some places reference a cc3.2. The latest nvidia driver refers instead to cc3.7, but has no mention of cc3.2! [CODE]strings /usr/lib/x86_64-linux-gnu/libcuda.so.334.16 |grep CUDA_ARCH -D__CUDA_ARCH__=100 -D__CUDA_ARCH__=110 -D__CUDA_ARCH__=120 -D__CUDA_ARCH__=130 -D__CUDA_ARCH__=200 -D__CUDA_ARCH__=210 -D__CUDA_ARCH__=300 -D__CUDA_ARCH__=350 -D__CUDA_ARCH__=370 -D__CUDA_ARCH__=500[/CODE][QUOTE=TheJudger;367461]I'm curious about Maxwell chips (currently only available on GTX 750 (Ti))... Oliver[/QUOTE] I have a 750Ti. So far doesn't seem that interesting... It's slower than both GTX 460 and GTX 660. deviceQuery doesn't show any differences besides what's been mentioned in various places: [FONT=Verdana]CUDA Capability Major/Minor version number: [B]5.0[/B] ( 5) Multiprocessors, [B](128) CUDA Cores/MP[/B]: 640 CUDA Cores Memory Bus Width: [B]128-bit[/B] L2 Cache Size: [B] 2097152[/B] bytes[/FONT] |
[QUOTE=aaronhaviland;367626]I have a 750Ti. So far doesn't seem that interesting... It's slower than both GTX 460 and GTX 660. deviceQuery doesn't show any differences besides what's been mentioned in various places:[/QUOTE]
750 Ti is a 60w part. 460/560/660 etc are 140-150w parts. Just sayin'. |
[QUOTE=aaronhaviland;367626]I have a 750Ti. So far doesn't seem that interesting... It's slower than both GTX 460 and GTX 660.[/QUOTE]What is your mfaktc performance like? My [url=http://www.mersenne.ca/mfaktc.php]benchmark chart[/url] predicts around 123GHz-days/day @ 1020MHz, does that sound about right? If you have the chance, please send me a benchmark result:
[url]http://www.mersenne.ca/mfaktc.php#benchmark[/url] |
[QUOTE=axn;367631]750 Ti is a 60w part. 460/560/660 etc are 140-150w parts. Just sayin'.[/QUOTE]
Also 750 ti is 28nm Maxwell. 460/560/660 is not. Maxwell has a lot of power saving features. |
[QUOTE=axn;367631]750 Ti is a 60w part. 460/560/660 etc are 140-150w parts. Just sayin'.[/QUOTE]750 Ti should be about 2.053 GHz-days/day per watt
560 Ti = 1.389 GHd/w 660 Ti = 1.547 GHd/w We've certainly come a ways, compare to my 8800 GT (still running) which gets 0.288 GHd/w :sad: |
[QUOTE=James Heinrich;367634]750 Ti should be about 2.053 GHz-days/day per watt
560 Ti = 1.389 GHd/w 660 Ti = 1.547 GHd/w We've certainly come a ways, compare to my 8800 GT (still running) which gets 0.288 GHd/w :sad:[/QUOTE] 105W with only 36 GHz? Damn. :razz: |
[QUOTE=James Heinrich;367632]What is your mfaktc performance like? My [URL="http://www.mersenne.ca/mfaktc.php"]benchmark chart[/URL] predicts around 123GHz-days/day @ 1020MHz, does that sound about right? If you have the chance, please send me a benchmark result:
[URL]http://www.mersenne.ca/mfaktc.php#benchmark[/URL][/QUOTE] Well, this card is running at higher clocks: nvidia-settings reports 1346MHz (and 85W according to EVGA), and mfaktc is showing ~177GHz-days/day. Which is interesting, as the card is only supposed to boost up to 1268MHz... [URL]http://www.evga.com/Products/Product.aspx?pn=02G-P4-3757-KR[/URL] [QUOTE=James Heinrich;367634]750 Ti should be about 2.053 GHz-days/day per watt[/QUOTE] Pretty good estimate. 2.089GHz-days/day/W |
[QUOTE=aaronhaviland;367640]Well, this card is running at higher clocks: nvidia-settings reports 1346MHz (and 85W according to EVGA), and mfaktc is showing ~177GHz-days/day.[/QUOTE]That is more performance than expected, even when scaled for clock speed. A more detailed benchmark would be appreciated:
[url]http://www.mersenne.ca/mfaktc.php#benchmark[/url] |
[QUOTE=kracker]is it possible to have line graphs for your GPU benchmarks? (being a heavy visual user myself...)[/QUOTE]I can do that. Have done that now, in fact:
[url]http://www.mersenne.ca/mfaktc.php[/url] The graph changes depending on the sort order of the page. Unfortunately graphing a few hundred results makes the page a bit slower to load than it was. Please let me know if it causes any problems. |
[QUOTE=James Heinrich;367646]A more detailed benchmark would be appreciated:
[URL]http://www.mersenne.ca/mfaktc.php#benchmark[/URL][/QUOTE] Already submitted... but on linux, so I can't give you a GPU-Z screenshot. I used M213685897 for the benchmark. |
Hi,
[QUOTE=aaronhaviland;367640]Well, this card is running at higher clocks: nvidia-settings reports 1346MHz (and 85W according to EVGA), and mfaktc is showing ~177GHz-days/day. Which is interesting, as the card is only supposed to boost up to 1268MHz... [URL]http://www.evga.com/Products/Product.aspx?pn=02G-P4-3757-KR[/URL] [/QUOTE] 1268MHz is the "average boost clock", not the max boost clock. Don't ask me how Nvidia measures the average boost (i.e. which workloads they use), but at least for mfaktc I can say that on non-CC 2.0 GPUs it stays well below TDP, so on recent cards with GPU boost you'll usually see high boost clocks. The reported performance looks fine to me, a step in the right direction for mfaktc. :smile: Oliver |
Also regarding the clocks... I think the boost "amount" can get changed as well. I could be wrong. It's a nightmare for overclocking and I don't know why they do it that way. Just wait for a card to overclock itself to instability by boosting up that one extra step it can't handle...
|
Hey James, I might have an idea how to (partially) solve the speed problem with those graphs: you don't show them :razz:
Now seriously: when we access the "lucas" page (hey, btw, when are you going to include the AMD cards on the "lucas" performance page? Note I did not say "cudalucas"; clLucas now has comparable performance, at least for the 37M range, and with new FFT libraries and new R*90 cards it can still get better, but that is another discussion), there is no graph shown. Then the user clicks on some card and sees that card's graph. Which, in my opinion, is quite convenient. Leaving aside the fact that those graphs were card-specific and could not have been implemented differently, the idea is that you can make the mfaktX graphs "card-specific" too, i.e. implement it the same way on the "mfaktX" page: when the user accesses the page, there is no graph. Fast, as it was before. If he wants to see a graph, he clicks on his card and then sees the graph exactly as it is now, but with the clicked card in a different color (like blue, instead of the red/green used now). |
[QUOTE=LaurV;367669]hey, btw, when are you going to include the AMD cards on the "lucas" performance page? remark I did not say "cudalucas", now cllucas has comparative performance, at least for 37M range, and with new FFT libraries and new R*90 cards, it can still get better, but this is another discussion[/quote]When I get some performance data. I can't chart things I don't know about. If people want to send me benchmarks I will make room for AMD on the [strike]cuda[/strike]lucas page.
[QUOTE=LaurV;367669]If he wants to see some graph, he clicks on his card, then he can see the graph, exactly as it is now, but with the clicked card in a different color (like blue, instead of the red/green used now).[/QUOTE]Implemented as suggested. |
[QUOTE=James Heinrich;367671]When I get some performance data. I can't chart things I don't know about. If people want to send me benchmarks I will make room for AMD on the [strike]cuda[/strike]lucas page.
Implemented as suggested.[/QUOTE] I think (only my opinion) the CUDALucas benchmark numbers would look better with FFT lengths (1024K, 2048K, etc.) instead of rough exponent size. |
@James: Thanks. Now, it looks perfect.
@kracker: they translate straight from one to the other; you just need a table (or cheat sheet), or a spreadsheet with a formula. IIRC, there is also a link on James' page to a calculator. Showing FFT lengths instead of exponents would favor AMD cards if only powers of 2 are chosen, or favor nVidia if not. I like the actual format (i.e. show the exponent); most users have no idea what that "FFT" is, and not all GIMPS participants are math hobbyists. |
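The translation LaurV describes is roughly linear: each double-precision FFT word carries on the order of 18-19 bits of the number being squared. A rough rule-of-thumb sketch (the 18.5 bits/word constant is an assumed average, not an exact limit; real programs consult tuned per-FFT-size tables, which is why a lookup table or calculator is the right tool):

```python
# Roughly map a Mersenne exponent to the FFT length an LL test needs,
# assuming ~18.5 usable bits per double-precision FFT word (an assumed
# average; actual safe limits vary with FFT size and implementation).

BITS_PER_WORD = 18.5

def fft_words(exponent):
    return exponent / BITS_PER_WORD

# A 37M exponent lands in the neighborhood of a 2048K (2,097,152-word)
# FFT, consistent with the 37M-range benchmarks discussed above.
assert 1.8e6 < fft_words(37_000_000) < 2.2e6
```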
[QUOTE=LaurV;367677]@James: Thanks. Now, it looks perfect.
@kracker: they translate straight from one to the other, you just need a table (or cheat-sheet), or use an excel with a formula. IIRC, there is a link in James' page to a calculator, also. Showing FFT instead of exponents will favor AMD cards if only powers of 2 are chosen, or will favor nVidia, if not. I like the actual format (i.e. show the exponent, most users have no idea what that "FFT" is, not all gimps participants are math hobbysts).[/QUOTE] I see. Well, I just jotted down those numbers from my head just to give a example :razz: |