Hi,
some CUDA internal functions (cudaHostAlloc()?) fail when, as far as I can tell, physical memory addresses are above 1TB. I see issues on big iron: pin the CUDA process to the lowest available NUMA node and it runs fine... pin it to another NUMA node (whose memory range is above 1TB) and it fails immediately. Even worse: I see silent data corruption when the process is moved from lower to higher addresses in this case. Oliver
help needed
Hi,
anyone able to build Windows binaries with CUDA 6.0 or 6.5? Currently I don't feel like f*cking up my Windows installation with Visual Studio or so. It would be nice if someone would[LIST][*](short-term/once) build CUDA 6.x binaries of mfaktc 0.20[*](long-term(?)/repeated) build future mfaktc binaries, including pre-releases and testing[/LIST] If you want to/can help (either case): Just drop me a note. Oliver
Check the "can't run on 980" thread, there is a compiled version. My 750 Ti works with it.
[QUOTE=TheJudger;385355]Hi,
anyone able to build Windows binaries with CUDA 6.0 or 6.5? Currently I don't feel like f*cking up my Windows installation with Visual Studio or so. It would be nice if someone would[LIST][*](short-term/once) build CUDA 6.x binaries of mfaktc 0.20[*](long-term(?)/repeated) build future mfaktc binaries, including pre-releases and testing[/LIST] If you want to/can help (either case): Just drop me a note. Oliver[/QUOTE] Hey Oliver, I should be available to compile short-term/long-term for you. I don't always track the forums anymore, so you can PM me; I see those quickly. I compiled mfaktc 0.20 with CUDA 6.5 for Win32 and Win64. I used "[URL="https://developer.nvidia.com/cuda-downloads-geforce-gtx9xx"]CUDA 6.5 Production Release with Support for GeForce GTX9xx GPUs[/URL]". Currently supported compilation architectures ([URL="http://docs.nvidia.com/cuda/pdf/CUDA_Compiler_Driver_NVCC.pdf"]in NVCC[/URL], and in this build) are: virtual architectures: compute_11, compute_12, compute_13, compute_20, compute_30, compute_32, compute_35, compute_37, compute_50, compute_52; and GPU architectures: sm_11, sm_12, sm_13, sm_20, sm_21, sm_30, sm_32, sm_35, sm_37, sm_50, sm_52. compute_37/sm_37 is not documented, but it's supported, so I include it in the build. I emailed the build to you for upload; if you need anything else built, let me know. Jerry edit: if anyone needs the latest lib files for mfaktc, see [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL]
Hi Jerry,
thank you for your help. :smile: [QUOTE=flashjh;385410]edit: if anyone needs the latest lib files for mfaktc, see [URL="https://sourceforge.net/projects/cudalucas/files/CUDA%20Libs/"]here[/URL][/QUOTE] I'll include these libs in the zipfile just as I did before, so it will be "all inclusive". Oliver
Hi all,
[B][U]thanks to Jerry[/U][/B] we now have mfaktc 0.20 compiled with CUDA 6.5 for Windows. You can find it [URL="http://www.mersenneforum.org/mfaktc/mfaktc-0.20/"]here[/URL] - [URL="http://www.mersenneforum.org/mfaktc/mfaktc-0.20/mfaktc-0.20.win.cuda65.zip"]mfaktc-0.20.win.cuda65.zip[/URL][LIST][*][B]will run on [I]Maxwell[/I] GPUs, e.g. GTX 750 (Ti), GTX 970 and GTX 980[/B][*]code is unchanged from the CUDA 4.2 version, just recompiled (with code generation for [I]Maxwell[/I] GPUs enabled)[*]will read savefiles from the CUDA 4.2 version[*]there is no need to upgrade if the CUDA 4.2 version is running fine for you[/LIST] I think you need the NVIDIA 340 series driver or newer. Oliver
James, you have underestimated the GTX 750 Ti throughput by at least a third.
This is with default settings.
[QUOTE=firejuggler;385533]James, you have underestimated the GTX 750 Ti throughput by at least a third.[/QUOTE]Please send me [url=http://www.mersenne.ca/mfaktc.php#benchmark]a benchmark[/url], including both that GPU-Z screenshot and one from the Sensors tab (the only place I've found where the proper boosted clock speed is displayed).
I did have the wrong GFLOPS value for the 750 Ti, so thanks for catching the problem. But I still lack sufficient benchmarks for Compute 5.0 (and 5.2) cards. Based on the single benchmark I have (and my now-corrected data) I suspect your card is running at ~1220MHz. Your benchmark will help refine my data.
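For reference, the single-precision GFLOPS figure can be approximated from core count and clock speed (each CUDA core can retire one fused multiply-add, i.e. 2 FLOPs, per cycle). This is only a sketch of that rule of thumb; the 640-core count for the GTX 750 Ti and the clock values are taken from the posts above:

```python
# Rough single-precision throughput estimate: cores * clock * 2 (FMA = 2 FLOPs).
def gflops(cuda_cores: int, clock_mhz: float) -> float:
    return cuda_cores * clock_mhz * 2 / 1000.0

print(gflops(640, 1176))  # 1505.28 -- at the boost clock reported below
print(gflops(640, 1220))  # 1561.6  -- at the clock suspected from the benchmark data
```

The ~4% difference between the two clock guesses is roughly the size of the discrepancy being discussed, which is why the Sensors-tab boost clock matters for the benchmark database.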
Benchmark sent. 1176 MHz, it seems.
[QUOTE=James Heinrich;385539]Please send me [URL="http://www.mersenne.ca/mfaktc.php#benchmark"]a benchmark[/URL], including both that GPU-Z screenshot and one from the Sensors tab (the only place I've found where the proper boosted clock speed is displayed).
I did have the wrong GFLOPS value for the 750 Ti, so thanks for catching the problem. But I still lack sufficient benchmarks for Compute 5.0 (and 5.2) cards. Based on the single benchmark I have (and my now-corrected data) I suspect your card is running at ~1220MHz. Your benchmark will help refine my data.[/QUOTE] I've sent a benchmark for a GTX 970 (two submissions because of a typo in the bit-level field: 71-72 is the right one, 71-73 is wrong).
decision(s)
Hi all,
after a longer break I started CUDA coding (mfaktc) again. For mfaktc 0.21 some decisions have to be made. Enabling the GPU sieve for lower factor sizes AND (relatively low) exponents is problematic, see the additional information [URL="http://www.mersenneforum.org/showpost.php?p=363143&postcount=2290"]here[/URL]. For mfakt[B]c[/B] 0.20 this isn't a real issue (it can miss [B][U]composite[/U][/B] factors of relatively low exponents, depending on GPUSievePrimes). It affects only composite factors because the lower FC size limit is 2[SUP]64[/SUP] for GPU sieving in mfaktc 0.20. AFAIK mfakt[B]o[/B] has the same problem. In mfaktc 0.21-preX I've generalized the GPU sieving code so it is very easy to adapt GPU sieving to (almost) all kernels. So I did for "75bit_mul32" -> "75bit_mul32_gs" and "95bit_mul32" -> "95bit_mul32_gs". Those kernels can handle FCs starting at very low numbers; in mfaktc 0.20 the smallest FC is 2kp+1 where p is the minimum exponent (1,000,000), so those kernels handle FCs starting at ~2,000,000. Here the problem starts (remember [URL="http://www.mersenneforum.org/showpost.php?p=363143&postcount=2290"]post #2290[/URL]!). GPU sieving supports incredibly high values of GPUSievePrimes. With GPUSievePrimes around 149,000 the prime base will contain primes up to a bit above 2,000,000. A good example is [URL="http://www.mersenne.org/report_exponent/?exp_lo=1000151&exp_hi="]M1,000,151[/URL]: mfaktc 0.20 will miss the [B][U]composite[/U][/B] factor 1285410593336863915299551 (2000303 * 642607941565284817) when GPU sieving is used AND GPUSievePrimes is set above ~148,000. For mfaktc 0.21 I plan to support exponents as low as M100,000 (even if not really useful for GIMPS' search for Mersenne primes, it was requested a couple of times). This (and the fact that GPU sieving is enabled for kernels which can TF below 2[SUP]64[/SUP]) means that mfaktc [B]0.21-preX[/B] currently misses [B][U]prime[/U][/B] factors (on low exponents), depending on the setting of GPUSievePrimes.
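As a quick sanity check (nothing mfaktc-specific, just integer arithmetic), the example above can be verified directly, including that both prime parts of the composite factor have the required 2kp+1 form:

```python
# Verify the example factor of M1000151 cited above.
# Any factor q of 2^p - 1 (p prime) satisfies 2^p = 1 (mod q) and q = 1 (mod 2p).
p = 1000151
q1, q2 = 2000303, 642607941565284817
f = q1 * q2

assert f == 1285410593336863915299551      # the composite factor from the post
assert pow(2, p, f) == 1                   # f divides 2^p - 1
for q in (q1, q2):
    assert pow(2, p, q) == 1               # each prime part divides M1000151 too
    assert (q - 1) % (2 * p) == 0          # q has the form 2kp + 1
print((q1 - 1) // (2 * p))                 # -> 1: q1 = 2*1*p + 1, the very first FC
```

Note that q1 = 2p+1 is the smallest possible factor candidate, which is exactly why sieving the prime base up past 2,000,000 can knock it out.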
The simple [I]trick[/I] used for the CPU sieve (description in [URL="http://www.mersenneforum.org/showpost.php?p=363143&postcount=2290"]post #2290[/URL]) isn't as easy for GPU sieving because the GPU sieve works with prime distances instead of absolute values (prime gaps stored in 7-bit integers). Removing primes from the prime base might overflow the prime gaps! I don't say it is impossible, but I don't want to spend (much) time on [I]not-useful-for-GIMPS[/I] features which, on the other hand, might complicate the code and introduce possible bugs now and in the future. Possible other solutions: [LIST=1][*]Keep the exponent minimum at 1,000,000 for GPU sieving AND limit GPUSievePrimes to ~140,000. - simplest, but not a smart solution[*]Require a minimum factor size of e.g. 2[SUP]40[/SUP] for GPU sieve enabled kernels (even if they, in theory, support lower FCs). - simple but will miss [B][U]composite[/U][/B] factors. [B]I don't like this.[/B][*]Dynamically lower GPUSievePrimes - leads to other problems; lower GPUSievePrimes needs more shared memory on the GPU. (check [URL="http://www.mersenneforum.org/showpost.php?p=365377&postcount=2302"]post #2302[/URL]) I don't really like this solution.[*]Depending on the user's setting of GPUSievePrimes, calculate the minimum valid exponent for the GPU kernels.[LIST][*]currently [B]my preferred solution[/B][*]no additional code in the GPU kernels (performance critical code), only some code at mfaktc startup to check the exponent size before entering the performance critical code.
So for future optimizations in the code there is no need to think about this corner case![*]how to handle an exponent that is too low for the GPU sieve at the current GPUSievePrimes setting (but valid for CPU sieving)?[LIST=A][*]write a WARNING (including the hint that lowering GPUSievePrimes might be an option) and use the CPU sieve[*]write a WARNING (including the same hint) and ignore the assignment[*]write an ERROR (including the same hint) and exit[/LIST][/LIST][/LIST] Because users who TF low exponents [B]should[/B] know what they are doing, I prefer 4C for now! The current wavefront (TF around M70,000,000+) is not affected. Comments? Oliver P.S. I really prefer 4C; if you want another solution it must be clever/smart and feasible, and/or you need good arguments against 4C!
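To illustrate the 7-bit gap constraint: a plausible encoding (the actual mfaktc layout may differ, this is an assumption for illustration) stores half the gap between consecutive odd primes, since those gaps are always even. The sketch below builds that table for the prime base up to 2,000,000 and shows why deleting a prime is risky: the two neighbouring gaps merge into one, and nothing guarantees the merged value still fits in 7 bits:

```python
# Sketch of a half-gap prime table, as a GPU sieve might store it in 7-bit slots.
def primes_up_to(n: int) -> list[int]:
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

primes = primes_up_to(2_000_000)[1:]    # odd primes only (drop 2)
print(len(primes))                      # ~149,000 -- matches the GPUSievePrimes figure above

half_gaps = [(b - a) // 2 for a, b in zip(primes, primes[1:])]
print(max(half_gaps))                   # largest half-gap in this range; fits in 7 bits (<= 127)

# Deleting one prime merges its two neighbouring gaps into a single entry:
merged = max(a + b for a, b in zip(half_gaps, half_gaps[1:]))
print(merged)                           # worst-case merged half-gap if one prime were removed
```

Even when the worst merged value happens to fit for one sieve limit, a larger GPUSievePrimes pushes the prime base (and its gaps) higher, so a gap-encoded table cannot simply have primes removed the way an absolute-value CPU table can.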