mfaktc and dual GPU (SLI Mode)
TheJudger,
Does mfaktc support multiple GPUs? Looking at:
A. NVIDIA GeForce GTX 460 (2GB), SLI Mode (Dual Cards), each with cc=2.1, 7 multiprocessors, 336 CUDA cores.
B. NVIDIA GeForce GTX 580 (3GB), SLI Mode (Dual Cards), each with cc=2.0, 16 multiprocessors, 512 CUDA cores.
This is mostly a pipe dream, but I wondered how mfaktc handles the situation. |
[QUOTE=S34960zz;255414]TheJudger,
Does mfaktc support multiple GPUs? Looking at:
A. NVIDIA GeForce GTX 460 (2GB), SLI Mode (Dual Cards), each with cc=2.1, 7 multiprocessors, 336 CUDA cores.
B. NVIDIA GeForce GTX 580 (3GB), SLI Mode (Dual Cards), each with cc=2.0, 16 multiprocessors, 512 CUDA cores.
This is mostly a pipe dream, but wondered how mfaktc handles the situation.[/QUOTE]
Depends on what "multiple GPUs" means for you:
- Single instance of mfaktc using multiple GPUs: no
- Multiple instances of mfaktc on different GPUs: yes, of course!
Run [I]mfaktc.exe -h[/I] to see the help. For each instance of mfaktc you can specify which GPU it should use.
Btw., the GTX 580 is a beast. I think you'll need at least ~8GHz of CPU power (e.g. 3 cores running at 2.66GHz) to feed a single GTX 580.
Oliver |
Even more CPU GHz are needed with the new version. Again, your mileage may vary depending on exponent and bit depth.
On my GTX580 & i7-2600k@4.5GHz with a 100M exponent at bit depth 70-71, I get 99% GPU usage with 2x instances and SievePrimes=5300. With bit depth 69-70 or lower, I get sub-99% GPU usage. So that's 9GHz of Sandy Bridge level CPU grunt thrown at it.
What's interesting for me is playing with the NumStreams/CPUStreams options.
[CODE]
[Sat Mar 12 09:42:30 2011] M100002251 completed P-1, B1=1090000, B2=21527500, We4: 1A460633, AID: C2C6E70104B1EB9685D0382901771763
[Mon Mar 14 11:29:05 2011] M100001219 completed P-1, B1=1090000, B2=21527500, We4: 1A430636, AID: B56D1F25E07FB380D91E753EB8415443
[Wed Mar 16 09:56:35 2011] M100001569 completed P-1, B1=1090000, B2=21527500, We4: 1A6B063A, AID: B57295946863E577DEDF8A14DE9F496B
[/CODE]
The P-1 test was done on another core. The P-1 test took longer with the options set to 10/5 than with 5/3 (NumStreams/CPUStreams).
-- Craig |
Hi,
for those who are interested, here is the problematic code from mfaktc 0.16:
tf_barrett92.cu line 989+
[CODE]
  ff= (float)f.d2;
  ff= ff * 4294967296.0f + (float)f.d1;	// f.d0 ignored because lower limit for this kernel are 64 bit which yields at least 32 significant digits without f.d0!
  ff=__int_as_float(0x3f7ffffb) / ff;	// just a little bit below 1.0f so we always underestimate the quotient
  tmp192.d4 = 0xFFFFFFFF;	// tmp is nearly 2^(80*2)
  tmp192.d3 = 0xFFFFFFFF;
  tmp192.d2 = 0xFFFFFFFF;
  tmp192.d1 = 0xFFFFFFFF;
  tmp192.d0 = 0xFFFFFFFF;
#ifndef CHECKS_MODBASECASE
  div_160_96(&u,tmp192,f,ff);	// u = floor(2^(80*2) / f)
#else
[/CODE]
tf_barrett92.cu line 529+:
[CODE]
  qf= (float)q.d4;
#endif
  qf*= 2097152.0f;
[/CODE]
where q.d4 is tmp192.d4 from above; both q.d* and tmp192.d* are declared as unsigned int.
This is the PTX code generated with the CUDA toolkit 3.2:
[CODE]
mov.f32 %f3, 0f4f800000; // 4.29497e+09
fma.rn.f32 %f4, %f3, %f2, %f1;
mov.f32 %f5, 0f3f7ffffb; // 1
div.rn.f32 %f6, %f5, %f4;
mov.f32 %f7, 0f5a000000; // 9.0072e+15
mul.f32 %f8, %f6, %f7;
[/CODE]
and here is the same with CUDA toolkit 3.1:
[CODE]
mov.f32 %f3, 0f4f800000; // 4.29497e+09
fma.rn.f32 %f4, %f3, %f2, %f1;
mov.f32 %f5, 0f3f7ffffb; // 1
div.rn.f32 %f6, %f5, %f4;
mov.f32 %f7, 0fca000000; // -2.09715e+06
mul.f32 %f8, %f6, %f7;
[/CODE]
So register %f7 should contain 4294967295 * 2097152.0f = 9007199252643840 ~= 9.0072e+15. That is what the toolkit 3.2 output loads; the 3.1 output loads -2.09715e+06 instead.
Oliver |
mfaktc 0.16p1
1 Attachment(s)
Hello,
please find attached mfaktc 0.16p1. Compared to 0.16 there is only one change: a fix (workaround) for the compiler bug reported in the CUDA toolkit 3.0/3.1.
There is a known bug in 0.16 / 0.16p1 (reported by James Heinrich): if the run was restarted, then the "estimated total time spent" reported at the end of the run can be wrong.
[CODE]tf(): time spent since restart: 2d 10h 46m 12.156s
      estimated total time spent: 0.000s[/CODE]
Oliver |
1 Attachment(s)
Windows 64bit executable
If it reports a missing cudart64_32_16.dll you can
- install the CUDA toolkit
- put this dll into the mfaktc directory (preferred solution)
[url]http://www.mersenneforum.org/showpost.php?p=255068&postcount=632[/url] |
1 Attachment(s)
Windows 32bit executable
If it reports a missing cudart32_32_16.dll you can
- install the CUDA toolkit
- put this dll into the mfaktc directory (preferred solution)
[url]http://www.mersenneforum.org/showpost.php?p=255327&postcount=658[/url] |
[QUOTE=TheJudger;256102]Hello,
find attached mfaktc 0.16p1. Oliver[/QUOTE]
I tried to compile for my new Gigabyte GTX560 TI OC and got this:
Ubuntu 10.10 64bit / Cuda Toolkit 3.2 / gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5) / NVIDIA-Linux-x86_64-270.26
~/mfaktc-0.16p1/src$ make
nvcc -I/usr/local/cuda/include/ --ptxas-options=-v --generate-code arch=compute_11,code=sm_11 --generate-code arch=compute_20,code=sm_20 --compiler-options=-Wall -c tf_72bit.cu -o tf_72bit.o
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
make: *** [tf_72bit.o] Error 1
Any ideas?
Edit: no more help needed, "sudo apt-get install g++" was the solution. |
1 Attachment(s)
wow
|
[QUOTE=moebius;256140]wow[/QUOTE]
What CPU? I'm getting 160+M/sec throughput on my 460GTX/core i7-930@2.8GHz. (using 2x instances with 100M exponents bit depth 2^66-67) -- Craig |
Phenom II 955 BE @ 3.2GHz, but my CPU frequency alternates between 800MHz and 3200MHz. Maybe it is a thermal problem, or it is caused by my insufficient power supply.
|