mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

S34960zz 2011-03-17 14:11

mfaktc and dual GPU (SLI Mode)
 
TheJudger,

Does mfaktc support multiple GPUs?

Looking at:
A. NVIDIA GeForce GTX 460 (2GB), SLI Mode (Dual Cards), each with cc=2.1, 7 multiprocessors, 336 CUDA cores.
B. NVIDIA GeForce GTX 580 (3GB), SLI Mode (Dual Cards), each with cc=2.0, 16 multiprocessors, 512 CUDA cores.

This is mostly a pipe dream, but wondered how mfaktc handles the situation.

TheJudger 2011-03-17 15:14

[QUOTE=S34960zz;255414]TheJudger,

Does mfaktc support multiple GPUs?

Looking at:
A. NVIDIA GeForce GTX 460 (2GB), SLI Mode (Dual Cards), each with cc=2.1, 7 multiprocessors, 336 CUDA cores.
B. NVIDIA GeForce GTX 580 (3GB), SLI Mode (Dual Cards), each with cc=2.0, 16 multiprocessors, 512 CUDA cores.

This is mostly a pipe dream, but wondered how mfaktc handles the situation.[/QUOTE]

Depends on what "multiple GPUs" means to you:
Single instance of mfaktc using multiple GPUs: no
Multiple instances of mfaktc on different GPUs: yes, of course!

Run [I]mfaktc.exe -h[/I] to see the help.
For each instance of mfaktc you can specify which GPU it should use.

Btw. GTX 580 is a beast, I think you'll need at least ~8GHz of CPU power (e.g. 3 cores running at 2.66GHz) to feed a single GTX 580.

Oliver

nucleon 2011-03-17 23:54

Even more CPU GHz are needed with the new version. Again, your mileage may vary depending on exponent and bit depth.

On my GTX580 & i7-2600k@4.5GHz with a 100M exponent at bit depth 70-71, I get 99% GPU usage with 2x instances with SievePrimes=5300. With bit depth 69-70 or lower, I get sub-99% GPU usage.

So that's 9GHz of Sandy Bridge-level CPU grunt thrown at it.

What's interesting for me is playing with the NumStreams/CPUStreams options.
[CODE]
[Sat Mar 12 09:42:30 2011]
M100002251 completed P-1, B1=1090000, B2=21527500, We4: 1A460633, AID: C2C6E70104B1EB9685D0382901771763
[Mon Mar 14 11:29:05 2011]
M100001219 completed P-1, B1=1090000, B2=21527500, We4: 1A430636, AID: B56D1F25E07FB380D91E753EB8415443
[Wed Mar 16 09:56:35 2011]
M100001569 completed P-1, B1=1090000, B2=21527500, We4: 1A6B063A, AID: B57295946863E577DEDF8A14DE9F496B
[/CODE]

The P-1 test was run on another core. It took longer with the options set to 10/5 than with 5/3 (NumStreams/CPUStreams).

-- Craig

TheJudger 2011-03-18 23:11

Hi,

For those who are interested, here is the problematic code from mfaktc 0.16:
tf_barrett92.cu line 989+
[CODE] ff= (float)f.d2;
ff= ff * 4294967296.0f + (float)f.d1; // f.d0 ignored because the lower limit for this kernel is 64 bit, which yields at least 32 significant digits without f.d0!

ff=__int_as_float(0x3f7ffffb) / ff; // just a little bit below 1.0f so we always underestimate the quotient

tmp192.d4 = 0xFFFFFFFF; // tmp is nearly 2^(80*2)
tmp192.d3 = 0xFFFFFFFF;
tmp192.d2 = 0xFFFFFFFF;
tmp192.d1 = 0xFFFFFFFF;
tmp192.d0 = 0xFFFFFFFF;

#ifndef CHECKS_MODBASECASE
div_160_96(&u,tmp192,f,ff); // u = floor(2^(80*2) / f)
#else
[/CODE]

tf_barrett92.cu line 529+:
[CODE] qf= (float)q.d4;
#endif
qf*= 2097152.0f;
[/CODE]

where q.d4 is tmp192.d4 from above; both q.d* and tmp192.d* are declared as unsigned int. This is the PTX code generated with CUDA toolkit 3.2:
[CODE] mov.f32 %f3, 0f4f800000; // 4.29497e+09
fma.rn.f32 %f4, %f3, %f2, %f1;
mov.f32 %f5, 0f3f7ffffb; // 1
div.rn.f32 %f6, %f5, %f4;
mov.f32 %f7, 0f5a000000; // 9.0072e+15
mul.f32 %f8, %f6, %f7;
[/CODE]
and here the same with CUDA toolkit 3.1:
[CODE] mov.f32 %f3, 0f4f800000; // 4.29497e+09
fma.rn.f32 %f4, %f3, %f2, %f1;
mov.f32 %f5, 0f3f7ffffb; // 1
div.rn.f32 %f6, %f5, %f4;
mov.f32 %f7, 0fca000000; // -2.09715e+06
mul.f32 %f8, %f6, %f7;
[/CODE]

So register %f7 should contain 4294967295 * 2097152.0f = 9007199252643840 ~= 9.0072e+15

Oliver

TheJudger 2011-03-19 15:21

mfaktc 0.16p1
 
1 Attachment(s)
Hello,

Find attached mfaktc 0.16p1. Compared to 0.16, there is only one change: a fix (workaround) for the compiler bug reported in CUDA toolkit 3.0/3.1.

There is a known bug in 0.16 / 0.16p1 (reported by James Heinrich): if the run was restarted, the "estimated total time spent" reported at the end of the run can be wrong.
[CODE]tf(): time spent since restart: 2d 10h 46m 12.156s
estimated total time spent: 0.000s[/CODE]

Oliver

TheJudger 2011-03-19 15:24

1 Attachment(s)
Windows 64bit executable
If it reports a missing cudart64_32_16.dll, you can either
- install the CUDA toolkit
- put this dll into the mfaktc directory (preferred solution) [url]http://www.mersenneforum.org/showpost.php?p=255068&postcount=632[/url]

TheJudger 2011-03-19 15:28

1 Attachment(s)
Windows 32bit executable
If it reports a missing cudart32_32_16.dll, you can either
- install the CUDA toolkit
- put this dll into the mfaktc directory (preferred solution) [url]http://www.mersenneforum.org/showpost.php?p=255327&postcount=658[/url]

moebius 2011-03-20 01:43

[QUOTE=TheJudger;256102]Hello,

find attached mfaktc 0.16p1.

Oliver[/QUOTE]

I tried to compile for my new Gigabyte GTX560 TI OC and got this:

Ubuntu 10.10 64bit / CUDA Toolkit 3.2 / gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5) / NVIDIA-Linux-x86_64-270.26


~/mfaktc-0.16p1/src$ make
nvcc -I/usr/local/cuda/include/ --ptxas-options=-v --generate-code arch=compute_11,code=sm_11 --generate-code arch=compute_20,code=sm_20 --compiler-options=-Wall -c tf_72bit.cu -o tf_72bit.o
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
make: *** [tf_72bit.o] Error 1


Any ideas?

Edit: no more help needed; "sudo apt-get install g++" was the solution.

moebius 2011-03-20 02:57

1 Attachment(s)
wow

nucleon 2011-03-20 03:21

[QUOTE=moebius;256140]wow[/QUOTE]

What CPU?

I'm getting 160+M/sec throughput on my GTX 460/Core i7-930@2.8GHz (using 2x instances with 100M exponents at bit depth 2^66-67).


-- Craig

moebius 2011-03-20 03:30

Phenom II 955 BE @ 3.2GHz, but my CPU frequency alternates between 800MHz and 3200MHz. Maybe it is a thermal problem, or it is caused by my insufficient power supply.

