mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

amphoria 2010-12-06 18:07

mfaktc for win32
 
Oliver has re-written calculate_k() to remove the cast from long double to unsigned long long. The win32 compile has now passed the full self-test and the executable is available here.

[url]http://www.sendspace.com/file/fmvw6m[/url]

Dave

Surge 2010-12-07 02:59

I've downloaded it, but I'm a noob so I don't know how I am supposed to use it.

Here is the output I got when I ran it:
[CODE]
mfaktc v0.13p1-Win
Compiletime options
THREADS_PER_GRID_MAX 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
VERBOSE_TIMING disabled
MORE_CLASSES enabled
Runtime options
SievePrimes 5000
SievePrimesAdjust 1
NumStreams 3
CPUStreams 3
WorkFile worktodo.txt
Checkpoints enabled
Stages enabled
StopAfterFactor bitlevel
CUDA device info
name Quadro FX 3700M
compute capability 1.1
maximum threads per block 512
number of multiprocessors 16 (128 shader cores)
clock rate 767MHz
CUDA version info
binary compiled for CUDA 3.10
CUDA driver version 3.20
CUDA runtime version 3.10
Automatic parameters
threads per grid 1048576
running a simple selftest...
Selftest statistics
number of tests 31
successfull tests 31
selftest PASSED!
ERROR: get_next_assignment(): can't open "worktodo.txt"
C:\Users\user1\Desktop\mfa>mfa
mfaktc v0.13p1-Win
Compiletime options
THREADS_PER_GRID_MAX 1048576
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
VERBOSE_TIMING disabled
MORE_CLASSES enabled
Runtime options
SievePrimes 5000
SievePrimesAdjust 1
NumStreams 3
CPUStreams 3
WorkFile worktodo.txt
Checkpoints enabled
Stages enabled
StopAfterFactor bitlevel
CUDA device info
name Quadro FX 3700M
compute capability 1.1
maximum threads per block 512
number of multiprocessors 16 (128 shader cores)
clock rate 767MHz
CUDA version info
binary compiled for CUDA 3.10
CUDA driver version 3.20
CUDA runtime version 3.10
Automatic parameters
threads per grid 1048576
running a simple selftest...
Selftest statistics
number of tests 31
successfull tests 31
selftest PASSED!
ERROR: get_next_assignment(): Cannot read file or file empty
[/CODE]

What do I put in the worktodo file?


EDIT:

This is what I have in my prime95 worktodo file:
[CODE]
[Worker #1]
Test=F244F2A93332245B011BD2B189F7E9D9,45321743,70,1
[/CODE]

So I put the following in the worktodo file for mfaktc:
[CODE]
Factor=F244F2A93332245B011BD2B189F7E9D9,45321743,1,70
[/CODE]

Now it seems to be doing something.

So is mfaktc now working on the same thing that prime95 was working on and can I now use mfaktc instead of prime95?


Is the correct way to pause it just pressing the pause-break button?
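
A minimal sketch of how the two worktodo entries quoted above differ. It assumes the usual field order: Prime95 `Test=<assignment id>,<exponent>,<bits already trial factored>,<P-1 done flag>` and mfaktc `Factor=<assignment id>,<exponent>,<bit_min>,<bit_max>`. The helper names `parse_worktodo_line` and `describe` are hypothetical and belong to neither program.

```python
def parse_worktodo_line(line):
    """Split 'Key=f1,f2,...' into (key, [fields]). Hypothetical helper."""
    key, sep, rest = line.strip().partition("=")
    if not sep:
        raise ValueError("not a Key=value worktodo line: %r" % line)
    return key, rest.split(",")


def describe(line):
    """Interpret a 4-field Test= or Factor= line (assumed field order)."""
    key, fields = parse_worktodo_line(line)
    if key == "Test" and len(fields) == 4:
        _aid, exponent, tf_bits, pm1_done = fields
        return {
            "work": "LL test",
            "exponent": int(exponent),
            "already_factored_to_bits": int(tf_bits),
            "pm1_done": pm1_done == "1",
        }
    if key == "Factor" and len(fields) == 4:
        _aid, exponent, bit_min, bit_max = fields
        return {
            "work": "trial factoring",
            "exponent": int(exponent),
            "bit_min": int(bit_min),
            "bit_max": int(bit_max),
        }
    raise ValueError("unsupported worktodo line: %r" % line)


prime95_entry = describe("Test=F244F2A93332245B011BD2B189F7E9D9,45321743,70,1")
mfaktc_entry = describe("Factor=F244F2A93332245B011BD2B189F7E9D9,45321743,1,70")
print(prime95_entry)
print(mfaktc_entry)
```

If that reading of the fields is right, the Prime95 entry is a Lucas-Lehmer test of an exponent already trial factored to 70 bits, while the mfaktc entry asks for trial factoring from 2^1 up to 2^70. The two are different kinds of work on the same exponent.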

vsuite 2010-12-07 03:42

How to set up please
 
Windows Vista
Core 2 Duo Notebook
Nvidia 8400M GS
2GB Ram

Downloaded windows32 mfaktc
extracted all 3 files to directory
upon execution: CudaSetDevice(0) failed

Same result if I put the files in the prime95 directory

Or after installing NVIDIA CUDA toolkit

How to set up please?

TheJudger 2010-12-07 10:10

Hello,

thank you for the Win32 binary, Dave! :smile:
For those who are interested, I've attached the modified code. If you're already running mfaktc 0.13 there is no reason to upgrade to 0.13p1. This change will be in future releases, too. The fix was easy, so I decided to provide a patched version of 0.13 rather than wait for 0.14, which might take some more time. (Yes, 0.14 is currently in the works and is a few percent faster in some cases. :smile: But it will take some weeks until I've finished and tested it.)


[QUOTE=Surge;240440]
So is mfaktc now working on the same thing that prime95 was working on and can I now use mfaktc instead of prime95?


Is the correct way to pause it just pressing the pause-break button?[/QUOTE]

Yep, now you're running TF on your GPU. Keep in mind that you need to report your results manually (via the manual result check-in on [url]www.mersenne.org[/url]; be sure to log in with your user account before submitting results). You can get TF work manually there, too.

About pause... I don't know, but if it pauses when you hit the break button it should be OK. You can try to run the full selftest (mfaktc.exe -st) and press break again and again for a safety check...



[QUOTE=vsuite;240445]Windows Vista
Core 2 Duo Notebook
Nvidia 8400M GS
2GB Ram

Downloaded windows32 mfaktc
extracted all 3 files to directory
upon execution: CudaSetDevice(0) failed

Same result if I put the files in the prime95 directory

Or after installing NVIDIA CUDA toolkit

How to set up please?[/QUOTE]

I think you don't need to install the CUDA toolkit, but you'll need a proper driver. Perhaps your driver doesn't support CUDA?

As a side note: I'm not sure it is worth running mfaktc on an 8400M GS.
"One core of your CPU + your 8400M GS + mfaktc" should give about the same TF speed as "one core of your CPU + prime95".

Oliver

ATH 2010-12-07 21:42

Can anyone write a guide on how to compile mfaktc for 64-bit Windows? I have some experience compiling GMP and GMP-ECM with Mingw/Msys (32-bit) and a little experience doing it with Mingw64.

amphoria 2010-12-07 22:31

[QUOTE=ATH;240565]Can anyone write a guide on how to compile mfaktc for 64-bit Windows? I have some experience compiling GMP and GMP-ECM with Mingw/Msys (32-bit) and a little experience doing it with Mingw64.[/QUOTE]

If you download the source code, you'll find some text I added to the README.txt file describing how I compile for win-64. Ignore the comment about performance issues on newer drivers, as this no longer applies. I guess I ought to send some amended text to Oliver :smile:

moebius 2010-12-07 22:58

[QUOTE=vsuite;240445]
Nvidia 8400M GS
2GB Ram
[/QUOTE]

MSI NX8600 GT 256 MB DDR3 RAM
2 GB RAM

Same here: I get the same error, although mfaktc 0.02 runs without problems.

CudaSetDevice(0) failed

ATH 2010-12-08 00:16

I couldn't compile it with Mingw64, but I succeeded with Visual Studio, and it passes all selftests.

I tried to remove the line:
[CODE]else if(!isprime(exp)) printf("WARNING: exponent is not prime! Ignoring this assignment!\n");[/CODE]

but it crashes when I try to factor a composite exponent, specifically M(p[sup]2[/sup]).
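
One possible reason for the primality check (an assumption, not something confirmed in this thread, and not a diagnosis of the crash): for a prime exponent p, every prime factor q of 2^p - 1 has the form q = 2kp + 1, so a trial-factoring program only needs to test candidates of that form. For a composite exponent this no longer holds for all factors. The sketch below checks this on two small cases using plain trial division. The function names are hypothetical and unrelated to mfaktc's source.

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def has_form_2kn_plus_1(q, n):
    """True if q = 2*k*n + 1 for some non-negative integer k."""
    return (q - 1) % (2 * n) == 0


results = {}
for n in (11, 9):  # 11 is prime; 9 = 3^2 is composite
    m = 2 ** n - 1
    results[n] = [(q, has_form_2kn_plus_1(q, n)) for q in prime_factors(m)]
    print(n, m, results[n])
# prints: 11 2047 [(23, True), (89, True)]
# prints: 9 511 [(7, False), (73, True)]
```

For the composite exponent 9, the factor 7 is not of the form 2k*9 + 1, so a search restricted to that form would miss it.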

vsuite 2010-12-08 04:02

[QUOTE]I think you don't need to install the CUDA toolkit, but you'll need a proper driver. Perhaps your driver doesn't support CUDA?

As a side note: I'm not sure it is worth running mfaktc on an 8400M GS.
"One core of your CPU + your 8400M GS + mfaktc" should give about the same TF speed as "one core of your CPU + prime95".

Oliver[/QUOTE]
When I tested with GPU Caps Viewer, it reported CUDA as supported, and some OpenCL and other graphics demos ran, so I thought everything was ready for CUDA. It turns out I needed to install the CUDA-capable driver (the normal graphics driver; my original drivers were graphics-only, with no compute or PhysX). Then voilà, it worked.

I have 8400 GSs on two Core-2 Quad (Q6600 - 2.4GHz) Windows32 desktops (XP & 7), and a 8400M GS on a 2.4GHz Core-2 Duo notebook.

I tested the Windows 7 desktop and the notebook. Strangely, the notebook is slightly faster on Prime95 than the desktop (1 thread, same clock speed), but the desktop is faster on mfaktc than the notebook (due presumably to the Shader speed differential).

RivaTuner reports the desktop speeds to be: Core 459, Shader 918 (double), Memory 333.

I bumped the Core and Shader up to 690/1380 (max, range 230-690/460-1380), and it ran mfaktc faster but then rebooted. When I decoupled the Core and Shader clocks, pushing the Core only to 480, and pushing the Shader to 1380, mfaktc ran seemingly proportionally faster. Memory and Core speed did not seem to affect mfaktc speed.

I tested with a small assignment (1096267, 57, 58), and mfaktc was slower than prime95. On the notebook, prime95 initially took 16 minutes with CPU performance limited to 50% for battery life, while mfaktc took almost 18 minutes. When CPU performance was allowed to reach 100%, the prime95 time dropped to around 8 minutes, but mfaktc still took over 17 minutes.

On the desktop, prime95 was almost 1 minute slower at around 9 minutes, and mfaktc took 15 minutes 8 seconds. After overclocking the GPU, mfaktc took 10 minutes 6 seconds.

So faster GPUs are definitely needed.
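
The "proportionally faster" observation can be checked with a little arithmetic on the desktop figures above. This sketch assumes the 10 minute 6 second run was made with the shader clock at 1380 (the post does not say so explicitly) and the 15 minute 8 second run at the stock shader clock of 918.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes/seconds pair to seconds."""
    return 60 * minutes + seconds


stock_time = to_seconds(15, 8)   # mfaktc on the desktop, stock clocks
oc_time = to_seconds(10, 6)      # mfaktc on the desktop, overclocked GPU

speedup = stock_time / oc_time   # how much faster the overclocked run was
clock_ratio = 1380 / 918         # assumed shader clocks: overclocked / stock

print(round(speedup, 3), round(clock_ratio, 3))
# prints: 1.498 1.503
```

Under that assumption the runtime improvement (about 1.50x) matches the shader clock increase (about 1.50x) closely. That fits the remark that core and memory clocks made little difference.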



Oliver, Dave

Great work. Since mfaktc is GPU-bound rather than CPU-bound, does the CPU spend a lot of time actively waiting on slower GPUs? If so, could mfaktc perform some of the TF on the CPU when a slower GPU is being used, to make better use of the wait cycles? Or could the wait be made passive rather than active, to reduce the CPU load from the spin wait?

vsuite 2010-12-08 04:45

Is mfaktc faster on a 64-bit OS than a 32-bit OS and if so, why? Does CUDA operate faster?

Surge 2010-12-08 05:07

[QUOTE=TheJudger;240474]
Yep, now you're running TF on your GPU. Keep in mind that you need to report your results manually (via the manual result check-in on [URL="http://www.mersenne.org/"]www.mersenne.org[/URL]; be sure to log in with your user account before submitting results). You can get TF work manually there, too.

About pause... I don't know, but if it pauses when you hit the break button it should be OK. You can try to run the full selftest (mfaktc.exe -st) and press break again and again for a safety check...
[/QUOTE]

Thanks, the full selftest passed.

Can someone tell me what to put in my worktodo file for prime95 and for mfaktc so I can compare the speeds?

One more thing, is there a way to get mfaktc to use more than 50% of my CPU?

Here are my specs:
SP9400 (mobile C2D @ 2.4 GHz) + FX 3700M (550 core, 799 mem, 1375 shader).


All times are UTC.