mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

James Heinrich 2011-11-28 15:33

[code] class | candidates | time | avg. rate | SievePrimes | ETA | CPU wait
2367/4620 | 9.99G | 326.41s | 30.62M/s | 200000 | 1d18h | 10.31%[/code]Is the "candidates" column capped at <10G, or just by chance that it's been sitting at that suspicious number for the duration of my current assignment?
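
For what it's worth, the columns in that status line can be checked against each other with a few lines of Python. This is only a sketch: the figure of 960 tested classes out of 4620 is an assumption about how mfaktc groups candidates (it is not stated in this thread), and `eta_string` is a made-up helper, not anything from mfaktc.

```python
# Sanity check of the status line above (values copied from the post).
candidates = 9.99e9      # "candidates" column
seconds = 326.41         # "time" column
current_class = 2367     # "class" column, out of 4620

rate = candidates / seconds          # candidates per second
print(f"rate = {rate / 1e6:.2f}M/s")

# Assumption: only 960 of the 4620 classes are actually tested.
TOTAL_CLASSES = 4620
TESTED_CLASSES = 960
remaining = (TOTAL_CLASSES - current_class) / TOTAL_CLASSES * TESTED_CLASSES

def eta_string(total_seconds):
    """Format seconds as 'XdYh' (hypothetical helper)."""
    days, rest = divmod(int(total_seconds), 86400)
    return f"{days}d{rest // 3600}h"

print(eta_string(remaining * seconds))
```

The rate comes out within rounding of the displayed 30.62M/s, and the estimate lands on the displayed 1d18h, so the line is at least internally consistent. That alone does not prove the column has no cap.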

TheJudger 2011-11-28 17:53

Hi James,

I guess just some luck with your assignment.

Oliver

Dubslow 2011-11-28 19:22

Geez, what is that, 74 bits? I'm doing 50M from 69-72 and have 2.24G classes...

James Heinrich 2011-11-28 20:02

[QUOTE=Dubslow;280249]Geez, what is that, 74 bits? I'm doing 50M from 69-72 and have 2.24G classes...[/QUOTE][code]got assignment: exp=595900037 bit_min=78 bit_max=79
Starting trial factoring M595900037 from 2^78 to 2^79
k_min = 253592411590980
k_max = 507184823188150
Using GPU kernel "barrett79_mul32"[/code]
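
The k_min and k_max lines follow from the fact that any factor of M(p) has the form q = 2kp + 1, so the bit range fixes the range of k. A rough sketch in Python; rounding k_min down to a multiple of 4620 is inferred from the logged values rather than taken from mfaktc's source, and `k_range` is a hypothetical name:

```python
def k_range(exponent, bit_min, bit_max, num_classes=4620):
    """Return (k_min, k_max) for candidates q = 2*k*exponent + 1
    lying roughly between 2^bit_min and 2^bit_max.

    Assumption: k_min is rounded down to a multiple of num_classes,
    which matches the log above but is not confirmed by the source.
    """
    k_min = (1 << bit_min) // (2 * exponent)
    k_min -= k_min % num_classes
    k_max = (1 << bit_max) // (2 * exponent)
    return k_min, k_max

print(k_range(595900037, 78, 79))
```

For the assignment above this reproduces the two values in the log.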

KyleAskine 2011-11-28 22:03

[QUOTE=James Heinrich;280209][code] class | candidates | time | avg. rate | SievePrimes | ETA | CPU wait
2367/4620 | 9.99G | 326.41s | 30.62M/s | 200000 | 1d18h | 10.31%[/code]Is the "candidates" column capped at <10G, or just by chance that it's been sitting at that suspicious number for the duration of my current assignment?[/QUOTE]

You could force sieve primes down just a hair and see if the candidates goes up :smile:

Dubslow 2011-11-29 05:02

[QUOTE=kladner;279843]
cmd.exe /k "start /b /low /affinity 0x20 mfaktc-win-64.exe"

cmd.exe /k "start /b /low /affinity 0x10 mfaktc-win-64.exe"

cmd.exe /k "start /b /low /affinity 0x08 mfaktc-win-64.exe"
[/QUOTE]
Hey kladner,
First off, I heard Chicago's already gotten snow. True? (If so, not fair!)
Now: You said the /k makes sure each prompt gets its own window, and /b pauses the prompt if mfaktc stops running?

If I'm only running one instance, I can drop the /k, right?

And is cmd.exe recognized even if I'm running this command from the mfaktc folder (or in my case shortcut in the mfaktc folder)?

LaurV 2011-11-29 06:47

You have them a bit backwards. Each line contains two commands, each of which opens its own window by default. Without the switches you could end up with two windows per command line, six in total for your example, and they would close and disappear when the innermost task finishes. Use the help:

[CODE]
c:\>help cmd
Starts a [B][COLOR=Red]new instance[/COLOR][/B] of the Windows XP command interpreter

CMD [/A | /U] [/Q] [/D] [/E:ON | /E:OFF] [/F:ON | /F:OFF] [/V:ON | /V:OFF]
[[/S] [/C | /K] string]

/C Carries out the command specified by string and then terminates
/K [COLOR=Red]Carries out the command specified by string[B] but remains[/B][/COLOR]
/S Modifies the treatment of string after /C or /K (see below)
/Q Turns echo off
/D Disable execution of AutoRun commands from registry (see below)
/A Causes the output of internal commands to a pipe or file to be ANSI
/U Causes the output of internal commands to a pipe or file to be
Unicode
/T:fg Sets the foreground/background colors (see COLOR /? for more info)
/E:ON Enable command extensions (see below)
/E:OFF Disable command extensions (see below)
/F:ON Enable file and directory name completion characters (see below)
/F:OFF Disable file and directory name completion characters (see below)
/V:ON Enable delayed environment variable expansion using ! as the
delimiter. For example, /V:ON would allow !var! to expand the
variable var at execution time. The var syntax expands variables
at input time, which is quite a different thing when inside of a FOR
loop.
/V:OFF Disable delayed environment expansion.

Note that multiple commands separated by the command separator '&&'
are accepted for string if surrounded by quotes. Also, for compatibility
reasons, /X is the same as /E:ON, /Y is the same as /E:OFF and /R is the
same as /C. Any other switches are ignored.

[B][COLOR=Red]If /C or /K is specified, then the remainder of the command line after
the switch is processed as a command line[/COLOR][/B], where the following logic is
used to process quote (") characters:

1. If all of the following conditions are met, then quote characters
on the command line are preserved:

- no /S switch
- exactly two quote characters
- no special characters between the two quote characters,
where special is one of: &<>()@^|
- there are one or more whitespace characters between the
the two quote characters
- the string between the two quote characters is the name
of an executable file.

2. Otherwise, old behavior is to see if the first character is
a quote character and if so, strip the leading character and
remove the last quote character on the command line, preserving
any text after the last quote character.

If /D was NOT specified on the command line, then when CMD.EXE starts, it
Press any key to continue . . .[/CODE][CODE]
c:\>help start
Starts a [B][COLOR=Red]separate[/COLOR][/B] window to run a specified program or command.

START ["title"] [/Dpath] [/I] [/MIN] [/MAX] [/SEPARATE | /SHARED]
[/LOW | /NORMAL | /HIGH | /REALTIME | /ABOVENORMAL | /BELOWNORMAL]
[/WAIT] [/B] [command/program]
[parameters]

"title" Title to display in window title bar.
path Starting directory
B Start application [B][COLOR=Red]without creating a new window[/COLOR][/B]. The
application has ^C handling ignored. Unless the application
enables ^C processing, ^Break is the only way to interrupt
the application
I The new environment will be the original environment passed
to the cmd.exe and not the current environment.
MIN Start window minimized
MAX Start window maximized
SEPARATE Start 16-bit Windows program in separate memory space
SHARED Start 16-bit Windows program in shared memory space
LOW Start application in the IDLE priority class
NORMAL Start application in the NORMAL priority class
HIGH Start application in the HIGH priority class
REALTIME Start application in the REALTIME priority class
ABOVENORMAL Start application in the ABOVENORMAL priority class
BELOWNORMAL Start application in the BELOWNORMAL priority class
WAIT Start application and wait for it to terminate
command/program
If it is an internal cmd command or a batch file then
the command processor is run with the /K switch to cmd.exe.
This means that the window will remain after the command
has been run.

If it is not an internal cmd command or batch file then
it is a program and [B][COLOR=Red]will run as either a windowed application
or a console application.
[/COLOR] [/B]
parameters These are the parameters passed to the command/program


If Command Extensions are enabled, external command invocation
through the command line or the START command changes as follows:

non-executable files may be invoked through their file association just
by typing the name of the file as a command. (e.g. WORD.DOC would
launch the application associated with the .DOC file extension).
See the ASSOC and FTYPE commands for how to create these
associations from within a command script.

When executing an application that is a 32-bit GUI application, CMD.EXE
does not wait for the application to terminate before returning to
the command prompt. This new behavior does NOT occur if executing
within a command script.
Press any key to continue . . .
[/CODE]cmd.exe is the Windows command prompt. On a standard installation (that is, if you have not changed the PATH variable) it is visible from anywhere. Click "start/command prompt", or "start/run" and type "cmd", then play with the "help" command; it is quite easy to understand. My objection to this approach (as I said before) is that you end up running two console applications for each line, although they share one window, so you cannot see it. You can use either the "start" command or "cmd.exe"; using both is redundant. But it costs only 64k of memory or so for each, so it does not matter if you feel comfortable with it and want to specify affinities and priorities in an "easy to understand" format.

kladner 2011-11-29 16:06

[QUOTE=Dubslow;280351]Hey kladner,
First off, I heard Chicago's already gotten snow. True? (If so, not fair!)
Now: You said the /k makes sure each prompt gets its own window, and /b pauses the prompt if mfaktc stops running?

If I'm only running one instance, I can drop the /k, right?

And is cmd.exe recognized even if I'm running this command from the mfaktc folder (or in my case shortcut in the mfaktc folder)?[/QUOTE]

I think /k makes the window persist, and /b prevents more than one window per instance (visibly, at least).

I use that string to run three instances most of the time. But each is launched independently. That is, I don't think there's any difference between running one, or more than one.

Without the /k I think the window will close on termination.
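
As a side note on the /affinity values in those commands: the argument is a hexadecimal bitmask in which each set bit selects one logical CPU, counted from zero. A small Python sketch to decode a mask (`cores_in_mask` is just an illustrative name):

```python
def cores_in_mask(mask):
    """List the zero-based logical CPU indices selected by an affinity bitmask."""
    return [bit for bit in range(mask.bit_length()) if mask >> bit & 1]

for mask in (0x20, 0x10, 0x08):
    print(hex(mask), cores_in_mask(mask))
```

So the three commands pin the three instances to CPUs 5, 4 and 3 respectively.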

kladner 2011-11-30 17:36

Batch file to collect results
 
[QUOTE=LaurV;279922]<SNIP> The disadvantage of my method is that the results files will be in the subfolders. This can not be (yet) customized in the ini files. But for that I made a batch file to collect all the result.txt from subfolders, so I don't need to walk on each subfolder and look for them.[/QUOTE]

I put this together to copy the contents of 3 different results.txt files to a single file. It then effectively removes the contents by copying an empty results.txt to all three sub-directories, with no prompt on the overwrite.
[CODE]e:
cd \mfaktc_32-64
copy "mfaktc-0.17_32-64\results.txt"+"mfaktc-0.17_32-64_b\results.txt"+"mfaktc-0.17_32-64_c\results.txt" "Results-kladner.txt"
copy results.txt "mfaktc-0.17_32-64" /y
copy results.txt "mfaktc-0.17_32-64_b" /y
copy results.txt "mfaktc-0.17_32-64_c" /y[/CODE]This is probably not the most elegant solution, but it does work.
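
For anyone not on Windows, roughly the same thing can be sketched in Python. The file and directory names are the ones from the batch file above; `collect_results` is a hypothetical helper. Two deliberate differences: it appends to the combined file instead of overwriting it, and it empties each results.txt in place instead of copying an empty file over it.

```python
from pathlib import Path

def collect_results(base, subdirs, combined_name="Results-kladner.txt"):
    """Append every subdir's results.txt to one file, then empty the originals."""
    base = Path(base)
    with open(base / combined_name, "a") as combined:
        for sub in subdirs:
            results = base / sub / "results.txt"
            if results.exists():          # skip instances with no results yet
                combined.write(results.read_text())
                results.write_text("")    # truncate after collecting

# Example call, using the directory layout from the batch file:
# collect_results(r"e:\mfaktc_32-64",
#                 ["mfaktc-0.17_32-64", "mfaktc-0.17_32-64_b", "mfaktc-0.17_32-64_c"])
```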

TheJudger 2011-12-03 00:34

[QUOTE=TheJudger;279285]
So for the mfaktc 0.18 release[LIST][*]I want to rework the barrett92 kernel (CUDA 4.1 optimizations)[*]I want to wait for official CUDA 4.1 release[*]ask Eric which of his new code should be included[/LIST][/QUOTE]

OK, OBD users will like the CUDA 4.1 optimized barrett92 kernel. :smile:
Preliminary data from my stock GTX 470 for M3321932839 from 2[SUP]79[/SUP] to 2[SUP]80[/SUP] - [B]raw GPU speed[/B]:
[CODE] | CUDA 3.2 | CUDA 4.0 | CUDA 4.1-RC1
mfaktc 0.17 | 177.59M/s | 185.38M/s | 181.21M/s
mfaktc 0.18-pre8 | 177.91M/s | 185.65M/s | 181.48M/s
mfaktc 0.18-pre10 | 183.77M/s | 191.96M/s | 211.03M/s[/CODE]

Up to mfaktc 0.18-pre8 there are only minimal changes in the GPU code compared to mfaktc 0.17 (e.g. barrett92 now has a minimum factor size of 2[SUP]79[/SUP]). 0.18-pre9 contains CUDA 4.1 specific optimizations for the barrett79 kernel; 0.18-pre10 contains CUDA 4.1 specific optimizations and a rework of the squaring function for the barrett92 kernel. I guess 3-4% improvement for CC 1.x GPUs, too.
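
To put the table into percentages, here is a quick Python calculation on the 0.17 and 0.18-pre10 rows (numbers copied from the table; `gain_percent` is just an illustrative helper):

```python
# Raw GPU speeds in M/s from the table above (GTX 470, barrett92 kernel).
speeds = {
    "CUDA 3.2":     {"0.17": 177.59, "0.18-pre10": 183.77},
    "CUDA 4.0":     {"0.17": 185.38, "0.18-pre10": 191.96},
    "CUDA 4.1-RC1": {"0.17": 181.21, "0.18-pre10": 211.03},
}

def gain_percent(old, new):
    """Relative speedup of new over old, in percent."""
    return (new / old - 1.0) * 100.0

for toolkit, row in speeds.items():
    print(f"{toolkit}: {gain_percent(row['0.17'], row['0.18-pre10']):+.1f}%")
```

That works out to roughly 3.5% under CUDA 3.2 and 4.0, and a bit over 16% under CUDA 4.1-RC1.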


[QUOTE=TheJudger;279979][QUOTE=James Heinrich;279886]:surprised
No perhaps, please(!) add in some code to limit the frequency of writing checkpoint files![/QUOTE]

OK, added to my todo-list.[/QUOTE]

Removed from my todo-list and added to changelog.


Oliver

TheJudger 2011-12-03 13:10

[QUOTE=TheJudger;280853]
...
and a rework of the squaring function for the barrett92 kernel. I guess 3-4% improvement for CC 1.x GPUs, too.[/QUOTE]
Tests on my GTX 275 showed that my initial guess was wrong, only 1% improvement for barrett92 on CC 1.x. :sad:
GTX 275, M3321932839 from 2[SUP]79[/SUP] to 2[SUP]80[/SUP] - [B]raw GPU speed[/B]:
[CODE]mfaktc 0.17 45.96M/s
mfaktc 0.18-pre10 46.40M/s
[/CODE]
No matter if using CUDA 3.2, 4.0 or 4.1-RC1.

Oliver

ET_ 2011-12-03 17:53

[QUOTE=TheJudger;280899]Tests on my GTX 275 showed that my initial guess was wrong, only 1% improvement for barrett92 on CC 1.x. :sad:
GTX 275, M3321932839 from 2[SUP]79[/SUP] to 2[SUP]80[/SUP] - [B]raw GPU speed[/B]:
[CODE]mfaktc 0.17 45.96M/s
mfaktc 0.18-pre10 46.40M/s
[/CODE]
No matter if using CUDA 3.2, 4.0 or 4.1-RC1.

Oliver[/QUOTE]

Every single bit matters... :smile:

Luigi

Dubslow 2011-12-04 23:20

May I also request that the Windows and Linux versions be able to use each others' save files?

Edit: Yeah, I still don't get what's happening here. Remember when I reported that for some reason in Windows my GTX 460 would randomly lose half its throughput? And after a restart it would get it back. Now in Linux it's only getting half the throughput rate, like in Windows, except it [u]starts[/u] at half rather than randomly dropping to a half. To be fair, I'm using fairly old drivers, but updating those is what caused me to lose my GUI in the first place a couple of months ago. Will try again though, and hopefully it won't fail this time.

Edit2: Also does anybody know how to monitor GPU load in Linux?

Edit3: I can confirm that the nVidia .run file to update drivers does not work for me. I have to run [code]sudo apt-get install --reinstall nvidia-current[/code] to fix mah gui. Note: When I tried running the .run, it reported my drivers as 285.x, whereas nvidia-settings reports driver version 270.41.06.

Dubslow 2011-12-05 07:24

And now it's running at 80% throughput, not 50%. That's the first time it's ever done that and I have no idea why, and I can't think of anything that's different.

Now it's down to 50% again?!?!? All I did was stop and restart MPrime?!?!?

Wait a minute. MPrime affinities seems to play a role in it, despite the fact that the affinities shouldn't play a role... this is so confusing. See [URL="http://mersenneforum.org/showthread.php?t=16289"]here[/URL]. I'll have to come back to this tomorrow.

TheJudger 2011-12-05 11:23

[QUOTE=Dubslow;281006]May I also request that the Windows and Linux versions be able to use each others' save files?
[/QUOTE]

Changelog for mfaktc 0.18:[CODE]
version 0.18-pre7 (2011-10-18)
...
- mfaktc no longer refuses to load a checkpoint file from a Linux version
with a Windows version of mfaktc and vice versa. Of course mfaktc still
refuses to load checkpoint files from other versions than itself
(identical version string!)[/CODE]

All you have to do is wait for mfaktc 0.18 (which depends mainly on the public release of CUDA 4.1)

Oliver

James Heinrich 2011-12-05 17:26

I've thrown together a rough chart of CUDA GPU performance comparison:
[url]http://mersenne-aries.sili.net/mfaktc.php[/url]

It is not yet properly calibrated. It currently translates GFLOPS (from Wikipedia) into GHz-days/day based on timing of a single test on my 8800GT. It does not (yet) take into account performance differences of different mfaktc cores etc. But I need some more data to fine-tune it: Please send me some timing info for a [i]single instance[/i] of mfaktc, including assignment (exponent, from/to bits), GPU model, time to complete the assignment, and GPU usage for that single instance.

kladner 2011-12-05 17:37

[QUOTE=James Heinrich;281105].....Please send me some timing info for a [I]single instance[/I] of mfaktc, including assignment (exponent, from/to bits), GPU model, time to complete the assignment, and GPU usage for that single instance.[/QUOTE]

I'm setting up such a run on a GTX 460. I'll launch it as soon as the current assignment finishes in 7 minutes.

chalsall 2011-12-05 17:46

[QUOTE=James Heinrich;281105]I've thrown together a rough chart of CUDA GPU performance comparison: [url]http://mersenne-aries.sili.net/mfaktc.php[/url][/QUOTE]

Sweet! :smile:

kladner 2011-12-05 19:23

Timing run results
 
To James Heinrich:

[CODE]Phenom II 1090T, 3.5GHz, core locked to mfaktc
Gigabyte GTX 460 (factory OC 715MHz, tested at chipset stock 675)
Prime95-64 on 4 other cores

mfaktc.ini settings:
NumStreams=3
CPUStreams=3
GridSize=3
AllowSleep=1

Factor=N/A,54168349,70,71

k_min = 10897430345340
k_max = 21794860698401

candidates=590.87M
time=4.449-5.176 (mostly upper mid 4.4xx)
rate=~131M/s
SievePrimes=5000 (occasional brief upticks)
avg. wait=~185 (with occasional swings as high as 1300. Most in the upper 180s)

Usage ~78% (narrow fluctuation, occasional dips to 67%)
vRAM usage 204-158=46MB
no factor for M54168349 from 2^70 to 2^71 [mfaktc 0.17-Win barrett79_mul32]
tf(): total time spent: 1h 13m 31.064s[/CODE]

James Heinrich 2011-12-05 19:50

[QUOTE=kladner;281111]Gigabyte GTX 460 (factory OC 715MHz, tested at chipset stock 675)
Factor=N/A,54168349,70,71
Usage ~78% (narrow fluctuation, occasional dips to 67%)
tf(): total time spent: 1h 13m 31.064s[/QUOTE]Perfect, thanks. The above is the 4 pieces of info I need.

This is exactly why I need more data points: :smile:
8800GT: 13.90 GFLOPS per GHz-day/day
GTX 460: 8.18 GFLOPS per GHz-day/day


[b]edit:[/b] Hmm, [i]kladner[/i] -- which GPU is your GTX 460 using? GF104 or GF114? (if you're not sure, something like [url=http://www.techpowerup.com/downloads/SysInfo/GPU-Z/]GPU-Z[/url] will tell you).

kladner 2011-12-05 20:17

[QUOTE=James Heinrich;281113]Perfect, thanks. ...........
[B]
edit:[/B] Hmm, [I]kladner[/I] -- which GPU is your GTX 460 using? GF104 or GF114? (if you're not sure, something like [URL="http://www.techpowerup.com/downloads/SysInfo/GPU-Z/"]GPU-Z[/URL] will tell you).[/QUOTE]

This card is a Rev. 1, so it's a GF104 (confirmed by GPU-Z).

You're welcome! I'm glad to contribute to the effort. Do you want numbers for OC?

James Heinrich 2011-12-05 20:25

OC numbers aren't necessary, but if you feel like it, sure.
But I'm most interested in results from other GPUs, both older and newer.

KyleAskine 2011-12-05 20:38

[QUOTE=James Heinrich;281118]OC numbers aren't necessary, but if you feel like it, sure.
But I'm most interested in results from other GPUs, both older and newer.[/QUOTE]

Want mfakto numbers too? Or just nVidia?

James Heinrich 2011-12-05 20:46

Right now my data only has NVIDIA cards in there, but send me mfakto numbers and I'll wrangle some AMD data too.

kladner 2011-12-05 21:18

[QUOTE=kladner;281117]This card is a Rev. 1, so it's a GF104 (confirmed by GPU-Z).[/QUOTE]

I should have stated that this is a 1GB of RAM card.

Jaxon 2011-12-05 21:40

For James, I got these results last night:
[CODE]
NVIDIA Geforce GTX 260 Core 216 running at default 576MHz
GPU GT200, Revision B1

Usage ~93%
Factor=N/A,52937323,70,72
no factor for M52937323 from 2^70 to 2^71 [mfaktc 0.17-Win 71bit_mul24]
tf(): total time spent: 2h 7m 47.851s
no factor for M52937323 from 2^71 to 2^72 [mfaktc 0.17-Win barrett79_mul32]
tf(): total time spent: 4h 25m 40.860s
[/CODE]

Dubslow 2011-12-05 21:40

Speaking of OC's and overclocking...
My two previous posts, complaining about 50% and 80% throughput I can now partially explain. My GTX 460 default core clock is 751 MHz (so I guess it is factory OC'ed, based on what kladner said) and in Windows I run it at 850 MHz. (I don't know of any Linux OC utility). So that explains why it's running at '80%' compared to what I was getting in Windows. I still can't explain entirely why it goes down to 50% Windows throughput, but I have figured out it has something to do with MPrime affinities (even though they should be independent of each other).

@James: With SievePrimes=5000, I get 180+/-2 M/s avg. rate at 850 MHz, and 160+/-2 M/s avg. rate at 751 MHz. I can't verify via GPUZ ATM because Linux.

James Heinrich 2011-12-05 21:45

[QUOTE=Dubslow;281131]@James: With SievePrimes=5000, I get 180+/-2 M/s avg. rate at 850 MHz, and 160+/-2 M/s avg. rate at 751 MHz. I can't verify via GPUZ ATM because Linux.[/QUOTE]Thanks, but I need precise numbers running a [b]single instance[/b] (just like [i]kladner[/i] provided): GPU, assignment, GPU usage, runtime. If I'm missing any datum it's not much use to me.

Dubslow 2011-12-05 21:54

That is single instance. My CPU is (almost) almost able to keep up with it. The avg. rate has been consistent for the last month completely independent of assignment. GPU-Z, when I was monitoring this in Windows, reported 80% load at 850 MHz, with 180 M/s, and a runtime of 3H17M (assuming I'm not using the comp, i.e. max efficiency) for Factor=N/A,50222647,69,72 (run as one assignment, not with separated bit levels, which from what I've heard does change mfaktc's efficiency).

Dubslow 2011-12-05 23:07

It's [url]http://www.newegg.com/Product/Product.aspx?Item=N82E16814127519[/url]
Hopefully that helps.

aaronhaviland 2011-12-06 00:27

[QUOTE=TheJudger;281069]All you have to do is wait for mfaktc 0.18 (which depends mainly on the public release of CUDA 4.1)[/QUOTE]

CUDA4.1 RC2 appears to be available! [URL]http://developer.nvidia.com/cuda-toolkit-41[/URL]

KyleAskine 2011-12-06 01:04

[QUOTE=James Heinrich;281113]Perfect, thanks. The above is the 4 pieces of info I need.

This is exactly why I need more data points: :smile:
8800GT: 13.90 GFLOPS per GHz-day/day
GTX 460: 8.18 GFLOPS per GHz-day/day


[b]edit:[/b] Hmm, [i]kladner[/i] -- which GPU is your GTX 460 using? GF104 or GF114? (if you're not sure, something like [url=http://www.techpowerup.com/downloads/SysInfo/GPU-Z/]GPU-Z[/url] will tell you).[/QUOTE]

I ran one instance for each of my graphics cards (in xFire) on my i5-2500k oc'ed to 4.3GHz. I just about maxed out my CPU and my GPUs with sieve primes at 25000.

ASUS HD 6950 DirectCUII (810MHz)
Factor=N/A,52101913,69,70
Usage between 85 and 90%. Usually around 87%.
no factor for M52101913 from 2^69 to 2^70 [mfakto 0.09-Win mfakto_cl_71]
tf(): total time spent: 30m 18.750s

and

ASUS HD 6950 DirectCUII (810MHz)
Factor=N/A,52123333,69,70
Usage between 87 and 91%. Usually around 90%.
no factor for M52123333 from 2^69 to 2^70 [mfakto 0.09-Win mfakto_cl_71]
tf(): total time spent: 29m 36.217s

KyleAskine 2011-12-06 01:19

[QUOTE=James Heinrich;281113]Perfect, thanks. The above is the 4 pieces of info I need.

This is exactly why I need more data points: :smile:
8800GT: 13.90 GFLOPS per GHz-day/day
GTX 460: 8.18 GFLOPS per GHz-day/day


[b]edit:[/b] Hmm, [i]kladner[/i] -- which GPU is your GTX 460 using? GF104 or GF114? (if you're not sure, something like [url=http://www.techpowerup.com/downloads/SysInfo/GPU-Z/]GPU-Z[/url] will tell you).[/QUOTE]

Wouldn't something like sieveprimes be important too? I mean, I get just about the exact same numbers when I run sieve primes = 10000, but my time increases reasonably significantly.

PS - I feel this should be in a separate benchmark thread. I feel like I am threadjacking.

Dubslow 2011-12-06 01:26

[QUOTE=KyleAskine;281162]Wouldn't something like sieveprimes be important too? I mean, I get just about the exact same numbers when I run sieve primes = 10000, but my time increases reasonably significantly.

PS - I feel this should be in a separate benchmark thread. I feel like I am threadjacking.[/QUOTE]

Yes, SievePrimes certainly affects running time, and even if it didn't, it affects how much work is done on the GPU, rather than the CPU. Without it, you'd likely need to include the CPU data too...

Chuck 2011-12-06 03:19

GTX 580 datapoint
 
EVGA Black Ops GTX 580 factory OC 797 MHz
Factor=n/a,49938787,71,72
Usage: 44%
tf(): total time spent: 1h 58m 2.857s

Dubslow 2011-12-06 04:26

Oh yes @TheJudger: Can you also adjust the parser so that it can read Windows and Linux worktodos? I don't know about the opposite case, but the Linux version is incapable of reading the CR/LF of the Windows text files.

Thanks
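
To illustrate the request: a parser that strips trailing whitespace before splitting the line is indifferent to LF versus CR/LF endings. A minimal Python sketch, assuming the `Factor=N/A,exponent,bit_min,bit_max` layout seen in this thread (mfaktc's real parser is written in C and may differ; `parse_factor_line` is a hypothetical name):

```python
def parse_factor_line(line):
    """Parse 'Factor=N/A,exponent,bit_min,bit_max' into three ints.

    Returns None for blank lines or lines that are not Factor= entries.
    """
    line = line.strip()              # drops '\n', '\r\n' and stray spaces
    if not line.startswith("Factor="):
        return None
    fields = line[len("Factor="):].split(",")
    exponent, bit_min, bit_max = (int(f) for f in fields[-3:])
    return exponent, bit_min, bit_max

print(parse_factor_line("Factor=N/A,54168349,70,71\r\n"))
```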

dbaugh 2011-12-06 05:15

ckp fast vs regular
 
Can a ckp file generated by the regular version of mfaktc be used to continue processing using the fast version of mfaktc?

Bdot 2011-12-06 07:32

[QUOTE=James Heinrich;281132]a [B]single instance[/B]: GPU, assignment, GPU usage, runtime. If I'm missing any datum it's not much use to me.[/QUOTE]

NVIDIA Quadro FX 880M (GT216) @ 550MHz
~98% GPU load
no factor for M47677891 from 2^69 to 2^70 [mfaktc 0.18-pre7 71bit_mul24]
tf(): total time spent: 4h 54m 38.525s

SievePrimes @ 200k, ~13.4M/s, CPU wait 32%

TheJudger 2011-12-06 12:50

[QUOTE=James Heinrich;281105]I've thrown together a rough chart of CUDA GPU performance comparison:
[url]http://mersenne-aries.sili.net/mfaktc.php[/url]

It is not yet properly calibrated. It currently translates GFLOPS (from Wikipedia) into GHz-days/day based on timing of a single test on my 8800GT. It does not (yet) take into account performance differences of different mfaktc cores etc. But I need some more data to fine-tune it: Please send me some timing info for a [i]single instance[/i] of mfaktc, including assignment (exponent, from/to bits), GPU model, time to complete the assignment, and GPU usage for that single instance.[/QUOTE]

[QUOTE=James Heinrich;281113]Perfect, thanks. The above is the 4 pieces of info I need.

This is exactly why I need more data points: :smile:
8800GT: 13.90 GFLOPS per GHz-day/day
GTX 460: 8.18 GFLOPS per GHz-day/day


[b]edit:[/b] Hmm, [i]kladner[/i] -- which GPU is your GTX 460 using? GF104 or GF114? (if you're not sure, something like [url=http://www.techpowerup.com/downloads/SysInfo/GPU-Z/]GPU-Z[/url] will tell you).[/QUOTE]

Well, this might not be so easy...[LIST][*]my GTX 470 (1089 GFLOPS) is [B]4-5 times faster[/B] than my GTX 275 (1011 GFLOPS) for current assignments[LIST][*]compute capability 1.0 (G80 chip): won't work[*]compute capability 1.1-1.3: same speed[*]compute capability 2.0: currently best GFLOPS/mfaktc performance[*]compute capability 2.1: ~20-35% slower than 2.0 for same GFLOPS[/LIST][*]a single instance of mfaktc will measure [B]CPU[/B] performance, not GPU performance, for the high-end GPUs[*]you can remove all G80 GPUs from your list: they won't work with mfaktc[/LIST]
Oliver

Dubslow 2011-12-06 15:36

Do you know why 2.1 is worse than 2.0?

Edit: So a 460's slower than a 465/470/480 by nature of its compute capability.... [url]http://developer.nvidia.com/cuda-gpus[/url]

TheJudger 2011-12-06 15:59

[QUOTE=Dubslow;281241]Do you know why 2.1 is worse than 2.0?

Edit: So a 460's slower than a 465/470/480 by nature of its compute capability.... [url]http://developer.nvidia.com/cuda-gpus[/url][/QUOTE]

ILP (instruction-level parallelism) for CC 2.1. One could say that they saved one instruction scheduler.
mfaktc has a lot of dependent instructions (carry flag), so ILP doesn't help here. Of course I could write a kernel that avoids the carry flag, but my guess is that it would be much slower on all architectures.

CC 2.x is much better than CC 1.x for mfaktc because 2.x can do int32 multiplication natively while CC 1.x can't.
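
To make the point about dependent instructions a bit more concrete: numbers wider than one machine word are stored as several words, and adding them passes a carry from each word to the next. A plain-Python illustration of such a carry chain follows. It is not mfaktc's kernel code (which is CUDA), and `add_with_carry` is a made-up name.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a_words, b_words):
    """Add two equally long little-endian lists of 32-bit words.

    Each step needs the carry produced by the previous step, so the
    steps cannot be executed independently of one another.
    """
    result, carry = [], 0
    for a, b in zip(a_words, b_words):
        total = a + b + carry        # depends on the previous carry
        result.append(total & MASK32)
        carry = total >> 32
    return result, carry

# 0x00000001_FFFFFFFF + 1 ripples a carry into the second word.
print(add_with_carry([0xFFFFFFFF, 0x00000001], [0x00000001, 0x00000000]))
```

Because every word waits on the carry from the one before it, there is little independent work for a second instruction issue slot to pick up.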

Oliver

Dubslow 2011-12-06 16:06

Hmm. I understand the third line, but the first two are beyond me...

:)
And that's why I'm not the developer.

kladner 2011-12-06 20:33

Timing run results #2
 
To James Heinrich:

This is from my partner's machine. It's an i7 920, 2811MHz, 3GB RAM, XP 32bit. I did turn off the Turbo for this run.

[CODE]Asus 9600 GT, fanless
GPU @ 650MHz
Usage 99%
Factor M54097591, No factor found, 70-71
Time/class 18.08s
Total Time 4h 51m 42s
Affinity not set
3,3,3 on the Streams and GridSize
AllowSleep=0[/CODE]

Obviously, the CPU is twiddling its non-existent thumbs waiting on this card. SP stuck at 200,000. avg wait 10,300. I just threw this in to give a low end marker.

This box is going to be Win7-64 with 9GB RAM before too much longer.

nucleon 2011-12-07 11:28

[QUOTE=TheJudger;281224]Well, this might not be so easy...[LIST][*]a single instance of mfaktc will measure [B]CPU[/B] performance, not GPU performance, for the high-end GPUs[/LIST][/QUOTE]

What Oliver said.

I have 4x GTX580s in my setup:[LIST][*]2 of them are installed in an i7-2600k@4.5GHz - last 10-day average - 564.6GHz-days/day combined, sieve primes=5000, GPU usage 96-98%, need more CPU :( (all 4 cores used)[*]1 installed in an i7-920@2.8GHz - last 10-day average 315.9GHz-days/day, sieve primes=12000, GPU load=99% (all 4 cores used)[*]1 installed in an AMD FX8120@3.8GHz - last 4-day average 201.6GHz-days/day, sieve primes=5000, GPU load=72% (Sorry, I can't recommend this CPU at all for any reason)[/LIST]
-- Craig

James Heinrich 2011-12-07 17:24

[QUOTE=Dubslow;281241][url]http://developer.nvidia.com/cuda-gpus[/url][/QUOTE]Thanks, that was helpful (... although it contained more than one conflicting datum; I'm not sure it's 100% accurate).

[QUOTE=TheJudger;281224]Well, this might not be so easy...[LIST][*]compute capability 1.0 (G80 chip): won't work[*]compute capability 1.1-1.3: same speed[*]compute capability 2.0: currently best GFLOPS/mfaktc performance[*]compute capability 2.1: ~20-35% slower than 2.0 for same GFLOPS[/LIST][/QUOTE]You're right. It wasn't easy, and I concur with your performance conclusions.
There are many factors that affect it (overclocked GPU speed, SievePrimes setting, the CPU powering it, etc.), so my numbers are naturally quite rough, but compiling all the results gives a general pattern. I'm using these approximated multipliers for GFLOPS to GHz-days/day:
v1.1-1.3 = 14.0
v2.0 = 5.0
v2.1 = 7.5

I'm pretty confident about the v1.1 results (3 very close results), less so about v2.0 and 2.1, but it's at least in the ballpark.

My chart is now scaled according to the compute version:
[url]http://mersenne-aries.sili.net/mfaktc.php[/url]

RichD 2011-12-08 14:36

Anybody have experience with the Linux [URL="http://www.nvidia.com/object/linux-display-ia32-290.10-driver.html"]290.10[/URL] driver from nVidia?

James Heinrich 2011-12-08 14:42

[QUOTE=James Heinrich;281409]I'm using these approximated multipliers for GFLOPS to GHz-days/day[/QUOTE]And by that, of course, I mean the complete opposite. :blush:
e.g. 8800GT = v1.1 @ 504 GFLOPS. 504 / 14 = 36 GHz-days/day expected.
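
Spelled out in Python, using the per-compute-capability values quoted above as divisors (the dictionary and function names are made up for illustration):

```python
# Divisors quoted in the thread: GFLOPS / divisor ~ GHz-days/day.
GFLOPS_PER_GHZ_DAY = {"1.1": 14.0, "1.2": 14.0, "1.3": 14.0, "2.0": 5.0, "2.1": 7.5}

def expected_ghz_days_per_day(gflops, compute_capability):
    """Rough mfaktc throughput estimate from theoretical GFLOPS."""
    return gflops / GFLOPS_PER_GHZ_DAY[compute_capability]

print(expected_ghz_days_per_day(504, "1.1"))   # the 8800GT example above
```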

Dubslow 2011-12-09 01:04

[QUOTE=RichD;281512]Anybody have experience with the Linux [URL="http://www.nvidia.com/object/linux-display-ia32-290.10-driver.html"]290.10[/URL] driver from nVidia?[/QUOTE]

I would love to tell you, but for whatever reason, using the nVidia install file crashes my GUI (Ubuntu 11.04). (This also applies for previous drivers as well.)

Ralf Recker 2011-12-09 04:53

[QUOTE=Dubslow;281590]I would love to tell you, but for whatever reason, using the nVidia install file crashes my GUI (Ubuntu 11.04). (This also applies for previous drivers as well.)[/QUOTE]
Probably a conflict between the nouveau drivers and the drivers from NVIDIA. It's not only necessary to completely disable the nouveau drivers before trying to install any NVIDIA driver, you also have to prevent the load of the kernel module for the nouveau driver by blacklisting the nouveau kernel module.

This PPA might be useful for you:

[URL="https://launchpad.net/%7Eubuntu-x-swat/+archive/x-updates"]https://launchpad.net/~ubuntu-x-swat/+archive/x-updates[/URL]

Dubslow 2011-12-09 05:18

Installing the nvidia-current package fixes the GUI. I remember at install time I elected to install proprietary drivers. I'll take a look though and see what I can do.

TheJudger 2011-12-10 00:13

[QUOTE=TheJudger;279285]Factory overclocked GTX 560Ti (1701MHz), barrett79 kernel, raw GPU speed (without sieving), M66362159 from 2[SUP]69[/SUP] to 2[SUP]70[/SUP][CODE]
| CUDA 3.2 | CUDA 4.1-RC1
mfaktc 0.17 | 260.94M/s | 261.93M/s
mfaktc 0.18-pre9 | 260.80M/s | 258.97M/s[/CODE][/QUOTE]

Factory overclocked GTX 560Ti (1701MHz), barrett79 kernel, raw GPU speed (without sieving), M66362159 from 2[SUP]69[/SUP] to 2[SUP]70[/SUP][CODE]
| CUDA 3.2 | CUDA 4.1-RC2
mfaktc 0.17 | 260.94M/s | 261.93M/s
mfaktc 0.18-pre10 | 260.80M/s | 265.39M/s[/CODE]

A little bit better than before :smile: but there are no changes in the code of the barrett79 kernel from -pre9 to -pre10...

Factory overclocked GTX 560Ti (1701MHz), barrett92 kernel, raw GPU speed (without sieving), M3321932839 from 2[SUP]79[/SUP] to 2[SUP]80[/SUP][CODE]
| CUDA 4.1-RC2
mfaktc 0.17 | 170.62M/s
mfaktc 0.18-pre10 | 173.32M/s[/CODE]

A little bit faster, too. But the difference between compute capability 2.0 and 2.1 increases further... :sad:

Oliver

TheJudger 2011-12-19 23:26

mfaktc 0.18
 
Hello!

[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.tar.gz[/url]
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.win.zip[/url]
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.linux64.tar.gz[/url]

The executables need at least a [B]CUDA 4.0[/B] capable driver (270 series driver or newer). The Windows zip archive contains both the 32-bit and the 64-bit version. I'll upload new executables once [B]CUDA 4.1[/B] is publicly available. The sources should compile with older CUDA versions, too, but the result might be slower. CUDA 4.1 will give another performance improvement for the barrett based kernels on compute capability 2.x GPUs (especially on 2.0).

Compared to mfaktc 0.17 there are "more than usual" minor changes. Highlights from the Changelog.txt:[LIST][*]autoadjustment of SievePrimes is now less dependent on the gridsize and
absolute speed. Instead of measuring the absolute (average) time waited
per processing block (grid size) now the relative time spent on waiting
for the GPU is calculated. In the per-class output "avg. wait" is replaced
by "CPU wait".[*]new commandline option: "-v" (verbosity) lets the user decide how much
information is printed
(suggested by aspen on [url]www.mersenneforum.org[/url])[*]"has a factor" result lines now contain information (program name,
versions, bitlevel, ...) James Heinrich is working on this on the server
side. This should give more accurate credits for "has a factor" results
from the primenet server once this is fully implemented.[*]mfaktc no longer refuses to load a checkpoint file from a Linux version
with a Windows version of mfaktc and vice versa. Of course mfaktc still
refuses to load checkpoint files from other versions than itself
(identical version string!)[*]added a (simple) signal handler (captures SIGINT and SIGTERM).
1st ^C: mfaktc will exit after the currently processed class is finished.
2nd ^C: mfaktc will stop immediately[*]added a minimum delay between two checkpoint file writes. The user can set
the delay in mfaktc.ini (CheckpointDelay).[*]added a new code path to barrett79_mul32 and barrett92_mul32 kernels, CUDA
>= 4.1 features multiply-add with carry for compute capability >= 2.0.
On my GTX 470 (compute capability 2.0) this yields up to 15% extra throughput for
barrett92_mul32 and up to 7% for barrett79_mul32.[/LIST]
As usual: finish your current assignments with your current version and update afterwards; mfaktc 0.18 will refuse foreign checkpoint files.
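
The two-stage ^C behaviour from the changelog can be sketched roughly as follows. This is only an illustration in Python, not mfaktc's actual C code; the names [FONT=monospace]StopController[/FONT], [FONT=monospace]run_classes[/FONT] and [FONT=monospace]install[/FONT] are made up for the example.

```python
import signal


class StopController:
    """Counts stop requests (SIGINT/SIGTERM). Hypothetical helper, not mfaktc code."""

    def __init__(self):
        self.requests = 0

    def handle(self, signum, frame):
        # 1st request: only remember it, the main loop finishes the current class.
        # 2nd request: stop immediately.
        self.requests += 1
        if self.requests >= 2:
            raise SystemExit(1)

    @property
    def stop_after_class(self):
        return self.requests >= 1


def install(controller):
    """Register the handler for the two signals named in the changelog."""
    signal.signal(signal.SIGINT, controller.handle)
    signal.signal(signal.SIGTERM, controller.handle)


def run_classes(classes, controller, process):
    """Process classes in order; leave the loop once a stop was requested."""
    finished = []
    for c in classes:
        process(c)            # the class that is running is always completed
        finished.append(c)
        if controller.stop_after_class:
            break             # a checkpoint could be written here before exiting
    return finished
```

The point of the design is that the handler itself does almost nothing on the first signal, so the class that is currently running is never interrupted half way.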

Oliver

kladner 2011-12-20 00:43

Kudos!
 
Many thanks, sir! I am impatient for my current assignments to finish so that I can put this version into service.

Dubslow 2011-12-20 01:34

Would you mind posting the .dll/.so files on the mfaktc mirror? I'd rather not have to download the whole CUDA environment...

LaurV 2011-12-20 03:47

[QUOTE=TheJudger;282838]...mfaktc 0.18...[/QUOTE]
Output file (results.txt) customizable from the ini file? (including the path, for collecting all the results from all running processes of mfaktc in a single file).

diamonddave 2011-12-20 04:04

[QUOTE=TheJudger;282838][*]"has a factor" result lines now contain information (program name,
versions, bitlevel, ...) James Heinrich is working on this on the server
side. This should give more accurate credits for "has a factor" results
from the primenet server once this is fully implemented.
[/QUOTE]

Many thanks! Can't wait to test this feature with a new exponent!

kladner 2011-12-20 05:05

The new version seems to be working well. At least, there have been no problems reported.

TheJudger 2011-12-20 11:13

[QUOTE=Dubslow;282851]Would you mind posting the .dll/.so files on the mfaktc mirror? I'd rather not have to download the whole CUDA environment...[/QUOTE]

They are included in the archives for the executables, aren't they?

[QUOTE=LaurV;282864]Output file (results.txt) customizable from the ini file? (including the path, for collecting all the results from all running processes of mfaktc in a single file).[/QUOTE]

Well, I'm still unsure about this feature. Personally I don't like it but it seems that you and some others want it. Bdot (mfakto) tries to convince me, too.

So I guess I'll add this for 0.19?

Oliver

James Heinrich 2011-12-20 13:55

[QUOTE=TheJudger;282896]Well, I'm still unsure about this feature. Personally I don't like it but it seems that you and some others want it. Bdot (mfakto) tries to convince me, too.[/QUOTE]I also think it would be good to have as a configurable option. Naturally you'll need to lock the file for writing for the split second it takes to write the result line so two instances don't try and write at the same time.

Along the same lines, a unified worktodo.txt would also be nice, perhaps split into [Worker #1], [Worker #2], etc sections. This is of course a little more work than a configurable results.txt, but lets us just deal with one in and one out for each machine, in a format that's already familiar to us from Prime95.

Even better would be to optimize/thread the sieving such that we'd only ever need to run a single mfaktc instance (sieving would spread across as many CPU cores as needed to feed the GPU(s)). But that's a whole other set of complications for a much later release. :smile:
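
For the shared results file, the locking could look something like the sketch below. It is Python rather than mfaktc's C, it assumes a POSIX system (the [FONT=monospace]fcntl[/FONT] module does not exist on Windows, where something like [FONT=monospace]msvcrt.locking[/FONT] would be needed instead), and [FONT=monospace]append_result[/FONT] is a made-up name.

```python
import fcntl
import os


def append_result(path, line):
    """Append one result line while holding an exclusive advisory lock.

    Hypothetical helper: every instance that writes to the shared file
    must go through the same locking for this to be safe.
    """
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # blocks while another instance holds the lock
        try:
            f.write(line.rstrip("\n") + "\n")
            f.flush()
            os.fsync(f.fileno())           # push the line to disk before unlocking
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The lock is only held for the duration of one write, so instances would block each other for a split second at most.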

Chuck 2011-12-20 14:47

Great! Thanks for the update. I've got two instances running now.

kladner 2011-12-20 15:07

.17 vs .18
 
1 Attachment(s)
This was rather a quick test, showing the difference between mfaktc .17 and .18. V.18 did eventually drop to SievePrimes 5000, though the time didn't really change that much.

EDIT: These were run with the same exponent in single instances.

Chuck 2011-12-20 15:09

1 Attachment(s)
Looking good with two instances running.

kladner 2011-12-20 15:17

2 instances
 
1 Attachment(s)
Running on GTX 460, at 875MHz.

Exponent a is 541216xx
Exponent b is 544325xx
71-72 bit.

Dubslow 2011-12-21 01:16

[QUOTE=TheJudger;282896]They are included in the archives for the executables, aren't they?
[/QUOTE]

Whoops, my bad. However, it said *.so.4.0.17, not 4.1.*.

When I ran it, I got this: [code]CUDA version info
binary compiled for CUDA 4.0
CUDA runtime version 0.0
CUDA driver version 297000.8
ERROR: CUDA runtime version must match the CUDA toolkit version used during compile!
[/code]
My command to start it is [code]LD_LIBRARY_PATH=./lib taskset -c 7 ./mfaktc-lin-64[/code] with [code]bill@Gravemind:~/mfaktc∰∂ ls -lR
.:
total 887
-rwxrwxrwx 1 root root 15489 2011-12-20 18:47 Changelog.txt
-rwxrwxrwx 1 root root 35147 2011-12-20 18:47 COPYING
-rwxrwxrwx 2 root root 549480 2011-08-14 13:14 cudart64_32_16.dll
drwxrwxrwx 1 root root 504 2011-12-20 18:49 lib
-rwxrwxrwx 1 root root 4094 2011-12-20 19:01 mfaktc.ini
-rwxrwxrwx 1 root root 267696 2011-12-20 18:47 mfaktc-lin-64
-rwxrwxrwx 1 root root 12308 2011-12-20 18:47 README.txt
-rwxrwxrwx 1 root root 5876 2011-12-20 19:06 worktodo.txt

./lib:
total 329
lrwxrwxrwx 1 root root 36 2011-12-20 18:47 libcudart.so -> libcudart.so.4
lrwxrwxrwx 1 root root 46 2011-12-20 18:47 libcudart.so.4 -> libcudart.so.4.0.17
-rwxrwxrwx 1 root root 334760 2011-12-20 18:47 libcudart.so.4.0.17[/code] as the setup. (Ownership is root because ~/mfaktc is a link to a folder on my Windows partition, so there aren't really 'ownerships' that I can change.)

Edit: Heh -- try numbers 2, 3, 4, 5, 6, 7, and 8: [code]CUDA driver version 797969.80
CUDA driver version 1344879.88
CUDA driver version 1045146.96
CUDA driver version -952091.-76
CUDA driver version -62620.0
CUDA driver version 300866.32
CUDA driver version 2033957.60[/code] With the negatives, the error changes to [code]ERROR: current CUDA driver version is lower than the CUDA toolkit version used during compile![/code]
All runtime versions reported as 0.0.
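
For reference, CUDA reports its runtime and driver versions as a single integer encoded as 1000 * major + 10 * minor (so 4000 is CUDA 4.0 and 4010 is CUDA 4.1). A small Python sketch of decoding such a value and rejecting obvious garbage; the function names and the [FONT=monospace]max_major[/FONT] cutoff are arbitrary choices for this example, not anything mfaktc does:

```python
def decode_cuda_version(v):
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    major = v // 1000
    minor = (v % 1000) // 10
    return major, minor


def looks_sane(v, max_major=20):
    """Rough plausibility check; max_major is an arbitrary cutoff for this sketch."""
    return 1000 <= v < max_major * 1000


for raw in (4000, 4010, 0, -952091076):
    if looks_sane(raw):
        print(raw, "-> CUDA %d.%d" % decode_cuda_version(raw))
    else:
        print(raw, "-> implausible, the version query probably failed")
```

A runtime version of 0 together with a driver value that changes on every start looks more like a failed or mismatched version query than like a real version number.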

moebius 2011-12-21 03:26

I got this error running the linux64bit version:

:~/mfaktc-0.18$ ./mfaktc.exe -h
./mfaktc.exe: error while loading shared libraries: libcudart.so.4: cannot open shared object file: No such file or directory

mfaktc-0.16 works without errors.
Any Ideas?

Dubslow 2011-12-21 04:11

do 'echo $LD_LIBRARY_PATH' and 'ls -l mfaktc-0.18' or equivalent.
Did you need anything special like a lib folder for mfaktc16?

moebius 2011-12-21 04:30

[QUOTE=Dubslow;283002]do 'echo $LD_LIBRARY_PATH' and 'ls -l mfaktc-0.18' or equivalent.
Did you need anything special like a lib folder for mfaktc?[/QUOTE]

echo $LD_LIBRARY_PATH
:/usr/local/cuda/lib64

The lib folder exists.

Dubslow 2011-12-21 04:34

ls -l /usr/local/cuda/lib64
ls -l mfaktc-0.18/lib

I'll check back every few minutes.

moebius 2011-12-21 04:39

[QUOTE=Dubslow;283006]ls -l /usr/local/cuda/lib64
ls -l mfaktc-0.18/lib

I'll check back every few minutes.[/QUOTE]

ls -l /usr/local/cuda/lib64
total 161108
lrwxrwxrwx 1 root root 14 2011-03-20 02:02 libcublas.so -> libcublas.so.3
lrwxrwxrwx 1 root root 19 2011-03-20 02:02 libcublas.so.3 -> libcublas.so.3.2.16
-rwxr-xr-x 1 root root 85803184 2011-03-20 02:02 libcublas.so.3.2.16
lrwxrwxrwx 1 root root 14 2011-03-20 02:02 libcudart.so -> libcudart.so.3
lrwxrwxrwx 1 root root 19 2011-03-20 02:02 libcudart.so.3 -> libcudart.so.3.2.16
-rwxr-xr-x 1 root root 313872 2011-03-20 02:02 libcudart.so.3.2.16
lrwxrwxrwx 1 root root 13 2011-03-20 02:02 libcufft.so -> libcufft.so.3
lrwxrwxrwx 1 root root 18 2011-03-20 02:02 libcufft.so.3 -> libcufft.so.3.2.16
-rwxr-xr-x 1 root root 28996288 2011-03-20 02:02 libcufft.so.3.2.16
lrwxrwxrwx 1 root root 14 2011-03-20 02:02 libcurand.so -> libcurand.so.3
lrwxrwxrwx 1 root root 19 2011-03-20 02:02 libcurand.so.3 -> libcurand.so.3.2.16
-rwxr-xr-x 1 root root 4234816 2011-03-20 02:02 libcurand.so.3.2.16
lrwxrwxrwx 1 root root 16 2011-03-20 02:02 libcusparse.so -> libcusparse.so.3
lrwxrwxrwx 1 root root 21 2011-03-20 02:02 libcusparse.so.3 -> libcusparse.so.3.2.16
-rwxr-xr-x 1 root root 45613728 2011-03-20 02:02 libcusparse.so.3.2.16


ls -l lib
total 344
lrwxrwxrwx 1 moebius moebius 14 2011-12-20 05:01 libcudart.so -> libcudart.so.4
lrwxrwxrwx 1 moebius moebius 19 2011-12-20 05:01 libcudart.so.4 -> libcudart.so.4.0.17
-rwxr-xr-x 1 moebius moebius 334760 2011-11-13 23:53 libcudart.so.4.0.17

Dubslow 2011-12-21 04:50

okay first try 'cp mfaktc-0.18/lib/* /usr/local/cuda/lib64' and try running again.
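
To see which directories on the search path actually contain the library the binary asks for, something like this Python sketch can help. It only looks at the colon-separated list it is given; the real dynamic loader also consults other locations (rpath, the ld.so cache, default directories), and [FONT=monospace]find_shared_object[/FONT] is a made-up helper name.

```python
import os


def find_shared_object(soname, ld_library_path):
    """Return every path entry that contains soname, in search order."""
    hits = []
    for directory in ld_library_path.split(":"):
        if not directory:
            continue  # empty entries are skipped in this sketch
        candidate = os.path.join(directory, soname)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print(find_shared_object("libcudart.so.4", path))
```

With a path that only lists /usr/local/cuda/lib64 and a CUDA 3.2 install there, the lookup for libcudart.so.4 comes back empty. Putting the lib folder from the mfaktc archive on LD_LIBRARY_PATH would be an alternative to copying files into the system directory.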

moebius 2011-12-21 04:55

[QUOTE=Dubslow;283010]okay first try 'cp mfaktc-0.18/lib/* /usr/local/cuda/lib64' and try running again.[/QUOTE]

Thank you very much


sudo cp mfaktc-0.18/lib/* /usr/local/cuda/lib64

has worked

./mfaktc.exe
mfaktc v0.18 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 3
CPUStreams 3
GridSize 3
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 30s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
AllowSleep no

CUDA version info
binary compiled for CUDA 4.0
CUDA runtime version 4.0
CUDA driver version 4.10

CUDA device info
name GeForce GTX 560 Ti
compute capability 2.1
maximum threads per block 1024
number of multiprocessors 8 (384 shader cores)
clock rate 1800MHz

Automatic parameters
threads per grid 1048576

running a simple selftest...
Selftest statistics
number of tests 25
successfull tests 25

selftest PASSED!

Dubslow 2011-12-21 05:03

Nuts. Now to fix mine :razz:

moebius 2011-12-21 05:09

[QUOTE=Dubslow;283013]Nuts. Now to fix mine :razz:[/QUOTE]

Do you use Cuda 3.2 just like me?

Dubslow 2011-12-21 05:17

No, you're not using 3.2 anymore. Note the driver/runtime versions. I have the 3.2 IDE installed like you, but mfaktc could never find the /usr/local/cuda/lib, even when LD_LIBRARY_PATH was set properly, so I had to use a workaround to get it to find the right .so file. I'm using the same workaround for .18 and instead it tells me runtime version 0.0, and driver version... well just go check the previous page.

TheJudger 2011-12-21 23:14

Dubslow:

actually I've no clue what's wrong but can you check this:[LIST][*]which nvidia driver do you have installed? (run e.g. 'nvidia-smi -a')[*]did you check 'ldd mfaktc-lin-64' or 'LD_LIBRARY_PATH=./bin ldd mfaktc-lin-64'?[/LIST]Oliver

James Heinrich 2011-12-22 00:56

Feature request for v0.19:

Can you make it (optionally) write something like "no work left in worktodo.txt" when it runs out of work? I tend to look at the results files for all my computers once a day or so, but I may not always notice when it runs out of work -- a notice in the results file would make it immediately obvious.

Dubslow 2011-12-22 04:29

[QUOTE=TheJudger;283101]Dubslow:

actually I've no clue whats wrong but can you check this:[LIST][*]which nvidia driver do you have installed? (run e.g. 'nvidia-smi -a')[*]did you check 'ldd mfaktc-lin-64' or 'LD_LIBRARY_PATH=./bin ldd mfaktc-lin-64'?[/LIST]Oliver[/QUOTE]

[code]bill@Gravemind:~∰∂ nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Wed Dec 21 22:24:32 2011

Driver Version : 270.41.06

Attached GPUs : 1

GPU 0:1:0
Product Name : GeForce GTX 460
Display Mode : N/A
Persistence Mode : Disabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : N/A
Inforom Version
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
PCI
Bus : 1
Device : 0
Domain : 0
Device Id : E2210DE
Bus Id : 0:1:0
Fan Speed : 40 %
Memory Usage
Total : 767 Mb
Used : 214 Mb
Free : 552 Mb
Compute Mode : Default
Utilization
Gpu : N/A
Memory : N/A
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A[/code]
I realize 270 is very old, but the .run files that NVidia provides to update the drivers have never worked for me; they wind up breaking the GUI. I install the package nvidia-current to fix it. This is Ubuntu 11.04.
[code]bill@Gravemind:~/mfaktc∰∂ LD_LIBRARY_PATH=./lib ldd ./mfaktc-lin-64
linux-vdso.so.1 => (0x00007fff41fff000)
libcudart.so.4 => ./lib/libcudart.so.4 (0x00007f51d6ecb000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f51d6b11000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f51d680a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f51d6585000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f51d6381000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f51d6162000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f51d5f5a000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f51d5d44000)
/lib64/ld-linux-x86-64.so.2 (0x00007f51d711f000)[/code]Not sure what to look for here.

Would installing the 4.1 SDK help?

TheJudger 2011-12-22 14:42

[QUOTE=Dubslow;283118][code]bill@Gravemind:~∰∂ nvidia-smi -a
[...]
Driver Version : 270.41.06
[...]
[/code][/QUOTE]
Can you try to upgrade the driver: [URL="http://developer.download.nvidia.com/compute/cuda/4_0/drivers/devdriver_4.0_linux_64_270.41.19.run"]devdriver_4.0_linux_64_270.41.19.run[/URL]

[QUOTE=Dubslow;283118]Would installing the 4.1 SDK help?[/QUOTE]
Well, if everything is right the CUDA SDK and the CUDA Toolkit are [B]not[/B] needed to run the precompiled mfaktc executable. You'll need a proper driver, nothing more.

Oliver

James Heinrich 2011-12-22 16:18

Not sure if I found a bug or did something silly. I was playing around with [url=http://mersenne-aries.sili.net/252674011]M252674011[/url] (had 8 known factors, I got excited when I found another [strike]one[/strike] [strike]two[/strike] three, but they were all composites of smaller known factors :sad:). But I ran into this:[quote]no factor for M252674011 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
[color=red]WARNING: can't delete the checkpoint file "mfaktc.ckp"[/color]
tf(): total time spent: 2m 26.308s[/quote]

TheJudger 2011-12-22 16:43

Hello James,

you have just discovered a bug. The good news is that it is not critical. I guess because of the short runtime there was no checkpoint written at all...

Oliver

Bdot 2011-12-22 20:41

[QUOTE=TheJudger;283187]Hello James,

you have just discovered a bug. The good news is that it is not critical. I guess because of the short runtime there was no checkpoint written at all...

Oliver[/QUOTE]

Yes, I get this too if no checkpoints were written. Nothing serious.

TheJudger 2011-12-22 20:43

I've just noticed that the text is wrong, too. Those checkpoint files are no longer named "mfaktc.ckp"...

Oliver

James Heinrich 2011-12-22 21:50

[QUOTE=TheJudger;283221]I've just noticed that the text is wrong, too. Those checkpoint files are no longer named "mfaktc.ckp"...[/QUOTE]That's what caught my attention.

Chuck 2011-12-22 21:57

Checkpoint overhead?
 
Can someone estimate what the overhead of checkpoints is? I decided several weeks ago to turn them off, as mfaktc and my computer are very stable. On rare occasions I need to reboot the computer, and I might lose an hour of processing time if I am too impatient to wait for the current bitlevels to finish.

I am wondering if a month's overhead of checkpoints is more than an hour of lost work time.

Bdot 2011-12-23 00:10

[QUOTE=Chuck;283235]Can someone estimate what the overhead of checkpoints is? I decided several weeks ago to turn them off, as mfaktc and my computer are very stable. On rare occasions I need to reboot the computer, and I might lose an hour of processing time if I am too impatient to wait for the current bitlevels to finish.

I am wondering if a month's overhead of checkpoints is more than an hour of lost work time.[/QUOTE]

I just timed CPs on a W7-64 Core i7-M620 laptop with a slow disk.

per CP:
0.01 ms for creating the checksum (CPU load)
0.2 ms writing & closing the file
1 ms for remove/rename operations for the backup file (mfakto only - mfaktc just has a remove ~ 0.2 ms)
1 ms for committing to disk (fflush);


CPs are written after a class is finished, and before more work is loaded on the GPU - so this is "idle time" for the GPU if you just run a single instance. When running more instances per GPU, then they will overlap.

So if you calculate single instance, 2 ms per CP, one CP after each class, 2 seconds per class, then you spend 0.1% of the time for writing the CP (this should be pretty much worst case). 0.1% of one month is ~ 45 min. If you lose 1h / month due to not writing CP's, you'd already be better off enabling them.
And now you can configure mfaktc to write CP's less frequently - in your case you can set it to maximum (900 s) and it will still write a CP when you abort it with ^C. Then you spend about 6 seconds per month for writing the CPs.
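
The arithmetic above, spelled out as a small Python sketch. It assumes a 30-day month and the rounded figure of 2 ms per checkpoint; the function names are made up for the example.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def overhead_fraction(cp_ms, class_seconds):
    """Fraction of run time spent writing one checkpoint per class."""
    return (cp_ms / 1000.0) / class_seconds


def monthly_cost_seconds(cp_ms, interval_seconds):
    """Seconds per month spent on checkpoints written every interval_seconds."""
    checkpoints = SECONDS_PER_MONTH / interval_seconds
    return checkpoints * cp_ms / 1000.0


# Worst case: one 2 ms checkpoint after every 2 s class.
print(overhead_fraction(2, 2))              # 0.001, i.e. 0.1 %
print(monthly_cost_seconds(2, 2) / 60)      # about 43 minutes per month
# CheckpointDelay at its maximum of 900 s.
print(monthly_cost_seconds(2, 900))         # about 5.8 seconds per month
```

So the "~ 45 min" and "about 6 seconds" figures are both in the right range.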

Still anyone running without checkpoints? :smile:

nucleon 2011-12-23 02:09

hehe ramdisk - and all those problems disappear.

-- Craig

Chuck 2011-12-23 03:14

Thanks bdot that was very helpful. I hadn't looked at checkpoints for some time since before GPUTO72 I was "lumberjacking" in the M600,000,000 range where a TF run took around a minute (I was using chalsall's MORE_CLASSES disabled version).

I went with 600 as the checkpoint delay. It's nice that one is taken after a CTRL-C.

chalsall 2011-12-23 03:46

[QUOTE=Chuck;283266](I was using chalsall's MORE_CLASSES disabled version)[/QUOTE]

That wasn't me, Guv.

kladner 2011-12-23 04:12

[QUOTE=chalsall;283272]That wasn't me, Guv.[/QUOTE]

That would have been "mfaktc171apsen.cuda40.sm_multi.LESS_CLASSES", maybe?

Chuck 2011-12-23 13:19

Oh that's right chalsall is the GPUTO72 author — anyway there was a post somewhere with the MORE_CLASSES disabled or LESS_CLASSES enabled and I picked up the executable and used it for a couple of months.

TheJudger 2011-12-23 15:39

[QUOTE=Chuck;283310]Oh that's right chalsall is the GPUTO72 author — anyway there was a post somewhere with the MORE_CLASSES disabled or LESS_CLASSES enabled and I picked up the executable and used it for a couple of months.[/QUOTE]

I've posted an executable without MORE_CLASSES [URL="http://www.mersenneforum.org/showpost.php?p=273900&postcount=363"]here[/URL] (mfaktc 0.17).

Oliver

Radikalinsky 2011-12-25 03:29

I just found a factor with 0.18:

[QUOTE]M52248761 has a factor: 3708847255636615579439 [TF:70:72*:mfaktc 0.18 barrett79_mul32]
found 1 factor for M52248761 from 2^70 to 2^72 (partially tested) [mfaktc 0.18 barrett79_mul32]
[/QUOTE]Obviously the prime server does not yet like the nice new accurate messages from version 0.18.
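
For anyone who wants to pick the new result lines apart themselves, here is a rough Python sketch. The regular expression is derived only from the lines quoted in this thread, so treat the field layout as an assumption; in particular, reading the "*" after the upper bit level as "partially tested" is a guess based on the second line above. The helper names are made up.

```python
import re

# Layout assumed from the quoted lines:
# M<exp> has a factor: <factor> [TF:<bit_min>:<bit_max>[*]:<program> <kernel>]
RESULT_RE = re.compile(
    r"^M(?P<exponent>\d+) has a factor: (?P<factor>\d+) "
    r"\[TF:(?P<bit_min>\d+):(?P<bit_max>\d+)(?P<partial>\*?):"
    r"(?P<program>.+) (?P<kernel>\S+)\]$"
)


def parse_factor_line(line):
    """Return the fields of a 0.18-style 'has a factor' line, or None."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    return {
        "exponent": int(m.group("exponent")),
        "factor": int(m.group("factor")),
        "bit_min": int(m.group("bit_min")),
        "bit_max": int(m.group("bit_max")),
        "partial": m.group("partial") == "*",
        "program": m.group("program"),
        "kernel": m.group("kernel"),
    }


def divides_mersenne(exponent, factor):
    """True if factor divides 2^exponent - 1 (checked by modular exponentiation)."""
    return pow(2, exponent, factor) == 1
```

Calling parse_factor_line on the first quoted line and passing the exponent and factor to divides_mersenne is a quick local sanity check before submitting.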

[QUOTE]
No factor lines found: 0
Mfaktc no factor lines found: 0
Mfakto no factor lines found: 0
Factors found: 1
Processing result: M52248761 has a factor: 3708847255636615579439
Insufficient information for accurate CPU credit. For stats purposes, assuming factor was found using P-1 with B1 = 800000.
CPU credit is 2.4586 GHz-days.
P-1 lines found: 0
LL lines found: 0
Mlucas lines found: 0
Glucas (G29) lines found: 0
Glucas lines found: 0
MacLucasFFTW lines found: 0
CUDALucas lines found: 0
ECM lines found: 0
[/QUOTE]

Edit: Ok, I just saw that this is on James Heinrich's todo list. Sorry

kladner 2011-12-25 03:35

[QUOTE=Radikalinsky;283459]I just found a factor with 0.18:


Obviously the prime server does not yet like the nice new accurate messages from version 0.18.[/QUOTE]

I saw this once. I think it occurred when I uploaded the result before the second, "end of level" line was generated. As in:

[CODE]M52279247 has a factor: 1525757169405396899617 [TF:70:71:mfaktc 0.18 barrett79_mul32]
found 1 factor for M52279247 from 2^70 to 2^71 [mfaktc 0.18 barrett79_mul32]
[/CODE]

Radikalinsky 2011-12-25 04:04

@Kladner,
I manually submitted both lines. Maybe it is because with partial tests the primenet server makes some assumptions. But as I understand it, the primenet server just does not yet understand all the details of the mfaktc message, for both 0.17 and 0.18.

Thanks, Rad

Dubslow 2011-12-25 04:47

[QUOTE=kladner;279843]
cmd.exe /k "start /b /low /affinity 0x20 mfaktc-win-64.exe"

cmd.exe /k "start /b /low /affinity 0x10 mfaktc-win-64.exe"

cmd.exe /k "start /b /low /affinity 0x08 mfaktc-win-64.exe"
[/QUOTE]
Back to Windows for the time being: While I don't run a web server and can't play Minecraft, I can play all things Steam, such as TF2. Anyways kladner, I've discovered the problem with this is that ^C's are sent to cmd.exe, not mfaktc -- so I get a new prompt, which eventually gets overridden by an mfaktc class output. Just so I could say I tried, 'mfaktc-win-64.exe /affinity 0x80' doesn't work -- no surprises there. I'm running this from a shortcut. Edit: Just tried 'start /b /low /affinity 0x08 mfaktc-win-64.exe', but it again said 'Target not found'. Unfortunate.
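
As an aside on the /affinity values: the argument is a hexadecimal bit mask in which bit n stands for logical core n, counting from 0. A short Python sketch to convert between masks and core numbers (the function names are made up for the example):

```python
def cores_from_mask(mask):
    """List the zero-based core numbers selected by an affinity bit mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def mask_from_cores(cores):
    """Build the affinity bit mask for a list of zero-based core numbers."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


for m in (0x20, 0x10, 0x08, 0x80):
    print(hex(m), "->", cores_from_mask(m))
```

So 0x20, 0x10 and 0x08 pin the three instances to cores 5, 4 and 3, and 0x80 would be core 7.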

kladner 2011-12-25 05:48

[QUOTE]Edit: Just tried 'start /b /low /affinity 0x08 mfaktc-win-64.exe', but it again said 'Target not found'.[/QUOTE]

Are you sure you're in the right directory? Check the Properties of the shortcut and look at the "Start in:" line. I have not tried the shortcut approach. I would have thought that the command line in the quote above might have worked.

Yes. Ctrl-C does not get through to mfaktc in the batch scenario. Ctrl-Break will stop it. Probably not ideal, but I don't stop mfaktc, or let it stop, that often.

kladner 2011-12-25 06:02

[QUOTE=Radikalinsky;283462]@Kladner,
I manually submitted both lines. Maybe it is because with partial tests the primenet server makes some assumptions. But as I understand it, the primenet server just does not yet understand all the details of the mfaktc message, for both 0.17 and 0.18.

Thanks, Rad[/QUOTE]

OK. It seems to work in some situations. The lines I quoted above were from results.txt.

This is what shows in my Results Details page on the server:

[CODE]Manual testing 52279247 F 2011-12-24 05:43 0.0 1525757169405396xxxxx 6.2631

Manual testing 52279247 NF 2011-12-24 05:43 0.0 no factor for M52279247 from 2^69 to 2^70 [mfaktc 0.18 barrett79_mul32] 2.2870
[/CODE]
When I encountered the "partial credit" situation, the two lines were submitted separately, in two different uploads. The second upload didn't produce any further results.

These are only attempts at deduction on my part. But PrimeNet does not seem to balk at all Factor Found reports from mfaktc 0.18.

