mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

James Heinrich 2017-08-18 17:32

That will send the output to mfa.txt and then to the screen once mfaktc is finished. Not simultaneously.

chalsall 2017-08-18 18:23

[QUOTE=kriesel;465817]The plot thickens; the -append option is not present in command line tee help on Win7, but is in Win10. I think I'm well off the thread topic now so won't go into it any further.[/QUOTE]

Still, this is an interesting subject...

Like I said, I don't use Winblows (several of my clients do). Does the Winblows shell have the "man" command? For example, under Linux in a shell you can type "man tee" and get [URL="http://man7.org/linux/man-pages/man1/tee.1.html"]documentation[/URL]. Everything from userspace down to deep system functions for programmers. [URL="http://man7.org/linux/man-pages/man3/printf.3.html"]printf()[/URL] or [URL="http://man7.org/linux/man-pages/man2/fork.2.html"]fork()[/URL], for example.

One thing to note: at least under Linux (and the non-free Unixes of the past), the options for appending with tee are either "-a" or "--append". Please note the double dash for the latter.

I have to say I find it a bit amusing that Winblows is finally catching up with Unix for those who script.

chalsall 2017-08-18 18:28

[QUOTE=James Heinrich;465876]That will send the output to mfa.txt and then to the screen once mfaktc is finished. Not simultaneously.[/QUOTE]

That depends on how much data is written to STDOUT (and the associated buffer size), and if the program uses [URL="http://man7.org/linux/man-pages/man3/fflush.3.html"]fflush[/URL] on the stream. :wink:
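To illustrate the buffering point, here is a minimal Python sketch of a tee-like helper that flushes after every line, so the file and the screen stay in step. The name `tee_lines` is made up for illustration and is not part of tee or mfaktc. It only covers the consuming side: if the program producing the output buffers its own STDOUT and never flushes, nothing downstream can show the data any earlier.

```python
import io
import sys


def tee_lines(source, log_file, screen=sys.stdout):
    """Copy each line from source to both log_file and screen.

    Both targets are flushed after every line, so neither one lags
    behind waiting for a buffer to fill. Returns the number of lines copied.
    """
    count = 0
    for line in source:
        log_file.write(line)
        log_file.flush()   # push the line to the file right away
        screen.write(line)
        screen.flush()     # and to the screen at the same moment
        count += 1
    return count


if __name__ == "__main__":
    # Example: some_program | python tee_sketch.py   (appends to mfa.txt)
    with open("mfa.txt", "a") as log:
        tee_lines(sys.stdin, log)
```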

kriesel 2017-09-12 01:14

mfaktc failing to run on Geforce GTX1070
 
The Windows 64-bit CUDA6.5 V0.21 Feb-5-2015 version, or the V0.20 equivalent, produces an error early in the self-test.

mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 300s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID Kriesel
ComputerID (none)
AllowSleep yes
TimeStampInResults no

CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 8.0

CUDA device info
name GeForce GTX 1070
compute capability 6.1
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 15
clock rate (CUDA cores) 1708MHz
memory clock rate: 4004MHz
memory bus width: 256 bit

Automatic parameters
threads per grid 983040
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

########## testcase 1/2867 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59 GHz-days)
Using GPU kernel "75bit_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Sep 11 19:36 | 3387 0.1% | 0.001 n.a. | n.a. 82485 n.a.%
ERROR: cudaGetLastError() returned 8: invalid device function

CUDALucas and CUDAPm1 run fine on the same gpu.
Another GPU (a GTX480) runs mfaktc 0.20 just fine.

Ideas?

MrRepunit 2017-09-12 05:52

[QUOTE=kriesel;467596]
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 8.0
[/QUOTE]

You need to compile mfaktc for CUDA 8 by adding
[CODE]NVCCFLAGS += --generate-code arch=compute_61,code=sm_61 # CC 6.x GPUs will use this code
[/CODE]to the makefile and install the CUDA 8 SDK.

Edit: I cannot provide Windows binaries (only Linux), but somebody has probably uploaded one in this thread.

Hope this helps.

storm5510 2017-09-12 16:21

[QUOTE=kriesel;467596]Windows 64-bit [B]CUDA6.5[/B] V0.21 Feb-5-2015 version
Or the V0.20 equivalent, produce an error, early in self-test.

mfaktc v0.21 (64bit built)

[/QUOTE]

Is there a special reason you are using CUDA 6.5 instead of 8?

kriesel 2017-09-12 18:20

[QUOTE=MrRepunit;467605]You need to compile mfaktc for CUDA 8 by adding
[CODE]NVCCFLAGS += --generate-code arch=compute_61,code=sm_61 # CC 6.x GPUs will use this code
[/CODE]to the makefile and install the CUDA 8 SDK.

Edit: I cannot provide Windows binaries (only Linux), but somebody has probably uploaded one in this thread.

Hope this helps.[/QUOTE]

Thanks for the responses.

There are a number of precompiled versions for CUDA 4.2, 6.5, or 8.0, available for Mfaktc at [URL]http://www.mersennewiki.org/index.php/Mfaktc#Resources[/URL]

It's my understanding that a CUDA 8 capable driver is able to support many earlier versions of software and cards of lower compute capability.

If I had the reverse situation, a CUDA 6.5 capable driver, and software compiled to require at least CUDA 8 driver, I would need to upgrade the driver.

On other software, I have run as low as V4.0 dlls and software with CUDA 8 capable drivers on this GPU and other gpus. Generally, an exact match is not a requirement, backward compatibility over a wide range is provided. For example, a CUDA 5.0 version of CUDAPm1 runs fine on the same gpu and CUDA 8.0 capable driver:
CUDAPm1 v0.20
Warning: Couldn't parse ini file option UnusedMem; using default.
------- DEVICE 0 -------
name GeForce GTX 1070
Compatibility 6.1
clockRate (MHz) 1708
memClockRate (MHz) 4004
totalGlobalMem 8589934592
totalConstMem 65536
l2CacheSize 2097152
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 15
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1

CUDA reports 7991M of 8192M GPU memory free.
Index 73
Using threads: norm1 32, mult 32, norm2 32.
Using up to 4360M GPU memory.
Selected B1=1010000, B2=32572500, 5.78% chance of finding a factor
Starting stage 1 P-1, M91001161, B1 = 1010000, B2 = 32572500, fft length = 5120K

Both the 32-bit CUDA5.5 and the 64-bit CUDA6.0 builds of CUDALucas run on it too. (In fact I've benchmarked it on all 17 flavors of the May 5 2017 2.06beta.)

CUDALucas v2.06beta 32-bit build, compiled May 5 2017 @ 12:32:36

binary compiled for CUDA 5.50
CUDA runtime version 5.50
CUDA driver version 8.0

------- DEVICE 0 -------
name GeForce GTX 1070
UUID **64-bit only on Windows**
ECC Support? Disabled
Compatibility 6.1
clockRate (MHz) 1708
memClockRate (MHz) 4004
totalGlobalMem 4294967295
totalConstMem 65536
l2CacheSize 2097152
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 15
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1
pciDeviceID 0
pciBusID 3

You may experience a small delay on 1st startup to due to Just-in-Time Compilation

Using threads: square 256, splice 128.
Starting M79341173 fft length = 4608K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Jun 12 21:43:10 | M79341173 50000 0x5670ca9237d7c904 | 4608K 0.05273 5.6778 283.89s | 5:05:03:23 0.06% |
| Jun 12 21:48:08 | M79341173 100000 0xc7deb1ca3091a0ff | 4608K 0.04785 5.9326 296.63s | 5:07:46:55 0.12% |



batch wrapper reports CUDALucas2.06beta-CUDA6.0-Windows-x64 -d 0(re)launch at Sat 09/02/2017 13:12:35.10

CUDALucas v2.06beta 64-bit build, compiled May 5 2017 @ 12:59:32

binary compiled for CUDA 6.0
CUDA runtime version 6.0
CUDA driver version 8.0

------- DEVICE 0 -------
name GeForce GTX 1070
UUID GPU-9b15b648-ccfe-f878-b7cb-2bba3cffd5b1
ECC Support? Disabled
Compatibility 6.1
clockRate (MHz) 1708
memClockRate (MHz) 4004
totalGlobalMem 8589934592
totalConstMem 65536
l2CacheSize 2097152
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 2048
multiProcessorCount 15
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1
pciDeviceID 0
pciBusID 3

You may experience a small delay on 1st startup to due to Just-in-Time Compilation

Using threads: square 256, splice 128.
Starting M75316289 fft length = 4096K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| Sep 02 13:20:44 | M75316289 100000 0xef47fad89747c3f4 | 4096K 0.23438 4.8794 487.94s | 4:05:56:54 0.13% |
| Sep 02 13:28:52 | M75316289 200000 0x26966af002b3846b | 4096K 0.21875 4.8795 487.95s | 4:05:48:51 0.26% |
| Sep 02 13:37:00 | M75316289 300000 0x94eeb2ce0af176ef | 4096K 0.21875 4.8800 488.00s | 4:05:40:55 0.39% |

I can, and will, try a CUDA 8 version of Mfaktc on this setup, though.

Usually I run Mersenne code built for about CUDA 6.5, because on most of my gpus that is faster most of the time.

storm5510 2017-09-17 15:45

I had the errors below occur over a 30 minute period yesterday evening:

[CODE]ERROR: cudaGetLastError() returned 4: unspecified launch failure
ERROR: cudaGetLastError() returned 30: unspecified launch failure[/CODE]

Not knowing the exact source, I restarted the machine and then updated the drivers. This 'appears' to have solved the problem. I have been running [I]mfaktc[/I] ten months and this is the first issue to arise. The hardware is a GTX-480 with Windows 10 Pro, x64.

Does anyone have any ideas regarding the cause?

kriesel 2017-09-18 13:12

[QUOTE=storm5510;467953]I had the errors below occur over a 30 minute period yesterday evening:

[CODE]ERROR: cudaGetLastError() returned 4: unspecified launch failure
ERROR: cudaGetLastError() returned 30: unspecified launch failure[/CODE]Not knowing the exact source, I restarted the machine and then updated the drivers. This 'appears' to have solved the problem. I have been running [I]mfaktc[/I] ten months and this is the first issue to arise. The hardware is a GTX-480 with Windows 10 Pro, x64.

Does anyone have any ideas regarding the cause?[/QUOTE]

What version were you running when these occurred?

storm5510 2017-09-18 16:03

[QUOTE=kriesel;468029]What version were you running when these occurred?[/QUOTE]

v0.21 running an exponent in the 149M range.

monst 2017-09-18 19:13

Request for new binaries
 
[STRIKE]Are there Windows binaries of mfaktc available for CUDA 9.0?[/STRIKE]

Disregard this request. I had a hiccup while upgrading an NVIDIA driver and it wiped out the CUDA runtime. A clean install of the driver got me back to CUDA 8.0.

storm5510 2017-09-19 05:08

1 Attachment(s)
[QUOTE=monst;468048][STRIKE]Are there Windows binaries of mfaktc available for CUDA 9.0?[/STRIKE]

Disregard this request. I had a hiccup while upgrading an NVIDIA driver and it wiped out the CUDA runtime. A clean install of the driver got me back to CUDA 8.0.[/QUOTE]

This is a snip from a startup.

TheJudger 2017-09-19 18:06

[QUOTE=storm5510;467953]I had the errors below occur over a 30 minute period yesterday evening:

[CODE]ERROR: cudaGetLastError() returned 4: unspecified launch failure
ERROR: cudaGetLastError() returned 30: unspecified launch failure[/CODE]

Not knowing the exact source, I restarted the machine and then updated the drivers. This 'appears' to have solved the problem. I have been running [I]mfaktc[/I] ten months and this is the first issue to arise. The hardware is a GTX-480 with Windows 10 Pro, x64.

Does anyone have any ideas regarding the cause?[/QUOTE]

[U]One possibility[/U]: aging hardware :sad:

Oliver

kladner 2017-09-19 21:09

Blown capacitors are a frequent culprit. I had a card blow a cap. Fortunately, it was under warranty. I have replaced caps in an LCD monitor, with guidance from a YouTube tutorial.

"Blown" is descriptive in this case. Parts fail by spewing electrolyte. Failed parts are usually easy to spot.

storm5510 2017-09-20 04:43

[QUOTE=TheJudger;468110][U]One possibility[/U]: aging hardware :sad:

Oliver[/QUOTE]

This machine, with the exception of the GTX-480, is three months old.

After my initial posting about these errors, I noticed a Notepad document was open. It was the [I]results.txt[/I] file. It had been loaded for several hours. I was under the impression that Notepad opens a 'copy' of the original, and then closes it. This must not be the case. I rarely open it with [I]mfaktc[/I] running, but sometimes, I do when I see the program will not attempt a 'write' operation for a few minutes.

Bottom line, I believe I created my own error by not paying attention. After the restart, and the driver update, these errors did not occur again. More than likely, the update was not needed, but it had been a while since I updated the drivers. I always get them from NVidia.

AK76 2017-10-19 21:55

The same sample on ASUS 1080 Ti ROG Strix OC

CUDA version info
binary compiled for CUDA 8.0
CUDA runtime version 8.0
CUDA driver version 9.10

CUDA device info
name GeForce GTX 1080 Ti
compute capability 6.1
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 28
clock rate (CUDA cores) 1683MHz
memory clock rate: 5505MHz
memory bus width: 352 bit

Automatic parameters
threads per grid 917504
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144


got assignment: exp=66362159 bit_min=75 bit_max=76 (115.31 GHz-days)
Starting trial factoring M66362159 from 2^75 to 2^76 (115.31 GHz-days)
k_min = 284642124606840
k_max = 569284249220360
Using GPU kernel "barrett76_mul32_gs"

Date Time | class Pct | time ETA | GHz-d/day Sieve Rate (M/s)
Oct 19 22:52 | 9 0.3% | 6.507 1h43m | 1594.85 82485 9468.4

storm5510 2017-10-22 00:32

[QUOTE=AK76;470123]...
CUDA device info
name [COLOR=Blue][B]GeForce GTX 1080 Ti[/B][/COLOR]
compute capability 6.1...
[/QUOTE]

I considered going this route a while back but I will not. It is not so much the cost as it is the power consumption. Since I started using my old GTX 480 my utility costs have risen. Based on the APC data, it is using 220W. The GTX 1080 is rated at 180W. This is what I plan to buy within the next month. Yes, the performance is lower than the "Ti." This does not concern me. I simply want a little less load on my APC and still have decent performance. :smile:

kriesel 2017-11-03 16:20

Minimum exponent
 
Just ran across the following in V0.20 mfaktc on win 64.

[URL]http://www.mersennewiki.org/index.php/Mfaktc[/URL] appears to be in error about exponent minimum, stating 100000 (10^5) while program v0.20 with -v 1 or higher warns about minimum exponent 1000000 (10^6). With -v 0 it gives no indication why it's skipping worktodo entries. -v 1 output example:

WARNING: exponents < 1000000 are not supported!
Ignoring TF M999773 from 2^53 to 2^60!
WARNING: ignoring line 1 in "worktodo.txt"! Reason: invalid data
got assignment: exp=1000541 bit_min=66 bit_max=70 (224.06 GHz-days)
Starting trial factoring M1000541 from 2^66 to 2^67 (14.94 GHz-days)
k_min = 36873539559780
k_max = 73747079125031
Using GPU kernel "barrett76_mul32_gs"

Would someone with v0.21 installed please try to duplicate and post here whether its behavior differs regarding lower exponent limit or warning when -v is set to 0?

Who maintains the mersennewiki page for mfaktc?

LaurV 2017-11-04 02:42

There are different compilations of mfaktc 0.18, 0.20, and 0.21, some of them going as low as 10k, or (the 0.18 "bcp" version) even to 2k. The official 0.21 which I use goes down to 100k. You should not decrease the limit if you don't know what you are doing. I mean, it is not as simple as changing the definition in the ".h" file and recompiling; there are a couple of problems associated with it. For example, if the exponent goes down, the number of candidates for the same bitlevel increases: all factors are of the form q=2kp+1, where p is the exponent in m=2^p-1, and when p gets smaller, there are more numbers of that form in the interval between 2^n and 2^(n+1). This creates problems with sieving and storing them. For example, if p=101, then 2p+1 is 203, and if you search for 12-bit factors of 2^101-1 (i.e. factors between 2^11 and 2^12) there are 2^11/203=2048/203=~10 candidates to look at, but if you search for 12-bit factors of 2^67-1, then 2p+1 is 135, and you will have 2048/135=~15 candidates to look at. That is why you need a longer time to factor the same bitlevel when the exponent is smaller. If the exponent is too small and the bitlevel too high, you will not have memory in the GPU to store all candidates for sieving, and you either need to sieve with the CPU (slow), or implement a different segmented GPU siever.
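The candidate arithmetic above can be checked with a short Python sketch. It counts every k for which q=2kp+1 falls in a given bit range, before any sieving or class filtering. The function names are made up for illustration and have nothing to do with mfaktc's internals.

```python
def candidate_k_range(p, bits):
    """Return (k_min, k_max) with 2^(bits-1) <= 2*k*p + 1 < 2^bits."""
    lo = 1 << (bits - 1)
    hi = 1 << bits
    step = 2 * p
    k_min = max(1, -(-(lo - 1) // step))  # ceiling division, k starts at 1
    k_max = (hi - 2) // step              # largest k with 2*k*p + 1 <= hi - 1
    return k_min, k_max


def candidate_count(p, bits):
    """Number of raw candidates q = 2kp + 1 that have exactly `bits` bits."""
    k_min, k_max = candidate_k_range(p, bits)
    return max(0, k_max - k_min + 1)


if __name__ == "__main__":
    for p in (101, 67):
        print(p, candidate_k_range(p, 12), candidate_count(p, 12))
```

For 12-bit candidates this gives 10 for p=101 and 15 for p=67, in line with the rough division in the post: the smaller exponent has more candidates in the same bit range.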

Also, when sieving, additional precautions need to be in place to avoid eliminating factors. For example, if you sieve the classes with the first 10 million primes (this means primes under about 180M), you need to test those primes themselves before using them for sieving, because one of them may be a factor of the number you are trying to factor, and it would be lost by sieving.

In fact, I tried in the past to "push" Oliver into eliminating a couple of restrictions from mfaktc, but he was busy with new things and didn't really listen to the old guy on the other side of the globe...

For example, the most important restriction, that the exponent must be prime, should be eliminated (replaced with a warning, letting the user decide; right now, the program aborts). This may help us to factor some small Mersenne numbers with composite exponents from FDB (not of interest for GIMPS, but of interest for other people; see former discussions around here). The drawback is that, in the case of composite exponents, some factors are not of the form that mfaktc is searching for, and they will be missed. But we do not report "no factors" for these exponents anywhere, so it doesn't matter, as long as other factors may be found. Oliver, however, does not like to launch in the wild a program that can miss factors :razz:

Another important restriction is related to the lower limit of the exponent (the reason for this post). By allowing a lower limit in the current version of the program, assuming you have memory in your GPU to store the classes for sieving, or assuming that you sieve with the CPU, all that can happen is that you miss a factor which is [U]under 29 bits[/U]. This I can prove. But the factors of under 29 bits have most probably all been found already, and there is no reason why lowering the exponent limit should be delayed any longer. Again, Oliver will argue that such a program will miss factors (assuming you start factoring some very low exponent from scratch, without any previous knowledge about it from the GIMPS database).

Time will tell...

storm5510 2017-11-04 06:04

[QUOTE=kriesel;470947]
Would someone with v0.21 installed please try to duplicate and post here whether its behavior differs regarding lower exponent limit or warning when [B]-v[/B] is set to 0?
[/QUOTE]

There is nothing in [I]mfaktc.ini[/I] for v0.21 which references "-v."

kriesel 2017-11-07 03:17

[QUOTE=storm5510;471011]There is nothing in [I]mfaktc.ini[/I] for v0.21 which references "-v."[/QUOTE]

That's true of the ini file and readme.txt for v0.20 also. But it is listed if you run mfaktc -h.
Try that.

storm5510 2017-11-07 03:48

[QUOTE=kriesel;471215]That's true of the ini file and readme.txt for v0.20 also. But it is listed if you run mfaktc -h.
Try that.[/QUOTE]

I knew there were some command-line switches, but did not know how to list them. Thanks! :smile:

storm5510 2017-11-07 04:57

[QUOTE=kriesel;470947]...Would someone with v0.21 installed please try to duplicate and post here whether its behavior differs regarding lower exponent limit or warning when -v is set to 0?[/QUOTE]

There is a "no-classes" variant of [I]mfaktc[/I] on James Heinrich's [I]mersenne.ca[/I] site which updates the screen at intervals of 1.0% to 1.1% instead of 0.1%. It is v0.21. Not sure about anything for CUDA 6.5. The link is on his assignment page.

I tried the -v switches on my hardware. They had no effect. Of course, it is a bit out-of-date.

James Heinrich 2017-11-07 05:14

[QUOTE=storm5510;471234]There is a "no-classes" variant of [I]mfaktc[/I][/QUOTE]I believe you mean the "less-classes" version, which is available in the "extra versions" zip file:
[url]http://download.mersenne.ca/mfaktc/mfaktc-0.21[/url]
[url]http://mersenneforum.org/mfaktc/mfaktc-0.21/[/url]
The regular version of mfaktc uses 4620 classes, the LessClasses version only uses 420.

storm5510 2017-11-09 02:56

[QUOTE=James Heinrich;471236]I believe you mean the "less-classes" version, which is available in the "extra versions" zip file:
[URL]http://download.mersenne.ca/mfaktc/mfaktc-0.21[/URL]
[URL]http://mersenneforum.org/mfaktc/mfaktc-0.21/[/URL]
The regular version of mfaktc uses 4620 classes, the LessClasses version only uses 420.[/QUOTE]

I apologize for the error. :blush:

I was not aware there is a fixed number of classes in each. Thank you for the correction and the information. :smile:

Rodrigo 2017-11-22 18:13

I'm setting up TF on a GeForce GTX 1050 in a brand-new computer running Kubuntu. I've downloaded and extracted the mfaktc-0.21 TAR file.

Below is the output. What do I need to do to get this working?

[CODE]mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 30s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 9.0

CUDA device info
name GeForce GTX 1050
compute capability 6.1
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 5
clock rate (CUDA cores) 1455MHz
memory clock rate: 3504MHz
memory bus width: 128 bit

Automatic parameters
threads per grid 655360
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]

I'm new at Linux but the idea is to make it my main, work computer over time. The more details, the better. :smile:

MrRepunit 2017-11-22 19:21

[QUOTE=Rodrigo;472259]
Below is the output. What do I need to do to get this working?

[CODE]
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 9.0

CUDA device info
name GeForce GTX 1050
[/CODE][/QUOTE]

Simple answer: you need to download the CUDA 8 version as the 10xx GPUs do not support older CUDA compilations.

kriesel 2017-11-22 19:42

[QUOTE=Rodrigo;472259]I'm setting up TF on a GeForce GTX 1050 in a brand-new computer running Kubuntu. I've downloaded and extracted the mfaktc-0.21 TAR file.

Below is the output. What do I need to do to get this working?

[CODE]mfaktc v0.21 (64bit built)

...
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 9.0

CUDA device info
name GeForce GTX 1050
compute capability 6.1
...

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]I'm new at Linux but the idea is to make it my main, work computer over time. The more details, the better. :smile:[/QUOTE]

Start by checking the -d value in your command line. If you are trying to use the first or only gpu in the system, that's device zero not one. Other things to check are that the gpu driver installed successfully and is the NVIDIA supplied driver, not the one linux will install. (On debian, I found it easier to start over with a Windows install than to resolve that driver type in linux.) In some cases, mfaktc requires a match of the CUDA level between dlls and driver. Review past few pages of posts in this thread for other ideas.

On a gtx1070, I found I needed the cuda levels to match, as follows.
CUDA version info
binary compiled for CUDA 8.0
CUDA runtime version 8.0
CUDA driver version 8.0
That was accomplished by running the less-classes-cuda8 version of mfaktc.

On a gtx480, no such limitation:
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 8.0

CUDA device info
name GeForce GTX 480
compute capability 2.0
maximum threads per block 1024
number of multiprocessors 15 (480 shader cores)
clock rate 1451MHz

Rodrigo 2017-11-22 20:20

Thank you @MrRepunit and @kriesel for the valuable info.

I checked and the NVIDIA driver had in fact been properly installed, so I went ahead and downloaded the CUDA 8 version of mfaktc. So far, it's running great, yielding about 248 Ghz-d/day.

bayanne 2017-11-24 06:24

Can anyone help in getting mfaktc compiled for Mac for me?

kriesel 2017-11-24 20:17

[QUOTE=kriesel;472264] In some cases, mfaktc requires a match of the CUDA level between dlls and driver. [/QUOTE]

Well, not exactly. That was my imprecise interpretation of what I've seen here.

Executable and dlls must match on CUDA levels.
Driver must support at least the CUDA level of the executable and dlls.

Driver must support the compute capability and card type used. (Note v9 driver drops some older types, and later drivers are likely to drop more.)
Only one NVIDIA driver can be loaded at a time, on Windows at least. This can become an issue for systems with a mix of gpu models installed, limiting maximum driver version as well as minimum driver version.
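As a rough illustration only, the rules above can be written as a small Python check. The function name, the tuple encoding of versions, and above all the `built_for` set (which compute capabilities a given build carries GPU code for) are assumptions made for this sketch; the program output does not list them, and real CUDA has further compatibility paths (such as JIT compilation from PTX) that are ignored here.

```python
def cuda_setup_problems(binary_cuda, runtime_cuda, driver_cuda, device_cc, built_for):
    """Return a list of rule violations; an empty list means none were found.

    Versions and compute capabilities are (major, minor) tuples.
    built_for is the set of compute capabilities the build is assumed
    to contain GPU code for (a hypothetical input, not reported by mfaktc).
    """
    problems = []
    if binary_cuda != runtime_cuda:
        problems.append("executable and runtime DLLs are from different CUDA levels")
    if driver_cuda < runtime_cuda:
        problems.append("driver is older than the CUDA level of the executable")
    if device_cc not in built_for:
        problems.append(
            "no GPU code in the build for compute capability %d.%d" % device_cc
        )
    return problems


# GTX 1070 cases from this thread; both built_for sets are guesses for illustration.
old_build = cuda_setup_problems((6, 5), (6, 5), (8, 0), (6, 1),
                                {(2, 0), (3, 0), (3, 5), (5, 0)})
new_build = cuda_setup_problems((8, 0), (8, 0), (8, 0), (6, 1), {(6, 1)})

if __name__ == "__main__":
    print(old_build)
    print(new_build)
```

Under those assumptions the CUDA 6.5 build is flagged only for lacking code for compute capability 6.1, which fits the "invalid device function" error, while the CUDA 8 build passes all three checks.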

Mfaktc seems to be more particular than CUDALucas or CUDAPm1, in my experience, on GTX10xx. Mfaktc almost runs with CUDA6.5 executables and dlls on a GTX1070 with a CUDA 8 driver:
########## testcase 1/2867 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59 GHz-days)
Using GPU kernel "75bit_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Sep 11 19:36 | 3387 0.1% | 0.001 n.a. | n.a. 82485 n.a.%
ERROR: cudaGetLastError() returned 8: invalid device function
(program terminates)


I've run benchmarks from CUDA 4.0 up to 8.0 executables on GTX10xx in CUDALucas, and in CUDAPm1, V0.20 CUDA level 5.0 on GTX1070, CUDA 5.5 on GTX1050Ti, CUDA 5.0 on GTX480; mfaktc CUDA 4.2, 6.5 and 8.0 executables on GTX480.

This whole question was a useful review for me, since I've been contemplating branching out from running entirely LL on the GTXxxxx to Mfaktc and CUDAPm1 also. If it saves someone else a little time getting theirs going, so much the better.

bayanne 2017-11-25 06:32

I am running CUDA 8.0.90, GPU driver 10.10.14
I had run mfaktc on a Mac in the past with CUDA 6, but wish to run it afresh on NVIDIA GeForce GTX 775M. Not the fastest card about I know, but I 'may' have the opportunity of a breakout box fitted with a 1080ti.

Do I gather that there is NO ONE running mfaktc on a Mac at present?

Does anyone have the compiling skills to help me out?

Luis 2017-11-30 18:43

Hello! Finally I started TF'ing my exponent by mfaktc-0.21. :)

[URL]http://www.mersenneforum.org/showthread.php?t=21404&highlight=750ti[/URL]


I reported the [I]results.txt[/I] file (from 2^76 to 2^77) and got granted 45.8941 GHz-days. Now I'll run from 2^77 to 2^81. ;)

I'm just wondering: how do you know if someone cheats reporting "no factor" for non-tested exponents? Is there something that I didn't understand?

James Heinrich 2017-11-30 18:56

You don't.
It's very easy to check if someone reports a factor whether it's true or not.
It's impossible to know if there's really no factor in that range without running the whole test again (and then how do we know the person who did the double-check is also telling the truth...)
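To show why a reported factor is so cheap to check, here is a minimal Python sketch; the function names are made up for illustration. A claimed factor q of 2^p-1 is genuine exactly when 2^p leaves remainder 1 modulo q, which is a single modular exponentiation. The second helper checks the known form of such factors for odd prime p: q=2kp+1 and q is 1 or 7 mod 8. A "no factor" report offers nothing comparable to test.

```python
def divides_mersenne(p, q):
    """True if q divides 2^p - 1, using modular exponentiation."""
    return q > 1 and pow(2, p, q) == 1


def has_expected_form(p, q):
    """True if q = 2kp + 1 for some k >= 1 and q is 1 or 7 mod 8."""
    return q > 1 and (q - 1) % (2 * p) == 0 and q % 8 in (1, 7)


if __name__ == "__main__":
    # 2^11 - 1 = 2047 = 23 * 89, a small textbook case
    for q in (23, 89, 47):
        print(q, divides_mersenne(11, q), has_expected_form(11, q))
```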

In my opinion the only way to minimize the problem is to not give credit for TF that doesn't find factors. Just credit finding factors only, make no public record of how much no-factor TF any user does, and there's no incentive to submit false results.

That of course assumes that false no-factor reports in question are malicious. If there is a false report of no-factor in a range due to hardware or software error that's another problem entirely.

In practice, lower ranges of still-unfactored exponents might get another pass of TF in a few decades when the realtime involved is trivial. Previously-missed TF factors have been found in the past by random re-checks.

kriesel 2017-11-30 19:14

[QUOTE=James Heinrich;472780]
Previously-missed TF factors have been found in the past by random re-checks.[/QUOTE]

Also P-1 factoring may turn up a missed trial factor.

Mark Rose 2017-11-30 20:00

[QUOTE=James Heinrich;472780]In my opinion the only way to minimize the problem is to not give credit for TF that doesn't find factors. Just credit finding factors only, make no public record of how much no-factor TF any user does, and there's no incentive to submit false results.[/QUOTE]

I'd be in favour of this change with regards to credit given. I think having a public record of work done is useful though because it has proven useful in the past to redo TF work done by a dodgy machine (I found many factors).

If people want lots of results, they could easily do the high exponents and find factors more quickly, so hiding no-factor work probably wouldn't help much.

James Heinrich 2017-11-30 20:19

[QUOTE=Mark Rose;472786]If people want lots of results, they could easily do the high exponents and find factors more quickly[/QUOTE]Such as my [URL="http://www.mersenne.ca/tf1G.php"]TF>1000M subproject[/URL] where you can still find 1-2 factors per minute in the current ranges. :smile:

Luis 2017-12-01 13:15

Not getting credit will not prevent malicious users from messing your databases up. This is my fear.


Some questions:

1) Can I pause trial factoring? Ctrl+C? There is "a bit" of video lag.
2) How do I skip TF in P95 once my GPU has completed all the remaining TF tests? I thought PrimeNet would have aborted the TF on my CPU (currently paused).
3) How can I run TF for >1000M exponents? Is manual assignment enough?

James Heinrich 2017-12-01 13:45

1) You can pause TF at any time. The easiest way (on Windows at least) is to hit the Pause/Break key (i.e. the one above PageUp on your keyboard), which will actually pause it; any other key will resume it. Ctrl-C will terminate mfaktc (after it finishes the current class), and it will write a checkpoint so it can resume from there next time you start it (assuming Checkpoints=1 in mfaktc.ini).

2) If you have been (somehow?) assigned an assignment that is insufficiently TF'd and Prime95 wants to do more TF on it (this would surprise me greatly -- what exponent is it?) but you are doing the extra TF on your GPU, you can close Prime95, edit the Prime95 worktodo.txt and change the current TF limit to the new limit that you have TF'd on your GPU, then restart Prime95. But I'm curious as to what assignment you have been assigned that would make Prime95 do TF?

3) Go to [url]http://www.mersenne.ca/tf1G.php[/url] and click "Get Assignments" at the top. Read all the tips in the box on the right. You can do manual assignments if you choose, but I would highly recommend the provided "simple Windows batch file for automated work fetch/submit" that's posted there because there's a huge amount of churn, assignments only take a very few seconds so you can easily go through tens of thousands of assignments per day.

Luis 2017-12-01 14:49

1) I'm on Linux, so I will try Ctrl+C. Checkpoints=1 in my mfaktc.ini.

2) A world record LL test: [URL]https://www.mersenne.org/report_exponent/?exp_lo=333467413&full=1[/URL]

3) Maybe I can write a bash script. However, I don't plan to run 24/7 for now.

James Heinrich 2017-12-01 15:03

1) Pause/Break may work on linux as well, I don't know, but it's worth a quick test.
3) The Windows batch file should be pretty trivial to translate into a bash script. It basically just runs mfaktc in a loop and when it runs out of work it uses curl and wget to send results and retrieve new assignments. If you do translate it please send me a copy so I can make it available for others.

On a side note, besides it being my pet project, I find running small TF assignments works very well with my GPU, zero impact in normal use; I only pause it for some games, whereas TF in the normal ~80M range to higher bit depths causes noticeable GUI lag, no matter what mfakto.ini settings I set.

kriesel 2017-12-01 16:34

[QUOTE=James Heinrich;472855]1) Pause/Break may work on linux as well, I don't know, but it's worth a quick test.
3) The Windows batch file should be pretty trivial to translate into a bash script. It basically just runs mfaktc in a loop and when it runs out of work it uses curl and wget to send results and retrieve new assignments. If you do translate it please send me a copy so I can make it available for others.

On a side note, besides it being my pet project, I find running small TF assignments works very well with my GPU, zero impact in normal use; I only pause it for some games, whereas TF in the normal ~80M range to higher bit depths causes noticeable GUI lag, no matter what mfakto.ini settings I set.[/QUOTE]

Hmm, on most of my laptop keyboards, not only is the pg up key in the top row of keys, but there is no key present with label containing pause or break. I use one such laptop to manage many GPU-equipped systems via remote desktop.

James Heinrich 2017-12-01 16:46

[QUOTE=kriesel;472859]Hmm, on most of my laptop keyboards, not only is the pg up key in the top row of keys, but there is no key present with label containing pause or break. I use one such laptop to manage many GPU-equipped systems via remote desktop.[/QUOTE][url]https://en.wikipedia.org/wiki/Break_key#Keyboards_without_Break_key[/url]

kriesel 2017-12-01 16:53

Mfaktc throughput variables
 
1 Attachment(s)
Hi,

Finally got around to investigating Mfaktc consistently indicating 98% GPU usage in GPU-Z, rather than the 100% that CUDAPm1 and CUDALucas typically produce. There are slight gains in total throughput to be had by running multiple instances. There are confounding factors, including:

Definitely:
Exponent being run (substantial effect)
Factoring depth being run (minor effect)
Overhead of running the screen display with the same GPU (varying effect)

Probably:
GPU clock throttling due to thermal limits

Potentially, at high exponent or instance count:
memory constraints (although mfaktc doesn't use much memory per instance at current wavefront exponents)

Load sharing for multiple instances appears to be not quite equal; they have slightly differing throughputs.

An operational advantage of running multiple instances is being able to stop one, add work, look at results, change ini file settings, etc, without noticeable loss of throughput; the others will essentially saturate the GPU while one instance is out of action temporarily. If one instance runs out of work the other running instances run a bit faster than if it hadn't, limiting lost total throughput.

Attached results are for a GTX480.

Mark Rose 2017-12-02 01:45

[QUOTE=Luis;472854]1) I'm on Linux. So i will try Ctrl+C. Checkpoints=1 in my mfaktc.ini.

2) A world record LL test: [URL]https://www.mersenne.org/report_exponent/?exp_lo=333467413&full=1[/URL]

3) Maybe I can write a bash script. However, I don't think to run 24/7 for now.[/QUOTE]

I wrote scripts that stop mfaktc on my primary GPU when I unlock my screensaver and resume on lock.

LaurV 2017-12-02 01:56

On a related note, has anybody tried if MISFIT would get assignments from James' site? I don't see why not, with the right settings in the config file, so you don't need batch files and all the stuff. There may be some troubles with the format of the returned file (maybe MISFIT won't parse it properly) but in this case, Scott is known to fix bugs just before we report them... (no joke, hehe, it happened in the past), just PM him.

James Heinrich 2017-12-02 02:01

James has also been known to be happy to modify his site as needed, and could easily tweak the output to be MISFIT-friendly if told what needs to be changed.

Dubslow 2017-12-02 02:51

Since it doesn't seem to have been mentioned, on Linux/most-or-all other Unices, Ctrl+Z in a terminal will pause the current running job (technically: move it to background), which can be resumed at any time with the command [c]fg[/c]. I use this frequently and for a wide variety of programs in use on this forum, including mfaktc back when I ran that.

LaurV 2017-12-02 03:08

[QUOTE=James Heinrich;472904]James has also been known to be happy to modify his site as needed, and could easily tweak the output to be MISFIT-friendly if told what needs to be changed.[/QUOTE]
:tu: Why not? We notified Scott, he can take over if his time allows, I know he is a busy guy right now... Maybe this could be seen like a "re-launch" of MISFIT, with some new publicity, that could attract some guyz and galz who want to find lots of factors, and want to give some utility to their long-forgot 580s.

Luis 2017-12-02 08:50

[QUOTE=Mark Rose;472898]I wrote scripts that stop mfaktc on my primary GPU when I unlock my screensaver and resume on lock.[/QUOTE]
I don't use screensaver at home because I prefer to use all the available resources for BOINC. I type Alt+Ctrl+F1 too, I think it should help, but I'm not sure. Maybe I should turn xorg off in some way.
Anyway could you link your command or script?

[QUOTE=Dubslow;472911]Since it doesn't seem to have been mentioned, on Linux/most-or-all other Unices, Ctrl+Z in a terminal will pause the current running job (technically: move it to background), which can be resumed at any time with the command [c]fg[/c]. I use this frequently and for a wide variety of programs in use on this forum, including mfaktc back when I ran that.[/QUOTE]
Good! You confirm what I found yesterday: [url]https://askubuntu.com/questions/277714/is-there-a-super-break-key-for-bash[/url]
I haven't tried it yet, though, because I was waiting for a new TF test to try those commands on. I didn't want to mess up a running test.

GP2 2017-12-02 10:19

[QUOTE=Dubslow;472911]Since it doesn't seem to have been mentioned, on Linux/most-or-all other Unices, Ctrl+Z in a terminal will pause the current running job (technically: move it to background), which can be resumed at any time with the command [c]fg[/c]. I use this frequently and for a wide variety of programs in use on this forum, including mfaktc back when I ran that.[/QUOTE]

No, it merely pauses the program.

It can be run in the background with the command [c]bg[/c]. This is the same as if the program had originally been started with [c]&[/c] at the end of the command line.

If you pause multiple programs you can run the command [c]jobs[/c] to see them as a numbered list. The [c]fg[/c] and [c]bg[/c] commands can specify a particular paused program out of the list, for instance [c]fg %1[/c]

Dubslow 2017-12-02 11:01

[QUOTE=GP2;472940]No, it merely pauses the program.

It can be run in the background with the command [c]bg[/c]. This is the same as if the program had originally been started with [c]&[/c] at the end of the command line.

If you pause multiple programs you can run the command [c]jobs[/c] to see them as a numbered list. The [c]fg[/c] and [c]bg[/c] commands can specify a particular paused program out of the list, for instance [c]fg %1[/c][/QUOTE]

Ah, my ad hoc "understanding" rears its ugly head :smile:

So what you're saying is that the name "fg" is overloaded to perform the two very-similar-if-technically-not-identical tasks of resuming a paused program and restoring a background-ed (but still running) task to foreground/direct attachment to the terminal?

GP2 2017-12-02 17:38

[QUOTE=Dubslow;472943]So what you're saying is that the name "fg" is overloaded to perform the two very-similar-if-technically-not-identical tasks of resuming a paused program and restoring a background-ed (but still running) task to foreground/direct attachment to the terminal?[/QUOTE]

Yes, exactly.

If you run the [c]jobs[/c] command, the paused programs have the status "Stopped", while the programs running in the background have the status "Running". Either way, they can be made to run in the foreground with [c]fg[/c]


On a related note, programs that aren't attached to a terminal can also be paused and resumed. For example, if you have mprime start up automatically when the system boots.

First find out the process id, for example [c]ps -C mprime[/c] and let's assume it's 9876. Then you can do [c]kill -s SIGSTOP 9876[/c] to pause it, and [c]kill -s SIGCONT 9876[/c] to resume it.

You can check the state, for example with [c]ps -C mprime -o pid=,state=[/c] and the result will usually be S (for sleeping) or R (for running). After you send the SIGSTOP, the state will be T (stopped by job control signal).

Suspending the mprime program that runs automatically at system initialization can be useful if, for instance, you want to start up another mprime manually in order to run a benchmark. Just don't forget to resume it later with SIGCONT.
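The same pause/resume cycle can be tried from a script. This is only a sketch, assuming Linux (it reads the state letter from /proc instead of calling ps) and using a throwaway child process as a stand-in for mprime; the function names are made up for the example.

```python
import os
import signal
import subprocess
import sys
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (R, S, T, ...). Linux only."""
    with open(f"/proc/{pid}/stat") as stat_file:
        data = stat_file.read()
    # The state letter is the first field after the ")" closing the command name.
    return data.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the process state is one of `wanted`; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) in wanted:
            return True
        time.sleep(0.05)
    return False


def pause_and_resume_demo():
    """Stop and continue a throwaway child process standing in for mprime."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)   # same effect as: kill -s SIGSTOP <pid>
        stopped = wait_for_state(child.pid, {"T"})
        os.kill(child.pid, signal.SIGCONT)   # same effect as: kill -s SIGCONT <pid>
        resumed = wait_for_state(child.pid, {"S", "R"})
    finally:
        child.kill()
        child.wait()
    return stopped, resumed
```

Polling is used because the state change is not guaranteed to be visible the instant the signal is sent. For a process you did not start yourself, you would pass its real PID to os.kill and need permission to signal it.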

chalsall 2017-12-02 17:54

[QUOTE=GP2;472962]Either way, they can be made to run in the foreground with [c]fg[/c][/QUOTE]

Somewhat related... You can STOP a job from another console using the [c]kill -STOP [PID][/c] command. When I am going to be away from my main workstation for a while I often run [c]killall -STOP "Web Content"[/c] to prevent the Firefox rendering process from eating unnecessary cycles. When I return I run [c]killall -CONT "Web Content"[/c].

Edit: LOL... Cross post with GP2.

Mark Rose 2017-12-02 20:50

[QUOTE=Luis;472934]I don't use screensaver at home because I prefer to use all the available resources for BOINC. I type Alt+Ctrl+F1 too, I think it should help, but I'm not sure. Maybe I should turn xorg off in some way.
Anyway could you link your command or script?[/QUOTE]

Not much to share. I have a simple shell script that starts mfaktc in a screen session and another simple shell script that runs killall. I execute these through KDE events.

Luis 2017-12-03 13:04

1 Attachment(s)
[QUOTE=James Heinrich;472855]If you do translate it please send me a copy so I can make it available for others.[/QUOTE]

Here it is.

It works for me (Xubuntu). It should work for Debian-like OS's.

I have started to implement an option for ramdisk.
I don't know how to manage ctrl-C. I should delete some important files from ramdisk:
-worktodo.txt: unreserve exponents or copy to mfaktc original directory?
-results.txt: upload, but what if curl would fail? Or copy to mfaktc original directory?

If someone wants to modify it, he/she is welcome.

James Heinrich 2017-12-03 16:23

Thanks. I have added your script to my [url=http://www.mersenne.ca/tf1G.php?available_assignments=0]page[/url].
If anyone has suggestions for improving it please let me know.

Luis 2017-12-03 18:27

1 Attachment(s)
I found a bug in the curl failure check.
If there is no internet connection, the exit status is 6, not 1, so the script moves the results.txt entries to mfaktc-results-submitted.txt even though they were never actually submitted.
I don't know if it is the same for your "errorlevel" variable.
A better condition might be "the exit status is not equal to 0".
_____

I have finally implemented ramdisk option.
Now if ctrl-C is pressed, it copies worktodo.txt and it tries to send results.txt. If it fails, it copies results.txt to mfaktc original directory too.
If curl failed to submit results, then upload and delete results.txt manually before running the script again.

To exit:
1) Press ctrl-C once while mfaktc is running (script will exit because worktodo.txt is not empty; it is always empty if mfaktc finishes its work)
2) Press ctrl-C once while script is sleeping 10 seconds before downloading more work (script will exit because it received signal SIGINT)
Do not press ctrl-C at any other time. I'm not sure what happens, but it's not good.
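The copy-back-on-ctrl-C idea can be sketched as follows. This is a simplified illustration in Python, not the attached script: it only copies the two files back and leaves out the attempt to upload results.txt first. The function name and the two directory arguments are placeholders.

```python
import shutil
from pathlib import Path


def run_with_copy_back(work, ramdisk, home):
    """Call work(); if ctrl-C interrupts it, copy the work files off the ramdisk.

    Returns True if work() finished normally, False if it was interrupted.
    """
    try:
        work()
        return True
    except KeyboardInterrupt:
        # Python turns SIGINT (ctrl-C) into KeyboardInterrupt in the main thread.
        for name in ("worktodo.txt", "results.txt"):
            source = Path(ramdisk) / name
            if source.exists():
                shutil.copy2(source, Path(home) / name)
        return False
```

If work() launches mfaktc as a child process from a terminal, ctrl-C normally reaches both the child and this wrapper, so mfaktc still gets the chance to write its checkpoint before the files are copied.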

James Heinrich 2017-12-03 18:42

[QUOTE=Luis;473064]I found a bug in curl failure check. If there is no internet connection, the exit status is 6, not 1. I don't know if it is the same for your "errorlevel" variable. Maybe it should be better the condition "the exit status is not equal to 0".[/QUOTE]The Windows batch version uses "if errorlevel 1" which actually means "if errorlevel >= 1" (not "if errorlevel == 1"), so it will still trigger with errorlevel 6.
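For illustration, the "anything other than 0 is a failure" rule looks like this in Python. This is a sketch, not the posted script: the function name is made up, and a small child process that exits with status 6 stands in for a curl call without network access.

```python
import subprocess
import sys
from pathlib import Path


def submit_and_archive(command, results, archive):
    """Run the upload command; archive results.txt only if it exits with status 0."""
    if subprocess.run(command).returncode != 0:
        # Any non-zero status (1, 6, ...) is a failure: keep results.txt for a retry.
        return False
    with archive.open("a") as out:
        out.write(results.read_text())
    results.unlink()
    return True


# Stand-in for a failing curl call: a child process that exits with status 6.
failing_upload = [sys.executable, "-c", "raise SystemExit(6)"]
```

With failing_upload as the command, results.txt is left untouched and nothing is appended to the archive file, which is the behaviour the fix is after.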

I have updated the bash script.

Also, 100MB is excessive for the RAM drive, I'd say 10MB is plenty (you might even get away with 5MB, but 10MB gives some safety buffer).

Luis 2017-12-03 19:46

If you replace "$? -ne 0" (not equal) with "$? -ge 1" (greater or equal), the conditions would be identical.

About ramdisk space, I thought the same thing. :)

We could add a tip:
when you use the ramdisk, it is better to set maxAssignments to the number of assignments that complete in 1-2 hours, so that any data loss stays acceptable.
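A quick way to turn that tip into a number, as a sketch: time a few assignments on your own GPU and divide. The function name and the sample timing are made up; only maxAssignments comes from the script.

```python
def max_assignments_for_window(seconds_per_assignment, window_hours):
    """How many assignments fit into the time window (at least 1)."""
    window_seconds = window_hours * 3600
    return max(1, int(window_seconds // seconds_per_assignment))


# Example with a made-up timing: 5 seconds per assignment, 1.5 hour window.
print(max_assignments_for_window(5, 1.5))  # prints 1080
```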

James Heinrich 2017-12-03 20:11

Done and done, thanks.

TheJudger 2017-12-08 23:46

Hi,

guess this won't hurt many users for now: mfaktc + CUDA [B][U]9[/U][/B].0 + [B][U]Volta[/U][/B] architecture doesn't work. (CUDA 9.0 + Pascal seems to work fine, didn't test older architectures). For now it is unclear whether this is a bug within mfaktc or in CUDA.

Oliver

storm5510 2017-12-09 02:53

[QUOTE=James Heinrich;473058]Thanks. I have added your script to my [URL="http://www.mersenne.ca/tf1G.php?available_assignments=0"]page[/URL].
If anyone has suggestions for improving it please let me know.[/QUOTE]

Has anyone considered a PowerShell script?

James Heinrich 2017-12-09 03:07

[QUOTE=storm5510;473513]Has anyone considered a PowerShell script?[/QUOTE]I haven't, but if you care to write one feel free and I can post it.

I know basically nothing about PowerShell, so I'm not sure what advantage it would have over the extant batch file.

TheJudger 2017-12-10 00:46

please, not again!
 
[QUOTE=TheJudger;473504]Hi,

guess this won't hurt many users for now: mfaktc + CUDA [B][U]9[/U][/B].0 + [B][U]Volta[/U][/B] architecture doesn't work. (CUDA 9.0 + Pascal seems to work fine, didn't test older architectures). For now it is unclear whether this is a bug within mfaktc or in CUDA.

Oliver[/QUOTE]

Currently I'm 99% sure it is the famous sub.cc/subc 0 bug [B][COLOR="Red"]again[/COLOR][/B].
[URL="http://mersenneforum.org/showpost.php?p=402488&postcount=2549"]http://mersenneforum.org/showpost.php?p=402488&postcount=2549[/URL]

Oliver

storm5510 2017-12-11 06:13

[QUOTE=James Heinrich;473514]I haven't, but if you care to write one feel free and I can post it.

I know basically nothing about PowerShell, so I'm not sure what advantage it would have over the extant batch file.[/QUOTE]

I probably know less about [I]PowerShell[/I] than anyone here. I only know how to get it into command mode. It would be nice to have something that would not need [I]Wget[/I] and [I]CURL[/I]. [I]Wget[/I] did not work for me on Windows 10. Actually, there are a lot of things that do not work with Windows 10.

Chuck 2017-12-12 03:02

I wish we had a MISFIT for James' site.

James Heinrich 2017-12-12 03:19

I've heard of MISFIT but never used it. Tell me what needs to be changed to the API for my site and I'll make it work.

bayanne 2017-12-12 12:03

I finally managed to get mfaktc running on my Mac, with a lot of kind assistance from TheJudger. He asked me to post the method here, in case there are any other Mac users who wished to run TF on their Mac using the GPU.
I am running an iMac using Mac OSX 10.11.6 El Capitan which is running an Nvidia GeForce GTX 775M.

1. Download Xcode and install
2. Download Cuda 8.0 Toolkit (version 9.0 is only for OSX 10.12 and later) and install
3. Download mfaktc-0.21.tar.gz and unpack and place the folder on your desktop
4. You then need to modify the Makefile which is in the src subdirectory as follows

Change from

[CODE]
CFLAGS = -Wall -Wextra -O2 $(CUDA_INCLUDE) -malign-double
CFLAGS_EXTRA_SIEVE = -funroll-all-loops
[/CODE]

to

[CODE]
CFLAGS = -Wall -Wextra -O2 $(CUDA_INCLUDE)
CFLAGS_EXTRA_SIEVE =
[/CODE]

You have to remove/comment the line which contains

[CODE]
NVCCFLAGS += --generate-code arch=compute_11,code=sm_11
[/CODE]

because CUDA 8 completely dropped support for the first gen CUDA capable GPUs (e.g. Geforce 8000, 9000 and 200 series).

You may want to add

[CODE]
NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.x GPUs will use this code
[/CODE]

which enables code generation for the current Pascal series GPUs (e.g. Geforce 1000 series).

nvcc is not where the Makefile expects it; it is in /usr/local/cuda/bin, so you need to add that directory to your path with the following statements:

[CODE]
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
[/CODE]

You also need to tell the Makefile where to find the CUDA runtime library (-lcudart), so add the following to the start of the Makefile:

[CODE]
# where is the CUDA Toolkit installed?
CUDA_DIR = /usr/local/cuda
CUDA_INCLUDE = -I$(CUDA_DIR)/include/
CUDA_LIB = -L$(CUDA_DIR)/lib/
[/CODE]

This may already be in the Makefile, in which case change /lib64/ to /lib/

5. Now open Terminal and cd to the src subdirectory and then type

[CODE]
make clean
make
[/CODE]

This should then produce a file mfaktc.exe in the top directory of mfaktc-0.21

6. Now run

./mfaktc.exe in Terminal which will run a short selftest. If it reports no problems, it will tell you there is nothing to do.

Then run

./mfaktc.exe -st which will run a longer selftest.

If this completes the selftest without a single error, then it is time to get some real work.

7. The file mfaktc.ini contains the settings, and you need to change V5UserID and ComputerID to the values that you intend to use.

8. You then need to grab some exponents to test, from either the Manual Testing section of GIMPS or from [url]www.GPU72.com[/url]. Add these to a new file called worktodo.txt, which should be placed in the top directory.

9. Now start running mfaktc.exe in Terminal with the command ./mfaktc.exe

That should be that.

Thanks once again to TheJudger :cool:

TheJudger 2017-12-22 19:48

[QUOTE=TheJudger;473504]Hi,

guess this won't hurt many users for now: mfaktc + CUDA [B][U]9[/U][/B].0 + [B][U]Volta[/U][/B] architecture doesn't work. (CUDA 9.0 + Pascal seems to work fine, didn't test older architectures). For now it is unclear whether this is a bug within mfaktc or in CUDA.

Oliver[/QUOTE]

[QUOTE=TheJudger;473583]Currently I'm 99% sure it is the famous sub.cc/subc 0 bug [B][COLOR="Red"]again[/COLOR][/B].
[URL="http://mersenneforum.org/showpost.php?p=402488&postcount=2549"]http://mersenneforum.org/showpost.php?p=402488&postcount=2549[/URL]

Oliver[/QUOTE]

Yep, bug in CUDA [B]9.0[/B] and [B]9.1[/B] for [B]Volta[/B] architecture....

Oliver

moebius 2017-12-31 05:29

contradictory output in results.txt at mfaktc-0.21
 
M332538901 has a factor: 38814612911305349835664385407 [TF:76:77:mfaktc 0.21 barrett87_mul32_gs]
no factor for M332538901 from 2^76 to 2^77 [mfaktc 0.21 barrett87_mul32_gs]


Of course, the server did not accept the first result, but this is very strange....

ATH 2017-12-31 05:57

The factor is not correct:

2[sup]332538901[/sup] - 1 = 36146327836316851267023636988 (mod 38814612911305349835664385407)

axn 2017-12-31 07:00

That is a valid (composite) factor of M3321928619

38814612911305349835664385407 = 797262868561 x 48684837137044687

Mod(2,38814612911305349835664385407)^3321928619 == 1
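The same check can be done without PARI/GP. A minimal Python sketch (the function name is made up): f divides 2^p - 1 exactly when 2^p is congruent to 1 modulo f, and the three-argument pow() computes that without ever building the huge power.

```python
def divides_mersenne(p, f):
    """True if f divides 2**p - 1, i.e. 2**p is congruent to 1 (mod f)."""
    return pow(2, p, f) == 1


f = 38814612911305349835664385407
print(divides_mersenne(332538901, f))   # exponent named in results.txt: False
print(divides_mersenne(3321928619, f))  # selftest exponent: True
```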

TheJudger 2017-12-31 12:28

Hi moebius,
[LIST][*]is this reproducible for your setup?[*]default config (mfaktc.ini) or altered settings?[*]did this happen on a long run (several assignments without restart of mfaktc or right after the first assignment after (re-)start)?[*]which GPU?[/LIST]
As axn already mentioned: this is a valid (composite) factor for M3321928619. Why M3321928619? Because this is part of the builtin selftest which is run on every (re-)start of mfaktc. Somehow the result from the selftest isn't cleared and shown after an assignment finished. This was reported 2(?) times before, I didn't figure out why this happens yet.

Oliver

moebius 2017-12-31 13:32

1) Sorry, I don't want to try because it takes too long. I may have to go back to mfaktc-0.19 because of the overheating problem. (The CUDA driver crashes after a few minutes of running mfaktc.)

2) one assignment with several restarts of mfaktc because of overheating errors
"ERROR: cudaGetLastError() returned 30: unknown error" and "ERROR: cudaGetLastError() returned 77: an illegal memory access was encountered"
3) default config (mfaktc.ini).

4)

mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 30s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 8.0
CUDA runtime version 8.0
CUDA driver version 8.0

CUDA device info
name GeForce GTX 560 Ti
compute capability 2.1
max threads per block 1024
max shared memory per MP 49152 byte
number of multiprocessors 8
CUDA cores per MP 48
CUDA cores - total 384
clock rate (CUDA cores) 1800MHz
memory clock rate: 2004MHz
memory bus width: 256 bit

Automatic parameters
threads per grid 1048576
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
Selftest statistics
number of tests 107
successfull tests 107

TheJudger 2017-12-31 16:37

Does the card really overheat or is it just bad (broken) hardware?

Oliver

moebius 2017-12-31 17:06

Yes, it's because of overheating. The temperatures sometimes rise over 100°C, and no, I don't think the card is defective; I run CUDALucas on it as well... also for LL double checks.
At a certain temperature, the graphics card (CUDA) driver simply crashes. That's all, not so dramatic...

[URL="https://www.mersenne.org/report_exponent/?exp_lo=44714303&full=1"]44714303[/URL]

kladner 2017-12-31 18:40

Out of curiosity, did the report of a factor have only one line in results.txt? I have found that a "single line factor" is always spurious and happens when pushing the overclock too hard.

moebius 2018-01-13 00:15

[QUOTE=TheJudger;475599]Does the card really overheat or is it just bad (broken) hardware?
Oliver[/QUOTE]

I solved the problem as follows.
Core Clock and Memory Clock are now downclocked 100 MHz to the values of a GTX 560 TI NON OC.

There were no more error messages since then.


Thank you for the support

kladner 2018-01-13 16:20

[QUOTE=moebius;477410]I solved the problem as follows.
Core Clock and Memory Clock are now downclocked 100 MHz to the values of a GTX 560 TI NON OC.

There were no more error messages since then.


Thank you for the support[/QUOTE]
You might be able to reduce temps a bit more by setting the memory clock much lower. This will not impact mfaktc performance. I run both my cards 500-700 MHz under normal for memory.

storm5510 2018-01-13 17:44

[QUOTE=moebius;475609]...The temperatures rise sometimes over 100°C and no i don't think the card is defect, I let run CUDALucas as well on it... also for LL double check...[/QUOTE]

100°C is pushing that envelope pretty hard. My old GTX 480 runs around 91°C under a heavy load, [I]mfaktc[/I]. [I]CUDALucas[/I] and [I]CUDAPm1[/I], in the upper 80's. I have run it with "SieveOnGPU" disabled. That cuts the heat and power consumption. Of course, doing this reduces the GHz-d/day by nearly half.

kriesel 2018-01-13 18:46

[QUOTE=moebius;475609]Yes it's because of overheating, The temperatures rise sometimes over 100°C and no i don't think the card is defect, I let run CUDALucas as well on it... also for LL double check.
At a certain temperature, the GRAKA(CUDA)-driver simply crashes. Thats all, not so dramatic...

[URL="https://www.mersenne.org/report_exponent/?exp_lo=44714303&full=1"]44714303[/URL][/QUOTE]

From a geforce.com specifications sheet, maximum GPU temperature is 105C for the GTX480. Quadro 4000 is also 105C; Quadro 2000 102C. GTX 1070 94C; GTX 1060 94C; GTX 1050Ti 97C. All my GPUs run with at least 9C temperature margin, including GTX480s in adjacent slots. Some have 30C or more of margin. Cooler electronics tend to live longer.

Max for 560Ti is 99C; 97C for limited edition. [URL]https://www.geforce.com/hardware/desktop-gpus/geforce-gtx-560ti/specifications[/URL].

Memory controller loads tend to be around 60% for LL or P-1, and only around 1% for TF, so throttling memory back considerably for TF should have little impact on throughput.

(All operating values on my hardware, obtained from GPU-Z)
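To make the "margin" arithmetic concrete, here is a small Python sketch. The maximum temperatures are the ones quoted in this post; the dictionary and function names are made up, and the observed temperatures in the usage note are round figures taken from earlier posts in the thread.

```python
# Maximum GPU temperatures in degrees C, as quoted from the specification sheets.
MAX_GPU_TEMP_C = {
    "GTX 480": 105,
    "Quadro 4000": 105,
    "Quadro 2000": 102,
    "GTX 1070": 94,
    "GTX 1060": 94,
    "GTX 1050 Ti": 97,
    "GTX 560 Ti": 99,
}


def thermal_margin(model, observed_c):
    """Degrees C left before the specified maximum; negative means over the limit."""
    return MAX_GPU_TEMP_C[model] - observed_c
```

A GTX 480 at 91C has thermal_margin("GTX 480", 91) == 14, while a GTX 560 Ti at 100C gives -1, i.e. already past its specified maximum.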

ATH 2018-01-13 21:42

You should run MSI Afterburner and make sure the GPU fan is running 100% to keep the temperature down as much as possible.

kriesel 2018-01-14 00:11

[QUOTE=ATH;477472]You should run MSI Afterburner and make sure the GPU fan is running 100% to keep the temperature down as much as possible.[/QUOTE]

Case ventilation should also be checked. A well ventilated case will handle multiple GPUs and 500W of GPU power without them reaching 100C. High PCB temperatures might indicate poor case ventilation. The fans could be fine, yet insufficient clearance, pet hair, or something else could be restricting air flow.

I found an older system running, though not well, with only one of its 3 fans operating. (One looked like it had caught fire!)

storm5510 2018-01-15 19:45

[QUOTE=ATH;477472]You should run MSI Afterburner and make sure the GPU fan is running 100% to keep the temperature down as much as possible.[/QUOTE]

I used this on my GTX 480 a few times. The fan on it, at 100%, sounded like a siren. 82% to 85% worked for me. My case has a lot of ventilation. It makes a difference.

Rodrigo 2018-01-30 05:06

How to adjust GPU memory clock in MSI Afterburner?
 
A few days ago one of my GPUs, a GeForce GT 630, completed a TF assignment overnight, no problems. In the morning I fed it a new set of exponents and went out of the office.

As this is a secondary system, I pay little attention to it except around the time when I anticipate it'll be finishing up a TF batch. So several days later I wiggled the mouse to wake up the display -- and nothing happened. The screen didn't come back after hitting any keys on the keyboard, either.

Eventually I realized that the PC was awake but not sending anything to the monitor. After a reboot and some tests, I discovered that the 630, which had been working just fine until the end of the last TF run, now could no longer run MFAKTC for more than a couple of minutes before it reached 100C and cr*pped out, requiring a reboot. Opening the PC case (for more airflow) didn't help.

The fan does spin but its speed tops out at 90%.

Now I'm trying to fiddle with the MFAKTC settings and the GPU clocks in Afterburner (version 4.4.2). Disabling SieveOnGPU allowed the card to run a little longer before going blink.

Regarding Afterburner, I could use it to dial down the [B]core[/B] clock from the default 810 MHz to 710 MHz, and that helped to slow down the process a little more, but ultimately the card is still tickling 100C, at which point only a reboot would bring back the display.

And so here's the issue. I can lower the [B]memory[/B] clock from the default 533 MHz, but -- unlike the core clock -- as soon as I start MFAKTC it jumps right back up to 533. I can't seem to find a way to make any other (lower) setting stick. Yes, I do click on "Apply" after trying to change the clock.

Why does this work with the core clock, but not the memory clock? How do I change the memory clock setting in MSI Afterburner?

Rodrigo 2018-01-30 06:27

Addendum to the above post:

I also tried dusting the inside of the PC case. Then I removed the GPU and gave it a good dose of compressed air. These steps didn't help the graphics card's situation.

Maybe it's simply time to replace that card?

Mark Rose 2018-01-30 11:55

Given how little TF your card will produce, my suggestion would be not to use it.

If you do replace it, I'd look for a GTX 1050. It should be supported by your system. Also, the more expensive cards are ridiculously priced right now.

James Heinrich 2018-01-30 13:27

If it's getting to 100C that quick, quite probably the GPU fan isn't spinning (either at all, or at the appropriate speed). Less likely are things like the heatsink becoming detached from the GPU and other mechanical failures. In any case, replacing the GPU wouldn't be a bad idea.
The GTX 1050 will give you 250% relative performance for 115% power usage.
[url]http://www.mersenne.ca/mfaktc.php?filter=gt+630|gtx+1050[/url]
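Taking those two percentages at face value, the gain in throughput per watt works out as follows (a trivial Python sketch; the function name is made up):

```python
def efficiency_gain(relative_performance, relative_power):
    """Ratio of new to old throughput per watt."""
    return relative_performance / relative_power


# 250% of the performance at 115% of the power.
print(round(efficiency_gain(2.50, 1.15), 2))  # prints 2.17
```

So the newer card would do a bit more than twice the TF work per unit of energy.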

Rodrigo 2018-01-30 15:50

Mark and James,

Thanks for the ideas.

The PC in question is double-booting Windows XP and Vista. A couple of months ago, I bought a 1050 for another Vista system that needed a new card, but the card wouldn't run on it; there seem to be no Vista drivers for the 1050. So I installed the 1050 in my new Linux box, where it's putting out some 250 GHz-d/day, compared to the 630's 53 GHz-d/day.

I guess I could put Linux on the XP/Vista machine to run another GTX 1050.

P.S. James, that's a pretty neat chart you put together. I didn't know one could do that on your page! :smile:

kriesel 2018-01-30 16:29

[QUOTE=Rodrigo;478807]Mark and James,

I guess I could put Linux on the XP/Vista machine to run another GTX 1050.
[/QUOTE]
Or maybe:
linux host OS + GTX1050; mprime and gpu apps run on host OS continuously;
Virtualbox environment;
Guest OS XP; Guest OS Vista; run either or both or neither
(down or pause the guest OSes when not in use, for max GIMPS throughput when system is not in use; preferably down, to release memory for P-1 or ECM if running those)

I've been meaning to play with Virtualbox as a means of readily migrating a complex main PC environment from old hardware set to new.

Rodrigo 2018-01-30 17:01

[QUOTE=kriesel;478808]Or maybe:
linux host OS + GTX1050; mprime and gpu apps run on host OS continuously;
Virtualbox environment;
Guest OS XP; Guest OS Vista; run either or both or neither
(down or pause the guest OSes when not in use, for max GIMPS throughput when system is not in use; preferably down, to release memory for P-1 or ECM if running those)

I've been meaning to play with Virtualbox as a means of readily migrating a complex main PC environment from old hardware set to new.[/QUOTE]
That's an intriguing idea. I'm still transitioning my work to the Linux computer but when that process is done, I just might look into virtualization. One step at a time...

Wonder how one would migrate an existing hardware PC environment to a virtualized one. Can you take an image of the existing HDD and use it as the basis for the virtual PC?

Sake 2018-02-05 18:39

CUDA 9.1, Linux
 
mfaktc compiled with CUDA 9.1 (tested on Fedora 27 Linux) doesn't seem to work. Same issues with pre-compiled mfaktc versions.
The self-test fails with all kernels except GPU kernel "71bit_mul24". Compilation with different GPU architecture settings (5, 6, etc.) caused the same issues. Tested with CUDA 9.1.85.1, fedora 27 on NVIDIA TITAN V

Is it a CUDA bug? Any idea? Thanks.


# Compiler settings for .cu files (CPU/GPU)
NVCC = nvcc
NVCCFLAGS = $(CUDA_INCLUDE) --ptxas-options=-v -arch=sm_70 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70


-------------------------------------------------------------------------------------
mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 1024
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 30s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 9.10
CUDA runtime version 9.10
CUDA driver version 9.10

CUDA device info
name Graphics Device
compute capability 7.0
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 80
clock rate (CUDA cores) 1455MHz
memory clock rate: 850MHz
memory bus width: 3072 bit

Automatic parameters
threads per grid 655360
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

########## testcase 1/2867 ##########
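
A note on the "9.10" in the version block above: the CUDA API reports versions as a single integer, 1000 * major + 10 * minor, so CUDA 9.1 is 9010. The log line is consistent with the program printing `value / 1000` and `value % 100`, but that formatting rule is an assumption inferred from the output, not something taken from the mfaktc source. A minimal Python sketch of both readings (the function names are hypothetical):

```python
from typing import Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """Split a CUDA API version integer into (major, minor).

    The CUDA API encodes versions as 1000 * major + 10 * minor.
    """
    return raw // 1000, (raw % 1000) // 10


def log_style_version(raw: int) -> str:
    """Format the integer the way the log above appears to.

    ASSUMPTION: the log prints raw / 1000 and raw % 100, which would
    explain why CUDA 9.1 (9010) shows up as "9.10".
    """
    return f"{raw // 1000}.{raw % 100}"


if __name__ == "__main__":
    raw = 9010  # value the CUDA API reports for CUDA 9.1
    major, minor = decode_cuda_version(raw)
    print(f"toolkit release: {major}.{minor}")
    print(f"log-style text:  {log_style_version(raw)}")
```

Under that reading, "9.10" in the log means CUDA 9.1, not a hypothetical release 9.10.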

chalsall 2018-02-05 19:37

[QUOTE=Sake;479355]Is it a CUDA bug? Any idea?[/QUOTE]

Oliver (the author of mfaktc) confirms that this is a bug in CUDA 9.0.

He has alerted nVidia, and they've acknowledged the problem. Still no working patch yet.

Your only option is to downgrade your CUDA to 8.x (I believe; could be lower) and recompile.

TheJudger 2018-02-05 21:19

@Chris: No, he can't... his Volta GPU needs CUDA 9.0 or newer.

@Sake: The above is not that bad; just run CUDALucas on your toy.

Oliver
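
The constraint Oliver describes (a Volta GPU cannot be targeted by a toolkit older than CUDA 9.0) can be sketched as a small lookup. This is an illustrative Python sketch only: the table covers just the compute capabilities relevant to this thread (Pascal cards such as the GTX 1050/1070, and the Volta TITAN V), and the names `MIN_TOOLKIT` and `toolkit_supports` are hypothetical.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]

# Oldest CUDA toolkit release that can target each compute capability.
# Deliberately incomplete: only the architectures discussed in this thread.
MIN_TOOLKIT: Dict[Version, Version] = {
    (6, 0): (8, 0),  # Pascal
    (6, 1): (8, 0),  # Pascal (e.g. GTX 1050, GTX 1070)
    (7, 0): (9, 0),  # Volta (e.g. TITAN V)
}


def toolkit_supports(compute_capability: Version, toolkit: Version) -> bool:
    """Return True if the toolkit release is new enough for the device."""
    required = MIN_TOOLKIT.get(compute_capability)
    if required is None:
        raise KeyError(f"compute capability {compute_capability} not in table")
    # Tuples compare element by element, so (9, 1) >= (9, 0) is True.
    return toolkit >= required


if __name__ == "__main__":
    # Downgrading to CUDA 8.x is not an option for a compute capability 7.0 card.
    print(toolkit_supports((7, 0), (8, 0)))
    print(toolkit_supports((7, 0), (9, 1)))
```

So for the TITAN V in the log above (compute capability 7.0), CUDA 8.x is ruled out, while a Pascal card would still be fine with it.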

kriesel 2018-02-05 22:08

[QUOTE=TheJudger;479375]@Chris: No, he can't... his Volta GPU needs CUDA 9.0 or newer.

@Sake Above is not that bad, just run CUDALucas on your toy.

Oliver[/QUOTE]
Or run cudapm1. It will generate results more frequently than CUDALucas. Not many are running P-1, so it's easy to be in the top 500 producers list. [URL]https://www.mersenne.org/report_top_500_p-1/[/URL]
P-1 results from a GPU can only be reported manually, though.

Sake 2018-02-11 18:06

mfaktc on CUDA 9, Titan V
 
Thank you for the help! I will test with cudapm1 in the meantime.

