mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

ixfd64 2019-05-25 23:59

I'm getting the following errors while trying to compile mfaktc on a Linux system:

[CODE]nvcc fatal : Unsupported gpu architecture 'compute_11'
Makefile:55: recipe for target 'tf_72bit.o' failed
[/CODE]

Any ideas?

GP2 2019-05-26 00:24

[QUOTE=ixfd64;517788]
Any ideas?[/QUOTE]

Just edit the Makefile and delete the lines with [c]compute_11[/c] and probably also [c]compute_20[/c]. Those are only applicable to very old GPUs.

nomead 2019-05-26 00:39

[QUOTE=ixfd64;517788]I'm getting the following errors while trying to compile mfaktc on a Linux system:

[CODE]nvcc fatal : Unsupported gpu architecture 'compute_11'
Makefile:55: recipe for target 'tf_72bit.o' failed
[/CODE]

Any ideas?[/QUOTE]
You need to modify the Makefile so that the compute_xx and sm_xx values match the hardware you're going to run mfaktc on. You're getting this error because newer versions of CUDA no longer support compute capability 1.1 (CUDA SDK 6.5 was the last one that did). That's an old architecture anyway: 1.1-1.3 covers cards that are over 10 years old by now, the GTX 2xx series and the like. Conversely, the newest cards (RTX 20 / GTX 16) need compute capability 7.5, which isn't supported by anything older than CUDA SDK 10.0. So find out what you need, put it in the Makefile, check that your CUDA version is OK, and recompile. That should do it.
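
The version check described in this post can be sketched in a few lines of Python. This is only an illustration: the table holds just the two bounds mentioned above (1.1 dropped after CUDA 6.5, 7.5 needing CUDA 10.0 or newer), and the names `TOOLKIT_BOUNDS` and `toolkit_supports` are made up for the example, not part of mfaktc or the CUDA toolkit.

```python
# Illustrative table only: the two entries are the bounds stated in the post above.
# Versions are (major, minor) tuples so they compare correctly (10.0 > 6.5).
TOOLKIT_BOUNDS = {
    # compute capability: (first CUDA version supporting it, last CUDA version supporting it)
    (1, 1): (None, (6, 5)),    # CUDA 6.5 was the last toolkit to support it
    (7, 5): ((10, 0), None),   # needs CUDA 10.0 or newer
}


def toolkit_supports(cc, cuda):
    """Return True or False if the table knows the answer, None if cc is not listed."""
    if cc not in TOOLKIT_BOUNDS:
        return None
    first, last = TOOLKIT_BOUNDS[cc]
    if first is not None and cuda < first:
        return False  # toolkit is too old for this capability
    if last is not None and cuda > last:
        return False  # toolkit is too new; support was dropped
    return True


if __name__ == "__main__":
    # The failing case from the first post: compute_11 with a recent toolkit.
    print(toolkit_supports((1, 1), (10, 1)))
    print(toolkit_supports((7, 5), (10, 0)))
```

To cover more cards, extend the table with bounds taken from the release notes of the toolkit you actually have installed.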

ixfd64 2019-05-27 03:07

Thanks. Commenting out the appropriate lines in the makefile resolved the issue.

ixfd64 2019-06-06 02:53

I recently got access to two Tesla V100 GPUs. However, running mfaktc gives me the following error:

[QUOTE]no kernel image is available for execution on the device[/QUOTE]

Adding CUDA Compute Capability 7.0 to the makefile solved the problem:

[code]NVCCFLAGS += --generate-code arch=compute_70,code=sm_70[/code]

Is mfaktc supposed to support Volta GPUs out of the box?

TheJudger 2019-06-07 22:23

[QUOTE=ixfd64;518678]I recently got access to two Tesla V100 GPUs. However, running mfaktc gives me the following error:



Adding CUDA Compute Capability 7.0 to the makefile solved the problem:

[code]NVCCFLAGS += --generate-code arch=compute_70,code=sm_70[/code]

Is mfaktc supposed to support Volta GPUs out of the box?[/QUOTE]

Yes, all you need to do is to modify the Makefile as you already did!

Oliver

ixfd64 2019-06-08 17:27

It might be a good idea for the makefile to have rules for different CUDA versions. For example [c]make cuda10[/c] would automatically apply the flags for compute capability 3.0 to 7.5.

Ideally, non-developers shouldn't have to know how to modify makefiles. At the very least, the documentation should probably be updated.
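
As a rough sketch of what such a rule could apply, the following Python helper prints one `NVCCFLAGS` line per compute capability, in the same form as the line that fixed the V100 build earlier in the thread. The function name `gencode_flags` and the `CUDA10_TARGETS` list are hypothetical; the list is an assumed subset of the 3.0 to 7.5 range mentioned above and is not taken from the mfaktc Makefile.

```python
def gencode_flags(capabilities):
    """Build one NVCCFLAGS line per compute capability, e.g. "7.0" -> compute_70 / sm_70."""
    lines = []
    for cc in capabilities:
        major, minor = cc.split(".")
        arch = major + minor  # "7.0" becomes "70"
        lines.append(
            f"NVCCFLAGS += --generate-code arch=compute_{arch},code=sm_{arch}"
        )
    return lines


# Hypothetical target list for a "cuda10" rule; trim it to the cards you actually own.
CUDA10_TARGETS = ["3.0", "3.5", "5.0", "5.2", "6.1", "7.0", "7.5"]

if __name__ == "__main__":
    # Paste the printed lines into the Makefile in place of the old compute_xx entries.
    print("\n".join(gencode_flags(CUDA10_TARGETS)))
```

Fewer targets keep the binary smaller and the build faster, so listing only the capabilities of the installed cards is usually enough.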

hansl 2019-06-16 02:45

Hi, I've just built mfaktc-0.21 against CUDA 10.1 on Linux, with a GTX 970, and I'm getting errors like this:
[code]
mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 1
CPUStreams 1
GridSize 0
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 4Mi bits
GPUSieveProcessSize 8Ki bits
Checkpoints enabled
CheckpointDelay 30s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10

CUDA device info
name GeForce GTX 970
compute capability 5.2
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 13
CUDA cores per MP 128
CUDA cores - total 1664
clock rate (CUDA cores) 1240MHz
memory clock rate: 3505MHz
memory bus width: 256 bit

Automatic parameters
threads per grid 106496
ERROR: cudaStreamCreate() failed for stream 0
cudaGetLastError() returned 2: out of memory
[/code]
Not sure if my settings look a little odd at this point since I've tried to lower anything that looked like it would be related to increased memory usage.

About 1 in every 5-10 attempts it actually gets past this error and then everything seems to run fine (I completed -st and -st2 with all tests passing). But most of the time I just get that "out of memory" error before it even gets started. Any ideas what is wrong?
Is it likely due to some improper configuration on my part when building?

Using nvidia-smi to report utilization (without mfaktc running), I see memory usage of 152MiB / 4039MiB.

Is it possible it is complaining about regular system memory, not GPU memory? I have 16GB of actual RAM, which is currently all used up by another process, but there is still gobs of swap space free on this system (>300GB) from an NVMe SSD.

hansl 2019-06-23 04:26

Never mind about the above. It seems the issue was fixed after rebooting. I had been waiting for a very long job to finish and I think I must have forgotten to reboot since installing CUDA, causing it to behave in all kinds of strange ways.

kriesel 2019-07-03 02:49

false factor 38814612911305349835664385407 generated reliably by display TDRs
 
Playing with an old, slow GPU from my spare parts bin, in an equally old Core 2 Duo system, I found that even after it passes ~2900 self tests, TDR timeouts reliably produce the known false factor 38814612911305349835664385407 in mfaktc:
[CODE]batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:01:57.14
mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 900s
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID kriesel
ComputerID eaglet-nvs295
AllowSleep no
TimeStampInResults yes

CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 6.50

CUDA device info
name Quadro NVS 295
compute capability 1.1
max threads per block 512
max shared memory per MP 16384 byte
number of multiprocessors 1
CUDA cores per MP 8
CUDA cores - total 8
clock rate (CUDA cores) 1300MHz
memory clock rate: 695MHz
memory bus width: 64 bit

Automatic parameters
threads per grid 1048576
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
Selftest statistics
number of tests 107
successfull tests 107

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:02 | 0 0.1% | 3.928 1h02m | 182.63 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:02:39.49mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:02:39.53
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:03 | 0 0.1% | 3.573 57m07s | 200.78 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:03:21.71mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:03:21.72
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:04 | 0 0.1% | 3.604 57m36s | 199.05 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:04:03.89mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:08:00.62
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:08 | 0 0.1% | 3.629 58m00s | 197.68 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:08:42.44mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:08:42.60
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:09 | 0 0.1% | 3.606 57m38s | 198.94 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:09:25.30mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:09:25.36
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:10 | 0 0.1% | 3.628 57m59s | 197.74 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:10:07.93mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:30:12.32
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:30 | 0 0.1% | 3.557 56m51s | 201.68 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:30:54.43mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:30:54.49
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:31 | 0 0.1% | 3.585 57m18s | 200.11 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:31:36.79mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:31:37.01
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:32 | 0 0.1% | 4.533 1h12m | 158.26 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:32:20.23mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:44:07.67
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:44 | 0 0.1% | 3.522 56m18s | 203.69 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:44:49.76mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:44:49.82
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:45 | 0 0.1% | 3.556 56m50s | 201.74 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:45:32.10mfaktc quadro nvs295 exit logged by batch wrapper
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:45:32.15
mfaktc v0.21 (64bit built)
...

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 20:46 | 0 0.1% | 3.548 56m43s | 202.20 82485 n.a.%
M119998999 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
at Tue 07/02/2019 20:46:14.24mfaktc quadro nvs295 exit logged by batch wrapper
[/CODE]At this point it becomes obvious that the known bad factor, the absurdly optimistic GHz-d/day rate, the error 30, and the periodic screen blanking correspond 1:1, in 12 tries, to the default TDR delay of 2 seconds being exceeded. Run REGEDT32, add the TdrDelay DWORD value 0x20 (32 seconds), and retry:
[URL="https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys"]https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys[/URL]
This yields success of a sort:
[CODE]
batch wrapper logs (re)launch of EAGLET mfaktc quadro nvs295 at Tue 07/02/2019 20:57:36.34
mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 900s
WorkFileAddDelay 3600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID kriesel
ComputerID eaglet-nvs295
AllowSleep no
TimeStampInResults yes

CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 6.50

CUDA device info
name Quadro NVS 295
compute capability 1.1
max threads per block 512
max shared memory per MP 16384 byte
number of multiprocessors 1
CUDA cores per MP 8
CUDA cores - total 8
clock rate (CUDA cores) 1300MHz
memory clock rate: 695MHz
memory bus width: 64 bit

Automatic parameters
threads per grid 1048576
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
Selftest statistics
number of tests 107
successfull tests 107

selftest PASSED!

got assignment: exp=119998999 bit_min=72 bit_max=73 (7.97 GHz-days)
Starting trial factoring M119998999 from 2^72 to 2^73 (7.97 GHz-days)
k_min = 19676691147960
k_max = 39353382296711
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 02 21:05 | 0 0.1% | 411.97 4d13h | 1.74 82485 n.a.%
Jul 02 21:11 | 5 0.2% | 411.85 4d13h | 1.74 82485 n.a.%
Jul 02 21:18 | 9 0.3% | 411.49 4d13h | 1.74 82485 n.a.%
Jul 02 21:25 | 12 0.4% | 411.39 4d13h | 1.74 82485 n.a.%
[/CODE]I had estimated it at around 2.5 GHz-d/day, so this is more consistent with expectations, and perhaps a contender for the turtle consolation prize in GPU benchmarks. 23 W for about 2 GHz-d/day is also rather costly. And this little gem has no double-precision floating point, so no P-1 or primality testing.
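
The ETA and GHz-d/day columns in the logs above can be cross-checked with a little arithmetic. The sketch below assumes that a MORE_CLASSES build tests 960 of its 4620 classes; that figure is not stated in this thread and is an assumption about mfaktc's class sieve. The 7.97 GHz-days and the per-class times are taken from the log lines, and the 23 W figure from the post.

```python
SECONDS_PER_DAY = 86400
CLASSES_TO_TEST = 960        # assumption: 960 of 4620 classes remain with MORE_CLASSES
ASSIGNMENT_GHZ_DAYS = 7.97   # from the "got assignment" log line
BOARD_POWER_WATTS = 23       # figure quoted in the post


def estimate(seconds_per_class):
    """Return (total run time in days, throughput in GHz-d/day) for one assignment."""
    total_days = CLASSES_TO_TEST * seconds_per_class / SECONDS_PER_DAY
    return total_days, ASSIGNMENT_GHZ_DAYS / total_days


def kwh_per_ghz_day(rate_ghz_d_per_day):
    """Energy spent per GHz-day of work at the quoted board power."""
    kwh_per_day = BOARD_POWER_WATTS * 24 / 1000
    return kwh_per_day / rate_ghz_d_per_day


if __name__ == "__main__":
    # 3.928 s is a class time from the TDR-interrupted runs, 411.97 s from the fixed run.
    for label, seconds in (("TDR-interrupted", 3.928), ("TdrDelay raised", 411.97)):
        days, rate = estimate(seconds)
        print(f"{label}: {days:.3f} days total, {rate:.2f} GHz-d/day")
    print(f"{kwh_per_ghz_day(estimate(411.97)[1]):.2f} kWh per GHz-day")
```

Under that assumption the 411.97 s class time works out to roughly four and a half days and about 1.74 GHz-d/day, in line with the last log, while the 3.9 s class times from the interrupted runs imply a rate near 180 GHz-d/day, which is not plausible for an 8-core card.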

hansl 2019-07-06 01:07

Are there any particular recommended settings that would help to maximize throughput for TF of many large exponents (between 10^9 and 2^32) to fairly low bit levels? I've tried a few different combinations to reduce overhead, but nothing in particular seems to make a noticeable difference.

I'm using a "less classes" build by the way.

Also, strangely, I seem to get better performance from my laptop's Quadro M1000M than from a GTX 780. I would have expected the GTX to be faster, since it has more than 4x the CUDA core count of the mobile Quadro.
[url]https://www.videocardbenchmark.net/compare/Quadro-M1000M-vs-GeForce-GTX-780/3349vs2525[/url]
The Quadro does have a slightly higher core clock, though; is that all that matters in this case?


All times are UTC.