mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

petrw1 2019-09-30 03:08

[QUOTE=petrw1;526935]Wow too bad I didn't see this months ago.
Throughput changed from 3,900 to 4,500 on my 2080Ti doing TF to 74 in the 4xM ranges. 15% improvement.
[/QUOTE]


AND....running 8 cores of P-1 on the CPU does NOT slow down the GPU.
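
As a quick check of the figure quoted above, the relative gain from 3,900 to 4,500 can be computed directly. This is only a sketch: the function name is made up for illustration, and the thread does not state the unit of the throughput numbers, so none is assumed here.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Throughput figures as quoted in the post (unit not stated in the thread).
gain = percent_gain(3900, 4500)
print(f"{gain:.1f}% improvement")
```

The result is slightly above 15%, which matches the rounded figure in the post.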

storm5510 2019-10-01 00:56

[QUOTE=petrw1;526947]AND....running 8 cores of P-1 on the CPU does NOT slow down the GPU.[/QUOTE]

In my experience, both flavors of [I]mfaktc[/I] run faster if the CPU is busy. A person can always increase the CPU's minimum speed in the power options. The default is 5%. I set it to 60%. This helps too.

A side note: I never run my GPU at full throttle now. I use [I]Afterburner[/I] to slow it down. 75% is typical. The performance loss is negligible. Power consumption and heat output are considerably lower: 70°C at full, 55°C at 75%.

kriesel 2019-10-04 01:04

Anyone? Bueller?
 
Is there a CUDA 8 version Windows build of the 2047M gpusievesize mfaktc mod somewhere?

kriesel 2019-10-06 16:54

[QUOTE=kriesel;527268]Is there a CUDA 8 version Windows build of the 2047M gpusievesize mfaktc mod somewhere?[/QUOTE]
nomead posted one on the RTX2080 thread at [URL]https://www.mersenneforum.org/showpost.php?p=527305&postcount=139[/URL]. I've run and tuned it on my GTX1080Ti. Results are posted at [URL]https://www.mersenneforum.org/showpost.php?p=526899&postcount=8[/URL]. It looks like the GTX1080Ti would benefit slightly from an even further increase in GPUSieveSize. Whenever I get around to trying and tuning it on a GTX1060, those results will be added.
To try it on my older CC2.x gpus, it would require a CUDA6.5 Windows 7 X64 build.

nomead 2019-10-06 18:48

1 Attachment(s)
[QUOTE=kriesel;527419]
To try it on my older CC2.x gpus, it would require a CUDA6.5 Windows 7 X64 build.[/QUOTE]
I took that as a hint and compiled it again under CUDA6.5 ... so, it complains, but still compiles:
[C]nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release
(Use -Wno-deprecated-gpu-targets to suppress warning).[/C]

I also managed to recompile under CUDA 8.0, but for compute 2.0. I didn't read the error messages correctly the last time around: it complained about 2.1 but not 2.0... go figure. Do you need that combination as well?

Anyway, the attached binaries are for 64-bit CUDA 6.5, compute versions 20, 30, 35, 37, 50 and 52.

Fan Ming 2019-10-12 19:32

1 Attachment(s)
[QUOTE=MrRepunit;496093]I modified [c]class_needed[/c] (10 has to be exponentiated instead of 2) and added a 64 bit shortcut. I guess most of the 64 bit stuff can be used for Mersenne numbers, but it might need some changes or be missing some minor stuff. I think I removed some bit shift methods since they were not used for base 10.
Also a snippet from the readme:
- Removed Barrett and 72 bit kernels
- Removed Wagstaff related stuff
- Added 64 bit kernels
- Implemented repunit factorization (hardcoded)
- Improved performance by about 30% compared to the older version (0.18-repunit)
- Notes
  - Compiling with the more-classes flag seems to be slightly faster, thus it is switched on
  - Not tested on Windows yet
  - GPU sieving utilizes 100% of the GPU, so 1 mfaktc instance is enough
  - GPU sieving makes the system response slow (tested on Ubuntu 14.04 64 bit with Geforce 460 GTS)
    Setting GPUSieveSize in mfaktc.ini to 8 or lower makes the system more responsive

I did not remove the git directory, so if anybody is interested in the single commits feel free to take a closer look. I also added the linux executable. Not sure if I can quickly provide a windows variant...[/QUOTE]
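
The remark in the quote that "10 has to be exponentiated instead of 2" can be illustrated with a small sketch. For a prime exponent p > 3, any prime factor q of the repunit R_p = (10^p - 1)/9 has the form q = 2kp + 1, and q divides R_p exactly when 10^p ≡ 1 (mod 9q). The function names below are hypothetical, and this is plain Python rather than anything taken from the mfaktc sources; in particular it does not reproduce how [c]class_needed[/c] groups candidates into classes.

```python
def divides_repunit(p: int, q: int) -> bool:
    """True if q divides R_p = (10**p - 1) // 9.

    q | R_p is equivalent to 9*q | 10**p - 1, so one modular
    exponentiation replaces building the p-digit number.
    """
    return pow(10, p, 9 * q) == 1


def candidate_factors(p: int, k_max: int):
    """Yield candidates q = 2*k*p + 1 for k = 1..k_max (no sieving)."""
    for k in range(1, k_max + 1):
        yield 2 * k * p + 1


# Small worked case: R_5 = 11111 = 41 * 271
found = [q for q in candidate_factors(5, 30) if divides_repunit(5, q)]
print(found)  # [41, 271]
```

For Mersenne numbers the same test would use base 2 and the condition 2^p ≡ 1 (mod q), which is why only the base of the exponentiation changes.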

Hi!
I tried to compile this code with CUDA Toolkit v10.1 and Visual Studio 2019 on my Windows 10 computer. Compilation succeeded. But when I ran the compiled executable on my other computer, which runs Windows 7 (the Windows 10 computer only has a CUDA 8 driver installed, so it can't run it), the exe exited with this error:
[CODE]CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10

CUDA device info
name GeForce GTX 1660
compute capability 7.5
max threads per block 1024
max shared memory per MP 65536 byte
number of multiprocessors 22
clock rate (CUDA cores) 1800MHz
memory clock rate: 4001MHz
memory bus width: 192 bit

Automatic parameters
threads per grid 720896
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
[B][COLOR="Red"]ERROR: cudaGetLastError() returned 98: invalid device function[/COLOR][/B][/CODE]
It's really frustrating that I know nothing about how to deal with this error.
The attached file contains the Makefile.win I used (and modified) and the compiled executable.
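
One commonly documented cause of "invalid device function" is a binary that contains no kernel code usable on the device's compute capability. The thread does not establish that this is what happened here. nvcc selects the embedded architectures with [c]--generate-code[/c] flags, and the sketch below only builds such flags from a list of compute capabilities. The helper name and the target list are hypothetical and are not taken from the attached Makefile.win.

```python
def gencode_flags(compute_capabilities):
    """Build one nvcc --generate-code flag per compute capability.

    Each entry is a string such as "7.5", which becomes
    arch=compute_75,code=sm_75.
    """
    flags = []
    for cc in compute_capabilities:
        digits = cc.replace(".", "")
        flags.append(f"--generate-code arch=compute_{digits},code=sm_{digits}")
    return flags


# Hypothetical target list that includes the GTX 1660 (compute capability 7.5).
targets = ["5.0", "6.1", "7.5"]
print(" ".join(gencode_flags(targets)))
```

If the build flags already include the device's architecture, the cause lies elsewhere, for example in the host compiler version.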

nomead 2019-10-12 19:59

[QUOTE=Fan Ming;527848]Hi!
I tried to compile this code with CUDA Toolkit v10.1 and Visual Studio 2019 on my Windows 10 computer.
[/QUOTE]

Back in February when I set up my Windows compile environment for mfaktc, I apparently ran into the same problem, or something very similar. Compilation on VS2017 worked, but the resulting binary exited with the error
[C]ERROR: cudaGetLastError() returned 8: invalid device function[/C]
So almost the same, but not quite. I used a Windows 7 computer for both compiling and running, but that shouldn't make a difference (but then, who knows?). Anyway, compiling on VS2012 worked, and then the binary also started working.

This was for the plain unmodified mfaktc 0.21 though, but maybe the code is still similar enough so that the errors are the same. No idea why it happens...

Fan Ming 2019-10-13 05:20

[QUOTE=nomead;527852]
This was for the plain unmodified mfaktc 0.21 though, but maybe the code is still similar enough so that the errors are the same. No idea why it happens...[/QUOTE]

That's frustrating... :davieddy: It may be a problem related to Makefile.win or cl.exe in VS, but I know nothing about them (I only learned the names of these things yesterday by Googling).

Can anyone test compiling in a Windows environment?

Fan Ming 2019-10-19 11:13

1 Attachment(s)
The attached file is mfaktc (repunit version) compiled with CUDA 10.1 for 64-bit Windows.
I compiled it using the CUDA 10.1 toolkit and Microsoft VS 2012 on Windows 10.
It failed all selftest cases on my GTX 1660 GPU (CC 7.5, Turing card) with the following output:
[CODE]CUDA version info
binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10
CUDA device info
name GeForce GTX 1660
compute capability 7.5
max threads per block 1024
max shared memory per MP 65536 byte
number of multiprocessors 22
clock rate (CUDA cores) 1800MHz
memory clock rate: 4001MHz
memory bus width: 192 bit
Automatic parameters
threads per grid 720896
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144
running a simple selftest...
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R1729921
no factor found
ERROR: selftest failed for R1729921
no factor found
ERROR: selftest failed for R1729921
no factor found
ERROR: selftest failed for R1729921
no factor found
ERROR: selftest failed for R1729921
no factor found
ERROR: selftest failed for R1729921
no factor found
ERROR: selftest failed for R2482747
no factor found
ERROR: selftest failed for R2482747
no factor found
ERROR: selftest failed for R2482747
no factor found
ERROR: selftest failed for R2482747
no factor found
ERROR: selftest failed for R2482747
no factor found
ERROR: selftest failed for R2482747
no factor found
ERROR: selftest failed for R403433
no factor found
ERROR: selftest failed for R403433
no factor found
ERROR: selftest failed for R422057
no factor found
ERROR: selftest failed for R422057
no factor found
ERROR: selftest failed for R554959
no factor found
ERROR: selftest failed for R554959
no factor found
ERROR: selftest failed for R575173
no factor found
ERROR: selftest failed for R444113
no factor found
ERROR: selftest failed for R442487
no factor found
Selftest statistics
number of tests 27
successfull tests 0
no factor found 27
selftest FAILED!
random selftest offset was: 29242[/CODE]
However, the compiled executable seems to work well on some GPUs.
My compiled mmff seems to work well on my GPU, and I haven't tested compiling the original mfaktc version. IDK why this happened :(
Can anyone run a test to see if it works well? Thanks!
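
For anyone comparing selftest output across GPUs or builds, a long log like the one above is easier to read when the failures are tallied per exponent. This is a minimal sketch, assuming the log lines keep the "ERROR: selftest failed for R..." shape shown above. The sample text is a shortened excerpt of that log, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "ERROR: selftest failed for R2866499".
FAILED = re.compile(r"ERROR: selftest failed for (R\d+)")


def count_failures(log_text: str) -> Counter:
    """Count selftest failures per exponent label in an mfaktc log."""
    return Counter(FAILED.findall(log_text))


# Shortened excerpt of the log posted above.
sample = """\
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R2866499
no factor found
ERROR: selftest failed for R575173
no factor found
"""

counts = count_failures(sample)
for exponent, n in counts.most_common():
    print(exponent, n)
```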

mnd9 2019-10-19 12:08

Sorry to derail the thread but random question:

Are mfaktc and mfakto checkpoint files cross compatible or not?

kriesel 2019-10-19 14:44

[QUOTE=mnd9;528346]Sorry to derail the thread but random question:

Are mfaktc and mfakto checkpoint files cross compatible or not?[/QUOTE]
Unlikely. Mfaktc isn't even cross compatible with itself, v0.20 vs. v0.21, or less-classes vs. more-classes.
See also points 38 and 39 in [url]https://www.mersenneforum.org/showpost.php?p=508523&postcount=6[/url]


All times are UTC.