mersenneforum.org  

Old 2018-09-27, 12:42   #2894
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

152C₁₆ Posts

Quote:
Originally Posted by nofaith628
As you may have read in my previous call for help, my switch from GTX 1080TI to a Titan V has resulted in errors.

Previous attempts to assuage this issue:
Code:
ERROR: cudaGetLastError() returned 8: invalid device function
have failed, regardless of attempts to reinstall, clean install and uninstall display driver and CUDA, as well as tweaking settings in mfaktc.ini.

Without any prior knowledge of programming or compilation, I have managed to compile a non-optimized version of mfaktc 10.0 that currently works with the Titan V. The GHz-days output is not good, but it works.

As for the rest, I have discussed this with Oliver (TheJudger): he may release an mfaktc optimized for the Turing architecture in the near future, but there are no plans to optimize the code specifically for the Volta architecture, as it has a very high entry price.
Thanks for sharing the build.

I'm guessing here that you meant something like mfaktc version 0.21, compiled for 64-bit Windows and for CUDA 10. (As I recall, there was no 32-bit CUDA, only 64-bit, beginning at CUDA version 8.0; the highest version of mfaktc I've seen previously was v0.21.) https://docs.nvidia.com/cuda/cuda-to...tes/index.html says of CUDA 10 "32-bit tools are no longer supported..." The CUDA 10 download page confirms that x86_64 is available and win32 is not. Including the CUDA 10 runtime DLL in the zip file would be a plus.

Last fiddled with by kriesel on 2018-09-27 at 12:56
Old 2018-09-27, 21:31   #2895
lycorn
 
"GIMFS"
Sep 2002
Oeiras, Portugal

2701₈ Posts

Hi all,

It's been a long time since I ran mfakt on any of my machines.

I am now intending to start running it on a GTX 1060, but I must confess I'm a bit lost as to the recommended CUDA version / mfakt version. I don't have the means to do any compilation myself, so I would kindly request any willing member of this forum to point me to the right binaries. I am using Windows 10.

Many thanks.
Old 2018-09-28, 00:49   #2896
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×5×271 Posts

Quote:
Originally Posted by lycorn
Hi all,

It's been a long time since I ran mfakt on any of my machines.

I am now intending to start running it on a GTX 1060, but I must confess I'm a bit lost as to the recommended CUDA version / mfakt version. I don't have the means to do any compilation myself, so I would kindly request any willing member of this forum to point me to the right binaries. I am using Windows 10.

Many thanks.
The first line below is generated by a batch file; the rest is mfaktc's own startup output.
Code:
mfaktc-win-64.LessClasses-CUDA8.exe (re)launch at Mon 12/04/2017 10:46:19.80 count 0 
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                230945bits
  SIEVE_SPLIT               250
  MORE_CLASSES              disabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  Checkpoints               enabled
  CheckpointDelay           300s
WARNING: Cannot read WorkFileAddDelay from mfaktc.ini, set to 600s by default
  WorkFileAddDelay          600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  Kriesel
  ComputerID                condor-gtx1060
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t  %e |   %g  %s  %W%%"
  AllowSleep                no
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  8.0
  CUDA runtime version      8.0
  CUDA driver version       8.0

CUDA device info
  name                      GeForce GTX 1060 3GB
  compute capability        6.1
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 9
  clock rate (CUDA cores)   1771MHz
  memory clock rate:        4004MHz
  memory bus width:         192 bit

Automatic parameters
  threads per grid          589824
  random selftest offset    23085
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

running a simple selftest...
Selftest statistics
  number of tests           107
  successfull tests         107

selftest PASSED!
Old 2018-09-28, 00:53   #2897
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

The LessClasses version should only be used for extremely-fast-running assignments (where each assignment only takes a few seconds).

mfaktc can be downloaded from here or here.
Old 2018-09-28, 01:04   #2898
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

6535₈ Posts

Quote:
Originally Posted by Honza
I guess the old cudart32_80.dll and cudart64_80.dll need to be replaced with cudart32_100.dll and cudart64_100.dll
CUDA DLLs can be found here, if needed. Or you can download the toolkit from https://developer.nvidia.com/cuda-toolkit, install just the libraries you need, and grab the DLLs from where you installed it (by default, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin).
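A minimal sketch of checking for the runtime DLL, in case it helps. The helper names `cudart_dll_name` and `find_cudart` are made up for this example, and the file-name pattern is inferred only from the two names mentioned in this thread (cudart64_80.dll and cudart64_100.dll); other CUDA releases may be named differently.

```python
from pathlib import Path


def cudart_dll_name(cuda_version, bits=64):
    """Build the runtime DLL name for a version string such as '10.0'.

    ASSUMPTION: the pattern is inferred from cudart64_80.dll (CUDA 8.0) and
    cudart64_100.dll (CUDA 10.0); it is not guaranteed for other releases.
    """
    major, minor = cuda_version.split(".")
    return f"cudart{bits}_{major}{minor}.dll"


def find_cudart(cuda_version, search_dirs):
    """Return the first matching DLL path in search_dirs, or None."""
    name = cudart_dll_name(cuda_version)
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

Typical use would be to pass the mfaktc folder first and then the default toolkit folder, for example `find_cudart("10.0", [".", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin"])`, and copy the DLL next to the mfaktc executable if it is only found in the toolkit folder.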
Old 2018-09-28, 01:54   #2899
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·5·271 Posts

Quote:
Originally Posted by James Heinrich
The LessClasses version should only be used for extremely-fast-running assignments (where each assignment only takes a few seconds).
Why?
I've been running it on high exponents and ordinary assignments on multiple gpus for months.

Following is on a gtx1070, one of two instances running on it.

Code:
Sep 27 14:04 |  405  96.9% | 271.26  13m34s |    293.35    82485    n.a.%
Sep 27 14:08 |  408  97.9% | 271.00   9m02s |    293.64    82485    n.a.%
Sep 27 14:13 |  413  99.0% | 271.10   4m31s |    293.53    82485    n.a.%
Sep 27 14:17 |  416 100.0% | 271.27   0m00s |    293.35    82485    n.a.%
no factor for M173090623 from 2^76 to 2^77 [mfaktc 0.21 barrett87_mul32_gs]
tf(): total time spent:  7h 13m 50.524s

Starting trial factoring M173090623 from 2^77 to 2^78 (176.83 GHz-days)
 k_min =  436521993025020
 k_max =  873043986050178
Using GPU kernel "barrett87_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 27 14:26 |    0   1.0% | 542.32  14h18m |    293.46    82485    n.a.%
Sep 27 14:35 |    5   2.1% | 542.49  14h09m |    293.37    82485    n.a.%
Sep 27 14:44 |    8   3.1% | 542.39  14h00m |    293.42    82485    n.a.%
Sep 27 14:53 |   12   4.2% | 542.38  13h51m |    293.43    82485    n.a.%
Old 2018-09-28, 02:30   #2900
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11·311 Posts

Quote:
Originally Posted by kriesel
Why? I've been running it on high exponents and ordinary assignments on multiple gpus for months.
Someone else can explain the mechanics better than I, but the more classes the more candidates are filtered out prior to testing. The extra overhead to do this is not worth it for very fast-running assignments, but it is beneficial for any "normal" TF assignment.

edit: from the mfakto readme:
Quote:
MoreClasses is a switch for defining if 420 (2*2*3*5*7) or 4620 (2*2*3*5*7*11) classes of
factor candidates should be used. Normally, 4620 gives better results but for very small classes
420 reduces the class initialization overhead enough to provide an overall benefit.
To clarify: mfakto allows this to be set in the ini file, whereas mfaktc is hardcoded to 4620 classes (unless you explicitly use the LessClasses version which is hardcoded to 420 classes).
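As a rough illustration of the class counts in the readme excerpt, here is a sketch that lists which residue classes of k modulo 420 or 4620 can contain a factor candidate q = 2kp + 1. It is not taken from the mfaktc or mfakto source, and the function names are made up for this example. It relies only on two standard facts: a factor of M_p (p an odd prime) has the form 2kp + 1, and it is congruent to 1 or 7 modulo 8.

```python
def odd_prime_factors(n):
    """Return the distinct odd prime factors of n by trial division."""
    factors = []
    while n % 2 == 0:
        n //= 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 2
    if n > 1:
        factors.append(n)
    return factors


def surviving_classes(p, num_classes):
    """List the classes k mod num_classes that can hold a factor 2*k*p + 1.

    A class is dropped if every q = 2*k*p + 1 in it is not 1 or 7 mod 8,
    or is divisible by one of the small odd primes dividing num_classes.
    ASSUMPTION: num_classes is a multiple of 4 (true for 420 and 4620),
    so q mod 8 depends only on the class.
    """
    small_primes = odd_prime_factors(num_classes)
    keep = []
    for k in range(num_classes):
        q = 2 * k * p + 1
        if q % 8 in (1, 7) and all(q % r != 0 for r in small_primes):
            keep.append(k)
    return keep


if __name__ == "__main__":
    p = 172926979  # exponent taken from a log posted in this thread
    for n in (420, 4620):
        print(n, "classes:", len(surviving_classes(p, n)), "need testing")
```

For the exponent used here, 96 of the 420 classes and 960 of the 4620 classes remain, so the fraction of k values that gets tested drops from 96/420 (about 22.9%) to 960/4620 (about 20.8%). The price is ten times as many class start-ups per bit level, which is the overhead the readme mentions.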

You can easily run a quick test: using the same ini settings try running both the normal and LessClasses version of mfaktc and compare the throughput of each.
Old 2018-09-28, 14:03   #2901
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100101100₂ Posts

Quote:
Originally Posted by James Heinrich
Someone else can explain the mechanics better than I, but the more classes the more candidates are filtered out prior to testing. The extra overhead to do this is not worth it for very fast-running assignments, but it is beneficial for any "normal" TF assignment.

edit: from the mfakto readme:
To clarify: mfakto allows this to be set in the ini file, whereas mfaktc is hardcoded to 4620 classes (unless you explicitly use the LessClasses version which is hardcoded to 420 classes).

You can easily run a quick test: using the same ini settings try running both the normal and LessClasses version of mfaktc and compare the throughput of each.
Thanks. On the 3GB GTX1060, I found that the regular CUDA8 build gives about 2% higher throughput initially, and about 1% later, than the less-classes CUDA8 build, at the cost of restarting the current exponent/bit level (the existing checkpoint file is ignored) and much more rapid log file growth. There's no warning about the restart of the bit level. In this case it cost 4.5 hours of throughput; there are cases where it could cost weeks. (GPU-Z indicates that power, thermal, and vrel limits are capping performance.)

less-classes (420):
Code:
Starting trial factoring M172926979 from 2^76 to 2^77 (88.50 GHz-days)
 k_min =  218467540932060
 k_max =  436935081864318
Using GPU kernel "barrett87_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 28 03:56 |    0   1.0% | 181.20   4h46m |    439.57    82485    n.a.%
Sep 28 03:59 |    5   2.1% | 181.24   4h43m |    439.49    82485    n.a.%
...
Sep 28 08:19 |  380  91.7% | 181.24  24m10s |    439.48    82485    n.a.%
Sep 28 08:22 |  384  92.7% | 181.26  21m09s |    439.44    82485    n.a.%
Sep 28 08:25 |  389  93.8% | 181.21  18m07s |    439.55    82485    n.a.%
Sep 28 08:28 |  392  94.8% | 181.52  15m08s |    438.81    82485    n.a.%
received signal "SIGINT"
mfaktc will exit once the current class is finished.
press ^C again to exit immediately
Sep 28 08:31 |  396  95.8% | 181.00  12m04s |    440.06    82485    n.a.%
4620 classes:
Code:
got assignment: exp=172926979 bit_min=76 bit_max=78 (265.50 GHz-days)
Starting trial factoring M172926979 from 2^76 to 2^77 (88.50 GHz-days)
 k_min =  218467540931640
 k_max =  436935081864318
Using GPU kernel "barrett87_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 28 08:34 |    0   0.1% | 17.811   4h44m |    447.20    82485    n.a.%
Sep 28 08:34 |    5   0.2% | 17.797   4h44m |    447.55    82485    n.a.%
Sep 28 08:34 |    9   0.3% | 17.769   4h43m |    448.26    82485    n.a.%
Sep 28 08:34 |   20   0.4% | 17.840   4h44m |    446.47    82485    n.a.%
...
Sep 28 08:53 |  321   7.0% | 17.971   4h27m |    443.22    82485    n.a.%
Sep 28 08:54 |  324   7.1% | 17.947   4h26m |    443.81    82485    n.a.%
Sep 28 08:54 |  329   7.2% | 17.928   4h26m |    444.28    82485    n.a.%
Sep 28 08:54 |  336   7.3% | 17.901   4h25m |    444.95    82485    n.a.%
Sep 28 08:54 |  341   7.4% | 17.916   4h25m |    444.58    82485    n.a.%
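
The arithmetic behind the "about 2% initially, 1% later" estimate can be sketched from the two logs above. The GHz-d/day and per-class times are copied from the log lines; the counts of 96 and 960 tested classes per bit level are an assumption, chosen because they match the roughly 1.0% and 0.1% steps in the Pct column.

```python
def percent_gain(baseline, candidate):
    """Relative gain of candidate over baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


def pass_seconds(classes_tested, seconds_per_class):
    """Wall time for one bit level if every class takes the same time."""
    return classes_tested * seconds_per_class


def hms(seconds):
    """Format a duration such as 3725 s as '1h02m05s' (fractions dropped)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


# GHz-d/day from the first progress line of each log (420 vs 4620 classes).
early = percent_gain(439.57, 447.20)
# GHz-d/day from class 380 of the 420 log and class 341 of the 4620 log.
late = percent_gain(439.48, 444.58)

# ASSUMPTION: 96 of 420 and 960 of 4620 classes are actually tested.
full_420 = pass_seconds(96, 181.20)
full_4620 = pass_seconds(960, 17.811)

print(f"early gain {early:.1f}%, later gain {late:.1f}%")
print(f"420 classes: {hms(full_420)}, 4620 classes: {hms(full_4620)}")
```

On these figures the 4620-class build is ahead by roughly 1.7% early on and 1.2% later, and a full bit level would take about five minutes less, which is far smaller than the 4.5 hours lost to the restart.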

Last fiddled with by kriesel on 2018-09-28 at 14:16
Old 2018-09-28, 14:08   #2902
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11·311 Posts

Quote:
Originally Posted by kriesel
There's no warning about the restart of bit level.
Sorry, I guess I should have more explicitly warned you that the checkpoint files would not be cross-compatible between the 420-class and 4620-class implementations.
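One way to see the mismatch, using only the k_min values printed in the two logs quoted earlier in the thread: each build starts at a k_min that is a multiple of its own class count, so the same class number refers to a different set of candidates in each build. The checkpoint file format itself is not shown in this thread, so the sketch below only illustrates the class numbering, not how mfaktc reads or rejects a checkpoint. The names are made up for this example.

```python
# k_min values copied from the two logs for M172926979, 2^76 to 2^77.
K_MIN_420 = 218467540932060    # LessClasses build (420 classes)
K_MIN_4620 = 218467540931640   # regular build (4620 classes)


def class_of(k, num_classes):
    """Class number of k when the k values are split into num_classes classes."""
    return k % num_classes


# Each build starts exactly on class 0 of its own numbering...
print(class_of(K_MIN_420, 420), class_of(K_MIN_4620, 4620))
# ...but the 420-class starting point is not class 0 in the 4620 numbering.
print(class_of(K_MIN_420, 4620))
```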
Old 2018-09-28, 22:11   #2903
lycorn
 
"GIMFS"
Sep 2002
Oeiras, Portugal

3×491 Posts

Thank you all for your answers.

Up and running. It's nice to be "back in business"...
Old 2018-09-29, 11:10   #2904
VictordeHolland
 
"Victor de Hollander"
Aug 2011
the Netherlands

2³·3·7² Posts

Quote:
Originally Posted by James Heinrich
CUDA DLLs can be found here, if needed.
Could you make sub-dirs arranged by CUDA SDK version? Last time I needed a CUDA DLL for factoring on my GTX1080Ti I pretty much downloaded them all because I was unsure which ones I needed. Sorry for wasting your bandwidth!

Later I found out that the CUDA capability of the card =/= the CUDA SDK version. Still, I find it a bit confusing that you need to compile for different architectures, right? An mfaktc binary compiled with CUDA SDK 10 for a GTX980 won't work on a GTX1080, right? Because the architecture/CUDA capability of the GTX1080 is higher (and somehow not backwards compatible?). Or am I just being ignorant?
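To make the question concrete, here is a sketch of the compatibility rule as NVIDIA documents it for CUDA binaries in general: compiled GPU code (SASS) for compute capability X.y runs only on devices with the same major version X and an equal or higher minor version, while embedded PTX can be compiled by the driver at load time for any equal or higher compute capability. Which targets a particular mfaktc build actually contains is not stated in this thread, so the target lists below are hypothetical, and `can_run` is a made-up helper.

```python
# Compute capabilities of cards named in this thread (major, minor).
COMPUTE_CAPABILITY = {
    "GTX 980": (5, 2),
    "GTX 1060": (6, 1),
    "GTX 1080": (6, 1),
    "Titan V": (7, 0),
}


def can_run(device_cc, sass_targets, ptx_targets=()):
    """Return True if a binary with these targets has usable code for the device.

    sass_targets: compute capabilities the binary holds compiled code for.
    ptx_targets:  compute capabilities the binary holds PTX for (may be empty).
    """
    dev_major, dev_minor = device_cc
    for major, minor in sass_targets:
        # Compiled code is only usable within the same major version.
        if major == dev_major and minor <= dev_minor:
            return True
    # PTX can be JIT-compiled for the same or any newer compute capability.
    return any(target <= device_cc for target in ptx_targets)


# Hypothetical build holding only compiled code for the GTX 980 (5.2):
print(can_run(COMPUTE_CAPABILITY["GTX 1080"], [(5, 2)]))
# The same build with 5.2 PTX included as well:
print(can_run(COMPUTE_CAPABILITY["GTX 1080"], [(5, 2)], [(5, 2)]))
```

Under that rule the SDK version matters only in that it decides which compute capabilities the compiler can target; the card's compute capability decides which of the embedded targets it can use.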
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.