mersenneforum.org  

Old 2018-05-28, 21:25   #2817
TheJudger
 
"Oliver"
Mar 2005
Germany

10001010111_2 Posts

Hi,

I have had no access to a Volta GPU since the release of CUDA 9.2.

Oliver

Last fiddled with by TheJudger on 2018-05-28 at 21:25
Old 2018-05-31, 03:43   #2818
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^2×5×271 Posts
Reference material

I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only;)
General interest gpu related reference material http://www.mersenneforum.org/showthread.php?t=23371
Mfaktc CUDA based factoring on gpus http://www.mersenneforum.org/showthread.php?t=23386

Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place update without a time limit makes it more manageable there.
Old 2018-06-28, 15:20   #2819
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

Good news: CUDA 9.2.88 seems to have fixed the issue on Volta architecture!

Initial performance numbers are impressive! Unmodified mfaktc 0.21 sources (just adjusted the Makefile) + CUDA 9.2.88 on Linux, no fine tuning (default parameters in mfaktc.ini):
Code:
# ./mfaktc.exe -tf 66362159 73 74
mfaktc v0.21 (64bit built)
[...]
CUDA device info
  name                      Tesla V100-PCIE-16GB
  compute capability        7.0
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 80
  clock rate (CUDA cores)   1380MHz
  memory clock rate:        877MHz
  memory bus width:         4096 bit
[...]
Starting trial factoring M66362159 from 2^73 to 2^74 (28.83 GHz-days)
 k_min =  71160531149400
 k_max =  142321062305090
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 28 11:42 |    0   0.1% |  0.697  11m08s |   3722.28    82485    n.a.%
[...]
Jun 28 11:52 | 4612  99.9% |  0.639   0m01s |   4060.14    82485    n.a.%
Jun 28 11:52 | 4617 100.0% |  0.641   0m00s |   4047.47    82485    n.a.%
no factor for M66362159 from 2^73 to 2^74 [mfaktc 0.21 barrett76_mul32_gs]
tf(): total time spent: 10m 23.287s
Code:
# ./mfaktc.exe -tf 46510507 72 73
mfaktc v0.21 (64bit built)
[...]
Starting trial factoring M46510507 from 2^72 to 2^73 (20.57 GHz-days)
 k_min =  50766663139500
 k_max =  101533326284094
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 28 11:56 |    0   0.1% |  0.473   7m34s |   3913.09    82485    n.a.%
[...]
Jun 28 12:04 | 4613  99.9% |  0.470   0m00s |   3938.07    82485    n.a.%
Jun 28 12:04 | 4617 100.0% |  0.471   0m00s |   3929.71    82485    n.a.%
found 1 factor for M46510507 from 2^72 to 2^73 [mfaktc 0.21 barrett76_mul32_gs]
tf(): total time spent:  7m 28.293s
I have no clue why it is THAT fast, cores and clock rate are roughly 50% more than Tesla P100 but V100 is 3 times faster than P100 for mfaktc.
Even more impressive is the power efficiency: nvidia-smi reports just below 200W while running mfaktc. That is ~50mW per "GHz Core2Solo equivalent".
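
As a rough cross-check of these figures, here is a minimal Python sketch. The function names are made up for illustration; the inputs are copied from the two log excerpts above, and 200 W is only the approximate nvidia-smi reading, so the result is an estimate rather than a measurement.

```python
def ghz_days_per_day(ghz_days: float, minutes: int, seconds: float) -> float:
    """Average throughput for one bit level, from credited work and wall time."""
    elapsed_days = (minutes * 60 + seconds) / 86400.0
    return ghz_days / elapsed_days


def milliwatts_per_unit(watts: float, throughput: float) -> float:
    """Power draw divided by throughput, in mW per GHz-d/day."""
    return 1000.0 * watts / throughput


# Values taken from the log excerpts above ("total time spent" lines).
run_73_74 = ghz_days_per_day(28.83, 10, 23.287)  # M66362159, 2^73 to 2^74
run_72_73 = ghz_days_per_day(20.57, 7, 28.293)   # M46510507, 2^72 to 2^73

for name, rate in (("2^73 to 2^74", run_73_74), ("2^72 to 2^73", run_72_73)):
    # 200 W is an assumed round figure ("just below 200W" in the post).
    print(f"{name}: {rate:7.1f} GHz-d/day, "
          f"{milliwatts_per_unit(200.0, rate):.1f} mW per GHz-d/day")
```

Both runs average a little under 4000 GHz-d/day over the whole bit level, which is where the ~50mW figure comes from.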

If those numbers are correct that might be the biggest performance step over the previous GPU architecture since the launch of Fermi cards! Right now those numbers feel a little bit too high to be true but I can't find an issue...
Pascal generation: GTX 1080 and GTX 1080 Ti


Oliver

P.S. using the old (pre GPU factoring) limits, how many V100 would be needed to do all the TF work for GIMPS?

P.P.S. even if those numbers are impressive I think those cards should be used for LL tests!
Old 2018-06-28, 15:48   #2820
moebius
 
Jul 2009
Germany

25F_16 Posts

Quote:
Originally Posted by TheJudger
[...]
Reason need more fresh air in chassis.

Oliver

Maybe the temperatures can be tamed with a water cooling solution.


For example:
https://www.mindfactory.de/product_i...z_1221121.html

seems to be a universal cooler

Last fiddled with by moebius on 2018-06-28 at 15:52 Reason: +link
Old 2018-06-28, 16:03   #2821
TheJudger
 
"Oliver"
Mar 2005
Germany

457_16 Posts

Hi moebius,

check the date of that post: it was my first hands-on with a GTX 1080 Ti. And it wasn't my card/system.

Oliver
Old 2018-06-29, 00:13   #2822
tServo
 
"Marv"
May 2009
near the Tannhäuser Gate

2×7×47 Posts

From TheJudger ( Oliver ):
"I have no clue why it is THAT fast, cores and clock rate are roughly 50% more than Tesla P100 but V100 is 3 times faster than P100 for mfaktc."

Nvidia mentioned in blog posts right after Volta's announcement that, for the first time, they have separated the FP registers and compute units from the INT ones, so often-used things like pointer arithmetic do not disturb the FP regs, and since the INT ones have their own compute units, the two can be scheduled and run together.
You're right Oliver, 3x is mighty impressive AND all that power is best used for LL.
Old 2018-06-29, 22:01   #2823
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

mfaktc rarely uses FP math. I guess they improved 32bit integer multiplication throughput. Because this isn't natively supported on Maxwell and Pascal, they just write "multiple instructions" instead of "up to". So it is not easy to compare on paper.

Oliver

Last fiddled with by TheJudger on 2018-06-29 at 22:01
Old 2018-07-03, 00:03   #2824
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^2·5·271 Posts
reproducible misplaced factor meant bad gpu ram

Quote:
Originally Posted by TheJudger
Hi moebius,
  • is this reproducible for your setup?
  • default config (mfaktc.ini) or altered settings?
  • did this happen on a long run (several assignments without restart of mfaktc) or right after the first assignment after a (re-)start?
  • which GPU?
As axn already mentioned: this is a valid (composite) factor for M3321928619. Why M3321928619? Because this is part of the builtin selftest which is run on every (re-)start of mfaktc. Somehow the result from the selftest isn't cleared and is shown after an assignment has finished. This was reported 2(?) times before; I haven't figured out why this happens yet.

Oliver

Re 38814612911305349835664385407 showing up as a factor for random exponents: I accidentally found a way of reproducing it. Run on a GPU with a lot of memory errors. I had a GTX480 that had deteriorated, the same one documented in the "GPU RIP" thread http://www.mersenneforum.org/showthread.php?t=23472 after memory testing showed how bad it was. Other symptoms included "unspecified launch failure" and "illegal memory access". Within a year it went from passing memory tests to having millions of errors, regardless of clock rates. It has been removed.

The repeatedly indicated factor is 38814612911305349835664385407 = 2 × 3^6 × 31081 × 65381 × 3943673 × 3321928619 + 1. Any factor of Mp with prime p has the form 2kp + 1, so p must divide the factor minus 1. It is therefore not a legitimate factor of any other prime exponent between 3943673 and 3321928619 (or any above 3321928619).
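
This reasoning can be checked directly. Below is a minimal Python sketch; the helper names are illustrative and not taken from mfaktc. It uses two facts: a factor q of Mp with prime p satisfies q = 2kp + 1, and q divides Mp exactly when 2^p mod q = 1.

```python
def exponent_is_possible(p: int, q: int) -> bool:
    """Necessary condition for prime p: q = 2*k*p + 1, so p must divide q - 1."""
    return (q - 1) % p == 0


def divides_mersenne(p: int, q: int) -> bool:
    """True if q divides M_p = 2^p - 1, checked by modular exponentiation."""
    return q > 1 and pow(2, p, q) == 1


q = 38814612911305349835664385407

# The factorisation of q - 1 quoted in the post.
assert q - 1 == 2 * 3**6 * 31081 * 65381 * 3943673 * 3321928619

# The selftest exponent, followed by the exponents from the results log.
for p in (3321928619, 329000033, 331000037, 651102253):
    print(p, exponent_is_possible(p, q), divides_mersenne(p, q))
```

Python's three-argument `pow` does the modular exponentiation without ever building the huge number 2^p, so each check finishes almost instantly even for these exponents.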

The gpu routinely passed the startup selftest in the mfaktc v0.20 64-bit Windows CUDA 6.5 executable with a V9.1 capable driver. Perhaps a quick memory test would be a good addition. One percent or less of a single pass of cudalucas -memtest would have been sufficient to detect the problem with this gpu, and that would not take long, perhaps 20 seconds: a thousand write/read/check cycles per pattern and block instead of the hundred thousand cudalucas uses.

running a simple selftest...
Selftest statistics
number of tests 92
successfull tests 92

selftest PASSED!

In a month of running from 2^69 to 2^84, the inappropriate factor occurred only above 2^80 for ~329M exponents and only above 2^81 for 651M.
Code:
[Sat May 26 11:04:45 2018]
no factor for M329000033 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]
[Mon May 28 01:18:34 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] bad factor repeated not reported
[Mon May 28 05:40:46 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Mon May 28 05:44:16 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Tue May 29 08:44:42 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Tue May 29 13:29:25 2018]
no factor for M329000033 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
...
[Fri Jun 01 01:02:12 2018]
no factor for M331000037 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 02 06:57:35 2018]
M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 02 11:45:02 2018]
M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 03 00:31:23 2018]
M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] bad factor repeated not reported
[Sun Jun 03 16:25:29 2018]
no factor for M331000037 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 06 16:47:02 2018]
no factor for M651102253 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 06 17:28:43 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 07 10:07:16 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 09 03:32:34 2018]
no factor for M651102253 from 2^81 to 2^82 [mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 09 23:45:09 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 10 00:20:40 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 10 00:50:36 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Tue Jun 12 16:37:00 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 13 08:32:57 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 14 06:10:24 2018]
no factor for M651102253 from 2^82 to 2^83 [mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 17 02:29:45 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 20 08:52:16 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 20 21:39:14 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 21 18:26:52 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 23 03:04:08 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 24 01:19:29 2018]
no factor for M651102253 from 2^83 to 2^84 [mfaktc 0.20 barrett87_mul32_gs]
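
A results file like the one above can be screened for this kind of bogus entry. The following is a minimal Python sketch under stated assumptions: `audit_results` and the regular expression are hypothetical helpers written for this log format, not part of mfaktc. It flags every reported factor that fails the 2^p mod q = 1 check, and every factor reported for more than one exponent. The second case is always suspicious, because gcd(2^p - 1, 2^r - 1) = 1 for distinct primes p and r, so no number greater than 1 can divide both.

```python
import re
from collections import defaultdict

# Matches result lines such as "M329000033 has a factor: 3881... [TF:80:81:...]".
FACTOR_LINE = re.compile(r"^M(\d+) has a factor: (\d+)")


def audit_results(lines):
    """Return (bogus, shared): reported factors that fail the 2^p mod q == 1
    check, and factors that were reported for more than one exponent."""
    bogus = []
    exponents_by_factor = defaultdict(set)
    for line in lines:
        match = FACTOR_LINE.match(line.strip())
        if match is None:
            continue  # timestamps, "no factor" lines, and so on
        p, q = int(match.group(1)), int(match.group(2))
        exponents_by_factor[q].add(p)
        if pow(2, p, q) != 1:
            bogus.append((p, q))
    shared = {q: sorted(ps) for q, ps in exponents_by_factor.items() if len(ps) > 1}
    return bogus, shared


# A few lines copied from the log above.
sample = """\
[Sat May 26 11:04:45 2018]
no factor for M329000033 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]
[Mon May 28 05:40:46 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 02 06:57:35 2018]
M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
""".splitlines()

bogus, shared = audit_results(sample)
print("failed 2^p mod q == 1:", bogus)
print("reported for several exponents:", shared)
```

In practice the lines would come from the results file instead of an inline string, for example `audit_results(open("results.txt"))`, where the file name is an assumption.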

Last fiddled with by kriesel on 2018-07-03 at 00:32
Old 2018-07-03, 16:21   #2825
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3^2×29×37 Posts

I can't reproduce that. Did you compile mfaktc by yourself?
Old 2018-07-04, 15:47   #2826
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5420_10 Posts
Huh?

Quote:
Originally Posted by LaurV
I can't reproduce that. Did you compile mfaktc by yourself?

What are you trying to reproduce, and whom are you asking? If you are referring to post 2824: no recompile, and count your blessings that you can't reproduce that!
Old 2018-07-06, 17:42   #2827
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^2·5·271 Posts
Fixed inappropriate factor repetition

On a GTX480 that very recently passed cudalucas -memtest with flying colors (and after the bad-vram gtx480 of the GPU RIP thread was removed from the same system):
Dozens of occurrences overnight, like the following, in a burst (31 in a 45 minute period, preceded and followed by hours of none at all, without user interaction). Maybe it has something to do with the CUDA 7.0 driver?

Code:
batch wrapper reports mfaktc-win-64.exe (re)launch at Thu 07/05/2018 23:48:28.17 count 30 on model gtx480 dev 0 
mfaktc v0.20 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           900s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  Kriesel
  ComputerID                dodo-gtx480-0
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t  %e |   %g  %s  %W%%"
  AllowSleep                no
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  6.50
  CUDA runtime version      6.50
  CUDA driver version       7.0

CUDA device info
  name                      GeForce GTX 480
  compute capability        2.0
  maximum threads per block 1024
  number of multiprocessors 15 (480 shader cores)
  clock rate                1401MHz

Automatic parameters
  threads per grid          983040

running a simple selftest...
Selftest statistics
  number of tests           92
  successfull tests         92

selftest PASSED!

got assignment: exp=670000207 bit_min=80 bit_max=84 (5482.09 GHz-days)
Starting trial factoring M670000207 from 2^80 to 2^81 (365.47 GHz-days)
 k_min = 902183168737620
 k_max = 1804366337478801
Using GPU kernel "barrett87_mul32_gs"

found a valid checkpoint file!
  last finished class was: 689
  found 0 factor(s) already

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jul 05 23:49 |  692  15.1% | 46.563  10h32m |    706.41    82485    n.a.%
M670000207 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
batch wrapper reports mfaktc-win-64.exe exited at Thu 07/05/2018 23:49:20.34
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.