mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

TheJudger 2018-05-28 21:25

Hi,

I have had no access to a Volta GPU since the release of CUDA 9.2.

Oliver

kriesel 2018-05-31 03:43

Reference material
 
I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only;)
General interest gpu related reference material [URL]http://www.mersenneforum.org/showthread.php?t=23371[/URL]
Mfaktc CUDA based factoring on gpus [URL]http://www.mersenneforum.org/showthread.php?t=23386[/URL]

Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place update without a time limit makes it more manageable there.

TheJudger 2018-06-28 15:20

Good news: CUDA 9.2.88 seems to have fixed the issue on Volta architecture!

Initial performance numbers are [B][U]impressive[/U][/B]! Unmodified mfaktc 0.21 sources (just adjusted the Makefile) + CUDA 9.2.88 on Linux, no fine tuning (default parameters in mfaktc.ini):
[CODE]# ./mfaktc.exe -tf 66362159 73 74
mfaktc v0.21 (64bit built)
[...]
CUDA device info
name [COLOR="red"][B]Tesla V100-PCIE-16GB[/B][/COLOR]
compute capability 7.0
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 80
clock rate (CUDA cores) 1380MHz
memory clock rate: 877MHz
memory bus width: 4096 bit
[...]
Starting trial factoring M66362159 from 2^73 to 2^74 (28.83 GHz-days)
k_min = 71160531149400
k_max = 142321062305090
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jun 28 11:42 | 0 0.1% | 0.697 11m08s | 3722.28 82485 n.a.%
[...]
Jun 28 11:52 | 4612 99.9% | 0.639 0m01s | 4060.14 82485 n.a.%
Jun 28 11:52 | 4617 100.0% | 0.641 0m00s | 4047.47 82485 n.a.%
no factor for [COLOR="Red"][B]M66362159 from 2^73 to 2^74[/B][/COLOR] [mfaktc 0.21 barrett76_mul32_gs]
tf(): total time spent: [COLOR="red"][B]10m 23.287s[/B][/COLOR]
[/CODE]

[CODE]# ./mfaktc.exe -tf 46510507 72 73
mfaktc v0.21 (64bit built)
[...]
Starting trial factoring M46510507 from 2^72 to 2^73 (20.57 GHz-days)
k_min = 50766663139500
k_max = 101533326284094
Using GPU kernel "barrett76_mul32_gs"
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jun 28 11:56 | 0 0.1% | 0.473 7m34s | 3913.09 82485 n.a.%
[...]
Jun 28 12:04 | 4613 99.9% | 0.470 0m00s | 3938.07 82485 n.a.%
Jun 28 12:04 | 4617 100.0% | 0.471 0m00s | 3929.71 82485 n.a.%
found 1 factor for [COLOR="Red"][B]M46510507 from 2^72 to 2^73[/B][/COLOR] [mfaktc 0.21 barrett76_mul32_gs]
tf(): total [COLOR="red"][B]time spent: 7m 28.293s[/B][/COLOR]
[/CODE]

I have no clue why it is THAT fast, cores and clock rate are roughly 50% more than Tesla P100 but V100 is 3 times faster than P100 for mfaktc.
Even more impressive is the power efficiency: nvidia-smi reports just below 200W while running mfaktc. [B]That is ~50mW per [I]"GHz Core2Solo equivalent"[/I].[/B]
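As a rough cross-check of the figures above, the average throughput can be recomputed from the credited GHz-days and the total time spent, and the power figure from the ~200W reading. This is only a sketch: the function names are made up for illustration, 200 W is the rounded nvidia-smi value, and one GHz-d/day is read here as one "GHz Core2Solo equivalent".

```python
SECONDS_PER_DAY = 86_400


def ghz_days_per_day(work_ghz_days: float, elapsed_seconds: float) -> float:
    """Average throughput: credited work divided by elapsed time in days."""
    return work_ghz_days / (elapsed_seconds / SECONDS_PER_DAY)


def milliwatts_per_ghz_equivalent(power_watts: float, throughput: float) -> float:
    """Power draw per unit of sustained throughput, in milliwatts."""
    return 1000.0 * power_watts / throughput


# (credited GHz-days, wall-clock seconds) copied from the two runs above
runs = {
    "M66362159 2^73 to 2^74": (28.83, 10 * 60 + 23.287),
    "M46510507 2^72 to 2^73": (20.57, 7 * 60 + 28.293),
}

for name, (work, seconds) in runs.items():
    rate = ghz_days_per_day(work, seconds)
    # 200 W is an assumption: the post only says "just below 200W"
    mw = milliwatts_per_ghz_equivalent(200.0, rate)
    print(f"{name}: {rate:.0f} GHz-d/day, {mw:.0f} mW per GHz equivalent")
```

Both runs come out close to 4000 GHz-d/day, in line with the per-class numbers mfaktc printed.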

If those numbers are correct that might be the biggest performance step over the previous GPU architecture since the launch of Fermi cards! Right now those numbers feel a little bit too high to be true but I can't find an issue...
Pascal generation: [URL="http://mersenneforum.org/showpost.php?p=443782&postcount=2627"]GTX 1080[/URL] and [URL="http://mersenneforum.org/showpost.php?p=455386&postcount=2695"]GTX 1080 Ti[/URL]


Oliver

P.S. using the old (pre GPU factoring) limits, how many V100 would be needed to do all the TF work for GIMPS?

[B]P.P.S. even if those numbers are impressive I think those cards should be used for LL tests![/B]

moebius 2018-06-28 15:48

[QUOTE=TheJudger;455386][...]
Reason need more fresh air in chassis.

Oliver[/QUOTE]
Maybe the temperatures can be tamed with a water cooling solution.


For example:
[URL]https://www.mindfactory.de/product_info.php/Alphacool-Eiswolf-240-GPX-Pro-Nvidia-Geforce-GTX-1080-M24-schwarz_1221121.html[/URL]

It seems to be a universal cooler.

TheJudger 2018-06-28 16:03

Hi moebius,

check the date of that post; it was my initial hands-on with a GTX 1080 Ti, and it wasn't my card/system.

Oliver

tServo 2018-06-29 00:13

From TheJudger ( Oliver ):
"I have no clue why it is THAT fast, cores and clock rate are roughly 50% more than Tesla P100 but V100 is 3 times faster than P100 for mfaktc."

Nvidia mentioned in blog posts right after Volta's announcement that, for the first time, they have separated the FP registers and compute units from the INT ones. Frequently used things like pointer arithmetic therefore do not disturb the FP regs, and since the INT ones have their own compute units, both can be scheduled and run together.
You're right, Oliver: 3x is mighty impressive AND all that power is best used for LL.

TheJudger 2018-06-29 22:01

mfaktc rarely uses FP math. I guess they improved 32-bit integer multiplication throughput. Because this isn't natively supported on Maxwell and Pascal, they just write "multiple instructions" instead of "up to", so it is not easy to compare on paper.

Oliver

kriesel 2018-07-03 00:03

reproducible misplaced factor meant bad gpu ram
 
[QUOTE=TheJudger;475566]Hi moebius,
[LIST][*]is this reproduceable for your setup?[*]default config (mfaktc.ini) or altered settings?[*]did this happen on a long run (several assignments without restart of mfaktc or right after the first assignment after (re-)start)?[*]which GPU?[/LIST] As axn already mentioned: this is a valid (composite) factor for M3321928619. Why M3321928619? Because this is part of the builtin selftest which is run on every (re-)start of mfaktc. Somehow the result from the selftest isn't cleared and shown after an assignment finished. This was reported 2(?) times before, I didn't figure out why this happens yet.

Oliver[/QUOTE]
Re 38814612911305349835664385407 showing up as a factor for random exponents: I accidentally found a way of reproducing it. Run on a GPU with a lot of memory errors. I had a GTX480 that had deteriorated, the same one documented in the "GPU RIP" thread [URL]http://www.mersenneforum.org/showthread.php?t=23472[/URL]
after memory testing showed how bad it was. Other symptoms included "unspecified launch failure" and "illegal memory access". Within a year it went from passing memory tests to having millions of errors, regardless of clock rates. It has been removed.

The repeatedly indicated factor 38814612911305349835664385407 = 2 × 3^6 × 31081 × 65381 × 3943673 × 3321928619 + 1, so it is not a legitimate factor of any other prime exponent between 3943673 and 3321928619 (or any above 3321928619).
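The reasoning above can be sketched in a few lines of Python. Any factor of M_p with p an odd prime has the form 2kp + 1, so p must divide the factor minus one. The check below is only that necessary condition, not a full divisibility test, and the function name is made up for illustration. The three other exponents are the ones from the results log in this post.

```python
import math

F = 38814612911305349835664385407   # the factor that keeps showing up
SELFTEST_EXPONENT = 3321928619      # exponent used in the mfaktc selftest

# Decomposition of F - 1 as quoted in the post
COFACTORS = [2, 3**6, 31081, 65381, 3943673, SELFTEST_EXPONENT]


def could_divide_mersenne(f: int, p: int) -> bool:
    """Necessary condition only: a factor of M_p (p an odd prime) is 1 mod 2p."""
    return (f - 1) % (2 * p) == 0


print("decomposition holds:", math.prod(COFACTORS) + 1 == F)
for p in (SELFTEST_EXPONENT, 329000033, 331000037, 651102253):
    print(f"M{p}: F has the form 2kp+1 -> {could_divide_mersenne(F, p)}")
```

Only the selftest exponent passes, so the number cannot be a factor of any of the exponents that were actually being worked on.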

The gpu routinely passed the startup selftest in the mfaktc v0.20 64-bit Windows CUDA 6.5 executable with a V9.1 capable driver. Perhaps a quick memory test would be a good addition. One percent or less of a single pass of cudalucas -memtest would have been sufficient to detect the problem with this gpu, and that would not take long, perhaps 20 seconds: a thousand write/read/check cycles per pattern and block instead of the hundred thousand cudalucas uses.

running a simple selftest...
Selftest statistics
number of tests 92
successfull tests 92

selftest PASSED!

The inappropriate factor occurred only above 2^80 for the ~329M exponents and above 2^81 for 651M, in a month of running from 2^69 to 2^84.
[CODE][Sat May 26 11:04:45 2018]
no factor for M329000033 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]
[Mon May 28 01:18:34 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] bad factor repeated not reported
[Mon May 28 05:40:46 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Mon May 28 05:44:16 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Tue May 29 08:44:42 2018]
M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Tue May 29 13:29:25 2018]
no factor for M329000033 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
...
[Fri Jun 01 01:02:12 2018]
no factor for M331000037 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 02 06:57:35 2018]
M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 02 11:45:02 2018]
M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 03 00:31:23 2018]
M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] bad factor repeated not reported
[Sun Jun 03 16:25:29 2018]
no factor for M331000037 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 06 16:47:02 2018]
no factor for M651102253 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 06 17:28:43 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 07 10:07:16 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 09 03:32:34 2018]
no factor for M651102253 from 2^81 to 2^82 [mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 09 23:45:09 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 10 00:20:40 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 10 00:50:36 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Tue Jun 12 16:37:00 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 13 08:32:57 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 14 06:10:24 2018]
no factor for M651102253 from 2^82 to 2^83 [mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 17 02:29:45 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 20 08:52:16 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 20 21:39:14 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 21 18:26:52 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 23 03:04:08 2018]
M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 24 01:19:29 2018]
no factor for M651102253 from 2^83 to 2^84 [mfaktc 0.20 barrett87_mul32_gs]
[/CODE]
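A results file like the one above can be screened for such entries before anything is reported. The sketch below assumes the line format shown in the log ("M<exponent> has a factor: <factor> ..."), and the helper names are hypothetical; it is not part of mfaktc. It checks each reported factor with a modular exponentiation, since f divides 2^p - 1 exactly when 2^p mod f equals 1.

```python
import re

# Matches result lines such as
# "M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:...]"
FACTOR_LINE = re.compile(r"^M(\d+) has a factor: (\d+)")


def is_factor_of_mersenne(p: int, f: int) -> bool:
    """True if f divides 2^p - 1, i.e. 2^p == 1 (mod f)."""
    return f > 1 and pow(2, p, f) == 1


def find_bogus_factors(lines):
    """Yield (exponent, factor) for every reported factor that fails the check."""
    for line in lines:
        match = FACTOR_LINE.match(line.strip())
        if match:
            p, f = int(match.group(1)), int(match.group(2))
            if not is_factor_of_mersenne(p, f):
                yield p, f


# Three lines copied from the log above
sample = [
    "[Mon May 28 05:40:46 2018]",
    "M329000033 has a factor: 38814612911305349835664385407 "
    "[TF:80:81:mfaktc 0.20 barrett87_mul32_gs]",
    "no factor for M329000033 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]",
]

for p, f in find_bogus_factors(sample):
    print(f"M{p}: reported factor {f} does not divide 2^{p} - 1")
```

For real use, the lines would come from the results file instead of the `sample` list.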

LaurV 2018-07-03 16:21

I can't reproduce that. Did you compile mfaktc by yourself?

kriesel 2018-07-04 15:47

Huh?
 
[QUOTE=LaurV;491078]I can't reproduce that. Did you compile mfaktc by yourself?[/QUOTE]
What are you trying to reproduce, and who are you asking? If referring to post 2824, no recompile, and count your blessings that you can't reproduce that!

kriesel 2018-07-06 17:42

Fixed inappropriate factor repetition
 
On a GTX480 that very recently passed cudalucas -memtest with flying colors (and after the bad-vram gtx480 of GPU RIP thread was removed from the same system):
Dozens of occurrences overnight, like the following, came in a burst: 31 in a 45 minute period, preceded and followed by hours of none at all, without user interaction. Maybe it has something to do with the CUDA 7.0 driver?

[CODE]batch wrapper reports mfaktc-win-64.exe (re)launch at Thu 07/05/2018 23:48:28.17 count 30 on model gtx480 dev 0
mfaktc v0.20 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 900s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID Kriesel
ComputerID dodo-gtx480-0
ProgressHeader "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait"
ProgressFormat "%d %T | %C %p%% | %t %e | %g %s %W%%"
AllowSleep no
TimeStampInResults yes

CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 7.0

CUDA device info
name GeForce GTX 480
compute capability 2.0
maximum threads per block 1024
number of multiprocessors 15 (480 shader cores)
clock rate 1401MHz

Automatic parameters
threads per grid 983040

running a simple selftest...
Selftest statistics
number of tests 92
successfull tests 92

selftest PASSED!

got assignment: exp=670000207 bit_min=80 bit_max=84 (5482.09 GHz-days)
Starting trial factoring M670000207 from 2^80 to 2^81 (365.47 GHz-days)
k_min = 902183168737620
k_max = 1804366337478801
Using GPU kernel "barrett87_mul32_gs"

found a valid checkpoint file!
last finished class was: 689
found 0 factor(s) already

Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 05 23:49 | 692 15.1% | 46.563 10h32m | 706.41 82485 n.a.%
M670000207 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
batch wrapper reports mfaktc-win-64.exe exited at Thu 07/05/2018 23:49:20.34 [/CODE]

