Hi,
I have had no access to a Volta GPU since the release of CUDA 9.2. Oliver |
Reference material
I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only;)
General interest GPU-related reference material: [URL]http://www.mersenneforum.org/showthread.php?t=23371[/URL]
Mfaktc CUDA-based factoring on GPUs: [URL]http://www.mersenneforum.org/showthread.php?t=23386[/URL]
Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place update without a time limit makes it more manageable there. |
Good news: CUDA 9.2.88 seems to have fixed the issue on Volta architecture!
Initial performance numbers are [B][U]impressive[/U][/B]! Unmodified mfaktc 0.21 sources (just adjusted the Makefile) + CUDA 9.2.88 on Linux, no fine tuning (default parameters in mfaktc.ini):
[CODE]# ./mfaktc.exe -tf 66362159 73 74
mfaktc v0.21 (64bit built)
[...]
CUDA device info
  name                      [COLOR="red"][B]Tesla V100-PCIE-16GB[/B][/COLOR]
  compute capability        7.0
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 80
  clock rate (CUDA cores)   1380MHz
  memory clock rate:        877MHz
  memory bus width:         4096 bit
[...]
Starting trial factoring M66362159 from 2^73 to 2^74 (28.83 GHz-days)
 k_min = 71160531149400
 k_max = 142321062305090
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 28 11:42 |    0   0.1% |  0.697  11m08s |   3722.28    82485    n.a.%
[...]
Jun 28 11:52 | 4612  99.9% |  0.639   0m01s |   4060.14    82485    n.a.%
Jun 28 11:52 | 4617 100.0% |  0.641   0m00s |   4047.47    82485    n.a.%
no factor for [COLOR="Red"][B]M66362159 from 2^73 to 2^74[/B][/COLOR] [mfaktc 0.21 barrett76_mul32_gs]
tf(): total time spent: [COLOR="red"][B]10m 23.287s[/B][/COLOR]
[/CODE]
[CODE]# ./mfaktc.exe -tf 46510507 72 73
mfaktc v0.21 (64bit built)
[...]
Starting trial factoring M46510507 from 2^72 to 2^73 (20.57 GHz-days)
 k_min = 50766663139500
 k_max = 101533326284094
Using GPU kernel "barrett76_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 28 11:56 |    0   0.1% |  0.473   7m34s |   3913.09    82485    n.a.%
[...]
Jun 28 12:04 | 4613  99.9% |  0.470   0m00s |   3938.07    82485    n.a.%
Jun 28 12:04 | 4617 100.0% |  0.471   0m00s |   3929.71    82485    n.a.%
found 1 factor for [COLOR="Red"][B]M46510507 from 2^72 to 2^73[/B][/COLOR] [mfaktc 0.21 barrett76_mul32_gs]
tf(): total [COLOR="red"][B]time spent: 7m 28.293s[/B][/COLOR]
[/CODE]
I have no clue why it is THAT fast, cores and clock rate are roughly 50% more than Tesla P100 but V100 is 3 times faster than P100 for mfaktc.
Even more impressive is the power efficiency: nvidia-smi reports just below 200W while running mfaktc. [B]That is ~50mW per [I]"GHz Core2Solo equivalent"[/I].[/B] If those numbers are correct, that might be the biggest performance step over the previous GPU architecture since the launch of the Fermi cards! Right now those numbers feel a little bit too high to be true, but I can't find an issue...
Pascal generation: [URL="http://mersenneforum.org/showpost.php?p=443782&postcount=2627"]GTX 1080[/URL] and [URL="http://mersenneforum.org/showpost.php?p=455386&postcount=2695"]GTX 1080 Ti[/URL]
Oliver
P.S. Using the old (pre GPU factoring) limits, how many V100s would be needed to do all the TF work for GIMPS?
[B]P.P.S. Even if those numbers are impressive, I think those cards should be used for LL tests![/B] |
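The throughput and efficiency figures in the post above can be cross-checked from the totals it quotes. This is only a sketch of the arithmetic: the function name is made up for illustration, and the 200 W figure is an assumption standing in for the "just below 200W" reading.

```python
# Cross-check of the figures quoted above (illustrative helper, not part of mfaktc).

def ghz_days_per_day(ghz_days, minutes, seconds):
    """Average throughput for a run that did `ghz_days` of work in the given wall time."""
    elapsed_s = minutes * 60 + seconds
    return ghz_days * 86400 / elapsed_s

# Totals taken from the two runs in the post.
run_66m = ghz_days_per_day(28.83, 10, 23.287)  # M66362159, 2^73 to 2^74
run_46m = ghz_days_per_day(20.57, 7, 28.293)   # M46510507, 2^72 to 2^73

# Assumption: 200 W as a round stand-in for "just below 200W".
watts = 200.0
milliwatts_per_ghz_equiv = watts * 1000 / run_66m

print(f"M66362159: {run_66m:.0f} GHz-d/day")
print(f"M46510507: {run_46m:.0f} GHz-d/day")
print(f"about {milliwatts_per_ghz_equiv:.0f} mW per GHz-equivalent")
```

Both averages land close to the per-class GHz-d/day column in the logs, and the power figure comes out near the ~50mW quoted.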
[QUOTE=TheJudger;455386][CODE]# [...][/CODE]
Reason need more fresh air in chassis. Oliver[/QUOTE]
Maybe the temperatures can be tamed with a water-cooling solution, e.g. [URL]https://www.mindfactory.de/product_info.php/Alphacool-Eiswolf-240-GPX-Pro-Nvidia-Geforce-GTX-1080-M24-schwarz_1221121.html[/URL], which seems to be a universal cooler. |
Hi moebius,
check the date of that post; it was my first hands-on with a GTX 1080 Ti. And it wasn't my card/system. Oliver |
From TheJudger (Oliver):
"I have no clue why it is THAT fast, cores and clock rate are roughly 50% more than Tesla P100 but V100 is 3 times faster than P100 for mfaktc."
Nvidia mentioned in blog posts right after Volta's announcement that, for the first time, they have separated the FP registers and compute units from the INT ones, so often-used things like pointer arithmetic do not disturb the FP registers, and since the INT work has its own compute units, both can be scheduled and run together. You're right, Oliver: 3x is mighty impressive, AND all that power is best used for LL. |
mfaktc rarely uses FP math. I guess they improved 32bit integer multiplication throughput. Because this isn't natively supported on Maxwell and Pascal, they just write "multiple instructions" instead of "up to". So it is not easy to compare on paper.
Oliver |
reproducible misplaced factor meant bad gpu ram
[QUOTE=TheJudger;475566]Hi moebius,
[LIST][*]is this reproduceable for your setup?[*]default config (mfaktc.ini) or altered settings?[*]did this happen on a long run (several assignments without restart of mfaktc or right after the first assignment after (re-)start)?[*]which GPU?[/LIST] As axn already mentioned: this is a valid (composite) factor for M3321928619. Why M3321928619? Because this is part of the builtin selftest which is run on every (re-)start of mfaktc. Somehow the result from the selftest isn't cleared and shown after an assignment finished. This was reported 2(?) times before, I didn't figure out why this happens yet. Oliver[/QUOTE]
Re 38814612911305349835664385407 showing up as a factor for random exponents: I accidentally found a way of reproducing it. Run on a GPU with a lot of memory errors. I had a GTX480 that had deteriorated, the same one documented in the "GPU RIP" thread [URL]http://www.mersenneforum.org/showthread.php?t=23472[/URL] after memory testing showed how bad it was. Other symptoms included "unspecified launch failure" and "illegal memory access". Within a year it went from passing memory tests to having millions of errors, regardless of clock rates. It has been removed.
The repeatedly indicated factor 38814612911305349835664385407 = 2 × 3^6 × 31081 × 65381 × 3943673 × 3321928619 + 1, so it is not a legitimate factor of any other prime exponent between 3943673 and 3321928619 (or any above 3321928619).
The gpu routinely passed the startup selftest in the mfaktc v0.20 64-bit Windows CUDA 6.5 executable with a V9.1 capable driver. Perhaps a quick memory test would be a good addition. One percent or less of a single pass of cudalucas -memtest would have been sufficient to detect a problem with this gpu, and that would not take long, perhaps 20 seconds: a thousand write/read/check cycles per pattern and block instead of the hundred thousand cudalucas uses.
running a simple selftest...
Selftest statistics
number of tests 92
successfull tests 92
selftest PASSED!
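The decomposition above can be checked in a few lines. This sketch relies on the standard fact that every factor of M_p (p an odd prime) has the form 2kp+1; the function names are made up for illustration and are not mfaktc routines.

```python
# Sanity checks on the repeated "factor" (hypothetical helper names, not mfaktc code).

Q = 38814612911305349835664385407   # the value that keeps showing up
P_SELFTEST = 3321928619             # selftest exponent named in the quoted post

def has_mersenne_factor_form(p, q):
    """Necessary condition: any factor q of M_p (p an odd prime) satisfies q = 2*k*p + 1."""
    return (q - 1) % (2 * p) == 0

def divides_mersenne(p, q):
    """True if q divides 2^p - 1, checked by modular exponentiation."""
    return pow(2, p, q) == 1

# The decomposition of Q - 1 given in the post.
decomposition = 2 * 3**6 * 31081 * 65381 * 3943673 * 3321928619 + 1

print("decomposition matches Q:", decomposition == Q)
print("2kp+1 form for the selftest exponent:", has_mersenne_factor_form(P_SELFTEST, Q))
print("2kp+1 form for M329000033:", has_mersenne_factor_form(329000033, Q))
print("Q divides M3321928619:", divides_mersenne(P_SELFTEST, Q))
```

The 2kp+1 check is cheap enough to run on every reported factor, which is why a value left over from the selftest stands out as soon as it is attributed to a different exponent.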
The inappropriate factor occurred only above 2^80 for ~329M exponents, 2^81 for 651M, in a month of running from 2^69 to 2^84.
[CODE][Sat May 26 11:04:45 2018] no factor for M329000033 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]
[Mon May 28 01:18:34 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
bad factor repeated not reported
[Mon May 28 05:40:46 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Mon May 28 05:44:16 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Tue May 29 08:44:42 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Tue May 29 13:29:25 2018] no factor for M329000033 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
...
[Fri Jun 01 01:02:12 2018] no factor for M331000037 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 02 06:57:35 2018] M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 02 11:45:02 2018] M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 03 00:31:23 2018] M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]
bad factor repeated not reported
[Sun Jun 03 16:25:29 2018] no factor for M331000037 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 06 16:47:02 2018] no factor for M651102253 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 06 17:28:43 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 07 10:07:16 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 09 03:32:34 2018] no factor for M651102253 from 2^81 to 2^82 [mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 09 23:45:09 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 10 00:20:40 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 10 00:50:36 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Tue Jun 12 16:37:00 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 13 08:32:57 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 14 06:10:24 2018] no factor for M651102253 from 2^82 to 2^83 [mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 17 02:29:45 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 20 08:52:16 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Wed Jun 20 21:39:14 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Thu Jun 21 18:26:52 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Sat Jun 23 03:04:08 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs]
[Sun Jun 24 01:19:29 2018] no factor for M651102253 from 2^83 to 2^84 [mfaktc 0.20 barrett87_mul32_gs]
[/CODE] |
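Result lines like the ones in the log above can be screened before anything is reported. This is a rough sketch, assuming only the "M<exponent> has a factor: <factor>" wording seen in that log; the regular expression and function name are illustrative and not part of mfaktc.

```python
import re

# Matches result lines such as "M329000033 has a factor: 388146..." (format assumed from the log above).
RESULT_RE = re.compile(r"M(\d+) has a factor: (\d+)")

def suspicious_factors(lines):
    """Return (exponent, factor) pairs whose factor does not actually divide 2^exponent - 1."""
    flagged = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # "no factor" lines and annotations are ignored
        p, q = int(match.group(1)), int(match.group(2))
        if pow(2, p, q) != 1:  # q divides M_p exactly when 2^p mod q == 1
            flagged.append((p, q))
    return flagged

sample = [
    "[Sat May 26 11:04:45 2018] no factor for M329000033 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs]",
    "[Mon May 28 01:18:34 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs]",
    "M11 has a factor: 23",  # small genuine example: 2047 = 23 * 89
]
flagged = suspicious_factors(sample)
print(flagged)
```

The modular exponentiation takes a negligible amount of time per line, so a check like this could run in a batch wrapper without slowing anything down.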
I can't reproduce that. Did you compile mfaktc by yourself?
|
Huh?
[QUOTE=LaurV;491078]I can't reproduce that. Did you compile mfaktc by yourself?[/QUOTE]
What are you trying to reproduce, and who are you asking? If referring to post 2824, no recompile, and count your blessings that you can't reproduce that! |
Fixed inappropriate factor repetition
On a GTX480 that very recently passed cudalucas -memtest with flying colors (and after the bad-vram gtx480 of GPU RIP thread was removed from the same system):
Dozens of occurrences overnight, like the following, in a burst (31 in a 45 minute period, preceded and followed by hours of none at all, without user interaction). Maybe it has something to do with the CUDA 7.0 driver?
[CODE]batch wrapper reports mfaktc-win-64.exe (re)launch at Thu 07/05/2018 23:48:28.17 count 30 on model gtx480 dev 0
mfaktc v0.20 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           900s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  Kriesel
  ComputerID                dodo-gtx480-0
  ProgressHeader            "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait"
  ProgressFormat            "%d %T | %C %p%% | %t %e | %g %s %W%%"
  AllowSleep                no
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  6.50
  CUDA runtime version      6.50
  CUDA driver version       7.0

CUDA device info
  name                      GeForce GTX 480
  compute capability        2.0
  maximum threads per block 1024
  number of multiprocessors 15 (480 shader cores)
  clock rate                1401MHz

Automatic parameters
  threads per grid          983040

running a simple selftest...
Selftest statistics
  number of tests           92
  successfull tests         92

selftest PASSED!

got assignment: exp=670000207 bit_min=80 bit_max=84 (5482.09 GHz-days)
Starting trial factoring M670000207 from 2^80 to 2^81 (365.47 GHz-days)
 k_min = 902183168737620
 k_max = 1804366337478801
Using GPU kernel "barrett87_mul32_gs"

found a valid checkpoint file!
  last finished class was: 689
  found 0 factor(s) already

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jul 05 23:49 |  692  15.1% | 46.563  10h32m |    706.41    82485    n.a.%
M670000207 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
batch wrapper reports mfaktc-win-64.exe exited at Thu 07/05/2018 23:49:20.34
[/CODE] |