[QUOTE=Sake;479830]Thank you for the help! Will test with cudapm1 in the meanwhile.[/QUOTE]
Sake, have you tried CUDA Toolkit 9.1.85? Your output above from the bug shows 9.10. |
still broken in 9.1 (Volta only)
Oliver |
[QUOTE=TheJudger;485814]still broken in 9.1 (Volta only)
Oliver[/QUOTE] 9.2 is out. Anyone tried to check it already? |
Hi,
I haven't had access to a Volta GPU since the release of CUDA 9.2. Oliver |
Reference material
I was offered "a blog area to consolidate all of your pdfs and guides and stuff" and accepted.
Feel free to have a look and suggest content. (G-rated only ;)
General interest GPU related reference material: [URL]http://www.mersenneforum.org/showthread.php?t=23371[/URL]
Mfaktc CUDA based factoring on GPUs: [URL]http://www.mersenneforum.org/showthread.php?t=23386[/URL]
Future updates to material previously posted in this thread will probably occur on the blog threads and not here. Having in-place updates without a time limit makes them more manageable there. |
Good news: CUDA 9.2.88 seems to have fixed the issue on Volta architecture!
Initial performance numbers are [B][U]impressive[/U][/B]! Unmodified mfaktc 0.21 sources (just adjusted the Makefile) + CUDA 9.2.88 on Linux, no fine tuning (default parameters in mfaktc.ini):
[CODE]# ./mfaktc.exe -tf 66362159 73 74
mfaktc v0.21 (64bit built)
[...]
CUDA device info
  name                      [COLOR="red"][B]Tesla V100-PCIE-16GB[/B][/COLOR]
  compute capability        7.0
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 80
  clock rate (CUDA cores)   1380MHz
  memory clock rate:        877MHz
  memory bus width:         4096 bit
[...]
Starting trial factoring M66362159 from 2^73 to 2^74 (28.83 GHz-days)
 k_min = 71160531149400
 k_max = 142321062305090
Using GPU kernel "barrett76_mul32_gs"
    Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 28 11:42 |    0   0.1% |  0.697  11m08s |   3722.28    82485    n.a.%
[...]
Jun 28 11:52 | 4612  99.9% |  0.639   0m01s |   4060.14    82485    n.a.%
Jun 28 11:52 | 4617 100.0% |  0.641   0m00s |   4047.47    82485    n.a.%
no factor for [COLOR="Red"][B]M66362159 from 2^73 to 2^74[/B][/COLOR] [mfaktc 0.21 barrett76_mul32_gs]
tf(): total time spent: [COLOR="red"][B]10m 23.287s[/B][/COLOR]
[/CODE]
[CODE]# ./mfaktc.exe -tf 46510507 72 73
mfaktc v0.21 (64bit built)
[...]
Starting trial factoring M46510507 from 2^72 to 2^73 (20.57 GHz-days)
 k_min = 50766663139500
 k_max = 101533326284094
Using GPU kernel "barrett76_mul32_gs"
    Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 28 11:56 |    0   0.1% |  0.473   7m34s |   3913.09    82485    n.a.%
[...]
Jun 28 12:04 | 4613  99.9% |  0.470   0m00s |   3938.07    82485    n.a.%
Jun 28 12:04 | 4617 100.0% |  0.471   0m00s |   3929.71    82485    n.a.%
found 1 factor for [COLOR="Red"][B]M46510507 from 2^72 to 2^73[/B][/COLOR] [mfaktc 0.21 barrett76_mul32_gs]
tf(): total [COLOR="red"][B]time spent: 7m 28.293s[/B][/COLOR]
[/CODE]
I have no clue why it is THAT fast: core count and clock rate are only roughly 50% higher than the Tesla P100, yet the V100 is 3 times faster than the P100 for mfaktc.
Even more impressive is the power efficiency: nvidia-smi reports just below 200W while running mfaktc. [B]That is ~50mW per [I]"GHz Core2Solo equivalent"[/I].[/B] If those numbers are correct, this might be the biggest performance step over the previous GPU architecture since the launch of the Fermi cards! Right now those numbers feel a little bit too high to be true, but I can't find an issue... Pascal generation: [URL="http://mersenneforum.org/showpost.php?p=443782&postcount=2627"]GTX 1080[/URL] and [URL="http://mersenneforum.org/showpost.php?p=455386&postcount=2695"]GTX 1080 Ti[/URL] Oliver P.S. Using the old (pre-GPU-factoring) limits, how many V100s would be needed to do all the TF work for GIMPS? [B]P.P.S. Even if those numbers are impressive, I think those cards should be used for LL tests![/B] |
[QUOTE=TheJudger;455386]Reason: need more fresh air in chassis. Oliver[/QUOTE] Maybe the temperatures can be tamed with a water-cooling solution, e.g. [URL]https://www.mindfactory.de/product_info.php/Alphacool-Eiswolf-240-GPX-Pro-Nvidia-Geforce-GTX-1080-M24-schwarz_1221121.html[/URL] seems to be a universal cooler |
Hi moebius,
check the date of that post, was my initial hands on a GTX 1080 Ti. And it wasn't my card/system. Oliver |
From TheJudger ( Oliver ):
"I have no clue why it is THAT fast, cores and clock rate are roughly 50% more than Tesla P100 but V100 is 3 times faster than P100 for mfaktc." Nvidia mentioned in blog posts right after Volta's announcement that, for the first time, they have separated the FP registers and compute units from the INT ones, so often-used operations like pointer arithmetic do not disturb the FP registers, and since the INT units have their own compute pipes they can be scheduled and run together with FP work. You're right Oliver, 3x is mighty impressive AND all that power is best used for LL. |
mfaktc rarely uses FP math. I guess they improved 32-bit integer multiplication throughput. Because this isn't natively supported on Maxwell and Pascal, the throughput tables just say "multiple instructions" instead of giving an "up to" figure, so it's not easy to compare on paper.
Oliver |
reproducible misplaced factor meant bad gpu ram
[QUOTE=TheJudger;475566]Hi moebius,
[LIST][*]is this reproduceable for your setup?[*]default config (mfaktc.ini) or altered settings?[*]did this happen on a long run (several assignments without restart of mfaktc or right after the first assignment after (re-)start)?[*]which GPU?[/LIST] As axn already mentioned: this is a valid (composite) factor for M3321928619. Why M3321928619? Because this is part of the builtin selftest which is run on every (re-)start of mfaktc. Somehow the result from the selftest isn't cleared and shown after an assignment finished. This was reported 2(?) times before, I didn't figure out why this happens yet. Oliver[/QUOTE] Re 38814612911305349835664385407 showing up as a factor for random exponents: I accidentally found a way of reproducing it. Run on a GPU with a lot of memory errors. I had a GTX480 that had deteriorated, the same one as documented in the "GPU RIP" thread [URL]http://www.mersenneforum.org/showthread.php?t=23472[/URL] after memory testing showed how bad it was. Other symptoms included "unspecified launch failure" and "illegal memory access". It went from passing memory tests to having millions of errors, regardless of clock rates, in a year. It has been removed. The repeatedly indicated factor 38814612911305349835664385407 = 2 × 3^6 × 31081 × 65381 × 3943673 × 3321928619 + 1, so it is not a legitimate factor of any other prime exponent between 3943673 and 3321928619 (or any above 3321928619). The GPU routinely passed the startup selftest in the mfaktc v0.20 64-bit Windows CUDA 6.5 executable with a V9.1-capable driver. Perhaps a quick memory test would be a good addition. One percent or less of a single pass of cudalucas -memtest would have been sufficient to detect the problem with this GPU, and would not take long, perhaps 20 seconds: a thousand write/read/check cycles per pattern and block instead of the hundred thousand cudalucas uses.
[CODE]running a simple selftest...

Selftest statistics
  number of tests           92
  successfull tests         92

selftest PASSED![/CODE]
The inappropriate factor occurred only above 2^80 for ~329M exponents, 2^81 for 651M, in a month of running from 2^69 to 2^84. [CODE][Sat May 26 11:04:45 2018] no factor for M329000033 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs] [Mon May 28 01:18:34 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] bad factor repeated not reported [Mon May 28 05:40:46 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Mon May 28 05:44:16 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Tue May 29 08:44:42 2018] M329000033 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Tue May 29 13:29:25 2018] no factor for M329000033 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs] ... [Fri Jun 01 01:02:12 2018] no factor for M331000037 from 2^79 to 2^80 [mfaktc 0.20 barrett87_mul32_gs] [Sat Jun 02 06:57:35 2018] M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Sat Jun 02 11:45:02 2018] M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] [Sun Jun 03 00:31:23 2018] M331000037 has a factor: 38814612911305349835664385407 [TF:80:81:mfaktc 0.20 barrett87_mul32_gs] bad factor repeated not reported [Sun Jun 03 16:25:29 2018] no factor for M331000037 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs] [Wed Jun 06 16:47:02 2018] no factor for M651102253 from 2^80 to 2^81 [mfaktc 0.20 barrett87_mul32_gs] [Wed Jun 06 17:28:43 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs] [Thu Jun 07 10:07:16 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:81:82:mfaktc 0.20 barrett87_mul32_gs] [Sat Jun 09 03:32:34 2018] no factor for M651102253 from 2^81 to 2^82 [mfaktc 0.20 barrett87_mul32_gs] [Sat Jun 09 23:45:09 2018] M651102253 has a factor: 
38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs] [Sun Jun 10 00:20:40 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs] [Sun Jun 10 00:50:36 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs] [Tue Jun 12 16:37:00 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs] [Wed Jun 13 08:32:57 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:82:83:mfaktc 0.20 barrett87_mul32_gs] [Thu Jun 14 06:10:24 2018] no factor for M651102253 from 2^82 to 2^83 [mfaktc 0.20 barrett87_mul32_gs] [Sun Jun 17 02:29:45 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs] [Wed Jun 20 08:52:16 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs] [Wed Jun 20 21:39:14 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs] [Thu Jun 21 18:26:52 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs] [Sat Jun 23 03:04:08 2018] M651102253 has a factor: 38814612911305349835664385407 [TF:83:84:mfaktc 0.20 barrett87_mul32_gs] [Sun Jun 24 01:19:29 2018] no factor for M651102253 from 2^83 to 2^84 [mfaktc 0.20 barrett87_mul32_gs] [/CODE] |
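Incidentally, the misplaced number really is a (composite) factor of M3321928619, which is why the selftest accepts it; it just does not belong to the random exponents it was attached to. A quick Python sketch (mine, not part of mfaktc) of the two properties a claimed Mersenne factor must satisfy:

```python
# Any factor q of M(p) = 2^p - 1 (p prime) satisfies:
#   1. q ≡ 1 (mod 2p)    -- Mersenne factors have the form 2*k*p + 1
#   2. 2^p ≡ 1 (mod q)   -- q actually divides 2^p - 1
# The selftest factor above passes both for p = 3321928619, but fails
# for the random exponents it was wrongly attached to.

def is_mersenne_factor(p: int, q: int) -> bool:
    return (q - 1) % (2 * p) == 0 and pow(2, p, q) == 1

q = 38814612911305349835664385407

print(is_mersenne_factor(3321928619, q))  # True: the selftest exponent
print(is_mersenne_factor(329000033, q))   # False: a wrongly reported exponent
```

This is exactly the check the server-side verification performs, which is why the bogus reports above were never credited.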
I can't reproduce that. Did you compile mfaktc by yourself?
|
Huh?
[QUOTE=LaurV;491078]I can't reproduce that. Did you compile mfaktc by yourself?[/QUOTE]
What are you trying to reproduce, and who are you asking? If referring to post 2824, no recompile, and count your blessings that you can't reproduce that! |
Fixed inappropriate factor repetition
On a GTX480 that very recently passed cudalucas -memtest with flying colors (and after the bad-vram gtx480 of GPU RIP thread was removed from the same system):
Dozens of occurrences overnight, like the following, in a burst (31 in a 45-minute period, preceded and followed by hours of none at all, without user interaction). Maybe it has something to do with the CUDA 7.0 driver?
[CODE]batch wrapper reports mfaktc-win-64.exe (re)launch at Thu 07/05/2018 23:48:28.17 count 30 on model gtx480 dev 0
mfaktc v0.20 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           900s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  Kriesel
  ComputerID                dodo-gtx480-0
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t %e | %g %s %W%%"
  AllowSleep                no
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  6.50
  CUDA runtime version      6.50
  CUDA driver version       7.0

CUDA device info
  name                      GeForce GTX 480
  compute capability        2.0
  maximum threads per block 1024
  number of multiprocessors 15 (480 shader cores)
  clock rate                1401MHz

Automatic parameters
  threads per grid          983040

running a simple selftest...

Selftest statistics
  number of tests           92
  successfull tests         92

selftest PASSED!

got assignment: exp=670000207 bit_min=80 bit_max=84 (5482.09 GHz-days)
Starting trial factoring M670000207 from 2^80 to 2^81 (365.47 GHz-days)
 k_min = 902183168737620
 k_max = 1804366337478801
Using GPU kernel "barrett87_mul32_gs"
found a valid checkpoint file!
last finished class was: 689
found 0 factor(s) already
    Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jul 05 23:49 |  692  15.1% | 46.563  10h32m |    706.41    82485    n.a.%
M670000207 has a factor: 38814612911305349835664385407
ERROR: cudaGetLastError() returned 30: unknown error
batch wrapper reports mfaktc-win-64.exe exited at Thu 07/05/2018 23:49:20.34
[/CODE] |
If anyone here has used mfaktc to search for factors of Wagstaff numbers (2^p + 1)/3 and if you have conserved log files or lists of factors that you're willing to share, let me know.
I've already asked in Wagstaff-related threads, and tried contacting some of the folks who were active when earlier searches were being done around 2013. I just thought I'd inquire here too. |
[QUOTE=TheJudger;490784]Good news: CUDA 9.2.88 seems to have fixed the issue on Volta architecture![/QUOTE]
I am experimenting with one GPU of a Tesla V100-SXM2-16GB (this is a p3.2xlarge instance on the Amazon AWS cloud with the Deep Learning Base AMI). Same specs as you listed for the Tesla V100-PCIE-16GB except a slightly faster clock rate: [CODE] clock rate (CUDA cores) 1530MHz [/CODE] It can be configured to use CUDA 9.2.88 by setting the symbolic link /usr/local/cuda. mfaktc passes all the Mersenne self tests. However, when I compile an alternate version with -DWAGSTAFF added to CFLAGS, it fails all the Wagstaff self tests. Did you try the Wagstaff self tests on your V100, and do they work for you? Is anything more needed to create a Wagstaff version, other than adding the -DWAGSTAFF flag in CFLAGS? The compilation uses gcc 4.8. |
Hello,
[QUOTE=GP2;492679]mfaktc passes all the Mersenne self tests. However, when I compile an alternate version with -DWAGSTAFF added to CFLAGS, it fails all the Wagstaff self tests. Did you try the Wagstaff self tests on your V100 and do they work for you?[/QUOTE] no, I didn't try. [QUOTE=GP2;492679]Is anything more needed to create a Wagstaff version, other than adding the -DWAGSTAFF flag in CFLAGS?[/QUOTE] No, that should be enough. Will look at this later. Thanks for reporting. Oliver |
comments in worktodo
While looking for something else, I stumbled across this:
The source of parse.c for CUDAPm1 indicates that #, \\, or / are comment characters marking the rest of a worktodo line as a comment. I've confirmed by test in mfaktc that \\ worked; # and / did not work in my test, which placed them mostly at the beginnings of records. I could tell by the line numbers in the warning messages which did or did not work. The capability is not yet documented in the readme.txt, that I recall. |
I have a question, maybe somebody knows, how do the mfaktc and mfakto codebases compare?
I think at some point in history, mfakto was inspired by mfaktc. But in the intervening years, how did they diverge? Do they now have any different capabilities? Or different self-test data sets? (Aside from targeting different platforms, CUDA vs. OpenCL.) |
[QUOTE=preda;493277]I have a question, maybe somebody knows, how do the mfaktc and mfakto codebases compare?
I think at some point in history, mfakto was inspired by mfaktc. But in the interleaving years, how did they diverge? do they have now any different capabilities? or different self-test data sets? (aside from targeting different platforms, CUDA vs. OpenCL).[/QUOTE] Yes, mfaktc preceded mfakto. Some features developed in mfakto were added to mfaktc later (worktodoadd as I recall). Per [URL]http://www.mersenneforum.org/showpost.php?p=488291&postcount=2[/URL] Mfaktc max bit depth 95, mfakto 92. Minimum exponent may vary. Comparing their respective readme files and bug and wish lists may show some other differences. Mfaktc bug and wish list [URL]http://www.mersenneforum.org/showpost.php?p=488521&postcount=3[/URL] Mfakto bug and wish list [URL]http://www.mersenneforum.org/showpost.php?p=488637&postcount=3[/URL] Some client management software supports mfaktc or mfakto, typically not both. [URL]http://www.mersenneforum.org/showpost.php?p=488292&postcount=3[/URL] (All the above, and more, are periodically updated in place, as part of the mersenne-gpu-computing-oriented reference material I've been accumulating at [URL]http://www.mersenneforum.org/forumdisplay.php?f=154[/URL]) And of course, there's comparing the source code in the portions that are not CUDA or OpenCl specific. Mfaktc self-test:tests multiple kernels per testcase[CODE]########## testcase 1/2867 ########## ... 
Selftest statistics
  number of tests           26192
  successfull tests         26192

  kernel             | success |  fail
 --------------------+---------+-------
  UNKNOWN kernel     |       0 |     0
  71bit_mul24        |    2586 |     0
  75bit_mul32        |    2682 |     0
  95bit_mul32        |    2867 |     0
  barrett76_mul32    |    1096 |     0
  barrett77_mul32    |    1114 |     0
  barrett79_mul32    |    1153 |     0
  barrett87_mul32    |    1066 |     0
  barrett88_mul32    |    1069 |     0
  barrett92_mul32    |    1084 |     0
  75bit_mul32_gs     |    2420 |     0
  95bit_mul32_gs     |    2597 |     0
  barrett76_mul32_gs |    1079 |     0
  barrett77_mul32_gs |    1096 |     0
  barrett79_mul32_gs |    1130 |     0
  barrett87_mul32_gs |    1044 |     0
  barrett88_mul32_gs |    1047 |     0
  barrett92_mul32_gs |    1062 |     0

selftest PASSED!
[/CODE]Mfakto short self-test (runs every time I launch mfakto to do factoring):[CODE]Started a simple selftest ...
######### testcase 1/30 (M1031831[63-64]) #########
######### testcase 2/30 (M51332417[68-69]) #########
######### testcase 3/30 (M50896831[69-70]) #########
######### testcase 4/30 (M50979079[70-71]) #########
######### testcase 5/30 (M51232133[71-72]) #########
######### testcase 6/30 (M50830523[71-72]) #########
######### testcase 7/30 (M50752613[72-73]) #########
######### testcase 8/30 (M51507913[72-73]) #########
######### testcase 9/30 (M51916901[73-74]) #########
######### testcase 10/30 (M51157933[74-75]) #########
######### testcase 11/30 (M51308501[75-76]) #########
######### testcase 12/30 (M51671491[75-76]) #########
######### testcase 13/30 (M50805581[77-78]) #########
######### testcase 14/30 (M51157429[78-79]) #########
######### testcase 15/30 (M51406151[78-79]) #########
######### testcase 16/30 (M51478381[79-80]) #########
######### testcase 17/30 (M51350527[80-81]) #########
######### testcase 18/30 (M53061139[80-81]) #########
######### testcase 19/30 (M48629519[81-82]) #########
######### testcase 20/30 (M51752893[83-84]) #########
######### testcase 21/30 (M51760133[83-84]) #########
######### testcase 22/30 (M51090757[84-85]) #########
######### testcase 23/30 (M51050171[84-85]) #########
######### testcase 24/30 (M50989481[86-87]) #########
######### testcase 25/30 (M50856937[86-87]) #########
######### testcase 26/30 (M53065231[88-89]) #########
######### testcase 27/30 (M3321929777[63-64]) #########
######### testcase 28/30 (M3321930841[63-64]) #########
######### testcase 29/30 (M55069117[64-65]) #########
######### testcase 30/30 (M45448679[81-82]) #########

Selftest statistics
  number of tests           30
  successful tests          30[/CODE]Mfakto -st:[CODE]######### testcase 1/34071 (M67094119[81-82]) #########
...
######### testcase 34071/34071 (M112404491[91-92]) #########
Starting trial factoring M112404491 from 2^91 to 2^92 (4461450.54GHz-days)
    Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jan 16 20:34 | 1848   0.1% |  0.124    n.a. |      n.a.    81206    0.00%
M112404491 has a factor: 3941616367695054034124905537 (91.670846 bits, 2992945.937358 GHz-d)
found 1 factor for M112404491 from 2^91 to 2^92 [mfakto 0.15pre6-Win cl_barrett32_92_gs_2]
selftest for M112404491 passed (cl_barrett32_92_gs)!
tf(): total time spent: 0.124s

Selftest statistics
  number of tests           34026
  successful tests          34026

selftest PASSED!
[/CODE] |
I have been playing with some GPU sieving code, similar to the GPU sieve used by mfaktc and mfakto.
The sieve works in the usual way: for each prime P from a set of primes, compute the initial "bit-to-clear" for a given exponent E and K (q = 2*E*K+1), and then mark off bits at every P step starting with the bit-to-clear. Is there some (mathematical) reason for the number of survivors of this kind of sieve to slightly decrease as K grows? In other words, are there slightly fewer candidates surviving the sieve when the bit-level of K grows? (I know that the number of actual primes does decrease as K grows, but is this fact reflected at all when sieving with the technique above?) |
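For concreteness, the "bit-to-clear" computation described above can be sketched on the CPU like this (function and variable names are my own, not from the actual tf.cpp/tf.cl sources):

```python
# Sieve a block of k values for candidates q = 2*E*k + 1.
# For each sieve prime P (not dividing 2*E), exactly one residue of
# k mod P makes P divide q:
#     2*E*k + 1 ≡ 0 (mod P)  =>  k ≡ -(2*E)^-1 (mod P)
# The "bit to clear" is the first index in the block with that residue;
# from there, every P-th bit is cleared.

def sieve_block(E, k0, n, primes):
    bits = [True] * n                  # bit i represents k = k0 + i
    for P in primes:
        if (2 * E) % P == 0:
            continue                   # no k makes P divide q here
        inv = pow(2 * E, -1, P)        # modular inverse (Python 3.8+)
        btc = (-inv - k0) % P          # the bit to clear
        for i in range(btc, n, P):
            bits[i] = False            # P divides 2*E*(k0+i) + 1
    return [k0 + i for i in range(n) if bits[i]]

# Toy example: E = 29; k = 4 gives q = 233 (a genuine factor candidate)
print(sieve_block(29, 1, 20, [3, 5, 7, 11, 13]))
```

Each surviving k yields a q with no divisor among the sieve primes; those survivors are what get passed on to the exponentiation kernel.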
[QUOTE=preda;493362]Is there some (mathematical) reason for the number of survivors of this kind of sieve to slightly decrease as K grows? In other words, are there slightly fewer candidates surviving the sieve when the bit-level of K grows?
(I know that the number of actual primes does decrease as K grows, but is this fact reflected at all when sieving with the technique above?)[/QUOTE] Not if you keep the set of sieving primes the same. If you increase the sieving primes, that will result in fewer survivors. This is assuming you're sieving with fewer primes than sqrt(candidate). |
[QUOTE=axn;493371]Not if you keep the set of sieving primes the same. If you increase the sieving primes, that will result in fewer survivors.
This is assuming you're sieving with fewer primes than sqrt(candidate).[/QUOTE] That's what I thought. Need to find the bug that generates the observed behavior then.. |
[QUOTE=axn;493371]This is assuming you're sieving with fewer primes than sqrt(candidate).[/QUOTE]
Is sqrt(q=2*e*k+1), or sqrt(k) the prime magnitude limit? If it's sqrt(k), this may be it. If I sieve with primes up to 2^23, exponent 2^28, then TF under 75bits would have slightly reduced filtering. |
[QUOTE=preda;493400]Is sqrt(q=2*e*k+1), or sqrt(k) the prime magnitude limit?
If it's sqrt(k), this may be it. If I sieve with primes up to 2^23, exponent 2^28, then TF under 75bits would have slightly reduced filtering.[/QUOTE] It is the first one. If you're sieving for 75 bits (i.e. 2^74 - 2^75), then as long as you're using primes < 2^37 (and 2^23 is well under that), you'll be sieving out a constant-ish proportion of the candidates. There will be variations, but more or less the same fraction will be left if you sieve any range from 2^64 and above. Can you provide some stats as to the pattern you're observing (of the fraction of survivors)? |
[QUOTE=axn;493402]Can you provide some stats as to the pattern you're observing (of the fraction of survivors)?[/QUOTE]
For example, below, sieving with 262176 primes has an expected survivor rate of 17.871%, yet what I count is around 17.859%. OTOH I did a sieve step on the host, and the counts did match. Thus... I am left to suspect there's no bug? (The counts are over a block of 256*1024*1024 bits.) [CODE]Using 262176 primes (up to 3681761)
expected filter 17.87131%
Exponent 332195561, k 1819599599668620 (80.000000 bits)
Count 47939695
 0.0% (class  0): 47939646 (17.859%), 27.0ms
 0.1% (class  3): 47939695 (17.859%), 40.2ms
 0.2% (class  4): 47942475 (17.860%), 40.2ms
 0.3% (class 15): 47935056 (17.857%), 40.2ms
 0.4% (class 19): 47938399 (17.858%), 40.2ms
 0.5% (class 24): 47933509 (17.857%), 40.2ms
 0.6% (class 28): 47932880 (17.856%), 40.2ms
 0.7% (class 31): 47934883 (17.857%), 40.2ms
 0.8% (class 36): 47935394 (17.857%), 40.2ms
 0.9% (class 39): 47935199 (17.857%), 40.2ms
 1.0% (class 40): 47930480 (17.855%), 40.2ms
 1.1% (class 43): 47929750 (17.855%), 40.2ms
 1.2% (class 48): 47935548 (17.857%), 40.2ms
 1.4% (class 55): 47933873 (17.857%), 40.2ms
 1.5% (class 60): 47935660 (17.857%), 40.2ms
 1.6% (class 63): 47936327 (17.858%), 40.2ms
 1.7% (class 64): 47931330 (17.856%), 40.2ms
 1.8% (class 75): 47935783 (17.857%), 40.2ms
 1.9% (class 76): 47937988 (17.858%), 40.2ms
[/CODE] (the source is at [url]https://github.com/preda/gpuowl/[/url] , tf.cpp tf.cl ) |
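For reference, the "expected filter" printed above is just the product of (1 − 1/P) over the sieving primes; assuming the usual 4620-class setup, the primes 2, 3, 5, 7, 11 are absorbed by class selection, so the product runs from 13 up to the sieve limit. A quick sanity check (my own sketch, not the gpuowl code):

```python
# Each sieve prime P removes exactly one residue of k mod P within a
# class, so the expected survivor fraction is prod(1 - 1/P) over the
# sieving primes.  Starting at 13 assumes the 4620 = 2^2*3*5*7*11
# class decomposition already handles the smaller primes.

def expected_filter(limit):
    is_prime = bytearray([1]) * (limit + 1)   # simple sieve of Eratosthenes
    is_prime[0] = is_prime[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytes(len(is_prime[i * i :: i]))
    rate = 1.0
    for p in range(13, limit + 1):
        if is_prime[p]:
            rate *= 1.0 - 1.0 / p
    return rate

print(expected_filter(3681761))  # close to the 17.87131% reported above
```

The small per-class deficit preda observes (17.859% vs 17.871%) is well within the variation one expects from this kind of product estimate.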
[QUOTE=preda;493464]expected survivor rate of 17.871%, yet what I count is around 17.859%[/QUOTE]
These are close enough. I don't think I would rely on more than 3 significant digits from the estimate. |
Hi,
we have found the issue in this case: [QUOTE=GP2;492679]I am experimenting with one GPU of a Tesla V100-SXM2-16GB (this is a p3.2xlarge instance on Amazon AWS cloud with Deep Learning Base AMI). Same specs as you listed for the Tesla V100-PCIE-16GB except a slightly faster clock rate: [CODE] clock rate (CUDA cores) 1530MHz [/CODE] It is configurable to use CUDA 9.2.88, by setting the symbolic link /usr/local/cuda mfaktc passes all the Mersenne self tests. However, when I compile an alternate version with -DWAGSTAFF added to CFLAGS, it fails all the Wagstaff self tests. Did you try the Wagstaff self tests on your V100 and do they work for you? Is anything more needed to create a Wagstaff version, other than adding the -DWAGSTAFF flag in CFLAGS? The compilation uses gcc 4.8[/QUOTE] Adding "-DWAGSTAFF" to the CFLAGS in the Makefile is not enough: it compiles the CPU code for Wagstaff numbers, but the GPU code defaults to Mersenne numbers. [U]The recommended way to configure mfaktc for Wagstaff numbers is to enable this in src/params.h.[/U] Adding "-DWAGSTAFF" to both CFLAGS and NVCCFLAGS should work as well. Oliver |
link error / updated makefile
1 Attachment(s)
Tried to build mfaktc v0.21 unmodified, for CUDA 9.2 on Windows 64-bit, following the directions in the mfaktc readme's compiling on windows section. On a system on which gnu make for Windows had freshly been installed, and also Visual Studio 2017, and no previous builds of mfaktc performed, after one small modification to eliminate a single compile error on mfaktc.c line 995:[CODE] mystuff.selftestrandomoffset = rand() % 25000000 ; // was random() % 25000000 until 8/22/2018 kriesel; random( 25000000 ) gives a different compiler error
[/CODE] and modest changes to the makefile (attached) to update for compute capabilities and proper paths for the CUDA toolkit and Visual Studio locations, I get the following link error, which a web search shows relates to mixing 32-bit and 64-bit. [CODE]link /nologo /LTCG sieve.obj timer.obj parse.obj read_config.obj mfaktc.obj checkpoint.obj signal_handler.obj output.obj tf_72bit.obj tf_96bit.obj tf_barrett96.obj tf_barrett96_gs.obj gpusieve.obj tf_75bit.obj "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2"\lib\x64\cudart.lib /out:..\mfaktc-win-64.exe
fatal error C1905: Front end and back end not compatible (must target same processor).
LINK : fatal error LNK1257: code generation failed
make: *** [..\mfaktc-win-64.exe] Error 1257[/CODE]So how does one effectively track down what module or whatever is not 64-bit? It sure would be helpful if the linker would say what it's finding that doesn't match the 64-bit directive. (Unfortunately, the Visual Studio command prompt in which the make is run has too small a screen history buffer, and resists attempts to increase it, to catch the thousands of lines of a clean first compile, much less to hold the error messages preceding.) Doing a 32-bit compilation is not a viable alternative in the CUDA Toolkit 9.2 & VS 2017 combo:[CODE]nvcc -O2 -c tf_72bit.cu -o tf_72bit.obj -ccbin="C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin" -Xcompiler /EHsc,/W3,/nologo,/Ox,/GL -m32 --ptxas-options=-v --generate-code arch=compute_30,code=sm_30 --generate-code arch=compute_35,code=sm_35 --generate-code arch=compute_50,code=sm_50 --generate-code arch=compute_52,code=sm_52 --generate-code arch=compute_61,code=sm_61 --generate-code arch=compute_70,code=sm_70
nvcc fatal : 32 bit compilation is only supported for Microsoft Visual Studio 2013 and earlier
make: *** [tf_72bit.obj] Error 1[/CODE]I've been trying to download VS 2013 for days, without success.
It's 4.8GB, and my downloads fail at no more than 1.5GB to date (via a crappy ISP that acquires territories and milks them for profit, does not maintain or upgrade them; 3/4Mbps download rate despite their advertising "high speed"; fiber coming by a different provider eventually) |
Can mfaktc be modified to support [URL="https://primes.utm.edu/top20/page.php?id=16"]generalized repunit[/URL] trial factoring? The algebraic form of the factors would be similar to that of Mersenne numbers. Any help would be welcomed. Thanks.
|
[QUOTE=Citrix;495953]Can Mfaktc be modified to support [URL="https://primes.utm.edu/top20/page.php?id=16"]Generalized repunit[/URL] trial factoring. The algebraic factors would be similar to mersenne numbers. Any help would be welcomed. Thanks.[/QUOTE]
I would say possibly yes, but not easily. When I converted mfaktc to base-10 repunits I had to calculate the possible modular classes for base 10 and adapt the corresponding parts of the source code. For general repunits I am not sure there is a generic way of calculating these classes. One option would be to ignore the possible classes and simply test all numbers, thus wasting lots of resources. I guess creating a lookup table from each base to its corresponding classes is a bigger task, but possible. Also, the current mfaktc version is not well suited to handle numbers in the range of general repunits, so one has to create faster kernels. I did this already by implementing a kernel for numbers < 64 bits. I can post the source code here if there is some interest. Question to TheJudger: is there an official repository for mfaktc? I found one on github, but it was not created by you. If I would like to post my changes (on an extra branch), where would I do this? |
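To illustrate what "calculating the possible modular classes" means in the Mersenne case: mfaktc partitions candidates by k mod 4620 (4620 = 2^2·3·5·7·11) and keeps only the classes where q = 2kp+1 is coprime to 3, 5, 7, 11 and q ≡ ±1 (mod 8); for any prime exponent p > 11 exactly 960 of the 4620 classes survive. A sketch of the calculation (my own code, not mfaktc source):

```python
# Count the k mod 4620 classes that can contain factors of M(p):
#  - q = 2*k*p + 1 must be coprime to 3, 5, 7, 11 (these primes are
#    then handled once per class instead of once per candidate), and
#  - factors of 2^p - 1 satisfy q ≡ ±1 (mod 8), i.e. q mod 8 in {1, 7}.

def surviving_classes(p):
    classes = []
    for k in range(4620):              # 4620 = 2^2 * 3 * 5 * 7 * 11
        q = 2 * k * p + 1
        if any(q % r == 0 for r in (3, 5, 7, 11)):
            continue
        if q % 8 not in (1, 7):
            continue
        classes.append(k)
    return classes

print(len(surviving_classes(66362159)))  # 960, for any prime p > 11
```

Each of the four small primes excludes exactly one residue of k, and the mod-8 condition keeps half of the remaining residues, so the count is 4620 · (2/4)(2/3)(4/5)(6/7)(10/11) = 960 regardless of the exponent. Finding the analogous table for an arbitrary repunit base is exactly the lookup-table problem described above.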
[QUOTE=MrRepunit;495991]I would say possibly yes, but not easily. When I converted mfaktc to base 10 repunits I had to calculate the possible modular classes for base 10 and adapt the corresponding parts in the source code. For general repunits I am not sure if there is a generic way of calculating these classes. One option would be to ignore the possible classes and simply test all numbers, thus wasting lots of resources. I guess creating a lookup table for the different base to the corresponding classes is some bigger task, but possible.
Also the current mfaktc version is not well suited to handle numbers in the range of general repunits, so one has to create kernels that are faster. I did this already by implementing a kernel for numbers < 64 bits. I can put the source code here if there is some interest. Question to TheJudger: is there an official repository for mfaktc? I found one on github, but it was not created by you. If I would like to post my changes (on an extra branch) where would I do this?[/QUOTE] Wiki page for mfaktc lists [URL="http://www.mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.tar.gz"]www.mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.tar.gz[/URL] There's also James Heinrich's mirror site, [URL]https://download.mersenne.ca/[/URL] |
[QUOTE=kriesel;496001]Wiki page for mfaktc lists [URL="http://www.mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.tar.gz"]www.mersenneforum.org/mfaktc/mfaktc-0.21/mfaktc-0.21.tar.gz[/URL] There's also James Heinrich's mirror site, [URL]https://download.mersenne.ca/[/URL][/QUOTE]
I really meant a source code repository, similar to gpuowl: [URL]https://github.com/preda/gpuowl[/URL] |
[QUOTE=MrRepunit;495991] For general repunits I am not sure if there is a generic way of calculating these classes. One option would be to ignore the possible classes and simply test all numbers, thus wasting lots of resources. I guess creating a lookup table for the different base to the corresponding classes is some bigger task, but possible.
[/QUOTE] For generalized repunits b^p-1 (also for b^p+1, i.e. for negative b) with p prime, the factors have the form 2*k*p+1 irrespective of b. The only question is what form these factors can take mod 8; I do not have a general rule for this. I am working on a special set of odd bases (for b^p+1 and b^p-1) where the factors are always 1 and 5 (mod 8). Are there any other modular restrictions that I am not aware of? If you could modify your program to help me, I would appreciate it. I would prefer a compiled .exe instead of the source code; I am afraid I might not be able to compile the code. Thanks. |
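The 2*k*p+1 claim is easy to check numerically on a small repunit. A sketch (mine, for illustration) for base 10 and p = 7: every prime factor, apart from the special divisor 3 (which divides b − 1 = 9), must be 1 mod 2p:

```python
# Prime factors q != 3 of the repunit R(p) = (10^p - 1)/9, p prime,
# have the form 2*k*p + 1: the multiplicative order of 10 mod q is
# exactly p (order 1 would force q | 9), and that order divides q - 1.

def trial_factor(n):
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

p = 7
R = (10**p - 1) // 9                   # 1111111 = 239 * 4649
for q in trial_factor(R):
    assert (q - 1) % (2 * p) == 0      # both 239 and 4649 are 1 mod 14
print(trial_factor(R))
```

The same argument goes through for any base b with q not dividing b − 1, which is why a repunit version of mfaktc can keep the 2kp+1 candidate generation unchanged.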
[QUOTE=Citrix;495953]Can Mfaktc be modified to support [URL="https://primes.utm.edu/top20/page.php?id=16"]Generalized repunit[/URL] trial factoring. The algebraic factors would be similar to mersenne numbers. Any help would be welcomed. Thanks.[/QUOTE]
You are probably aware that the standard version of mfaktc was already modified to support Wagstaff numbers (with base b = −2). This was done by putting #ifdef WAGSTAFF lines in the source code rather than as a command line option flag. The set of changes was rather small, so it ought to be feasible. PS, if anyone is actively factoring Wagstaff numbers, or has log files from old searches, I am compiling a list of factors. See [URL="https://mersenneforum.org/showthread.php?t=23523"]this thread[/URL] and [URL="http://mprime.s3-website.us-west-1.amazonaws.com/wagstaff/"]this webpage[/URL]. |
1 Attachment(s)
[QUOTE=MrRepunit;496002]I really meant a source code repository, similar to gpuowl: [URL]https://github.com/preda/gpuowl[/URL][/QUOTE]
The source is there. See the screen grab below. From my mfaktc forum notes, some of these posts might be of interest too:
2492 v0.21 release [URL]http://mersenneforum.org/showpost.php?p=395689&postcount=2492[/URL]
2505 tuning advice
2547 mention of a v0.22 in development [URL]http://mersenneforum.org/showpost.php?p=402408&postcount=2547[/URL]
2569 Win XP won't run mfaktc or anything built in VS2012; needs to be VS2010. [URL]http://mersenneforum.org/showpost.php?p=408103&postcount=2569[/URL]
2570 version built for Win XP [URL]http://mersenneforum.org/showpost.php?p=408118&postcount=2570[/URL]
2602 NVIDIA bug related to 8.0 and gtx1070/80
2645 cuda 8 v0.21 build [URL]http://mersenneforum.org/showpost.php?p=444025&postcount=2645[/URL]
2663 extra versions: wagstaff and less-classes [URL]http://mersenneforum.org/showpost.php?p=444127&postcount=2663[/URL]
2692 linux x64 cuda8 build such as for gtx1070/80 [URL]http://mersenneforum.org/showpost.php?p=454623&postcount=2692[/URL]
2735 various versions with various minimum exponent described |
[QUOTE=Citrix;496005]For generalized repunits b^p-1 (also for b^p+1 i.e. for negative b) with p prime, they have factors of the form 2*k*p+1 irrespective of b.
The only question is what form these factors can take mod 8. I do not have a general rule for this. I am working on a special set of odd bases (for b^p+1 and b^p-1) where the factors are always 1 and 5 (mod 8). Are there any other modular restrictions that I am not aware of? If you could modify your program to help me, I would appreciate it. I would prefer the compiled .exe instead of the source code; I am afraid I might not be able to compile the code. Thanks.[/QUOTE] I'll try to give you my derivation for base 10 repunits. Similar to Mersenne factors, we find for a prime factor p of a repunit that [TEX]\text{Legendre}(10,p) = 1[/TEX] (10 is a quadratic residue mod p), and
[TEX]\text{Legendre}(10,p) = \text{Legendre}(2,p)\times\text{Legendre}(5,p)[/TEX]
[TEX]\text{Legendre}(2,p) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} +1 & \text{if } p \equiv 1, 7 \pmod 8 \\ -1 & \text{if } p \equiv 3, 5 \pmod 8 \end{cases}[/TEX]
[TEX]\text{Legendre}(5,p) = (-1)^{\lfloor\frac{p+2}{5}\rfloor} = \begin{cases} +1 & \text{if } p \equiv 1, 4 \pmod 5 \\ -1 & \text{if } p \equiv 2, 3 \pmod 5 \end{cases}[/TEX]
The product must be +1, so both combinations (-1)(-1) and (+1)(+1) are possible. Filtering the possible values mod 40 gives the following allowed values: {1,3,9,13,27,31,37,39}, or {±1, ±3, ±9, ±13}, for 2kp+1. Thus the allowed values for 2kp are {0,2,8,12,26,30,36,38}. See also: [URL]https://math.stackexchange.com/questions/1767306/find-all-prime-p-such-that-legendre-symbol-of-left-frac10p-right-1[/URL] So if you adapt this to your numbers I could compile a new version. |
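As a numerical sanity check of the derivation above (my own sketch, not part of the original post), the allowed residue set mod 40 can be recovered by brute force with the Euler criterion:

```python
# For each prime q > 5, 10 is a quadratic residue mod q exactly when
# 10^((q-1)/2) == 1 (mod q) (Euler criterion). Collecting q mod 40 over such
# primes should reproduce the allowed set from the derivation above.

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

residues = set()
for q in range(7, 20000):  # skip 2, 3, 5; 2 and 5 divide the base 10
    if is_prime(q) and pow(10, (q - 1) // 2, q) == 1:
        residues.add(q % 40)

print(sorted(residues))  # [1, 3, 9, 13, 27, 31, 37, 39]
```

The printed set matches {1,3,9,13,27,31,37,39} from the Legendre-symbol argument. |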
[QUOTE=kriesel;496017]The source is there. See the screen grab below.
[/QUOTE] I know, I have it from there. And I have also followed the mfaktc thread for some years. I also suffered from the Cuda 8 bug myself. It was actually me who told TheJudger that the bug in Cuda 8 was fixed, after I found that my repunit adaptation was working again after a cuda sdk update. What I really mean is that I would like to have a source code version control system for mfaktc (git, svn, mercurial, ...) where one could see the progress. This would allow creating different branches without affecting the original. Optimal would be hosting the source on github or gitlab. I know that there is an mfaktc repository on github, but it is not managed by TheJudger, thus I was asking. |
[QUOTE=MrRepunit;496020]I try to give you my derivation for base 10 repunits:
Similar to Mersenne factors, we find for a prime factor p of a repunit that [TEX]\text{Legendre}(10,p) = 1[/TEX] (10 is a quadratic residue mod p), and
[TEX]\text{Legendre}(10,p) = \text{Legendre}(2,p)\times\text{Legendre}(5,p)[/TEX]
[TEX]\text{Legendre}(2,p) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} +1 & \text{if } p \equiv 1, 7 \pmod 8 \\ -1 & \text{if } p \equiv 3, 5 \pmod 8 \end{cases}[/TEX]
[TEX]\text{Legendre}(5,p) = (-1)^{\lfloor\frac{p+2}{5}\rfloor} = \begin{cases} +1 & \text{if } p \equiv 1, 4 \pmod 5 \\ -1 & \text{if } p \equiv 2, 3 \pmod 5 \end{cases}[/TEX]
The product must be +1, so both combinations (-1)(-1) and (+1)(+1) are possible. Filtering the possible values mod 40 gives the following allowed values: {1,3,9,13,27,31,37,39}, or {±1, ±3, ±9, ±13}, for 2kp+1. Thus the allowed values for 2kp are {0,2,8,12,26,30,36,38}. See also: [URL]https://math.stackexchange.com/questions/1767306/find-all-prime-p-such-that-legendre-symbol-of-left-frac10p-right-1[/URL] So if you adapt this to your numbers I could compile a new version.[/QUOTE] I am working on numbers of the form b^p±1 where p is an odd prime, b=n*p, and n is a natural number. From your example, Legendre(b,p) = Legendre(n*p,p) = Legendre(0,p) = 0. How do we proceed from here? Thanks. |
[QUOTE=MrRepunit;495991]I would say possibly yes, but not easily. When I converted mfaktc to base 10 repunits I had to calculate the possible modular classes for base 10 and adapt the corresponding parts in the source code.
Also the current mfaktc version is not well suited to handle numbers in the range of general repunits, so one has to create kernels that are faster. I did this already by implementing a kernel for numbers < 64 bits. I can put the source code here if there is some interest.[/QUOTE] Sure, I'd be interested. Modifying the [c]class_needed[/c] function seems simple enough if you just want to hardcode it for some particular base, for instance 3. And the selftest stuff can be temporarily commented out. What other changes did you need to make to the rest of the source code? For instance the part where it checks against b^p − 1 rather than 2^p − 1. Are the faster kernels also applicable to Mersenne testing? Can they simply be contributed to the existing source code? |
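To make the class idea concrete, here is a rough Python sketch (my own illustration; mfaktc's real class_needed is C code, and the constants here simply follow MrRepunit's mod-40 derivation earlier in the thread):

```python
# Illustrative sketch only -- NOT mfaktc's actual implementation. Candidate
# factors of the base-10 repunit (10^p - 1)/9, p prime, have the form
# f = 2*k*p + 1; mfaktc splits k into classes modulo NUM_CLASSES and skips
# classes that can never contain a factor.

NUM_CLASSES = 4620  # 4 * 3 * 5 * 7 * 11, the "more classes" variant

# residues f mod 40 for which 10 can be a quadratic residue mod f
# (see the Legendre-symbol derivation earlier in the thread)
ALLOWED_MOD_40 = {1, 3, 9, 13, 27, 31, 37, 39}

def class_needed(p, k_min, c):
    """Can class c (k congruent to k_min + c mod NUM_CLASSES) hold a factor?"""
    f = 2 * (k_min + c) * p + 1
    # f mod 40 is constant over the whole class, since 20 divides NUM_CLASSES
    if f % 40 not in ALLOWED_MOD_40:
        return False
    # classes where f is divisible by a small sieve prime can also be skipped
    if f % 3 == 0 or f % 7 == 0 or f % 11 == 0:
        return False
    return True

# R7 = 1111111 = 239 * 4649; both known factors land in needed classes:
print(class_needed(7, 0, 17))   # k = 17  gives f = 239  -> True
print(class_needed(7, 0, 332))  # k = 332 gives f = 4649 -> True
```

With the real constants a large fraction of the 4620 classes gets filtered out before any sieving starts, which is where the speedup comes from. |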
[QUOTE=Citrix;496026]I am working on numbers such that b^p+-1 where p is an odd prime and b=n*p and n is a natural number.
From your example the Legendre (b,p)= Legendre (n*p,p)=Legendre (0,p)=0 How do we proceed from here? Thanks.[/QUOTE] Not sure. I am not really fluent in this kind of math (I am a theoretical physicist). My first thought was that we can use Legendre (n*p,p)=Legendre (0,p)*Legendre(n,p). Legendre (0,p) is trivially 0, so that Legendre(n,p) can be anything. But I think that we cannot use this here since we need the Legendre symbol to be equal to 1 to make some useful statements (correct me if I am wrong). |
[QUOTE=MrRepunit;496089]Not sure. I am not really fluent in this kind of math (I am a theoretical physicist). My first thought was that we can use Legendre (n*p,p)=Legendre (0,p)*Legendre(n,p). Legendre (0,p) is trivially 0, so that Legendre(n,p) can be anything.
But I think that we cannot use this here since we need the Legendre symbol to be equal to 1 to make some useful statements (correct me if I am wrong).[/QUOTE] I am more interested in the case where n=1. Looking at the factors of these numbers they are 1 and 7 (mod 8). I do not know how to prove this. If you could compile a version that just restricts the factors to 2*k*p+1, that would be great. |
1 Attachment(s)
[QUOTE=GP2;496087]Sure, I'd be interested.
Modifying the [c]class_needed[/c] function seems simple enough if you just want to hardcode it for some particular base, for instance 3. And the selftest stuff can be temporarily commented out. What other changes did you need to make to the rest of the source code? For instance the part where it checks against b^p − 1 rather than 2^p − 1. Are the faster kernels also applicable to Mersenne testing? Can they simply be contributed to the existing source code?[/QUOTE] I modified [c]class_needed[/c] (10 has to be exponentiated instead of 2) and added a 64 bit shortcut. I guess most of the 64 bit stuff can be used for Mersenne numbers, but it might need some changes or may be missing some minor stuff. I think I removed some bit shift methods since they were not used for base 10. Also a snippet from the readme:
- Removed Barrett and 72 bit kernels
- Removed Wagstaff related stuff
- Added 64 bit kernels
- Implemented repunit factorization (hardcoded)
- Improved performance compared to older version (0.18-repunit) by about 30%
- Notes
- Compiling with the more-classes flag seems to be slightly faster, thus it is switched on
- Not tested on Windows yet
- GPU sieving utilizes 100% of the GPU, so 1 mfaktc instance is enough
- GPU sieving makes the system response slow (tested on Ubuntu 14.04 64 bit with Geforce 460 GTS); setting GPUSieveSize in mfaktc.ini to 8 or lower makes the system more responsive
I did not remove the git directory, so if anybody is interested in the single commits feel free to take a closer look. I also added the linux executable. Not sure if I can quickly provide a windows variant... |
[QUOTE=Citrix;496092]I am more interested in the case where n=1. Looking at the factors of these numbers they are 1 and 7 (mod 8). I do not know how to prove this.
If you could compile a version that just restricts the factors to 2*k*p+1, that would be great.[/QUOTE] I try to do it this weekend, but I can only provide a linux executable quickly if I succeed. Windows will take a bit longer... |
[QUOTE=MrRepunit;496094]I try to do it this weekend, but I can only provide a linux executable quickly if I succeed. Windows will take a bit longer...[/QUOTE]
Thanks. I would need windows, I do not have linux. :no: I will wait, I am not in a hurry. |
[QUOTE=MrRepunit;496093]I modified [c]class_needed[/c], 10 has to be exponentiated instead of 2, I added a 64 bit shortcut. I guess most of the 64 bit stuff can be used for mersenne numbers, but might need some changes or are missing some minor stuff. I think I removed some bit shift methods since they were not used for base 10.
Also a snippet from the readme:
- Removed Barrett and 72 bit kernels
- Removed Wagstaff related stuff
- Added 64 bit kernels
- Implemented repunit factorization (hardcoded)
- Improved performance compared to older version (0.18-repunit) by about 30%
- Notes
- Compiling with the more-classes flag seems to be slightly faster, thus it is switched on
- Not tested on Windows yet
- GPU sieving utilizes 100% of the GPU, so 1 mfaktc instance is enough
- GPU sieving makes the system response slow (tested on Ubuntu 14.04 64 bit with Geforce 460 GTS); setting GPUSieveSize in mfaktc.ini to 8 or lower makes the system more responsive
I did not remove the git directory, so if anybody is interested in the single commits feel free to take a closer look. I also added the linux executable. Not sure if I can quickly provide a windows variant...[/QUOTE] It pleases me to see that this is getting some attention. At some point, the ceiling value for exponents will need to be increased. If memory serves, the current value is 2[SUP]32[/SUP]-1. A 'wishful thinking' value might be 2[SUP]34[/SUP]-1. Either way, a Windows 64-bit compile would be nice. The current version does [U]not[/U] work my GTX 1080 all that hard. Core temps stay in the mid 60's with factory default settings. :smile: |
[QUOTE=storm5510;496247]It pleases me to see that this is getting some attention.
At some point, the ceiling value for exponents will need to be increased. If memory serves, the current value is 2[SUP]32[/SUP]-1. A 'wishful thinking' value might be 2[SUP]34[/SUP]-1. Either way, a Windows 64-bit compile would be nice. The current version does [U]not[/U] work my GTX 1080 all that hard. Core temps stay in the mid 60's with factory default settings. :smile:[/QUOTE] Re the exponent limit, I estimate we and our successors have about a century yet before running out of Mersenne hunting work in p<2[SUP]32[/SUP]-5 (the largest prime exponent below 2[SUP]32[/SUP]). There was a discussion a while back about the additional amount of programming required to support exponents above the current level, and it was definitely nontrivial; see the mfakto thread starting at post 1439 ([URL]https://mersenneforum.org/showthread.php?t=15646&page=131[/URL]), particularly post 1447. Note also there's no work or results coordination site for Mersenne hunting at exponents above 2[SUP]32[/SUP]. Re your GTX1080, how do you know you don't simply have very good cooling? What does TechPowerUp GPU-Z or CPUID HWMonitor say about GPU % load and the reasons for it being less than 95-100%? Are you running high enough bit depths, and/or the less-classes version, or running on a solid state disk, so that I/O is not limiting throughput? |
[QUOTE=kriesel;496266]Note also there's no work or results coordination site for Mersenne hunting at exponents above 2[SUP]32[/SUP].[/QUOTE]Mostly because there's no GPU program I know of that supports large exponents.
But as you say, there's a lot of work before that point. Something in the order of 18,000,000,000,000 GHz-days of TF effort just between 10[sup]9[/sup] and 2[sup]32[/sup] exponents (not counting what's still left in the current <1000M PrimeNet range). |
[QUOTE=James Heinrich;496269]Mostly because there's no GPU program I know of that supports large exponents.
But as you say, there's a lot of work before that point. Something in the order of 18,000,000,000,000 GHz-days of TF effort just between 10[sup]9[/sup] and 2[sup]32[/sup] exponents (not counting what's still left in the current <1000M PrimeNet range).[/QUOTE] So in other words, about 18 billion GTX1080-days, plus about 2 billion for the up to a billion exponent, 20 billion total GTX1080-days, and the aggregate throughput seen by PrimeNet of all of GIMPS amounts to about 150 to 180 GTX1080s at 1000GhzD/day (past day or month). 2x10[SUP]10[/SUP]/180/365 = 304,000 years. We're counting significantly on Moore's Law to get through that in a century or so. (Something above 11 doublings, soon, which is not justified by current feature sizes and practical further scaling downward.) |
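The back-of-envelope arithmetic above can be reproduced in a few lines (my own restatement of the post's estimates, not new data):

```python
import math

total_gtx1080_days = 2e10  # ~20 billion GTX1080-days of TF work estimated above
fleet = 180                # aggregate GIMPS throughput in GTX1080 equivalents

years = total_gtx1080_days / fleet / 365
print(round(years))  # 304414, i.e. roughly 304,000 years at today's pace

# Moore's-law doublings needed to compress that into about a century:
print(round(math.log2(years / 100), 1))  # 11.6, i.e. "something above 11 doublings"
```

Both figures match the post: about 304,000 years at the current aggregate rate, and a bit over 11 throughput doublings to fit it into a century. |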
[QUOTE=kriesel]Re your GTX1080, how do you know you don't simply have very good cooling? What does TechPowerUp GPU-Z or CPUID HWMonitor say about gpu % load and reasons for it being less than 95-100%? Are you running high enough bit depths, and/or the less-classes version, or running on a solid state disk, so that I/O is not limiting throughput?[/QUOTE]
Less-classes for [I]James Heinrich's[/I] project. GPU-Z says 98% load. I told James I would finish the 3990M group to 2[SUP]71[/SUP], and I will. Very good cooling? This case has four fans which are rather noisy. So, yes, there is good cooling. SSD, no, but the next best thing, a RAM drive, at James' suggestion. The extended limit for factoring was a passing thought. There used to be a member here named Luigi, a.k.a. E.T. Back in 2007, he wrote a little program called [I]Factor5[/I]. It is not used much now. I tried to find a ceiling for what it would accept as an exponent and a bit depth. I stopped trying after giving it a 19-digit exponent and a bit depth of 2[SUP]120[/SUP]. I asked myself why would he make the limits so high. The only answer I could come up with is that there would be no need to modify it again. There is a bit of sense in that. 18-trillion GHz-Days in what's available now. I can believe that. :smile: |
[QUOTE=storm5510;496278]There used to be a member here named Luigi, a.k.a. E.T. Back in 2007, he wrote a little program called [I]Factor5[/I]. It is not used much now. I tried to find a ceiling for what it would accept as an exponent and a bit depth. I stopped trying after giving it a 19-digit exponent and a bit depth of 2[SUP]120[/SUP].
I asked myself why would he make the limits so high. The only answer I could come up with is that there would be no need to modify it again. There is a bit of sense in that. 18-trillion GHz-Days in what's available now. I can believe that. :smile:[/QUOTE] The only limit for Factor5 is the sky... and your RAM. Using mpz_t elements slows down the calculation, but keeps the app updated with Moore's law. And it's already suitable to sieve the 16 residual classes mod 60 in parallel. So, you can make your exponent (and your factor size) grow as long as you have memory to allocate :smile: |
[QUOTE=ET_;496303]The only limit for Factor5 is the sky... and your RAM. Using mpz_t elements slows down the calculation, but keeps the app updated with Moore's law. And it's already suitable to sieve the 16 residual classes mod 60 in parallel. So, you can make your exponent (and your factor size) grow as long as you have memory to allocate :smile:[/QUOTE]
Thank you for the reply. It can really work a CPU, depending on what you give it. It sends the temperature on my i7-7700 into the mid 70's on the C scale when running hard. Question: Why does it say "banned" just below your user name? |
[QUOTE=storm5510;496312]Thank you for the reply.
It can really work a CPU, depending on what you give it. It sends the temperature on my i7-7700 into the mid 70's on the C scale when running hard. Question: Why does it say "banned" just below your user name?[/QUOTE] Oh! I'm not sure either :smile: I guess it is a comment from the super-supermod after I was whining too much in a past thread... |
Haven't been here in a while, need your help regarding issues with CUDA driver compatibility.
Recently swapped a 1080TI with a TITAN V, getting [CODE]ERROR: cudaGetLastError() returned 8: invalid device function[/CODE] Output from mfaktc-win-64.exe is as follows: [CODE]Compiletime options THREADS_PER_BLOCK 256 SIEVE_SIZE_LIMIT 32kiB SIEVE_SIZE 193154bits SIEVE_SPLIT 250 MORE_CLASSES enabled Runtime options SievePrimes 25000 SievePrimesAdjust 1 SievePrimesMin 5000 SievePrimesMax 100000 NumStreams 3 CPUStreams 3 GridSize 3 GPU Sieving enabled GPUSievePrimes 82486 GPUSieveSize 64Mi bits GPUSieveProcessSize 16Ki bits Checkpoints enabled CheckpointDelay 30s WorkFileAddDelay 600s Stages enabled StopAfterFactor bitlevel PrintMode full V5UserID (none) ComputerID (none) AllowSleep no TimeStampInResults no CUDA version info binary compiled for CUDA 8.0 CUDA runtime version 8.0 CUDA driver version 9.20 CUDA device info name TITAN V compute capability 7.0 max threads per block 1024 max shared memory per MP 98304 byte number of multiprocessors 80 clock rate (CUDA cores) 1455MHz memory clock rate: 850MHz memory bus width: 3072 bit Automatic parameters threads per grid 655360 GPUSievePrimes (adjusted) 82486 GPUsieve minimum exponent 1055144 running a simple selftest... ERROR: cudaGetLastError() returned 8: invalid device function [/CODE] Tried: - Clean install of both Display Driver and CUDA - CUDA 8.0 GA1 - CUDA 8.0 GA2 Still displaying CUDA driver version 9.20. Unable to match CUDA driver version with 8.0 despite efforts to reinstall driver. Any help is appreciated! |
[QUOTE=nofaith628;496342]running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function - Clean install of both Display Driver and CUDA - CUDA 8.0 GA1 - CUDA 8.0 GA2 Still displaying CUDA driver version 9.20. Unable to match CUDA driver version with 8.0 despite efforts to reinstall driver. Any help is appreciated![/QUOTE] Are you getting any unexpected system restarts? [I]mfaktc[/I] displays the same, 8.0, 8.0, 9.20 for my 1080 and it runs fine. |
[QUOTE=storm5510;496370]Are you getting any unexpected system restarts?
[I]mfaktc[/I] displays the same, 8.0, 8.0, 9.20 for my 1080 and it runs fine.[/QUOTE] No unexpected system restarts for as long as I have run mfaktc on this system. Switching the graphics card back to the 1080TI, it runs smoothly as always; switching to the TITAN V, the error pops up. On another machine, mfaktc displays 8.0, 8.0, 9.20 for a 1080 and a 1080TI, running smoothly without issues. Clean uninstallations of the CUDA driver and the Display Driver with the help of Revo Uninstaller, along with several restarts and blocking the internet to prevent auto updates, have failed. There must be some files left on the computer that were not deleted during the uninstall process. The CUDA driver version still shows 9.20. |
@nofaith628
I heartily endorse Revo Uninstaller, which I use. To check on other remnants, you might try jv16 Power Tools. [url]https://www.macecraft.com/download/[/url] I have used it even longer than Revo. It is good at unearthing lingering bits in the registry. |
[QUOTE=nofaith628;496380]No unexpected system restarts as far I have ran mfaktc on this system. Switched the graphics card back to the 1080TI, running smoothly as always. Switching the card to a TITAN V, the error pops up.
On another machine, mfaktc displays 8.0, 8.0, 9.20 for a 1080 and a 1080TI, running smoothly without issues. Clean uninstallations of the CUDA driver and the Display Driver with the help of Revo Uninstaller, along with several restarts and blocking the internet to prevent auto updates, have failed. There must be some files left on the computer that were not deleted during the uninstall process. The CUDA driver version still shows 9.20.[/QUOTE] Are you sure Windows isn't updating the driver again, perhaps from some locally stored content ("System restore", etc.)? [URL]https://answers.microsoft.com/en-us/windows/forum/windows_8-hardware/how-to-disable-windows-update-from-auto-updating/8f5a50fd-403b-4207-bcf2-20cd32f4b1e9[/URL] Search for "disabling Windows driver updates" for other articles that may help |
[QUOTE=kriesel;496384]Are you sure Windows isn't updating the driver again, perhaps from some local stored content? ("System restore" etc)
[URL]https://answers.microsoft.com/en-us/windows/forum/windows_8-hardware/how-to-disable-windows-update-from-auto-updating/8f5a50fd-403b-4207-bcf2-20cd32f4b1e9[/URL] Search for "disabling Windows driver updates" for other articles that may help[/QUOTE] Why not disable Windows altogether and switch to Linux? |
[QUOTE=nofaith628;496342]
Tried:
- Clean install of both Display Driver and CUDA
- CUDA 8.0 GA1
- CUDA 8.0 GA2
Still displaying CUDA driver version 9.20. Unable to match CUDA driver version with 8.0 despite efforts to reinstall driver.[/QUOTE] Won't work with that CUDA version! Your GPU is Volta architecture and thus CUDA 9.0 or newer is needed in general. Due to some bugs in the CUDA compiler you'll need CUDA 9.2.88 or newer for mfaktc + Volta. Maybe someone is able to build CUDA 9.2 binaries including CC 7.0. Oliver |
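For anyone attempting such a build: mfaktc selects its target GPU generations via nvcc code-generation flags in the Makefile, so a Volta-capable build would add a compute-capability-7.0 target along these lines (a sketch only; the NVCCFLAGS variable name is from my reading of the 0.21 Makefile, and CUDA 9.2.88 or newer is required as noted above):

```makefile
# Sketch: extend mfaktc's nvcc flags with a Volta (CC 7.0) target.
# --generate-code is the standard nvcc option; requires CUDA >= 9.2.88 here.
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70
```

A binary built without a matching compute-capability target is exactly what produces the "invalid device function" error reported in this thread: the CUDA 8.0 binaries simply contain no code the Volta/Turing GPU can run. |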
[QUOTE=TheJudger;496390]Won't work with that CUDA version! Your GPU is Volta architecture and thus CUDA 9.0 or newer is needed in general. Due to some bugs in the CUDA compiler you'll need CUDA 9.2.88 or newer for mfaktc + Volta. Maybe someone is able to build CUDA 9.2 binaries including CC 7.0.
Oliver[/QUOTE] Thanks for the answer! Is there any method as of now to get mfaktc 0.21 to run with a GPU that is of the Volta Architecture? |
I think Oliver, TheJudger, (the author of mfaktc) told you what was needed. Is mfaktc not running under those conditions?
|
[QUOTE=kladner;496410]I think Oliver, TheJudger, (the author of mfaktc) told you what was needed. Is mfaktc not running under those conditions?[/QUOTE]
[QUOTE=TheJudger]Won't work with that CUDA version! Your GPU is Volta architecture and thus CUDA 9.0 or newer is needed in general. Due to some bugs in the CUDA compiler you'll need CUDA 9.2.88 or newer for mfaktc + Volta. Maybe someone is able to build CUDA 9.2 binaries including CC 7.0.[/QUOTE] On another attempt, I have installed CUDA 10 from the Nvidia website. Supposedly CUDA 10 is a newer version of CUDA 9.2.88. I am not particularly tech savvy with the software side of things, may I ask what this sentence means? [QUOTE]Maybe someone is able to build CUDA 9.2 binaries including CC 7.0.[/QUOTE] The error persists, and this is the output from mfaktc: [CODE]mfaktc v0.21 (64bit built) Compiletime options THREADS_PER_BLOCK 256 SIEVE_SIZE_LIMIT 32kiB SIEVE_SIZE 193154bits SIEVE_SPLIT 250 MORE_CLASSES enabled Runtime options SievePrimes 25000 SievePrimesAdjust 1 SievePrimesMin 5000 SievePrimesMax 100000 NumStreams 3 CPUStreams 3 GridSize 3 GPU Sieving enabled GPUSievePrimes 82486 GPUSieveSize 64Mi bits GPUSieveProcessSize 16Ki bits Checkpoints enabled CheckpointDelay 30s WorkFileAddDelay 600s Stages enabled StopAfterFactor bitlevel PrintMode full V5UserID (none) ComputerID (none) AllowSleep no TimeStampInResults no CUDA version info binary compiled for CUDA 8.0 CUDA runtime version 8.0 CUDA driver version 10.0 CUDA device info name TITAN V compute capability 7.0 max threads per block 1024 max shared memory per MP 98304 byte number of multiprocessors 80 clock rate (CUDA cores) 1455MHz memory clock rate: 850MHz memory bus width: 3072 bit Automatic parameters threads per grid 655360 GPUSievePrimes (adjusted) 82486 GPUsieve minimum exponent 1055144 running a simple selftest... ERROR: cudaGetLastError() returned 8: invalid device function[/CODE] Still no luck. |
[QUOTE=nofaith628;496417]On another attempt, I have installed CUDA 10 from the Nvidia website. Supposedly CUDA 10 is a newer version of CUDA 9.2.88. I am not particularly tech savvy with the software side of things, may I ask what this sentence means?
The error persists, and this is the output from mfaktc: [CODE]mfaktc v0.21 (64bit built) Compiletime options THREADS_PER_BLOCK 256 SIEVE_SIZE_LIMIT 32kiB SIEVE_SIZE 193154bits SIEVE_SPLIT 250 MORE_CLASSES enabled Runtime options SievePrimes 25000 SievePrimesAdjust 1 SievePrimesMin 5000 SievePrimesMax 100000 NumStreams 3 CPUStreams 3 GridSize 3 GPU Sieving enabled GPUSievePrimes 82486 GPUSieveSize 64Mi bits GPUSieveProcessSize 16Ki bits Checkpoints enabled CheckpointDelay 30s WorkFileAddDelay 600s Stages enabled StopAfterFactor bitlevel PrintMode full V5UserID (none) ComputerID (none) AllowSleep no TimeStampInResults no CUDA version info binary compiled for CUDA [COLOR="red"]8.0[/COLOR] CUDA runtime version [COLOR="Red"]8.0[/COLOR] CUDA driver version [COLOR="SeaGreen"]10.0[/COLOR] CUDA device info name TITAN V compute capability 7.0 max threads per block 1024 max shared memory per MP 98304 byte number of multiprocessors 80 clock rate (CUDA cores) 1455MHz memory clock rate: 850MHz memory bus width: 3072 bit Automatic parameters threads per grid 655360 GPUSievePrimes (adjusted) 82486 GPUsieve minimum exponent 1055144 running a simple selftest... ERROR: cudaGetLastError() returned 8: invalid device function[/CODE] Still no luck.[/QUOTE] It looks like you either installed the drivers of CUDA 10, but not the sdk, or your old installation was not overwritten, and the environment variables still pick on the older sdk... |
[QUOTE=ET_;496420]It looks like you either installed the drivers of CUDA 10, but not the SDK, or your old installation was not overwritten and the environment variables still pick up the older SDK...[/QUOTE]
Thanks for the heads up. CUDA 10 has been installed from [url]https://developer.nvidia.com/cuda-downloads[/url], all previous versions of CUDA were removed, and residual files were taken care of using Revo Uninstaller. To no avail, mfaktc still outputs the following error: [CODE]ERROR: cudaGetLastError() returned 8: invalid device function[/CODE] Is there perhaps an installation that I am missing? If so, can you please point me in the correct direction? |
[QUOTE=nofaith628;496426]Thanks for the heads up. CUDA 10 has been installed from [URL]https://developer.nvidia.com/cuda-downloads[/URL], all previous versions of CUDA were removed and residual files were taken care of using Revo Uninstaller.
To no avail, mfaktc still outputs the following error: [CODE]ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]Is there perhaps an installation that I am missing? If so, can you please point me in the correct direction.[/QUOTE] If you use the Event Viewer and look in Windows Logs > Applications, there might be something there more revealing. |
Hey...
it seems that the new 20xx Nvidia series has new instructions... [QUOTE]- the 32-bit integer multiply was a multiple instruction (about 3 simple inst.) and now it is a single instruction.[/QUOTE] Would that help mfaktc? |
[QUOTE=firejuggler;496635]
Would that help mfaktc?[/QUOTE] Greatly |
[QUOTE=firejuggler;496635]Would that help mfaktc?[/QUOTE]
Hint: check performance data of Volta. Oliver |
Same error on RTX 2080 as Titan V, as should be expected.
[CODE]d:\TeMp\MFakt>mfaktc-win-64.exe -st mfaktc v0.21 (64bit built) Compiletime options THREADS_PER_BLOCK 256 SIEVE_SIZE_LIMIT 32kiB SIEVE_SIZE 193154bits SIEVE_SPLIT 250 MORE_CLASSES enabled Runtime options SievePrimes 25000 SievePrimesAdjust 1 SievePrimesMin 5000 SievePrimesMax 100000 NumStreams 3 CPUStreams 3 GridSize 3 GPU Sieving enabled GPUSievePrimes 82486 GPUSieveSize 64Mi bits GPUSieveProcessSize 16Ki bits Checkpoints enabled CheckpointDelay 30s WorkFileAddDelay 600s Stages enabled StopAfterFactor bitlevel PrintMode full V5UserID (none) ComputerID (none) AllowSleep no TimeStampInResults no CUDA version info binary compiled for CUDA 8.0 CUDA runtime version 8.0 CUDA driver version 10.0 CUDA device info name GeForce RTX 2080 compute capability 7.5 max threads per block 1024 max shared memory per MP 65536 byte number of multiprocessors 46 clock rate (CUDA cores) 1860MHz memory clock rate: 7000MHz memory bus width: 256 bit Automatic parameters threads per grid 753664 GPUSievePrimes (adjusted) 82486 GPUsieve minimum exponent 1055144 ########## testcase 1/2867 ########## Starting trial factoring M50804297 from 2^67 to 2^68 (0.59 GHz-days) Using GPU kernel "75bit_mul32_gs" Date Time | class Pct | time ETA | GHz-d/day Sieve Wait Sep 25 17:50 | 3387 0.1% | 0.025 n.a. | n.a. 82485 n.a.% ERROR: cudaGetLastError() returned 8: invalid device function[/CODE] |
[QUOTE=Honza;496755]Same error on GTX 2080 as Titan V, as shold be expected.
[B]ERROR: cudaGetLastError() returned 8: invalid device function[/B][/QUOTE] I had this same exact error on two other cards, GTX 480 and GTX 750Ti, when I tried to overclock them. I have a feeling this has something to do with [I]mfaktc[/I] itself. It can only handle so much speed. Q.: Have you tried running [I]CUDALucas[/I] or [I]CUDAPm1[/I]? |
[QUOTE=storm5510;496766]I had this same exact error on two other cards, GTX 480 and GTX 750Ti, [U]when I tried to overclock them[/U]. I have a feeling this has something to do with [I]mfaktc[/I] itself. It can only handle so much speed.
Q.: Have you tried running [I]CUDALucas[/I] or [I]CUDAPm1[/I]?[/QUOTE] How high did you get with the GTX480? |
[QUOTE=storm5510;496766]Q.: Have you tried running [I]CUDALucas[/I] or [I]CUDAPm1[/I]?[/QUOTE]
Nope. I would need a link to a Windows binary... |
Your one-stop-shop for mersenne binaries/source: [url]https://download.mersenne.ca/[/url]
No guarantees though that the latest mirrored versions support the RTX 2080. |
[QUOTE=Honza;496794]Nope.
I would need a link to a Windows binary...[/QUOTE] See the last page in the "Available software" thread ("sticky", top of the list in GPU Computing) or go directly to [url]https://www.mersenneforum.org/showthread.php?p=488291#post488291[/url] |
For 2080 he needs a binary with CUDA 9.2.0.88 or newer, and no one has compiled a Windows binary.
I tried to compile for my old Titan Black but cannot get it to compile. |
[QUOTE=kladner;496771]How high did you get with the GTX480?[/QUOTE]
I didn't get very far. It was a bit odd. [I]GPU-Z[/I] reported its base frequency as 700 MHz. [I]Afterburner[/I] said 1400 MHz. At its base, [I]mfaktc[/I] would run 330 GHz-d/day. The most I ever got was 390 GHz-d/day. Beyond that, error. I had to reboot to get it to reset because it would drop way down. Something like 325 MHz. |
[QUOTE=ATH;496817]For 2080 he needs a binary with CUDA 9.2.0.88 or newer, and no one has compiled a Windows binary.
I tried to compile for my old Titan Black but cannot get it to compile.[/QUOTE] Right, thanks for the reminder on CUDA level. If anyone gets a version compiled above CUDA 8.0, please share and notify James Heinrich and me, so that the mirror and the available software list get updated. |
1 Attachment(s)
As you may have read in my previous call for help, my switch from GTX 1080TI to a Titan V has resulted in errors.
Previous attempts to assuage this issue: [CODE]ERROR: cudaGetLastError() returned 8: invalid device function[/CODE] have failed, regardless of attempts to reinstall, clean-install and uninstall the display driver and CUDA, as well as tweaking settings in mfaktc.ini. Without any prerequisite knowledge of computer programming and compilation, I have managed to compile a [B]non-optimized[/B] version of mfaktc 10.0; it currently works with the Titan V. The GHz-days output is not good, but it works. As for the rest, I have discussed with Oliver (TheJudger) that he may release an optimized mfaktc for the Turing architecture in the near future; there are no plans to optimize the code specifically for the Volta architecture as it has a very high entry price. |
I guess the old cudart32_80.dll and cudart64_80.dll need to be replaced with cudart32_100.dll and cudart64_100.dll
|
[QUOTE=nofaith628;496845]As you may have read in my previous call for help, my switch from GTX 1080TI to a Titan V has resulted in errors.
Previous attempts to assuage this issue: [CODE]ERROR: cudaGetLastError() returned 8: invalid device function[/CODE]have failed, regardless of attempts to reinstall, clean install and uninstall display driver and CUDA, as well as tweaking settings in mfaktc.ini. Without any prerequisite knowledge on computer programming and compilation. I have managed to compile a [B]non-optimized[/B] version of mfaktc 10.0, it currently works with the Titan V. The GHz-days output is not good, but it works. As for the rest, I have discussed with Oliver (TheJudger) that he may release an optimized mfaktc for the Turing architecture in the near future, there are no plans to optimize the code specifically for Volta Architecture as it has a very high entry price.[/QUOTE] Thanks for sharing the build. I'm guessing here that you meant something like mfaktc version 0.21, compiled for 64-bit Windows and for CUDA 10. (There was no 32-bit CUDA, only 64-bit, beginning at CUDA version 8.0, as I recall; the highest version of mfaktc I've seen previously was v0.21.) [URL]https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html[/URL] says of CUDA 10 "32-bit tools are no longer supported..." The CUDA 10 download page confirms x86_64 is available and win32 is not. Including the CUDA 10 runtime DLL in the zip file would be a plus. |
Hi all,
It's been a long time since I ran mfakt on any of my machines. I am now intending to start running it on a GTX 1060, but I must confess I'm a bit off as to the recommended CUDA version / mfakt version. I don't have the means to do any compilation myself, so I would kindly request any willing member of this forum to point me to the right binaries. I am using Windows 10. Many thanks. |
[QUOTE=lycorn;496932]Hi all,
It's been a long time since I ran mfakt on any of my machines. I am now intending to start running it on a GTX 1060, but I must confess I'm a bit off as to the recommended CUDA version / mfakt version. I don't have the means to do any compilation myself, so I would kindly request any willing member of this forum to point me to the right binaries. I am using Windows 10. Many thanks.[/QUOTE] First line following is generated by a batch file.
[CODE]mfaktc-win-64.LessClasses-CUDA8.exe (re)launch at Mon 12/04/2017 10:46:19.80 count 0
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                230945bits
  SIEVE_SPLIT               250
  MORE_CLASSES              disabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            82486
  GPUSieveSize              64Mi bits
  GPUSieveProcessSize       16Ki bits
  Checkpoints               enabled
  CheckpointDelay           300s
WARNING: Cannot read WorkFileAddDelay from mfaktc.ini, set to 600s by default
  WorkFileAddDelay          600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  Kriesel
  ComputerID                condor-gtx1060
  ProgressHeader            "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait"
  ProgressFormat            "%d %T | %C %p%% | %t %e | %g %s %W%%"
  AllowSleep                no
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  8.0
  CUDA runtime version      8.0
  CUDA driver version       8.0

CUDA device info
  name                      GeForce GTX 1060 3GB
  compute capability        6.1
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 9
  clock rate (CUDA cores)   1771MHz
  memory clock rate:        4004MHz
  memory bus width:         192 bit

Automatic parameters
  threads per grid          589824
  random selftest offset    23085
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

running a simple selftest...
Selftest statistics
  number of tests           107
  successfull tests         107

selftest PASSED![/CODE] |
The [i]LessClasses[/i] version should only be used for extremely-fast-running assignments (where each assignment only takes a few seconds).
mfaktc can be download from [url=https://mersenneforum.org/mfaktc/mfaktc-0.21/]here[/url] or [url=https://download.mersenne.ca/mfaktc/mfaktc-0.21]here[/url]. |
[QUOTE=Honza;496893]I guess old cudart32_80.dll and cudart64_80.dll needs to be updated with cudart32_100.dll and cudart64_100.dll[/QUOTE]CUDA DLLs can be found [url=https://download.mersenne.ca/CUDA-DLLs]here[/url], if needed. Or you can download the toolkit from [url]https://developer.nvidia.com/cuda-toolkit[/url], install just the libraries you need, and grab the DLLs from where you installed it (by default, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin).
|
[QUOTE=James Heinrich;496947]The [I]LessClasses[/I] version should only be used for extremely-fast-running assignments (where each assignment only takes a few seconds).
[/QUOTE] Why? I've been running it on high exponents and ordinary assignments on multiple gpus for months. Following is on a gtx1070, one of two instances running on it.
[CODE]Sep 27 14:04 | 405  96.9% | 271.26  13m34s |    293.35    82485    n.a.%
Sep 27 14:08 | 408  97.9% | 271.00   9m02s |    293.64    82485    n.a.%
Sep 27 14:13 | 413  99.0% | 271.10   4m31s |    293.53    82485    n.a.%
Sep 27 14:17 | 416 100.0% | 271.27   0m00s |    293.35    82485    n.a.%
no factor for M173090623 from 2^76 to 2^77 [mfaktc 0.21 barrett87_mul32_gs]
tf(): total time spent: 7h 13m 50.524s

Starting trial factoring M173090623 from 2^77 to 2^78 (176.83 GHz-days)
 k_min = 436521993025020
 k_max = 873043986050178
Using GPU kernel "barrett87_mul32_gs"
   Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 27 14:26 |   0   1.0% | 542.32  14h18m |    293.46    82485    n.a.%
Sep 27 14:35 |   5   2.1% | 542.49  14h09m |    293.37    82485    n.a.%
Sep 27 14:44 |   8   3.1% | 542.39  14h00m |    293.42    82485    n.a.%
Sep 27 14:53 |  12   4.2% | 542.38  13h51m |    293.43    82485    n.a.%[/CODE] |
[QUOTE=kriesel;496953]Why? I've been running it on high exponents and ordinary assignments on multiple gpus for months.[/QUOTE]Someone else can explain the mechanics better than I, but the more classes the more candidates are filtered out prior to testing. The extra overhead to do this is not worth it for [i]very[/i] fast-running assignments, but it is beneficial for any "normal" TF assignment.
edit: from the mfakto readme:[quote]MoreClasses is a switch for defining if 420 (2*2*3*5*7) or 4620 (2*2*3*5*7*11) classes of factor candidates should be used. Normally, 4620 gives better results but for very small classes 420 reduces the class initialization overhead enough to provide an overall benefit.[/quote]To clarify: mfakto allows this to be set in the ini file, whereas mfaktc is hardcoded to 4620 classes (unless you explicitly use the LessClasses version which is hardcoded to 420 classes). You can easily run a quick test: using the same ini settings try running both the normal and LessClasses version of mfaktc and compare the throughput of each. |
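To put a number on that filtering, here is a quick sketch in plain Python (not mfaktc code; the exponent is just one from earlier in the thread). Candidate factors of M_p have the form q = 2kp + 1 and must satisfy q ≡ 1 or 7 (mod 8); because 4620 = 2*2*3*5*7*11, the residue of k mod 4620 fixes q modulo 8 and modulo each small sieve prime, so whole classes can be discarded before any expensive testing:

```python
def surviving_classes(p, num_classes=4620):
    """Count residue classes of k (mod num_classes) that can still
    contain a factor q = 2*k*p + 1 of the Mersenne number M_p.
    num_classes must be a multiple of 4 (both 4620 and 420 are)."""
    small_primes = [s for s in (3, 5, 7, 11) if num_classes % s == 0]
    survivors = 0
    for k in range(num_classes):
        q = 2 * k * p + 1
        # q mod 8 and q mod each small prime are constant across a class,
        # so this test discards or keeps the entire class at once
        if q % 8 in (1, 7) and all(q % s != 0 for s in small_primes):
            survivors += 1
    return survivors

print(surviving_classes(66362159, 4620))  # 960 of 4620 classes survive
print(surviving_classes(66362159, 420))   # 96 of 420 (the LessClasses split)
```

In both splits roughly four fifths of the classes drop out (960/4620 vs 96/420); the 4620-class split filters slightly harder because the extra factor of 11 removes another 1/11 of the candidates, at the cost of 11 times as many per-class initializations, which matches the trade-off described in the mfakto readme.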
[QUOTE=James Heinrich;496956]Someone else can explain the mechanics better than I, but the more classes the more candidates are filtered out prior to testing. The extra overhead to do this is not worth it for [I]very[/I] fast-running assignments, but it is beneficial for any "normal" TF assignment.
edit: from the mfakto readme: To clarify: mfakto allows this to be set in the ini file, whereas mfaktc is hardcoded to 4620 classes (unless you explicitly use the LessClasses version which is hardcoded to 420 classes). You can easily run a quick test: using the same ini settings try running both the normal and LessClasses version of mfaktc and compare the throughput of each.[/QUOTE] Thanks. On the 3GB GTX1060, I found regular CUDA8 gives about 2% higher throughput initially, 1% later, than the less-classes CUDA8, at the cost of a restart of the current exponent/bit level (ignoring the existing checkpoint file) and much more rapid log file growth. There's no warning about the restart of the bit level. In this case it cost 4.5 hours of throughput. There are cases where it could cost weeks. (GPU-Z indicates power, thermal, vrel are limiting performance.)

less-classes (420):
[CODE]Starting trial factoring M172926979 from 2^76 to 2^77 (88.50 GHz-days)
 k_min = 218467540932060
 k_max = 436935081864318
Using GPU kernel "barrett87_mul32_gs"
   Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 28 03:56 |   0   1.0% | 181.20   4h46m |    439.57    82485    n.a.%
Sep 28 03:59 |   5   2.1% | 181.24   4h43m |    439.49    82485    n.a.%
...
Sep 28 08:19 | 380  91.7% | 181.24  24m10s |    439.48    82485    n.a.%
Sep 28 08:22 | 384  92.7% | 181.26  21m09s |    439.44    82485    n.a.%
Sep 28 08:25 | 389  93.8% | 181.21  18m07s |    439.55    82485    n.a.%
Sep 28 08:28 | 392  94.8% | 181.52  15m08s |    438.81    82485    n.a.%
received signal "SIGINT"
mfaktc will exit once the current class is finished.
press ^C again to exit immediately
Sep 28 08:31 | 396  95.8% | 181.00  12m04s |    440.06    82485    n.a.%[/CODE]

4620 classes:
[CODE]got assignment: exp=172926979 bit_min=76 bit_max=78 (265.50 GHz-days)
Starting trial factoring M172926979 from 2^76 to 2^77 (88.50 GHz-days)
 k_min = 218467540931640
 k_max = 436935081864318
Using GPU kernel "barrett87_mul32_gs"
   Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Sep 28 08:34 |   0   0.1% | 17.811   4h44m |    447.20    82485    n.a.%
Sep 28 08:34 |   5   0.2% | 17.797   4h44m |    447.55    82485    n.a.%
Sep 28 08:34 |   9   0.3% | 17.769   4h43m |    448.26    82485    n.a.%
Sep 28 08:34 |  20   0.4% | 17.840   4h44m |    446.47    82485    n.a.%
...
Sep 28 08:53 | 321   7.0% | 17.971   4h27m |    443.22    82485    n.a.%
Sep 28 08:54 | 324   7.1% | 17.947   4h26m |    443.81    82485    n.a.%
Sep 28 08:54 | 329   7.2% | 17.928   4h26m |    444.28    82485    n.a.%
Sep 28 08:54 | 336   7.3% | 17.901   4h25m |    444.95    82485    n.a.%
Sep 28 08:54 | 341   7.4% | 17.916   4h25m |    444.58    82485    n.a.%[/CODE] |
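As a quick sanity check on those percentages, plain Python arithmetic on the GHz-d/day figures logged above (taking the first progress line of each run, and the last line before/at the ^C):

```python
# GHz-d/day figures read from the two logs above
less_classes_early, full_early = 439.57, 447.20  # first progress lines
less_classes_late,  full_late  = 440.06, 444.58  # last logged lines

gain_early = (full_early - less_classes_early) / less_classes_early * 100
gain_late  = (full_late  - less_classes_late)  / less_classes_late  * 100
print(f"{gain_early:.1f}% early, {gain_late:.1f}% late")  # 1.7% early, 1.0% late
```

which is consistent with the "about 2% higher throughput initially, 1% later" observation, within the scatter of the per-class timings.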
[QUOTE=kriesel;496996]There's no warning about the restart of bit level.[/QUOTE]Sorry, I guess I should have more explicitly warned you that the checkpoint files would not be cross-compatible between the 420-class and 4620-class implementations.
|
Thank you all for your answers.
Up and running. It's nice to be "back in business"... |
[QUOTE=James Heinrich;496948]CUDA DLLs can be found [url=https://download.mersenne.ca/CUDA-DLLs]here[/url], if needed.[/QUOTE]
Could you make sub-dirs arranged by CUDA SDK version? Last time I needed a CUDA DLL for factoring on my GTX1080Ti I pretty much downloaded them all cause I was unsure which ones I needed :blush: . Sorry for wasting your bandwidth! Later I found out that the CUDA capability of the card =/= the CUDA SDK version. Still I find it a bit confusing that you need to compile for different architectures, right? A mfaktc compile with CUDA SDK 10 for a GTX980 won't work on a GTX1080, right? Cause the architecture/CUDA capability of the GTX1080 is higher (and somehow not backwards compatible?) Or am I just being ignorant? |
[QUOTE=James Heinrich;496947]The [i]LessClasses[/i] version should only be used for extremely-fast-running assignments (where each assignment only takes a few seconds).
mfaktc can be download from [url=https://mersenneforum.org/mfaktc/mfaktc-0.21/]here[/url] or [url=https://download.mersenne.ca/mfaktc/mfaktc-0.21]here[/url].[/QUOTE] The CUDA 10 binary does not work with just the .dll files, it wants CUDA 10 installed (I have CUDA 9.2 installed): "ERROR: current CUDA driver version is lower than the CUDA toolkit version used during compile! Please update your graphics driver." I think the other binaries work with just the dll files. |
[QUOTE=VictordeHolland;497052]Still I find it a bit confusing that you need to compile for different architectures, right? A mfaktc compile with CUDA SDK 10, GTX980 won't work on a GTX1080 right? Cause the architecture/CUDA capability of the GTX1080 is higher (and somehow not backwards compatibe?) Or am I just being ignorent?[/QUOTE]
It is not that bad.[LIST][*]the CUDA [U]runtime DLL[/U] must be [U]exactly the same version[/U] used during compilation of mfaktc[*]the [U]driver[/U] of your system must support the [U]same or newer version[/U] of CUDA used for compiling mfaktc.[*]the binary (mfaktc.exe) must have support for your GPU. A single binary can support multiple GPU architectures, e.g. the CUDA 8 binaries found [URL="https://mersenneforum.org/mfaktc/mfaktc-0.21/"]here[/URL] are compiled for Fermi, Kepler, Kepler "update", Maxwell and Pascal.[/LIST] Oliver |
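The three rules above can be condensed into a toy checker; this is an illustration only (the function names, the numeric version encoding, and the returned strings are invented for this sketch, not any real NVIDIA or mfaktc API). Rule 3 reflects the fact that the mfaktc makefiles emit plain SASS (code=sm_XX, no PTX fallback), which is forward-compatible only within a compute-capability major version:

```python
def gpu_supported(gpu_cc, binary_ccs):
    # SASS is forward-compatible only within a major compute capability:
    # sm_60 code runs on a CC 6.1 card, but nothing below CC 7.x runs on 7.5
    gmaj, gmin = (int(x) for x in gpu_cc.split("."))
    return any(int(b.split(".")[0]) == gmaj and int(b.split(".")[1]) <= gmin
               for b in binary_ccs)

def mfaktc_will_run(runtime_dll_cuda, compiled_cuda, driver_cuda,
                    gpu_cc, binary_ccs):
    """Toy model of the three compatibility rules. Versions are floats,
    compute capabilities are 'major.minor' strings. Not a real API."""
    if runtime_dll_cuda != compiled_cuda:      # rule 1: exact match
        return "wrong cudart DLL version"
    if driver_cuda < compiled_cuda:            # rule 2: same or newer
        return "driver too old for this binary"
    if not gpu_supported(gpu_cc, binary_ccs):  # rule 3: code for this GPU
        return "invalid device function"
    return "ok"

# RTX 2080 (CC 7.5) with the CUDA 8 binary (built up to sm_60): fails,
# matching the error reported earlier in this thread
cuda8_build = {"2.0", "3.0", "3.5", "5.0", "6.0"}
print(mfaktc_will_run(8.0, 8.0, 10.0, "7.5", cuda8_build))  # invalid device function
print(mfaktc_will_run(8.0, 8.0, 10.0, "6.1", cuda8_build))  # ok (e.g. GTX 1060)
```

This reproduces both cases from the thread: the Turing/Volta cards fail rule 3 with the CUDA 8 build, while the CUDA 10 binary on a system with only the CUDA 9.2 driver fails rule 2 ("current CUDA driver version is lower than the CUDA toolkit version").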
JFYI: just built some Windows binaries using CUDA Toolkit 10.0. Will do some testing and provide binaries after successful testing.
Oliver |
If you have time could you please also post a guide describing how you compile it for Windows.
|
[QUOTE=ATH;497142]If you have time could you please also post a guide describing how you compile it for Windows.[/QUOTE]
The readme.txt for mfaktc v0.21 CUDA 8.0 says
[CODE]#############################
# 2.2 Compilation (Windows) #
#############################

The following instructions have been tested on Windows 7 64bit using
Visual Studio 2012 Professional. A GNU compatible version of make is
also required as the Makefile is not compatible with nmake. GNU Make
for Win32 can be downloaded from
http://gnuwin32.sourceforge.net/packages/make.htm.

Run the Visual Studio 2012 x64 Win64 Command Prompt for x64 or
Run the Visual Studio 2012 x86 Native Tools Command Prompt for x86 (32 bit)
and change into the "\src" subdirectory.
Run 'make -f Makefile.win' for a 64bit built (recommended on 64bit systems)
or 'make -f Makefile.win32' for a 32bit built.
You will have to adjust the paths to your CUDA installation and the
Microsoft Visual Studio binaries in the makefiles if you have something
other than CUDA 8.0 and MSVS 2012.
The binaries "mfaktc-win-64.exe" or "mfaktc-win-32.exe" are placed in
the parent directory.[/CODE]
Presumably you're asking for an update or more detail. |
Hello!
[QUOTE=ATH;497142]If you have time could you please also post a guide describing how you compile it for Windows.[/QUOTE] [LIST=1][*]Installed [URL="https://visualstudio.microsoft.com/de/downloads/"]Visual Studio 2017.8 "Community"[/URL][*]Installed [URL="https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64"]CUDA Toolkit 10 for Windows[/URL][*]Installed [URL="http://www.mingw.org/"]MinGW[/URL] as one of many options for [I]GNU Make[/I] on Windows. In the MinGW folder I've copied [I]bin/mingw32-make.exe[/I] to [I]bin/make.exe[/I] because I'm lazy. Careful when updating [I]mingw32-make.exe[/I]...[*]Configured the environment for the [I]"x64 Native Tools Command Prompt"[/I] - added MinGW/bin and CUDA/bin to the PATH variable.[/LIST] Then just open the [I]"x64 Native Tools Command Prompt"[/I], change into the directory with the mfaktc source files and run
[CODE]make -f Makefile.win[/CODE]
I had to adjust some settings in Makefile.win:
[CODE]CUDA_DIR = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"

CC = cl
CFLAGS = /Ox /Oy /GL /W2 /fp:fast /I$(CUDA_DIR)\include /I$(CUDA_DIR)\include\cudart /nologo

NVCCFLAGS = --ptxas-options=-v

CUFLAGS = -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\Hostx86\x64" -x cu -I$(CUDA_DIR)\/include --machine 64 --compile -Xcompiler "/wd 4819" -DWIN64 -Xcompiler "/EHsc /W3 /nologo /O2 /FS" $(NVCCFLAGS)

# generate code for various compute capabilities
# NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
# NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc
NVCCFLAGS += --generate-code arch=compute_50,code=sm_50 # CC 5.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.x GPUs will use this code
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70 # CC 7.x GPUs will use this code
# NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 GPUs will use this code[/CODE]
Oliver |