[QUOTE=Cubox;481580]I saw those, and do not wish to use them. I would like to ensure the software I run is updated. This is why I am asking here about updates to this code.
MSI GTX 1070 8G
I am running CUDALucas2.06beta at the moment, doing some double checking LLs. The card is stable-ish. Over the 53 DC I have done, only 3 (updated, was 4 before edit) were bad. (One was a stupid overclock I did). I am willing to compile my binaries and/or help with testing updated code if you have patches.[/QUOTE]
As far as I know, v0.20, approx Nov 2013, is the latest available executable for Windows. There was something dated June 2015 for linux. Thanks for volunteering to help change that.
What programming experience do you have? Are you familiar with posting code on sourceforge?
First step is to get the development environment together, and demonstrate to yourself that you can compile and link gpu code and produce something functional. (That doesn't have to be CUDAPm1 initially; could be CUDALucas or mfaktc, or any tiny demo CUDA app for quick turnaround.) I suggest aiming for CUDA6.5 or CUDA8.0, 64-bit Windows executables. (I've seen speed advantages with CUDA6.x over other versions, in CUDALucas with extensive benchmarking. Driver version didn't make any detectable difference. But it can vary vs. card.) The GTX1070 requires CUDA 8, as I recall. A lot of us have older cards that perform faster at lower CUDA levels. I think NVIDIA CUDA SDK; MS VC Community Edition. Perhaps Jerry (flashjh) could advise how to set up for multiple CUDA levels.
Then we can get into developing a v0.21 beta with some minor tweaks and bug fixes, and go from there.
Six percent bad runs seems a bit high to me (3/53). |
[QUOTE=kriesel;481604]As far as I know, v0.20, approx Nov 2013, is the latest available executable for Windows. There was something dated June 2015 for linux. Thanks for volunteering to help change that.
What programming experience do you have? Are you familiar with posting code on sourceforge?
First step is to get the development environment together, and demonstrate to yourself that you can compile and link gpu code and produce something functional. (That doesn't have to be CUDAPm1 initially; could be CUDALucas or mfaktc, or any tiny demo CUDA app for quick turnaround.) I suggest aiming for CUDA6.5 or CUDA8.0, 64-bit Windows executables. (I've seen speed advantages with CUDA6.x over other versions, in CUDALucas with extensive benchmarking. Driver version didn't make any detectable difference. But it can vary vs. card.) The GTX1070 requires CUDA 8, as I recall. A lot of us have older cards that perform faster at lower CUDA levels. I think NVIDIA CUDA SDK; MS VC Community Edition. Perhaps Jerry (flashjh) could advise how to set up for multiple CUDA levels.
Then we can get into developing a v0.21 beta with some minor tweaks and bug fixes, and go from there.
Six percent bad runs seems a bit high to me (3/53)[/QUOTE]
I am good with C and kinda good with C++. I am used to working on Linux and OSX, not Windows. I know all about posting source on GitHub. I'll try to compile the latest CUDALucas. I will keep you updated; however, since my free time is an unknown quantity, it might take a few days. |
[QUOTE=Cubox;481663]I will keep you updated, however due to my free time being an unknown quantity, I might take a few days.[/QUOTE]
No problem, I can relate. Some things have waited nearly 5 years, some longer; they can wait a few more days or weeks. |
cudapm1 images
[QUOTE=James Heinrich;481536]Windows binaries for CudaPM1 are available at [URL]https://download.mersenne.ca/[/URL] but they're 5 years old.[/QUOTE]
This looks rather comprehensive for Windows binaries, but apparently contains no Linux executables. Clicking on the link at mersenne.ca, [url]http://www.mersenneforum.org/CUDAPm1/[/url], I get a 404 error. The June 23 2015 Linux build is on SourceForge but not on mersenne.ca. I wonder if that Linux version is the only build with [r52] "reduced register use on square kernel", since that SourceForge entry is dated Nov 25 2013, slightly after the newest Windows build (Nov 18 2013). [URL]https://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/[/URL]
The wiki page at [url]http://mersennewiki.org/index.php/CUDAPm1[/url] is not so much an article (yet?) as 3 links: to James' mirror, the SourceForge folder, and this discussion thread. |
[QUOTE=Cubox;481581]The CUDAp-1 software mentioned in your list of mersenne hunting software pdf (very useful for newcomers!) states Jan 2016 as 'Approx date' for CUDAp-1.
[URL]https://sourceforge.net/projects/cudapm1/files/[/URL] has last code update in 2013, last binaries are from 2013 as well.[/QUOTE] Sorry, Jan 2016 in the CUDAPm1 date cell was probably a late-night-edit-error. (clLucas not CUDAPm1 as I recall.) See post 503 in this thread for a hopefully more accurate reflection of the latest CUDAPm1 versions currently available. I'll fix the pdf soon. (Then, hopefully, you'll make it obsolete, by producing something newer...) |
CUDAPm1 bug and wish list update
Here is today's version of the list I am maintaining. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have.
|
The current version seems to be working on the GTX 1080 Ti with Windows 10 x64 (I didn't do any extensive tests or performance optimizations).
[code]
C:\CUDAPm1_v0.20>CUDAPm1_v0.20.exe 60593041, -b1 1000
CUDAPm1 v0.20
Warning: Couldn't parse ini file option Threads; using default: 256
Warning: Couldn't parse ini file option CheckRoundoffAllIterations; using default: off
Warning: Couldn't parse ini file option Polite; using default: 1
Warning: Couldn't parse ini file option DeviceNumber; using default: 0
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
Warning: Couldn't parse ini file option UnusedMem; using default.
CUDA reports 9310M of 11264M GPU memory free.
Index 50
No GeForce GTX 1080 Ti threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 3584 3584 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 128, norm2 128.
Using up to 4284M GPU memory.
Starting stage 1 P-1, M60593041, B1 = 1000, B2 = 13320000, fft length = 3584K
Doing 1475 iterations
Running careful round off test for 1000 iterations. If average error > 0.25, the test will restart with a longer FFT.
Iteration 100, average error = 0.01770, max error = 0.02539
Iteration 200, average error = 0.02034, max error = 0.02734
Iteration 300, average error = 0.02122, max error = 0.02734
Iteration 400, average error = 0.02165, max error = 0.02637
Iteration 500, average error = 0.02194, max error = 0.02734
Iteration 600, average error = 0.02210, max error = 0.02686
Iteration 700, average error = 0.02226, max error = 0.02734
Iteration 800, average error = 0.02232, max error = 0.02637
Iteration 900, average error = 0.02238, max error = 0.02637
Iteration 1000, average error = 0.02240 <= 0.25 (max error = 0.02734), continuing test.
M60593041, 0x962b95049cafb7d9, n = 3584K, CUDAPm1 v0.20 Stage 1 complete, estimated total time = 0:03
Starting stage 1 gcd.
M60593041 has a factor: 2105528336291622770155712978260232660484461209 (P-1, B1=1000, B2=1000, e=0, n=3584K CUDAPm1 v0.20)
[/code]
fft bench:
[code]
Device GeForce GTX 1080 Ti
Compatibility 6.1
clockRate (MHz) 1607
memClockRate (MHz) 5505

fft max exp ms/iter
1 22133 0.0355
2 43633 0.0390
4 85933 0.0478
32 657719 0.0693
44 898213 0.0791
64 1296011 0.0839
81 1631969 0.0987
96 1927129 0.0989
112 2240863 0.1025
128 2553659 0.1204
160 3176779 0.1251
200 3951977 0.1446
224 4415431 0.1553
256 5031737 0.1925
288 5646379 0.2212
294 5761451 0.2562
320 6259537 0.2708
324 6336103 0.2832
392 7634537 0.3099
400 7786967 0.3304
448 8700169 0.3338
512 9914521 0.3805
576 11125619 0.4453
648 12484649 0.5054
686 13200581 0.5413
800 15343429 0.5486
864 16543493 0.6236
1024 19535569 0.6952
1080 20580341 0.8218
1120 21325891 0.8564
1152 21921901 0.8756
1176 22368691 0.9074
1296 24599717 0.9129
1372 26010389 1.0312
1568 29640913 1.0384
1600 30232693 1.0678
1728 32597297 1.1680
1792 33778141 1.2742
2048 38492887 1.2833
2160 40551479 1.5437
2304 43194913 1.5569
2560 47885689 1.7060
2592 48471289 1.7171
2625 49075057 1.9772
2688 50227213 1.9787
2744 51250889 1.9848
2800 52274087 2.0086
3136 58404433 2.0353
3200 59570449 2.2746
3240 60298969 2.2818
3584 66556463 2.3477
4096 75846319 2.5299
4608 85111207 3.0311
4800 88579669 3.3866
5120 94353877 3.3908
5184 95507747 3.4069
5292 97454309 3.8099
5600 103000823 3.8417
5832 107174381 4.0325
6144 112781477 4.1750
6272 115080019 4.2456
6400 117377567 4.4651
6480 118813021 4.5797
6912 126558077 4.6116
7168 131142761 4.7072
7200 131715607 4.9283
8192 149447533 5.1292
[/code] |
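The "has a factor" line in the log above can be checked independently of the GPU program. The following is a minimal sketch in Python, not part of CUDAPm1: a number q divides M_p = 2^p - 1 exactly when 2^p mod q equals 1, which the built-in three-argument `pow` evaluates quickly even for large p. The function name `divides_mersenne` is made up for this illustration; the exponent and factor are copied from the log.

```python
def divides_mersenne(p: int, q: int) -> bool:
    """Return True if q divides 2**p - 1 (hypothetical helper, not from CUDAPm1)."""
    if q <= 1:
        raise ValueError("q must be greater than 1")
    # 2**p - 1 is divisible by q exactly when 2**p leaves remainder 1 mod q.
    return pow(2, p, q) == 1


if __name__ == "__main__":
    # Values copied from the results line in the log above.
    exponent = 60593041
    reported = 2105528336291622770155712978260232660484461209
    print("reported factor divides M%d: %s"
          % (exponent, divides_mersenne(exponent, reported)))

    # Small sanity check on a textbook case: 2**11 - 1 = 2047 = 23 * 89.
    print("23 divides M11:", divides_mersenne(11, 23))
```

The same check works for any result line that reports a factor, since it needs only the exponent and the factor.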
gtx1070 for comparison
[QUOTE=VictordeHolland;482568]The current version seems to be working on the GTX1080 Ti with W10 x64 (didn't do any extensive tests or performance optimalisations)[/QUOTE]
Looks like the 1080 Ti is nearly the equal of a pair of GTX1070s. What's the largest exponent you can successfully run on the 1080 Ti with its 11GB VRAM? I've run 314M on the 1070 ok, but 628M had problems continuing from the stage 1 gcd or performing it. (I think the former, based on GPU-Z indications.) The GTX480's limit was about 290M for stage 2 due to 1.5GB memory size becoming inadequate at nrp=1.
[CODE]
Device GeForce GTX 1070
Compatibility 6.1
clockRate (MHz) 1708
memClockRate (MHz) 4004

fft max exp ms/iter
2 43633 0.0606
4 85933 0.0630
8 169409 0.0911
16 333803 0.0913
32 657719 0.0953
64 1296011 0.1109
80 1612249 0.1237
81 1631969 0.1408
96 1927129 0.1428
100 2005673 0.1436
112 2240863 0.1488
120 2397383 0.1716
128 2553659 0.1794
144 2865601 0.1882
160 3176779 0.2148
162 3215629 0.2467
168 3332107 0.2524
200 3951977 0.2622
216 4261051 0.2945
224 4415431 0.2989
225 4434721 0.3248
256 5031737 0.3341
288 5646379 0.3603
320 6259537 0.4237
324 6336103 0.4458
336 6565633 0.5069
392 7634537 0.5102
400 7786967 0.5271
432 8395997 0.5558
448 8700169 0.5791
512 9914521 0.6009
540 10444757 0.7232
576 11125619 0.7246
640 12333809 0.8014
648 12484649 0.8258
672 12936919 0.9232
686 13200581 0.9234
720 13840423 0.9244
800 15343429 0.9298
864 16543493 1.0297
1024 19535569 1.1486
1080 20580341 1.3637
1125 21419011 1.4440
1134 21586693 1.4747
1152 21921901 1.4855
1176 22368691 1.5284
1280 24302527 1.5325
1296 24599717 1.5563
1323 25101101 1.7481
1344 25490893 1.7790
1350 25602229 1.7805
1400 26529691 1.7827
1568 29640913 1.8353
1600 30232693 1.8536
1728 32597297 2.0343
1750 33003301 2.2177
1792 33778141 2.2198
2048 38492887 2.2744
2304 43194913 2.6746
2560 47885689 3.0174
2592 48471289 3.0979
2688 50227213 3.5028
2700 50446621 3.5501
2800 52274087 3.5831
2916 54392209 3.6662
3136 58404433 3.7083
3200 59570449 4.0342
3240 60298969 4.1233
3584 66556463 4.2461
3600 66847171 4.6064
4096 75846319 4.6173
4608 85111207 5.4760
4800 88579669 6.1239
5120 94353877 6.1506
5184 95507747 6.2963
5292 97454309 6.9197
5600 103000823 7.0910
5832 107174381 7.4497
6144 112781477 7.7539
6272 115080019 7.8423
6400 117377567 8.4223
6480 118813021 8.5396
6912 126558077 8.5851
7168 131142761 9.0281
7200 131715607 9.4287
8192 149447533 9.7002
8640 157439981 11.4261
9216 167703023 11.7002
9408 171120919 12.9847
9600 174537299 12.9942
9720 176671801 13.2919
10080 183071879 13.7479
10240 185914837 13.9074
10368 188188471 14.6202
11200 202952693 14.6974
11664 211176269 15.7289
12096 218826341 16.3628
12544 226753511 16.5236
12800 231280639 17.2002
12960 234109067 17.6919
13824 249369863 18.0687
14336 258403573 18.5125
14400 259532291 19.2037
15552 279831199 20.5104
16384 294471259 20.9802
18432 330441847 23.5745
18816 337176443 26.0162
20480 366326371 26.8871
20736 370806323 29.1363
21168 378363589 29.6717
21504 384239189 30.1835
21952 392070229 30.5201
23040 411074273 30.6741
23328 416101459 32.0017
25088 446794913 34.3478
25600 455715121 35.5808
27648 491358173 37.0692
28672 509158127 38.0063
28800 511382147 38.6743
32768 580225813 41.9480
32805 580866907 47.4597
33075 585544397 48.3338
36864 651102253 49.4871
39200 691446799 56.7610
41472 730636397 58.1385
42336 745527179 62.3263
44800 787958201 62.5338
46080 809980289 64.9344
49152 862780273 68.7844
50176 880364279 71.1277
51200 897940567 75.0087
51840 908921869 75.8619
55296 968171579 77.0567
57344 1003244573 78.9115
57600 1007626787 80.0893
65536 1143276383 87.6720
[/CODE]
Obtained with, and followed by, something resembling the following (actually run in stages):
[CODE]
set exe=cudaPm1_win64_20131118_CUDA_50.exe
set model=GeForce GTX 1070
set ntimes=2
set dev=0
rem Some GPUs can't do the whole span, so they are run in portions to obtain some fft results.
%exe% -d %dev% -cufftbench 1 32768 1 >>cudapm1start.txt
rename "%model% fft.txt" "%model% fft save.txt"
if errorlevel 1 goto skip
%exe% -d %dev% -cufftbench 32768 65536 1 >>cudapm1start.txt
for %%a in ( 4096 5120 6144 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 4608 4800 5184 5292 5600 5832 6272 6400 6480 6912 7168 7200 8192 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 8640 9216 9408 9600 9720 10080 10240 10368 11200 11664 12096 12544 12800 12960 13824 14336 14400 15552 16384 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 18432 18816 20480 20736 21168 21504 21952 23040 23328 25088 25600 27648 28672 28800 32768 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
rem 32M to 64M fft lengths
for %%a in ( 32805 33075 36864 39200 41472 42336 44800 46080 49152 50176 51200 51840 55296 57344 57600 65536 ) do %exe% -d %dev% -cufftbench %%a %%a %ntimes% >>cudapm1start.txt
rem The posted script had no skip label; placing it at the end is an assumption.
:skip
[/CODE] |
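The "nearly the equal of a pair of GTX1070s" remark in the post above can be put in numbers by dividing the GTX 1070 ms/iter by the GTX 1080 Ti ms/iter at FFT lengths that appear in both tables. This is only a rough sketch: the five lengths are an arbitrary sample, the helper name `speed_ratios` is invented here, and the two tables may come from different builds or CUDA levels, so the ratios are indicative at best.

```python
# ms/iter values copied from the two benchmark tables in this thread,
# keyed by FFT length in K. Only a small sample of rows is used.
GTX_1080_TI = {1024: 0.6952, 2048: 1.2833, 3584: 2.3477, 4096: 2.5299, 8192: 5.1292}
GTX_1070 = {1024: 1.1486, 2048: 2.2744, 3584: 4.2461, 4096: 4.6173, 8192: 9.7002}


def speed_ratios(slow: dict, fast: dict) -> dict:
    """Return slow/fast ms-per-iteration ratios for FFT lengths present in both."""
    return {n: slow[n] / fast[n] for n in sorted(slow) if n in fast}


if __name__ == "__main__":
    for fft_k, ratio in speed_ratios(GTX_1070, GTX_1080_TI).items():
        # A ratio of 2.0 would mean one 1080 Ti matches two 1070s at this length.
        print("%5dK: 1070 takes %.2fx as long per iteration" % (fft_k, ratio))
```

A ratio close to 2 at a given length supports the "pair of 1070s" reading; smaller ratios at short FFT lengths would suggest the gap narrows there.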
highest exponents successfully run? Issues seen on high exponents?
What are the highest exponents you've successfully run in CUDAPm1 through stage 1 including gcd?
Through both stage 1 and stage 2 including gcds? What hardware was it run on? If a run failed on a high exponent, what issues were seen? |
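When reporting a high-exponent run, it may help to state which FFT length the exponent falls under. The sketch below reads the "max exp" column of the GTX 1070 benchmark table as the largest exponent usable at each FFT length and picks the smallest benchmarked length that covers a given exponent. That reading is an assumption, though it is consistent with the M60593041 log earlier in the thread, which ran at 3584K (max exp 66556463) rather than 3240K (max exp 60298969). The function name and the round exponents standing in for "314M" and "628M" are illustrative only.

```python
from bisect import bisect_left

# (FFT length in K, max exp) rows copied from the GTX 1070 table in this thread.
# Only a subset is listed; the rows adjacent to the two example exponents are
# included so the lookup gives the same answer as the full table would.
ROWS = [
    (16384, 294471259),
    (18432, 330441847),
    (32768, 580225813),
    (32805, 580866907),
    (33075, 585544397),
    (36864, 651102253),
    (65536, 1143276383),
]


def smallest_fft_for(exponent: int, rows=ROWS):
    """Return the smallest listed FFT length (K) whose max exp covers exponent.

    Returns None if the exponent exceeds every listed row.
    """
    max_exps = [max_exp for _, max_exp in rows]
    i = bisect_left(max_exps, exponent)  # first row with max exp >= exponent
    return rows[i][0] if i < len(rows) else None


if __name__ == "__main__":
    # Round numbers standing in for the "314M" and "628M" runs mentioned above.
    for exponent in (314_000_000, 628_000_000):
        print(exponent, "->", smallest_fft_for(exponent), "K")
```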
Manually reported P-1 results are getting marked as expired assignments
FYI: more at [url]http://www.mersenneforum.org/showpost.php?p=486151&postcount=1499[/url]
|
Improved recovery from Windows TDRs on old gpus
See the detailed writeup at [URL]http://www.mersenneforum.org/showpost.php?p=488288&postcount=37[/URL]
|