The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

kriesel 2018-03-05 04:11

[QUOTE=Cubox;481580]I saw those, and do not wish to use them. I would like to ensure the software I run is updated. This is why I am asking here about updates to this code.

MSI GTX 1070 8G

I am running CUDALucas2.06beta at the moment, doing some double checking LLs.
The card is stable-ish. Over the 53 DC I have done, only 3 (updated, was 4 before edit) were bad. (One was a stupid overclock I did).

I am willing to compile my binaries and/or help with testing updated code if you have patches.[/QUOTE]

As far as I know, v0.20 (approx. Nov 2013) is the latest available executable for Windows. There was also something dated June 2015 for Linux. Thanks for volunteering to help change that.

What programming experience do you have?
Are you familiar with posting code on sourceforge?

The first step is to get the development environment together and demonstrate to yourself that you can compile and link GPU code and produce something functional. (That doesn't have to be CUDAPm1 initially; it could be CUDALucas or mfaktc, or any tiny demo CUDA app for quick turnaround.) I suggest aiming for CUDA 6.5 or CUDA 8.0, 64-bit Windows executables. (In extensive CUDALucas benchmarking I've seen speed advantages with CUDA 6.x over other versions. Driver version didn't make any detectable difference. But it can vary from card to card.) The GTX 1070 requires CUDA 8, as I recall. A lot of us have older cards that perform faster at lower CUDA levels.
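Before attempting a real build, a quick check that the compilers are reachable can save some head-scratching. The following is only a sketch: it assumes `nvcc` (from the CUDA toolkit) and `cl` (the MSVC compiler) are the tools wanted and that they should be on PATH; the helper name `find_tools` is made up for illustration.

```python
import shutil
import subprocess

def find_tools(names=("nvcc", "cl")):
    """Return {tool name: full path or None} for each tool looked up on PATH."""
    return {name: shutil.which(name) for name in names}

if __name__ == "__main__":
    tools = find_tools()
    for name, path in tools.items():
        print(f"{name}: {path or 'not found on PATH'}")
    if tools["nvcc"]:
        # nvcc --version prints the CUDA toolkit release the compiler belongs to
        result = subprocess.run([tools["nvcc"], "--version"],
                                capture_output=True, text=True)
        print(result.stdout)
```

Note that `cl` is normally only on PATH inside a Visual Studio developer command prompt, so "not found" from a plain console does not necessarily mean it is not installed.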

For tools, I think you'll need the NVIDIA CUDA SDK and MS Visual Studio Community Edition (for the VC compiler). Perhaps Jerry (flashjh) could advise how to set up for multiple CUDA levels.

Then we can get into developing a v0.21 beta with some minor tweaks and bug fixes, and go from there.

About six percent bad runs (3/53) seems a bit high to me.
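For reference, the arithmetic behind that figure, as a small sketch. The second line assumes the overclock-related bad result is one of the 3 counted, which the quoted post does not state outright.

```python
def bad_rate(bad, total):
    """Fraction of runs that produced a bad result."""
    return bad / total

print(f"{bad_rate(3, 53):.1%}")  # 3 bad out of 53 double checks
print(f"{bad_rate(2, 53):.1%}")  # assumption: leaving out the overclock-related one
```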

Cubox 2018-03-06 03:47

[QUOTE=kriesel;481604]What programming experience do you have?
Are you familiar with posting code on sourceforge?

First step is to get the development environment together, and demonstrate to yourself that you can compile and link gpu code and produce something functional.[/QUOTE]

I am good with C, fairly good with C++, and am used to working on Linux and OSX rather than Windows.
I know all about posting source on GitHub.

I'll try to compile the latest CUDALucas. I will keep you updated, however due to my free time being an unknown quantity, I might take a few days.

kriesel 2018-03-06 15:03

[QUOTE=Cubox;481663]I will keep you updated, however due to my free time being an unknown quantity, I might take a few days.[/QUOTE]
No problem, I can relate. Some things have waited nearly 5 years, some longer; they can wait a few more days or weeks.

kriesel 2018-03-06 17:34

cudapm1 images
 
[QUOTE=James Heinrich;481536]Windows binaries for CudaPM1 are available at [URL]https://download.mersenne.ca/[/URL] but they're 5 years old.[/QUOTE]
This looks rather comprehensive for Windows binaries, but apparently contains no Linux executables.

Clicking on the link at mersenne.ca, [url]http://www.mersenneforum.org/CUDAPm1/[/url], I get a 404 error.

The June 23 2015 Linux build is on SourceForge but not on mersenne.ca. I wonder if that Linux version is the only build with [r52] "reduced register use on square kernel", since that SourceForge entry is dated Nov 25 2013, slightly after the newest Windows build (Nov 18 2013). [URL]https://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/[/URL]

The wiki page at [url]http://mersennewiki.org/index.php/CUDAPm1[/url] is not so much an article (yet?) as three links: to James' mirror, the SourceForge folder, and this discussion thread.

kriesel 2018-03-06 23:43

[QUOTE=Cubox;481581]The CUDAp-1 software mentioned in your list of mersenne hunting software pdf (very useful for newcomers!) states Jan 2016 as 'Approx date' for CUDAp-1.

[URL]https://sourceforge.net/projects/cudapm1/files/[/URL] has last code update in 2013, last binaries are from 2013 as well.[/QUOTE]

Sorry, Jan 2016 in the CUDAPm1 date cell was probably a late-night editing error. (That date belongs to clLucas, not CUDAPm1, as I recall.)
See post 503 in this thread for a hopefully more accurate reflection of the latest CUDAPm1 versions currently available. I'll fix the PDF soon. (Then, hopefully, you'll make it obsolete by producing something newer...)

kriesel 2018-03-16 17:54

CUDAPm1 bug and wish list update
 
1 Attachment(s)
Here is today's version of the list I am maintaining. As always, this is in appreciation of the authors' past contributions. Users may want to browse this for workarounds included in some of the descriptions, and for an awareness of some known pitfalls. Please respond with any comments, additions or suggestions you may have.

VictordeHolland 2018-03-17 00:23

The current version seems to be working on the GTX1080 Ti with W10 x64 (didn't do any extensive tests or performance optimisations)

[code]
C:\CUDAPm1_v0.20>CUDAPm1_v0.20.exe 60593041, -b1 1000
CUDAPm1 v0.20
Warning: Couldn't parse ini file option Threads; using default: 256
Warning: Couldn't parse ini file option CheckRoundoffAllIterations; using default: off
Warning: Couldn't parse ini file option Polite; using default: 1
Warning: Couldn't parse ini file option DeviceNumber; using default: 0
Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt"
Warning: Couldn't parse ini file option ResultsFile; using default "results.txt"
Warning: Couldn't parse ini file option UnusedMem; using default.
CUDA reports 9310M of 11264M GPU memory free.
Index 50
No GeForce GTX 1080 Ti threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 3584 3584 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 128, norm2 128.
Using up to 4284M GPU memory.
Starting stage 1 P-1, M60593041, B1 = 1000, B2 = 13320000, fft length = 3584K
Doing 1475 iterations
Running careful round off test for 1000 iterations. If average error > 0.25, the test will restart with a longer FFT.
Iteration 100, average error = 0.01770, max error = 0.02539
Iteration 200, average error = 0.02034, max error = 0.02734
Iteration 300, average error = 0.02122, max error = 0.02734
Iteration 400, average error = 0.02165, max error = 0.02637
Iteration 500, average error = 0.02194, max error = 0.02734
Iteration 600, average error = 0.02210, max error = 0.02686
Iteration 700, average error = 0.02226, max error = 0.02734
Iteration 800, average error = 0.02232, max error = 0.02637
Iteration 900, average error = 0.02238, max error = 0.02637
Iteration 1000, average error = 0.02240 <= 0.25 (max error = 0.02734), continuing test.
M60593041, 0x962b95049cafb7d9, n = 3584K, CUDAPm1 v0.20
Stage 1 complete, estimated total time = 0:03
Starting stage 1 gcd.
M60593041 has a factor: 2105528336291622770155712978260232660484461209 (P-1, B1=1000, B2=1000, e=0, n=3584K CUDAPm1 v0.20)
[/code]
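For readers new to the method, the stage 1 run and gcd shown in the log above can be illustrated with a textbook P-1 stage 1. This is a sketch of the general algorithm in plain Python integers, not CUDAPm1's FFT-based implementation; the function names and the choice of base 3 are assumptions made for illustration, and it is only practical for small exponents.

```python
from math import gcd

def primes_up_to(n):
    """Simple sieve of Eratosthenes (n >= 2)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, is_p in enumerate(sieve) if is_p]

def pm1_stage1(p, b1, base=3):
    """Textbook P-1 stage 1 on M_p = 2^p - 1; returns the gcd (1 means nothing found)."""
    m = (1 << p) - 1
    # Factors of M_p have the form 2*k*p + 1, so 2*p goes into the exponent for free.
    e = 2 * p
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:
            qk *= q  # largest power of q not exceeding B1
        e *= qk
    return gcd(pow(base, e, m) - 1, m)

def divides_mersenne(p, f):
    """True if f divides 2^p - 1."""
    return pow(2, p, f) == 1

print(pm1_stage1(29, 10))
```

For the small case p = 29, B1 = 10 this returns 486737 = 233 × 2089, so the gcd can be a product of several prime factors rather than a single prime. A factor reported in a results line can be checked independently with `divides_mersenne(exponent, factor)`.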


fft bench:
[code]
Device GeForce GTX 1080 Ti
Compatibility 6.1
clockRate (MHz) 1607
memClockRate (MHz) 5505

fft max exp ms/iter
1 22133 0.0355
2 43633 0.0390
4 85933 0.0478
32 657719 0.0693
44 898213 0.0791
64 1296011 0.0839
81 1631969 0.0987
96 1927129 0.0989
112 2240863 0.1025
128 2553659 0.1204
160 3176779 0.1251
200 3951977 0.1446
224 4415431 0.1553
256 5031737 0.1925
288 5646379 0.2212
294 5761451 0.2562
320 6259537 0.2708
324 6336103 0.2832
392 7634537 0.3099
400 7786967 0.3304
448 8700169 0.3338
512 9914521 0.3805
576 11125619 0.4453
648 12484649 0.5054
686 13200581 0.5413
800 15343429 0.5486
864 16543493 0.6236
1024 19535569 0.6952
1080 20580341 0.8218
1120 21325891 0.8564
1152 21921901 0.8756
1176 22368691 0.9074
1296 24599717 0.9129
1372 26010389 1.0312
1568 29640913 1.0384
1600 30232693 1.0678
1728 32597297 1.1680
1792 33778141 1.2742
2048 38492887 1.2833
2160 40551479 1.5437
2304 43194913 1.5569
2560 47885689 1.7060
2592 48471289 1.7171
2625 49075057 1.9772
2688 50227213 1.9787
2744 51250889 1.9848
2800 52274087 2.0086
3136 58404433 2.0353
3200 59570449 2.2746
3240 60298969 2.2818
3584 66556463 2.3477
4096 75846319 2.5299
4608 85111207 3.0311
4800 88579669 3.3866
5120 94353877 3.3908
5184 95507747 3.4069
5292 97454309 3.8099
5600 103000823 3.8417
5832 107174381 4.0325
6144 112781477 4.1750
6272 115080019 4.2456
6400 117377567 4.4651
6480 118813021 4.5797
6912 126558077 4.6116
7168 131142761 4.7072
7200 131715607 4.9283
8192 149447533 5.1292
[/code]
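A table like the one above can be read mechanically. The sketch below assumes the three columns are FFT length in K, the maximum exponent usable at that length, and milliseconds per iteration; the function names are made up for illustration, and only a few rows are pasted in as sample data.

```python
def parse_fft_table(text):
    """Parse 'fft / max exp / ms/iter' rows into (fft_k, max_exp, ms_per_iter) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields, the first two being integers.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), float(parts[2])))
    return rows

def smallest_fft_for(exponent, rows):
    """Smallest FFT length whose 'max exp' is at least the exponent, or None."""
    for fft_k, max_exp, ms in sorted(rows):
        if max_exp >= exponent:
            return fft_k, ms
    return None

sample = """\
fft max exp ms/iter
3200 59570449 2.2746
3240 60298969 2.2818
3584 66556463 2.3477
4096 75846319 2.5299
"""
rows = parse_fft_table(sample)
print(smallest_fft_for(60593041, rows))
```

For M60593041 this picks the 3584K row, which agrees with the "fft length = 3584K" line in the run log.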

kriesel 2018-03-17 16:30

gtx1070 for comparison
 
[QUOTE=VictordeHolland;482568]The current version seems to be working on the GTX1080 Ti with W10 x64 (didn't do any extensive tests or performance optimisations)[/QUOTE]

Looks like the 1080 Ti is nearly the equal of a pair of GTX1070s.
What's the largest exponent you can successfully run on the 1080 Ti with its 11GB VRAM?
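To put a number on that comparison, here is a small sketch that divides the GTX 1070 timing by the GTX 1080 Ti timing at a few FFT lengths present in both benchmark tables. The values are copied from the two tables in this thread; the helper name `speed_ratios` is made up for illustration.

```python
# ms/iter at FFT lengths (in K) that appear in both tables
gtx1080ti = {1024: 0.6952, 2048: 1.2833, 3584: 2.3477, 4096: 2.5299, 8192: 5.1292}
gtx1070 = {1024: 1.1486, 2048: 2.2744, 3584: 4.2461, 4096: 4.6173, 8192: 9.7002}

def speed_ratios(slow, fast):
    """Per-FFT-length ratio of slow-card time to fast-card time."""
    return {k: slow[k] / fast[k] for k in sorted(slow.keys() & fast.keys())}

for fft_k, ratio in speed_ratios(gtx1070, gtx1080ti).items():
    print(f"{fft_k:>5}K  {ratio:.2f}x")
```

For these rows the ratio comes out between about 1.6x and 1.9x, growing with FFT length.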

I've run a 314M exponent on the 1070 OK, but 628M had problems either continuing from the stage 1 gcd or performing it (the former, I think, based on GPU-Z indications).

The GTX480's limit was about 290M for stage 2 due to 1.5GB memory size becoming inadequate at nrp=1.
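Some rough arithmetic gives a feel for why memory runs out. This sketch assumes a working buffer is one array of 8-byte doubles with one element per FFT point, and ignores every other overhead; the actual CUDAPm1 memory layout may well differ, so treat the numbers as order-of-magnitude only.

```python
def buffer_mib(fft_k, bytes_per_element=8):
    """Size in MiB of one array of doubles with fft_k * 1024 elements."""
    return fft_k * 1024 * bytes_per_element / 2**20

def buffers_that_fit(mem_mib, fft_k):
    """How many such arrays fit in mem_mib MiB, ignoring all other overhead."""
    return int(mem_mib // buffer_mib(fft_k))

# 16384K is the smallest FFT length in the GTX 1070 table whose max exp covers ~290M
print(buffer_mib(16384))
print(buffers_that_fit(1536, 16384))  # a 1.5 GB card
```

Under those assumptions each buffer is 128 MiB and at most 12 of them fit in 1.5 GB before anything else is allocated.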

[CODE]Device GeForce GTX 1070
Compatibility 6.1
clockRate (MHz) 1708
memClockRate (MHz) 4004

fft max exp ms/iter
2 43633 0.0606
4 85933 0.0630
8 169409 0.0911
16 333803 0.0913
32 657719 0.0953
64 1296011 0.1109
80 1612249 0.1237
81 1631969 0.1408
96 1927129 0.1428
100 2005673 0.1436
112 2240863 0.1488
120 2397383 0.1716
128 2553659 0.1794
144 2865601 0.1882
160 3176779 0.2148
162 3215629 0.2467
168 3332107 0.2524
200 3951977 0.2622
216 4261051 0.2945
224 4415431 0.2989
225 4434721 0.3248
256 5031737 0.3341
288 5646379 0.3603
320 6259537 0.4237
324 6336103 0.4458
336 6565633 0.5069
392 7634537 0.5102
400 7786967 0.5271
432 8395997 0.5558
448 8700169 0.5791
512 9914521 0.6009
540 10444757 0.7232
576 11125619 0.7246
640 12333809 0.8014
648 12484649 0.8258
672 12936919 0.9232
686 13200581 0.9234
720 13840423 0.9244
800 15343429 0.9298
864 16543493 1.0297
1024 19535569 1.1486
1080 20580341 1.3637
1125 21419011 1.4440
1134 21586693 1.4747
1152 21921901 1.4855
1176 22368691 1.5284
1280 24302527 1.5325
1296 24599717 1.5563
1323 25101101 1.7481
1344 25490893 1.7790
1350 25602229 1.7805
1400 26529691 1.7827
1568 29640913 1.8353
1600 30232693 1.8536
1728 32597297 2.0343
1750 33003301 2.2177
1792 33778141 2.2198
2048 38492887 2.2744
2304 43194913 2.6746
2560 47885689 3.0174
2592 48471289 3.0979
2688 50227213 3.5028
2700 50446621 3.5501
2800 52274087 3.5831
2916 54392209 3.6662
3136 58404433 3.7083
3200 59570449 4.0342
3240 60298969 4.1233
3584 66556463 4.2461
3600 66847171 4.6064
4096 75846319 4.6173
4608 85111207 5.4760
4800 88579669 6.1239
5120 94353877 6.1506
5184 95507747 6.2963
5292 97454309 6.9197
5600 103000823 7.0910
5832 107174381 7.4497
6144 112781477 7.7539
6272 115080019 7.8423
6400 117377567 8.4223
6480 118813021 8.5396
6912 126558077 8.5851
7168 131142761 9.0281
7200 131715607 9.4287
8192 149447533 9.7002
8640 157439981 11.4261
9216 167703023 11.7002
9408 171120919 12.9847
9600 174537299 12.9942
9720 176671801 13.2919
10080 183071879 13.7479
10240 185914837 13.9074
10368 188188471 14.6202
11200 202952693 14.6974
11664 211176269 15.7289
12096 218826341 16.3628
12544 226753511 16.5236
12800 231280639 17.2002
12960 234109067 17.6919
13824 249369863 18.0687
14336 258403573 18.5125
14400 259532291 19.2037
15552 279831199 20.5104
16384 294471259 20.9802
18432 330441847 23.5745
18816 337176443 26.0162
20480 366326371 26.8871
20736 370806323 29.1363
21168 378363589 29.6717
21504 384239189 30.1835
21952 392070229 30.5201
23040 411074273 30.6741
23328 416101459 32.0017
25088 446794913 34.3478
25600 455715121 35.5808
27648 491358173 37.0692
28672 509158127 38.0063
28800 511382147 38.6743
32768 580225813 41.9480
32805 580866907 47.4597
33075 585544397 48.3338
36864 651102253 49.4871
39200 691446799 56.7610
41472 730636397 58.1385
42336 745527179 62.3263
44800 787958201 62.5338
46080 809980289 64.9344
49152 862780273 68.7844
50176 880364279 71.1277
51200 897940567 75.0087
51840 908921869 75.8619
55296 968171579 77.0567
57344 1003244573 78.9115
57600 1007626787 80.0893
65536 1143276383 87.6720
[/CODE]
Obtained with, and followed by, something resembling the following batch file (actually run in stages):
[CODE]rem Benchmark CUDAPm1 fft lengths on one gpu; output is appended to cudapm1start.txt
set exe=cudaPm1_win64_20131118_CUDA_50.exe

set model=GeForce GTX 1070
set ntimes=2
set dev=0

rem some gpus can't do the whole span, so are run in portions to obtain some fft results
%exe% -d %dev% -cufftbench 1 32768 1 >>cudapm1start.txt
rename "%model% fft.txt" "%model% fft save.txt"
rem if the rename failed (no fft file was produced), skip the upper span
if errorlevel 1 goto skip
%exe% -d %dev% -cufftbench 32768 65536 1 >>cudapm1start.txt
:skip
rem individual fft lengths, one run each
for %%a in ( 4096 5120 6144 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 4608 4800 5184 5292 5600 5832 6272 6400 6480 6912 7168 7200 8192 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 8640 9216 9408 9600 9720 10080 10240 10368 11200 11664 12096 12544 12800 12960 13824 14336 14400 15552 16384 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
for %%a in ( 18432 18816 20480 20736 21168 21504 21952 23040 23328 25088 25600 27648 28672 28800 32768 ) do %exe% -d %dev% -cufftbench %%a %%a 1 >>cudapm1start.txt
rem fft lengths above 32M, up to 64M
for %%a in ( 32805 33075 36864 39200 41472 42336 44800 46080 49152 50176 51200 51840 55296 57344 57600 65536 ) do %exe% -d %dev% -cufftbench %%a %%a %ntimes% >>cudapm1start.txt
[/CODE]
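For anyone not on Windows, the per-length loops above can be sketched in Python. This only builds the command lines, using the `-d` and `-cufftbench` options exactly as they appear in the batch file; the helper name `cufftbench_commands` is made up, and actually running the commands is left as a comment.

```python
def cufftbench_commands(exe, dev, lengths, ntimes=1):
    """Build one '-cufftbench n n ntimes' command line per FFT length (in K)."""
    return [[exe, "-d", str(dev), "-cufftbench", str(n), str(n), str(ntimes)]
            for n in lengths]

cmds = cufftbench_commands("cudaPm1_win64_20131118_CUDA_50.exe", 0, [4096, 5120, 6144])
for cmd in cmds:
    print(" ".join(cmd))
    # To run each one and append its output to the log, something like:
    #   import subprocess
    #   with open("cudapm1start.txt", "a") as log:
    #       subprocess.run(cmd, stdout=log)
```

Building the list first makes it easy to eyeball the commands before spending GPU time on them.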

kriesel 2018-03-26 16:52

highest exponents successfully run? Issues seen on high exponents?
 
What are the highest exponents you've successfully run in CUDAPm1 through stage 1 including gcd?
Through both stage 1 and stage 2 including gcds?
What hardware was it run on?

If a run failed on a high exponent, what issues were seen?

kriesel 2018-04-25 12:42

Manually reported P-1 results are getting marked as expired assignments
 
FYI: more at [url]http://www.mersenneforum.org/showpost.php?p=486151&postcount=1499[/url]

kriesel 2018-05-27 16:46

Improved recovery from Windows TDRs on old gpus
 
See the detailed writeup at [URL]http://www.mersenneforum.org/showpost.php?p=488288&postcount=37[/URL]


All times are UTC.