mersenneforum.org

The P-1 factoring CUDA program (https://www.mersenneforum.org/showthread.php?t=17835)

firejuggler 2013-05-03 09:36

Grab the Windows binary; there is an ini file in it that might help you.
Maybe a higher fftlength?

Karl M Johnson 2013-05-03 10:23

With e=12, d=2310 and nrp=480, the last exponent that can be checked on the current binary is 14,155,777.
The next exponent, 14,155,807, can't go to stage 2.

Now, the actual vRAM usage of CUDAPm1 for exponent 14,155,777 is ~3073MB (measured with the MSI Afterburner delta method), while the reported approximate vRAM usage was 3014MB.

To conclude this micro research: if the reported approximate memory usage is >=3139MB, you can be sure that stage 2 will not work, even if your card has a lot more memory than that.

Proof:
[URL]http://i.imgur.com/iUpQaMr.png[/URL]
[URL]http://i.imgur.com/W8fqlWQ.png[/URL]

James Heinrich 2013-05-03 11:57

[QUOTE=frmky;339103]If the factor is found in stage 1, should the value of B2 in the output be equal to B1 as in the following:
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=839, e=6, n=3072K CUDAPm1 v0.00)
If so, that's an easy change.[/QUOTE]Yes, please, it would be helpful if the results indicated that.
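The change frmky describes as easy could look roughly like the sketch below. It is not the CUDAPm1 source: `format_factor_line` and its parameters are hypothetical, and the format string is copied from the result lines posted in this thread. The factor is passed as a decimal string because factors of this size exceed 64 bits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, not the actual CUDAPm1 code: build a result line,
   reporting B2 = B1 when the factor was already found in stage 1. */
static int format_factor_line(char *buf, size_t size, unsigned p,
                              const char *factor, unsigned b1, unsigned b2,
                              int found_in_stage1, int e, unsigned fft_k)
{
    assert(buf != NULL && factor != NULL);
    /* Stage 2 never ran, so the effective B2 bound is just B1. */
    unsigned b2_reported = found_in_stage1 ? b1 : b2;
    return snprintf(buf, size,
                    "M%u has a factor: %s (P-1, B1=%u, B2=%u, e=%d, n=%uK CUDAPm1 v0.00)",
                    p, factor, b1, b2_reported, e, fft_k);
}
```

With `found_in_stage1` set, the B2 passed in is ignored and the line reads `B1=839, B2=839` as in the example above; a stage 2 factor keeps its real B2.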

James Heinrich 2013-05-03 12:06

[QUOTE=frmky;339101]here's the next version to try.[/QUOTE]Starting a new run looks better than last time:[code]Selected B1=560000, B2=14280000, 3.55% chance of finding a factor
CUDA reports 781M of 1279M GPU memory free.
Using e=6, d=2310, nrp=12
Using approximately 744M GPU memory.
Starting stage 1 P-1, M60817711, B1 = 560000, B2 = 14280000, e = 6, fft length = 3360K
Doing 807829 iterations[/code]I'll let it run and see if it finds the [url=http://www.mersenne.ca/exponent/60817711]known stage2 factor[/url].
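For context on B1 and the iteration count: P-1 stage 1 raises a base to an exponent E made of 2p times every prime power up to B1, then takes gcd(base^E - 1, M_p); any factor q with q-1 built from those primes shows up. The toy below does this for small p with plain 64-bit arithmetic instead of CUDAPm1's FFT multiplication; the base 3 and the names `pm1_stage1`, `powmod_u64` and `gcd_u64` are assumptions for illustration, not CUDAPm1 functions. The bit length of E is roughly B1/ln 2, about 808K for B1=560000, which lines up with the 807829 iterations in the log, though that correspondence is an observation, not something confirmed from the CUDAPm1 source.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm on 64-bit integers. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Right-to-left binary exponentiation: one squaring per bit of e.
   Safe only while m < 2^32, so that every product fits in 64 bits. */
static uint64_t powmod_u64(uint64_t base, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;
    base %= m;
    while (e != 0) {
        if (e & 1)
            r = r * base % m;
        base = base * base % m;
        e >>= 1;
    }
    return r;
}

/* Toy P-1 stage 1 on M_p = 2^p - 1 (hypothetical helper, small p only). */
static uint64_t pm1_stage1(unsigned p, unsigned b1)
{
    assert(p >= 2 && p <= 31);
    uint64_t m = (UINT64_C(1) << p) - 1;
    /* Factors of M_p have the form 2kp+1, so put 2p into the exponent. */
    uint64_t x = powmod_u64(3, 2u * (uint64_t)p, m);
    for (unsigned q = 2; q <= b1; q++) {
        int is_prime = 1;               /* trial division is fine for a toy B1 */
        for (unsigned d = 2; d * d <= q; d++) {
            if (q % d == 0) { is_prime = 0; break; }
        }
        if (!is_prime)
            continue;
        uint64_t qk = q;                /* largest power of q that is <= B1 */
        while (qk * q <= b1)
            qk *= q;
        x = powmod_u64(x, qk, m);
    }
    /* gcd(x - 1, M_p); a result equal to M_p means every factor was caught. */
    return gcd_u64((x + m - 1) % m, m);
}
```

For example, `pm1_stage1(29, 4)` uses E = 58 * 4 * 3 = 696; since 233 - 1 = 232 = 2^3 * 29 divides 696, the result is divisible by 233, one of the known factors of M29 = 233 * 1103 * 2089.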

kjaget 2013-05-03 14:02

[QUOTE=frmky;339101]Still don't have the motivation to track down the problem reading text from ini files[/QUOTE]


Remove the #define sscanf sscanf_s line from parse.c. Using sscanf_s requires each string variable scanned into to be followed by an argument giving the size of that buffer, but that's not done in the sscanf call in IniGetStr. As a result, the sscanf_s check picks up a random uninitialized value off the stack as the length of the destination string, leading to random failures.

A proper fix would be a wrapper, like the existing sprintf() one, that passes this size parameter in the call to sscanf_s. Or just skip the safe version of this function, since it is more trouble than it is worth.
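A minimal sketch of the wrapper idea, assuming a simple "key=value" line; `scan_ini_value` is a hypothetical name, not the real IniGetStr from parse.c. On MSVC it passes the buffer size to sscanf_s (as `unsigned`, which is what Microsoft's sscanf_s expects); elsewhere it falls back to plain sscanf with the field width built into the format string, so neither path can overflow the destination.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the actual IniGetStr from parse.c: copy the
   value of a "key=value" line into dest without overflowing it.
   Returns 1 on success, like the underlying scanf call; note that %s
   stops at the first whitespace character. */
static int scan_ini_value(const char *line, char *dest, size_t dest_size)
{
    assert(line != NULL && dest != NULL);
    if (dest_size < 2)
        return 0;   /* no room for even one character plus '\0' */
#ifdef _MSC_VER
    /* sscanf_s wants the buffer size right after the pointer for each
       %s, %c or %[ conversion; suppressed (%*) ones take no arguments. */
    return sscanf_s(line, "%*[^=]=%s", dest, (unsigned)dest_size);
#else
    /* Portable route: bake the width into the format, e.g. "%15s". */
    char fmt[32];
    snprintf(fmt, sizeof fmt, "%%*[^=]=%%%zus", dest_size - 1);
    return sscanf(line, fmt, dest);
#endif
}
```

With a 16-byte buffer, an overlong value is truncated to 15 characters instead of running off the end of the buffer, which is the whole point of the safe variant.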

Stef42 2013-05-03 15:45

I'm getting a lot of cudaDeviceSynchronize() error 30...
Usually with high B2 values, while only 400-500MB is used (low exponents).
Why this might have happened: [url]http://stackoverflow.com/questions/12200994/cuda-runtime-api-error-30-repeated-kernel-calls[/url]

James Heinrich 2013-05-03 17:34

[QUOTE=James Heinrich;339130]I'll let it run and see if it finds the [url=http://www.mersenne.ca/exponent/60817711]known stage2 factor[/url].[/QUOTE]It did:[code]Stage 2 complete, estimated total time = 2:57:29
Accumulated Product: M60817711, 0x978923630c42303f, n = 3360K, CUDAPm1 v0.00
Starting stage 2 gcd.
M60817711 has a factor: 3493866477323309653137460319 (P-1, B1=560000, B2=14280000, e=6, n=3360K CUDAPm1 v0.00)[/code]4.212GHz-days in 2h57m29s = 34GHz-days/day. A far cry from the ~400GHd/d the GTX570 can push in mfaktc, but also notably faster than can be done on my CPU.
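The rate arithmetic spelled out: 2:57:29 is 10649 seconds, or about 0.1233 days, and 4.212 / 0.1233 ≈ 34.2 GHz-days/day. A minimal sketch of the same conversion; the function name is made up for illustration.

```c
#include <assert.h>

/* Convert an h:m:s wall-clock time to days and divide the work by it. */
static double ghz_days_per_day(double ghz_days, int h, int m, int s)
{
    double seconds = h * 3600.0 + m * 60.0 + s;
    assert(seconds > 0.0);
    return ghz_days * 86400.0 / seconds;   /* 86400 seconds per day */
}
```

`ghz_days_per_day(4.212, 2, 57, 29)` gives about 34.17, matching the ~34 GHz-days/day quoted above.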

frmky 2013-05-03 18:26

[QUOTE=Karl M Johnson;339122]To conclude this micro research: if the reported approximate memory usage is >=3139MB, you can be sure that stage 2 will not work, even if your card has a lot more memory than that.[/QUOTE]

Perhaps because it is a 32-bit binary? I'll try creating a 64-bit binary later today.

frmky 2013-05-03 18:27

[QUOTE=kjaget;339141]Remove the #define sscanf sscanf_s line from parse.c. [/QUOTE]
Thanks!

frmky 2013-05-04 06:18

New versions ...
Win32:
[URL="https://www.dropbox.com/s/alz4xodjjend7bi/cudapm1_win32_20130503.zip"]https://www.dropbox.com/s/alz4xodjjend7bi/cudapm1_win32_20130503.zip[/URL]
x64:
[URL="https://www.dropbox.com/s/gbs9pr3ily49ric/cudapm1_x64_20130503.zip"]https://www.dropbox.com/s/gbs9pr3ily49ric/cudapm1_x64_20130503.zip[/URL]

The x64 version should allow you to use more than 3GB (or 4GB, not sure which limit applies to GPU RAM) of memory if your card has that much. Also, the GCD at the end will likely be a bit faster, but it doesn't really take that long anyway. As usual, please let me know of any problems.

frmky 2013-05-04 06:21

[QUOTE=Stef42;339152]I'm getting a lot of cudaDeviceSynchronize() error 30...
Usually with high B2 values, while only 400-500MB is used (low exponents).
Why this might have happened: [url]http://stackoverflow.com/questions/12200994/cuda-runtime-api-error-30-repeated-kernel-calls[/url][/QUOTE]
Hmmm. Try the 64-bit version to see if it makes any difference. If it persists, we can try adding cudaDeviceSynchronize() as well, but that seemed to be hit-or-miss in the discussions.


All times are UTC.