[QUOTE=garo;337309]P-1 with 2GB memory in the 61M range gives a probability of success of 3.3-3.6% depending on the TF level. Dunno where you got 5-8%.[/QUOTE]I have seen Prime95 give around 3.75% for 60M exponents that have been taken to 73.
|
Any luck getting Winbloze compiled? Once it's compiled and available, I'll reinstall my 460 to play with it :)
|
No Windows. I'll need help with that, as I don't have access to any Windows boxes.
|
[QUOTE=NBtarheel_33;337330]
M61000000, factored to 70 bits, assuming 2 L-L tests saved, with B1=670,000 and B2=16,750,000, using K*B^N+C = 1*2^61000000-1
Probability = [B]5.664070%[/B]
M65000000, factored to 70 bits, with B1=800,000 and B2=24,000,000, using K*B^N+C = 1*2^65000000-1
Probability = [B]6.224824%[/B][/QUOTE] But 70 bits is not realistic for these assignments anymore, isn't it? 73 & 74 respectively would be more accurate. Here you'd get probabilities of 3.7-3.8 (for 73) and 3.3-3.4 (for 74). |
[QUOTE=axn;337369]But 70 bits is not realistic for these assignments anymore, isn't it? 73 & 74 respectively would be more accurate. Here you'd get probabilities of 3.7-3.8 (for 73) and 3.3-3.4 (for 74).[/QUOTE]
No matter how you cut it, GPUs are faster than CPUs. We can get to the specifics once it's public and has more testing. :) |
[QUOTE=c10ck3r;337403]No matter how you cut it, GPUs are faster than CPUs. We can get to the specifics once it's public and has more testing. :)[/QUOTE]
GPUs are also faster than CPUs at LLs, but they are so much faster at TF that TF makes the most sense. Do we know yet that the P-1 speed is sufficient to make sense? (I haven't been following this thread closely enough to know, so feel free to tell me to :rtfm:) |
And perhaps even more importantly, doesn't the project need P-1s right now? So even if the speed difference isn't of the magnitude of the TF increase, it might still be best for the project. I know that there is no general consensus on this, and also that everyone should do whatever part of the project makes them happy. But many of us are willing to sacrifice GHz-days to work on whatever needs working on.
|
I heard (read) of a 25 times speed increase.
Ah, found the quote: [code]Originally Posted by owftheevil Cudapm1 output: Code: M61076737 has a factor: 432634830991289176546683053423 Run with B1 = 65000, B2 = 12035000, n = 3360k, d = 2310, e =2, 8 rp per pass. It used about 600Mb of device memory. Stage 2 took ~53 minutes. Edit: Looks like about 15 minutes longer to make e = 4. [/code] [code] To compare with CPU speed running the same curve in Prime95. Laptop with Corei7 2720QM sandy bridge: using 1 core: stage1 43min, stage2 ~ 8h (3 Gb RAM) using 4 cores: stage1: 19 min, stage2 ~ 3.8h (3 Gb RAM) I only completed ~20% of stage2 and extrapolated the runtime. [/code] |
[QUOTE=firejuggler;337409]I heard (read) of a 25 times speed increase.
ah found the quote [/QUOTE] 8h/53min ~= 8x. But later found that only half the things were being processed, so it is more like 4x? |
There is no word on the stage 1 speed increase, and maybe there is none. But I'm sure I've read of a 25 times speedup somewhere. Oh well, a 4 times speed increase is still good.
|
[QUOTE=Aramis Wyler;336999]That would definitely put a dent in our p-1 deficit. Though it's hard to trade 25x p-1 work for 125x factoring work.
EDIT: Not that it wouldn't get used though. I was trading up 10 GHz-days of factoring per GHz-day of p-1, and this is a better deal than that. :smile:[/QUOTE] I said 25x because it takes me 20 hours to do a stage 2 on my Athlon X4, though that's if I'm running 3 at a time. It was a personal benchmark, because I TF 125x faster on my GPUs than my CPUs. |
[QUOTE=axn;337369]But 70 bits is not realistic for these assignments anymore, isn't it? 73 & 74 respectively would be more accurate. Here you'd get probabilities of 3.7-3.8 (for 73) and 3.3-3.4 (for 74).[/QUOTE]
Bingo! |
Optimal b1 and b2 are not going to be much different than what mprime selects. The basic unit of measurement used to compute optimal values is the time it takes to do 1 fft, which is relative to the device it's running on. GPUs will probably favor slightly higher b2 because of memory bandwidth, and smaller e, at least for most cards that don't have lots of memory.
Recent run of 6108xxxx with b1 = 580000, b2 = 12035000, e = 2, d = 2310, and rp = 20:
Stage 1: 96m 43s, 1673702 ffts
Stage 2: 89m 42s, 1504234 ffts
Each increase of e by 2 will take about 15 minutes more. |
[QUOTE=axn;337412]8h/53min ~= 8x. But later found that only half the things were being processed, so it is more like 4x?[/QUOTE]
Actually, I'm pretty sure it was stage 2 vs stage 2, so 8x would still apply. That doesn't matter much though, since you need a comparison against a fairly fixed given. If the GPU is not using a CPU core, then what do you compare it to? If I have a GTX 480 in an i5 2500 I get a speedup, but if I run that same 480 in a Core2Quad I have a more significant speedup when compared to the system it is in. I'd say you'd need to do it like CPU-based P-1: compare the time it takes to run the GPU P-1 vs the time it takes to run that same exponent on CUDALucas (or mfaktc/o?). Otherwise the 'speed increase' is technically unknown. |
[QUOTE=NBtarheel_33;337164]Factoring to 7x bits (assuming an increase of one bit level) gives you (roughly) a 1/7x = 1.27-1.43% chance of finding a factor.
P-1 with decent bounds will typically give you a 5-8% chance of finding a factor. So, given 125 TF attempts, we'd expect roughly 1.6-1.8 factors found. On the other hand, 25 TF attempts should yield roughly 1.25-2.0 factors found. If GPU P-1 allows us to increase bounds or make more frequent use of the Brent-Suyama extension, the expected number of successes will be at or above the higher end of this range. In that case, it would make complete sense to trade 125x TF for 25x P-1. Note also that GPU P-1 will make use of the *GPU* RAM, rather than the system RAM. This could bring in P-1'ers who were previously unable to dedicate large quantities of RAM to Stage 2.[/QUOTE] This looks like a good jumping point into the discussion. For those of you who're thinking of switching from GPU TF to GPU P-1 and worried whether it'll impact the number of expos cleared, I propose the following model. Compare the efficiency (expos cleared/unit time) of doing the _last bit_ of TF to the efficiency of doing P-1. Simple. Assuming that the last bit work is for 73->74, how many "last bit" TFs can be done in a day on a particular GPU, and how many P-1 can be done in the same time? Then calculate the expected number of factors. Whichever is higher wins. If they're approximately the same (within 20% of each other), picking either one should be fine. |
[QUOTE=garo;337309]P-1 with 2GB memory in the 61M range gives a probability of success of 3.3-3.6% depending on the TF level. Dunno where you got 5-8%.[/QUOTE]
What typical values of B1, B2, and e does Prime95 choose at 61M with 2GB memory and TF to 73 bits? |
[QUOTE=frmky;337743]What typical values of B1, B2, and e does Prime95 choose at 61M with 2GB memory and TF to 73 bits?[/QUOTE]
No factor to 74 bits: M61482791 completed P-1, B1=545000, B2=10355000
To 73 bits: M59518889 completed P-1, B1=555000, B2=10961250
e=0 in both cases. |
[QUOTE=garo;337793]e=0 in both cases.[/QUOTE]
Which is secretly e=2 :smile: |
[QUOTE=axn;337795]Which is secretly e=2 :smile:[/QUOTE]
Ah didn't know that. Thanks. |
With owftheevil's permission, I have posted a [I]very[/I] early version at Sourceforge, [URL="https://sourceforge.net/projects/cudapm1/?source=directory"]https://sourceforge.net/projects/cudapm1/?source=directory[/URL]
It does read Pfactor lines from worktodo.txt and output to results.txt, and George has indicated that he will add support for the results output soon. The core routines have survived testing on 30+ known factors over the past few days. Autoselection of FFT sizes may need tweaking. It currently does not intelligently select B1 and B2 sizes; for now parameters should be specified manually (it defaults to B1=600k, B2=12M, e=6 which is reasonable for current ~61M exponents). Error checking should be added in many places. It does not support checkpointing. In summary, it is still very alpha. |
The default parameters will require ~900 MB of GPU memory. If you do not have that available, try using -nrp2 10 or -nrp2 4. You can also save a little memory by using -e2 4 or -e2 2. For really low-memory cards, use -d2 30 -e2 2 -nrp2 2. Autoselection of these parameters based on available GPU memory is on the TODO list.
|
My 560 has 1024 MB. Might be a bit tight.
|
[QUOTE=firejuggler;338239]My 560 has 1024 MB. Might be a bit tight.[/QUOTE]
Try a run with a small B1, say 3000, just to see if stage 2 starts successfully without waiting a long time for B1 to finish. |
[QUOTE=frmky;338242]Try a run with a small B1, say 3000, just to see if stage 2 starts successfully without waiting a long time for B1 to finish.[/QUOTE]
Hi Greg, I am doing some time testing on the e, d and nrp values for different B2 values. It will take some time to check all the combinations for 5 distinct test-cases, but I think it may be useful to automate the choice of these parameters. I hope I am not stepping on others' toes. Luigi P.S. Thanks again to Carl, who started the project... :smile: |
With 1024 MB of memory, as it is close, it might be worth slightly reducing B2 in order to fit. B1 can be increased to compensate.
|
[QUOTE=frmky;338230]George has indicated that he will add support for the results output soon.[/QUOTE]Can you please post a sample of possible variations in the output? I need to add support to both mersenne.org and mersenne.ca
|
[QUOTE=henryzz;338248]With 1024 MB of memory, as it is close, it might be worth slightly reducing B2 in order to fit. B1 can be increased to compensate.[/QUOTE]
Only e2 and nrp2 have an effect on memory use. b1 and b2 have an effect on how many transforms need to be done. The reason frmky said to try a low b1 is that you don't know until the end of stage 1 whether the necessary memory for stage 2 is available. So until you are confident that you will be able to allocate all the stage 2 memory, it's best to do short stage 1 runs. |
[QUOTE=owftheevil;338255]Only e2 and nrp2 have an effect on memory use. b1 and b2 have an effect on how many transforms need to be done.
The reason frmky said to try low b1 is that you don't know until the end of stage 1 if the necessary memory for stage 2 is available. So [B]until you are confident that you will be able to allocate all the stage 2 memory, its best to do short stage 1 runs[/B].[/QUOTE] Good advice! This will help me shorten the time required for the speed tests :smile: Luigi |
[QUOTE=owftheevil;338255]Only e2 and nrp2 have an effect on memory use. b1 and b2 have an effect on how many transforms need to be done.
The reason frmky said to try low b1 is that you don't know until the end of stage 1 if the necessary memory for stage 2 is available. So until you are confident that you will be able to allocate all the stage 2 memory, its best to do short stage 1 runs.[/QUOTE] B2 normally affects memory usage with Prime95 and GMP-ECM. My fault for misunderstanding. |
No Windows version yet? :redface:
|
[QUOTE=Stef42;338288]No Windows version yet? :redface:[/QUOTE]I would also very much appreciate the ability to try it out :smile:
|
Autoselection of B1, B2, and GPU memory related parameters (d2, e, nrp) should now work. It tries to use as much GPU memory as it thinks is safe, so let me know if you get memory allocation errors.
It still lacks proper error checking in many places, checkpointing, and the ability to interrupt stage 2 with a Ctrl-C. If anyone wants to dive into that code, feel free! Also, I'm not set up to compile on Windows with both CUDA and GMP. If anyone here is, I'm sure it will be appreciated! :smile: |
Haven't posted timing in a while.
M60973753 P-1, B1=535000, B2=10432500, e=6, n=3584K
Time for stage 1: 56:27
Time for stage 2: 47:18
This was on a K20. A GTX Titan should be a bit faster. My GTX 480 will likely take nearly twice as long. And it does find factors. :smile:
M60870653 has a factor: 87951105041429114235889 (P-1, B1=600000, B2=12000000, e=6, n=3584K CUDAPm1 v0.00) |
[QUOTE=frmky;338341]Autoselection of B1, B2, and GPU memory related parameters (d2, e, nrp) should now work. It tries to use as much GPU memory as it thinks is safe, so let me know if you get memory allocation errors.
It still lacks proper error checking in many places, checkpointing, and the ability to interrupt stage 2 with a Ctrl-C. If anyone wants to dive into that code, feel free! Also, I'm not set up to compile on Windows with both CUDA and GMP. If anyone here is, I'm sure it will be appreciated! :smile:[/QUOTE] If the autoselection now works, I guess I can quit my work on timing different exponents with different e, d and nrp... :davieddy::bangheadonwall: I will get the alpha from sourceforge... Luigi |
@ ET The timings are interesting, but the work on how big nrp can get without a crash is still just as important.
A recent run on a 570:
Selected B1=605000, B2=16637500, 4.1% chance of finding a factor
CUDA reports 732M of 1279M GPU memory free. Using e=6, d=2310, nrp=10
Starting stage 1
P-1, M61410829, B1 = 605000, B2 = 16637500, e = 6, fft length = 3360K
. . . Stage 1 complete, estimated total time = 1:41:14
. . . Stage 2 complete, estimated total time = 3:42:23
M61410829 Stage 2 found no factor (P-1, B1=605000, B2=16637500, e=6, n=3360K CUDAPm1 v0.00)
There were ~350Mb of free memory during stage 2. |
[QUOTE=owftheevil;338386]@ ET The timings are interesting, but the work on how big nrp can get without a crash is still just as important.
A recent run on a 570: Selected B1=605000, B2=16637500, 4.1% chance of finding a factor CUDA reports 732M of 1279M GPU memory free. Using e=6, d=2310, nrp=10 Starting stage 1 P-1, M61410829, B1 = 605000, B2 = 16637500, e = 6, fft length = 3360K . . . Stage 1 complete, estimated total time = 1:41:14 . . . Stage 2 complete, estimated total time = 3:42:23 M61410829 Stage 2 found no factor (P-1, B1=605000, B2=16637500, e=6, n=3360K CUDAPm1 v0.00) There were ~350Mb of free memory during stage 2.[/QUOTE] Should I continue the analysis? Luigi |
Yes. Right now the memory settings are on the conservative side. Data from different cards about how much memory use they can tolerate would be very useful.
Edit: small b1 and small b2 or terminate after 1 pass of stage2, I just want to see if it handles the memory load. |
[QUOTE=owftheevil;338391]Yes. Right now the memory settings are on the conservative side. Data from different cards about how much memory use they can tolerate would be very useful.
Edit: small b1 and small b2 or terminate after 1 pass of stage2, I just want to see if it handles the memory load.[/QUOTE] Fine :smile: Just one more question. The version I have can't be stopped during B2, while the one on SourceForge doesn't allow me to modify the automatically chosen parameters. Or at least I haven't yet seen how. :guilty: I'll study during the weekend... Luigi |
I just look up the process number and kill it.
|
[QUOTE=owftheevil;338398]I just look up the process number and kill it.[/QUOTE]
That way you don't have the presumed time to complete Stage 2... But will be waaaaay faster :wink: Luigi |
[QUOTE=owftheevil;338386]There were ~350Mb of free memory during stage 2.[/QUOTE]
The current version now reports expected memory use. Please compare that to the amount of memory used as reported by the driver to see if it is reasonably accurate. It looks accurate on both my K20 and GTX 480. There is additional memory free as reported by the driver, but CUDA says it isn't free and I'm going with what CUDA reports. |
I'll add a me too.
When the windows exe gets released, I have a Titan here itching to have a bash at P-1. -- Craig |
Here's a Windows version to try:
[URL="https://www.dropbox.com/s/hdu6eqwkk9vr9p8/cudapm1_20130501.zip"]https://www.dropbox.com/s/hdu6eqwkk9vr9p8/cudapm1_20130501.zip[/URL] Included is a worktodo.txt that should find a factor. I can't actually test it here since I have no Windows machine with a CC >= 1.3 card, so please try that first, make sure the factor is found, and please let me know if it worked! There is a problem with reading the text entries in the .ini file, but the numerical entries work fine. Something in parse.c that Visual Studio doesn't like, but I don't have the motivation to track that down right now. It just uses the default input and output files, worktodo.txt and results.txt. Also, make sure you have the latest graphics drivers. I compiled with CUDA 5.0 downloaded today. |
Looking good for now.
[code] C:\Users\Vincent\Desktop\cudapm1>CUDAPm1.exe Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt" Warning: Couldn't parse ini file option ResultsFile; using default "results.txt" Selected B1=605000, B2=16637500, 4.1% chance of finding a factor CUDA reports zuM of zuM GPU memory free. Using e=2, d=210, nrp=4 Using approximately zuM GPU memory. Starting stage 1 P-1, M61262347, B1 = 605000, B2 = 16637500, e = 2, fft length = 3360K Doing 873133 iterations Iteration 1000 M61262347, 0xf19a7f6041953a97, n = 3360K, CUDAPm1 v0.00 err = 0.2 1094 (0:16 real, 16.1885 ms/iter, ETA 3:55:18) Iteration 2000 M61262347, 0xaf1d15aad49fcee8, n = 3360K, CUDAPm1 v0.00 err = 0.1 9336 (0:13 real, 12.8097 ms/iter, ETA 3:05:58) Iteration 3000 M61262347, 0xb702298e7a8c9a8e, n = 3360K, CUDAPm1 v0.00 err = 0.2 1680 (0:13 real, 12.8040 ms/iter, ETA 3:05:41) Iteration 4000 M61262347, 0xc53d1695707d3dc0, n = 3360K, CUDAPm1 v0.00 err = 0.1 9922 (0:13 real, 12.7794 ms/iter, ETA 3:05:07) Iteration 5000 M61262347, 0xf154bc3c5f15a9c9, n = 3360K, CUDAPm1 v0.00 err = 0.1 9727 (0:12 real, 12.8228 ms/iter, ETA 3:05:31) [/code] Will report back tonight |
I got it running in Windows 8, but it crashed when starting stage 2.
[CODE]Stage 1 complete, estimated total time = 1:20:10 Starting stage 1 gcd. M61262347 Stage 1 found no factor (P-1, B1=605000, B2=16637500, e=6, n=3360K CUDAPm1 v0.00 Starting stage 2[/CODE] Some details: memusage during stage 1: 334MB (MSI afterburner) memusage during start of stage 2: 1185MB Total video memory is 1,5GB. I'm trying to figure out if there was not enough memory for some reason. Details of my graphics card: [url]http://www.msi.com/product/vga/N580GTX-Lightning.html[/url] |
If its not able to allocate the stage 2 memory, it should be giving you a message like this:
[CODE]CUDAPm1.cu(2628) : cudaSafeCall() Runtime API error 2: out of memory.[/CODE]So something else is likely going on. Also, the difference in memory use between stage 1 and the beginning of stage 2 is consistent with e = 6 and nrp = 24. Is that what you saw? |
[QUOTE=frmky;338990]Here's a Windows version to try
Included is a worktodo.txt that should find a factor. please try that first, make sure the factor is found, and please let me know if it worked![/QUOTE]Starting a run on my GTX 570. One thing that might be a concern (especially in light of [i]Stef42[/i]'s comment about stage2 memory) is the references to "zu" as a quantity of graphics memory:[quote]Selected B1=605000, B2=16637500, 4.1% chance of finding a factor CUDA reports [color=red]zu[/color]M of [color=red]zu[/color]M GPU memory free. Using e=6, d=2310, nrp=16 Using approximately [color=red]zu[/color]M GPU memory. Starting stage 1 P-1, M61262347, B1 = 605000, B2 = 16637500, e = 6, fft length = 3360K Doing 873133 iterations[/quote]Running at 7.0ms/it, stage1 should be done in 1h40m and I'll report back with what happens when stage2 starts. |
The zu is not a big deal, simply the C99 size specifier for size_t, which MSVC doesn't understand. For Windows we need Iu instead.
|
Right after stage 1 finished and stage 2 was initiated, I got a popup saying that CUDAPm1 crashed.
The Windows error log showed an APPCRASH, which is not very useful I think. What followed was that after the GPU load dropped from 99% to 0%, the memory remained at 1134MB usage until, I guess because of a time-out, it was flushed. Maybe I'll try a smaller P-1 exponent with a known factor to check. |
Feedback so far:
1. Does not create checkpoints.
2. Beats CUDALucas at memory-stability stress testing (60M exponents were free from errors on CL; CUDAPm1 found errors at 50K iterations).
3. Fails at the beginning of stage 2 with an out-of-memory error, which should not be the case (6GB of vRAM, 16GB of RAM). |
[QUOTE=owftheevil;339001]The zu is not a big deal, simply a size specifier specific to gcc. For windows we need Iu instead.[/QUOTE]Would it be a big deal that we're seeing "zu" instead of "Iu" on Windows?
|
I think I found one problem. chalsall has his SPEs. Well, this was an ISPE, I for Ineffably. It's an easy fix, but it will have to wait until I get home from work. In the meantime, running with b2 = an even multiple of 2310 should bypass the error.
@Karl M Johnson: 1. Checkpoints are coming soon, maybe this weekend. 2. CPm1 during stage 1 does do more global memory reads than CuLu, so maybe that's why. 3. Unexpected. What is the error message? Thank you all for your input. |
[QUOTE=James Heinrich;339008]Would it be a big deal that we're seeing "zu" instead of "Iu" on Windows?[/QUOTE]
%zu in printf prints size_t values; on Windows you need %Iu to do the same thing. |
Ah, now I understand what you mean.
|
I restarted CUDAPm1 using a B2 value from a known P-1 with a factor I found earlier.
This was the command-line: [CODE]cudapm1.exe -b2 550000[/CODE] Output: [CODE]Iteration 164000 M9090017, 0xd7661b0c859fa9e5, n = 512K, CUDAPm1 v0.00 err = 0.0 2734 (0:01 real, 0.7921 ms/iter, ETA 0:01) Iteration 165000 M9090017, 0x7d3f99a08f445b8b, n = 512K, CUDAPm1 v0.00 err = 0.0 2734 (0:01 real, 0.7878 ms/iter, ETA 0:00) M9090017, 0x1d50507696eeef9f, offset = 0, n = 512K, CUDAPm1 v0.00 Stage 1 complete, estimated total time = 2:14 Starting stage 1 gcd. M9090017 Stage 1 found no factor (P-1, B1=115000, B2=[COLOR="Red"]1495000[/COLOR], e=6, n=512K CUDAPm 1 v0.00) Starting stage 2. Zeros: 59077, Ones: 84923, Pairs: 18379 itime: 14.921770, transforms: 1, average: 14921.770000 ptime: 35.394836, transforms: 88612, average: 0.399436 ETA: 0:50 itime: 17.911887, transforms: 1, average: 17911.887000 ptime: 35.547328, transforms: 88434, average: 0.401964 ETA: 0:00 Stage 2 complete, estimated total time = 1:43 Accumulated Product: M9090017, 0x1a6840caa5d05db3, n = 512K, CUDAPm1 v0.00 Starting stage 2 gcd. M9090017 has a factor: 516770062491225473521 (P-1, B1=115000, B2=1495000, e=6, n =512K CUDAPm1 v0.00)[/CODE] As you can see, there is a different B2 value. Still, it finished well. Earlier on, the program would crash when starting stage 2. Any thoughts? I must have done something wrong :smile: Bit more surprising: according to mersenne.ca, in the past the factor was found in stage 1 using prime95, but CudaPm1 reports stage 2 in the output... ? Exponent [URL="http://www.mersenne.ca/exponent/9090017#"]9090017[/URL] |
[QUOTE=Stef42;339012][i]M9090017 Stage 1 found no factor (P-1, [b]B1=115000, B2=1495000[/b], e=6, n=512K CUDAPm
1 v0.00)[/i] Bit more surprising: according to mersenne.ca, in the past the factor was found in stage 1 using prime95, but CudaPm1 reports stage 2 in the output... ?[/QUOTE]That is a bit disturbing. [URL=http://www.mersenne.ca/exponent/9090017]M9090017[/url] has factor [url=http://www.mersenne.ca/factor/516770062491225473521]516770062491225473521[/url], with a k of [url=http://www.mersenne.ca/k/28425142796280]28425142796280[/url]
k-factored = 2[sup]3[/sup] × 3 × 5 × 61 × 97 × 389 × 102913
Minimal bounds to find this factor in stage 2 would be B1=389, B2=102913; minimal bounds to find it in stage 1 would be B1=102913.
You ran this with B1=115000, so it [i]should[/i] have found the factor, at least according to my understanding of P-1 :unsure: |
It can still find it, although I wonder, as you mentioned, in stage 2 rather than stage 1....
I will do some further testing on a different exponent. Did the same test again on prime95 to verify: [CODE][May 2 16:52] Worker starting [May 2 16:52] Setting affinity to run worker on any logical CPU. [May 2 16:52] P-1 on M9090017 with B1=110000 [May 2 16:54] M9090017 stage 1 complete. 317502 transforms. Time: 128.915 sec. [May 2 16:54] Stage 1 GCD complete. Time: 6.593 sec. [May 2 16:54] P-1 found a factor in stage #1, B1=110000. [May 2 16:54] M9090017 has a factor: 516770062491225473521 [May 2 16:54] No work to do at the present time. Waiting. [/CODE] |
[QUOTE=James Heinrich;339013]That is a bit disturbing.
[URL="http://www.mersenne.ca/exponent/9090017"]M9090017[/URL] has factor [URL="http://www.mersenne.ca/factor/516770062491225473521"]516770062491225473521[/URL], with a k of [URL="http://www.mersenne.ca/k/28425142796280"]28425142796280[/URL] k-factored = 2[sup]3[/sup] × 3 × 5 × 61 × 97 × 389 × 102913 minimal bounds to find this factor in stage2 would be B1=389,B2=102913 minimal bounds to find this factor in stage1 would be B1=102913 You ran this with B1=115000 so it [I]should[/I] have found the factor, at least according to my understand of P-1 :unsure:[/QUOTE] If I follow that correctly, then with a B1 of 110000 not only should it have found it in stage 1, but it should not have been possible to find in stage 2 (B1 too high). Is that right? Or could it have found it in stage 2 as a multiple of 102913 (like 205826)? further, if it did find it as a multiple of 102913, would it have given the same factor (516770062491225473521) as prime95 did in stage1? |
You are all right. There's something weird going on here.
Edit: 102913 is pairing up with 1341143, which gets caught in stage 2. But I still don't know why stage 1 is not finding the factor. Edit2: Found it. Stage 1 doesn't stand a chance of finding any factor at the moment. It's not looking at the right data. Fix coming this evening. |
I did some other exponents which had factors in low P-1 bounds.
Each and every one of them was reported by prime95 in stage 1; CUDAPm1 found them in stage 2. |
1 Attachment(s)
[QUOTE=Stef42;339002]Right after stage 1 finished and stage 2 was initiated, I got a popup saying that CUDAPm1 crashed.[/QUOTE]Just for clarity, I have attached a screenshot showing this happening.
|
[QUOTE=Stef42;339019]I did some other exponents which had factors in low P-1 bounds.
Each and everyone of them was reported by prime95 in stage 1, CUDAPm1 found them in stage 2.[/QUOTE]I tried looking for a factor where k=1, so I tried[code]CUDAPm1 4444091 -b1 100 -b2 1000[/code]It should have been found in stage 1; actually it should have found 2 factors in stage 1: [url=http://www.mersenne.ca/factor/8888183]8888183[/url] k = 1 [url=http://www.mersenne.ca/factor/319974553]319974553[/url] k = 36 But no factor(s) found:[quote]Stage 1 complete, estimated total time = 0:00 Starting stage 1 gcd. M4444091 Stage 1 found no factor (P-1, B1=100, B2=390390, e=6, n=256K CUDAPm1 v0.00) Starting stage 2. Zeros: 12986, Ones: 24934, Pairs: 8522[/quote] One side note:[quote]B2 should be at least 390390, increasing it. Starting stage 1 P-1, M4444091, B1 = 100, B2 = 390390, e = 6, fft length = 256K[/quote]Is the B2>=390390 a fixed limitation, or tied to the exponent, or FFT, or...? It could be interesting to play with CUDAPm1 with a smaller B2 bound than that, if possible. |
[QUOTE=owftheevil;339018]You are all right. There's something weird going on here.
Edit: 102913 is pairing up with 1341143, which gets caught in stage 2. But I still don't know why stage 1 is not finding the factor. Edit2. Found it. Stage 1 doesn't stand a chance of finding any factor at the moment. Its not looking at the right data. Fix coming this evening.[/QUOTE] A fundamental truth: software is hard. Computers do [B][I][U]exactly[/U][/I][/B] what we tell them to do (usually; damn bad hardware!). My second born for a DWIM command! :wink: This is why extensive testing -- by many different people -- is required. Good work everyone! :smile: |
[QUOTE=James Heinrich;339025]actually it should have found 2 factors in stage 1[/QUOTE]On the plus side, it did find all 3 known factors in stage2, albeit as the composite of all of them:[code]M4444091 has a factor: 1809798096458971047321927127 (P-1, B1=100, B2=390390, e=6, n=256K CUDAPm1 v0.00)[/code]
|
[QUOTE=James Heinrich;339025]One side note:Is the B2>=390390 a fixed limitation, or tied to the exponent, or FFT, or...? It could be interesting to play with CUDAPm1 with a smaller B2 bound than that, if possible.[/QUOTE]
[CODE]B2 should be at least 1560000, increasing it. Starting stage 1 P-1, M9090017, B1 = 120000, B2 = 1560000, e = 6, fft length = 512K[/CODE] I'm not that good at figuring out what it's bound to. An example might help, though. |
Playing with some limits checking.
smallest exponent: [url=http://www.mersenne.ca/exponent/86243]M86243[/url] (aka 28[sup]th[/sup] [url=http://www.mersenne.ca/prime.php]Mersenne Prime[/url]) -- checked for and warns user
non-prime exponents: checks for and warns user
maximum exponent: uncertain. Haven't tested extensively, but testing in OBD range isn't working nicely: "CUDAPm1 3333333011 -b1 100 -b2 1000" crashes quickly ("CUDAPm1 has stopped working..."), whereas "CUDAPm1 3333333011" (no bounds specified) just sits there (no GPU load, no crash, no progress). Just under 2[sup]31[/sup] (M2000000011) does the same thing. Just under 2[sup]30[/sup], it doesn't crash, but the error message is somewhat cryptic to me as an end-user:[code]CUDAPm1 1000000009 -b1 1000 -b2 10000 over specifications Grid = 110592 try increasing threads (512) or decreasing FFT length (55296K)[/code]
Specifying a negative exponent (e.g. "CUDAPm1 -3333333011") doesn't work, but doesn't issue any warnings either. I guess it's being treated as an unrecognized parameter, but a warning should be generated for unrecognized parameters. |
Threads is a parameter you can set in the ini file. 1024 is the largest possible value. That value should enable 1000000009 to run.
|
Q? about proto-p-1-cuda...
Does it write a .bu or .bu2 file like P95 does? If so, are they compatible? i.e. could I run stage 1 on GPU and stage 2 on CPU? |
[QUOTE=c10ck3r;339039]Q? about proto-p-1-cuda...
Is the does it write a .bu or .bu2 file like P95 does? If so, are they compatible? i.e. could I run Stage 1 on GPU and Stage 2 on CPU?[/QUOTE] I believe I read the answer somewhere in this thread. EDIT: Or maybe that was on CuLu, I can't remember. :( |
About the 1000000009 run, I got to thinking (funny how I do most of that after speaking) that you would need an incredible amount of memory to get that to work, ~4.2GB at the absolute minimum. ~2.4 just for stage 1.
|
Far beyond my card, but Karl mentioned above he has a 6GB vidcard...
|
[QUOTE=James Heinrich;339050]Far beyond my card, but Karl mentioned above he has a 6GB vidcard...[/QUOTE]
I think the Titan is the only nVidia GeForce card that has 6 GB. EDIT: That, or the Tesla K20X... |
There we go.
Windows 7 Home Premium (64-bit):
[code]
Iteration 873000 M61262347, 0x92b46441f57f0dc1, n = 3360K, CUDAPm1 v0.00 err = 0.19531 (0:13 real, 12.7994 ms/iter, ETA 0:01)
M61262347, 0xfd7ab9d857ea4a36, offset = 0, n = 3360K, CUDAPm1 v0.00
Stage 1 complete, estimated total time = 3:06:32
Starting stage 1 gcd.
M61262347 Stage 1 found no factor (P-1, B1=605000, B2=16637500, e=2, n=3360K CUDAPm1 v0.00)
Starting stage 2.
Zeros: 875508, Ones: 853116, Pairs: 166845
itime: 1.982408, transforms: 1, average: 1982.408000
ptime: 1863.879498, transforms: 285724, average: 6.523356 ETA: 5:42:04
itime: 2.236556, transforms: 1, average: 2236.556000
ptime: 1867.422307, transforms: 286126, average: 6.526573 ETA: 5:11:17
itime: 2.341590, transforms: 1, average: 2341.590000
ptime: 1863.484573, transforms: 286070, average: 6.514086 ETA: 4:40:04
itime: 2.443132, transforms: 1, average: 2443.132000
ptime: 1864.386307, transforms: 286206, average: 6.514141 ETA: 4:08:56
itime: 2.479896, transforms: 1, average: 2479.896000
ptime: 1865.738907, transforms: 286420, average: 6.513997 ETA: 3:37:50
itime: 2.566038, transforms: 1, average: 2566.038000
ptime: 1866.830105, transforms: 286588, average: 6.513986 ETA: 3:06:45
itime: 2.578672, transforms: 1, average: 2578.672000
ptime: 1863.986985, transforms: 286146, average: 6.514112 ETA: 2:35:37
itime: 2.578564, transforms: 1, average: 2578.564000
ptime: 1868.104663, transforms: 286782, average: 6.514023 ETA: 2:04:31
itime: 2.616162, transforms: 1, average: 2616.162000
ptime: 1864.357941, transforms: 286198, average: 6.514224 ETA: 1:33:23
itime: 2.704018, transforms: 1, average: 2704.018000
ptime: 1869.413957, transforms: 286978, average: 6.514137 ETA: 1:02:16
itime: 2.703811, transforms: 1, average: 2703.811000
ptime: 1861.521090, transforms: 285758, average: 6.514327 ETA: 31:07
itime: 2.665333, transforms: 1, average: 2665.333000
ptime: 1862.245724, transforms: 285860, average: 6.514538 ETA: 0:00
Stage 2 complete, estimated total time = 6:13:31
Accumulated Product: M61262347, 0xa77ba20d6e2648c2, n = 3360K, CUDAPm1 v0.00 Starting stage 2 gcd. M61262347 has a factor: 195362848474407049033033 (P-1, B1=605000, B2=16637500, e=2, n=3360K CUDAPm1 v0.00) [/code] CUDALucas 5.0 is installed too, on a GTX 560 with 1024 MB of RAM. |
Now, thanks to jwb52z, we have:
[code] P-1 found a factor in stage #1, B1=580000. UID: Jwb52z/Clay, M61761811 has a factor: 664146289430268916763473 79.136 bits. [/code] which is k = 2^3 * 3 * 269 * 331 * 8363 * 300857, and could theoretically have been found with a B1 of 8363 and a B2 of 300857. Stage 1 used 355 MB of RAM; stage 2 used 838 MB, rising to 850 MB during the stage 2 GCD. [code] C:\Users\Vincent\Desktop\cudapm1>CUDAPm1.exe 61761811 -b1 8363 -b2 300857 Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt" Warning: Couldn't parse ini file option ResultsFile; using default "results.txt" CUDA reports zuM of zuM GPU memory free. Using e=2, d=210, nrp=6 Using approximately zuM GPU memory. Starting stage 1 P-1, M61761811, B1 = 8363, B2 = 300857, e = 2, fft length = 3360K Doing 12072 iterations Iteration 1000 M61761811, 0x58d24f1daf85c89d, n = 3360K, CUDAPm1 v0.00 err = 0.22656 (0:16 real, 16.2453 ms/iter, ETA 2:59) Iteration 2000 M61761811, 0xbf2f93dbb5319ece, n = 3360K, CUDAPm1 v0.00 err = 0.24219 (0:13 real, 12.9319 ms/iter, ETA 2:10) Iteration 3000 M61761811, 0x92d6f0e4c26aff33, n = 3360K, CUDAPm1 v0.00 err = 0.23438 (0:13 real, 12.8572 ms/iter, ETA 1:56) ... 22656 (0:13 real, 12.8680 ms/iter, ETA 0:13) Iteration 12000 M61761811, 0x19f76d7f61bb24ed, n = 3360K, CUDAPm1 v0.00 err = 0.23828 (0:13 real, 12.9096 ms/iter, ETA 0:00) M61761811, 0xd041eb56158c648e, offset = 0, n = 3360K, CUDAPm1 v0.00 Stage 1 complete, estimated total time = 2:39 Starting stage 1 gcd. M61761811 Stage 1 found no factor (P-1, B1=8363, B2=300857, e=2, n=3360K CUDAPm1 v0.00) Starting stage 2. Zeros: 11856, Ones: 19440, Pairs: 5592 itime: 2.051921, transforms: 1, average: 2051.921000 ptime: 48.817234, transforms: 7426, average: 6.573826 ETA: 5:56 iETA: 0:51 itime: 2.995641, transforms: 1, average: 2995.641000 ptime: 49.863910, transforms: 7440, average: 6.702138 ETA: 0:00 Stage 2 complete, estimated total time = 6:56 Accumulated Product: M61761811, 0x5e6c85d01c0aae6e, n = 3360K, CUDAPm1 v0.00 Starting stage 2 gcd. 
M61761811 has a factor: 664146289430268916763473 (P-1, B1=8363, B2=300857, e=2, n=3360K CUDAPm1 v0.00) [/code] |
Technically, with that binary, we should not be able to use more than 2GB of vRAM and 3-4GB of RAM, since it is 32-bit:smile:
This seems worth mentioning. |
Another thing:
[code] M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=11550, e=2, n=3072K CUDAPm1 v0.00) [/code] This one has a k of p1*p2*...*839 and should have been found with a B1 of 839. It doesn't find it; I have to wait till the end of stage 2 to get the factor. |
One problem fixed.
[CODE]filbert@filbert:~/Build/cudapm1-0.00/cudapm1-code/trunk$ ./CUDAPm1 55824233 -b1 839 CUDA reports 716M of 1279M GPU memory free. Using e=6, d=2310, nrp=12 Using approximately 681M GPU memory. B1 should be at least 18324, increasing it. Starting stage 1 P-1, M55824233, B1 = 839, B2 = 12625000, e = 6, fft length = 3072K Doing 1239 iterations Iteration 1000 M55824233, 0xa6b0b535ca74136a, n = 3072K, CUDAPm1 v0.00 err = 0.16406 (0:16 real, 15.6659 ms/iter, ETA 0:03) M55824233, 0x8e2dd418ceb91638, offset = 0, n = 3072K, CUDAPm1 v0.00 Stage 1 complete, estimated total time = 0:18 Starting stage 1 gcd. M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=12625000, e=6, n=3072K CUDAPm1 v0.00) [/CODE] |
Thanks. I guess that was an easy one.
|
Will there be a new windows build with the stage1 fix in? I have an [URL="http://www.evga.com/products/pdf/03G-P3-1591.pdf"]unusual 580[/URL] that I would be willing to run some tests against.
|
[QUOTE=Aramis Wyler;339072]Will there be a new windows build with the stage1 fix in? I have an [URL="http://www.evga.com/products/pdf/03G-P3-1591.pdf"]unusual 580[/URL] that I would be willing to run some tests against.[/QUOTE]
Dang! That [U]is[/U] unusual! Nice amount of RAM, too. Does it OC at all? |
Well, the 1.5gb version runs at 850, so as a lark I tried to crank this one up to 850/1700 as well. Sure enough, it has been stable. I have never tried to take it past 850/1700.
EDIT: Saying it clocks at 850 doesn't always mean anything in speed terms, so I grabbed this out of the mfaktc window for reference. [CODE] got assignment: exp=63249397 bit_min=73 bit_max=74 (30.25 GHz-days) Starting trial factoring M63249397 from 2^73 to 2^74 (30.25 GHz-days) k_min = 74662632479820 k_max = 149325264962435 Using GPU kernel "barrett76_mul32_gs" Date Time | class Pct | time ETA | GHz-d/day Sieve Wait May 03 00:35 | 3827 82.8% | 5.773 15m53s | 471.52 69941 n.a.% [/CODE] |
Still don't have the motivation to track down the problem reading text from ini files, but here's the next version to try.
[URL="https://www.dropbox.com/s/2b840sgu33bqm6l/cudapm1_20130502.zip"]https://www.dropbox.com/s/2b840sgu33bqm6l/cudapm1_20130502.zip[/URL] Again, completely untested in Windows by me. |
[QUOTE=Stef42;339030][CODE]B2 should be at least 1560000, increasing it.
Starting stage 1 P-1, M9090017, B1 = 120000, B2 = 1560000, e = 6, fft length = 512K[/CODE] I'm not that good at figuring out what it's bound to. An example might help, though.[/QUOTE] The limits depend on B1, B2, d2, and e2. It's somewhat non-trivial, which is why the code handles it automatically. |
[QUOTE=owftheevil;339065][CODE]
Starting stage 1 gcd. M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=12625000, e=6, n=3072K CUDAPm1 v0.00) [/CODE][/QUOTE] If the factor is found in stage 1, should the value of B2 in the output be equal to B1 as in the following: M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=839, e=6, n=3072K CUDAPm1 v0.00) If so, that's an easy change. |
Yay, it works!
Managed to get to stage 2 with -b1 500 -b2 0.5M, it was using around 1.8GB of vRAM, as the program calculated. Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)? [CODE] CUDAPm1 -d 0 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117 Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt" Warning: Couldn't parse ini file option ResultsFile; using default "results.txt" ------- DEVICE 0 ------- name GeForce GTX TITAN totalGlobalMem -1 sharedMemPerBlock 49152 regsPerBlock 65536 warpSize 32 memPitch 2147483647 maxThreadsPerBlock 1024 maxThreadsDim[3] 1024,1024,64 maxGridSize[3] 2147483647,65535,65535 totalConstMem 65536 Compatibility 3.5 clockRate (MHz) 928 textureAlignment 512 deviceOverlap 1 multiProcessorCount 14 CUDA reports 4095M of 4095M GPU memory free. Using e=6, d=2310, nrp=480 Using approximately 1737M GPU memory. B1 should be at least 143687, increasing it. Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K Doing 207401 iterations Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0008 ms/iter, ETA 3:17) Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03809 (0:10 real, 0.9963 ms/iter, ETA 3:06) Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9987 ms/iter, ETA 2:57) Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:10 real, 0.9896 ms/iter, ETA 2:45) Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9817 ms/iter, ETA 2:34) Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9906 ms/iter, ETA 2:26) Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9818 ms/iter, ETA 2:14) Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 
v0.00 err = 0.03320 (0:10 real, 0.9974 ms/iter, ETA 2:07) Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9985 ms/iter, ETA 1:57) Iteration 100000 M8000117, 0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 0.9842 ms/iter, ETA 1:45) Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0001 ms/iter, ETA 1:37) Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:10 real, 1.0034 ms/iter, ETA 1:27) Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0009 ms/iter, ETA 1:17) Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0004 ms/iter, ETA 1:07) Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0039 ms/iter, ETA 0:57) Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:10 real, 1.0033 ms/iter, ETA 0:47) Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0025 ms/iter, ETA 0:37) Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03442 (0:10 real, 1.0026 ms/iter, ETA 0:27) Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 0.9994 ms/iter, ETA 0:17) Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:10 real, 1.0015 ms/iter, ETA 0:07) M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00 Stage 1 complete, estimated total time = 3:27 Starting stage 1 gcd. M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)[/CODE] Running the same binary with the same options, but for GTX 480(1.5GB vRAM) results in great success! 
[CODE]CUDAPm1 -d 1 -threads 512 -c 10000 -t -polite 0 -b1 99000 -b2 99000000 8000117 Warning: Couldn't parse ini file option WorkFile; using default "worktodo.txt" Warning: Couldn't parse ini file option ResultsFile; using default "results.txt" ------- DEVICE 1 ------- name GeForce GTX 480 totalGlobalMem 1610285056 sharedMemPerBlock 49152 regsPerBlock 32768 warpSize 32 memPitch 2147483647 maxThreadsPerBlock 1024 maxThreadsDim[3] 1024,1024,64 maxGridSize[3] 65535,65535,65535 totalConstMem 65536 Compatibility 2.0 clockRate (MHz) 1600 textureAlignment 512 deviceOverlap 1 multiProcessorCount 15 CUDA reports 1404M of 1535M GPU memory free. Using e=6, d=2310, nrp=240 Using approximately 897M GPU memory. B1 should be at least 143687, increasing it. Starting stage 1 P-1, M8000117, B1 = 143687, B2 = 99000000, e = 6, fft length = 448K Doing 207401 iterations Iteration 10000 M8000117, 0xcdeeefc9ed0c8af2, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2635 ms/iter, ETA 4:09) Iteration 20000 M8000117, 0xc1f6fb554bd5366d, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2055 ms/iter, ETA 3:45) Iteration 30000 M8000117, 0xa8c3682070917470, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2043 ms/iter, ETA 3:33) Iteration 40000 M8000117, 0x8641b21065c7c3c4, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:13 real, 1.2206 ms/iter, ETA 3:24) Iteration 50000 M8000117, 0xdde465fe55ac1ecb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2151 ms/iter, ETA 3:11) Iteration 60000 M8000117, 0xa795e30debbb03a1, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.1971 ms/iter, ETA 2:56) Iteration 70000 M8000117, 0xbe53ab8c34cac0e3, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2213 ms/iter, ETA 2:47) Iteration 80000 M8000117, 0x3f4e80a97be2b8b8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2120 ms/iter, ETA 2:34) Iteration 90000 M8000117, 0x64c10d213e1edda8, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2175 ms/iter, ETA 2:22) Iteration 100000 M8000117, 
0xb85ce1dc3b7d9537, n = 448K, CUDAPm1 v0.00 err = 0.03418 (0:12 real, 1.2178 ms/iter, ETA 2:10) Iteration 110000 M8000117, 0xa031e593b3e4eb0e, n = 448K, CUDAPm1 v0.00 err = 0.03711 (0:13 real, 1.2204 ms/iter, ETA 1:58) Iteration 120000 M8000117, 0x33806a25d8628703, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2092 ms/iter, ETA 1:45) Iteration 130000 M8000117, 0x4d78d18fdbe49d31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2287 ms/iter, ETA 1:35) Iteration 140000 M8000117, 0x8340c832411bb464, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2191 ms/iter, ETA 1:22) Iteration 150000 M8000117, 0xf9531f22d8b5d8fb, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2197 ms/iter, ETA 1:10) Iteration 160000 M8000117, 0xa5ee8cd34b352e31, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:13 real, 1.2187 ms/iter, ETA 0:57) Iteration 170000 M8000117, 0x978568529d0b8b98, n = 448K, CUDAPm1 v0.00 err = 0.03369 (0:12 real, 1.1983 ms/iter, ETA 0:44) Iteration 180000 M8000117, 0xe27641ef5a0da890, n = 448K, CUDAPm1 v0.00 err = 0.03516 (0:12 real, 1.2121 ms/iter, ETA 0:33) Iteration 190000 M8000117, 0x28680bf513e7c074, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2194 ms/iter, ETA 0:21) Iteration 200000 M8000117, 0xc34709e98a8b4b98, n = 448K, CUDAPm1 v0.00 err = 0.03320 (0:12 real, 1.2186 ms/iter, ETA 0:09) M8000117, 0x930fd9ded05878a5, offset = 0, n = 448K, CUDAPm1 v0.00 Stage 1 complete, estimated total time = 4:12 Starting stage 1 gcd. M8000117 has a factor: 418913928878609399 (P-1, B1=143687, B2=99000000, e=6, n=448K CUDAPm1 v0.00)[/CODE] |
I would suggest fiddling with the e value.
|
[QUOTE=Karl M Johnson;339106]
Now, how do I change the 'e' parameter to increase memory usage (is there a way at all, indirect perhaps?)? [/QUOTE] Yes. e can be 2, 4, 6, 8, 10, or 12. Just use -e2 12. |
Setting the Brent-Suyama exponent to the maximum resulted in a slight increase in memory usage, around 20 additional MB.
Now, since the other parameter is nrp (-nrp2 n?), I've tried increasing it too, but the program ignored the switch. |
then... higher exponent? or higher bound.
|
We do not seek simple solutions:smile:
I'll try to find the current binary's vRAM threshold; I can confirm it is NOT 2147483647 bytes (memPitch). |
Grab the Windows binary; there is an ini file in it that might help you.
higher fftlength? |
With e=12, d=2310 and nrp=480, the last exponent, which can be checked on current binary is 14,155,777.
The next exponent, 14,155,807, can't go to stage 2. Now, the real vRAM usage of CUDAPm1 for the 14,155,777 exponent is ~3073MB (MSI Afterburner delta method), while the reported approximate vRAM usage was 3014MB. To conclude this micro-research: if you see an approximate memory usage of >=3139MB, be sure that stage 2 will not work, even if you have a lot more than that. Proof: [URL]http://i.imgur.com/iUpQaMr.png[/URL] [URL]http://i.imgur.com/W8fqlWQ.png[/URL] |
[QUOTE=frmky;339103]If the factor is found in stage 1, should the value of B2 in the output be equal to B1 as in the following:
M55824233 has a factor: 833043841114609831879 (P-1, B1=839, B2=839, e=6, n=3072K CUDAPm1 v0.00) If so, that's an easy change.[/QUOTE]Yes, please, it would be helpful if the results indicated that. |
[QUOTE=frmky;339101]here's the next version to try.[/QUOTE]Starting a new run looks better than last time:[code]Selected B1=560000, B2=14280000, 3.55% chance of finding a factor
CUDA reports 781M of 1279M GPU memory free. Using e=6, d=2310, nrp=12 Using approximately 744M GPU memory. Starting stage 1 P-1, M60817711, B1 = 560000, B2 = 14280000, e = 6, fft length = 3360K Doing 807829 iterations[/code]I'll let it run and see if it finds the [url=http://www.mersenne.ca/exponent/60817711]known stage2 factor[/url]. |
[QUOTE=frmky;339101]Still don't have the motivation to track down the problem reading text from ini files[/QUOTE]
Remove the #define sscanf sscanf_s line from parse.c. Using sscanf_s requires each string var scanned into to be followed by an argument with the size of that string, but that's not done in the sscanf call in IniGetStr. This means the sscanf_s checking picks a random uninitialized value off the stack for the length of the dest string, leading to random failures. A real fix is implementing a wrapper like the sprintf() one which includes this parameter in the call to sscanf_s. Or just ignore the safe version of this function since it is more trouble than it is worth. |
I'm getting a lot of cudaDeviceSynchronize() error 30...
Usually on high B2 values while only 400-500MB is used (low exponents). Why this might have happened: [url]http://stackoverflow.com/questions/12200994/cuda-runtime-api-error-30-repeated-kernel-calls[/url] |
[QUOTE=James Heinrich;339130]I'll let it run and see if it finds the [url=http://www.mersenne.ca/exponent/60817711]known stage2 factor[/url].[/QUOTE]It did:[code]Stage 2 complete, estimated total time = 2:57:29
Accumulated Product: M60817711, 0x978923630c42303f, n = 3360K, CUDAPm1 v0.00 Starting stage 2 gcd. M60817711 has a factor: 3493866477323309653137460319 (P-1, B1=560000, B2=14280000, e=6, n=3360K CUDAPm1 v0.00)[/code]4.212GHz-days in 2h57m29s = 34GHz-days/day. A far cry from the ~400GHd/d the GTX570 can push in mfaktc, but also notably faster than can be done on my CPU. |