[QUOTE=owftheevil;344980]Not as far as I know.
I've got stage 2 error reporting and checkpoints working, just need to fix the ETA estimates when resuming stage 2.[/QUOTE] Any chance of a new version? Checkpoints for stage 2 would make me VERY happy :smile: |
[QUOTE=BigBrother;345680]Any chance of a new version? Checkpoints for stage 2 would make me VERY happy :smile:[/QUOTE]
BTW, I had some issues with the last exponent. The program stops (either in stage 1 or in stage 2) and doesn't even respond to Ctrl-C (I had to kill -9 it). It's not an overheating issue (I'm using the GPU with mmff), maybe some memory issue. I will dig a bit more into it in the next days. Luigi |
The stage 2 checkpoints are essentially done, I just want to add an option in the ini file to deal with deleting the savefiles at the end of a test or not. However, I can't do anything until they let us go back home again. We were evacuated on the 4th due to a fire.
|
Ok, I've started both my titans on P-1.
Stage1+Stage2 completed on current exponents in <2hrs. They are just slightly ahead of my i5-3450 CPU @ 3.10GHz in terms of efficiency*. Latency - they blow it out of the park.
*Calc:[CODE]CPU: per day: 8 * 3.7661 GHz-days @ 90W  -> 0.33 GHz-days/W
GPU: per day: 12 * 5.7006 GHz-days @ 200W -> 0.34 GHz-days/W[/CODE]Very rough figures of course. -- Craig |
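Craig's asterisked calc can be reproduced directly; a quick sketch (the helper is hypothetical, and the figures are just the rough ones quoted in the post above):

```python
def ghz_days_per_watt(units, ghz_days_per_unit_per_day, watts):
    """Daily GHz-days of throughput divided by power draw."""
    return units * ghz_days_per_unit_per_day / watts

# Figures from the post: the CPU does 8 * 3.7661 GHz-days/day at ~90 W,
# the Titan does 12 * 5.7006 GHz-days/day at ~200 W.
cpu = ghz_days_per_watt(8, 3.7661, 90)
gpu = ghz_days_per_watt(12, 5.7006, 200)
print(round(cpu, 2), round(gpu, 2))  # 0.33 0.34
```

So on these numbers the two are within a rounding error of each other in efficiency; latency is where the GPU pulls away.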
CudaLucas, mfaktc, the ECM thing and now this? What do we have CPUs for anyway?
|
[QUOTE=TheMawn;349063]CudaLucas, mfaktc, the ECM thing and now this? What do we have CPUs for anyway?[/QUOTE]
CUDALucas isn't as efficient relative to a CPU as mfaktc/o is. But yeah, almost everything runs on a GPU, lol |
The next logical step would be to boot an OS under a GPU without any CPUs in the system, but that would require physically integrating additional components into the gpgpu card :smile:
|
Just wanted to say how awesome this code is with a titan.
2h4mins-ish to get 5.7GHz-days of results. Found a heap of factors already. -- Craig |
BTW, windoze builds for the last version? Hopefully with stage2 checkpoints and properly resuming stage 1 in case one wants to extend B1? :razz:
Wouldn't it be nice to have a mersenneforum.org/cudapm1 folder, as we have for misfit, mfaktc/o, etc? |
[QUOTE=LaurV;349674]BTW, windoze builds for the last version? Hopefully with stage2 checkpoints and properly resuming stage 1 in case one wants to extend B1?
Wouldn't be nice to have a mersenneforum.org/cudapm1 folder, as we have for misfit, mfaktc/o, etc?[/QUOTE] I second both suggestions/requests. |
[QUOTE=LaurV;349674]Wouldn't be nice to have a mersenneforum.org/cudapm1 folder, as we have for misfit, mfaktc/o, etc?[/QUOTE]I already have a mirror at [url]http://download.mersenne.ca/CUDAPm1/[/url]
Let me know if I'm missing versions (I only have the Windows builds there right now, if someone wants to point me to more recent Windows versions and/or source or Linux builds I'll happily put them up). |
Thanks for that, James! I'll have to try it to find out if I understand the current state of affairs enough to use it.
|
The latest builds I have are from May. I believe some further work has been done since then, but I haven't seen a Windows build anywhere.
Source code (SVN repository) is maintained at [url]http://sourceforge.net/projects/cudapm1/[/url] I also just created a wiki page for CUDAPm1, since there wasn't one before: [url]http://mersennewiki.org/index.php/CUDAPm1[/url] If someone knows more about it perhaps they can fill in some details :smile: |
[QUOTE=James Heinrich;349697]The latest builds I have are from May. I believe some further work has been done since then, but I haven't seen a Windows build anywhere.
Source code (SVN repository) is maintained at [url]http://sourceforge.net/projects/cudapm1/[/url] I also just created a wiki page for CUDAPm1, since there wasn't one before: [url]http://mersennewiki.org/index.php/CUDAPm1[/url] If someone knows more about it perhaps they can fill in some details :smile:[/QUOTE] I've got time this weekend, I can take the latest source code and build it for cuda 5.0 and 5.5.. just have to get to the weekend first.. |
[QUOTE=Manpowre;349714]I got time this weekend, I can take latest source code and build it for cuda 5.0 and 5.5.. just have to get to weekend first..[/QUOTE]
That would be Great, Manpowre! I started searching and reading to see how to do the compile, but it will take me a while. |
Work has been getting done since May. The latest version on SourceForge has stage 2 checkpoints and error checking, automatic selection of optimal b1, b2, e, and d, and some speed improvements.
Right now I'm working on reducing the base vram use, then optimizing fft selection, then the ability to extend b1 beyond a completed stage 1. Windows builds would be welcome. Carl Edit: Craig, on titans, fft 4000k, or sometimes 3888k seem to be faster than anything else above 3240k |
[QUOTE=owftheevil;349735]Work has been getting done since May. The latest version on SourceForge has stage 2 checkpoints and error checking, automatic selection of optimal b1, b2, e, and d, and some speed improvements.
Right now I'm working on reducing the base vram use, then optimizing fft selection, then the ability to extend b1 beyond a completed stage 1. Windows builds would be welcome. Carl[/QUOTE] I know it's been discussed before, and I can track it down, but can you point me in the right direction for setting up to compile for and in Windows 7 64-bit? EDIT: I've downloaded the "snapshot" 'cudapm1-code-41-trunk.zip' from Sourceforge. |
Kladner,
Sorry but no I can't. I have no experience with windows. In fact, I just installed win7 so that I can start to get it figured out. Any help would be greatly appreciated. |
[QUOTE=kladner;349736]I know it's been discussed before, and I can track it down, but can you point me in the right direction for setting up to compile for and in Windows 7 64-bit?
EDIT: I've downloaded the "snapshot" 'cudapm1-code-41-trunk.zip' from Sourceforge.[/QUOTE] The problem here is not getting P-1 into a VC project, the problem is gmp.h. I found an MPIR tutorial to replace GMP, but it was not straightforward. Also, I got constant errors with a long long parsed from an arg; not sure why, but that was weird, as I have used that code myself before. So it just doesn't build like that. I got to the point where I linked MPIR in; now it's just code issues which I guess are Linux-specific. |
[QUOTE=owftheevil;349737]Kladner,
Sorry but no I can't. I have no experience with windows. In fact, I just installed win7 so that I can start to get it figured out. Any help would be greatly appreciated.[/QUOTE] Thanks! You've done plenty already. If I have any inspired successes, be sure I'll be shouting from the rooftops. |
Here's a link to the files I used for the compile. Hopefully everything you need is there. Note the solution file is called vectorAdd since I just stole that example from the cuda samples and substituted what I needed without changing the name. It should compile the CUDAPm1.cu in the directory as is, but I haven't tried the latest version.
[url]https://www.dropbox.com/sh/n1tqr8660fkivtm/mM3EaqhWTW[/url] |
Does P-1 use the double precision that GTX 500 cards have but 600 and 700 don't? What's the best series of card to use for P-1?
|
[QUOTE=TheMawn;349855]Does P-1 use the double precision that GTX 500 cards have but 600 and 700 don't? What's the best series of card to use for P-1?[/QUOTE]
What do you mean? They are all capable of DP. EDIT: Second question, sorry. Probably 500 series. |
Oh I know they all do double precision but there was something about it being a third of the speed on the 600 and 700
|
[QUOTE=frmky;349851]Here's a link to the files I used for the compile. Hopefully everything you need is there. Note the solution file is called vectorAdd since I just stole that example from the cuda samples and substituted what I needed without changing the name. It should compile the CUDAPm1.cu in the directory as is, but I haven't tried the latest version.
[URL]https://www.dropbox.com/sh/n1tqr8660fkivtm/mM3EaqhWTW[/URL][/QUOTE] Many thanks, Greg. I'll have a go with them. |
[QUOTE=TheMawn;349874]Oh I know they all do double precision but there was something about it being a third of the speed on the 600 and 700[/QUOTE]
The PM1 program is derived from CUDALucas, so it leans heavily on DP. 500 series is better in this regard than the higher numbered ones. This does not apply to Titan, which is a special case unto itself. |
[QUOTE=TheMawn;349874]Oh I know they all do double precision but there was something about it being a third of the speed on the 600 and 700[/QUOTE]
Yes, it's all double precision. The Titan is the fastest by a large margin, but it's also ridiculously expensive. Otherwise go with a 5xx series and avoid the intermediate cards based on the Kepler K10. |
[QUOTE=frmky;349917]Yes, it's all double precision. The Titan is the fastest by a large margin, but it's also ridiculously expensive. Otherwise go with a 5xx series and avoid the intermediate cards based on the Kepler K10.[/QUOTE]
GTX780 is Titan-lite. The GTX780 is a different chip to the GTX770. If you can't get a GTX5x0, and a Titan is out of your price range, then the GTX780 is your better bet - factoring in budget constraints, of course. But all things considered, if you want to do DP work, a low-clocked quad-core Haswell + high-clocked RAM comes out on top efficiency-wise. -- Craig |
1 Attachment(s)
Many thanks to frmky, here's a 64bit windows build of CUDAPm1, using CUDA toolkit 5.0. I have tested this very little, but seems to be working OK.
|
[QUOTE=owftheevil;349949]Many thanks to frmky, here's a 64bit windows build of CUDAPm1, using CUDA toolkit 5.0. I have tested this very little, but seems to be working OK.[/QUOTE]
OMG! Wow! Thanks to both of you! :grin: |
[QUOTE=nucleon;349939]GTX780 is Titan-lite. GTX780 is a different chip to GTX770.
If you can't get a GTX5x0, and a Titan is out of your price range then GTX780 is your better bet. Of course factoring in budget constraints. [/QUOTE] The DP performance of the GTX 780 has been cut to GTX 7xx levels, so for DP compute it is really no different than the earlier chip. A GTX 580 should still give better performance at a much lower price. |
[QUOTE=frmky;349965]The DP performance of the GTX 780 has been cut to GTX 7xx levels, so for DP compute it is really no different than the earlier chip. A GTX 580 should still give better performance at a much lower price.[/QUOTE]According to [url=http://www.mersenne.ca/cudalucas.php]benchmark data[/url] I have for CUDAlucas, the GTX 780 is still slightly ahead of the GTX 580 by roughly 5%
I'm not sure how relative performance varies between CUDAlucas and CUDAPm1. |
[QUOTE=James Heinrich;349966]According to [URL="http://www.mersenne.ca/cudalucas.php"]benchmark data[/URL] I have for CUDAlucas, the GTX 780 is still slightly ahead of the GTX 580 by roughly 5%
I'm not sure how relative performance varies between CUDAlucas and CUDAPm1.[/QUOTE] Too bad it also uses ~2.5% more power, too. I'd say this gives the edge to the 580 because of its lower price. |
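Those two percentages nearly cancel; a rough sketch of the perf-per-watt comparison implied above (the numbers are the approximate ones from these posts, not measurements):

```python
# GTX 780 vs GTX 580: ~5% more CUDALucas throughput at ~2.5% more power draw.
speed_ratio = 1.05    # 780 throughput relative to 580
power_ratio = 1.025   # 780 power draw relative to 580
perf_per_watt = speed_ratio / power_ratio
print(round(perf_per_watt, 3))  # 1.024 -> the 780 still edges ahead on perf/W
```

So on these figures the 780 is marginally more efficient, and price is the real tiebreaker.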
If you are after DP* result throughput efficiency.
Your best bet is to skip GPUs and buy multiple low-clocked quad-core machines + high-clock RAM. Capex might be more, but opex is lower for a given throughput. -- Craig *I stress DP. TF - GPUs blow CPUs out of the water. |
[QUOTE=James Heinrich;349966]According to [URL="http://www.mersenne.ca/cudalucas.php"]benchmark data[/URL] I have for CUDAlucas, the GTX 780 is still slightly ahead of the GTX 580 by roughly 5%
I'm not sure how relative performance varies between CUDAlucas and CUDAPm1.[/QUOTE] I haven't tested this very thoroughly yet, but it seems that on cards with smaller amounts of memory, e.g. a 560 with ~1gb of memory, CUDALucas and CUDAPm1 have about the same throughput, whereas with 6gb of memory, throughput for CuPm1 is about 15% greater than for CuLu. |
[QUOTE=kladner;349968]Too bad it [U]also[/U] uses ~2.5% more power, [U]too[/U]. [/QUOTE]
Brought to you by the Department of Redundancy Department. :blush: |
[QUOTE=owftheevil;349949]Many thanks to frmky, here's a 64bit windows build of CUDAPm1, using CUDA toolkit 5.0. I have tested this very little, but seems to be working OK.[/QUOTE]
Thank you for the new binary. I see some changes (like full S1 and S2 checkpoints) from the old one I had (dated 06 May 2013). Owners of the defective Titan may run CUDAPm1/CUDALucas on Windows like this:[CODE]:start
CUDAPm1 [flags if not using ini file]
goto :start[/CODE]So whenever CUDAPm1 quits due to the vRAM being unstable, it will launch again and restart from the latest checkpoint. For this to work effectively, I suggest setting the checkpoint iterations to a thousand, so checkpoints would be written every couple of seconds, and running CUDAPm1 from a RAM disk, so that the checkpoints would not wear out your storage media. One drawback of this method is that it will never exit the loop, even if there are no tasks in the worktodo file. Another is related to the volatile nature of RAM disks: if your system crashes or reboots, you lose all the work. Comments are welcome :smile: |
With the latest drivers, 326.41 for Windows and 325.15 for Linux, the unstable memory problem (if that's what it was) is fixed. There is still a bug with the driver that causes the FFTs to hang occasionally. It's been reported and, I presume, is being worked on. This bug affects all cards, not just the Titans.
I've been doing something similar to what you suggested, but instead looping on a non-zero exit value. That way ^C still exits the program. I also don't think setting the checkpoint iterations so low is necessary. You will lose as much time doing the extra checkpoints as you gain by having a more recent checkpoint when it dies. |
[QUOTE=owftheevil;350145]With the latest drivers, 326.41 for windows and 325.15 for linux, the unstable memory problem (if that's what it was) is fixed. There is still a bug with the driver that causes the ffts to hang occasionally. Its been reported and I presume being worked on. This bug affects all cards, not just the titans.
[/QUOTE] Does this apply only to [U]recent[/U] cards, i.e. the 600 and 700 series, or does it extend back to the 500s and 400s? I would love to find out that my 570 can actually run at stock RAM clock. |
I'm relatively certain that the problem with your 570 is different from the titan problem. With the titan, the symptoms were almost the same as the other cards--mismatching residues or occasional roundoff errors. On all cards, reducing the memory clock eliminated the problem. But, unlike the other cards, titans didn't show any errors with the memory test program I wrote. Karl M Johnson reported this some time ago, and I found it to be true for my titan also.
On the other hand, it doesn't take much effort to test it out. |
[QUOTE]On the other hand, it doesn't take much effort to test it out. [/QUOTE]
True, and I shall! I'm still pretty much convinced that the VRAM specified for most Geforce cards is fine for game-pixel-pushing at stock speed, but overclocked for more precise use. |
[QUOTE=kladner;350183]True, and I shall!
I'm still pretty much convinced that the VRAM specified for most Geforce cards is fine for game-pixel-pushing at stock speed, but overclocked for more precise use.[/QUOTE] I agree. |
Successful run on the first known-factor example in readme.
M50001781 has a factor: 4392938042637898431087689 (P-1, B1=94709, B2=4067587, e=2, n=2688K CUDAPm1 v0.10)
IIRC, it took a total of just over 30 minutes. I still have to test the 570 card.
This was on an Asus 580 that I picked up on impulse from eBay. It was sitting at $150 with no bids. I got it for a total of $169, with shipping. I knew that the Asus Direct CU cards are big, but you really don't fully get that until you have one in your hands and have to fit it into the case. I had to move a hard drive to a different slot, and it was still a very close thing just maneuvering it in.
EDIT: Program created a "savefiles" folder, but never put anything in it. All the error checking and the "save all" options were turned on in the ini. |
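Known-factor runs like this are easy to double-check by hand: if f really divides M_p = 2^p - 1, then 2^p mod f must equal 1, and (for prime p) any factor of M_p is also congruent to 1 mod 2p. A small sketch checking the result above:

```python
def divides_mersenne(f, p):
    """True iff f divides 2**p - 1 (modular exponentiation, no huge numbers)."""
    return pow(2, p, f) == 1

p = 50001781
f = 4392938042637898431087689
print(divides_mersenne(f, p))   # the reported factor should check out
print(f % (2 * p) == 1)         # consistent with f | M_p for prime p
```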
Good deal on the 580.
I have the savefiles part commented out at the moment, so that's why nothing got put there. By the way, don't use that version for production work. There is a bug in stage 2 initialization. It won't find any factors in the first pass. I have the fix committed, but Windows and Ubuntu are not playing well together at the moment, and I can't build anything new on Windows until tomorrow (incredibly slow internet). |
[QUOTE=owftheevil;350253]Good deal on the 580.
I have the savefiles part commented out at the moment, so thats why nothing got put there. By the way, don't use that version for production work. There is a bug in stage 2 initialization. It won't find any factors in the first pass. I have the fix committed, but windows and ubuntu are not playing well together at the moment, and I can't build anything new on windows until tomorrow (incredibly slow internet).[/QUOTE] Thanks for the info and the caution. So far, I've just been figuring out how to do stuff, and testing the hardware. Having the Known Factor samples in the Readme is a big help. |
[QUOTE=owftheevil;350253]
By the way, don't use that version for production work. There is a bug in stage 2 initialization. It won't find any factors in the first pass. I have the fix committed, but windows and ubuntu are not playing well together at the moment, and I can't build anything new on windows until tomorrow (incredibly slow internet).[/QUOTE] I just got a segfault with the latest SVN. I'm rerunning it from the beginning to verify reproducibility. [CODE]M62677721, 0x5861c3dd30a23133, n = 3584K, CUDAPm1 v0.10 Stage 1 complete, estimated total time = 59:58 Starting stage 1 gcd. M62677721 Stage 1 found no factor (P-1, B1=620000, B2=16275000, e=0, n=3584K CUDAPm1 v0.10) Starting stage 2. Using b1 = 620000, b2 = 16275000, d = 2310, e = 6, nrp = 160 Zeros: 731774, Ones: 829666, Pairs: 167476 Processing 1 - 160 of 480 relative primes. nrp = 160, m = 0, d = 2310, e = 6, num_tran = 0, k = 541. Inititalizing pass... *** glibc detected *** ./CUDAPm1: corrupted double-linked list: 0x0000000004f41cc0 *** [/CODE] |
Submitted a fix. Thanks for finding that.
|
1 Attachment(s)
Ran[CODE]cudapm1-5.0 [U]61012769[/U] -b1 10273 -b2 1572097 -f 3360K[/CODE]in right at 15 min., 2.8493 ms/tran, Stage 2
[CODE]M61012769 has a factor: 2018028590362685212673 (P-1, B1=10273, B2=1572097, e=2, n=3360K CUDAPm1 v0.10)[/CODE]As noted by others, I think, GCD, at least for Stage 2, uses no GPU. In my case it uses the equivalent of one core, (12 %) CPU. I'm just getting used to behavior, and watching for errors. |
nice, now we only have to decide about the version number, if it is 5.0 as the command line says, or 0.10 as the result line says... :razz:
|
[QUOTE=LaurV;350307]if it is 5.0 as the command line says[/QUOTE]I believe that would be the CUDA version compiled for, not the program version.
|
Does the P-1 Cuda program use double precision? Would a GTX 660Ti be well-suited for P-1 or should I stick to TF. I'd like to up my participation in P-1 matters but not enough to lose too much productivity...
|
[QUOTE=TheMawn;350400]Does the P-1 Cuda program use double precision? Would a GTX 660Ti be well-suited for P-1 or should I stick to TF. I'd like to up my participation in P-1 matters but not enough to lose too much productivity...[/QUOTE]
The program is derived from CUDALucas, and yes, it relies on DP. 660ti is 14th in [URL="http://www.mersenne.ca/mfaktc.php?sort=ghdpd&noA=1"]TF performance[/URL] among nVidia chips. 660ti is 15th in [URL="http://www.mersenne.ca/cudalucas.php"]LL performance [/URL]according to mersenne.ca Since CuLu and PM1 are cousins, performance might be similar, but owftheevil or frmky etc. would have to comment on comparison between the programs. |
[QUOTE=TheMawn;350400]Does the P-1 Cuda program use double precision? Would a GTX 660Ti be well-suited for P-1 or should I stick to TF. I'd like to up my participation in P-1 matters but not enough to lose too much productivity...[/QUOTE]
Yes, it relies on the same cudaFFT library as cudaLucas. Productivity has many definitions. If you are concerned about GHz-days/day, then stick to TF. |
Also on James's chart, something's not right, the 660 ti and 670 have the same output (237.7 TF)
EDIT: Expect the performance of CuLu and P-1 to be around a tenth or less of TF. EDIT2: Blame nVidia for reducing DP on Kepler and compute overall! |
Wow. I didn't know about that page. Thanks for that.
That's all I need to know. P-1 progress would be okay but I'd be losing more than 90% of my productivity in GHz-Days by switching away from TF. |
[QUOTE=TheMawn;350408]That's all I need to know. P-1 progress would be okay but I'd be losing more than 90% of my productivity in GHz-Days by switching away from TF.[/QUOTE]
And, to put on the table, the overall effort needs a lot more TFing to keep up with the P-1'ing (we're still comfortably ahead of the LL'ing). (Carl: My offer for a really good dinner here in Barbados stands (as does my offer of my virtual first born).) |
[QUOTE=kracker;350407]EDIT2: Blame nVidia for reducing DP on Kepler and overall all compute![/QUOTE]
Don't you mean blame Nvidia for making a GPU that's better at playing video games? Hah! I've heard of people complaining a $4000 Quadro or some such had terrible video game performance for a $200 card and that for $4000 they should be getting a LOT LOT LOT LOT more. These people are just idiots of course, but if Nvidia did make the CuLu God-GPU specially for us, it would certainly be bad at video games and would piss off the dumb section of the overwhelmingly huge gamer portion of their market. We're just along for the ride, in the end... |
[QUOTE=kracker;350407]Also on James's chart, something's not right, the 660 ti and 670 have the same output (237.7 TF)[/QUOTE]No, that's right. There's very little difference between the [url=http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units#GeForce_600_Series]GTX 660 Ti and GTX 670[/url], just a difference (24 vs 32) in the number of [url=http://en.wikipedia.org/wiki/Render_output_unit]ROPs[/url]. Same theoretical SP GFLOPS (2459.52).
|
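For the curious, that shared GFLOPS figure falls straight out of shaders × clock × 2 FLOPs per cycle (one fused multiply-add); a sketch using the reference specs from the linked comparison (1344 CUDA cores at a 915 MHz base clock for both cards):

```python
def sp_gflops(shaders, clock_mhz):
    """Theoretical single-precision GFLOPS: 2 FLOPs per shader per cycle (FMA)."""
    return shaders * clock_mhz * 2 / 1000.0

# GTX 660 Ti and GTX 670 both ship 1344 CUDA cores at a 915 MHz base clock,
# hence identical theoretical SP throughput despite the different ROP counts.
print(sp_gflops(1344, 915))  # 2459.52
```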
[QUOTE=TheMawn;350412]Don't you mean blame Nvidia for making a GPU that's better at playing video games? Hah!
I've heard of people complaining a $4000 Quadro or some such had terrible video game performance for a $200 card and that for $4000 they should be getting a LOT LOT LOT LOT more. These people are just idiots of course but if Nvidia did make the CuLu God-GPU specially for us, it would certainly be bad at video games and would piss of the dumb section of the overwhelmingly huge gamer portion of their market.[/QUOTE] Well, yeah the huge majority use their GPU's for gaming, and I understand that. The one at good and bad at another isn't necessarily true though, look at AMD's GCN(example), they are tied with Nvidia's 600's on gaming. [quote] We're just along for the ride, in the end...[/quote]Yes indeed. I don't think anyone can disagree with that... |
[QUOTE=TheMawn;350412]I've heard of people complaining a $4000 Quadro or some such had terrible video game performance for a $200 card and that for $4000 they should be getting a LOT LOT LOT LOT more. These people are just idiots of course but if Nvidia did make the CuLu God-GPU specially for us, it would certainly be bad at video games and would piss of the dumb section of the overwhelmingly huge gamer portion of their market.
[/QUOTE] Well, I work in a company using the Quadro series for everything we do. first of all, Quadro series are not meant for Cuda, even they can be used for that. They have registered memory, and have been downclocked to ensure every pixel is rendered without mistake. You dont care about a pixel error when playing battlefield at 100 frames per second. Byt you do care about pixel error when rendering out a scene meant for high level production for TV, movies, or using the cards for graphics live at Television.. tickers, lower thirds etc.. then that cannot happen. that is what the Quadro boards are for. simply a professional platform with a API to hookup to applicaitons in a different way. Not possible with Geforce drivers. Also Quadro boards have better support from Nvidia both on software siden and hardware side.. so you do not pay for the board, you pay for the services, API, development that the card gives you and it gives you 100% quarantee with no pixel issues. The Cuda card Nvidia has for enthusiasts is called Nvidia Titan.. Its great.. but the design with memory on back side is terrible. so you have to downclock the card to ensure the memory on backside doesnt run too hot. |
1 Attachment(s)
This has the mentioned bug fixes. Again compiled with cuda toolkit 5.0.
I've been messing with building CUDALucas on Windows. It builds and runs with correct results and decent speed, but with some oddities. Is it normal on Windows for the polite option to need to be set to a very low positive value so that the GUI is usable? |
CUDAP-1 r.45
I updated my svn repository and started the program without switches (only the exponent to test). Same GTX 580.
Before the update, I had the following results: [code] M65171233 found no factor (P-1, B1=610000, B2=15860000, e=6, n=3584K CUDAPm1 v0.10) M62651599 found no factor (P-1, B1=585000, B2=15063750, e=6, n=3456K CUDAPm1 v0.10) M62650493 found no factor (P-1, B1=585000, B2=15063750, e=6, n=3456K CUDAPm1 v0.10) [/code] Now, I am working with the following parameters: [code] M61603063 fft length = 3888K Using b1 = 555000, b2 = 12348750, d = 2310, e = 2, nrp = 35 [/code] I have a smaller exponent using a bigger FFT and e=2 instead of e=6 (useful for Brent-Suyama extensions, I think). Is this a normal behavior? :surprised: :help: Luigi |
[QUOTE=ET_;350715]I updated my svn repository and started the program without switches (only the exponent to test). Same GTX 580.
Before the update, I had the following results: [code] M65171233 found no factor (P-1, B1=610000, B2=15860000, e=6, n=3584K CUDAPm1 v0.10) M62651599 found no factor (P-1, B1=585000, B2=15063750, e=6, n=3456K CUDAPm1 v0.10) M62650493 found no factor (P-1, B1=585000, B2=15063750, e=6, n=3456K CUDAPm1 v0.10) [/code] Now, I am working with the following parameters: [code] M61603063 fft length = 3888K Using b1 = 555000, b2 = 12348750, d = 2310, e = 2, nrp = 35 [/code] I have a smaller exponent using a bigger FFT and e=2 instead of e=6 (useful for Brent-Suyama extensions, I think). Is this a normal behavior? :surprised: :help: Luigi[/QUOTE] Yes and no. The function where the fft lengths are assigned has two sections, one for timings taken from a 570, and one for timings taken from a titan. The 570 timings would be more appropriate in your case. Look at the function choose_fft_length on line 1332. As for the smaller e, older versions automatically selected e = 6. This latest version actually makes an attempt at choosing an optimal value for e, which is apparently e = 2 in your case. |
[QUOTE=owftheevil;350718]Yes and no. The function where the fft lengths are assigned has two sections, one for timings taken from a 570, and one for timings taken from a titan. The 570 timings would be more appropriate in your case. Look at the function choose_fft_length on line 1332.
As for the smaller e, older versions automatically selected e = 6. This latest version actually makes an attempt at choosing an optimal value for e, which is apparently e = 2 in your case.[/QUOTE] I assume you meant line 1371 (my source code has the function located there), or maybe my code is (still) outdated? Provided that my card is choosing the Titan's timings, should I modify them or keep the chosen parameters? Or wait for some new GTX 580 timings? Luigi |
1 Attachment(s)
A new version with some memory optimizations is up at sourceforge. There is also a new ini file option to control the amount of memory the program will not try to use. A line such as
[CODE]UnusedMem 100[/CODE] in the ini file will leave at least 100 MB of vram free. More will probably be left free due to inaccurate reporting of free memory by CUDA and optimizations of the number of relative primes processed in a pass. Here's a windows build with CUDA 5.0. |
I just completed a test run on 50001781. The factor was found: 4392938042637898431087689
There were two instances of "Couldn't write checkpoint," once in stage one, and once in stage two. I don't know if this has any significance. |
Just tested with the Sep3 version above.
Improvements measured:[CODE]6144k FFT candidate: 6.8ms -> 5.5ms
4000k FFT candidate: 3.8ms -> 3.2ms[/CODE]I also updated to the latest drivers. Full-speed RAM now works (so far). -- Craig |
[QUOTE=owftheevil;351683]A new version with some memory optimizations is up at sourceforge. There is also a new ini file option to control the amount of memory the program will not try to use. A line such as
[CODE]UnusedMem 100[/CODE]in the ini file will leave at least 100 MB of vram free. More will probably be left free due to inaccurate reporting of free memory by CUDA and optimizations of the number of relative primes processed in a pass. Here's a windows build with CUDA 5.0.[/QUOTE] Is this build considered production-ready? I have run it without incident on some of the test cases in the readme. |
Has PrimeNet been updated to handle CUDAPm1 results files?
|
Yes, the manual results form has supported it for a while now.
|
[QUOTE]Is this build considered production-ready? I have run it without incident on some of the test cases in the readme. [/QUOTE]
As far as I can tell, the current version doesn't have any bugs that would keep it from finding factors. |
[QUOTE=owftheevil;352414]As far as I can tell, the current version doesn't have any bugs that would keep it from finding factors.[/QUOTE]
Cool! Thanks for everybody's work on this! One more question: is "SaveAllCheckpoints" still disabled? I did a trial run of a live assignment, hoping to compare checkpoints with P95 on the same exponent, but no checkpoints were saved. |
Hello!
I'm doing this test for the first time and I have a problem. After completing a main test with 800k+ iterations, the program (cudapm1_win64_20130902) stopped for a while at 'starting stage 1' and then the window disappeared. There is nothing in 'results'. There is still 'Pfactor=N/A,1,2,61262347,-1,73,2' in 'worktodo'. I tried to run the test again and got a screenshot. Here it is: [URL="http://radikal.ru/fp/c58ac17661a3439a8740472689016ec5"][img]http://s001.radikal.ru/i196/1309/02/4fcaedfc97ac.jpg[/img][/URL] There is [B]still[/B] nothing in 'results'. There is [B]still[/B] 'Pfactor=N/A,1,2,61262347,-1,73,2' in 'worktodo'. And I have 'c61262347s1' and 't61262347s1' files in 'D:\CUDA_P-1', each 7479KB. System: Win7 x64, 6Gb RAM, Nvidia GeForce GTX 780 with 980MHz core and 3Gb memory. CUDA 5.5 is also installed. What did I do wrong? Or is it a program bug? What should I do to fix it? UPD: There was a wrong exponent. I'll try again with the right exp. (62980369) and then reply. |
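Side note on that worktodo line: the Pfactor= entry follows Prime95's field layout, so (as I read it; the field names below are my interpretation, not anything CUDAPm1 documents) it breaks down as assignment key, then k, b, n, c describing k*b^n+c, the trial-factoring depth in bits, and the number of primality tests a factor would save. A small parsing sketch:

```python
def parse_pfactor(line):
    """Split a worktodo Pfactor= line into named fields.

    Assumed layout (Prime95 convention):
      Pfactor=<assignment key>,k,b,n,c,<TF bits done>,<tests saved>
    """
    assert line.startswith("Pfactor=")
    key, k, b, n, c, bits, saved = line[len("Pfactor="):].split(",")
    return {"key": key, "k": int(k), "b": int(b), "n": int(n), "c": int(c),
            "tf_bits": int(bits), "tests_saved": int(saved)}

entry = parse_pfactor("Pfactor=N/A,1,2,61262347,-1,73,2")
# i.e. M61262347 (1*2^61262347-1), trial factored to 2^73, a factor saves 2 tests
```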
[QUOTE]One more question: is "SaveAllCheckpoints" still disabled? I did a trial run of a live assignment, hoping to compare checkpoints with P95 on the same exponent, but no checkpoints were saved.[/QUOTE]
Yes, it's still disabled. In any event, I would be surprised if anything matched between the checkpoints. [QUOTE]Hello! I'm doing this test first time and I have a trouble. After completing a main test with a 800k+ iterations, program (cudapm1_win64_20130902) stopped for a while at 'starting stage 1' and then the window disappeared. There is nothing in the 'results'. There is still 'Pfactor=N/A,1,2,61262347,-1,73,2' in 'worktodo'. I tried to run test again and get a screenshot. That it is: [URL]http://s001.radikal.ru/i196/1309/02/4fcaedfc97ac.jpg[/URL] There is [B]still[/B] nothing in the 'results'. There is [B]still[/B] 'Pfactor=N/A,1,2,61262347,-1,73,2' in 'worktodo'. And I have 'c61262347s1' and 't61262347s1' files in 'D:\CUDA_P-1' each 7479KB. System: Win7 x64, 6Gb RAM, Nvidia GeForce GTX 780 with 980Mhz Core and 3Gb Memory. CUDA 5.5 is also installed. What I did wrong? Or it is a program bug? What should I do to fix it? UPD: There was wrong exponent. I'll try again with right exp. (62980369) and then reply. [/QUOTE] The version of the program I posted was compiled with cuda toolkit 5.0. That might be causing the problem. I'll try to get a 5.5 version up soon. What iteration times are you getting with the 780? |
[QUOTE]Yes, it's still disabled. In any event, I would be surprised if anything matched between the checkpoints.
[/QUOTE] OK. Thanks very much. I am "clearly" unclear on the details. :smile: I am glad of information which keeps me from pursuing spurious correlations. |
[QUOTE=owftheevil;352439]The version of the program I posted was compiled with CUDA toolkit 5.0. That might be causing the problem. I'll try to get a 5.5 version up soon. What iteration times are you getting with the 780?[/QUOTE]
Last night I successfully completed this test [U]with default settings in the ini file[/U] and with the correct exponent, [B]BUT[/B] the program crashed 5 or 6 times [maybe because of overheating, but I'm not sure (t=80[SUP]o[/SUP]C)]. Also, I think I know what the problem was yesterday: I had Prime95 running on all 4 cores of my CPU, so the 'stage 1 gcd' couldn't get enough CPU time. When I started the second test with the correct exponent, I switched off 2 cores in Prime95. But, as I said, there were still some crashes (caused maybe by overheating, or by my having CUDA 5.5 installed instead of 5.0). One more thing that could be useful: the crashes during the 2nd test happened only in the first part of the test, where there were many iterations. |
More error 30s
I have been experimenting with CUDAPM1, on a GTX 570. I had it throttled back from the factory OC of 845 core, 1900 VRAM, to 830 core, 1700 VRAM. At least twice I have gotten this error-
[QUOTE]C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(1131) : cudaSafeCall() Runtime API error 30: unknown error. [/QUOTE] I then decided to go back to CuLu, since I have never found the stable speed point for this card. Overnight it quit. Unfortunately, it was running from a batch, so the prompt window closed with the program. I restarted manually at the same clocks, and after a while got this error- [CODE]CUDALucas.cu(693) : cudaSafeCall() Runtime API error 30: unknown error. [/CODE] I have now turned the card down to 810 MHz core, 1600 MHz VRAM. I have restarted CuLu to see if the error still happens. Any thoughts or suggestions? I have searched out and read parts of threads which discuss 'error 30 unknown', but I could not be sure if there has been a conclusion as to the cause or a remedy. |
I haven't run the diagnostics in Windows, but the related error in Linux is caused by the driver stepping on cufft's toes. Nvidia is aware of this; several other programs have been seeing similar errors. What they have in common is double-precision FFTs repeated a large number of times. Hopefully there will eventually (soon?) be a fix from Nvidia.
To work around the problem, I've been running CUDALucas and CUDAPm1 from a shell script that loops on a non-zero exit value. |
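[Editor's note: that restart loop can be sketched as a small POSIX shell wrapper. The `run_until_clean` helper name and the CUDAPm1 arguments shown are illustrative assumptions, not Carl's actual script.]

```shell
#!/bin/sh
# run_until_clean: keep re-running a command until it exits with
# status 0, restarting it after any non-zero exit (e.g. a driver
# "error 30" crash). A minimal sketch, not the actual script.
run_until_clean() {
    until "$@"; do
        echo "exited with status $?; restarting..." >&2
        sleep 5   # brief pause so a wedged driver can recover
    done
}

# Hypothetical usage:
# run_until_clean ./CUDAPm1 -d 0
```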
[QUOTE=owftheevil;352526]I haven't run the diagnostics in Windows, but the related error in Linux is caused by the driver stepping on cufft's toes. Nvidia is aware of this; several other programs have been seeing similar errors. What they have in common is double-precision FFTs repeated a large number of times. Hopefully there will eventually (soon?) be a fix from Nvidia.
[U]To work around the problem, I've been running CUDALucas and CUDAPm1 from a shell script that loops on a non-zero exit value.[/U][/QUOTE] OK. Thanks. I remember that approach, now that you mention it. I have currently rolled back the Windows graphics driver to 314.22. I wonder if one of the earlier versions I have would have better odds of stability. |
1 Attachment(s)
You have to go back to < 300 drivers to avoid this problem.
I just spent a couple of hours messing around in Windows. The error (Unknown Error 30), or whatever it was, showed up under the same circumstances the timeout errors show up in Linux. This is a version of CUDAPm1 for CUDA toolkit 5.5. It has a cufftbench option which triggers the error somewhat frequently. Run cufftbench with, e.g.: [CODE]CUDAPm1.exe -cufftbench 2 8192 5 [/CODE] The first argument is the starting fft length, the second is the end length, and the 5 is the number of passes it will make. I've never made it through 20 passes without the error occurring. |
[QUOTE=owftheevil;352538]You have to go back to < 300 drivers to avoid this problem.[/QUOTE]
Rats. I have versions back to the 280's, but I don't think current mfaktc will run on those. As I remember, the 290's had problems. I have 301.42 in right now. Guess I'll stick with it for the time being, since I just did the clean-and-reinstall routine to put it there. [QUOTE]I just spent a couple of hours messing around in Windows. The error (Unknown Error 30), or whatever it was, showed up under the same circumstances the timeout errors show up in Linux. This is a version of CUDAPm1 for CUDA toolkit 5.5. It has a cufftbench option which triggers the error somewhat frequently. Run cufftbench with, e.g.: [CODE]CUDAPm1.exe -cufftbench 2 8192 5 [/CODE]The first argument is the starting fft length, the second is the end length, and the 5 is the number of passes it will make. I've never made it through 20 passes without the error occurring.[/QUOTE] It's good to know the cufftbench parameters. Thanks. |
1 Attachment(s)
The latest version of cudapm1 now up at sourceforge has optimizations for fft selection. To get it to work, you need to first run
[CODE]./CUDAPm1 -cufftbench n1 n2 p[/CODE] where n1 is the starting fft length (in K), n2 is the end length, and p is the number of times it will repeat the test for each length. I usually run [CODE]./CUDAPm1 1 8196 1 [/CODE] It is important that n1 and n2 be powers of 2. It will run and give results otherwise, but the lengths in the output file are unlikely to all be optimal. What this does is generate a list of optimal fft lengths for your card, which will be used in any subsequent tests instead of the default lengths. If a particular fft length is going to be used often, it is a good idea to also run [CODE]./CUDAPm1 -cufftbench n n p [/CODE] where n is the fft length you will be using. This finds optimal thread values and can improve the iteration times by a few percent. Once this is set up, you shouldn't have to specify any fft length in the command line or ini file unless you have a particular need to run with other fft lengths. As far as I know, there are no major problems with this version. I haven't yet looked into the occasional inability to write save files that kladner reported, it sometimes writes some meaningless information to the screen, and excessive stage 2 round-off errors simply halt the program without error messages. Something not thoroughly tested yet is the selection of fft lengths for particular exponents. I have been slightly conservative in the selection mechanism, but there could be some inefficient fft lengths that I haven't looked at yet, which will cause a test to terminate with an excessive round-off error. The next feature I will work on is the ability to extend b1. Here is a Win64 version compiled with CUDA toolkit 5.5. |
[QUOTE=owftheevil;353933][CODE]./CUDAPm1 1 8196 1 [/CODE]It is important that n1 and n2 be powers of 2.[/QUOTE]8196 isn't a power of 2 :smile:
edit: also, your example doesn't include -cufftbench |
Thanks for pointing that out. My proof-reading skills are almost non-existent.
[CODE]./CUDAPm1 -cufftbench 1 8192 1[/CODE] is indeed what I usually run. |
Sorry if this should be obvious, but I am having trouble tracking down the 5.5 CUDA libraries. Have they been posted somewhere? I can't seem to get them via the Toolkit as I don't have Visual Studio installed.
|
I also couldn't find them, so I ended up downloading the 853MB toolkit, grabbing the relevant DLLs and posting them to my CUDApm1 mirror site:
[url]http://download.mersenne.ca/CUDAPm1/[/url] |
[QUOTE=James Heinrich;354097]I also couldn't find them, so I ended up downloading the 853MB toolkit, grabbing the relevant DLLs and posting them to my CUDApm1 mirror site:
[URL]http://download.mersenne.ca/CUDAPm1/[/URL][/QUOTE] Many thanks, James! Did you have to install the toolkit, or were you able to dig them out of the .cab files? I went looking in the VCRedist folder, but did not immediately turn anything up. |
The 853MB setup file [url=http://developer.download.nvidia.com/compute/cuda/5_5/rel/installers/cuda_5.5.20_winvista_win7_win8_general_64.exe]cuda_5.5.20_winvista_win7_win8_general_64.exe[/url] from [url]https://developer.nvidia.com/cuda-downloads[/url] is a self-extracting 7-zip file, you can open it in 7-zip or WinRAR (possibly other tools). Look in \CUDAToolkit\bin\ and you'll find the DLLs.
|
[QUOTE=James Heinrich;354101]The 853MB setup file [URL="http://developer.download.nvidia.com/compute/cuda/5_5/rel/installers/cuda_5.5.20_winvista_win7_win8_general_64.exe"]cuda_5.5.20_winvista_win7_win8_general_64.exe[/URL] from [URL]https://developer.nvidia.com/cuda-downloads[/URL] is a self-extracting 7-zip file, you can open it in 7-zip or WinRAR (possibly other tools). Look in \CUDAToolkit\bin\ and you'll find the DLLs.[/QUOTE]
I had the right approach (with 7-zip). I just didn't do enough digging. Thanks for the files AND for the 'treasure map'! \bin\ marks the spot! |
With much thankfulness to owftheevil and James, I now have the latest 5.5 version of CUDAPm1 running a first test on a 580 @ 830 MHz. It appears that the total time will be ~2 hours.
I ran my first couple of assignments on the last 5.0 version a day or two ago. I was startled and suspicious when both found factors: one in Stage 1 and one in Stage 2. It proved needless, but I fed them back through P95 just to be sure. Two P-1 factors in succession is a fairly unusual event in my experience. (Yes, I know that statistics have no memory, but the synchronicity did raise an eyebrow or two. :surprised) |
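[Editor's note: a back-of-envelope check of how unusual back-to-back finds are. If each P-1 attempt succeeds independently with probability p, a streak of n finds has probability p^n. The ~4% per-attempt figure below is an assumed, illustrative value for wavefront bounds, not a number from this thread.]

```python
# Odds of n consecutive P-1 factor finds, assuming independent
# attempts with success probability p. p = 0.04 is an assumption
# for illustration only.
def prob_streak(p: float, n: int) -> float:
    """Probability that n independent attempts all find a factor."""
    return p ** n

p = 0.04
print(f"one factor:   {prob_streak(p, 1):.2%}")   # 4.00%
print(f"two in a row: {prob_streak(p, 2):.3%}")   # 0.160%
```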
[QUOTE=kladner;354121]With much thankfulness to owftheevil and James, I now have the latest 5.5 version of CUDAPm1 running a first test on a 580 @ 830 MHz. It appears that the total time will be ~2 hours.[/QUOTE]
For those who are using Carl's amazing program, may I please ask that they TF at least one candidate to 74 for every one they P-1? We are comfortably staying ahead of (or, at least, on) the LL wave, but we've now suddenly got so much P-1 firepower that some candidates are being assigned for P-1ing TFed to "only" 73, rather than the better 74. |
[QUOTE=chalsall;354129]For those who are using Carl's amazing program, may I please ask that they TF at least one candidate to 74 for every one they P-1?
We are comfortably staying ahead of (or, at least, on) the LL wave, but we've now suddenly got so much P-1 firepower that some candidates are being assigned for P-1ing TFed to "only" 73, rather than the better 74.[/QUOTE] Aye aye, Skipper! I'll mostly be doing TF to 74 anyway, with the odd excursion into Uncwilly territory. I mainly wanted to catch up with developments (indeed amazing!) on the CUDAPm1 front, so I could be sure of how it's done. EDIT: Once I get the 570 squared away with the proper -cufftbench runs I'll revert to my usual pursuits. EDIT2: [QUOTE]Stage 2 complete, 1503454 transforms, estimated total time = 1:07:28 Starting stage 2 gcd. M63522133 Stage 2 found no factor (P-1, B1=585000, B2=13016250, e=2, n=3584K CUDAPm1 v0.20)[/QUOTE] Overall time estimated (plus gcd, I assume) 2:18:41. |
Chalsall, would it work if we get P-1 assignments, notice they are insufficiently trial factored, do and submit trial factoring results, then proceed with the P-1 where no factors are found in trial factoring?
|
[QUOTE=owftheevil;354155]Chalsall, would it work if we get P-1 assignments, notice they are insufficiently trial factored, do and submit trial factoring results, then proceed with the P-1 where no factors are found in trial factoring?[/QUOTE]
Carl... If your program could do that, that would be *really* cool!!! :smile: We are mostly keeping ahead of even the P-1 wave front at the moment, but there are times we release at "only" 73 for P-1. BTW, I still owe you a *very* nice dinner here in Barbados (and, of course, my non-existent first born). Please let me know when you might be available... :smile: |
That's not quite what I was suggesting. That ability would be more appropriate for a misfit type program.
My question comes down to this: According to GPUto72, would I still own the assignment after turning in a trial factoring result, or would continuing with a P-1 test risk stepping on someone's toes? I'll give you advance notice if I ever plan to be in Barbados. But rather than the Gap, maybe you could suggest a nice local place. My wife and I don't go much for nightlife and fancy dining. |
[QUOTE=owftheevil;354191]My question comes down to this: According to GPUto72, would I still own the assignment after turning in a trial factoring result, or would continuing with a p-1 test risk stepping on someones toes?[/QUOTE]
Sorry for the latency on this -- missed the reply. To answer your question, what I had envisioned was something along these lines: if a candidate was assigned for P-1, "Spidy" would also watch for any additional TF work done on it. If so, it would check to see who had done the additional TFing, and credit that account while continuing to watch for the P-1 completion. As in, to be explicit: yes, the person assigned the candidate for P-1 would "keep it" until the P-1 work was completed, even if additional TFing was done. [QUOTE=owftheevil;354191]I'll give you advance notice if I ever plan to be in Barbados. But rather than the Gap, maybe you could suggest a nice local place. My wife and I don't go much for nightlife and fancy dining.[/QUOTE] Absolutely. There are some very nice boutique hotels and B&B's around. And I agree with you -- "The Gap" is a bit "party central" and I don't recommend anyone stay in any of the hotels there because of the (late night) noise. But there are some very nice (not always the same as "fancy") restaurants there. |
[quote]No GeForce GTX 670 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run ./CUDALucas -cufftbench 17496 17496 r for some small r, 0 < r < 6 e.g. Using threads: norm1 256, mult 256, norm2 128.[/quote]Should the text there be changed from "CUDALucas" to "CUDAPm1"? |