[QUOTE=James Heinrich;260748]I understand the issues you're trying to balance, but I don't particularly want any program truncating my results file. Or at least give me the option of maxlogfilesize=0 to never-truncate, please.[/QUOTE]
Of course...this is why we are discussing the issues involved and how best to resolve them...I mean, it isn't exactly difficult to call up the text editor and do the job yourself....and I'm expecting anyone that *could* run out of space for log files due to the high productivity of their card to have around 10s of Gigs of free space, since it's getting hard to find disks under 120Gig these days. P.S. Taking disks apart for my employer yesterday was remarkably easy, just a couple of Torx drivers, and, since I was destroying them, bending them a bit to get them off the hubs. Then hit the disk surfaces with the bead blaster and bent in an arbor press. The crazy one was the IBM, which split down the middle lengthwise, rather than lifting off the cover on one flat side. |
"Black Screen"
1 Attachment(s)
Sometimes, when stopping mfaktc by CTRL-C, my 270.61 display driver fails. After some seconds of "black" I get the attached error message (not very informative). I've noticed this in 0.16 and 0.17. I start mfaktc via a batch file: [CODE]start "mfaktc-win-64.exe_autostart" /D"C:\Program Files\mfaktc\mfaktc_0.17_autostart" /LOW "C:\Program Files\mfaktc\mfaktc_0.17_autostart\mfaktc-win-64.exe"
[/CODE]Is it my/Nvidia's fault or is there a "stream.close()" missing? P.S.: The dog is called "Bonny". :wink: |
I don't recommend the 270.61 WHQL driver for crunching. On my box it did not restore the clocks after pausing and resuming any CUDA application (when the GPU clock for performance level 1 wasn't exactly set to the stock value). The GPU ended up in performance level 2 (405 MHz/810 MHz) and a reboot was necessary. While searching for a solution to this problem I found various problem reports describing this behaviour. I did a complete uninstall of the driver and installed 266.58 WHQL.
|
Cool - it's not me. Yes - I've had troubles with closing down mfaktc. I've had the screen go black and driver had to restart errors.
I've found closing down mfaktc via a remote session (RDP) was less likely to cause a problem. I'm guessing it's related to gui and cuda being run at same time on the same card. -- Craig |
How are TF results checked?
On one instance of mfaktc I've had 3x exponents in a row with a factor: (column 1 is a date-time stamp in AEST+10)
20110507-212631 M380006269 has a factor: 7137711802080292028081
20110507-212631 found 1 factor(s) for M380006269 from 2^65 to 2^74 (partially tested) [mfaktc 0.16-Win barrett79_mul32]
20110507-224219 M380006657 has a factor: 226708240715562532609
20110507-224219 found 1 factor(s) for M380006657 from 2^65 to 2^74 (partially tested) [mfaktc 0.17-Win barrett79_mul32]
20110508-015024 M380006191 has a factor: 99680981725148840593
20110508-015025 found 1 factor(s) for M380006191 from 2^65 to 2^74 (partially tested) [mfaktc 0.17-Win barrett79_mul32]
I've seen 2x factored in close proximity before, but 3 in a row has me a little suspect. -- Craig |
[QUOTE=nucleon;260792]How are TF results checked?[/quote]PrimeNet verifies that the reported factor is correct.
[quote]I've seen 2x factored in close proximity before, but 3 in a row has me a little suspect.[/QUOTE] Chances are that happens and usually doesn't get noticed. I would not worry |
[QUOTE=nucleon;260792]found 1 factor(s) for M380xxxxxx from 2^65 to 2^74
I've seen 2x factored in close proximity before, but 3 in a row has me a little suspect.[/QUOTE]Note also that you're testing a wide bitrange. 3 factors in a row would be unusual, but certainly not impossible, testing a single bit depth for each exponent (e.g. 2^65 to 2^66). Your 3 factors are 72.6, 67.6 and 66.4 bits respectively. TF from 2^65-74 on M380M range has almost exactly a [url=http://mersenne-aries.sili.net/credit.php?worktype=TF&exponent=380006269&f_exponent=&b1=&b2=&numcurves=&factor=&frombits=65&tobits=74&submitbutton=Calculate]13% chance[/url] of finding a factor somewhere in that range. So finding 3-in-a-row should be a roughly 1:455 occurrence. |
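The estimate above can be roughly reproduced in a few lines of Python. This is only a sketch: it assumes the common rule of thumb that the chance of a factor between 2^b and 2^(b+1) is about 1/b, which may differ from the model behind the linked calculator, and the function name is made up for illustration.

```python
import math

def tf_factor_chance(from_bits, to_bits):
    """Approximate chance of a factor between 2^from_bits and 2^to_bits.

    Assumption: each bit level b contributes about 1/b (rule of thumb),
    and the small per-level chances are simply added up.
    """
    return sum(1.0 / b for b in range(from_bits, to_bits))

p_one = tf_factor_chance(65, 74)   # one exponent, 2^65 to 2^74
p_three = p_one ** 3               # three independent exponents in a row

# Bit sizes of the three factors reported in the thread
factors = [7137711802080292028081, 226708240715562532609, 99680981725148840593]
sizes = [round(math.log2(f), 1) for f in factors]

print(f"chance per exponent: {p_one:.3f}")
print(f"three in a row: about 1 in {1 / p_three:.0f}")
print("factor sizes in bits:", sizes)
```

Under that assumption the per-exponent chance comes out close to the quoted 13%, and the three-in-a-row odds land in the same ballpark as 1:455.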
2 Attachment(s)
FWIW, V17 data, probably skewed and worthless.
:max: |
[QUOTE=Xyzzy;260796]FWIW, V17 data, probably skewed and worthless.
:max:[/QUOTE] Cheers. Why go back to windows? I thought you went linux. -- Craig |
[QUOTE=Uncwilly;260793]PrimeNet verifies that the reported factor is correct. Chances are that happens and usually doesn't get noticed. I would not worry[/QUOTE]
[QUOTE=James Heinrich;260795]Note also that you're testing a wide bitrange. 3 factors in a row would be unusual, but certainly not impossible, testing a single bit depth for each exponent (e.g. 2^65 to 2^66). Your 3 factors are 72.6, 67.6 and 66.4 bits respectively. TF from 2^65-74 on M380M range has almost exactly a [url=http://mersenne-aries.sili.net/credit.php?worktype=TF&exponent=380006269&f_exponent=&b1=&b2=&numcurves=&factor=&frombits=65&tobits=74&submitbutton=Calculate]13% chance[/url] of finding a factor somewhere in that range. So finding 3-in-a-row should be a roughly 1:455 occurrence.[/QUOTE] Thanks guys. -- Craig |
[QUOTE]Why go back to windows? I thought you went linux.[/QUOTE]We are on a 30 day trial, which we suppose we could repeat every 30 days.
We want to get Linux up and running but we have been busy and we have trouble with the thought of going off-line while we could be doing so much work! :smile: Unrelated question: We have been factoring in the 1e6 to 2e6 range. Is this suboptimal for mfaktc efficiency? Something like this was alluded to in the README. Yes, we actually read it! What interesting factoring projects are there, other than factoring 1e9 digit work? |
[QUOTE=Xyzzy;260800]What interesting factoring projects are there, other than factoring 1e9 digit work?[/QUOTE]
You could start taking 30M-40M to 2^69. It's [I]only[/I] 380THzd or some such. :big grin: |
[QUOTE=Xyzzy;260800]We are on a 30 day trial, which we suppose we could repeat every 30 days.
We want to get Linux up and running but we have been busy and we have trouble with the thought of going off-line while we could be doing so much work! :smile: Unrelated question: We have been factoring in the 1e6 to 2e6 range. Is this suboptimal for mfaktc efficiency? Something like this was alluded to in the README. Yes, we actually read it! What interesting factoring projects are there, other than factoring 1e9 digit work?[/QUOTE] I noticed your load was about 40M candidates per instance. Maybe you could set up mfaktc to work on more than one bit-level at a time: as TheJudger often says, mfaktc works better (and the CPU use would be lower) if you work on (say) 500M candidates. As for your next question, yesterday I started factoring with mfaktc in the 53M-60M range up to 72 bits, to try and avoid the P-1 factoring stage. With my GTX275 at half load it only takes 6-7 hours to complete the 3-bit range. Luigi |
[QUOTE=ET_;260812]With my GTX275 at half load it only takes 6-7 hours to complete the 3-bit range.[/QUOTE]
Only:smile: GPUs deserve better speeds |
[QUOTE=Xyzzy;260800]Unrelated question: We have been factoring in the 1e6 to 2e6 range. Is this suboptimal for mfaktc efficiency? Something like this was alluded to in the README. Yes, we actually read it![/QUOTE]
Bigger exponents have a lower CPU usage compared to smaller exponents. The sieve (CPU part) does not depend on the size of the exponent while the verification of each factor candidate (GPU part) depends on the size of the exponent. [QUOTE=ET_;260812]I noticed your load was about 40M candidates per instance. Maybe you could set up mfaktc to work on more than one bit-level at a time: as TheJudger often says, mfaktc works better (and the CPU use would be lower) if you work on (say) 500M candidates.[/QUOTE] If you replace "per instance" with "per class" this is correct. The number of tested factor candidates is the 2nd column in the per-class output. Efficiency of mfaktc increases with a higher number of candidates per class. 40M candidates per class is not much for mfaktc, but it is not really worth the trouble to change it. You might gain perhaps 5% more throughput for (much) bigger cases. Xyzzy: if you're running exponents and bit ranges of similar size you might try to manually set SievePrimes to an "optimal" value. Oliver |
Hello George,
[QUOTE=Prime95;260646]If you put any code under GPL v3 it will never be incorporated into prime95.[/QUOTE] Ignoring the license stuff: any idea how likely it is that you'll add (parts of) mfaktc to your prime95/mprime? Currently I'm the only copyright holder. I think I can re-release (parts of) mfaktc under other licenses. [QUOTE=Prime95;260649]It is open source with 2 exceptions: 1) The security code that makes it a tiny bit harder to forge results is not public. 2) You cannot use the code to find Mersenne primes unless you agree to abide by the GIMPS prize rules. The first restriction will preclude you from releasing under GPL.[/QUOTE] I understand that the GPL is not the ideal license for your code. For mfaktc the GPL was the most obvious license. GPL is widely known. I don't like licenses which allow distributing modifications without releasing the source code. [QUOTE=Christenson;260655]Oliver and George, please work out the correct legal basis. Upgraded specification attached.[/QUOTE] Are you willing to re-release the needed functions from primenet.[ch] under GPL v3 for the integration in mfaktc? Oliver |
[QUOTE=Karl M Johnson;260813]Only:smile:
GPUs deserve better speeds[/QUOTE] Yes, but I have 3 cores of mprime, 2 cores of LLR and one core of gmp-fermat already working together on my 4-core machine... :smile: Luigi |
[QUOTE=Prime95;260649]It is open source with 2 exceptions:
1) The security code that makes it a tiny bit harder to forge results is not public. 2) You cannot use the code to find Mersenne primes unless you agree to abide by the GIMPS prize rules. The first restriction will preclude you from releasing under GPL.[/QUOTE] George, Oliver and Christenson et al... Just putting out an idea which might eliminate the need for the "security code" to be released to the public, and thus allow the code in mfaktc which communicates with PrimeNet to be GPLed... As the value generated for the "Wd1:XXXXXXXX" or "Wd2:XXXXXXXX" fields in the response message is rather predictable for TFing (I won't go into further details, but those who do a lot of TFing will likely know what I'm talking about...), might this be an opportunity to change this value to be generated by a publicly known, but impossible to fake, (and inexpensive) process? I'm thinking (without having examined Oliver's code) of something like a cyclic sum of all the factor candidates tested between the "bit levels". The server wouldn't be able to tell if this was faked or not, but it would allow random "double checks" to spot cheaters. George and Oliver -- do you consider this suggestion to have any value? |
This is the sort of vision Christenson was looking for. Only problem I see with it is that it will vary with "SievePrimes", which controls the quality of the process to eliminate non-primes from consideration by the GPU, so the values of that and any changes will need to be reported. I don't see this as hard to generate from the mfaktc side, but getting mfaktc to duplicate a sieveprimes pattern is beyond the project scope at the moment. [Of course, this also ignores that the function is *ALWAYS* hard to compute, so we could release it to the public anyway]
|
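To illustrate why such a checksum would depend on SievePrimes, here is a toy sketch. It is not mfaktc's sieve or any real PrimeNet scheme: the function names, the tiny exponent, and the choice of a plain sum modulo 2^32 as the "cyclic sum" are all assumptions made for the example.

```python
def small_odd_primes(count):
    """Return the first `count` odd primes by trial division (fine for a demo)."""
    primes = []
    n = 3
    while len(primes) < count:
        if all(n % q for q in primes):
            primes.append(n)
        n += 2
    return primes

def candidate_checksum(p, k_max, sieve_primes):
    """Sum, modulo 2^32, of the candidates f = 2*k*p + 1 that survive the sieve.

    A candidate survives if f is 1 or 7 modulo 8 and is not divisible by
    any of the first `sieve_primes` odd primes (a hypothetical stand-in
    for the SievePrimes setting).
    """
    sieve = small_odd_primes(sieve_primes)
    total = 0
    for k in range(1, k_max + 1):
        f = 2 * k * p + 1
        if f % 8 not in (1, 7):
            continue
        if any(f % q == 0 for q in sieve):
            continue
        total = (total + f) % 2**32
    return total

shallow = candidate_checksum(101, 1000, 5)   # sieve with 3, 5, 7, 11, 13
deep = candidate_checksum(101, 1000, 50)     # sieve with the first 50 odd primes
print(shallow, deep, shallow != deep)
```

The two checksums differ because the deeper sieve discards candidates such as 12121 = 17 x 23 x 31 that the shallow sieve still passes on for testing, so a server-side double check would need to know the sieve depth that was used.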
[QUOTE=chalsall;260837]
Just putting out an idea which might eliminate the need for the "security code" to be released to the public, and thus allow the code in mfaktc which communicates with PrimeNet to be GPLed...[/QUOTE] I don't want to devote much in the way of resources to solve a problem that doesn't exist at the moment. The "security code" you refer to in results.txt is more of a checksum than a security code. It is only used in the manual results form. There is a different security mechanism associated with http traffic. I'm sure we can come up with something for licensing. @TheJudger: If I do incorporate parts of mfaktc in a future release (and I'm not opposed) then I'm sure we can work out the license issues. @Christenson: You can use primenet.c/h in GPL'ed code but without the security code. The downside to this is minimal (or can be made minimal). The server can be tweaked to accept non-security requests from mfaktc. The downside is that if someone builds a malicious mfaktc-clone, the server has no way to differentiate between the two - my only option would be to ignore requests from both the good mfaktc and the malicious one. If you want to build a mfaktc with our minimal security-through-secrecy scheme, then GPL won't work. |
George, would you be willing to release the checksum-generating code to trusted developers, provided that they don't make it public?
|
#$@!#%%^! "The token has expired, reload the window".
Gentlemen, I am half pragmatist and half idealist. I expect to crib public parts of prime95 code into mfaktc, especially, but not limited to, primenet.c/h in the first round. This will have exactly the same security problems as the current, manually submitted data from mfaktc. I'll turn the result over to Oliver and George and let them figure out what the license is. A future round can include security, which we can ideally make public by having mfaktc calculate something which is relatively easy if the actual trial factoring is done, but otherwise reasonably painful. I think the step is worthwhile, in that otherwise detecting cheating will be expensive. Remember, odds are, M(Xyzzy) has no factor from 2^69 to 2^72 for all reasonable Xyzzy. |
Hello George,
I think I understand your approach. :smile: I've thought about an open-source (GPL) version of mfaktc which can interact with the primenet server using some functions from mprime (primenet.[ch]) without the security module. Your approach is having a non-GPL version of mfaktc (binary distribution only) with the security code in it, right? If this is the case, then from my point of view:[LIST][*]I'll continue to develop mfaktc without primenet interaction under GPL[*]we can work out a binary distribution with the security code which is not GPL[/LIST] As I'm the only copyright holder of mfaktc I think I can easily release it under other licenses, too. Oliver |
If I can jump in on the mfaktc v prime95 discussion...
I have a feeling you guys are making it harder than it needs to be and at the same time developing a low-scalability solution. I think there doesn't need to be an 'all-encompassing license'. How about another idea... Prime95 gets an 'add-ons' menu, through which it launches additional executables while it fills out the worktodo.txt file and submits results from the results.txt file itself. So for example, one clicks Advanced -> add-ons, and this opens a screen with a list of current add-ons, with add & edit buttons. When you add an add-on, you specify the type of add-on (TF, LL, ECM, P-1 etc...), the path where everything is, then the exe (& exe command line options), worktodo, and results filenames. One could also specify affinity and priority options. Prime95 then checks the results and worktodo files once a day and processes them accordingly. This might need a little fleshing out, but it allows GPU code to be maintained and distributed completely separately. It also allows other modules to be developed completely independently (maybe a future OpenCL exe? or even wishful thinking - P-1 CUDA code). The only requirement is that the add-ons conform to existing results/worktodo formats. This would also free up Mr Prime95's time. It would only need to be done once, instead of every time someone thinks their code is worthwhile to add. License and submission details can be completely isolated from the public. I think this is a more scalable solution. -- Craig |
Craig's suggestion has merit, but, right now, I am simply going to get my feet wet with adding P95 code to mfaktc. Long-term, I am still concerned with coming up with some kind of residue that makes calculating an undetectable cheat a little harder than actually doing the work.
|
[QUOTE=TheJudger;260890]
I think I understand your approach. :smile: I've thought about an open-source (GPL) version of mfaktc which can interact with the primenet server using some functions from mprime (primenet.[ch]) without the security module. Your approach is having a non-GPL version of mfaktc (binary distribution only) with the security code in it, right?[/QUOTE] You understand perfectly! |
Looks like I messed up ... I uploaded some results, but apparently my session had timed out, so the credit went to Anonymous. In looking at the exponents assigned to me, they are all still there. Is there a way to move the credits or do I just need to unreserve the exponents so they are available for P-1?
Thanks ... |
[QUOTE=drh;260938]Looks like I messed up ... I uploaded some results, but apparently my session had timed out, so the credit went to Anonymous. In looking at the exponents assigned to me, they are all still there. Is there a way to move the credits or do I just need to unreserve the exponents so they are available for P-1?
Thanks ...[/QUOTE]I did the same. Just log in and then resubmit your results. You will get messages to the effect that the results are not needed but you will get the credit. Paul |
[QUOTE=xilman;260939]I did the same.
Just log in and then resubmit your results. You will get messages to the effect that the results are not needed but you will get the credit. Paul[/QUOTE] Thanks, will do this evening. Doug |
[QUOTE=drh;260940]Thanks, will do this evening.
Doug[/QUOTE] Well, it partially worked. The 4 exponents in question only had 1 of four mfaktc TF's accepted. Maybe it was because I had manually changed the bit assignments from 69-70 to 69-73 and ran them individually? The exponents have been released, and without an assignment code, they will not be accepted, unless there is a way around that ... down to 42 lost GHZ-days. Doug |
How about trying to reserve the same ones again by limiting the trial factoring range to what you are looking for?
|
[QUOTE=TheJudger;260709]Hello,
here is mfaktc 0.17! :smile:[LIST][*]Users with "fast CPU + slow GPU" can try to enable AllowSleep in mfaktc.ini. This allows the CPU to sleep instead of running a busyloop if the CPU has to wait for the GPU[*]replaced compiletime option "THREADS_PER_GRID_MAX" with the runtime option "GridSize" (mfaktc.ini)[LIST][*]maximum "threads per grid" is now 2[SUP]20[/SUP] (was 2[SUP]21[/SUP]), default value is the same as before[*]maximum SievePrimes increased from 100.000 to 200.000[*]lower GridSize might increase the usability of your GUI while running mfaktc at the cost of slightly lower performance[/LIST] [*]no change in performance compared to 0.16p1[/LIST][/QUOTE] Thanks Oliver, works like a charm! The increased SievePrimes adds about 5% performance for me as ~5% less FCs are to be evaluated. And CPU load is now just 30-50% of one core. I like it! B. PS: I now have some rudimentary OpenCL version using some of your code running on my ATI-GPU. It already found a few factors (proof of concept OK) but I just can't get it to utilize the GPU above ~10% ... work in progress. |
[QUOTE=vsuite;260974]How about trying to reserve the same ones again by limiting the trial factoring range to what you are looking for?[/QUOTE]
Tried that as well, but PrimeNet responded saying that those exponents were not available ... probably since I had taken them up to 73 bits, 2 more than PrimeNet was expecting. Oh well, lesson learned. |
In discussion with Prime95, GW told me that although mfaktc splits results to single bit levels, the server will take a result with multiple bit levels. I haven't encountered the need to do that, but I've also been reporting mfaktc results on relatively fresh logins.
|
Tonight while I was uploading some more results, I got a timeout. Eventually, almost everything got uploaded correctly, except this one exponent ...
no factor for M76120789 from 2^69 to 2^70 [mfaktc 0.16p1-Win barrett79_mul32]
no factor for M76120789 from 2^70 to 2^71 [mfaktc 0.16p1-Win barrett79_mul32]
no factor for M76120789 from 2^71 to 2^72 [mfaktc 0.16p1-Win barrett79_mul32]
M76120789 has a factor: 5666251615825315865351
found 1 factor(s) for M76120789 from 2^72 to 2^73 [mfaktc 0.16p1-Win barrett79_mul32]
I got credit for everything except the factor ... Guess I can reserve it again and run the bit level again? Update: can't reserve that exponent ... no assignment available. Thanks, Doug |
[QUOTE=drh;261372]
M76120789 has a factor: 5666251615825315865351 found 1 factor(s) for M76120789 from 2^72 to 2^73 [mfaktc 0.16p1-Win barrett79_mul32] I got credit for everything except the factor ... Guess I can reserve it again and run the bit level again? Update: can't reserve that exponent ... no assignment available. Thanks, Doug[/QUOTE] False positive maybe? You didn't get credit as it's a false result? You can try to submit those 2 lines again manually. I have a 'free spot' on my GPU in 20mins, I'll run the test and see what result I get. I'll post result here. I won't attempt to get credit. -- Craig |
[QUOTE=nucleon;261378]False positive maybe?[/QUOTE]
5666251615825315865351 = 2×5^2×7×11×13×83×17918861×76120789+1. That was easy. [QUOTE=drh;261372]I got credit for everything except the factor ... Guess I can reserve it again and run the bit level again? Update: can't reserve that exponent ... no assignment available. [/QUOTE] You will get credit for the manual submission of any [I]new[/I] factors even if the exponent in question is not assigned to you. You will also receive no-factor credit if the exponent is assigned to you for something [I]other than TF[/I]. |
Okay I got same answer
M76120789 has a factor: 5666251615825315865351 found 1 factor(s) for M76120789 from 2^72 to 2^73 (partially tested) [mfaktc 0.16-Win barrett79_mul32] -- Craig |
[QUOTE=drh;261372]Tonight while I was uploading some more results, I got a timeout. Eventually, almost everything got uploaded correctly, except this one exponent ...
no factor for M76120789 from 2^69 to 2^70 [mfaktc 0.16p1-Win barrett79_mul32] no factor for M76120789 from 2^70 to 2^71 [mfaktc 0.16p1-Win barrett79_mul32] no factor for M76120789 from 2^71 to 2^72 [mfaktc 0.16p1-Win barrett79_mul32] M76120789 has a factor: 5666251615825315865351 found 1 factor(s) for M76120789 from 2^72 to 2^73 [mfaktc 0.16p1-Win barrett79_mul32] I got credit for everything except the factor ... Guess I can reserve it again and run the bit level again? Update: can't reserve that exponent ... no assignment available. Thanks, Doug[/QUOTE] Have you tried to just resubmit the result -- just the factor line? It is a legitimate result. No need to redo the range. |
[QUOTE=drh;261372]Tonight while I was uploading some more results, I got a timeout. Eventually, almost everything got uploaded correctly, except this one exponent ...
no factor for M76120789 from 2^69 to 2^70 [mfaktc 0.16p1-Win barrett79_mul32] no factor for M76120789 from 2^70 to 2^71 [mfaktc 0.16p1-Win barrett79_mul32] no factor for M76120789 from 2^71 to 2^72 [mfaktc 0.16p1-Win barrett79_mul32] M76120789 has a factor: 5666251615825315865351 found 1 factor(s) for M76120789 from 2^72 to 2^73 [mfaktc 0.16p1-Win barrett79_mul32] I got credit for everything except the factor ... Guess I can reserve it again and run the bit level again? Update: can't reserve that exponent ... no assignment available. Thanks, Doug[/QUOTE] Just try again to submit the factor. If you ran into a timeout during upload of results then the primenet server typically missed some of your results. [QUOTE=nucleon;261378]False positive maybe? [/QUOTE] Luigi's (ET_) defactor (using libgmp), with slight modifications, agrees on the factor. :smile: [CODE]o@Lysithea:~> ./defactor.exe 76120789 5666251615825315865351
The factor 5666251615825315865351 divides 2^76120789 -1!
K = 37218818211575
The factor 5666251615825315865351 is probably prime
[/CODE] Oliver |
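The same check can be sketched without libgmp, since Python integers are arbitrary precision. This is not defactor's source, just a minimal illustration; the function name is made up, and it relies on the fact that f divides 2^p - 1 exactly when 2^p is congruent to 1 modulo f.

```python
def check_mersenne_factor(p, f):
    """Return k if f divides 2^p - 1 and has the form 2*k*p + 1, else None."""
    # Modular exponentiation keeps the numbers small: 2^p is never built in full.
    if pow(2, p, f) != 1:
        return None
    k, remainder = divmod(f - 1, 2 * p)
    return k if remainder == 0 else None

k = check_mersenne_factor(76120789, 5666251615825315865351)
print("K =", k)
```

The returned k should match the K value in the defactor output above. The sketch does not test whether f itself is prime, which defactor also reports.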
I've tried quite a few times to upload it again, with no success.
When I upload the file, the screen just sits there until I get a partial screen returned, like it times out. Everything else seems to be working fine, other than the database not being available off and on this morning. |
[QUOTE=drh;261399]I've tried quite a few times to upload it again, with no success.
When I upload the file, the screen just sits there until I get a partial screen returned, like it times out. Everything else seems to be working fine, other than the database not being available off and on this morning.[/QUOTE] Don't upload the whole file again & again!! Just copy the single line and paste it into the manual upload form |
[QUOTE=axn;261401]Don't upload the whole file again & again!! Just copy the single line and paste it into the manual upload form[/QUOTE]
After the first upload, I deleted all the other lines, so I was left with only the two lines related to the factor ... M76120789 has a factor: 5666251615825315865351 found 1 factor(s) for M76120789 from 2^72 to 2^73 [mfaktc 0.16p1-Win barrett79_mul32] I'll try to cut and paste it into the form and see if that works. |
[QUOTE=drh;261403]After the first upload, I deleted all the other lines, so I was left with only the two lines related to the factor ...
M76120789 has a factor: 5666251615825315865351 found 1 factor(s) for M76120789 from 2^72 to 2^73 [mfaktc 0.16p1-Win barrett79_mul32] I'll try to cut and paste it into the form and see if that works.[/QUOTE] Still no luck using the form ... same result, partial screen returned. |
[QUOTE=drh;261404]Still no luck using the form ... same result, partial screen returned.[/QUOTE]
Hmmm... Apparently, the issue is at the server itself. I guess you'll have to wait till it is up & running again. See: [url]http://www.mersenneforum.org/showthread.php?t=15602[/url] |
[QUOTE=axn;261405]Hmmm... Apparently, the issue is at the server itself. I guess you'll have to wait till it is up & running again. See: [URL]http://www.mersenneforum.org/showthread.php?t=15602[/URL][/QUOTE]
I'll try it once in a while throughout the day. Thanks, Doug |
[url]http://mersenneforum.org/showpost.php?p=261409&postcount=3[/url]
|
[QUOTE=axn;261405]Hmmm... Apparently, the issue is at the server itself. I guess you'll have to wait till it is up & running again. See: [url]http://www.mersenneforum.org/showthread.php?t=15602[/url][/QUOTE]
Yep, there's definitely an issue with submission of factor results. No factor results work perfectly. I have now 2x factor results ready for submission when it gets back. I guess we'll have to wait until the server admins get a chance to look at it. -- Craig |
The server issue is now fixed...exact cause unspecified.....
|
Got to thinking about how to structure the mfaktc interaction with primenet tonight, looking at how my "results.txt" gets hand marked-up about what's been reported or not.
The conclusion is that the output side should be a queue -- basically, a machine-edited version of what is in results.txt (call it results.queue) that the primenet thread/task looks at to decide what to tell the server, and deletes the front of when it is done. This allows mfaktc to be stopped and started as needed. I'm sure Prime95 can tell me how big the largest exponent I can report on is, so anything done for Operation Billion Digits can be dropped and reported manually. In addition, when the primenet task succeeds in reporting a result, it writes a line about the success to results.txt, unless a key tells it not to. Finally, this is in addition to any sort of log that the primenet task might create detailing all of its travails. This has the huge advantage of de-coupling the problem of input from the problem of output. |
[QUOTE=Christenson;261664]Got to thinking about how to structure the mfaktc interaction with primenet tonight, looking at how my "results.txt" gets hand marked-up about what's been reported or not.
The conclusion is that the output side should be a queue -- basically, a machine-edited version of what is in results.txt (call it results.queue) that the primenet thread/task looks at to decide what to tell the server, and deletes the front of when it is done. This allows mfaktc to be stopped and started as needed. I'm sure Prime95 can tell me how big the largest exponent I can report on is, so anything done for Operation Billion Digits can be dropped and reported manually. In addition, when the primenet task succeeds in reporting a result, it writes a line about the success to results.txt, unless a key tells it not to. Finally, this is in addition to any sort of log that the primenet task might create detailing all of its travails. This has the huge advantage of de-coupling the problem of input from the problem of output.[/QUOTE] Prime95 uses a "prime.spl" (spool file) for the results. This file should (must) be secured against modifications. A human-readable results.txt plus a machine-readable spool file is a good idea. For mfaktc one idea would be a function which adds a result to the spool file, together with a flag saying whether or not it is the last result (last bit level) for the specified exponent. If it is the last result it is safe to send the result. Oliver |
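A file-backed queue along these lines could be sketched as follows. This is an illustration only and not how prime.spl works: the file name, the plain-text format, and the `send` callback (a stand-in for the real PrimeNet submission) are all hypothetical.

```python
import os

SPOOL = "results.spl"   # hypothetical spool file name

def spool_add(line, spool=SPOOL):
    """Append one result line to the spool and force it to disk."""
    with open(spool, "a", encoding="utf-8") as fh:
        fh.write(line.rstrip("\n") + "\n")
        fh.flush()
        os.fsync(fh.fileno())

def spool_flush(send, spool=SPOOL):
    """Try to send queued lines in order; keep whatever was not sent.

    `send` is a caller-supplied function returning True on success.
    Returns the number of lines that were sent.
    """
    if not os.path.exists(spool):
        return 0
    with open(spool, encoding="utf-8") as fh:
        pending = [ln.rstrip("\n") for ln in fh if ln.strip()]
    sent = 0
    for line in pending:
        if not send(line):
            break            # stop at the first failure and retry later
        sent += 1
    # Write the remainder to a temp file, then swap it in atomically.
    tmp = spool + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.writelines(ln + "\n" for ln in pending[sent:])
    os.replace(tmp, spool)
    return sent
```

If the program dies after a successful send but before the spool is rewritten, the line would be sent again on the next run; as noted earlier in the thread, the server answers a resubmission with a "not needed" message, so this sketch errs on the side of sending twice rather than losing a result.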
I was thinking more along the lines of using mfaktc's internal data for the purpose of determining whether an exponent was finished and its result ready to send. Results.spl is a reasonable name for it.
I also didn't see any signal handler in mfaktc when I was reading the source last night. I take it that means that when I press ctrl-c, we are depending on the OS to shut the program down and close any open checkpoint file? (This doesn't make me feel very safe). |
[QUOTE=Christenson;261687]I was thinking more along the lines of using mfaktc's internal data for the purpose of determining whether an exponent was finished and its result ready to send. Results.spl is a reasonable name for it.
I also didn't see any signal handler in mfaktc when I was reading the source last night. I take it that means that when I press ctrl-c, we are depending on the OS to shut the program down and close any open checkpoint file? (This doesn't make me feel very safe).[/QUOTE] worktodo.txt, results.txt, mfaktc.ini and checkpoint files are opened right before and closed after being accessed. So the chance of this is quite low because most of the time none of the files above is open. Oliver |
[QUOTE=TheJudger;261689]worktodo.txt, results.txt, mfaktc.ini and checkpoint files are opened right before and closed after being accessed. So the chance of this is quite low because most of the time none of the files above is open.
Oliver[/QUOTE] This is perfectly good practice, but when I have 30 or 40 hours work involved, I don't like nonzero chances where I don't have to take them! A signal catcher (for ^C) would simply set a variable so that a new class would not be started, and return to the main program to allow it to close files and exit properly. E |
Got to thinking about this further, as my Xubuntu PC crashed the other night. We want to do multiples of the checkpoint files, like P95 does, because the other way to lose a bunch of work is to have the computer as a whole crash, whether due to power loss or O/S misbehavior.
Found out the Robotics team was given a GEFORCE GT480 card, but we didn't get to checking power supplies and probably need a second monitor for it....that'll really do some damage! |
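Keeping several checkpoint generations can be sketched like this. It is a hedged illustration, not Prime95's or mfaktc's actual checkpoint code; the `.1`, `.2` suffix naming and the function name are assumptions.

```python
import os

def write_checkpoint(path, data, keep=2):
    """Write `data` (a str) to `path`, keeping up to `keep` older copies.

    Older copies are named path.1, path.2, ... (hypothetical naming).
    """
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())      # make sure the bytes reach the disk
    # Shift existing copies: path.1 -> path.2, then path -> path.1
    for i in range(keep, 0, -1):
        older = path if i == 1 else f"{path}.{i - 1}"
        if os.path.exists(older):
            os.replace(older, f"{path}.{i}")
    os.replace(tmp, path)          # swap the new checkpoint into place
```

A reader of these files would try `path` first and fall back to `path.1`, then `path.2`, if the newest file is missing or fails a sanity check, which covers a crash in the short window between the shift and the final swap.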
Hi,
[QUOTE=Christenson;261735]This is perfectly good practice, but when I have 30 or 40 hours work involved, I don't like nonzero chances where I don't have to take them! A signal catcher (for ^C) would simply set a variable so that a new class would not be started, and return to the main program to allow it to close files and exit properly. E[/QUOTE] what about[LIST][*]first time ^C received: stop after the current class has finished and checkpoint file is written[*]second time ^C received: stop immediately without taking care of closing files, etc.[/LIST] Should be really easy to implement. :smile: Oliver P.S. hopefully this is not OS dependent, I never did signal handling on Windows... |
[QUOTE=TheJudger;262012]hopefully this is not OS dependent, I never did signal handling on Windows...[/QUOTE]I'm not sure if Windows treats it differently if you hit Ctrl-C on the keyboard vs clicking the [X] to close the window?
In either case, on receiving the first break command, something should immediately appear onscreen to the effect of "break command received, stopping after next class (hit Ctrl-C again to exit immediately)". |
I know some CL applications which can't be closed by Ctrl+C - instead, they print ♥ and continue working :smile:
|
I intend to be lazy about this and copy whatever method P95 is using, which is that he catches a signal and sets a variable telling things they need to exit.
I think that it is an awful idea not to be able to tell a program, any program to stop. The inability to stop a program in an operating system indicates a problem with the operating system. I like Oliver's suggestion, except that I think all files should get closed on that second ^C. (Remember that it is going to be reasonably hard to hit ^C in the middle of writing a file anyway). If we really need an asynchronous, middle-of-writing-files kill, that is what the OS is for. |
[QUOTE=Christenson;262038]I think that it is an awful idea not to be able to tell a program, any program to stop. The inability to stop a program in an operating system indicates a problem with the operating system.[/QUOTE]On all Unix-like systems I know of, sending the KILL signal is instant death for the process receiving it. That's not quite the same as removing the process from the process table because zombie processes are not removed by a KILL signal; however they are already dead for all practical purposes.
Don't know about the situation under other operating systems. Paul |
Win32 has a limited range of signals you can install a handler for; even then, the handler runs [i]once[/i] only. Hitting Ctrl-C will run your SIGINT handler; hitting it again will kill your program, without running the handler.
|
[QUOTE=xilman;262060]On all Unix-like systems I know of, sending the KILL signal is instant death for the process receiving it. That's not quite the same as removing the process from the process table because zombie processes are not removed by a KILL signal; however they are already dead for all practical purposes.[/QUOTE]
Any *nix which follows the POSIX.1 standard (all modern *nix) has two signals which cannot be "caught" or "ignored": SIGKILL and SIGSTOP. SIGKILL stops execution and releases all resources(*), including closing any open files. SIGSTOP simply stops execution without releasing resources; the process can be resumed later with SIGCONT. (*) - As xilman said above, a "zombie" process can be left behind by a SIGKILL, but only if it was a "forked" process and its parent process is still running and hasn't acknowledged the SIGCHLD signal generated to indicate its death. In this case the only resource still consumed is the Process structure entry -- a few kbytes at most. (Trivia: since there are a finite number of Process structures allowed in a system at any one time (often 65535), this can be used as a denial of service attack known as a "Wabbit".) |
[QUOTE=Christenson;261735]This is perfectly good practice, but when I have 30 or 40 hours work involved, I don't like nonzero chances where I don't have to take them!
[/QUOTE] I played around with the current code on Windows. I stopped the process in a debugger in interesting places (e.g. between writing the checkpoint and closing the file), and then killed the process. As the write is atomic (a single line shorter than a disk block), I either had the file content from before the write, or the new contents. No corruption. (I know this is no proof, just a test aimed at breaking it.) Therefore I think that no special signal handling is necessary - as a precaution, one or two backup copies of the checkpoint file are sufficient. prime95 may be a different story, as the save files there can be much bigger, so that writes to the file are not atomic. And there I have already seen a few times that the latest save file was not usable. B. |
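The "one or two backup copies" idea can be sketched in C roughly as follows. This is a minimal sketch, not mfaktc's actual code: the function names `save_checkpoint` and `load_checkpoint`, the `.tmp`/`.bak` suffixes and the fixed 512-byte path buffers are all assumptions made for illustration. The new state goes to a scratch file first, the previous checkpoint is rotated to a backup name, and only then is the scratch file moved into place.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from mfaktc): write one checkpoint line to
 * `path`, keeping the previous checkpoint as "<path>.bak". */
int save_checkpoint(const char *path, const char *line)
{
    char tmp[512], bak[512];
    FILE *f;

    assert(path != NULL && line != NULL);
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp ||
        snprintf(bak, sizeof bak, "%s.bak", path) >= (int)sizeof bak)
        return -1;                       /* path too long for the buffers */

    /* 1. Write the new state to a scratch file first. */
    f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%s\n", line) < 0 || fflush(f) != 0) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }

    /* 2. Rotate the current checkpoint (if any) to the backup name.
     *    remove() comes first because rename() onto an existing file
     *    fails on Windows. Errors are ignored: on the very first save
     *    there is no old checkpoint to rotate. */
    remove(bak);
    rename(path, bak);

    /* 3. Move the scratch file into place. */
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Read the first line of a checkpoint into buf; returns 0 on success. */
int load_checkpoint(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';      /* strip the trailing newline */
    return 0;
}
```

On restart, a caller would try `load_checkpoint` on the main file and fall back to the `.bak` file if that fails. Note that between steps 2 and 3 the main file briefly does not exist, which is exactly the case the fallback covers. |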
[QUOTE=jasonp;262069]Win32 has a limited range of signals you can install a handler for; even then, the handler runs [I]once[/I] only. Hitting Ctrl-C will run your SIGINT handler; hitting it again will kill your program, without running the handler.[/QUOTE]
Fine for purposes of mfaktc, but what about those Windows programs that simply won't be shut down, especially as you TRY to shut down the system nicely? |
[QUOTE=Christenson;262115]Fine for purposes of mfaktc, but what about those Windows programs that simply won't be shut down, especially as you TRY to shut down the system nicely?[/QUOTE]
Oh, you mean the viruses, worms and other malware? [QUOTE=Christenson;262115]The inability to stop a program in an operating system indicates a problem with the operating system.[/QUOTE] You said it, not me.... :smile: |
Hi Jason,
[QUOTE=jasonp;262069]Win32 has a limited range of signals you can install a handler for; even then, the handler runs [i]once[/i] only. Hitting Ctrl-C will run your SIGINT handler; hitting it again will kill your program, without running the handler.[/QUOTE] Is this specific to Win32? On my Windows 7 64-bit box I can run it multiple times. The difference from Linux, where the handler stays installed (BSD style), is that on Windows you have to register your signal handler again, e.g. [CODE]
#include <signal.h>

void my_signal_handler(int signum) /* very simple signal handler */
{
#ifdef WINDOWS
  signal(signum, &my_signal_handler); /* re-register: Windows resets the handler after each signal */
#endif
  /* ... */
}

int main()
{
  signal(SIGINT, &my_signal_handler);
  /* ... */
}
[/CODE] Of course this is not perfect... Oliver |
Oliver:
Let's do a simple sample program, "sigtest", that sleeps and says either "don't touch me, I'm sleeping!" or "It's OK, I'm awake" on Ctrl-C, and then says "exiting" on the second Ctrl-C. All printing is from the main loop. I can test Linux64 and Win32. E. |
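The core of such a "sigtest" could look like the C sketch below. It is only an illustration under stated assumptions: the names `install_break_handler`, `poll_break` and the `break_action` enum are made up here, and the behaviour on Windows has not been verified. The handler does nothing but count break requests and re-register itself, so all printing and file handling stays in the main loop, as proposed.

```c
#include <assert.h>
#include <signal.h>

/* Number of break requests seen so far. This is the only state the
 * handler touches. */
static volatile sig_atomic_t break_requests = 0;

static void on_break(int signum)
{
    /* Re-register first: some platforms reset the handler to the
     * default after each delivery. Where the handler stays installed
     * anyway, this is harmless. */
    signal(signum, on_break);
    break_requests++;
}

/* Install the handler for Ctrl-C; returns 0 on success, -1 on error. */
int install_break_handler(void)
{
    return signal(SIGINT, on_break) == SIG_ERR ? -1 : 0;
}

typedef enum { KEEP_RUNNING, STOP_AFTER_CLASS, STOP_NOW } break_action;

/* Called from the main loop, e.g. between classes and between kernel
 * launches, to decide what to do next. */
break_action poll_break(void)
{
    sig_atomic_t n = break_requests;

    if (n == 0)
        return KEEP_RUNNING;
    if (n == 1)
        return STOP_AFTER_CLASS;   /* first ^C: finish class, write checkpoint */
    return STOP_NOW;               /* second ^C: leave as soon as possible */
}
```

A main loop would call `install_break_handler()` once, then check `poll_break()` at each point where stopping is safe, print the "break received" message on the first `STOP_AFTER_CLASS`, and close its files before exiting on `STOP_NOW`. Whether the second ^C should still close files or exit on the spot is the open question from the posts above; this sketch leaves that decision to the caller. |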
I've been playing on my new computer with mfaktc for a week or two; right now I'm at 311 candidates tested. Most of them from 69 to 70 bits, and a small batch from 68 to 69 bits. And I haven't found a single factor yet, where on average I should have found somewhere around 4 or 5.
I'm starting to think there might be some problem with my hardware: i5-2500k not overclocked with a GTX 560, only running one instance of mfaktc. I'm testing exponents in the 54-55XXXXXX range. Does anyone have a candidate in that range with a 69-bit factor? It would help me have more confidence in my hardware. Or maybe someone already tested to that bit level in that range while only submitting the factors found? Thanks in advance, |
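How unlucky is a run of 311 tests without a factor? A rough check can be sketched in C, assuming the usual rule of thumb that a candidate has a factor between 2^(b-1) and 2^b with probability of about 1/b, and assuming the tests are independent. Both are assumptions of this sketch, and the function names are made up for illustration.

```c
#include <assert.h>

/* Rule of thumb (an assumption here, not a measurement): the chance
 * that a candidate has a factor between 2^(bits-1) and 2^bits is
 * about 1/bits. */
double expected_factors(int candidates, int bits)
{
    assert(candidates >= 0 && bits > 0);
    return (double)candidates / bits;
}

/* Probability that none of `candidates` independent tests finds a
 * factor, i.e. (1 - 1/bits) raised to the power `candidates`. */
double prob_no_factor(int candidates, int bits)
{
    double p_miss;
    double p = 1.0;
    int i;

    assert(candidates >= 0 && bits > 1);
    p_miss = 1.0 - 1.0 / bits;
    for (i = 0; i < candidates; i++)
        p *= p_miss;
    return p;
}
```

With 311 candidates at 70 bits, `expected_factors` gives about 4.4 and `prob_no_factor` comes out at roughly 1%. Under these assumptions an empty run is unlikely but not impossible, so a selftest or a known-factor exponent is a sensible next check. |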
[QUOTE=diamonddave;262534]I've been playing on my new computer with mfaktc for a week or two; right now I'm at 311 candidates tested. Most of them from 69 to 70 bits, and a small batch from 68 to 69 bits. And I haven't found a single factor yet, where on average I should have found somewhere around 4 or 5.
I'm starting to think there might be some problem with my hardware: i5-2500k not overclocked with a GTX 560, only running one instance of mfaktc. I'm testing exponents in the 54-55XXXXXX range. Does anyone have a candidate in that range with a 69-bit factor? It would help me have more confidence in my hardware. Or maybe someone already tested to that bit level in that range while only submitting the factors found? Thanks in advance,[/QUOTE] Could you post the candidate list you tried? Thanks, Vincent |
1 Attachment(s)
[QUOTE=diep;262536]Could you post the candidate list you tried?
[/QUOTE] Sure thing |
[QUOTE=diamonddave;262534]I've been playing on my new computer with mfaktc for a week or two; right now I'm at 311 candidates tested. Most of them from 69 to 70 bits, and a small batch from 68 to 69 bits. And I haven't found a single factor yet, where on average I should have found somewhere around 4 or 5.
I'm starting to think there might be some problem with my hardware: i5-2500k not overclocked with a GTX 560, only running one instance of mfaktc. I'm testing exponents in the 54-55XXXXXX range. Does anyone have a candidate in that range with a 69-bit factor? It would help me have more confidence in my hardware. [/QUOTE] First of all, you could run the full selftest (mfaktc.exe -st). M54789601 has one known factor between 2[SUP]67[/SUP] and 2[SUP]68[/SUP]. M54790157 has one known factor between 2[SUP]69[/SUP] and 2[SUP]70[/SUP]. Hint: [url]http://mersenne.org/report_exponent/[/url] Oliver |
It's worth noting that the ranges we are now getting for TF from primenet are now beyond optimal for CPU-only TF, so we can expect the AVERAGE GHz-days effort per factor found to match or exceed that of 2 LL tests, or about O(200) GHz days for exponents in the 50M range.
|
This morning we discovered that two of our eight workers had transferred all of their remaining worktodo.txt contents to their results.txt files and thus were idle.
:mike: |
[QUOTE=Xyzzy;263107]This morning we discovered that two of our eight workers had transferred all of their remaining worktodo.txt contents to their results.txt files and thus were idle.
:mike:[/QUOTE]Yeah, I get that all too often. The GPUs chew through work at such a frightening rate that even though I ask for 200 assignments at a time there's still a risk of running dry. :sad: My other system, the one with a C1060, was upgraded to Fedora 15 a few days ago, which killed the CUDA installation. Also :sad: Until it's fixed it will be reduced to running msieve. Paul |
We usually reserve 500 or more assignments per core.
What we mentioned above is that the client actually moved the worktodo.txt assignments directly to the results.txt file, without doing them. (The results.txt file looked like a worktodo.txt file.) We are not sure if you are agreeing that the client moves stuff around which causes your queue to run dry or if you are just saying that your queue is running dry because the GPU is so fast. Or both? :max: |
[QUOTE=Xyzzy;263117]What we mentioned above is that the client actually moved the worktodo.txt assignments directly to the results.txt file, without doing them. (The results.txt file looked like a worktodo.txt file.)
We are not sure if you are agreeing that the client moves stuff around which causes your queue to run dry or if you are just saying that your queue is running dry because the GPU is so fast. Or both? :max:[/QUOTE]I misunderstood the point you made in your first quoted para. I've never seen that behaviour. My queue does indeed run dry too often because the GPU is so fast. |
FWIW, Fish1 is investigating the possibility of a user error in this situation.
[SIZE=1]Snake1: They are going to blame me for this mess![/SIZE] |
XYZZY, please let me know if it was a layer 8 problem or not. :wink:
Oliver |
[QUOTE=xilman;263115]Yeah, I get that all too often. The GPUs chew through work at such a frightening rate that even though I ask for 200 assignments at a time there's still a risk of running dry.[/QUOTE]</shame>Try running some of the expos in the 100M digit range up to 80 or 81 bits.<shame>
|
Just bump up your bit level...by 1 will keep you busy, by 10 will keep you busy all year!
|
[QUOTE=Uncwilly;263136]</shame>Try running some of the expos in the 100M digit range up to 80 or 81 bits.<shame>[/QUOTE]I take what the server gives me. The organizers of the project are much more likely to know better than I what my resources should be doing.
Paul |
[QUOTE=xilman;263160]I take what the server gives me. The organizers of the project are much more likely to know better than I what my resources should be doing.[/QUOTE]The PrimeNet settings don't take your GPU hardware into consideration. If you want to work on the expos that PrimeNet is handing out, then I would suggest that you do the following:
a) get an allotment from the server, b) find out how far they are normally to be taken ([url]http://mersenne-aries.sili.net/factorbits.php[/url]), c) add 2 to that number, d) replace, in your worktodo, the stop bit that you were handed with the new number. That is effectively what George says is OK for GPUs, and it should make your worktodo last much longer. |
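Step d) is easy to script. The C sketch below assumes that the stop bit is the last comma-separated field of a worktodo line, as in a line of the form "Factor=54790157,69,70". That layout is an assumption of this sketch, the bit values in the example are invented, and `set_stop_bit` is a hypothetical helper, so check it against your own worktodo.txt before relying on it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed line layout (not taken from the thread): the stop bit is the
 * last comma-separated field, e.g. "Factor=54790157,69,70".
 * Writes the line with the stop bit replaced by new_stop into `out`.
 * Returns 0 on success, -1 if the line has no comma, new_stop is not
 * positive, or the result does not fit into `out`. */
int set_stop_bit(const char *line, int new_stop, char *out, size_t outlen)
{
    const char *last_comma;
    size_t prefix;
    int n;

    assert(line != NULL && out != NULL);
    last_comma = strrchr(line, ',');
    if (last_comma == NULL || new_stop <= 0)
        return -1;                         /* not a line we understand */

    /* Keep everything up to and including the last comma. */
    prefix = (size_t)(last_comma - line) + 1;
    n = snprintf(out, outlen, "%.*s%d", (int)prefix, line, new_stop);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, `set_stop_bit("Factor=54790157,69,70", 72, out, sizeof out)` leaves "Factor=54790157,69,72" in `out`. The caller is expected to strip the trailing newline before the call and add it back when writing the file. |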
[QUOTE=xilman;263160]I take what the server gives me. The organizers of the project are much more likely to know better than I what my resources should be doing.
Paul[/QUOTE] I thought, to some degree, you were a project organizer.... However, IMO, the project is advanced by maximizing the number of exponents eliminated for minimum effort....even if there is some disagreement on how best to measure that effort. I'm running about 1 in 30 successful TFs right now, taking perhaps an hour apiece on the wall clock. So for my one or two days' compute effort, I eliminate approximately one exponent. This compares quite favorably with my P-1 efforts, which take 50-60 GHz-days to eliminate an exponent, and significantly more than a day to get those 60 GHz-days of work in on a CPU. We are working, slowly, on the automatic interactions.... |
[QUOTE=Christenson;263172]So for my one or two days' compute effort, I eliminate approximately one exponent. This compares quite favorably with my P-1 efforts, which take 50-60 GHz-days to eliminate an exponent, and significantly more than a day to get those 60 GHz-days of work in on a CPU.[/QUOTE]
That's roughly what I'm doing, though the rate is probably closer to two factors a day. [QUOTE=Christenson;263172] We are working, slowly, on the automatic interactions....[/QUOTE]Good! The sooner it arrives the better. I'd much prefer a fire-and-forget solution to having to remember to do all the babysitting. If I also have to faff around editing input files to compensate for a present inadequacy of the task allocation strategy, then it's quite likely that my GIMPS contribution will fall to zero. Of course, if uncwilly would rather have no contribution in favour of his desired pattern of contribution ... Paul |
I suspect your two factors per day is a reflection of your "better" GPU -- mine's "only" a mid-range card, a GTX440. As for bumping up the bit level, it sounds nice in theory, but actual project progress is made by finding factors at least effort and eliminating exponents from LL tests. Breadth-first, rather than depth-first, does the best job of that, and most of us find faffing about more expensive than copying and pasting a bit more.
|
[QUOTE=Christenson;263195]I suspect your two factors per day is a reflection of your "better" GPU -- mine's "only" a mid-range card, a GTX440. As for bumping up the bit level, it sounds nice in theory, but actual project progress is made by finding factors at least effort and eliminating exponents from LL tests. Breadth-first, rather than depth-first, does the best job of that, and most of us find faffing about more expensive than copying and pasting a bit more.[/QUOTE]It's a GTX460.
I agree with your cost-benefit analysis. Paul |
[QUOTE=xilman;263181]Of course, if uncwilly would rather have no contribution in favour of his desired pattern of contribution ...[/QUOTE]
By no means. I am just offering some ideas..... Your GPU is besting all of my borg boxen. |
(possibly common feature for any GPU programs)
I have found an interesting "feature" (it is probably not a feature of mfaktc, but of CUDA and/or specific CUDA calls... maybe there exists a bypass for it).
1. mfaktc runs fine. It finishes its chunk and stops. 2. you connect to the same computer using remote desktop (in my case from a 32-bit XP console to a 64-bit Win7), replenish worktodo.txt 3. you try to run the binary and you get [CODE]$ ./mfaktc-win-64.exe mfaktc v0.17-Win (64bit built) cudaSetDevice(0) failed cudaGetLastError() returned 38: no CUDA-capable device is detected [/CODE] 4. you go to the computer physically, and then mfaktc again runs fine. 5. (another detail: to make matters possibly more convoluted, this particular remote connection is over VPN) Is there a way to tell CUDA to query the local hardware, not that of the remote console's? Granted, both computers do some tricks in remote desktop mode: for example, both remote and local printers are available, etc. Same (?) goes for the graphics device(s)? So some expertise in what Windows actually does for implementing remote connection is needed (I don't have any such expertise). Knowledgeable people, please help? P.S. "./mfaktc-win-64.exe -d 1", "-d 2" etc fail all the same |
I've done a bit of googling on this.
Apparently there's no workaround for non-Tesla cards. The Tesla drivers allow this to work. This is using mstsc to remote into the box. VNC, on the other hand, works but is incredibly slow. I find the best way is to use mstsc to log on to the remote box and set things up (open a cmd window, type the command but don't press enter, etc...). Then VNC in, just to press enter and make sure it loads, then log in again with mstsc. -- Craig |
possibly write a program that runs in an infinite loop, looking for "xyzzy" in the process table..when it sees it, it launches mfaktc if it's not already also in the process table. Start locally. Xyzzy is just sleep(100) or something.
|
[QUOTE=Christenson;263172]
I'm running about 1 in 30 successful TFs right now,[/QUOTE] In what range are the exponents? One extra bit or more? 1/30 doesn't exactly tally with the 1/70 probability of a factor between 2^70 and 2^71. David |
[QUOTE]In what range are the exponents?
One extra bit or more?[/QUOTE]1:115 here. 53M range. 69 to 70 or 70-71. The sample size is > 9000 tested. :sad: |
[QUOTE=Christenson;263640]possibly write a program that runs in an infinite loop, looking for "xyzzy" in the process table..when it sees it, it launches mfaktc if it's not already also in the process table. Start locally. Xyzzy is just sleep(100) or something.[/QUOTE]
I've already tried launching "sleep 30; mfaktc" in a Cygwin shell and then disconnect. Some time later you connect and see the same result. My guess is that the other driver is not initiated (lazily, which is I guess usually a good thing), until I would go and login there physically. __________ [SIZE=1][COLOR=blue]Xyzzy's fish 1 says: "Rirelbar nyjnlf oynzrf zr, jgs?!"[/COLOR][/SIZE] |
[QUOTE=xilman;263160]I take what the server gives me. The organizers of the project are much more likely to know better than I what my resources should be doing.
Paul[/QUOTE] Thinking outside the box (smile), the original aim of the project was to find new MPs as quickly as possible. Of course, there have been great spinoffs from the venture, which is just as well since a reasonable expectation from now on is one MP per 6 years. I monitor the "Primenet summary" page carefully, and find it a bit sad that most LL assignments get returned unfinished.* If the primary goal is to find another MP asap, TF and P-1 need to be done between 53M and 60M for the couple of years the LL wavefront will take to sweep that range. David *Surely if an assignment is returned having had a significant fraction of the work needed performed on it, it would be worthwhile preserving the iteration number and residue, wouldn't it? |
[QUOTE=davieddy;263646]In what range are the exponents?
One extra bit or more? 1/30 doesn't exactly tally with the 1/70 probability of a factor between 2^70 and 2^71. David[/QUOTE] My actual numbers (ignoring 10 or 15 NFs from the last 24 hours) 247 attempts, mostly in the 50-60M range, a few from the 80M range. 5 factors. So my original numbers are a bit optimistic, and I can expect to hit a dry spell here as my statistical experience increases. As for unfinished LLs, the issue is probably that positive feedback from the stats page takes at least a week, maybe two or three months, which is a *very* long time in the "instant" internet age. Almost anything else (LL-D, P-1, TF, ECM) is quicker. So is contributing to NSF@home, or, perhaps more practically, folding@home. The dedicated don't mind, but the casual don't have that much patience. I'm working on that patience part even for mfaktc, albeit slowly. |
[QUOTE=Batalov;263650]I've already tried launching "sleep 30; mfaktc" in a Cygwin shell and then disconnect. Some time later you connect and see the same result.
My guess is that the other driver is not initiated (lazily, which is I guess usually a good thing), until I would go and login there physically. __________ [SIZE=1][COLOR=blue]Xyzzy's fish 1 says: "Rirelbar nyjnlf oynzrf zr, jgs?!"[/COLOR][/SIZE][/QUOTE] Not quite what I had in mind..... when physically at the computer, launch a loop that says sleep 30; if (xyzzy in process table && mfaktc not in process table) mfaktc then, when away from the computer, launch xyzzy, which contains just a "sleep 30" while TRUE (or while mfaktc not in process table) command. that way, the mfaktc is local and launched locally, and the presence of xyzzy in the process table divorces the remoteness from the mfaktc process. |
[QUOTE=Xyzzy;263649]1:115 here.[/QUOTE]
That is concerning. What hit rates are others getting for single bit levels of TF? |