[QUOTE=Brian Gladman;209146]find the line:

[CODE].format(Y1, poly_p['m'], Y0, fact_p['n']))[/CODE]

referenced in the error report and replace it with:

[CODE].format(denom, poly_p['m'], numer, fact_p['n']))[/CODE]

Brian[/QUOTE]

Thank you, this did indeed let it get a little bit further. :smile: However, it just allowed a successful printing of an error. :sad: There is no rush, but could you let me know what the following error means, and how I might be able to fix it?

[CODE]
D:\Programming\ggnfs-msieve\c122>factmsieve01.py c001.poly
->  ________________________________________________________________
-> | Running factmsieve.py, a Python driver for MSIEVE with GGNFS  |
-> | sieving support. It is Copyright, 2010, Brian Gladman and is  |
-> | a conversion of factmsieve.pl that is Copyright, 2004, Chris  |
-> | Monico. This is version 0.60, dated 19th March 2010.          |
-> |______________________________________________________________|
-> This is client 1 of 1
-> Using 2 threads
-> Working with NAME = c001
-> Error: 1 * 1000000000000000000000000 + 1000000000000000000000000 != 0 mod 1!
[/CODE]

[QUOTE=xilman;209154]I can't see where the error lies, but I deduce that you may be trying to find some Brilliant numbers.

Paul[/QUOTE]

Yes! You are correct, sir! Here lately I've been trying to do at least one per year. May I assume that it is you who are the current record holder? Maybe someday my searches will get that high. Hmmm, maybe after 122 I'll start on 151. :smile: |
Sorry, it's another bug - take the 'not' out of this line just above the previous error:

[CODE]if CHECK_POLY and not (denom * poly_p['m'] - numer) % fact_p['n']:[/CODE]

so it becomes:

[CODE]if CHECK_POLY and (denom * poly_p['m'] - numer) % fact_p['n']:[/CODE]

Brian |
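As a side note for anyone following this fix: the corrected condition is just a modular sanity check that the rational root m = numer/denom from the .poly file really is a root mod n. A minimal standalone sketch (the `check_poly` helper and its argument names are illustrative, not part of factmsieve.py; only the expression itself comes from the script):

```python
def check_poly(numer, denom, m, n):
    # The polynomial's rational root m must satisfy
    # denom * m == numer (mod n); a non-zero residue means the
    # .poly file is inconsistent with the number n being factored.
    return (denom * m - numer) % n == 0
```

The script's error message fires on the negation of this test, which is why removing the 'not' fixes the spurious failure.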
Excellent! That fixed it! And it is now working swimmingly. Thank you for your help.
|
Thank you for your efforts too - every bug squashed helps all users!
I am making good progress with a better approach to managing sieving intervals, but the original script seems to contain quite a few hacks to get it working rather than doing things in the 'right' way. Is there anyone here who knows and might be willing to explain what is going on in setting up and running the siever(s)? That is, the setting up of the associated .afb files and their deletion under some conditions etc.

The script is also designed to set up two types of sieving job files but, as far as I can see, one of these (classical) is now completely redundant since it is never used when this sieving mode is run. Am I right about this?

I can get sieving intervals working much more robustly for fixed situations, but what is proving more difficult is to get the script to do the right thing in the following situations:

(a) moving a resume file from one client to another (or changing the client's ID)
(b) resuming sieving with a changed number of clients (either fewer or more)
(c) resuming sieving with a changed number of threads (either fewer or more)

What I am interested in is: given a saved list of intervals (q_lo, q_pos, q_hi) that give the intervals and positions reached for N threads on client C (of NC clients) when it was stopped, how should I resume this sieving for M threads on client D (of ND clients)?

As far as I can see, gaps in q coverage don't matter much, so it may well be easiest not to bother with completing partially sieved intervals in _these_ special resume situations and just start again from a q value higher than the input intervals. But is this right? Should I worry about any of these more complex resume scenarios and, if so, which are the important ones?

Any comments or advice here would be much appreciated.

Brian |
[QUOTE=Brian Gladman;209254]Thank you for your efforts too - every bug squashed helps all users!
I am making good progress with a better approach in managing sieving intervals but the original script seems to contain quite a few hacks to get it working rather than doing things in the 'right' way.[/quote]

[SIZE="1"](Side note: I just looked at my first summary file--funnily enough, I completed my first GNFS job 3/29/06, a c130 completed in 3 weeks on a 2.4 GHz Athlon....I'm now looking at running a c143 in 8-12 days depending on how much I use the two dual cores doing the sieving.)[/SIZE]

I think that you're the first person who's put forth the effort to do a full re-work of the framework. As the suite has progressed, people have added things, and those additions have been tinkered with to tweak what, and how, they do their work.

[quote]Is there anyone here who knows and might be willing to explain what is going on in setting up and running the siever(s)? That is the setting up of the associated .afb files and their deletion under some conditions etc.[/quote]

The .afb.x file is the factorbase file used by the sievers. The 'x' is 0 or 1 depending on which side (rational/algebraic) you're sieving. The file is deleted if the script detects that the current sieve block is under the factorbase limit. (For example, if your rlim is 12000000 and you're just getting started, the script starts at 6000000. In this case, before the script calls the siever the [B]second[/B] time, it deletes the factorbase, since the original version of the siever would complain about the sieve value being less than the factorbase limit and then abort. The first time the siever gets called, there is no factorbase, so the unlink actually doesn't delete anything.) The latest version of the siever code will print a message and lower the factorbase limit instead of aborting, so assuming everyone is running current enough versions, the .afb.x deletion could be left out; if left in, it only costs a few seconds while the siever regenerates the factorbase.
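The deletion logic described above can be sketched roughly as follows (a hypothetical helper, not the script's actual code; `afb_path`, `q0` and `fblim` stand for the factorbase file path, the start of the next sieve block, and the rlim/alim factorbase limit):

```python
import os

def maybe_delete_afb(afb_path, q0, fblim):
    # Mirror of the hack described above: if the next special-q block
    # starts below the factorbase limit, delete the cached .afb.x file
    # so an old siever regenerates it instead of aborting.
    # Returns True if a file was actually deleted.
    if q0 < fblim and os.path.exists(afb_path):
        os.remove(afb_path)
        return True
    return False
```

With a current siever that lowers the limit by itself, this whole step could simply be dropped, as discussed above.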
One impact of this hack is that the rlim or alim value is adjusted in the job file before each call to the sievers, so that could be included or removed depending on how you go from here.

[quote]The script is also designed to set up two types of sieving job files but, as far as I can see, one of these (classical) is now completely redundant since it is never used when this sieving mode is run. Am I right about this?[/quote]

When I started using GGNFS, I never got the line siever to run; it always crashed with an error. After running some jobs on a different CPU, I think it may have been some ASM code in the line siever that was incompatible with the Athlon, since the line siever didn't seem to have trouble on a Pentium.... I think the main point of the line siever was to catch the very easy, very quick relations near the beginning of a job. The consensus seems to be that there is no value in line sieving, since the lattice siever can catch most (all?) of the relations that the line siever will.

[quote]I can get sieving intervals working much more robustly for fixed situations but what is proving more difficult is to get the script to do the right thing in the following situations: (a) moving a resume file from one client to another (or changing the client's ID) (b) resuming sieving with a changed number of clients (either fewer or more) (c) resuming sieving with a changed number of threads (either fewer or more)[/quote]

The main problem here is how much state you want to build into the process. As I see it, right now the script is entirely self-contained in terms of each job you run. If you start a job with, say, 5 clients, the only instantiation of the script that needs to know anything about the other clients is #1, and it only needs to know that there are potentially up to 4 other scripts out there that might be producing spairs.add.x files.
I have run up to 3 PCs as clients on jobs, running the PCs as 'client [2-5] of 5' stand-alone (saving relations to the local HD), and then moving the spairs.add files to the master PC once a day or so. (I did this since the 'generate relations, then append to an spairs.add file' method caused the entire spairs.outx file to be read/written over the network. Not too bad with only 3 PCs, but imagine the network storm if lots of PCs were doing the same thing....)

[quote]What I am interested in is, given a saved list of intervals: (q_lo, q_pos, q_hi) that give the intervals and positions reached for N threads on client C (of NC clients) when it was stopped, how should I resume this sieving for M threads on client D (of ND clients)?[/quote]

I think that this falls under the issue mentioned above: the only state needed with the current setup is the 'q0' and 'qstep' values saved in each .job file. If you want to try and complete 'all' the special-q ranges, you would seem to need to build a range allocator into a master script and turn this from a simple script into a whole client/server setup...either that or add the capability to parse through the accumulated relations looking for completed/missed ranges.

[quote]As far as I can see gaps in q coverage don't matter much so it may well be easiest not to bother with completing partially sieved intervals in _these_ special resume situations and just start again from a q value higher than the input intervals. But is this right?[/quote]

That might be a solution, but how do you decide which is the highest range reached?
If you have several clients running, some of which are faster than others, what if the #1 script is the slower one, and a 'completed range' written by a higher-numbered client is then over-written by the #1 script when it completes its range?

[quote]Should I worry about any of these more complex resume scenarios and, if so, which are the important ones?[/quote]

It's hard to figure out things like this, since there will be a diverse pool of people of different technical abilities running jobs. How much can you build into the script to keep people from possibly running duplicate work?

[quote]Any comments or advice here would be much appreciated. Brian[/QUOTE]

The one major issue that I can think of is the issue of multiple clients versus multiple threads. This may have come up in sleigher's job discussed here and in another thread. (This was with the factmsieve.pl script, so it touches on your script only with respect to how closely you followed the Perl code.) The clients vs. threads capabilities were added at different times. What happened to sleigher was that a massive number of duplicates turned up in a job (in the millions). I have not looked closely at the code, but I think this happened because multiple clients were run on different PCs with NUM_CPUS set to different values in the config. I speculate that the sieve block ranges may have been set to one value by the 'multi-client' code and then split up differently by the 'multi-thread' code. The thing to watch there would be that the multi-client code should determine the sieve block and then make sure that the multi-threading code respects those block boundaries. As far as moving save files from PC to PC, as long as the person running the job moved the job files correctly and didn't run multiple clients with the same number, you don't really have to do anything.
The main problem I see with trying to build too much more into the script has been pointed out in the past by jasonp in trying to multi-thread msieve: the lack of a portable, cross-platform solution to file locking. If you want to build much more state into the process, you'll have to figure out a way to keep the different clients from stomping on each other if you run everything from a central network fileshare. Maybe filenames with a timestamp added to make the names unique would work. As mentioned by chris2be8 [URL="http://www.mersenneforum.org/showpost.php?p=208621&postcount=315"]here[/URL], one solution to the network storm effect would be to rename the spairs.out files to a unique value, rather than appending to a fixed file each time.

I'm sure there are other issues missed here, but this at least gets things rolling..... All I can say is thank you for the effort and keep up the good work. I have not used the Python script as yet, since the Perl solution works for me right now. If I ever get some more spare time I really mean to sit down and see what I can put together as far as a framework for automating the whole process as well, but we all know what they say about good intentions.... |
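The renaming idea mentioned above can be sketched in a few lines (a hypothetical scheme, not anything in the current scripts; the exact naming format, and the `unique_spairs_name` helper itself, are assumptions):

```python
import time

def unique_spairs_name(client_id, base='spairs.out'):
    # Hypothetical naming scheme: tag each relations file with the
    # client id and a Unix timestamp so concurrent clients writing
    # to a shared fileshare never append to (or overwrite) the same
    # file, avoiding the need for cross-platform file locking.
    return '{0}.{1}.{2}'.format(base, client_id, int(time.time()))
```

The master script would then sweep up every file matching the base name, merge the relations, and delete the pieces, instead of tailing one ever-growing file over the network.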
Thank you for a most helpful response. It will take me some time to absorb your input, so I will make a more detailed response when I have been through it.
Brian |
Thank you again for your most helpful input, which I have now looked through.
I am a strong believer in keeping things as simple as possible, so if the only cost of removing the afb file manipulation in the script is to force people to use the Perl script or update their ggnfs siever executables, then I am all for going down this path, which I will do unless anyone has a strong objection. I also intend to remove the setting up of the classical siever since this is not even used in the script now.

On the issue of managing the q values: currently client n + 1 of N (0 <= n < N) starts at qstart + n * qstep, does a qstep interval and then moves to its next range starting at qstart + (n + N) * qstep. So its q intervals are of length qstep and start at qstart + (n + k * N) * qstep for k = 0, 1, 2, ... I am not inclined to change this unless anyone can come up with a better idea.

On resuming threads: once I resolve not to play with client changes, it is much easier to decide how to resume when the number of threads changes. I think this is worth doing because I find myself changing the number of threads I am willing to devote to sieving depending on what else my machines are doing. It's also not too hard to split or combine sieving intervals now that I have moved to a siever driven by a queue of sieving intervals rather than a fixed interval allocation algorithm. So this is the path I intend to follow on this aspect of sieving unless anyone feels that there is a better alternative.

Brian |
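The allocation rule described above can be written out as a small helper (a sketch only; the function name and signature are illustrative, not taken from factmsieve.py):

```python
def client_q_intervals(qstart, qstep, n, N, count):
    # Intervals sieved by client n+1 of N (0 <= n < N): each is of
    # length qstep and starts at qstart + (n + k * N) * qstep for
    # k = 0, 1, 2, ...  Returns the first `count` (lo, hi) pairs.
    lo = lambda k: qstart + (n + k * N) * qstep
    return [(lo(k), lo(k) + qstep) for k in range(count)]
```

With this interleaving, the N clients tile the q range with no overlap, which is why duplicate relations only appear if two clients are mistakenly run with the same client number.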
[QUOTE=Brian Gladman;206282]I will look at this but it in a part of the code that was hard to get right so I am reluctant to fiddle with it without careful thought.
Is it clear from the siever code what that error code means? Brian[/QUOTE]

I've been running several snfs factoring jobs with factmsieve.py and have run into the "Return value -1073741819" problem. It has happened to 6 of my 67 snfs factoring jobs. The simple fix for each case was to update the c024.job.resume file to be the next block to be sieved. Then I would just start factmsieve.py c024.poly, and it would pick back up and run to completion. For example (I've been running 3 threads), if it crashed during the 400000 to 450000 block, I would change the .resume file from:

[CODE]
Q0: 400000
QSTEP: 50000
QLAST0: 415297
QLAST1: 431862
QLAST2: 449801
[/CODE]

to:

[CODE]
Q0: 450000
QSTEP: 50000
QLAST0: 450000
QLAST1: 466667
QLAST2: 483334
[/CODE]

I think I've found out the meaning of the error code too. When I went into Microsoft's calculator program, entered the number -1073741819 and clicked to convert it to hexadecimal, I got 0xffffffffc0000005. The important part here is the 0xc0000005 part. This is a generic Windows error code for an illegal memory access violation. When I look into my event log I see the following info:

[CODE]
Faulting application gnfs-lasieve4i12e.exe, version 0.0.0.0, faulting module gnfs-lasieve4i12e.exe, version 0.0.0.0, fault address 0x00031dd4
[/CODE]

I don't know how to look into the executable and find out what part of the code is at 0x31dd4. If anyone else does, this may be a good place to look. However, without getting into the siever code, I think just picking up sieving with the next block is a good way to proceed. What do you think of this course of action, Brian? |
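The calculator step above amounts to reinterpreting the negative process exit code as an unsigned 32-bit Windows NTSTATUS value, which is a one-line mask (the helper name is illustrative; the 0xC0000005 = STATUS_ACCESS_VIOLATION mapping is standard Windows behaviour):

```python
def exit_code_to_ntstatus(rc):
    # Windows reports crash causes as 32-bit NTSTATUS values; when a
    # wrapper script sees them as signed integers they come out
    # negative. Masking to 32 bits recovers the status code:
    # -1073741819 -> 0xC0000005 (STATUS_ACCESS_VIOLATION).
    return rc & 0xFFFFFFFF
```

A driver script could use this to distinguish a siever crash from a user-initiated termination before deciding whether to skip ahead to the next block.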
I'm also getting a "Return value -1073741819" quite often (probably at the same rate). It frequently happens that aliqueit tries to factor a c95 with ecm all night.
To fix this, only the Q0 line in the resume file has to be increased (I usually skip a couple of 100K Q). QLAST will be calculated by the script automatically. It would be nice if the script could detect such a case. |
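The manual edit described in these two posts could be automated along these lines (purely a sketch of the idea, not code from factmsieve.py; the `bump_resume` helper and the 200000 default skip are assumptions, the latter matching the "couple of 100K Q" suggestion above):

```python
def bump_resume(text, skip=200000):
    # Hypothetical recovery step after a siever crash: raise Q0 past
    # the range that caused the access violation and drop the stale
    # QLASTn lines so the script recomputes them on restart.
    out = []
    for line in text.splitlines():
        if line.startswith('Q0:'):
            out.append('Q0: {0}'.format(int(line.split(':')[1]) + skip))
        elif not line.startswith('QLAST'):
            out.append(line)
    return '\n'.join(out)
```

The driver would call this on the .resume file contents only when the siever's return value maps to an access violation, leaving normal user interruptions untouched.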
[quote=Brian Gladman;209165]Sorry, it's another bug - take the 'not' out of this line just above the previous error:

[CODE]if CHECK_POLY and not (denom * poly_p['m'] - numer) % fact_p['n']:[/CODE]

so it becomes:

[CODE]if CHECK_POLY and (denom * poly_p['m'] - numer) % fact_p['n']:[/CODE]

Brian[/quote]

Hi! I have found a problem with this solution. If I use the version of factmsieve.py from 19 March on an 8-core machine, everything runs ok. If I try the "beta test" version dated 20 March, with this patch and the one from the previous post of course, it fails after sieving the first range from 90000 to 100000. If necessary for debugging, I will post the error here after running the current job.

Regards, scalabis |
Hi WraithX and smh,
Thanks for the further detail - I'll have a look at recovering in this situation. I assume that the error return is different if the siever is terminated by the user. Brian |