mersenneforum.org (https://www.mersenneforum.org/index.php)
-   No Prime Left Behind (https://www.mersenneforum.org/forumdisplay.php?f=82)
-   -   Report top-5000 primes for all k<=1001 (https://www.mersenneforum.org/showthread.php?t=9891)

mdettweiler 2009-05-28 14:23

Okay, I figured out why my scripts didn't write this prime to the prime log file. The scripts themselves are not at fault; instead, it is the timing of when they're run. Here's essentially how they work:

-The copy-off script is rather simple: every time it's run, it takes all the results files and moves them to the web results folder. This has the effect of "emptying" the active results file for the next day.

-The status page generator script copies (not moves) the results files for each server to their respective intermediate files in the web folder. It then runs through those intermediate files to tabulate daily totals by user and to search for primes. If a prime is found, it adds it to the prime log file.

I have the copy-off script set to run at 00:01 every day (CDT). The status page script runs at :00, :15, :30, and :45 of every hour.

Unfortunately, we have a bit of a dilemma. If the copy-off script is run *before* the last status page generator run of the day, then the status page generator ends up skipping an entire 15 minutes of results. Thus, there is a 15-minute window during which primes would make it into the daily results files, but not into the prime log file.

Hence, I have it set to run the copy-off script one minute after the last status page run of the day. But we still have a problem (albeit a smaller one): any results found within that one minute between the two scripts will be missed by the status page script and not make it into the prime log file. This was the case with Karsten's G8000 prime.

Obviously, a gap of 1 minute is a lot less crucial than a 15-minute one. Nonetheless, as Karsten's prime just demonstrated, we can't afford to miss even a minute.
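To make the one-minute gap concrete, here is a minimal Python sketch. It assumes both scripts run instantaneously at their scheduled times (the last status scan at 00:00, the copy-off at 00:01); the function name `is_missed` is made up for illustration and is not part of the actual scripts.

```python
from datetime import datetime, time

LAST_STATUS_SCAN = time(0, 0)  # status page script run at :00 (from the post)
COPY_OFF = time(0, 1)          # copy-off script run at 00:01 (from the post)

def is_missed(result_time: datetime) -> bool:
    """True if a result arrives after the 00:00 scan but before the 00:01 copy-off.

    Assumption: both scripts are treated as instantaneous, so a result
    stamped exactly 00:00:00 is counted as arriving after the scan.
    """
    t = result_time.time()
    return LAST_STATUS_SCAN <= t < COPY_OFF
```

A result stamped 00:00:30 falls in the gap, while one stamped 23:59:59 is still picked up by the 00:00 scan.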

Unfortunately, crontab (the otherwise extremely powerful program used in Linux to run programs on a schedule) only allows me to control things by minutes, not seconds. I would rather not have both scripts run at the exact same minute, since that can lead to crazy file locking problems that could potentially botch an entire day's results. Thus, it seems that a gap of 1 minute is rather unavoidable.

One possibility for fixing this would be to simply integrate the copy-off function into the status page script. I must admit, however, I am not sure of a good way to implement that, and besides, the status page script is such a cobbled-together mess that I'm somewhat afraid to touch it. :smile:

Max :smile:

kar_bon 2009-05-28 15:06

[QUOTE=mdettweiler;175085]
Hence, I have it set to run the copy-off script one minute after the last status page run of the day. But we still have a problem (albeit a smaller one): any results found within that one minute between the two scripts will be missed by the status page script and not make it into the prime log file. This was the case with Karsten's G8000 prime.
[/QUOTE]

do i understand it right?

the script only misses primes found in that special minute, but the pair-results from that minute are not missed?

what about copying/moving the results every hour to a new file, and then running all scripts against that file? then no new results can arrive during that one minute!

BTW: that prime is shown neither at [url]http://nplb-gb1.no-ip.org/llrnet/[/url] nor at [url]http://www.noprimeleftbehind.net/index.php?content=prime_list[/url] !

gd_barnes 2009-05-28 16:26

Having to add the prime to primes.log manually is somewhat OK as a short-term fix. Not having it go to the database is not OK.

Please come up with a creative solution for this, i.e. something like:

1. Copy off results file to temp file.
2. Write some special code to parse through the file looking for any prime in the last minute.
3. If there is a prime in the last minute, add further special code to make sure that the prime is added to the primes.log file, the notification is sent, and the database gets updated with the prime.
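The steps above could be sketched roughly as follows in Python. This is a variant, not the actual scripts: it moves the results file aside first and then scans the moved copy, so nothing can arrive between the scan and the move. The function name, the file paths, and the assumption that a prime shows up as a results line containing "is prime" are all hypothetical, and the sketch assumes the server recreates its results file after the rename.

```python
import os
from pathlib import Path

def copy_off_and_log_primes(results: Path, archive: Path, prime_log: Path) -> list:
    """Move the active results file aside, then log primes not yet recorded."""
    # Atomic rename when both paths are on the same filesystem.
    os.replace(results, archive)

    # Primes already logged by earlier status runs must not be added twice.
    known = set(prime_log.read_text().splitlines()) if prime_log.exists() else set()

    new_primes = []
    for line in archive.read_text().splitlines():
        # Assumed line format: a prime result contains the text "is prime".
        if "is prime" in line and line not in known:
            new_primes.append(line)
            known.add(line)

    with prime_log.open("a") as log:
        for line in new_primes:
            log.write(line + "\n")
    return new_primes
```

The notification and database update in step 3 are left out here, since the thread says those are handled on the DB side.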

This is the situation I have referred to in the past that we have to test on new servers as it relates to updating the database: testing across a changeover in dates, months, or years. I did enough coding during the Y2K crisis and otherwise related to the timing of schedulers to know that these types of situations will bite you if you don't test them ahead of time.

I understand what you mean by file locking problems. That can happen when 2 users try to update the same record online in an online entry system. Is Crontab completely inflexible in that way? Here is the way we did it on our scheduler at work: The execution of one job would be dependent on the execution of a prior job.

Here is what could be the source of the problem -OR- it is another problem: You have the script set to run at a hard-coded time of 1 minute after the hour. If I did that at my insurance co., they would kick it back to me on a code review in short order.

If you have a job (i.e. process or program) that is dependent on another one to run, you have to set it to run after that job is complete; not at one minute after you THINK it will complete. At work, we used a Jobtrac scheduler and we set one job as a successor to another. In our case here, what if there is a power blip or a computer problem or something else that occurs as the results are being copied off? What if it just decides to take 2 minutes to finish due to a huge volume of results or for some other reason?

What you need is a job (or process or program-based) dependency not a time-based one. The copy-off process needs to run immediately after the status page script has completed its midnight run.
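A job-based dependency can be sketched in a few lines of Python: each command starts only after the previous one has exited successfully. The helper name `run_in_order` is made up for illustration; the commands passed in would be whatever launches the status page and copy-off scripts.

```python
import subprocess
import sys

def run_in_order(commands):
    """Run each command in turn; stop at the first non-zero exit status.

    Returns True only if every command completed successfully.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # A failed predecessor blocks every later job in the chain.
            return False
    return True
```

In a shell command line (and therefore in a single crontab entry), joining two commands with `&&` gives the same "run B only if A succeeded" behavior.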

Therefore, it looks like you have 2 problems that need to be fixed: The immediate one and a potential one that would occur if the status page script takes more than a minute to run. I can't say if changing it to a program-based dependency would fix the current issue but it does need to be considered.

Programmers spend 95% of their time coding for 5% of situations. This is an excellent example.

This will be especially important to test on those very small very fast tests on the new PRPnet server. It will be easy to regularly find 2 or more primes during that last minute for tests at n=~20K if a lot of clients are running the server.

One thing more to check: Please check that the result got in the database.


Thank you,
Gary

mdettweiler 2009-05-28 17:32

It would be quite easy to run both scripts in immediate succession by putting them both on the same crontab entry. *However*--we have a problem. One script runs once a day, the other runs once every 15 minutes. If I string them both together as one command, one of them will be running at the wrong frequency.

Hmm....I just got an idea. :smile: Maybe I can make crontab run just the status page script at 15-minute intervals, *but*, at the end of each day, *both* scripts would be run back to back. That is, at 00:01, the status page is updated to reflect any changes in the last minute, and then the copy-off is performed immediately after. Yes, that will work, I'm sure of it. :smile:

Don't mind me, just thinking out loud. :wink:
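As a rough illustration of that schedule, the following Python sketch returns which jobs are due at a given minute, in the order they should run. The job names `status_page` and `copy_off` and the function `jobs_for` are placeholders, not the real script names.

```python
from datetime import datetime

def jobs_for(now: datetime) -> list:
    """Return the jobs due at this minute, in the order they should run."""
    if now.hour == 0 and now.minute == 1:
        # End-of-day run: refresh the status page, then copy off the results.
        return ["status_page", "copy_off"]
    if now.minute % 15 == 0:
        # Regular quarter-hour status page update.
        return ["status_page"]
    return []
```

The copy-off should still be skipped if the status page run fails, which this sketch does not model.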

As to the result being in the DB, though: I have no idea why it didn't make it in. The DB *should* be importing all results files, both intermediate and daily, so nothing should be missed. We'll have to wait for Dave to answer on that one.

BTW: email notifications are handled on the DB end, so that has nothing to do with my scripts. In this case, the problem appears to be because the results never got into the DB, for reasons unknown.

Max :smile:

Edit: Okay, I've put that modification in place. That should close the 1-minute gap in prime checking.
Edit2: Karsten's prime has also been retroactively added to the prime log file.

MyDogBuster 2009-05-28 19:04

When I was programming I ran into similar problems with end of file records. Some languages treat the last physical record (with data) as the end of file. Others treat the first $null record as the end of file. I know that the former would cause a problem with a do while loop because a do while won't execute if the while is already true. I have run into that situation with some of the scripts I've coded for my own use here. I always got around it by putting the read of the file as the first statement AFTER the do while statement so that the do while would always execute for the last record.

gd_barnes 2009-05-28 21:46

[quote=mdettweiler;175117]It would be quite easy to run both scripts in immediate succession by putting them both on the same crontab entry. *However*--we have a problem. One script runs once a day, the other runs once every 15 minutes. If I string them both together as one command, one of them will be running at the wrong frequency.

Hmm....I just got an idea. :smile: Maybe I can make crontab run just the status page script at 15-minute intervals, *but*, at the end of each day, *both* scripts would be run back to back. That is, at 00:01, the status page is updated to reflect any changes in the last minute, and then the copy-off is performed immediately after. Yes, that will work, I'm sure of it. :smile:

Don't mind me, just thinking out loud. :wink:

As to the result being in the DB, though: I have no idea why it didn't make it in. The DB *should* be importing all results files, both intermediate and daily, so nothing should be missed. We'll have to wait for Dave to answer on that one.

BTW: email notifications are handled on the DB end, so that has nothing to do with my scripts. In this case, the problem appears to be because the results never got into the DB, for reasons unknown.

Max :smile:

Edit: Okay, I've put that modification in place. That should close the 1-minute gap in prime checking.
Edit2: Karsten's prime has also been retroactively added to the prime log file.[/quote]


Now you've got it! Here was our mantra on my job: Never schedule job B to run at a specific time of day IF it is dependent on job A running first; even if job A is scheduled to run hours before job B. Job A could easily fail and it would create a mess if job B would run. Here is what we did if we wanted job B to run at a specific time but it was still dependent on job A:

Set a time dependency for it to run; 12 midnight in this case.
Set it to be dependent on job A completing successfully.

Both of those conditions had to be true in order for job B to run.

The way you're going to do it works too. Run job A every 15 mins. throughout the day but run both jobs A and B at midnight. I'm assuming by you doing that that if job A has a problem, job B will not run. That is the key.

Dude, you're skimmin' again. lol I never said the result did not get in the database. I said we need to CHECK that it got in the database. As Karsten said, it is the PRIME that did not get in the database. Please check on the result. Thanks.


Gary

gd_barnes 2009-05-28 21:48

[quote=mdettweiler;175117]
Edit: Okay, I've put that modification in place. That should close the 1-minute gap in prime checking.
Edit2: Karsten's prime has also been retroactively added to the prime log file.[/quote]

Has the prime been added to the database? Be sure and check that the result is in the database too. Karsten needs to get credit for both.


Gary

kar_bon:
not yet in NPLB-database, only in GB-server primelist file!

vaughan 2009-05-28 23:47

465*2^690044-1 already in database.

mdettweiler 2009-05-29 03:22

[quote=gd_barnes;175152]Dude, you're skimmin' again. lol I never said the result did not get in the database. I said we need to CHECK that it got in the database. As Karsten said, it is the PRIME that did not get in the database. Please check on the result. Thanks.[/quote]
[quote=gd_barnes;175153]Has the prime been added to the database? Be sure and check that the result is in the database too. Karsten needs to get credit for both.


Gary

kar_bon:
not yet in NPLB-database, only in GB-server primelist file![/quote]
Actually, I wasn't skimming this time around. :smile: The reason why I worded it like I did was because from what I understand, the DB does *not* utilize the prime logs generated by the individual servers in any way. Those are only for the benefit of human users. Instead, the DB just imports the results, and when it does that, it checks to see if the result is prime. If so, then it sets one of the fields, "is_prime", to 1, and the result is counted with the primes thereafter.
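A minimal sketch of that import logic, using Python's built-in `sqlite3` purely for illustration. Only the `is_prime` field name comes from the description above; the table layout, the function name, and the assumed results line format ("<candidate> is prime!" or "<candidate> is not prime.") are hypothetical, and the real database may look quite different.

```python
import sqlite3

def import_results(conn: sqlite3.Connection, lines) -> None:
    """Import result lines, setting is_prime to 1 for prime results."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "candidate TEXT PRIMARY KEY, is_prime INTEGER NOT NULL)"
    )
    for line in lines:
        # Assumed format: "<candidate> is prime!" or "<candidate> is not prime."
        candidate, _, verdict = line.partition(" is ")
        if not verdict:
            continue  # not a result line
        is_prime = 1 if verdict.startswith("prime") else 0
        # Importing the same result twice (intermediate and daily file) is harmless.
        conn.execute(
            "INSERT OR IGNORE INTO results VALUES (?, ?)", (candidate, is_prime)
        )
    conn.commit()
```

With this design the prime list is just a query on `is_prime = 1`, so a prime can only be missing if its result line was never imported.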

gd_barnes 2009-05-29 09:04

OK but that doesn't answer my questions:

Is the result yet in the database?
Is the prime yet in the database?

Karsten says the prime is not in the database.

Regardless, if either the prime or the result is not in the database, based on what you just said, why aren't they? If you're saying that it gets them from the result file, then both should be there.

kar_bon 2009-05-29 11:10

the prime is still not in here [url]http://www.noprimeleftbehind.net/index.php?content=prime_list[/url]

so i think the pair itself isn't there either!

perhaps more pairs missing?

please David or David check this:

take a resultfile (older ones too) and check if the [b]last[/b] pair listed (in that special one minute timeframe) is in the database or not!

the pair score for my account seems correct for that file for the day the prime was found, but i'm not sure. so perhaps only the last pair (-> the prime) is missing, perhaps other files have the same issue!
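The check on the last pair of a results file could look something like this in Python. It assumes the candidate is the first whitespace-separated token on each results line and that the set of candidates already in the database has been fetched separately; both the function name and those assumptions are illustrative only.

```python
def last_pair_missing(result_lines, db_candidates):
    """Return the last candidate in the file if the database lacks it, else None."""
    for line in reversed(result_lines):
        tokens = line.split()
        if tokens:
            # Assumed format: the candidate is the first token on the line.
            candidate = tokens[0]
            return None if candidate in db_candidates else candidate
    return None  # the file had no result lines at all
```

Running it over older results files too would show whether only this one file is affected.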

