|
|
#12 |
|
Jan 2005
Sydney, Australia
5·67 Posts |
I had a new SSD die within 3 weeks of purchase. Fortunately I had backups of all my important files (personal and business) on multiple HDDs on other networked PCs and also online (I use a paid version of Carbonite).
I learnt in my early days of computing, way back on an IBM /SY36 minicomputer, to back up early and back up often. |
|
|
|
|
|
#13 | |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Quote:
Gary, Dave and I are currently discussing backup options via email--stay tuned.
|
|
|
|
|
|
|
#14 | |
|
I quite division it
"Chris"
Feb 2005
England
31·67 Posts |
Quote:
So not "Unlimited Backups" despite their claims. (Now with SquirrelSave(!), UK based and no bandwidth limitations.) |
|
|
|
|
|
|
#15 |
|
May 2010
499 Posts |
|
|
|
|
|
|
#16 |
|
Mar 2006
Germany
5530₈ Posts |
I've got all result files from every NPLB server from the beginning.
Altogether, with all processed/checked data, they come to about 4GB. Older results are backed up on 2 different HDs; the newer ones are on a stick and stored on another HD, too. My work folders contain about 6GB of data: NPLB, aliquot, docs, code, progs and other stuff related to (prime) numbers. |
|
|
|
|
|
#17 |
|
Mar 2006
Germany
2904₁₀ Posts |
So what went wrong during, or rather after, the failure yesterday?
All 660 pairs reserved for my 12 offline cores (24 hours of work!) were rejected by the port 3000 NPLB server! So does this mean the joblist.txt for that server was damaged or completely deleted? That's the only reason I can see why the server would have sent them out again to other users (the pruning was not done because the results were not on the server then, but those pairs were still in joblist.txt), accepted their results, and then treated mine as being sent back 'again' although I had reserved them first! Please try to figure out why! |
|
|
|
|
|
#18 | |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
1100001101001₂ Posts |
Quote:
That reminds me, I was going to send you a PM earlier today but didn't get the chance to yet: you can expect quite a load of duplicated pairs in the 8/21 results file when you process it, as well as possibly 8/20. Unfortunately, there's not much we can do about this kind of thing except let it work itself out of the system, which it should have done completely by now. Any additional such large batches of work you reserve from the server should be OK.
|
|
|
|
|
|
|
#19 |
|
May 2007
Kansas; USA
10100010011011₂ Posts |
Sorry that you lost a day of processed pairs, Karsten. It looks like all 3 of us ended up losing many processed pairs. For your pairs yesterday, the server did not record in joblist that you had sent in the results the first time around (even though the results were actually accepted by the DB), and so it handed the pairs back out again to Vaughan and me after they were "expired" from its perspective. We then processed them, it accepted them, and then they came back from AMDave's process as duplicates. Since your results came in first, you got credit for them. For your pairs today, the reverse happened. Since the server never recorded in joblist that you had reserved them, they were handed to Vaughan and me. We processed them before you did, and you ended up having yours rejected.
Sorry about all of the problems, guys. The "semi-crash" had much more far-reaching effects than I expected. I hope the above is the last of it. I also just now saw in the CRUS forum that Max's and my personal PRPnet servers, as well as CRUS PRPnet port 1300, were toasted. Fortunately Max quickly got them reconstructed.
Karsten, this is something that I think I've brought up before. It can be hard on the servers to cache and then receive back so many hundreds of pairs at once, especially when those pairs all come back in at once over a day later. My suggestion would be to reserve an n=500 or n=1000 manual range in the drive for your own manual processing. It's a little more personal effort to post/send the results, but it would help prevent large-scale problems like this.
Note that I will be doing a full backup of Jeepford twice a month now. I know that some backup of some of the DB stuff is already being done by Dave, but it is by no means comprehensive of the entire machine.
Gary
Last fiddled with by gd_barnes on 2010-08-22 at 06:03 |
|
|
|
|
|
#20 |
|
Mar 2006
Germany
5530₈ Posts |
I have to look into the llrserver process; perhaps the pruning of the joblist or the pruning time/cycle could be changed to avoid such a loss.
The option "prunePeriod" in "llr-serverconfig.txt" sets the time frame for pruning the pairs. The internal list of pairs given to any client is written to file only after "pruningPeriod" is over. This pruning can only be done when an event occurs on the server, and that event is a client connecting to the server. So if there are many reservations at once, those reservations are first stored in the server's internal list (in memory). Only after the pruningPeriod is over will that list be written/updated to a file.
So, assuming there are 100 reservations at once and the next client connects to the server 24 hours later (as I do for the k=300-400 n=1M-2M server), all of those reservations are stored only in memory, not in a file! I don't know what options are set on all servers; we should have a closer look at them.
Suggestions:
- Set an option not only by time frame (pruningPeriod in seconds) but also by amount (pruningAmount, the number of reservations across all clients), so that whichever limit is reached first triggers the pruning.
- Set a special option for the client to call 'llrserver -s', which will force the server to simplify the joblist and knpairs files.
Both seem to be small changes on the client and/or server, without changing the whole communication (old clients not affected!). Thoughts? |
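The first suggestion above can be sketched roughly as follows. This is only an illustration in Python, not LLRnet code: the class `ReservationJournal`, its parameters `prune_period` and `prune_amount`, and the `flushed` list standing in for joblist.txt are all hypothetical names chosen to mirror the options discussed in the post.

```python
import time


class ReservationJournal:
    """Hypothetical model: reservations sit in memory until a flush is due."""

    def __init__(self, prune_period, prune_amount, clock=time.monotonic):
        self.prune_period = prune_period  # seconds between flushes (time limit)
        self.prune_amount = prune_amount  # max unflushed reservations (count limit)
        self.clock = clock                # injectable so the sketch can be tested
        self.pending = []                 # reservations held only in memory
        self.flushed = []                 # stands in for the joblist file on disk
        self.last_flush = clock()

    def reserve(self, pair):
        """Record a reservation, then check whether a flush is due."""
        self.pending.append(pair)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush when either limit is reached, whichever comes first."""
        due_by_time = self.clock() - self.last_flush >= self.prune_period
        due_by_count = len(self.pending) >= self.prune_amount
        if due_by_time or due_by_count:
            self.flushed.extend(self.pending)
            self.pending.clear()
            self.last_flush = self.clock()
            return True
        return False
```

With a count limit of, say, 50, a burst of 100 reservations would be written out in two flushes instead of waiting for the time limit, so a crash shortly afterwards would lose at most the reservations made since the last flush.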
|
|
|
|
|
#21 | |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
1100001101001₂ Posts |
Quote:
Normally, that would be enough; in the past if, say, a power outage occurred, we didn't lose anything as long as nobody was in the middle of talking to the server at the time of the outage. In your case, though, your client was in the middle of talking to the server, so the entire communication's worth was lost--which in your particular case was unfortunately quite a few pairs. What PRPnet does to address this is write each reservation to the database as it happens--for instance, in a batch of 100, it hands out #1, writes #1 to DB, hands out #2, writes #2 to DB, etc. By contrast, LLRnet hands out #1, hands out #2, etc., then writes them all in one big bunch at the end. If LLRnet could be modified to behave more like PRPnet in this regard, it would solve the problem: in a case like yours, you theoretically wouldn't have lost any of your cached pairs. |
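The difference between the two write strategies described above can be illustrated with a small Python sketch. It is not taken from either LLRnet or PRPnet: the function names, the `fail_after` parameter that simulates an interrupted connection, and the plain list standing in for durable storage are all assumptions made for the example.

```python
class ConnectionLost(Exception):
    """Simulated crash or dropped connection in the middle of a batch."""


def hand_out_batched(pairs, store, fail_after=None):
    """Hand out every pair first, then record them all in one write at the end."""
    handed = []
    for i, pair in enumerate(pairs):
        if fail_after is not None and i == fail_after:
            raise ConnectionLost  # nothing has been written to the store yet
        handed.append(pair)
    store.extend(handed)


def hand_out_per_item(pairs, store, fail_after=None):
    """Record each reservation as soon as the pair is handed out."""
    for i, pair in enumerate(pairs):
        if fail_after is not None and i == fail_after:
            raise ConnectionLost  # earlier pairs are already in the store
        store.append(pair)


def run(strategy, pairs, fail_after):
    """Run a strategy, swallow the simulated failure, return what was recorded."""
    store = []
    try:
        strategy(pairs, store, fail_after=fail_after)
    except ConnectionLost:
        pass
    return store
```

For a batch of 100 pairs interrupted after 60 have been handed out, `run(hand_out_batched, range(100), 60)` records nothing, while `run(hand_out_per_item, range(100), 60)` records the 60 pairs already handed out. The price of the per-item approach is one write per reservation instead of one per batch.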
|
|
|
|
|
|
#22 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
1869₁₆ Posts |
BTW @all: we now have a formal external backup process set up for the server machine (jeepford). First, once a day the databases (both stats and PRPnet), web pages, and server files will be backed up to a location on jeepford itself; the last 5 days of backups will be retained. Then, the last 3 days will be copied over the network to another of Gary's machines within his network (humpford, which happens to be our previous server machine). Lastly, Gary will back up the 3 days' worth from humpford to an external USB hard drive twice a month.
That should cover us pretty well--if we ever have an even more serious hard drive issue than the one we just had, we should be able to restore everything exactly the way it was without losing more than a day of processing. Even if jeepford itself is completely fried, we should be able to transplant the backups to another of Gary's machines relatively easily and get everything rolling again within a couple of days.
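As a rough illustration of the retention scheme described above (5 daily backups kept locally, the newest 3 copied to a second machine), here is a minimal Python sketch. It only computes which backup dates belong where; the function names and the returned dictionary keys are hypothetical, and the actual scripts used on jeepford and humpford are not shown in this thread.

```python
from datetime import date, timedelta


def retain(backup_dates, keep):
    """Return the `keep` most recent backup dates, newest first."""
    return sorted(set(backup_dates), reverse=True)[:keep]


def plan(backup_dates, local_keep=5, remote_keep=3):
    """Decide which daily backups stay local, which are mirrored, which expire."""
    local = retain(backup_dates, local_keep)      # kept on the server itself
    remote = retain(local, remote_keep)           # copied to the second machine
    expired = sorted(set(backup_dates) - set(local))  # safe to delete locally
    return {"local": local, "remote": remote, "expired": expired}


# Example: one backup per day for a week, starting 2010-08-20.
week = [date(2010, 8, 20) + timedelta(days=n) for n in range(7)]
result = plan(week)
```

With one backup per day, the worst case after a total loss of the first machine is restoring from the newest mirrored copy, which is at most about a day old.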
|
|
|
|