#320
Nov 2003
16444₈ Posts
Quote:
results. Also, under the current scheme it is possible that a given number may never finish. It gets close to finishing (say within 50 curves), the assignments fail, they get reassigned, they fail again, and so on. If the server kept track of which users return results quickly, it could hand out reassignments of failed assignments to just those users. That way a number would not take more than (say) a week to finish the last 50 or so curves. Five days seems quite generous: just one core on my 2.4GHz laptop finishes a curve (with B1 = 10^9, composite = 200 digits) in 4500 seconds.

The policy for how new numbers get assigned seems very reasonable. However, if a number is stalled owing to repeated failed assignments, then that particular project will accumulate almost no computing power during that period. Should not the next number that gets started then be assigned to that project? This is not what I see: I see numbers in some projects stall for an extended period, while new numbers seemingly get started in OTHER projects. Perhaps my eyesight needs correction??? We see numbers stalled for more than a week [thus no computing power applied] and no new numbers started despite the dearth of compute power over that period. Of course, my perceptions might be wrong.
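The reassignment policy suggested above could be sketched roughly as follows. This is purely an illustration of the idea, not actual server code; the class and method names are made up:

```python
# Hypothetical sketch of the policy suggested above: track each user's
# recent turnaround times, and when an assignment fails, hand the redo
# to one of the fastest responders so a nearly finished number is not
# stuck waiting on slow or dead clients.
from collections import defaultdict


class Reassigner:
    def __init__(self):
        # user -> list of recent turnaround times in seconds
        self.turnaround = defaultdict(list)

    def record_result(self, user, seconds):
        """Record how long this user took to return a result."""
        self.turnaround[user].append(seconds)
        # keep only the most recent 20 results per user
        self.turnaround[user] = self.turnaround[user][-20:]

    def fastest_users(self, n=5):
        """Users with the lowest average turnaround, best first."""
        avg = {u: sum(t) / len(t) for u, t in self.turnaround.items() if t}
        return sorted(avg, key=avg.get)[:n]

    def reassign(self, failed_assignment):
        """Give a failed assignment's redo to the fastest known user."""
        fast = self.fastest_users()
        if fast:
            return fast[0], failed_assignment
        return None, failed_assignment
```

With turnaround figures like the 4500-second curve above, a user returning curves in hours would be preferred over one taking days for the final few redos.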
#321
Nov 2003
16444₈ Posts
Quote:
"additional sieving".
#322
May 2009
2×11 Posts
Hi folks.
Thanks for your efforts in doing this computation with cado-nfs. Just a few random thoughts that came after reading this thread.
Briefly put: we're well aware that cado-nfs has rough edges. It's not easy to get it to work on large projects when you're not familiar with the internals. On the other hand, contributions are welcome, and we're happy to help you understand some of the obscure features (preferably via the mailing list).
#323
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts
thome-
Thank you very much for your comments, and for the time spent browsing this lengthy thread! Unfortunately, most times CADO has failed to continue issuing workunits, it has done so without any error at all. Both the server terminal window and the log look fully normal, but the server is stalled. A pair of ctrl-c's issued to the server terminal kills CADO, and a restart using the snapshot file fixes whatever stalled the server. These stalls may be linked to a large user starting many clients at once, but we're talking maybe 20 new clients, not hundreds. One failure was due to a poisoned client, but that problem was easy to discover and correct on the client, and by extending the number of bad results allowed before the server shuts down. A client-blacklist option would be welcome.

Questions concerning postprocessing: the host currently has 64GB RAM and 200GB of swap on an NVMe SSD. Do you expect I will need to upgrade to 128GB of RAM? Does fast swap help enough on filtering that the extra memory might not be needed? The job is being run on a 1TB SSD, and the relations will take up ~300GB of disk. My only data point is a C186 that ran 32/33LP: 675M relations filtered in 32GB without issue, and the matrix also ran within 32GB without swap. I should have found a C19x job to do before tackling this one, but sometimes interesting projects appear before we're perfectly ready!
#324
May 2009
16₁₆ Posts
From the rsa220 log files (more than 5 years ago, on a machine which was probably 3 years old by then):
- purge: 6h WCT, 62G RAM
- merge: 60h WCT, 195G RAM (but cado-nfs merge has evolved a lot recently!)
- replay: 4h WCT, 207G RAM
- lingen in linear algebra (done in early 2016, somewhat later than the rest) required 500G RAM (but I'm actively working on it this summer).

The RAM counts represent the VmPeak, which may have limited significance. I actually still have the relations for that one. Time permitting, I may give the recent software a try to see how it fares with that data.
#325
"Seth"
Apr 2019
291₁₀ Posts
Thome, would you be interested in a new server flag that saves the last N days of the database? I'm imagining copying it every night and deleting any copies older than N days, or maybe saving a new copy every X WUs (so that during periods of no work it doesn't delete the older copies).
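The rotation scheme being proposed could look something like the sketch below. Nothing here is an actual cado-nfs flag or filename; the `wudb` name, paths, and defaults are made up for illustration:

```python
# Illustrative sketch of the proposed backup rotation: snapshot the
# server database on each backup pass and prune copies older than N
# days. Not actual cado-nfs code; paths and names are hypothetical.
import os
import shutil
import time


def rotate_db_backups(db_path, backup_dir, keep_days=3):
    """Copy db_path into backup_dir with a timestamped name, then
    delete .bak files older than keep_days. Returns the new copy's path."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_dir, f"wudb-{stamp}.bak")
    shutil.copy2(db_path, dest)          # snapshot the current database
    cutoff = time.time() - keep_days * 86400
    for name in os.listdir(backup_dir):
        path = os.path.join(backup_dir, name)
        if name.endswith(".bak") and os.path.getmtime(path) < cutoff:
            os.remove(path)              # prune stale snapshots
    return dest
```

Triggering this every X workunits instead of nightly (Seth's alternative) would avoid deleting the only good copies during a period with no work, since no new snapshots would be created to age the old ones out.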
#326
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts
Looks like the server machine or its connection failed. It does not respond to ssh, so it may be a campus internet outage.
I'll head in to my office and investigate; hopefully we'll be back up in half an hour or so.
#327
"Ed Hall"
Dec 2009
Adirondack Mtns
111011101001₂ Posts
Three of four clients locked up, which also locked up the switch they were on. I see the server is not communicating with cloudygo ATM, either. In case one or more of these machines caused the server to go down, I have taken all four offline. The fourth machine went into the waiting loop.

Edit: I guess I was constructing this msg as VBCurtis was posting. I had not seen his until after I submitted mine. I will still leave my clients offline for now.

Last fiddled with by EdH on 2019-07-20 at 03:24
#328
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts
The power outage mentioned in post #169, which never happened at the time, got around to happening tonight.
According to campus police, power should return to the building at 7am Pacific time Saturday (10 hours from now). I expect the machine will boot itself when it gets power, so I'll try to connect and fire up CADO shortly after 7am. If I am unable to connect, I'll drive back to campus and power on the machine manually. Sorry for the outage, folks! On the bright side, better this week than next (when I wouldn't be able to get to campus at all).
#329
"Ed Hall"
Dec 2009
Adirondack Mtns
11×347 Posts
Glad to hear my machines were not the cause.
Last fiddled with by EdH on 2019-07-20 at 12:26 |
#330
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts
We're back up and running; our host box did power up on its own.