#100 |
|
Aug 2002
Ann Arbor, MI
110110001₂ Posts |
I have another idea. If our server had the space and bandwidth, we could have the program send checkpoints. We wouldn't do this regularly, just, for example, every quarter of the way through a 33M, or halfway through a first-time test. That way we'd lose less work if someone's hard drive gets wiped. It would also send the checkpoint file if someone quit GIMPS or unreserved an exponent with some work done on it, so the work wouldn't be lost. Because the checkpoint files are relatively large, we would make checkpoint sending optional (so people could keep the promise of low bandwidth).
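The "every quarter of the way" rule above could be decided on the client roughly as follows. This is only a sketch under assumptions: the function name, its parameters, and the idea of comparing two successive iteration counts are hypothetical and are not taken from prime95/mprime.

```python
def checkpoint_due(prev_iter, cur_iter, total_iters, parts=4, opted_in=True):
    """Return True if an upload milestone was crossed since the last check.

    parts=4 uploads at 1/4, 2/4 and 3/4 of the test; parts=2 uploads halfway only.
    All names are hypothetical; nothing here comes from the real client.
    """
    if not opted_in:
        # Checkpoint sending is optional, so low-bandwidth users never upload.
        return False
    if not 0 <= prev_iter <= cur_iter <= total_iters or total_iters <= 0:
        raise ValueError("iteration counters out of order")
    # Index of the last milestone reached at each point (integer arithmetic only).
    prev_mark = prev_iter * parts // total_iters
    cur_mark = cur_iter * parts // total_iters
    # Reaching the final mark means the test is finished; a result is sent instead.
    return prev_mark < cur_mark < parts
```

With `parts=4` a test uploads at most three checkpoints, so the extra traffic per exponent is bounded no matter how often the client saves locally.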
|
|
|
|
|
#101 |
|
Sep 2002
1110101₂ Posts |
hmmmm.... think 31000 users all sending 2-3 times more data... that sounds like a potential nightmare compared to the few data losses from HD failure or the like. Seems to me that if you're THAT concerned about your data, you're probably a computer fanatic and losing one exponent from the 10+ on your farm REALLY won't kill you.
That said, it really isn't a bad idea, but I vote we stick to the simple method that works rather than trying to implement a data backup that MIGHT save a few hundred P90 years a month... |
|
|
|
|
#102 | ||||||
|
Oct 2002
2⁵ Posts |
Quote:
documentation for a loooooong list of popular web sites and services that use MySQL). Specifically about webcounter.com (as I trust my personal experience more than long lists provided by involved parties): MySQL is used for all db needs, and there has never been a reason to regret choosing MySQL.
In particular, MySQL has provided replication for quite a while already, and the feature is very stable. It has some limitations tho (for example: one-way replication can be configured flexibly, but mutual replication works well only with 2 servers; if you add more, replication performance and reliability seriously suffer). So I am thinking about doing custom replication; it might provide a few useful features for a multi-server configuration specifically for GIMPS. Not that I like reinventing the wheel tho...
Anyway, MySQL replication will (must!) be used immediately to perform remote realtime backups of the database. As I said, one-way replication is very robust, and can be configured in a matter of a few minutes.
BTW, the MySQL documentation has a chapter titled 'How MySQL Compares to PostgreSQL' - I strongly recommend reading it. Overall, the MySQL documentation has always been an excellent, well-maintained resource that does not merely cover syntax and utilities in great detail, but also discusses a lot of other important issues. That's a big plus for long-term projects.
Quote:
Quote:
handle traffic spikes and be unsusceptible to most DoS attacks. As soon as an event is queued, the background thread should be woken up, and it will:
- record the result in the results database (if it hasn't been recorded yet - unique keys will help to make this fast);
- release the exponent (if it hasn't been released yet - once again, a single query does everything);
- mark the log event as processed or delete it altogether (again a single query);
- process the next event, or go to sleep if there are no more unprocessed events.
That's an example of how to go without transactions (which means going much, much faster). Note that none of the actions require immediate processing - if something gets delayed by a few minutes (which is a very long time for a P4 server :) ) nothing wrong happens. Handing out a new exponent can also be done with just one event queue; tho it will require both preprocessing and postprocessing of the event (but the program code and database structure will still be quite simple and straightforward).
Then, stats: it's excessive to maintain summary tables (the absolute majority of stats tables will be just summary tables) within a transaction. If something fails, summary tables can always be rebuilt (at least, the database must be designed to make that possible). Also, if we are talking about advanced stats, more than just two running totals per account, such processing would take way too long if done within the request. Doing in-transaction stats processing would also make it much more difficult to spread stats among a few servers, and would noticeably increase traffic between the core module and the stats module. Currently PrimeNet shows updated numbers immediately; that is not really necessary. If stats get updated by a background thread, a delay of a couple of seconds will not hurt. Alternatively, stats could be processed in batches, say once per hour - that's what dnet (once per day) and webcounter (once per hour) do with their traffic. Batch processing usually simplifies stats code.
Quote:
rewrite protocol, making it more flexible, extensible (XML based?) etc. I see this problem with supporting old clients: they connect to a URL within the entropia.com domain, yes? I believe the entropia.com guys need their domain more than GIMPS does. Tho if it were something like gimps.entropia.com, it could be moved around. The problem could potentially be worked around by having the URL act as an HTTP proxy to the new servers, or by installing a primenet2 co-server at entropia - in all cases the entropia people would need to make some effort.
Quote:
(Oh well - the message board software does not allow me to upload an html attachment - use the link above)
Quote:
|
||||||
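The queue-driven processing described in post #102 (queue the event, then let a background worker record the result, release the exponent and mark the event done, each with a single repeatable query) might look roughly like this. It is a sketch under assumptions: the table and column names are hypothetical, and SQLite in autocommit mode stands in for MySQL so that the example is self-contained (MySQL spells the duplicate-tolerant insert `INSERT IGNORE`).

```python
import sqlite3

# Hypothetical schema; the UNIQUE key is what makes step 1 safe to repeat.
SCHEMA = """
CREATE TABLE events (id INTEGER PRIMARY KEY, exponent INTEGER, computer TEXT,
                     residue TEXT, processed INTEGER DEFAULT 0);
CREATE TABLE results (exponent INTEGER, computer TEXT, residue TEXT,
                      UNIQUE (exponent, computer));
CREATE TABLE assignments (exponent INTEGER PRIMARY KEY, computer TEXT);
"""

def open_db():
    # isolation_level=None puts sqlite3 in autocommit mode: no multi-statement
    # transactions, every query stands on its own.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.executescript(SCHEMA)
    return db

def enqueue_result(db, exponent, computer, residue):
    # Request handler: one cheap insert, then answer the client at once.
    db.execute("INSERT INTO events (exponent, computer, residue) VALUES (?, ?, ?)",
               (exponent, computer, residue))

def process_pending(db):
    """One pass of the background worker; returns the number of events handled."""
    rows = db.execute("SELECT id, exponent, computer, residue FROM events "
                      "WHERE processed = 0 ORDER BY id").fetchall()
    for event_id, exponent, computer, residue in rows:
        # 1. Record the result; a repeated report hits the unique key and is ignored.
        db.execute("INSERT OR IGNORE INTO results VALUES (?, ?, ?)",
                   (exponent, computer, residue))
        # 2. Release the exponent; if it was already released this deletes nothing.
        db.execute("DELETE FROM assignments WHERE exponent = ? AND computer = ?",
                   (exponent, computer))
        # 3. Mark the event as processed.
        db.execute("UPDATE events SET processed = 1 WHERE id = ?", (event_id,))
    return len(rows)
```

Because every step is harmless when repeated, a worker that crashes between steps can simply process the same event again after a restart, which is why no transaction is needed around the three queries.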
|
|
|
|
#103 | |
|
"Mike"
Aug 2002
2·23·179 Posts |
Quote:
Feel free to clog this thread up... I personally find it easier if everything is in one spot, and the search function really helps me when I need to find something later on... I agree about avoiding MS Word documents... Plaintext is fine for me... |
|
|
|
|
|
#104 |
|
Sep 2002
3²·13 Posts |
After 100+ posts, clogged is hardly a strong enough adjective!
Anyway, I was just wondering about your difficulties with MS Word, since you can read a Word file in just about any other WP program. Actually I ask many of my colleagues to send me Word docs rather than plaintext simply because I know I will be able to read them, and in many cases people screw up formats they aren't familiar with (not to suggest that George would have trouble with it). Admittedly, a standardized plaintext is more useful overall, but because of M$'s monopoly for so long, they've effectively standardized the WP market. |
|
|
|
|
#105 |
|
Oct 2002
100000₂ Posts |
Here is a plaintext version of what I referred to earlier as the html page. This is how I personally envision things (and my opinion is influenced by previous posts in this forum); corrections/comments are solicited.
Feature: Giving CPU credit more often (mid-work-unit)
prime95/mprime/PrimeNet: Credit is given only at the end of a particular stage.
PrimeNet2: Trial factoring might be divided into smaller blocks (with each block covering potential factors of the same bit length?). This will break factoring down into small pieces that will be returned faster; it also allows several computers to factor the same exponent simultaneously. It will also allow assigning 'easy' factoring blocks to slow computers and new GIMPS users, thus encouraging them, and heavy factoring tasks like '68 to 69 bits' to more powerful computers that have been around for some time already, whose users have already tired of checking stats every hour :)
[AG] Not completely on topic, but: a few years ago I made a Java applet that allows factoring in a Java-enabled browser; because Java has a specialized API for manipulating large numbers, the performance was pretty good. That applet did realtime subdivision and reassembly of factoring work into/from very small pieces. I might revive the idea eventually... It sounds good to create a page with such an applet that states 'you are already helping GIMPS... download native software for your specific operating system/hardware to do the math work faster'.
Can P-1 factoring be broken down into small blocks? Looks like yes (varying B1 and B2), but should it be done (or will it affect overall GIMPS performance)? [George, please advise]
For LL testing, only speculative crediting might be done. I strongly believe it should not be handled within the core module; instead the stats module should 'average out' values. Doing that within the stats module is straightforward and not error prone. Since LL intermediate files are large, and only increase in size as GIMPS advances to larger exponents, breaking LL tests down into blocks is not reasonable. At least not until someone is willing to pay dollars for huge amounts of bandwidth.
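The idea of one trial-factoring block per bit length can be sketched as below. It relies on two well-known facts: any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1, and q mod 8 is 1 or 7. The function names and the block layout are assumptions made for illustration; a real client would also sieve out composite candidates, which this sketch does not do.

```python
def split_into_blocks(p, first_bits, last_bits):
    """Hypothetical work units: one (exponent, bit length) pair per block."""
    return [(p, bits) for bits in range(first_bits, last_bits + 1)]

def candidate_factors(p, bits):
    """Yield q = 2*k*p + 1 with exactly `bits` bits and q % 8 in (1, 7)."""
    lo, hi = 1 << (bits - 1), 1 << bits
    # Smallest k >= 1 such that 2*k*p + 1 >= lo (ceiling division).
    k = max(1, (lo - 1 + 2 * p - 1) // (2 * p))
    q = 2 * k * p + 1
    while q < hi:
        if q % 8 in (1, 7):
            yield q
        q += 2 * p

def factor_block(p, bits):
    """Process one block: divisors of 2**p - 1 that have exactly `bits` bits."""
    # q divides 2**p - 1 exactly when 2**p is congruent to 1 modulo q.
    return [q for q in candidate_factors(p, bits) if pow(2, p, q) == 1]
```

For example, 2^11 - 1 = 2047 = 23 × 89, so `factor_block(11, 5)` returns `[23]` and `factor_block(11, 7)` returns `[89]`; the two blocks are independent and could be handed to different computers.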
Feature: After reporting a result, the server could return your current standing in the stats.
prime95/mprime/PrimeNet: Ranks are available only at the server.
PrimeNet2: I strongly advise against bloating client-side software with features like that. Those bits of code are almost impossible to upgrade (how many v16 clients are still running?); they complicate the core server module, making it more error prone and more prone to DoS attacks; and so on. Nevertheless, client-side software can use an additional connection method - not to the core module but to the stats module - which it can query periodically, and show not only the position but also a nifty table of what every computer in the account is doing, and so on. But is that really necessary? Everyone has a web browser - client software might just open a browser window for the curious user.
Feature: Client could report cpu type, speed, hours-per-day, rolling average to let server pick default work type.
prime95/mprime/PrimeNet: At least PrimeNet shows CPU types and frequencies. Tho there are some bugs (initially my 3 P4 computers were shown as 'Unknown', now two of them spontaneously changed to P-III). Anyway, as far as I know PrimeNet does not use the information to make decisions.
PrimeNet2: A good estimate of the 'speed index' of each computer is definitely valuable information. PrimeNet2 should allow users to explicitly select which types of work they prefer or dislike; but the default behaviour should be very flexible; it also should be implemented on the server, not hardcoded into hard-to-upgrade clients. BTW, if the server knows the amount of RAM available, it will be able to avoid assigning too-large exponents and provide factoring work for low-RAM computers when all small LL tests have already been assigned/done.
Feature: P-1 factoring work units.
prime95/mprime/PrimeNet: Accepts P-1 factoring result messages, and tells the client whether a newly reserved exponent has already passed the P-1 factoring step or not.
P-1 factoring CPU time is not credited (?) and P-1 factoring status is not displayed anywhere.
PrimeNet2: The database structure should be flexible enough to allow adding more types of work units easily. For example, why not start doing P+1 factoring, since the code is already in prime95/mprime? Or someone might discover a great new factoring algorithm tomorrow.
Specifically about P-1 work units, I have this concern: the algorithm requires a lot of RAM. Thus low-end computers are probably excluded. Faster computers would likely prefer to do only LL, waiting for someone else to do P-1 - and who will do only P-1 if it requires even more resources than LL but cannot find a prime? In other words, with this particular algorithm it might be better to require every assignee to do both P-1 and LL if they want to do LL. (This is how it currently goes.) One idea might be to improve the client so that if worktodo contains both P-1 and LL work, then P-1 is done during the night and LL goes on during the day; or something like that. I assume there are a lot of computers out there that have hundreds of MB of spare RAM at night, but whose users don't like to give more RAM than is sufficient for LL during the daytime.
How small can P-1 blocks be made? For example, is it possible to create (ridiculously small) 1-minute P-1 blocks, or is there some long pre-/postprocessing even for tiny blocks? Would such tiny blocks be suitable for processing on computers that usually do trial factoring (presumably, low-RAM computers)? Or does the amount of RAM required for P-1 not depend on the P-1 bounds?
Feature: Give the user the ability to configure some client options from the server. For example, user could change the memory settings or default work type from a web form. Great for client computers that are not easily accessible.
prime95/mprime/PrimeNet: Not supported.
PrimeNet2: That's what the accounts module should do, among other things.
Since the information must eventually be passed to clients, the accounts module should be tightly integrated with the core module. But one should be VERY careful with features like that. Choosing the preferred type of work on the server is probably fine; choosing the amount of RAM used by the client is NOT. It will cause a great number of concerns from GIMPS users, I'm sure - no one wants to ever get prime95/mprime using 1.5 GB of virtual RAM (and thus swapping the computer to death) due to a bug or DNS spoofing attack or whatever. Network-related notes: the settings on the server will not take immediate effect (the client must at least check in with the server once again, and/or finish the currently running work). Also, this feature precludes the project from using proxies (at least, this functionality will make the proxy protocol considerably more complicated).
Feature: Exponent poaching
prime95/mprime/PrimeNet: Readily prefers results from a poacher to work being done by the 'legal' assignee.
PrimeNet2: Poaching should not be credited. It might be considered triple-checking, but no credit of any kind is given for such tests. Why triple-checking, not double-checking? Currently, a poacher may maliciously poach the same exponent twice, using 2 different accounts, making GIMPS skip an exponent. In other words, results from poachers should never be trusted. On the other hand, it's virtually impossible for the server to assign the same exponent for checking and double-checking to 2 -distinct- malicious users who report the same residue - we all hope that the absolute majority of GIMPS users don't forge results and don't use faulty hardware.
Feature: Milestone showstoppers
prime95/mprime/PrimeNet: Are not handled specifically.
PrimeNet2: It is proposed to assign each GIMPS computer a 'reliability level'. New GIMPS computers get a zero value; as they process and return exponents the level increases, first quickly, then more slowly. Fast computers get an additional boost.
Computers that have ever had exponents time out get their reliability level noticeably decreased. Computers that have ever returned a faulty or forged result drop a mile below zero. Reliability levels of other computers in the same account affect the reliability level of a new computer. The highest reliability level can be assigned only manually. These exponents do not really have any special value - they have a similar chance of yielding a new Mersenne prime as other exponents. But humans are just humans... and do like milestones. So smaller exponents are assigned only to computers with a high reliability level. The formula used to compute the reliability level should not be too discriminating; probably, computers with a very low reliability level (new computers added to GIMPS) should be assigned only trial factoring, to ensure that new users get their stats updated often. :) Unless they explicitly set their preferences, of course.
Feature: Server failsafety
prime95/mprime/PrimeNet: None.
PrimeNet2: Either proxies or replicated servers should be used. If possible, the design should allow both: replicated servers serve ordinary users, and proxies might be greatly appreciated by those who run large computer farms behind firewalls with strict rules.
Feature: Server-side technologies
prime95/mprime/PrimeNet: Windows NT server, with [resource-inefficient] CGI scripts and [some] SQL database.
PrimeNet2: JSP/Servlets is proposed as the server-side technology, with MySQL as the backend. A POSIX-compliant server-side environment is strongly recommended; but the choice of Java+MySQL completely avoids portability problems, leaving only security and reliability issues to consider. While desktop-oriented operating systems should be avoided, they should still be capable of running the PrimeNet2 code if ever deemed necessary (in case of a drastic outage, whatever).
Feature: Database architecture
prime95/mprime/PrimeNet: Monolithic transaction-oriented (?) SQL database.
The database is not authoritative; the master database is maintained by George.
PrimeNet2: Three modules are proposed:
- core module - serves client-side software requests (exponent reservations/releases/status and a minimal amount of stuff besides that - to reach the best possible performance, reliability and security);
- accounts and teams module - provides a web-based interface to the database;
- stats module.
Ideally, to improve flexibility, reliability and security, all modules should have independent databases. To satisfy practical requirements, the first 2 modules need to share a chunk of the database. (Another option to consider is that the database of the second module is replicated to the first module; but this somewhat limits possible features.) The stats module needs only read access to part of the database (but might use several temporary and/or summary tables); it sounds reasonable to keep the stats database separate (ideally, on different hardware) by replicating relevant data from the core module. This makes it impossible for the stats module to adversely affect the core database, thus decreasing security concerns.
[AG] Transactions are not needed; moreover, they should be avoided. More discussion on this when we start designing the database structure.
PrimeNet2 should ideally keep the master copy of the GIMPS database (to give George more spare time to speed up the prime95/mprime code ;) ). (This also means that there should be a feature to release/return large chunks of exponents that are coordinated elsewhere.) Whichever server-side configuration is chosen, realtime backups of the database should go to a few locations around the world to ensure data safety. At the first stage, the PrimeNet2 database should just be assigned a large (huge) chunk of exponents from the database maintained by George, with existing data imported later without haste.
Feature: Authentication
prime95/mprime/PrimeNet: Accounts are protected with passwords.
PrimeNet2: Accounts should keep having passwords, of course.
Additionally, it is proposed to introduce [randomly generated?] per-computer passwords. This is to avoid keeping account passwords on potentially untrusted computers. The account password will be necessary for tasks like installing a new computer (and adding it to an existing account), browsing account stats, changing preferences at the web site, etc., but a particular computer can easily run unattended with only its per-computer password. |
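The per-computer passwords proposed at the end of post #105 could work along these lines. This is a minimal sketch under assumptions: the `Account` class, its method names and the choice of storing only a SHA-256 digest of each key are hypothetical, and the account-password check that would guard enrollment is left out.

```python
import hashlib
import hmac
import secrets

class Account:
    """Hypothetical server-side record of one account's per-computer keys."""

    def __init__(self):
        self.computer_keys = {}  # computer id -> SHA-256 digest of its key

    def enroll_computer(self, computer_id):
        # Called once during installation, after the account password was checked.
        key = secrets.token_urlsafe(32)  # random key, shown to the client only once
        self.computer_keys[computer_id] = hashlib.sha256(key.encode()).digest()
        return key

    def verify_computer(self, computer_id, key):
        expected = self.computer_keys.get(computer_id)
        if expected is None:
            return False
        presented = hashlib.sha256(key.encode()).digest()
        # Constant-time comparison of the stored and presented digests.
        return hmac.compare_digest(expected, presented)

    def revoke_computer(self, computer_id):
        # A stolen or retired machine loses access; the account password is untouched.
        self.computer_keys.pop(computer_id, None)
```

The account password never has to be stored on the worker machine, and revoking one computer does not disturb the others in the same account.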
|
|
|
|
#106 |
|
Oct 2002
Lost in the hills of Iowa
2⁶·7 Posts |
United Devices used a web-based "configuration program" for their clients.
I strongly disliked it, and it seems to cause a LOT of problems with newbies during their too-frequent outages.
I like the "send checkpoints" idea, *if*
1) The bandwidth needed isn't too excessive
2) Poaching of checkpoints is somehow prohibited in the software
3) It's made an optional capability
The tricky part is when a checkpoint is sent, then the machine it's from dies.... |
|
|
|
|
#107 |
|
Oct 2002
2⁵ Posts |
Regarding remotely checkpointing LL tests (as well as P-1): that will require a RAID with a bunch of drives, a steady stream of electricity, a 4U case (maybe even 6U), and quite a chunk of bandwidth. Rough math shows that the resulting expenses will be 3 to 5 cents per checkpoint (plus large setup fees); multiply that by the number of exponents GIMPS handles... Anyone willing to pay? :) No? :( Ah, no...
Well, then let's close the topic.
|
|
|
|
|
#108 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts |
Quote:
Transactions have a performance penalty but they do provide a benefit. I briefly started reading some MySQL docs and see that transactions are supported. I think I saw that Oracle-style consistent read is also supported (a cool feature when generating reports). Subqueries are not (*sigh*), I use them a lot in generating the status.htm page now. |
|
|
|
|
|
#109 | |||
|
Oct 2002
2⁵ Posts |
Quote:
Quote:
Quote:
If we adopt the MySQL replication model for servers that serve requests simultaneously, then we face a problem that can be generally described like this: the order of transactions at a slave server (i.e. at all servers) is not defined. This is going to cause problems in a few scenarios, for example:
-- server A accepts the result of an LL test, and makes the necessary transaction
-- server B decides that the assignment has expired, and makes the necessary transaction
---- even worse if it quickly reassigns the exponent to someone else
-- now the servers try to cross-replicate the transactions and both fail, as the preconditions at the slave differ from those at the master.
---- if something like this happens, MySQL stops replication threads at all servers, awaiting manual intervention :(
-- It can happen that a transaction does not get replicated to another server. The chance is awfully small but non-zero (all scenarios involve a server crash)
---- this results from the fact that MySQL uses a replication log that's independent of the transaction engine, and it cannot really roll back a transaction from the replication log
---- MySQL could also have a transaction replicated only partially...
------ I believe it's fixed in recent MySQL 3.23.52
------ Don't know the status of this issue in MySQL 4.x
------ A tiny chance of skipping a transaction at the slave still remains
If we use a different SQL engine that supports distributed transactions, then what happens if the servers temporarily lose the route between each other but still serve internet traffic successfully?
-- both servers get hundreds of threads stuck waiting for transactions to start?
-- each server assumes that it's the only server that survived, and now we have two competing servers?
---- after the servers see each other again, one server dies together with the activity it processed during the partial outage?
Otherwise we get the transaction ordering problem. Ok, we could delay timing out by 24 hours to give it the necessary time margin... but a similar problem happens if we assign the same exponent simultaneously... relying solely on transactions is not sufficient; otherwise we cannot really have remote servers working cooperatively. So, since transactions would work well only with a single-server configuration, I (see the beginning of the message) consider an event queue a better approach for this particular task.
Giving the topic more thought, it looks like GIMPS has 2 options:
-- Use a single main server, and a few proxies to guard clients against server outages;
---- isn't a single server what we are trying to avoid?
---- proxies raise a whole bunch of authentication issues
---- certain features are hard to proxy (if at all possible)
-- Use an event-based approach, and several cooperating servers (that presumably can't all go down at once).
---- hard to invent anything better than this from a failsafety point of view
Later today I plan to post here a longer text regarding the proposed events model. |
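One way to see why an event-based design can sidestep the ordering problem from the server A / server B scenario: if an exponent's state is just the accumulated set of events, and the status is derived from that set, then every server reaches the same answer regardless of the order in which events arrive. The sketch below is an illustration under assumptions and is not the events model announced in the post; the event kinds, the rule that a delivered result beats a timeout, and all names are hypothetical.

```python
from functools import reduce
from itertools import permutations

# State of one exponent: three grow-only sets of computer ids.
EMPTY = (frozenset(), frozenset(), frozenset())  # (results, expired, assigned)

def apply_event(state, event):
    """Fold one (kind, computer) event into the state using set union only."""
    kind, computer = event
    results, expired, assigned = state
    if kind == "result":
        results |= {computer}
    elif kind == "expired":
        expired |= {computer}
    elif kind == "assigned":
        assigned |= {computer}
    else:
        raise ValueError(f"unknown event kind: {kind}")
    return (results, expired, assigned)

def replay(events):
    return reduce(apply_event, events, EMPTY)

def status(state):
    results, expired, assigned = state
    if results:
        return "done"  # assumed rule: a delivered result beats a timeout
    if assigned - expired:
        return "in progress"
    return "available"

def converges(events):
    """True if every arrival order of the events yields the same final state."""
    return len({replay(order) for order in permutations(events)}) == 1
```

Set union is commutative and idempotent, so replaying the scenario above (`assigned pc1`, `result pc1` at server A, `expired pc1` and `assigned pc2` at server B) gives "done" on both servers in any order, and a duplicated event changes nothing; the late reassignment to pc2 is kept and could be treated as a double-check.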
|||
|
|
|
|
#110 |
|
Oct 2002
Lost in the hills of Iowa
2⁶×7 Posts |
I don't believe we're trying to avoid having one "master" server - nor do I think we need to. Splitting the web and report stuff out might make sense, but the actual "I deliver the next item to be tested and keep track of what you sent back" server doesn't create that much of a load.
Not sure if we really need any "proxy" servers - GIMPS doesn't soak up anywhere near the transaction load of a small-block project like Distributed.net - and they managed with a single "master" server, with fairly few problems - despite a LOT more "blocks" of work per day done per client (ballpark TEN THOUSAND more for a conservative estimate vs. LL work, closer to *a* thousand on factoring) and a LOT more clients active in a given day. I think having a "proxy" server might be nice for when the master needs to be down for maintenance - but even that would be redundant if the client is set up to keep a week or two more "work on hand" than its check-in interval. I suspect the data load for each block on transfer was quite a bit larger too for downloading, but somewhat smaller on average for uploading results - but I don't know what George sends and receives out of Prime95, so I am just guessing there. |
|
|