mersenneforum.org > Factoring Projects > NFS@Home
2016-06-05, 20:31   #1
debrouxl
Sep 2009

Improving the queue management.

TL;DR: we're currently experiencing issues, we know we need to improve, and we need both a bit more design feedback and some coding help to speed up the process. TIA


As you may have noticed in the past few weeks, the 14e and 15e parts of the NFS@Home grid have been suffering from a series of client hunger episodes. That is a consequence of the growing pains of feeding the 15e, and especially the 14e, clients with relatively small tasks. It's a problem we need to solve, preferably in a durable manner - and possibly at a wider scale than NFS@Home.

Part of the workload and pain comes from the fact that the process, inherited and slightly improved from RSALS, is largely manual. The tasks of creating entries, entering number information, changing the number state multiple times, expanding the range of q values, updating post-processor reservations, and updating factor information are repetitive, wasteful and error-prone - and nowadays, they seem to fall on the shoulders of too few people.
Giving more people access to the management infrastructure should solve the immediate scalability problem, but does nothing about the repetitiveness of the process. We need to do better, and we know we can do better - for instance, Makoto Kamada's near-repdigit management pages let users reserve numbers and enter factors by themselves, which is a pretty good thing (but as it stands, our management infrastructure grants access in an all-or-nothing fashion).

I started the discussion several weeks ago in the middle of the queue management topic, and jyb picked up on my post.

I have slowly set myself up for improving the management infrastructure: gathering the production code, starting preparatory tasks for unifying the 14e and 15e management databases and code, and setting up a reproducible testing environment. All of that is backed by a private GitHub repository, of course.
That's the current state of the work. The pages cannot be rendered yet because no stub was made for the BOINC DB access code. But the real, functional work is going to begin soon, and I'd like to make sure I'm not missing some functional requirements - especially on the roles.

A summarized excerpt of the todo list is as follows:
  • factoring out the duplicated code and databases behind 14e and 15e, creating a third, unified page which shall be used for both queues for now, and possibly for the larger siever, or new task types (distributed polynomial selection?), in the future;
  • building up the Role-Based Access Control system. I identified four initial roles:
    1. "admin" role: ability to delete numbers, to queue extra-wide ranges - should not be used most of the time;
    2. limited role 1 "queue manager": the existing single role minus the special abilities reserved for the admin role. Queue managers would be tasked with starting enough tasks to feed the grid, in a fair order if possible (use multiple number sources when there's enough ECM power, let ECM queues rebuild at other times, etc.);
    3. limited role 2 "scientist": say, William, Andrey and other project managers or frequent contributors (e.g. Sean), for creating entries and filling them in, but not moving them to QUEUED for SIEVING or later;
    4. limited role 3 "post-processor": post-processors could reserve tasks and post factor size information, and unreserve their own tasks, but not unreserve others' tasks;
    Of course, some user accounts could have multiple limited roles at once. And perhaps the post-processor role should be expanded with the ability to enter ECM work information, if we decide to track that, e.g. the way the near-repdigit project does.
  • automatically deducing some information from the poly, such as number value, SNFS / GNFS, difficulty, number of bits, etc.;
  • adding warnings upon events such as: attempting to create an entry with the same internal name as an existing one; attempting to expand the range of q values above e.g. 500M for 14e (which probably indicates that an extra zero slipped in); entering a polynomial containing a number whose value already exists (or, with some more work, one with clear-cut parameter mismatches, such as 32-bit LP tasks with rlim < 100M, etc.).
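The last two items could be sketched roughly as follows, in Python for illustration (the real code would be PHP; the field names, and any thresholds beyond those mentioned above, are assumptions):

```python
# Illustrative sketch, not the production code: deduce basic facts from the
# number and emit the sanity warnings described above. Job field names
# ("n", "name", "siever", "qmax", "lpbr", "rlim") are invented for this sketch.
def check_job(job, existing_names, existing_values):
    """Return (warnings, bits, digits) for a candidate queue entry."""
    warnings = []
    n = job["n"]
    # Automatically deduced information: size in bits and decimal digits.
    bits = n.bit_length()
    digits = len(str(n))
    if job["name"] in existing_names:
        warnings.append("an entry with internal name %r already exists" % job["name"])
    if n in existing_values:
        warnings.append("a number with the same value already exists")
    # An extra zero in the q range is a classic mistake on 14e.
    if job.get("siever") == "14e" and job["qmax"] > 500_000_000:
        warnings.append("qmax %d is above 500M for 14e: extra zero?" % job["qmax"])
    # Clear-cut parameter mismatch: 32-bit large primes with a tiny rlim.
    if job.get("lpbr") == 32 and job.get("rlim", 0) < 100_000_000:
        warnings.append("32-bit large primes but rlim < 100M")
    return warnings, bits, digits
```

The point is that none of this needs human eyes: the checks run when the entry is submitted, and the manager only has to read the warnings.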

The end goal is to eliminate repetitive tasks and make the project's management more scalable. And to throw out an idea: if tighter integration between the yoyo@home and NFS@Home queues is seen as beneficial to both projects, I'd say, why not?


For coders: the main technologies involved in the current infrastructure and the planned improved one are PHP 5.x and MySQL 5.5 (used by BOINC), currently on Ubuntu 14.04 LTS. IIRC, the web server is Apache 2.x, but the queue management code doesn't care about that, as long as something sets PHP_AUTH_USER for the not-yet-written RBAC.
The todo/wish list currently mentions database tables for users and their roles, but if the set of users is narrow and static enough (which it should be), hard-coded PHP associative arrays are clearly simpler...
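A minimal sketch of that hard-coded approach, written in Python for brevity (the production code would be PHP keying on PHP_AUTH_USER; the user names and action names below are made up):

```python
# Sketch of RBAC via hard-coded associative arrays. Role names follow the
# four roles listed above; the action names and users are invented examples.
ROLES = {
    "admin":          {"delete_number", "queue_wide_range", "queue", "create", "reserve", "post_factors"},
    "queue_manager":  {"queue", "create", "reserve"},
    "scientist":      {"create"},
    "post_processor": {"reserve", "post_factors"},
}

# One account can hold several limited roles at once.
USERS = {
    "alice": {"admin"},
    "bob":   {"scientist", "post_processor"},
}

def may(user, action):
    """True if any of the user's roles grants the requested action."""
    return any(action in ROLES[r] for r in USERS.get(user, ()))
```

In PHP the same thing is two nested associative arrays and an `in_array()` check, which is arguably simpler than a users/roles table pair for a dozen or so accounts.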


Thanks in advance for your input and your help.

Last fiddled with by debrouxl on 2016-06-05 at 20:47
2016-06-06, 13:25   #2
xilman
Bamboozled!
"π’‰Ίπ’ŒŒπ’‡·π’†·π’€­"
May 2003
Down not across

Quote:
Originally Posted by debrouxl
As you may have noticed in the past few weeks, the 14e and 15e parts of the NFS@Home grid have been suffering from a series of client hunger episodes. [...]
I'll see what I can do, though my time is limited at present, not least preparing for a 2-week vacation during which my availability will be even more limited.

As a part-time feeder of the 14e queue I'm in full agreement with your assessment of the difficulties involved in keeping it fed.

As a consumer of the output, I would very much like to receive an email telling me which person has just completed post-processing on which number with what results.


Something I'd find very useful is concrete guidelines on the limits on composite size / SNFS difficulty for each queue, and for setting sieving parameters for integers of specified GNFS or SNFS difficulty. If those guidelines can be made, they could, presumably, be hardwired into a script which, given N, "type={g,s}nfs" and a polynomial (which implicitly contains the difficulty), produces everything else required.
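Such a script might start with something like the following Python sketch. The difficulty cut-offs and the GNFS-to-SNFS conversion factor below are placeholders for illustration, not actual NFS@Home policy:

```python
# Hypothetical sketch of the guideline idea: map a job's difficulty to a
# queue. The cut-offs (265, 310) and the rough 1.4x GNFS->SNFS-equivalent
# conversion are placeholder values, NOT actual NFS@Home limits.
def pick_queue(difficulty, kind):
    """kind is 'gnfs' or 'snfs'; difficulty is GNFS digits or SNFS difficulty."""
    if kind == "gnfs":
        # Folklore rule of thumb: a GNFS job is comparable to an SNFS job
        # of roughly 1.4 times the digit count.
        difficulty = difficulty * 1.4
    if difficulty < 265:
        return "14e"
    elif difficulty < 310:
        return "15e"
    else:
        return "16e"
```

Once the queue is known, the same table-driven approach could emit the siever parameters (lim, lpb, mfb, q range) for that difficulty band, producing "everything else required" from N, the type and the polynomial.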
2016-06-06, 13:33   #3
debrouxl
Sep 2009

Quote:
As a consumer of the output, I would very much like to receive an email telling me which person has just completed post-processing on which number with what results.

Something I'd find very useful is concrete guidelines on the limits on composite size / SNFS difficulty for each queue and for setting sieving parameters for integers of specified GNFS or SNFS difficulty. If those guidelines can be made they could, presumably, be hardwired into a script which given N, "type={g,s}nfs" and a polynomial (which implicitly contains the difficulty) produces everything else required.
Both excellent ideas - I'll add them to the todo list.
The todo list already contains an audit log, but no notification mechanism, e-mail or otherwise.

Calling YAFU to find SNFS polynomials, and performing trial sieving when requested, are already on the todo list as lower-priority items. But the data model doesn't currently support multiple polynomials: everything is lumped into a single table whose tuples have many fields. Splitting that table, normalizing it beyond first normal form, and joining on demand are probably in order in the mid-term.
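For illustration, here is roughly how the split might look, sketched with Python's sqlite3 (production uses MySQL 5.5; all table and column names here are invented, not the production schema):

```python
# Sketch of the table-splitting idea: one row per number, any number of
# candidate polynomials per number, joined on demand instead of one wide
# table with many fields per tuple. Schema names are illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE numbers (
        id    INTEGER PRIMARY KEY,
        name  TEXT UNIQUE,   -- internal name of the entry
        state TEXT           -- e.g. QUEUED, SIEVING, DONE
    );
    CREATE TABLE polynomials (
        id        INTEGER PRIMARY KEY,
        number_id INTEGER REFERENCES numbers(id),
        poly      TEXT,      -- full polynomial file contents
        score     REAL       -- e.g. trial-sieving yield, to rank candidates
    );
""")
db.execute("INSERT INTO numbers VALUES (1, 'example_number', 'QUEUED')")
db.execute("INSERT INTO polynomials VALUES (1, 1, 'poly A', 1.0)")
db.execute("INSERT INTO polynomials VALUES (2, 1, 'poly B', 2.5)")

# Best-scoring polynomial for a number, via a join.
row = db.execute("""
    SELECT n.name, p.poly FROM numbers n
    JOIN polynomials p ON p.number_id = n.id
    WHERE n.id = 1 ORDER BY p.score DESC LIMIT 1
""").fetchone()
```

With that shape, trial-sieving several YAFU-generated polynomials just means several rows in `polynomials`, instead of widening the main table again.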

Do you have a GitHub account?
2016-06-06, 13:56   #4
wombatman
I moo ablest echo power!
May 2013

Lionel, for the test-sieving, YAFU can take multiple job files as input and tell you which one sieves best as well: http://www.mersenneforum.org/showpos...&postcount=620

That may be an option for automated trial sieving of a GNFS number.
2016-06-06, 16:16   #5
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

Quote:
Originally Posted by xilman View Post
Something I'd find very useful is concrete guidelines on the limits on composite size / SNFS difficulty for each queue and for setting sieving parameters for integers of specified GNFS or SNFS difficulty. If those guidelines can be made they could, presumably, be hardwired into a script which given N, "type={g,s}nfs" and a polynomial (which implicitly contains the difficulty) produces everything else required.
Yafu can do this (as well as automated test sieving).
2016-06-06, 16:53   #6
xilman
Bamboozled!
"π’‰Ίπ’ŒŒπ’‡·π’†·π’€­"
May 2003
Down not across

Quote:
Originally Posted by debrouxl
Do you have a GitHub account?
Yes, "Brnikat". I really need to do some more work on my little Algol 68 project.
2016-06-10, 16:38   #7
debrouxl
Sep 2009

First milestone today: the production setup can (nearly) be replicated in a testing environment, built in a fully automated way by Vagrant plus a postinst script with interactive input. That is, despite warnings and missing CSS at the moment, the crunching + crunching_e and manage_crunching + manage_e pages now work as they should, rather than spitting fatal errors about missing files, DB access errors and whatnot.

The management pages are behind user+password authentication, with 17 accounts corresponding to the new admin, the 4 (AFAIK) existing managers, and those who have post-processed numbers for NFS@Home in the past few months and/or provide numbers. Once the RBAC system is done, these accounts will be used for testing the four aforementioned roles.

fivemack: is the "twomack" GitHub account yours?

Last fiddled with by debrouxl on 2016-06-10 at 16:41
2016-06-12, 06:06   #8
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK

Not sure how to help you, Lionel; you're doing a great job. All my coding skills have vanished since I left university.
2016-07-17, 13:10   #9
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK

Hi Lionel, can you please give us an update? Thank you.
2016-07-17, 15:24   #10
debrouxl
Sep 2009

Sure.
The code base hasn't evolved since my previous report.
2018-05-06, 21:05   #11
debrouxl
Sep 2009

Hi everyone. The code base still hasn't evolved since then...

I dropped the ball on participating in the NFS@Home grid management, grid programming and number post-processing about two years ago. The combination of my day job, software for the TI graphing calculators (which brought me to the integer-factoring scene, through being the backup admin of RSALS while it served its original purpose), managing the NFS@Home 14e (mainly) and 15e queues, reading the daily news flow in CS and IT, and, on top of that, improving and expanding the grid management software, was too much to handle.
It still is, despite my cutting back on reading CS/IT news and on working on software for the TI graphing calculator community.

However, the four tabs on NFS@Home's crunching and management pages have stayed pinned in my browser throughout these two years. I monitored the 14e queue's state on an irregular basis, probably once a month on average. I fixed typos once or twice. I saw the queue run dry a number of times, and that was partially my fault, clearly.

Today's peek at the queue, and especially at the numbers in SIEVING state, is what brought me back here. 37 numbers in SIEVING state, more than half of which aren't reserved, and ~470 GB of data in result files, is much more than the last time I checked. It looks like I could, and should, lend a hand to help deflate the queue to more manageable levels...

Working on a much-improved queue management system, one that enables more user cooperation and spreads the workload, still makes at least as much sense as it did two years ago.
As I wrote above, I successfully set up a testing system. That was a very important milestone, not because it was hard to do, but because it enables forward progress on the work; now, this system needs to be put to good use. Nearly exclusively by someone who isn't me - that's been pretty clear for a while.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.