mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Prime Sierpinski Project (https://www.mersenneforum.org/forumdisplay.php?f=48)
-   -   New server discussion (https://www.mersenneforum.org/showthread.php?t=5131)

magnav0x 2005-12-02 21:26

New server discussion
 
My only idea is that the server pspnet.no-ip.org on port 12984? I did a scan on the host and the port appears open, so what else could the issue be, other than the host not accepting incoming connections? I suppose I could jet on over to the UK and kick the server or something.


Edit by LTD: Sorry, this entry should not have been split, but I don't know how to move it back.

ltd 2005-12-02 21:42

I have already mailed Footmaster to look at the server.
Sorry, but there is nothing else I can do about it.

Lars

Edit by LTD: Sorry, this entry should not have been split, but I don't know how to move it back.

ltd 2005-12-07 06:52

That's the small problem. The IP is not static, so the redirect must go against the domain name, if that is possible.

Our next problem is the Linux machines that run for magnav0x and need the fixed IP entry because of problems with domain name resolution in some of the Linux clients.

Then we have to decide how we proceed.
There are three possible ways.
1. We go back to the old server at Footmaster, if he is willing to continue to run it.
Advantage: It is a machine with a static IP, and there is no problem with changing the client configuration.
Disadvantage: The machine is known to have some stability problems.
2. We continue to run with my server.
Advantage: I have everything in my hands, which makes it easier to insert new ranges if needed or to remove sieved-out k/n pairs within a day. It also gives me a better chance to come up with automated stats updates in the future, as file handling will be easier.
Disadvantage: It costs me an extra 10 Euro per month for the connection. (That's not really a problem, as I am willing to pay that.)
The server has to prove that it is stable.
The server does not have a static IP.
3. We move to the server offered by magnav0x.

OK, let the discussion start.

Lars

Citrix 2005-12-07 07:07

I'll leave the decision up to you. I will transfer the .org account to your account in a day or two (check your email), so no more IP issues.

Maybe the Linux computers' problems can be fixed?


Whatever you think is best is OK with me.

Citrix

drakkar67 2005-12-07 07:21

voting for trying the 2nd choice

drakkar

Mystwalker 2005-12-07 10:53

Maybe hosting it on a virtual server would be another possibility.

- the price is < 10 Euros meanwhile
- static IP address is available
- "direct" (except physical, of course) access as well
- system resources would be lower than for the other three options, but I'm not sure whether high performance is needed

I've seen that you can [url=https://www.server4you.de/de/v/vservertest.html]test[/url] a virtual server for 3 days without any hassle...

ltd 2005-12-07 18:46

The link Mystwalker has posted is very interesting.
I don't know why I ignored that provider when I looked for a server.
Their virtual server should even be able to hold the complete database
if we don't build the stats every hour.
I will check their server next week when I am on holiday and also ask them
if the expected load from the stats runs is OK from their side for a virtual server.

I had this discussion earlier with other providers and got the answer that they would not allow that load on a virtual server, but I think the machines offered at this provider should allow the load when we do only a handful of stats runs a day. (Start with one and maybe end with a run every 6 hours.)

I will keep you informed.

Lars

Mystwalker 2005-12-08 10:47

Maybe it is possible to tweak the DB to lessen the stats creation effort?

- optimize queries
- applying indexes (if not already done, of course)
- aggregate data that doesn't change anymore, so that it does not have to be computed every time

ltd 2005-12-08 16:22

[QUOTE=Mystwalker]Maybe it is possible to tweak the DB to lessen the stats creation effort?

- optimize queries
- applying indexes (if not already done, of course)
- aggregate data that doesn't change anymore, so that it does not have to be computed every time[/QUOTE]
Here is an answer to your suggestion.

There is always room for optimized queries, but I have no good ideas at the moment.

Indexes are set up as far as it makes sense with the RAM I have installed on my machine. But there are some queries that cannot use an index due to some restrictions of MySQL.

I have already aggregated data as far as it was easily possible. For example, stats for users/teams are created within a second.
Importing new results is also done within "no time".
Most of the time is spent creating stats tables like "distribution of untested values".
These tables are made out of several queries with nice things like "group by"/"min"/"max"/"sum"/... in different combinations. I do not know other ways to get this information without a significant danger of having inconsistent data in the display or without a significant slowdown of the data import.

To give an impression of the amount of data that has to be handled:
the complete database has a size of 264 MB at the moment.

Lars

Citrix 2005-12-08 19:26

The .org account is pointing to ltd's server, now.

@magnav0x, I suggest you run a local LLRnet for your Linux computers, since you have a computer with a static IP that you could use as a server. I think that should solve all the problems at present.

Citrix

Citrix 2005-12-08 20:23

Can we make the basic database public? That way people can study it and come up with new ideas etc. Are there any problems anyone can see?


Well, the size is an issue: 264 MB. But we could make a smaller demo DB, say up to 2 million only, or something like that.

What do you all think?

Citrix

magnav0x 2005-12-08 20:49

I only had LLRNET running on one Linux machine, so it won't be a problem. I switched it to SB anyway, because of the dynamic IP problems. I'll still offer up my dedicated server to host the server if needed, but since I'm only down one machine, it's not a big deal to stick with ltd's server. Though it may cause a problem with client routing for other Linux users in the future (until the problem with resolving DNS is solved in LLRNET). Releasing info about the DB structure would be useful; I wouldn't mind taking a look at how you are storing results to possibly find a more efficient storage structure. I'd like to contribute to the project in any way possible.

ltd 2005-12-08 20:51

I see no real problem with giving out the DB if anybody is really interested.
The only problem is that my documentation is far too old. It still describes some tables that no longer exist, and on the other hand some of the stats tables are not yet described at all. I think it is no fun to analyse the system without any documentation, so you must wait until I have something ready.

But my first priority will be to check out the other provider and, if that works, move some of the content over there.

By the way, any ideas for a domain name? It must be a *.de domain, as that one is included in the price for the server. (PSP.de is already used by a company.)

Lars

magnav0x 2005-12-09 03:14

primesierp.de
pspproject.de
dcpsp.de (dc=distributed computing)
projectpsp.de
psp-project.de

Personally, my favorites are psp-project.de and projectpsp.de (or maybe project-psp.de). Anyone else have any suggestions? Maybe you like one of these or maybe someone can think of other variations that may be good.

Citrix 2005-12-09 04:43

PSPnet.de

Though I see no need to get a paid domain name. We can use a free domain name.

Citrix

ltd 2005-12-09 18:02

[QUOTE=Citrix]PSPnet.de

Though I see no need to get a paid domain name. We can use a free domain name.

Citrix[/QUOTE]

The thing with the domain is that you get one automatically.

Lars

hhh 2005-12-10 08:34

As for the database, make sure to publish neither IP addresses nor residues. We just had the same discussion over at SoB.

And wouldn't it be psproject.de ?

Anyway, Yours, H.

Citrix 2005-12-10 17:17

hhh,

Could you point to the thread where you had the discussion? I am interested in reading through it.

Thanks,
Citrix

hhh 2005-12-11 13:05

Sorry I was too lazy. Here we go:

[URL=http://www.free-dc.org/forum/showthread.php?s=&threadid=10278&perpage=25&highlight=&pagenumber=2]Error Rates Thread[/URL]

It's basically the 2nd page.
Cheers, H.

Citrix 2005-12-12 00:00

This seems interesting

[url]http://www.findmyhosting.com/[/url]

I can find some deals for less than $4 for a month.

There is some free hosting if you google the web, but I do not know if they will let us run applications.

I do not know much about web hosting so I could be wrong. :redface:

Citrix

Citrix 2005-12-12 03:52

[QUOTE=ltd]
Most of the time is spent creating stats tables like "distribution of untested values".

Lars[/QUOTE]


I hope your DB keeps a cache of changes, since that should take no time at all. At least that is what I think. I hope you are not counting the values for each of the k every time you run a query.

Could you post the query you use?

Citrix

magnav0x 2005-12-14 21:02

[QUOTE=Citrix]This seems interesting

[url]http://www.findmyhosting.com/[/url]

I can find some deals for less than $4 for a month.

There is some free hosting if you google the web, but I do not know if they will let us run applications.

I do not know much about web hosting so I could be wrong. :redface:

Citrix[/QUOTE]


Most places will NOT let you run applications with hosting. They will just allow a website and an SQL database. You will need a "Virtual Server" or "Dedicated Server" if you need a program to run continuously. However, depending on the user permissions for the PHP CLI, you may be able to execute the program you want via PHP, but I'm sure they'd shut it down as soon as they noticed it was running. So you need to look into a virtual server or dedicated server. I used to run a web hosting company through my dedicated server (I pretty much gave up on it, because I can't market for crap).

Let me know if you need any help with your PHP/SQL optimizations, LTD; that's where I thrive :showoff:

Another option is to write a socket PHP app that would listen on your desired port and accept communications, but you would need to know exactly how the llrnet server accepts communications, validates results and exports them. I have no idea how llrnet processes connections and distributes/receives work (I never looked into it). If someone told me, I might be able to think of something.

ltd 2005-12-14 21:35

After testing the virtual server, which should take place in the next two weeks, I will transfer the llrnet server over to that location. There should only be an outage of an hour at most, and there will be no need to modify anything at the client, as I will reroute the "no-ip" domains over to the new server. But I will have to test the procedure first with dummy data that I have to create, to make sure that everything works without any problems.

The next step will need some knowledge of PHP/HTML/Apache and, I think, also Linux.


I want to set up the database on the new server as well, which will allow us to implement the features listed below in the future.
As I am a beginner in PHP, I have no clue so far how to implement my ideas and keep the setup secure. At the moment everything runs on one PC to which nobody from the outside world has access, so I did not do anything to make access secure.

The structure I want to implement first needs the following things.
1. MySQL as RDBMS
2. Public HTML pages (NO DB access)
3. PHP scripts which should be run every hour/day
These scripts generate the pages under 2 and some other internal things
4. Non-public HTML pages to have the possibility to run special PHP scripts
to make special checks or generate special reservation files, ...
These pages should never be visible to the public
5. Directory structure that stores public files like workunit files to be reserved (must be accessible by PHP to write the data)
6. Directory structure for non-public files (like returned residues). These files must also be visible to PHP but never to the public
7. The PHP scripts need to access some files from the llrnet server.
So again access to a non-public directory, but this time it is not within the Apache scope.

Now to the plans for the future:

1. Some pages that allow partial access to the DB to make it possible to create new teams or to join an existing team.
This means that we will have to implement some authentication methods.
2. Pages to allow returning of sieved results via a web interface and not by mail/forum anymore.
3. Pages that allow returning PRP results via a web interface and not by mail/forum anymore.

I will need all the help I can get to set up the Apache server in the right way, so that there are no security problems and no way to damage the DB from the outside world.

The first thing I will have to find out is how to implement the job automation (automated PHP script running).
Is there a way to do it with the Apache server, or
will I have to set up some cron jobs?

I am open to every suggestion and also to additional ideas for what the server should do.

Lars

magnav0x 2005-12-14 22:18

[QUOTE=ltd]
1. MySQL as RDBMS
[/QUOTE]
Just set up MySQL with localhost access only and a cron job to back up and transfer the database remotely for security and integrity (just in case). No need for a full RDBMS.

[QUOTE]
2. Public HTML pages (NO DB access)
3. PHP scripts which should be run every hour/day
These scripts generate the pages under 2 and some other internal things
[/QUOTE]
Easily done setting up PHP scripts in crontab. E.g. setup PHP files that generate stat pages, or whatever you want hourly/daily or whatever. A bit vague on what you really want here, so I'm unsure what you are wanting exactly.

[QUOTE]
4. Non-public HTML pages to have the possibility to run special PHP scripts
to make special checks or generate special reservation files,......
These pages should never be visible to the public
[/QUOTE]
Are you saying you want a page that, say, only the project admins have access to, where they can do these special checks and file generations?

[QUOTE]
5. directory structure that stores public files like workunit files to be reserved. (must be accessed by PHP to write the data)
[/QUOTE]
Easily done. I'm assuming the PHP scripts that will need access to write to this directory will be for uploading ranges? Again a tad vague.

[QUOTE]
6. Directory structure for non-public files (like returned residues). These files must also be visible to PHP but never to the public
[/QUOTE]
Returned results could be sent to any place in the file system you like. PHP can access them. You could have them all accessible from a central admin returned-results page.

[QUOTE]
7. The PHP scripts need to access some files from the llrnet server.
So again access to a non-public directory, but this time it is not within the Apache scope.
[/QUOTE]
PHP can do this; there is no need to configure anything special in Apache, assuming whatever user Apache runs under has permissions to access the given folder, such as /opt.

[QUOTE]
Now to the plans for the future:

1. Some pages that allow partial access to the DB to make it possible to create new teams or to join an existing team.
This means that we will have to implement some authentication methods.
[/QUOTE]
Easy enough, just need to know what sort of verification you would like. E-mail verification?

[QUOTE]
2. Pages to allow returning of sieved results via a web interface and not by mail/forum anymore.


3. Pages that allow returning PRP results via web interface and not by mail/forum anymore.
[/QUOTE]
Both of these can be done with PHP.

[QUOTE]
First thing i will have to find out is: How to implement the job automation (automated PHP script running)
Is there a way to do it with the apache server or
will i have to set up some cron jobs?
[/QUOTE]
Just write your PHP scripts and run them in a cron job e.g "php /somefolder/somescript.php" and have them run how ever often you would like. I assume you mean scheduling a PHP script to run every so often. But you will need the PHP CLI installed, which is no big deal.
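
As a rough illustration of the cron approach, here is a sketch in Python rather than PHP (the pattern is the same in either language). It builds a static HTML page and swaps it into place atomically, so a visitor never sees a half-written file while the job is running. The crontab line, the script name and both function names are made up for the example and are not taken from the project.

```python
import html
import os
import tempfile

# Hypothetical crontab entry, run at minute 0 of every hour:
#   0 * * * * python3 /somefolder/make_stats.py


def render_stats_page(rows):
    """Build a minimal HTML table from (name, score) rows."""
    body = "\n".join(
        f"<tr><td>{html.escape(str(name))}</td><td>{int(score)}</td></tr>"
        for name, score in rows
    )
    return f"<html><body><table>\n{body}\n</table></body></html>\n"


def publish(page, target_path):
    """Write the page to a temp file in the same directory, then rename it.

    os.replace is atomic when source and target are on the same filesystem,
    so the web server sees either the old page or the new one.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(page)
        os.chmod(tmp_path, 0o644)  # readable by the web server user
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The scheduled script would query the database, pass the rows to `render_stats_page` and hand the result to `publish` with a path inside the public web directory. Since the public pages are plain files, the web server itself never needs database access.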

ltd 2005-12-14 22:40

[QUOTE]
2. Public HTML pages (NO DB access)
3. PHP scripts which should be run every hour/day
These scripts generate the pages under 2 and some other internal things


Easily done setting up PHP scripts in crontab. E.g. setup PHP files that generate stat pages, or whatever you want hourly/daily or whatever. A bit vague on what you really want here, so I'm unsure what you are wanting exactly.[/QUOTE]
You are right; this should implement the stats creation and also the recreation of workunit files after factors are submitted.

[QUOTE]
Are you saying you want a page that, say, only the project admins have access to, where they can do these special checks and file generations?
[/QUOTE]

That's exactly what is needed.

[QUOTE]
Easily done. I'm assuming the PHP scripts that will need access to write to this directory will be for uploading ranges? Again a tad vague.
[/QUOTE]

You are very good at guessing what I mean. Again, that's needed.
The post was only meant to give a short overview of my plans and not to describe all the things in detail. If you are interested, I can write something down which describes in detail what I have at the moment and what I think I will need.

It looks like most of my problems come only from not knowing the Apache config (allow/deny access)
and also not knowing Linux permissions well enough. But let's see it this way: it's a good way to learn something new.

For the authentication, I think it should be possible to make online changes to your personal setup after logging in. That should be create/join/leave a team and return some results.

Lars

magnav0x 2005-12-14 23:20

[QUOTE=ltd]You are right; this should implement the stats creation and also the recreation of workunit files after factors are submitted.



That's exactly what is needed.



You are very good at guessing what I mean. Again, that's needed.
The post was only meant to give a short overview of my plans and not to describe all the things in detail. If you are interested, I can write something down which describes in detail what I have at the moment and what I think I will need.

It looks like most of my problems come only from not knowing the Apache config (allow/deny access)
and also not knowing Linux permissions well enough. But let's see it this way: it's a good way to learn something new.

For the authentication, I think it should be possible to make online changes to your personal setup after logging in. That should be create/join/leave a team and return some results.

Lars[/QUOTE]

Lars,

OK, I understand those questions now. I'd love to read in detail everything you want done. Post it here for others and/or e-mail me and we will discuss it further. I'd love to help you out. I'm no Linux genius, but I know my way around well enough. Projects like these are always a good way to teach yourself. I learned PHP/SQL about 2 years ago when I wanted stats for Chessbrain (a distributed computing project) but no one had any. So I decided to make my own. It took a while, but I found my way over hurdles and learned quite a bit about PHP and SQL. Now I'm doing it for a living. I'm still learning new things every day even at this point in time :whistle: I can help you with just about any aspect of what you want above; just let me know and I'll roll up my sleeves and dive in with you.

The hardest thing about helping is knowing how your database is/will be structured. Typically I base code on how the database is laid out. Are you going to try to do everything from scratch (the database and all)? Or were you wanting to keep the current database structure and build around it? Sometimes it's better to start from scratch (redesign), but it depends on whether or not the current setup limits the expansion in any way.

ltd 2005-12-15 08:06

The description of the structures I built is a little bit long for the forum, so I will mail it to you when the update is ready. (It's always the same if you are the only programmer on a project: no up-to-date documentation. :no: :rant: )

For the structure of the DB, I am not sure if we should make a completely new structure. The original structure has its weak points, but I am not sure whether the load of additional joins that are needed when using another structure is the better choice.
For example, the result table holds the status of the k/n pair and also the test information (residue, factor and user IDs of the contributors who sent in the result).
The other way would be, in my opinion, to have a result table only holding a numeric key, the k/n pair and some status information (sieved, first PRP test, second PRP test), and then tables storing the test results for each type, using the numeric key as foreign key.
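
To make the second layout concrete, here is a small sketch using Python and SQLite as a stand-in for MySQL. Only `kvalue` and `nvalue` are names that appear elsewhere in this thread; the table names, the remaining columns and the helper functions are assumptions for illustration, not the project's real schema.

```python
import sqlite3

SCHEMA = """
-- One row per k/n pair: numeric key, the pair itself, and a status code.
CREATE TABLE pair (
    pair_id INTEGER PRIMARY KEY,
    kvalue  INTEGER NOT NULL,
    nvalue  INTEGER NOT NULL,
    status  INTEGER NOT NULL DEFAULT 0,
    UNIQUE (kvalue, nvalue)
);
-- One row per returned PRP test (first test, second test, ...).
CREATE TABLE prp_result (
    pair_id INTEGER NOT NULL REFERENCES pair(pair_id),
    test_no INTEGER NOT NULL,
    residue TEXT    NOT NULL,
    userid  INTEGER NOT NULL,
    PRIMARY KEY (pair_id, test_no)
);
-- At most one factor per pair, stored as text because factors can be long.
CREATE TABLE factor_result (
    pair_id INTEGER PRIMARY KEY REFERENCES pair(pair_id),
    factor  TEXT    NOT NULL,
    userid  INTEGER NOT NULL
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def residues_for(conn, k, n):
    """Fetch all PRP residues for one k/n pair; this is the extra join."""
    rows = conn.execute(
        """
        SELECT r.test_no, r.residue
        FROM pair AS p
        JOIN prp_result AS r ON r.pair_id = p.pair_id
        WHERE p.kvalue = ? AND p.nvalue = ?
        ORDER BY r.test_no
        """,
        (k, n),
    )
    return rows.fetchall()
```

The status queries would then only touch the narrow `pair` table, while residues and factors are read through a join on the numeric key when they are actually needed. Whether that join cost is acceptable is exactly the open question in the post above.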

Lars

magnav0x 2005-12-15 08:41

[QUOTE=ltd]The description of the structures I built is a little bit long for the forum, so I will mail it to you when the update is ready. (It's always the same if you are the only programmer on a project: no up-to-date documentation. :no: :rant: )

For the structure of the DB, I am not sure if we should make a completely new structure. The original structure has its weak points, but I am not sure whether the load of additional joins that are needed when using another structure is the better choice.
For example, the result table holds the status of the k/n pair and also the test information (residue, factor and user IDs of the contributors who sent in the result).
The other way would be, in my opinion, to have a result table only holding a numeric key, the k/n pair and some status information (sieved, first PRP test, second PRP test), and then tables storing the test results for each type, using the numeric key as foreign key.

Lars[/QUOTE]

Yes, some table joins will help, and having correct data types specified helps as well (such as using 'tinyint' for a field that will only hold small numbers). More important for a large database is indexing all primary keys and any fields used in conditional joins or 'WHERE' clauses. However, it will drastically increase hard drive space usage on very large databases (such as ones with millions of rows of data). I use this in any stats project that I track (daily stats for all users) for a 30-day period (especially if there are thousands of users on the project). Indexing is something that can be done any time after everything else is in place, though. Joins will not increase server load too much. How many rows do you have in your current database?

OmbooHankvald 2005-12-15 11:56

[QUOTE=ltd]
3. Pages that allow returning PRP results via web interface and not by mail/forum anymore.[/QUOTE]
Maybe you should consider using something like [URL=http://12121.vocabulate.com/]12121 Search[/URL] does.

[QUOTE=OmbooHankvald from 321 thread]
3) The 12121 and 2721 have a brilliant way of reserving numbers! but wouldn't it be possible for some person to just submit a fake lresults.txt and destroy the whole thing?[/quote]

[QUOTE=justinsane answer]
You are absolutely correct. However there is a lot more to the reservation system than meets the eye. For example, any webmaster searching any K could feed it a sieve file of any size, and it will automatically be parsed into chunks of any given size, sorted into the reservation system and made available to the public with a click of a button. It is pretty nice when I am limited in my time availability for making new work available. I have offered this to a few other groups running similar searches, but was met with little or no desire to use this approach. And to answer your question, there is a multifold response:

1. There is absolutely nothing to gain by doing this, except for maybe a few laughs.
2. It would take a reasonable amount of time to generate a fake lresults.txt file that looks real enough to even be accepted by this parser. (Doable yes, but worth it?)
3. All submitted results are stored on two different servers and are all subject to double checking and/or verification if (and sometimes if not) flagged as a possible fake.[/quote]

Maybe you could use the info

OH

ltd 2005-12-15 13:30

The number of entries in the master table is 3653513. That is not that much. Due to the usage of index data, most of the queries are quite fast.
The only problems are some requests that cannot make good use of the index data.

For example:
[CODE]
select kvalue,min(nvalue) as nmin from testresult
where distributionstate<30 and prptest1=0 and sievestatus=0
group by kvalue
[/CODE]

This searches for the lowest untested n value where no factor is found and no first PRP test is returned.
Primary key is kvalue,nvalue
Secondary keys on distributionstate,sievestatus and prptest1
Explain shows usage of index sievestatus.

Lars

ltd 2005-12-15 14:22

Oh, by the way, I forgot to write:
The different columns use the smallest possible data types, like tinyint or mediumint.
Only columns like the factor column use dynamic-length types (text).

Lars

Edit: I did some more tests with a different index structure using a combined index. Now the response time is much better. I had used that index in the beginning but had to change it due to some bad timing. But in the meantime I had restructured the calculations that forced me to drop the combined index,
and I forgot to give it another try.

Now the timing and resource load of the DB are good again. I think there will no longer be a problem running that beast on a virtual server.
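
The post does not say which columns went into the combined index, so the following is only a guess at its shape, sketched in Python with SQLite instead of MySQL. Table and column names come from the query above; the index name, the column order and the helper functions are assumptions. An `ORDER BY` was added to the query so the sketch returns rows in a fixed order. The idea is to put the two equality columns first, then the grouping column, then `nvalue`, so that one index can serve the filter, the grouping and the `MIN()`.

```python
import sqlite3

QUERY = """
SELECT kvalue, MIN(nvalue) AS nmin FROM testresult
WHERE distributionstate < 30 AND prptest1 = 0 AND sievestatus = 0
GROUP BY kvalue
ORDER BY kvalue
"""


def build(rows):
    """Create a cut-down testresult table and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE testresult (
            kvalue            INTEGER NOT NULL,
            nvalue            INTEGER NOT NULL,
            distributionstate INTEGER NOT NULL,
            prptest1          INTEGER NOT NULL,
            sievestatus       INTEGER NOT NULL,
            PRIMARY KEY (kvalue, nvalue)
        )
    """)
    conn.executemany("INSERT INTO testresult VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def add_combined_index(conn):
    """Hypothetical combined index: equality columns first, then k, then n."""
    conn.execute(
        "CREATE INDEX idx_untested ON testresult "
        "(sievestatus, prptest1, kvalue, nvalue, distributionstate)")


def lowest_untested(conn):
    """Run the query from the post and return (kvalue, nmin) rows."""
    return conn.execute(QUERY).fetchall()


def query_plan(conn):
    """Return the planner's description, the SQLite analogue of EXPLAIN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
```

Comparing `query_plan(conn)` before and after `add_combined_index(conn)` shows whether the planner picks up the new index. SQLite and MySQL plan queries differently, so this only illustrates the index layout and says nothing about the timings ltd measured.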

ltd 2005-12-15 14:56

@magnav0x:

Can you recommend some good literature or a web page about MySQL performance tuning?
I tried some things in the last few minutes and got some unexpected results.
For example, I found that it is faster to use "distributionstate in (0,10,20)" instead of "distributionstate<30". This makes no difference when used in Oracle. There is even a chance that the latter is faster in Oracle.

So I need to read something about the special behaviours of MySQL to come up with additional performance gains.
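
A small harness for this kind of experiment, sketched in Python with SQLite standing in for MySQL. It assumes that 0, 10 and 20 are the only `distributionstate` values below 30, which the rewrite above implies but does not state outright; the cut-down table, the index name and the function names are invented for the example. The harness checks that both forms return the same answer and reports the best wall-clock time for each.

```python
import sqlite3
import time

RANGE_FORM = "SELECT COUNT(*) FROM testresult WHERE distributionstate < 30"
LIST_FORM = (
    "SELECT COUNT(*) FROM testresult WHERE distributionstate IN (0, 10, 20)"
)


def build(states):
    """Create a one-column stand-in for testresult with an index on the state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testresult ("
        "id INTEGER PRIMARY KEY, distributionstate INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO testresult (distributionstate) VALUES (?)",
        [(s,) for s in states])
    conn.execute("CREATE INDEX idx_state ON testresult (distributionstate)")
    conn.commit()
    return conn


def timed(conn, sql, repeat=5):
    """Run a query several times; return its result and the best elapsed time."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = conn.execute(sql).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return result, best
```

Calling `timed(conn, RANGE_FORM)` and `timed(conn, LIST_FORM)` on the same connection gives two (count, seconds) pairs to compare. Any timing difference seen here reflects SQLite's planner and not MySQL's, so the real measurement has to be repeated on the production database.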

Lars

magnav0x 2005-12-15 19:09

[QUOTE=ltd]@magnav0x:

Can you recommend some good literature or a web page about MySQL performance tuning?
I tried some things in the last few minutes and got some unexpected results.
For example, I found that it is faster to use "distributionstate in (0,10,20)" instead of "distributionstate<30". This makes no difference when used in Oracle. There is even a chance that the latter is faster in Oracle.

So I need to read something about the special behaviours of MySQL to come up with additional performance gains.

Lars[/QUOTE]

You know, I've never used anything other than MySQL, so I don't know anything about Oracle. I will try to dig up some eBooks for you and send them your way via e-mail.

Chris


All times are UTC.