mersenneforum.org


shortcipher 2018-04-02 02:11

Accessing FactorDB from Python
 
I am accessing FactorDB through a Python library module ([url]https://pypi.python.org/pypi/factordb-pycli[/url]). For some numbers (about 120 digits), the Python call just gets a 'C' response, even after many attempts. If I enter the same number in my browser at [url]http://factordb.com/[/url] and click Factorize!, it factors the number immediately. The Python call then gets an 'FF' response and obtains the factorization.

Is there some secret sauce in the browser interface which is denied to the Python interface?

Dubslow 2018-04-02 02:19

FactorDB doesn't autofactor anything above 70 digits. Which means if FF is appearing in your browser, then that factorization was already in the DB.

I'm not sure a priori what could be causing this behavior. The aliquot sequence status page does a ton of FDB queries (obviously), and although we get errors from time to time, retrying a handful of times eventually rectifies the error.

Is this problem reproducible? If so, what numbers specifically?

shortcipher 2018-04-02 02:49

Right now, 36868743542001151893415408511062812705584118566362468784537941808688929129381217798445710341479658115867192826982725123 is getting a 'C' via Python.

Is it possible to check whether this number is in the database, without any possibility of autofactoring?

shortcipher 2018-04-02 03:29

Now that number is 'FF' from Python.

This one is still 'C':-

52366186180679306335144096639289985739377662519577838330283010732839432648885536870843823075056707630585377852887738877

Dubslow 2018-04-02 04:05

Can you paste output logs of what you see in your Python calls? I don't understand what's happening. Both of those numbers are FF, and when I query the FDBID in Python I get the correct status FF.

shortcipher 2018-04-02 04:28

Here is what I am getting from a new number (the previous one is now 'FF').

The printout is from
[CODE]from factordb.factordb import FactorDB  # provided by the factordb-pycli package

f = FactorDB(n)
print(f.connect().json())
[/CODE]
in my Python code.

[CODE]{u'status': u'C', u'id': u'1100000001115460681', u'factors': [[u'72532046707394527184361122364055313434377630022988043070962959961740902025113107530697959132646327596976526693065378307', 1]]}[/CODE]

How are you querying the FDBID in Python?

shortcipher 2018-04-02 05:41

More explicitly, with just this Python code:-

[CODE]>>> import requests
>>> requests.get("http://factordb.com/api", params={"query": str(72532046707394527184361122364055313434377630022988043070962959961740902025113107530697959132646327596976526693065378307)}).json()
{u'status': u'C', u'id': u'1100000001115460681', u'factors': [[u'72532046707394527184361122364055313434377630022988043070962959961740902025113107530697959132646327596976526693065378307', 1]]}[/CODE]
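
The reply quoted above can be unpacked and sanity-checked offline. The following is only a minimal sketch: the helper names (parse_factordb_response, product_of, is_fully_factored) are hypothetical and not part of factordb-pycli or the FactorDB API, and the only assumptions about the reply format are the 'status' and 'factors' fields visible in the output above.

```python
def parse_factordb_response(payload):
    """Return (status, [(factor, exponent), ...]) from an API reply dict.

    Hypothetical helper: it relies only on the 'status' and 'factors'
    fields that appear in the reply quoted in this thread.
    """
    status = payload["status"]
    # Each factor arrives as a decimal string paired with an integer exponent.
    factors = [(int(base), int(exp)) for base, exp in payload["factors"]]
    return status, factors


def product_of(factors):
    """Multiply the listed factors back together."""
    result = 1
    for base, exp in factors:
        result *= base ** exp
    return result


def is_fully_factored(payload):
    """True only for the 'FF' status discussed in the thread."""
    return payload["status"] == "FF"


# The reply from the post above, written without the Python 2 u'' prefixes.
n = 72532046707394527184361122364055313434377630022988043070962959961740902025113107530697959132646327596976526693065378307
reply = {"status": "C", "id": "1100000001115460681", "factors": [[str(n), 1]]}

status, factors = parse_factordb_response(reply)
# A 'C' reply lists the number itself as its only "factor".
print(status, product_of(factors) == n, is_fully_factored(reply))
```

Checking that the listed factors multiply back to the queried number is a cheap way to tell a genuine factorization apart from a reply that merely echoes the input.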

Dubslow 2018-04-02 06:26

Those both show FF for me in Python.

I'm not sure what precisely is going on. However, I can make an educated guess.

Going back to the number you gave in post 4: [url]http://factordb.com/index.php?id=1100000001115431883[/url]

When I click "More information", this number was added approximately 15 hours ago, perhaps 12 hours before you made the post stating that it still appeared as "C". However, the factors are listed as found in early February. So what sort of numbers are these? These seem to be small multiples of other numbers already fully factored in the db?

As for your Python, you said "even after many attempts". How long of a sleep between each request do you use? The FDB is rather notoriously slow about such things, so upon being queried the first time, it often takes several seconds for everything to be fully processed. So if your retry queries have no sleep time between them, and your "many attempts" are all within 1-2 seconds of each other, then that 1-2 seconds is insufficient for the DB to finish processing the new number, and you get C. Then when you check in your browser, at least several if not dozens or hundreds of seconds later, the processing has completed and everything shows normally.

The page I mentioned in passing before, the [URL="https://www.rechenkraft.net/aliquot/AllSeq.html"]aliquot sequence status page[/URL], queries the status of 100 sequences per half hour. It first queries the known ID of the current line; if that has changed to fully factored, it uses the aliquot sequence page on the FDB to get the latest line, which is itself then processed and stored.

Occasionally, when a sequence has had new lines added but no one has queried those new lines, the processing of the newly added lines gets deferred until someone actually triggers the sequence page on the FDB. The blue page's automated query is therefore often the first trigger to process all those new lines, meaning that the query about the last unfactored line turns up garbage for several seconds while the newly added lines are processed. This is analogous to you querying these small-multiples-of-fully-factored numbers for the first time, which turns up a C.

In the rare cases that these aliquot sequence queries get garbage, the solution is simple: [URL="https://github.com/MersenneForum/MersenneForumAliquot/blob/master/mfaliquot/application/fdb.py#L158"]sleep 5 seconds[/URL] before trying again. The 5 second sleep is critical to give FDB time to finish processing the stuff that we triggered. After 5 failed retries, it quits with an error for that sequence and moves on. Perhaps one sequence out of those hundred-per-half-hour runs into this garbage scenario, on average. Of those, 99 in 100 are just fine after up to 25 seconds of querying/sleeping, while ~1 in 10,000 overall get garbage even after 5 slow retries -- but of those 1 in 10,000, literally none ever cause a problem when queried again in the next half-hourly batch.

So this is what appears to me to be happening. If you can confirm the nature of the numbers you're querying, and that your "many attempts" have no sleep between them, and/or add a sleep between them, that should clear this up.

(As for how I'm querying the FDB -- per the above link, it's just some crappy hand-rolled parsing of the literal HTML webpage. I didn't even know there was an API. I first started this page many years ago, and at the time [URL="https://github.com/MersenneForum/MersenneForumAliquot/commit/80ab2de6f57a505e08043a9d8b6b5d425bc5eb10"]it occurred to me[/URL] that there was probably some value in a dedicated Python-FDB interface package, but I never got around to actually doing anything about it. In the months-ago rewrite I just slightly cleaned up and reorganized the simple HTTP/HTML handling stuff that had already been in use for years.)

shortcipher 2018-04-02 07:00

The numbers are taken from an aliquot sequence project I am running and don't necessarily relate to any existing factors in the database.

I was doing 6 attempts with 10 seconds of sleep between them.

I suspect that the numbers which later became FF (including the latest one 72532046707394527184361122364055313434377630022988043070962959961740902025113107530697959132646327596976526693065378307) did so after you accessed them with your HTML-based request. So I'll keep the latest one secret and check if it ever comes good.

I'm pretty sure that if I accessed the database with [CODE]requests.get('http://www.factordb.com/index.php?query=%s' % str(n))[/CODE] it would return FF immediately.

The question is, who is responsible for the API at factordb.com/api? I think this is where the problem lies.

DukeBG 2018-04-02 08:29

I've seen small factors (below about 8-9 digits) get found and added when the page is served in a browser. I imagine there are some calls in /index.php that are just missing in /api.

10metreh 2018-04-02 13:23

[QUOTE=shortcipher;484001]The numbers are taken from an aliquot sequence project I am running and don't necessarily relate to any existing factors in the database.
[/QUOTE]

The thing is, they do.

It appears that you are computing the aliquot sequence of 2^(p-1)*(2^p-1)*3 for some Mersenne prime 2^p-1. If a has no common factors with 2^(p-1)*(2^p-1), then σ(2^(p-1)*(2^p-1)*a) = σ(2^(p-1)*(2^p-1))*σ(a) = 2^p*(2^p-1)*σ(a). Hence if 2^(p-1)*(2^p-1)*a is a term in your aliquot sequence, then the next term is 2^(p-1)*(2^p-1)*(2σ(a)-a).
But the value 2σ(a)-a does not depend on p, so for each Mersenne prime, 2^(p-1)*(2^p-1)*3 has essentially the same aliquot sequence, with just the power of two and the Mersenne prime differing. The pattern only breaks when a term happens to have a second factor of 2^p-1, i.e. the value a above has a common factor with 2^(p-1)*(2^p-1). This is very unlikely for large p.
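
The identity above is easy to check numerically for small cases. The sketch below is only an illustration: the function names are made up for this example, σ is computed by naive trial division (so it is only practical for small p), and the exponents 5, 7 and 13 are chosen because 2^p-1 is prime for each of them.

```python
from math import gcd


def sigma(n):
    """Sum of all divisors of n, by naive trial division."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def aliquot_next(n):
    """Next term of the aliquot sequence: sigma(n) - n."""
    return sigma(n) - n


def driver(p):
    """The perfect number 2^(p-1)*(2^p-1); assumes 2^p-1 is prime."""
    return 2 ** (p - 1) * (2 ** p - 1)


def cofactor_next(a):
    """The p-independent step on the cofactor: 2*sigma(a) - a."""
    return 2 * sigma(a) - a


def check_identity(p, a):
    """Check that driver(p)*a maps to driver(p)*cofactor_next(a)."""
    m = driver(p)
    if gcd(a, m) != 1:
        raise ValueError("a shares a factor with 2^(p-1)*(2^p-1)")
    return aliquot_next(m * a) == m * cofactor_next(a)


# Follow the cofactor starting from a = 3 and check each step for several p.
a = 3
for _ in range(5):
    print(a, [check_identity(p, a) for p in (5, 7, 13)])
    a = cofactor_next(a)
```

For p = 3 the same walk reaches a = 7 = 2^3-1 after two steps, which is exactly the kind of shared factor where the pattern breaks; check_identity raises an error in that case rather than reporting a result.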

For 2^13-1 and all greater Mersenne primes, we have not yet found such a term, so all of these sequences are still on the same trajectory. Thus when we compute a new term of the aliquot sequence of 2^12*(2^13-1)*3, we get a new term of the sequence of 2^(p-1)*(2^p-1)*3 for all larger Mersenne primes.

The aliquot sequence of 2^12*(2^13-1)*3 ([url]http://factordb.com/sequences.php?se=1&aq=2%5E12*8191*3&action=last20&fr=1&to=20[/url]) is known up to index 863; the factors you posted earlier in the thread come from terms 817 and 818, so whichever Mersenne prime you were actually using, you were in fact redoing work that has already been done.

