
mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Aliquot Sequences (https://www.mersenneforum.org/forumdisplay.php?f=90)
-   -   A new tool to identify aliquot sequence mergers and acquisitions (https://www.mersenneforum.org/showthread.php?t=24423)

garambois 2019-05-11 19:25

A new tool to identify aliquot sequence mergers and acquisitions
 
I have created a new tool to identify mergers between the aliquot sequences I compute and the Open-End ones on the blue page, those that start with numbers < 3,000,000.
To build this tool, I scanned the 27,775 aliquot sequences of the blue page on FactorDB.

Then, next to the starting number of each of these sequences, I noted its last 80-digit term.

That gives me a list of 27,775 lines, of which here are the first 5:

276 86429502525621235123826529054861326152211856309368313663717677245572194588259306
552 93191536898043837453054174955849514463373414360403592180608032037927142180955180
564 63373220338311185878213023083258486641521314878135641223432483032656144576662116
660 55462535746423901103008156653380035541277137961121218541098197822388620707825760
966 54217831165122758567899909677986173899380021129621221691901655666119614743931160
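
For illustration, here is a minimal sketch of how one such line could be produced. It is only a sketch, not the actual program used: it assumes FactorDB's elf.php export and lines of the form "index .   value = factorization".
[code]
# Sketch only: fetch sequence 276 (elf.php export assumed) and print its
# last term that has exactly 80 digits, matching the list format above.
wget -q "http://factordb.com/elf.php?seq=276&type=1" -O 276.elf
# Assumed elf line format: "index .   value = factorization"; keep the
# values, filter for the 80-digit ones, take the last one.
c80=$(awk '{print $3}' 276.elf | grep -E '^[0-9]{80}$' | tail -n 1)
echo "276 ${c80}"
[/code]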

I hope that this tool will prove to be effective, like the one Wolfgang Creyaufmüller offered on his website several years ago.
I put this list online.

You can access it here:

[URL="http://www.aliquotes.com/OE_3000000_C80.txt"]http://www.aliquotes.com/OE_3000000_C80.txt[/URL]

The future will tell if this list is useful.
I hope you will find it useful!

:smile:

Note 1: I took the opportunity to check for any errors on FactorDB, and by chance discovered the error I reported in post #26 on this page: [URL="https://www.mersenneforum.org/showthread.php?t=19737&page=3"]https://www.mersenneforum.org/showthread.php?t=19737&page=3[/URL]

Note 2: This is the page of my website that presents all the other databases I offer: [URL="http://www.aliquotes.com/aliquote_base.htm"]http://www.aliquotes.com/aliquote_base.htm[/URL]

Batalov 2019-05-11 20:18

This means that there are four merges in this set already:
[CODE]> awk '{print $2}' OE_3000000_C80.txt |uniq-c2
59571467094880018078029946989732846437606582341533407350905461344941861105419200
50252736564413937566890325321268184008775762413941293603245492877192791755326380
90846938908296202829506001436300301884399215435359180605787455350721908885807340
89738748261068384816085845904841017806345217413530107120872862702404968351017460
> grep 59571467094880018078029946989732846437606582341533407350905461344941861105419200 OE_3000000_C80.txt
29412 59571467094880018078029946989732846437606582341533407350905461344941861105419200
1316832 59571467094880018078029946989732846437606582341533407350905461344941861105419200
> grep 50252736564413937566890325321268184008775762413941293603245492877192791755326380 OE_3000000_C80.txt
181830 50252736564413937566890325321268184008775762413941293603245492877192791755326380
2240472 50252736564413937566890325321268184008775762413941293603245492877192791755326380
> grep 90846938908296202829506001436300301884399215435359180605787455350721908885807340 OE_3000000_C80.txt
20468 90846938908296202829506001436300301884399215435359180605787455350721908885807340
2129790 90846938908296202829506001436300301884399215435359180605787455350721908885807340
> grep 89738748261068384816085845904841017806345217413530107120872862702404968351017460 OE_3000000_C80.txt
9852 89738748261068384816085845904841017806345217413530107120872862702404968351017460
2042496 89738748261068384816085845904841017806345217413530107120872862702404968351017460
[/CODE]
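
A one-pass variant, just a sketch against the two-column format above, that prints each group of merged starting numbers together with the shared C80:
[CODE]
# Sketch: group starting numbers by their last C80 and print every group
# with more than one member, i.e. the merges.
awk '{seqs[$2] = seqs[$2] " " $1}
     END {for (c in seqs) if (split(seqs[c], a, " ") > 1) print seqs[c], c}' OE_3000000_C80.txt
[/CODE]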

garambois 2019-05-11 21:40

OK, well done Batalov, thank you very much for that remark!

The four aliquot sequences 1316832 (merge with 29412), 2240472 (merge with 181830), 2129790 (merge with 20468) and 2042496 (merge with 9852) were removed from the list on [URL="https://www.rechenkraft.net//aliquot/AllSeq.txt"]https://www.rechenkraft.net//aliquot/AllSeq.txt[/URL] during my program run.

So I corrected my list by removing these 4 aliquot sequences.

LaurV 2019-10-31 06:31

Hey Jean Luc, you'll need to modify the file: a few sequences went down through the 80-digit boundary and back up again without merging with anything. One example is 733752, which Dmitri mentioned in the reservation thread.

garambois 2019-11-02 18:20

Yes, I know that well!
But the modification takes more than a month.
It takes a long time to download the files from the DB!
I plan to update the page in early 2020...

I hope that in a few months or years the update can be done in only a few hours: everything will depend on how the server capacity evolves.

EdH 2019-11-03 02:45

I might be totally off track, but couldn't you find the mergers by checking the last index for each sequence, instead of downloading the whole sequence and looking for 80-digit matches?

I have a bash script that does that on a small scale. I don't know if it's of use, but I'll post it here with a description.

Basically, you make a list of all the sequences you would like to check, called seqList in the example below. Then you run the script: it downloads the last-line page for each sequence and looks for the last index on that page. It puts the index's id number into an array that parallels the array of sequences that were read in. After the last sequence is retrieved from the db, it does a slow crawl through the index list looking for matches. If it finds a match, it lists the two sequences. If you're concerned about the load on the db, you can uncomment the sleep line and set it to 1 or 2 (or more) seconds.

I haven't tried this on a really large list, but here is a sample list with some of your earlier mergers, the script and output:

seqList example:
[code]
276
3366
4788
9852
29412
314718
1316832
2042496
[/code]seqTest.sh:
[code]
#!/bin/bash

# Read the list of sequence starting values (file "seqList") into num[].
count1=1
exec <"seqList"
while read line
do
  num[count1]=$line
  let count1=${count1}+1
done

# For each sequence, fetch its "last line" page from FactorDB and record
# the FactorDB id of the last index in ind2[].
count2=1
while [ $count2 -lt $count1 ]
do
  wget -q -U Mozilla/5.0 "http://factordb.com/sequences.php?se=1&aq=${num[$count2]}&action=last" -O dbTemp
  exec <"dbTemp"
  while read lastLine
  do
    case $lastLine in
      *"index.php?id="*) ind1=${lastLine##*index.php?id=}
                         ind2[count2]=${ind1:0:19}
    esac
  done
  let count2=${count2}+1
# sleep 1
done

# Compare every pair of recorded ids; two sequences with the same last id
# have merged.
count3=1
while [ $count3 -lt $count1 ]
do
  count4=1
  while [ $count4 -lt $count3 ]
  do
    if [ ${ind2[$count4]} -eq ${ind2[$count3]} ]
    then
      echo "Sequence ${num[$count4]} matches sequence ${num[$count3]}"
    fi
    let count4=${count4}+1
  done
  let count3=${count3}+1
done
[/code]Output:
[code]$ bash seqTest.sh
Sequence 4788 matches sequence 314718
Sequence 29412 matches sequence 1316832
Sequence 9852 matches sequence 2042496
[/code]I would think this could be used for a much larger list, but as it sits it would have to be left to run to completion. I suppose it could be modified to write the sequence and index to a file, in a similar fashion to the earlier list; it could then be done in sections, with the sections concatenated for later review.
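
For instance (a rough sketch, not tested on a large list): the retrieval loop could append "${num[$count2]} ${ind2[count2]}" to a section file instead of only filling the arrays, and the concatenated sections, here called allSections.txt with one "sequence lastId" pair per line, could then be checked for duplicate ids:
[code]
# Sketch: find the duplicated last-ids in the combined section file and
# list the sequences that share each one.
awk '{print $2}' allSections.txt | sort | uniq -d | while read id
do
    awk -v id="$id" '$2 == id' allSections.txt
done
[/code]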

All comments are welcome, including those that tell me how full of it I am. . .

garambois 2019-11-03 12:19

You can read this page again: [URL="https://www.mersenneforum.org/showthread.php?t=23612&page=17"]https://www.mersenneforum.org/showthread.php?t=23612&page=17[/URL]
I remember that, after long discussions, we came to the conclusion that the C80 file is a very good compromise.
If I understand your idea correctly, every time I want a list of mergers, I have to scan all the latest DB lines again.
It must work very well, but it is very constraining. And I don't know how long it would take to scan all the last lines up to 3,000,000.
With the list of the last C80s of all sequences from 1 to 3,000,000, this is very fast. But there is still the problem raised by LaurV in post #4 on this page.
If someone wants to check whether a sequence merges and it doesn't work with my C80 list, they must not stop at the last C80, but try the previous ones as well, taking them in decreasing order of index.
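A minimal sketch of that check for a single sequence, assuming the FactorDB elf.php export and stopping at the first C80 that is found in the list:
[code]
# Sketch for sequence 733752: collect all of its 80-digit terms, walk them
# in decreasing order of index, stop at the first one present in the list.
wget -q "http://factordb.com/elf.php?seq=733752&type=1" -O 733752.elf
awk '{print $3}' 733752.elf | grep -E '^[0-9]{80}$' | tac | while read c80
do
    grep " ${c80}$" OE_3000000_C80.txt && break
done
[/code]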
Strictly speaking, it is also necessary to check whether the merger sequence (<3,000,000) has itself merged with another smaller one (<3,000,000), by checking all the mergers of the last year (because of the update date of my page) here: [url]https://www.mersenneforum.org/showthread.php?t=11837&page=83[/url]

In addition, scanning all the sequences also allows me to find possible errors, such as the one in posts #26 and #27 on this page: [url]https://www.mersenneforum.org/showthread.php?t=19737&page=3[/url]

I hope I haven't missed the point of your question!

:smile:

EdH 2019-11-03 13:45

No, you did not miss anything, but I did, maybe. I was thinking that we were interested only in open sequences. If a sequence terminates, do we still want to know which others might have been merged into it? Are we looking for the entire "genealogy" or are we looking at open sequences only?

I don't know what the current limits are for queries to the db, but I plan to start a test on one of my RPi machines (hopefully today) with a 4 second delay between calls. I'll use the AllSeq.txt listing and see if I overrun my db limit.

I haven't been keeping up with your work enough to fully comprehend the goals. Sorry on that point! My interests are not currently as rooted in Aliquot sequences as they once were.

garambois 2019-11-03 14:17

We're only interested in sequences that remain open. For the sequences that terminate, the work is finished.
But on my C80 page, it is possible that an open sequence <3,000,000 has meanwhile merged with a smaller one, since I only update the page once a year...
It is therefore preferable to check (on this page: [URL="https://www.mersenneforum.org/showthread.php?t=11837&page=83"]https://www.mersenneforum.org/showthread.php?t=11837&page=83[/URL]) that the merging sequence itself has not also merged with a smaller sequence over the past year.

EdH 2019-11-03 15:43

[QUOTE=garambois;529546]We're only interested in sequences that remain open. For the sequences that terminate, the work is finished.
But on my C80 page, it is possible that an open sequence <3,000,000 has meanwhile merged with a smaller one, since I only update the page once a year...
It is therefore preferable to check (on this page: [URL]https://www.mersenneforum.org/showthread.php?t=11837&page=83[/URL]) that the merging sequence itself has not also merged with a smaller sequence over the past year.[/QUOTE]
I think my script can identify mergers by itself in the background of a running machine. I have it running right now at a rate of ~900 sequences per hour on one of my RPi machines. It should finish the AllSeq listing in around 31 hours, if the db does not have any issue with my checks. Actually, I shouldn't find any mergers though, since there has already been work done to eliminate them from the listing.

chris2be8 2019-11-03 16:41

[QUOTE=EdH;529542]
I don't know what the current limits are for queries to the db,
[/QUOTE]

The limit you will probably hit is no more than 5000 page requests per hour. So if you are doing nothing else, a 1-second delay between requests will keep you under the limit (5000 per hour works out to one request every 0.72 seconds).

That assumes your page requests don't use much CPU time or very many database queries each. And that you aren't creating IDs with your queries.

Note that the limit is per IP address. So if you have several systems all behind one IP address on the internet they share the limits.

Chris

