mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Factoring (https://www.mersenneforum.org/forumdisplay.php?f=19)
-   -   Pascal's OPN roadblock files (https://www.mersenneforum.org/showthread.php?t=19066)

charybdis 2021-01-11 23:07

According to jasonp the CADO filtering format has [URL="https://mersenneforum.org/showthread.php?t=22624"]changed[/URL] since that code was written. An old version of CADO could work but I don't know whether it had been sufficiently tested on jobs this big back then, and this would certainly have been before the multithreaded merge code was written.

ryanp 2021-01-11 23:08

So I successfully cloned the latest CADO repo and built the replay binary, but not quite sure how to invoke it.

[CODE]The available parameters are the following:
-purged input purged file
-his input history file
-out basename for output matrices
-skip number of heaviest columns that go to the dense matrix (default 32)
-index file containing description of rows (relations-sets) of the matrix
-ideals file containing correspondence between ideals and matrix columns
-force-posix-threads (switch) force the use of posix threads, do not rely on platform memory semantics
-path_antebuffer path to antebuffer program
-for_msieve (switch) output matrix in msieve format
-Nmax stop at Nmax number of rows (default 0)
-col0 print only columns with index >= col0
-colmax print only columns with index < colmax
-verbose_flags fine grained control on which messages get printed[/CODE]

I've got an input.dat.gz containing all my relations (gzipped) and already have msieve's input.fb and input.ini. Any advice?

chris2be8 2021-01-12 16:36

fivemack wrote a program called remsing to remove singleton relations. It might help.

See [url]https://mersenneforum.org/showthread.php?p=525512&highlight=remsing#post525512[/url] and related posts.

Chris

ryanp 2021-01-12 18:33

[QUOTE=chris2be8;569069]fivemack wrote a program called remsing to remove singleton relations. It might help.

See [url]https://mersenneforum.org/showthread.php?p=525512&highlight=remsing#post525512[/url] and related posts.

Chris[/QUOTE]

Thanks - to be clear, is the suggestion to run the output of "remdups4" (I now have ~2.1B uniques) through remsing before handing the relations to msieve?

VBCurtis 2021-01-12 18:55

Yes. The idea, I think (?), is to run the first pass of singleton removal externally, so msieve handles a smaller dataset. This speeds things up when e.g. trying a handful of target densities, and when the relation count is really big, as in this job, it may help msieve cope with the dataset.

You can run it multiple times, just like msieve runs multiple singleton-removal passes, but I found a single run helpful for the GNFS-207 team-sieve job.
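
A minimal sketch of that workflow (file names are placeholders, and the msieve switches shown are just the usual filtering-only invocation, so adjust as needed):

[CODE]# external first-pass singleton removal on the deduplicated relations
~/remsing/remsing input.dat input.rem

# then hand the reduced file to msieve for filtering only (-nc1),
# trying whichever target density you like
msieve -v -i input.ini -l input.log -nf input.fb -s input.rem -nc1 "target_density=110"[/CODE]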

ryanp 2021-01-12 19:01

I've got it running now. The output is currently at:

[CODE]$ ~/remsing/remsing input.dat input.rem
1449910000 read 1102385169 bad[/CODE]

Does that ratio of "bad" to "read" seem reasonable?

VBCurtis 2021-01-12 19:09

No. The numbers should look quite similar to the first pass of msieve, such as these from your log on 9th Jan:
[code]commencing in-memory singleton removal
begin with 2069585205 relations and 2015728165 unique ideals
reduce to 940662702 relations and 730460437 ideals in 20 passes[/code]

Using this log as a very rough guide, you should get output of about half your input for the first pass through remsing.

Edit: I think that happened to me the first time I tried it; an updated remsing was later posted that worked fine for me. The link posted here was to the post #608 that had the updated remsing, though. Hrmmm.....

ryanp 2021-01-12 20:49

[QUOTE=VBCurtis;569099]No. The numbers should look quite similar to the first pass of msieve, such as these from your log on 9th Jan:
[code]commencing in-memory singleton removal
begin with 2069585205 relations and 2015728165 unique ideals
reduce to 940662702 relations and 730460437 ideals in 20 passes[/code]

Using this log as a very rough guide, you should get output of about half your input for the first pass through remsing.

Edit: I think that happened to me the first time I tried it; an updated remsing was later posted that worked fine for me. The link posted here was to the post #608 that had the updated remsing, though. Hrmmm.....[/QUOTE]

I just grabbed the latest code from [url]https://github.com/fivemack/factorisation-tools[/url]. Maybe fivemack (if he's reading this) can chime in? The final output was about 1/4 the original rels, which definitely seems wrong.

charybdis 2021-01-12 21:13

Another suggestion: if the huge volume of relations is posing problems for msieve, then you could try turning this into a 33/35 job by removing all the relations with 36-bit algebraic factors using grep. Granted, you might need to sieve a bit more, but it ought to be simpler than finding and fixing a bug.

VBCurtis 2021-01-12 22:06

Or, you could just let CADO run the filtering and see how big that matrix is. I think msieve generally creates a matrix slightly bigger than cado; so if CADO's matrix comes out a tolerable size, you can stop sieving and focus on getting the dataset through msieve.

Though, Charybdis' last suggestion might mean sieving a little more anyway if msieve proves to be the problem.

ryanp 2021-01-13 00:02

[QUOTE=VBCurtis;569117]Or, you could just let CADO run the filtering and see how big that matrix is.[/QUOTE]

I would, but I'm not quite sure how to run the replay binary. (Also, I thought there was some question about current compatibility between CADO's output and msieve's LA?)

In any event, here's where things currently stand with the latest msieve run, 2.1B uniques and target_density=110:

[CODE]Tue Jan 12 12:21:48 2021 found 11 duplicates and 2110664553 unique relations
Tue Jan 12 12:21:48 2021 memory use: 31280.0 MB
Tue Jan 12 12:21:49 2021 reading ideals above 1591017472
Tue Jan 12 12:21:49 2021 commencing singleton removal, initial pass
Tue Jan 12 16:06:06 2021 memory use: 41024.0 MB
Tue Jan 12 16:06:07 2021 reading all ideals from disk
Tue Jan 12 16:07:00 2021 memory use: 39808.5 MB
Tue Jan 12 16:08:59 2021 commencing in-memory singleton removal
Tue Jan 12 16:10:33 2021 begin with 2110664552 relations and 2029607823 unique ideals
Tue Jan 12 16:27:46 2021 reduce to 992995519 relations and 761340257 ideals in 20 passes
Tue Jan 12 16:27:46 2021 max relations containing the same ideal: 35
Tue Jan 12 16:28:38 2021 reading ideals above 720000
Tue Jan 12 16:28:38 2021 commencing singleton removal, initial pass[/CODE]

ryanp 2021-01-14 21:39

msieve keeps getting wedged... Trying again with 2.1B uniques and target_density=90.

henryzz 2021-01-15 12:24

Would jasonp be able to give any advice on how to get past this blockage in the merge stage? I don't know whether he is reading this thread.

wreck 2021-01-15 14:59

I'd suggest letting the 'NFS Filtering' step run for 50 hours first, to see what happens. Since the relation count is 2000 million, it will need more patience than 1000 million.

wreck 2021-01-15 15:06

A follow-up message.
Some background data:
1. filter step for 2,1076+ snfs 324 (688M relations) : 15 hours.
2. filter step for 10,459- snfs 306 (737M relations) : 22 hours.

ryanp 2021-01-15 16:48

[QUOTE=wreck;569361]I'd suggest letting the 'NFS Filtering' step run for 50 hours first, to see what happens. Since the relation count is 2000 million, it will need more patience than 1000 million.[/QUOTE]

Thanks for the tip! I'll try to let this run (with 2.1B uniques and TD=90) sit for a while.

ryanp 2021-01-18 21:49

[CODE]Fri Jan 15 12:21:24 2021 commencing in-memory singleton removal
Fri Jan 15 12:22:01 2021 begin with 590365520 relations and 594141896 unique ideals
Fri Jan 15 12:35:29 2021 reduce to 585813821 relations and 580608996 ideals in 21 passes
Fri Jan 15 12:35:29 2021 max relations containing the same ideal: 155
Fri Jan 15 12:37:14 2021 relations with 0 large ideals: 307095
Fri Jan 15 12:37:14 2021 relations with 1 large ideals: 215009
Fri Jan 15 12:37:14 2021 relations with 2 large ideals: 1852650
Fri Jan 15 12:37:14 2021 relations with 3 large ideals: 10607404
Fri Jan 15 12:37:14 2021 relations with 4 large ideals: 38306704
Fri Jan 15 12:37:14 2021 relations with 5 large ideals: 90474854
Fri Jan 15 12:37:14 2021 relations with 6 large ideals: 141486075
Fri Jan 15 12:37:14 2021 relations with 7+ large ideals: 302564030
Fri Jan 15 12:37:14 2021 commencing 2-way merge[/CODE]

So we're now at 72+ hours and it's still running... I'll give it a few more days, but starting to lose hope.

pinhodecarlos 2021-01-18 22:23

Maybe get in touch with Greg from NFS@Home to see if he can give any support or advice?!

charybdis 2021-01-18 22:49

How's CPU usage looking? Is msieve actually doing anything or is it just hanging? And while we're at it, how's memory usage?

frmky 2021-01-19 05:43

[QUOTE=pinhodecarlos;569622]Maybe get in touch with Greg from NFS@Home to see if he can give any support or advice?![/QUOTE]

This is well outside of anything I've run. My largest has been about 1 billion relations. However, if the relations are available online to download I'll be happy to play with it.

wreck 2021-01-19 10:13

If possible, could you give it another try with fewer than 1600M unique relations?
As a comparison, VBCurtis did 2,2330L (GNFS-207) with 162M unique relations.
And, as I remember, there was a time when fivemack finished an NFS job successfully using 720M relations, while 800M failed (using lpb 33).
A rough guess is that some time ago there was a barrier near 800M, and now it has jumped to 1600M for some reason.

VBCurtis 2021-01-19 15:18

No, I didn't use 162M uniques. Did you miss a zero? This job is tougher than the GNFS-207 by quite a lot, and uses bounds which are expected to require more relations (36/33 should require more than 35/34). 2e9 relations may not be enough, but is quite surely not too many.

Citing relations counts for 33-lp jobs is totally irrelevant to this job, which is using much larger bounds. The number of relations left heading into merge shows rather clearly that this is not oversieved.

There is no reason to think the old msieve large-dataset bug is the culprit here. However, Charybdis' idea to cull all 36-bit-large-prime relations from the dataset and try to filter as a 33/35 job has merit.

ryanp 2021-01-19 15:25

[QUOTE=VBCurtis;569670]There is no reason to think the old msieve large-dataset bug is the culprit here. However, Charybdis' idea to cull all 36-bit-large-prime relations from the dataset and try to filter as a 33/35 job has merit.[/QUOTE]

I'm willing to try culling the 36-bit large prime relations. Would you be able to construct the "grep" command? I don't quite know the msieve relation format well enough.

charybdis 2021-01-19 15:47

[QUOTE=ryanp;569671]I'm willing to try culling the 36-bit large prime relations. Would you be able to construct the "grep" command? I don't quite know the msieve relation format well enough.[/QUOTE]

[C]grep -v ",[8-9a-f]........$"[/C] should remove all lines ending with a 36-bit prime, which is what we need.
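
If the relations are still gzipped, something along these lines should do it (file names are just examples):

[CODE]zcat input.dat.gz | grep -v ",[8-9a-f]........$" | gzip > input_3335.dat.gz[/CODE]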

frmky 2021-01-23 22:13

There's definitely an msieve filtering bug. Good to have a data set that triggers it. Unfortunate that msieve needs to run for 15 hours to trigger it. Let's see what gdb says...

[CODE]commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
removing 8630643 relations and 8331224 ideals in 2000000 cliques
commencing in-memory singleton removal
[kepler-0-0:29616] *** Process received signal ***
[kepler-0-0:29616] Signal: Segmentation fault (11)
[kepler-0-0:29616] Signal code: Address not mapped (1)
[kepler-0-0:29616] Failing at address: 0x7f001013550c
[kepler-0-0:29616] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7eff125965e0]
[kepler-0-0:29616] [ 1] ./msieve993_new[0x43ffd0]
[kepler-0-0:29616] [ 2] ./msieve993_new[0x463ae7]
[kepler-0-0:29616] [ 3] ./msieve993_new[0x43c2fb]
[kepler-0-0:29616] [ 4] ./msieve993_new[0x4288dd]
[kepler-0-0:29616] [ 5] ./msieve993_new[0x415bc4]
[kepler-0-0:29616] [ 6] ./msieve993_new[0x405b1b]
[kepler-0-0:29616] [ 7] ./msieve993_new[0x404987]
[kepler-0-0:29616] [ 8] ./msieve993_new[0x40454c]
[kepler-0-0:29616] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7eff11f03c05]
[kepler-0-0:29616] [10] ./msieve993_new[0x4045f2]
[kepler-0-0:29616] *** End of error message ***[/CODE]

wreck 2021-01-25 01:21

Could you compile a debug version to see
which line it crashing?

frmky 2021-01-27 02:42

We’re in filter_purge_singletons_core() at common/filter/singleton.c:441, looping through the relations counting the number of times that each ideal occurs. But the last relation is broken! We’re at the last relation since i = num_relations-1. The ideal_list for this last relation contains entries greater than num_ideals. Don’t know why though… Probably an overflow somewhere. But where? Trying to track it down but might take a while since I don't have a lot of time to devote to this and individual tests take a day.

[CODE]read 2050M relations
read 2060M relations
read 2070M relations
found 578077506 hash collisions in 2074342600 relations
commencing duplicate removal, pass 2
found 9 duplicates and 2074342591 unique relations
memory use: 16280.0 MB
reading ideals above 1549860864
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
removing 8630643 relations and 8331224 ideals in 2000000 cliques
commencing in-memory singleton removal

Program received signal SIGSEGV, Segmentation fault.
0x000000000044a178 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc710) at common/filter/singleton.c:441
441 freqtable[ideal]++;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 gmp-6.0.0-15.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) backtrace
#0 0x000000000044a178 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc710) at common/filter/singleton.c:441
#1 0x0000000000475e26 in filter_purge_cliques (obj=0x6de250, filter=0x7fffffffc710) at common/filter/clique.c:646
#2 0x0000000000443cf6 in filter_make_relsets (obj=0x6de250, filter=0x7fffffffc710, merge=0x7fffffffc6e0, min_cycles=5352837)
at common/filter/filter.c:65
#3 0x000000000042f0fb in do_merge (obj=0x6de250, filter=0x7fffffffc710, merge=0x7fffffffc6e0, target_density=130)
at gnfs/filter/filter.c:187
#4 0x000000000042fad0 in nfs_filter_relations (obj=0x6de250, n=0x7fffffffc960) at gnfs/filter/filter.c:411
#5 0x00000000004172ac in factor_gnfs (obj=0x6de250, input_n=0x7fffffffcb40, factor_list=0x7fffffffcbd0) at gnfs/gnfs.c:153
#6 0x0000000000404dcd in msieve_run_core (obj=0x6de250, n=0x7fffffffcb40, factor_list=0x7fffffffcbd0) at common/driver.c:158
#7 0x00000000004051b4 in msieve_run (obj=0x6de250) at common/driver.c:268
#8 0x00000000004038a4 in factor_integer (
buf=0x7fffffffd650 "38315657995194363034877423503084547947166751578940985843521212522635100246118059073205923746544331860205171086654671434719340358393954962433533212457600196112076644876654207767427267797808629935905445"..., flags=1027, savefile_name=0x0,
logfile_name=0x0, nfs_fbfile_name=0x0, seed1=0x7fffffffd64c, seed2=0x7fffffffd648, max_relations=0, cpu=cpu_core,
cache_size1=32768, cache_size2=20971520, num_threads=0, which_gpu=0, nfs_args=0x7fffffffdcee "target_density=130") at demo.c:235
#9 0x00000000004046bd in main (argc=4, argv=0x7fffffffd988) at demo.c:601
(gdb) info frame
Stack level 0, frame at 0x7fffffffc340:
rip = 0x44a178 in filter_purge_singletons_core (common/filter/singleton.c:441); saved rip 0x475e26
called by frame at 0x7fffffffc370
source language c.
Arglist at 0x7fffffffc2b8, args: obj=0x6de250, filter=0x7fffffffc710
Locals at 0x7fffffffc2b8, Previous frame's sp is 0x7fffffffc340
Saved registers:
rip at 0x7fffffffc338
(gdb) info locals
ideal = 2057043263
i = 983610390
j = 5
freqtable = 0x7fff1d2ad010
relation_array = 0x7ff47e0f1010
curr_relation = 0x7ffc79bde3a0
old_relation = 0x7f1fd8001e8480
orig_num_ideals = 913238552
num_passes = 32767
num_relations = 983610391
num_ideals = 913238552
new_num_relations = 8630643
(gdb) print *curr_relation
$2 = {rel_index = 15834702, ideal_count = 36 '$', gf2_factors = 69 'E', connected = 156 '\234', ideal_list = {885450581, 598542783,
158747510, 638930804, 786848709, 2057043263, 3845, 186587920, 18476918, 67526419, 598542783, 872055544, 2057043265, 2046824196,
3942562, 102078889, 58908383, 865042570, 2057043267, 872418055, 9125741, 85351335, 11880544, 43981132, 865042570, 873512089,
893921179, 2057043271, 2567, 93072473, 26460704, 33365801, 865042570, 517341201, 275602560, 862343378, 2057043273, 83889159,
66167424, 46818875, 59842776, 59333874, 194384291, 865042570, 172206968, 2057043276, 50334725, 905653709, 628443801, 865042570,
801305779, 869019178, 2057043277, 2046821898, 20184373, 101514515, 16353075, 87715774, 36505563, 58989284, 865042570, 598565998,
334060622, 469101029, 2057043280, 83889158, 73623668, 106612925, 359795440, 9473259, 157931537, 772472752, 2057043282, 218106376,
140592574, 157045250, 477152215, 866943502, 6146950, 41607604, 44380953, 772472752, 2057043284, 150998022, 105306193, 842728936,
7879065, 444703037, 772472752, 403730401, 2057043289, 83889414, 320662844, 329981033, 248067990, 772472752, 23316642, 631501233,
2057043290, 822087174}}
(gdb)
[/CODE]

wreck 2021-01-31 05:49

After reading the code (msieve r1030) for about eight hours
(1767 code lines read, in folder common/filter, files singleton.c, clique.c, etc.),
this filter problem does not look easy to solve.
But here are some thoughts.

1. From common/filter/filter_priv.h
The definition of ideal_map_t is
typedef struct {
uint32 payload : 30; /* offset in list of ideal_relation_t
structures where the linked list of
ideal_relation_t's for this ideal starts */
uint32 clique : 1; /* nonzero if this ideal can participate in
a clique */
uint32 connected : 1; /* nonzero if this ideal has already been
added to a clique under construction */
} ideal_map_t;
the maximum value of payload is 2^30, which is about 1000M.
If the ideal is more than 1000M in function purge_cliques_core(),
it is possible that the filter would not work properly.
Here, when entering the purge_cliques_core() function, the relation count is
992888838, less than 2^30, so this 30-bit field should not be the reason for the crash.

2.
2057043265 = 0x7A9BFD41
0x3A9BFD41 = 983301441
This number is near num_relations (983610391).
It is possible that the ideal_map_t.clique bit is not cleared properly
in function purge_cliques_core().
But this is also a guess.

3. In function filter_purge_singletons_core():
curr_relation->ideal_count is 36, but three of the values
in curr_relation->ideal_list are the same (865042570):
curr_relation->ideal_list[17]
curr_relation->ideal_list[24]
curr_relation->ideal_list[32]
It is a little strange.

4. In function purge_cliques_core(), line 370:
ideal_map[ideal].payload = num_reverse++;
the variable num_reverse may possibly exceed 2^32,
while its type is uint32.

5. A question:
has ryanp tried using fewer than 1600M unique relations?
If so, what was the result?

Happy5214 2021-01-31 07:53

The length of [c]freqtable[/c] is [c]num_ideals[/c] (line 430), and [c]ideal[/c] (the index) is greater than that, so the array reference is out-of-bounds and thus we get the segfault. The real question is why there are so many entries in [c]ideal_list[/c] that are above [c]num_ideals[/c].

henryzz 2021-02-01 16:08

It looks to me like wreck has pointed out some things that will soon become issues even if they aren't quite issues yet. How difficult would it be to future-proof against issues 1 and 4? Would changing everything to uint64 increase memory usage a lot in these cases? I am not sure where the bulk of the memory usage is.

ryanp 2021-02-01 17:41

Thanks for the detailed investigation.

[QUOTE=wreck;570551]5. A question:
has ryanp tried using fewer than 1600M unique relations?
If so, what was the result?[/QUOTE]

Here's the log from a run with ~1.44B uniques:

[CODE]Fri Jan 1 17:10:48 2021 commencing relation filtering
Fri Jan 1 17:10:48 2021 setting target matrix density to 120.0
Fri Jan 1 17:10:48 2021 estimated available RAM is 192087.4 MB
Fri Jan 1 17:10:48 2021 commencing duplicate removal, pass 1
Fri Jan 1 19:20:20 2021 found 323664961 hash collisions in 1440338304 relations
Fri Jan 1 19:20:20 2021 added 1 free relations
Fri Jan 1 19:20:20 2021 commencing duplicate removal, pass 2
Fri Jan 1 19:33:53 2021 found 7 duplicates and 1440338298 unique relations
Fri Jan 1 19:33:53 2021 memory use: 16280.0 MB
Fri Jan 1 19:33:54 2021 reading ideals above 1128464384
Fri Jan 1 19:33:54 2021 commencing singleton removal, initial pass
Fri Jan 1 22:00:14 2021 memory use: 41024.0 MB
Fri Jan 1 22:00:16 2021 reading all ideals from disk
Fri Jan 1 22:00:50 2021 memory use: 28689.1 MB
Fri Jan 1 22:02:12 2021 commencing in-memory singleton removal
Fri Jan 1 22:03:39 2021 begin with 1440338298 relations and 1728567752 unique ideals
Fri Jan 1 22:08:40 2021 reduce to 54335106 relations and 30002747 ideals in 49 passes
Fri Jan 1 22:08:40 2021 max relations containing the same ideal: 12
Fri Jan 1 22:08:44 2021 reading ideals above 720000
Fri Jan 1 22:08:44 2021 commencing singleton removal, initial pass
Fri Jan 1 22:17:29 2021 memory use: 3012.0 MB
Fri Jan 1 22:17:30 2021 reading all ideals from disk
Fri Jan 1 22:17:33 2021 memory use: 2538.2 MB
Fri Jan 1 22:17:42 2021 keeping 118273914 ideals with weight <= 200, target excess is 278398
Fri Jan 1 22:17:50 2021 commencing in-memory singleton removal
Fri Jan 1 22:17:55 2021 begin with 54335106 relations and 118273914 unique ideals
Fri Jan 1 22:18:04 2021 reduce to 15293 relations and 0 ideals in 6 passes
Fri Jan 1 22:18:04 2021 max relations containing the same ideal: 0
Fri Jan 1 22:18:04 2021 filtering wants 1000000 more relations
Fri Jan 1 22:18:04 2021 elapsed time 05:07:18[/CODE]

And, for comparison, with 1.85B uniques:

[CODE]Tue Jan 5 11:20:36 2021 commencing relation filtering
Tue Jan 5 11:20:36 2021 setting target matrix density to 120.0
Tue Jan 5 11:20:36 2021 estimated available RAM is 192087.4 MB
Tue Jan 5 11:20:36 2021 commencing duplicate removal, pass 1
Tue Jan 5 15:29:49 2021 found 487341305 hash collisions in 1858202629 relations
Tue Jan 5 15:29:49 2021 added 1 free relations
Tue Jan 5 15:29:49 2021 commencing duplicate removal, pass 2
Tue Jan 5 16:45:46 2021 found 9 duplicates and 1858202621 unique relations
Tue Jan 5 16:45:46 2021 memory use: 16280.0 MB
Tue Jan 5 16:45:49 2021 reading ideals above 1440940032
Tue Jan 5 16:45:49 2021 commencing singleton removal, initial pass
Tue Jan 5 22:40:54 2021 memory use: 41024.0 MB
Tue Jan 5 22:40:59 2021 reading all ideals from disk
Tue Jan 5 22:41:53 2021 memory use: 35228.4 MB
Tue Jan 5 22:46:54 2021 commencing in-memory singleton removal
Tue Jan 5 22:51:35 2021 begin with 1858202621 relations and 1929946709 unique ideals
Tue Jan 5 23:44:19 2021 reduce to 658487197 relations and 538436836 ideals in 28 passes
Tue Jan 5 23:44:19 2021 max relations containing the same ideal: 29
Tue Jan 5 23:45:56 2021 reading ideals above 720000
Tue Jan 5 23:45:58 2021 commencing singleton removal, initial pass
Wed Jan 6 03:03:10 2021 memory use: 21024.0 MB
Wed Jan 6 03:03:13 2021 reading all ideals from disk
Wed Jan 6 03:04:00 2021 memory use: 31064.1 MB
Wed Jan 6 03:08:34 2021 keeping 678560299 ideals with weight <= 200, target excess is 3509174
Wed Jan 6 03:12:51 2021 commencing in-memory singleton removal
Wed Jan 6 03:15:24 2021 begin with 658487197 relations and 678560299 unique ideals
Wed Jan 6 04:09:33 2021 reduce to 654548247 relations and 674618815 ideals in 22 passes
Wed Jan 6 04:09:33 2021 max relations containing the same ideal: 200
Wed Jan 6 04:11:48 2021 filtering wants 1000000 more relations
Wed Jan 6 04:11:48 2021 elapsed time 16:51:13[/CODE]

In both cases, filtering ran to completion and didn't get stuck. Where I started to run into problems was this run onwards:

[CODE]Wed Jan 6 15:21:43 2021 found 535727162 hash collisions in 1974476487 relations
Wed Jan 6 15:21:43 2021 added 1 free relations
Wed Jan 6 15:21:43 2021 commencing duplicate removal, pass 2
Wed Jan 6 16:50:15 2021 found 9 duplicates and 1974476479 unique relations[/CODE]

From roughly 1.9B relations onward, it repeatedly got stuck in "commencing 2-way merge".

henryzz 2021-02-01 19:15

It looks like reducing the large prime bound to 35 bits might be the best solution. Has this been tried yet?

ryanp 2021-02-02 00:38

[QUOTE=henryzz;570679]It looks like reducing the large prime bound to 35 bits might be the best solution. Has this been tried yet?[/QUOTE]

I'm running the [C]grep[/C] now, and will try a run - but I think frmky may have already tried this?

frmky 2021-02-03 06:54

[QUOTE=henryzz;570679]It looks like reducing the large prime bound to 35 bits might be the best solution. Has this been tried yet?[/QUOTE]

Yes. No success. With the start of the semester I've been very busy. Hopefully later this week or next week I'll test out some of wreck's ideas.

frmky 2021-02-03 07:37

[QUOTE=wreck;570551]
1. From common/filter/filter_priv.h
The definition of ideal_map_t is
typedef struct {
uint32 payload : 30; /* offset in list of ideal_relation_t
structures where the linked list of
ideal_relation_t's for this ideal starts */
uint32 clique : 1; /* nonzero if this ideal can participate in
a clique */
uint32 connected : 1; /* nonzero if this ideal has already been
added to a clique under construction */
} ideal_map_t;
the maximum value of payload is 2^30, which is about 1000M.
If the ideal is more than 1000M in function purge_cliques_core(),
it is possible that the filter would not work properly.
Here, when entering the purge_cliques_core() function, the relation count is
992888838, less than 2^30, so this 30-bit field should not be the reason for the crash.

4. In function purge_cliques_core(), line 370:
ideal_map[ideal].payload = num_reverse++;
the variable num_reverse may possibly exceed 2^32,
while its type is uint32.
[/QUOTE]

For 4, if num_reverse exceeds 2^32, then ideal_map[ideal].payload has definitely exceeded 2^30. As a quick and dirty check of the latter, I edited filter_priv.h to

typedef struct {
uint64 payload : 32; /* offset in list of ideal_relation_t
structures where the linked list of
ideal_relation_t's for this ideal starts */
uint64 clique : 1; /* nonzero if this ideal can participate in
a clique */
uint64 connected : 1; /* nonzero if this ideal has already been
added to a clique under construction */
} ideal_map_t;

and started an overnight run. I know this isn't optimal for memory usage, but let's see if it gets us there.
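
Rough arithmetic on the memory cost, assuming one ideal_map_t entry per ideal: going from 4 to 8 bytes per entry adds about [$]9.14 \times 10^8 \times 4 \approx 3.7 \times 10^9[/$] bytes (~3.7 GB) for the ~914M ideals kept in this dataset.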

frmky 2021-02-04 07:02

And that didn't work. Same crash in the same place.

[CODE]commencing duplicate removal, pass 2
found 9 duplicates and 2074342591 unique relations
memory use: 16280.0 MB
reading ideals above 1549860864
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
checking relations array at location 5
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
checking relations array at location 1
checking relations array at location 2
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
checking relations array at location 5
removing 8630643 relations and 8331224 ideals in 2000000 cliques
checking relations array at location 6
Loc 6: bad relation 983610390 of 983610391, num_ideals is 913238552
rel_index: 15834702, ideal_count: 36, gf2_factors: 69, connected: 156
Ideals: 885450581, 598542783, 158747510, 638930804, 786848709, 2057043263, 3845, 186587920, 18476918, 67526419, 598542783, 872055544, 2057043265, 2046824196, 3942562, 102078889, 58908383, 865042570, 2057043267, 872418055, 9125741, 85351335, 11880544, 43981132, 865042570, 873512089, 893921179, 2057043271, 2567, 93072473, 26460704, 33365801, 865042570, 517341201, 275602560, 862343378,
Found 1 bad relations in array.
commencing in-memory singleton removal

Program received signal SIGSEGV, Segmentation fault.
0x000000000044a176 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc0f0) at common/filter/singleton.c:445
445 freqtable[ideal]++;
[/CODE]

The line number is a little different since I added a simple routine to check whether any relation has an ideal index greater than num_ideals. This code is at
[url]https://github.com/gchilders/msieve_nfsathome/tree/msieve-nfsathome[/url]

So at the end of filter_purge_singletons_core(), the relation set appears fine, but something is going awry in purge_cliques_core() since at the end of delete_relations() the last relation is bad.

Edit: At this point (uint32 *)curr_relation - (uint32 *)relation_array is nearly 2^35, so those relations at the end aren't even participating in the purge.

Happy5214 2021-02-04 07:58

Just to be sure, can you put another check near the beginning of purge_cliques_core()? We want to rule out other code called between the singleton purge and the clique purge.

frmky 2021-02-04 09:03

[QUOTE=Happy5214;570831]Just to be sure, can you put another check near the beginning of purge_cliques_core()? We want to rule out other code called between the singleton purge and the clique purge.[/QUOTE]

I added a check of the relations at the beginning and before each major loop in purge_cliques_core() and restarted a run. I'll report back tomorrow.

frmky 2021-02-05 05:36

Right before the call to delete_relations(), the relations array looks fine. At the end of delete_relations(), it's broken. I just noticed that in the loop in that function, the calculation of array_word will overflow. I don't think that should cause the issue since it's just used as a comparison to delete_array[], but next I'll make the addresses in delete_array[] 64-bit, remove the check in purge_cliques_core() so all relations will participate in the purge, and fix this overflow.

frmky 2021-02-06 09:02

So far, so good. Gone past the place of the previous crash, and looping through clique and singleton removal now. Relations array checks are good so far.

Happy5214 2021-02-06 12:36

That seems promising. Assuming you're working with Ryan's data, if you can successfully filter it, is it possible to send him the msieve output files so he doesn't have to redo the work?

ryanp 2021-02-06 14:23

If frmky is able to form a matrix, either he or I can run the LA. I'd also be interested to get the msieve patch so I can try the filtering step myself.

frmky 2021-02-07 17:39

1 Attachment(s)
[CODE]Sat Feb 6 23:19:11 2021 commencing 2-way merge
Sat Feb 6 23:28:47 2021 reduce to 334640603 relation sets and 328431256 unique ideals
Sat Feb 6 23:28:47 2021 commencing full merge
Sun Feb 7 04:51:15 2021 memory use: 38947.4 MB
Sun Feb 7 04:52:10 2021 found 147393213 cycles, need 147295456
Sun Feb 7 04:53:37 2021 weight of 147295456 cycles is about 19148588285 (130.00/cycle)
Sun Feb 7 04:53:37 2021 distribution of cycle lengths:
Sun Feb 7 04:53:37 2021 1 relations: 11327174
Sun Feb 7 04:53:37 2021 2 relations: 10934036
Sun Feb 7 04:53:37 2021 3 relations: 11762737
Sun Feb 7 04:53:37 2021 4 relations: 11269419
Sun Feb 7 04:53:37 2021 5 relations: 10840433
Sun Feb 7 04:53:37 2021 6 relations: 10270300
Sun Feb 7 04:53:37 2021 7 relations: 9536741
Sun Feb 7 04:53:37 2021 8 relations: 8923986
Sun Feb 7 04:53:37 2021 9 relations: 8177647
Sun Feb 7 04:53:37 2021 10+ relations: 54252983
Sun Feb 7 04:53:37 2021 heaviest cycle: 28 relations
Sun Feb 7 04:54:54 2021 commencing cycle optimization
Sun Feb 7 05:08:34 2021 start with 1239155413 relations
Sun Feb 7 07:05:08 2021 pruned 42215931 relations
Sun Feb 7 07:05:10 2021 memory use: 34774.5 MB
Sun Feb 7 07:05:11 2021 distribution of cycle lengths:
Sun Feb 7 07:05:11 2021 1 relations: 11327174
Sun Feb 7 07:05:11 2021 2 relations: 11209087
Sun Feb 7 07:05:11 2021 3 relations: 12224988
Sun Feb 7 07:05:11 2021 4 relations: 11643676
Sun Feb 7 07:05:11 2021 5 relations: 11239934
Sun Feb 7 07:05:11 2021 6 relations: 10573210
Sun Feb 7 07:05:11 2021 7 relations: 9830741
Sun Feb 7 07:05:11 2021 8 relations: 9126063
Sun Feb 7 07:05:11 2021 9 relations: 8350544
Sun Feb 7 07:05:11 2021 10+ relations: 51770039
Sun Feb 7 07:05:11 2021 heaviest cycle: 28 relations
Sun Feb 7 07:13:57 2021 RelProcTime: 199990
Sun Feb 7 07:13:57 2021 elapsed time 55:33:18[/CODE]
147.3M matrix! That would take my little 8 nodes about 2.5 months to solve!
Patch is attached. Edit: Had to add .txt to the end to get the forum to accept it.

ryanp 2021-02-07 18:01

[QUOTE=frmky;571096]147.3M matrix! That would take my little 8 nodes about 2.5 months to solve![/QUOTE]

Any chance you'd be able to run it? :smile:

Otherwise, I'll test out the patch and can try sieving some more.

frmky 2021-02-07 21:52

It's now in the official msieve SVN, so no need for the patch file. Just grab the latest code.

frmky 2021-02-08 03:15

[CODE]commencing linear algebra
read 147295456 cycles
cycles contain 531217644 unique relations
read 531217644 relations
using 8 quadratic characters above 4294917295
building initial matrix
memory use: 68478.7 MB
read 147295456 cycles
matrix is 147295267 x 147295456 (74519.1 MB) with weight 19922810683 (135.26/col)
sparse part has weight 18061791563 (122.62/col)
filtering completed in 2 passes
matrix is 147292440 x 147292629 (74518.9 MB) with weight 19922748759 (135.26/col)
sparse part has weight 18061766673 (122.63/col)
matrix starts at (0, 0)
matrix is 147292440 x 147292629 (74518.9 MB) with weight 19922748759 (135.26/col)
sparse part has weight 18061766673 (122.63/col)
saving the first 48 matrix rows for later
matrix includes 64 packed rows
matrix is 147292392 x 147292629 (71275.3 MB) with weight 17993852562 (122.16/col)
sparse part has weight 17211463600 (116.85/col)
using block size 8192 and superblock size 1966080 for processor cache size 20480 kB
commencing Lanczos iteration (16 threads)
memory use: 73902.4 MB
linear algebra at 0.0%, ETA 35242h 7m92629 dimensions (0.0%, ETA 35242h 7m)[/CODE]
I'll see you in 4 years... :lol:

VBCurtis 2021-02-08 05:57

[QUOTE=frmky;571129][CODE]
linear algebra at 0.0%, ETA 35242h 7m92629 dimensions (0.0%, ETA 35242h 7m)[/CODE]
I'll see you in 4 years... :lol:[/QUOTE]
Holy {censored}!!!!

I mean, wow. 147M dimensions.... But the ETA seems off- shouldn't this take around 4x as long as 73M dimensions? I solved one that size for 2330L in about 3100hr on a single 10-core Ivy Bridge socket, and around 2000hr when run naively (without MPI) on both 10-core sockets.

I'm not volunteering for ~8000 machine-hours, but it seems an 8-socket mini-cluster should be around 10x faster than this!

Thanks for finding and fixing this bug- I'm happy that we're likely to have msieve at our disposal for jobs as big as we can solve matrices for!

frmky 2021-02-08 06:49

I was being a bit facetious. I've found it scales closer to d^2.5, so say a little under 6 times as long. And that computer isn't particularly fast. A rough estimate is about 2.5 months on the cluster I most use for NFS@Home numbers.
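
For concreteness, with the figures above: [$](147.3/73)^{2.5} \approx 5.8[/$], hence "a little under 6 times as long".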

I'll ask around for cluster time to not set NFS@Home back too far. I doubt any of the currently sieved numbers can be done without at least a small cluster with IB or at least 10GbE.

ryanp 2021-02-08 23:13

[QUOTE=frmky;571140]I was being a bit facetious. I've found it scales closer to d^2.5, so say a little under 6 times as long. And that computer isn't particularly fast. A rough estimate is about 2.5 months on the cluster I most use for NFS@Home numbers.

I'll ask around for cluster time to not set NFS@Home back too far. I doubt any of the currently sieved numbers can be done without at least a small cluster with IB or at least 10GbE.[/QUOTE]

If you're able to run this on your cluster, that would be greatly appreciated; if not, I also understand. Meanwhile I will continue sieving to see if I can form something resembling a reasonable matrix.

ryanp 2021-02-10 05:36

Verified the fix in the latest SVN update, and I was able to produce a matrix myself. For comparison, here's an ETA on a 64-core workstation:

[CODE]commencing linear algebra
read 147295123 cycles
cycles contain 531218272 unique relations
read 531218272 relations
using 8 quadratic characters above 4294917295
building initial matrix
memory use: 68478.7 MB
read 147295123 cycles
matrix is 147294934 x 147295123 (74519.4 MB) with weight 19922890703 (135.26/col)
sparse part has weight 18061865192 (122.62/col)
filtering completed in 2 passes
matrix is 147292104 x 147292281 (74519.2 MB) with weight 19922828489 (135.26/col)
sparse part has weight 18061839994 (122.63/col)
matrix starts at (0, 0)
matrix is 147292104 x 147292281 (74519.2 MB) with weight 19922828489 (135.26/col)
sparse part has weight 18061839994 (122.63/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 147291864 x 147292281 (69444.9 MB) with weight 16505103766 (112.06/col)
sparse part has weight 15847879420 (107.59/col)
using block size 8192 and superblock size 608256 for processor cache size 25344 kB
commencing Lanczos iteration (64 threads)
memory use: 98165.2 MB
linear algebra at 0.0%, ETA 4273h 0m292281 dimensions (0.0%, ETA 4273h 0m)
checkpointing every 40000 dimensions292281 dimensions (0.0%, ETA 4287h45m)
linear algebra completed 81656 of 147292281 dimensions (0.1%, ETA 4336h46m)[/CODE]

Unfortunately I don't think I can tie up this machine for ~half a year, but hopefully some more sieving can bring down the matrix size a bit.

henryzz 2021-02-10 11:00

[QUOTE=ryanp;571253]Verified the fix in the latest SVN update, and I was able to produce a matrix myself. For comparison, here's an ETA on a 64-core workstation:[/QUOTE]
Is this using MPI? On a multi-cpu system that should help.

ryanp 2021-02-10 21:27

[QUOTE=henryzz;571258]Is this using MPI? On a multi-cpu system that should help.[/QUOTE]

I don't have a lot of experience using MPI. I tried installing mpirun and the necessary developer libs, and recompiled msieve with MPI=1. I'm getting this error when trying to run, though:

[CODE]$ mpirun -np 36 ~/math/msieve-mpi/msieve -v -i ./input.ini -l ./input.log -s ./input.dat -nf ./input.fb -nc2 2,18
Invalid MIT-MAGIC-COOKIE-1 keyInvalid MIT-MAGIC-COOKIE-1 key[workstation:3437987] *** Process received signal ***
[workstation:3437987] Signal: Segmentation fault (11)
[workstation:3437987] Signal code: Invalid permissions (2)
[workstation:3437987] Failing at address: 0x558c64230990
[workstation:3437987] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3bd60)[0x7fa355b7bd60][/CODE]

henryzz 2021-02-11 17:43

[QUOTE=ryanp;571281]I don't have a lot of experience using MPI. I tried installing mpirun and the necessary developer libs, and recompiled msieve with MPI=1. I'm getting this error when trying to run, though:

[CODE]$ mpirun -np 36 ~/math/msieve-mpi/msieve -v -i ./input.ini -l ./input.log -s ./input.dat -nf ./input.fb -nc2 2,18
Invalid MIT-MAGIC-COOKIE-1 keyInvalid MIT-MAGIC-COOKIE-1 key[workstation:3437987] *** Process received signal ***
[workstation:3437987] Signal: Segmentation fault (11)
[workstation:3437987] Signal code: Invalid permissions (2)
[workstation:3437987] Failing at address: 0x558c64230990
[workstation:3437987] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3bd60)[0x7fa355b7bd60][/CODE][/QUOTE]

I am no expert on running MPI, although there is supposed to be a speedup. Posts #229 and #230 from [url]https://mersenneforum.org/showthread.php?t=24211&page=13[/url] would be worth reading.

After googling, MIT-MAGIC-COOKIE-1 errors seem to be related to X11. Could you try it in console mode? Unsetting the display variable may be enough: "unset DISPLAY".
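
i.e. something like this, reusing your command from above:

[CODE]unset DISPLAY
mpirun -np 36 ~/math/msieve-mpi/msieve -v -i ./input.ini -l ./input.log -s ./input.dat -nf ./input.fb -nc2 2,18[/CODE]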

There have been issues with different versions of mpi/ubuntu. EdH might be able to help sort through that mess. This could be the same issue.

ryanp 2021-02-13 15:16

I'm back to sieving for this job again. Currently at 2,136,576,912 uniques. My plan for each ~day: gather new relations sieved, then do an msieve filtering run to see how large the resultant matrix is.

ryanp 2021-02-17 20:37

Currently at:

[CODE]found 0 duplicates and 2472407741 unique relations
memory use: 31280.0 MB
reading ideals above 2873819136
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 41399.9 MB
commencing in-memory singleton removal
begin with 2472407741 relations and 2048964861 unique ideals
reduce to 1444557329 relations and 911198417 ideals in 16 passes
max relations containing the same ideal: 34
reading ideals above 720000
commencing singleton removal, initial pass[/CODE]

With ~2.28B uniques, the matrix size was down to 134M, so hopefully we can push it down a bit further still.

RichD 2021-02-17 23:45

No wonder this has been the number one most wanted for 15 years or so...

ryanp 2021-02-19 22:06

Currently running the filtering with [B]2.59B[/B] uniques, and target_density=140.

Previous run (2.47B uniques) managed to get the matrix size down to 125M.

henryzz 2021-02-19 23:33

What's the target? How does the solving time scale?

ryanp 2021-02-20 14:46

[QUOTE=henryzz;572061]What's the target? How does the solving time scale?[/QUOTE]

Not sure about the target; something in the 90M to 100M range would be nice. I'm also running up against the sheer size of the relations file, which is now about 200GB even gzip'ed.

ryanp 2021-02-21 18:38

Assuming the current filtering run (2.59B uniques, target_density=140) succeeds, I will probably start LA afterward. The size of the relations file is becoming the issue at this point.

ryanp 2021-02-22 14:52

[CODE]commencing linear algebra
read 118001756 cycles
cycles contain 408993293 unique relations
read 408993293 relations
using 20 quadratic characters above 4294917295
building initial matrix
memory use: 54264.7 MB
read 118001756 cycles
matrix is 118001577 x 118001756 (63095.7 MB) with weight 17614526827 (149.27/col)
sparse part has weight 15360150213 (130.17/col)
filtering completed in 2 passes
matrix is 117998763 x 117998931 (63095.5 MB) with weight 17614443699 (149.28/col)
sparse part has weight 15360119182 (130.17/col)
matrix starts at (0, 0)
matrix is 117998763 x 117998931 (63095.5 MB) with weight 17614443699 (149.28/col)
sparse part has weight 15360119182 (130.17/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 117998523 x 117998931 (58680.1 MB) with weight 14077102445 (119.30/col)
sparse part has weight 13494642461 (114.36/col)
using block size 8192 and superblock size 608256 for processor cache size 25344 kB
commencing Lanczos iteration (64 threads)
memory use: 80685.9 MB
linear algebra at 0.0%, ETA 2503h12m998931 dimensions (0.0%, ETA 2503h12m)
checkpointing every 50000 dimensions998931 dimensions (0.0%, ETA 2521h18m)
linear algebra completed 484175 of 117998931 dimensions (0.4%, ETA 2391h22m)[/CODE]

Doable, but slow; fortunately frmky has offered to lend a hand with the LA in a few weeks' time.

ryanp 2021-03-05 19:08

After getting up to 2.8B uniques, filtering with target_density=150 produced a 109Mx109M matrix.

[CODE]commencing linear algebra
read 109443915 cycles
cycles contain 382966629 unique relations
read 382966629 relations
using 20 quadratic characters above 4294917295
building initial matrix
memory use: 51260.2 MB
read 109443915 cycles
matrix is 109443738 x 109443915 (61867.3 MB) with weight 17248536158 (157.60/col)
sparse part has weight 15123709456 (138.19/col)
filtering completed in 2 passes
matrix is 109441975 x 109442140 (61867.2 MB) with weight 17248479767 (157.60/col)
sparse part has weight 15123686159 (138.19/col)
matrix starts at (0, 0)
matrix is 109441975 x 109442140 (61867.2 MB) with weight 17248479767 (157.60/col)
sparse part has weight 15123686159 (138.19/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 109441735 x 109442140 (57387.5 MB) with weight 13866272003 (126.70/col)
sparse part has weight 13292709138 (121.46/col)
using block size 8192 and superblock size 608256 for processor cache size 25344 kB
commencing Lanczos iteration (64 threads)
memory use: 77410.7 MB
linear algebra at 0.0%, ETA 2362h54m442140 dimensions (0.0%, ETA 2362h54m)
checkpointing every 50000 dimensions442140 dimensions (0.0%, ETA 2353h21m)
linear algebra completed 12252 of 109442140 dimensions (0.0%, ETA 2299h53m) [/CODE]

This would probably be doable for me, but will be faster on frmky's MPI cluster.

VBCurtis 2021-03-05 19:45

Is there a sense that the filtering bug was squashed, and only the size of the dataset is in the way of filtering jobs this size?

Seems 33/35 is better than 33/36 for this job, to shrink the dataset by 25% (say, 2.1B uniques rather than 2.8B).

I suppose it only matters in principle, since matrix size also stands in the way of trying an even-tougher job.

Bravo for slaying this beast, all the same! Well, almost done..

frmky 2021-03-07 05:23

I think so... Certainly hope so! Filtering with lots of excess relations is still hard so things could go awry there, but with a reasonable excess the remaining limit is 2^32 total, not unique, relations. With prior duplicate and perhaps singleton removal, this permits large runs.

The size of the dataset isn't much of an issue. The filtering running time is dwarfed by both the sieving and LA running times anyway, and someone completing numbers this size likely has access to a computer with sufficient storage and memory.

[CODE]200G Mar 6 01:55 msieve.dat.gz

reading ideals above 720000
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 87634.0 MB
keeping 1372156493 ideals with weight <= 200, target excess is 10188461[/CODE]

lavalamp 2021-03-09 11:38

1 Attachment(s)
Finally finished my ECM work on the t2200 file!

Composites larger than 2^1018 have had 224 curves @ B1=50e3, B2=13.7e6 (25 digit level).

Composites less than 2^1018 have had 1152 curves @ B1=3e6, B2=14e9 (40 digit level).

The total haul is 815 factors, these have been submitted to factordb, and attached to this post.

Pascal Ochem 2021-03-25 12:10

Thanks! That is a lot of factors, with an explicit lower bound on the ECM effort for these numbers.

ThomRuley 2021-03-26 16:59

That's a great idea. What is the current ECM status on the t2200 composites?

ryanp 2021-03-29 18:31

Here's the latest on the #1 MWRB, with frmky lending a helping hand for the LA:

[CODE]linear algebra completed 20216008 of 109441779 dimensions (18.5%, ETA 854h19m)[/CODE]

ryanp 2021-05-05 18:32

Thanks in large part to frmky's help with the filtering fix and running LA, the #1 MWRB has fallen.

[url]http://factordb.com/index.php?query=6115909044841454629%5E17-1[/url]

frmky 2021-05-05 18:49

Here's the log for posterity.

[PASTEBIN]ziQFbDHJ[/PASTEBIN]

pinhodecarlos 2021-05-05 19:50

Congrats, well done gents!

Pascal Ochem 2021-05-05 20:19

Ryan, frmky, thank you.
Now I explore the subtrees unlocked by the new factors for lower bounds on many parameters of an OPN.

charybdis 2021-05-05 21:36

[QUOTE=ryanp;577739]Thanks in large part to frmky's help with the filtering fix and running LA, the #1 MWRB has fallen.

[url]http://factordb.com/index.php?query=6115909044841454629%5E17-1[/url][/QUOTE]

:bow wave:

Glad to see this completed successfully without any further bugs. I presume this means the 4G relation limit is now the limiting factor in the size of jobs that can be run with msieve?

frmky 2021-05-06 01:28

Yes, I think so. Along with computing resources to complete both the sieving and linear algebra.

wreck 2021-05-06 11:34

Congratulations, it's a pleasure to see that msieve can work without crashing when the unique relation count is larger than 2000M.

henryzz 2021-05-06 18:16

Congratulations for finally getting this problematic number done.

mathwiz 2021-05-08 15:28

[QUOTE=Pascal Ochem;577754]Ryan, frmky, thank you.
Now I explore the subtrees unlocked by the new factors for lower bounds on many parameters of an OPN.[/QUOTE]

Which file(s) of numbers needed will be regenerated as a result of this exploration?

Pascal Ochem 2021-05-10 20:26

For the factor P220 of the C301, I have factored 1+P220 by finding a P48.
And now the roadblock has completely disappeared for the bound [$]10^{2200}[/$].
I started the full run for proving the bounds [$]10^{2100}[/$] (to see how the weights drop)
and [$]10^{2200}[/$] (to hopefully get a new bound).
They will finish in a few weeks and we will have the new mwrb files.
The tXXXX files are unaffected.

The other parameters where we encountered the C301 are
- The total number of prime factors [$]\Omega[/$]: I try to extend the bound from 111 factors to 115.
- [$]\Omega-2\omega[/$]: trying to go from 51 to 53 seems difficult.
In both cases, C301 was also the worst roadblock, although way less bad than for the bound [$]10^{2100}[/$].
In both cases, there are still a few mild roadblocks in the unlocked subtrees, not worth considering.
There is no available file of composites here.

And my webpage needs an update: C301 was the example to explain how to circumvent roadblocks.

Pascal Ochem 2021-05-24 21:35

The run with bound [$]10^{2100}[/$] just finished and we see the impact of the C301.
The old file has 1589 composites.
[url]http://www.lirmm.fr/~ochem/opn/old_mwrb2100.txt[/url]
The new file has 1192 composites.
[url]http://www.lirmm.fr/~ochem/opn/mwrb2100.txt[/url]

Don't worry if you are working on a composite that has disappeared.
It will re-appear in mwrb2200 once the run with bound [$]10^{2200}[/$] is over, because of the composite [$]\sigma(11^{330})[/$].

mathwiz 2021-05-24 22:09

[QUOTE=Pascal Ochem;578993]The run with bound [$]10^{2100}[/$] just finished and we see the impact of the C301.
The old file has 1589 composites.
[url]http://www.lirmm.fr/~ochem/opn/old_mwrb2100.txt[/url]
The new file has 1192 composites.
[url]http://www.lirmm.fr/~ochem/opn/mwrb2100.txt[/url]

Don't worry if you are working on a composite that has disappeared.
It will re-appear in mwrb2200 once the run with bound [$]10^{2200}[/$] is over, because of the composite [$]\sigma(11^{330})[/$].[/QUOTE]

Does this tell us anything about whether the [$]10^{2200}[/$] run will succeed, or are the two completely independent?

wblipp 2021-05-25 05:01

@Pascal

I think you posted the old mwrb file to both links. They both start with

6115909044841454629 16

--------------
@mathwiz
As described in [URL="http://www.lirmm.fr/~ochem/opn/"]Pascal's summary page[/URL], he circumvents roadblocks using a process similar to Kevin Hare's process. This can be applied recursively. In principle, the proof can be extended to any level. In practice, each roadblock causes a large number of additional tree terms, and the limit on the proof is how much time he is willing to commit to generating the proof. Pascal's intuition is that the factoring of the C301 reduced the roadblock complexity sufficiently to allow him to complete the 2200 proof in tolerable time. Unless it turns out he has grossly underestimated the time, he will almost certainly stick to the process long enough to complete the 2200 proof.

ryanp 2021-05-25 05:21

[QUOTE=wblipp;579019]@Pascal

I think you posted the old mwrb file to both links. They both start with

6115909044841454629 16[/QUOTE]

Seems fixed now at least?

[CODE]$ curl -s http://www.lirmm.fr/~ochem/opn/old_mwrb2100.txt | wc -l
1589
$ curl -s http://www.lirmm.fr/~ochem/opn/mwrb2100.txt | wc -l
1192[/CODE]

ryanp 2021-05-25 16:40

[QUOTE=Pascal Ochem;578993]The run with bound [$]10^{2100}[/$] just finished and we see the impact of the C301.
The old file has 1589 composites.
[url]http://www.lirmm.fr/~ochem/opn/old_mwrb2100.txt[/url]
The new file has 1192 composites.
[url]http://www.lirmm.fr/~ochem/opn/mwrb2100.txt[/url]

Don't worry if you are working on a composite that has disappeared.
It will re-appear in mwrb2200 once the run with bound [$]10^{2200}[/$] is over, because of the composite [$]\sigma(11^{330})[/$].[/QUOTE]

I'm doing some ECM work now. Already found one: [url]http://factordb.com/index.php?id=1100000000012016966[/url] (I assume if I just report to FDB, it'll be picked up by Pascal's scraper as usual)?

wblipp 2021-05-25 21:12

[QUOTE=ryanp;579021]Seems fixed now at least?[/QUOTE]

I needed to clear my cache. Probably worked correctly all along if not cached.

ryanp 2021-06-01 13:40

[QUOTE=Pascal Ochem;578993]The run with bound [$]10^{2100}[/$] just finished and we see the impact of the C301.
The old file has 1589 composites.
[url]http://www.lirmm.fr/~ochem/opn/old_mwrb2100.txt[/url]
The new file has 1192 composites.
[url]http://www.lirmm.fr/~ochem/opn/mwrb2100.txt[/url]

Don't worry if you are working on a composite that has disappeared.
It will re-appear in mwrb2200 once the run with bound [$]10^{2200}[/$] is over, because of the composite [$]\sigma(11^{330})[/$].[/QUOTE]

I'm down to 1124 composites remaining in my mwrb2100.

Remaining: [url]https://cs.stanford.edu/~rpropper/mwrb2100.txt[/url]
Factors found: [url]https://cs.stanford.edu/~rpropper/opn.txt[/url]

Pascal Ochem 2021-07-09 10:49

no OPN is smaller than 10^2200
 
The run for [$]10^{2200}[/$] is now over. The program gave up on the following branches:

[$]127^{192}[/$] / [$]19^2[/$] [$]3^6[/$] [$]1093^2[/$] [$]398581^{102}[/$] / [$]7^2[/$] / [$]13^1[/$]
[$]127^{192}[/$] / [$]19^2[/$] [$]3^6[/$] [$]1093^2[/$] [$]398581^{108}[/$] / [$]7^2[/$] / [$]13^1 [/$]
[$]61^{232}[/$] / [$]13^2[/$] [$]3^2[/$] / [$]5^{520}[/$] / [$]179^{138}[/$] / [$]1789^1[/$]

The first two situations are due to a standard branching on [$]3^6[/$], so that the considered upper bound on the abundancy is [$]\frac32[/$] instead of [$]\frac{1093}{3^6}[/$].
I fixed it by making an exact branching, after [$]127^{192}[/$] / [$]19^2[/$], on every component [$]3^{2i}[/$], that is, even when [$]2i+1[/$] is composite.
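
For reference, the gap between the two bounds is small but real: [$]\sigma_{-1}(3^6)=\frac{1093}{729}\approx 1.49931 < \frac32[/$]; that small slack is what the exact branching removes.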

There is no such fix for the third situation: we already have exact branchings on [$]13^2[/$], [$]3^2[/$], [$]1789^1[/$], and the abundancy [$]\sigma_{-1}(p^e)[/$] is really close to [$]\sigma_{-1}(p^\infty)=\frac{p}{p-1}[/$] for [$]61^{232}[/$], [$]5^{520}[/$], [$]179^{138}[/$].

The abundancy in this branch is [$]\frac{61}{60}\frac{183}{13^2}\frac{13}{3^2}\frac54\frac{179}{178}\frac{1790}{1789}=1.99999792324886[/$], which is very close to 2.
If we don't get a miraculous ECM factor and if we want to avoid a huge roadblock circumvention, we can use an old-school argument:
One copy of 5 comes from [$]\sigma(1789^1)[/$].
The other 519 copies of 5 must come from things of the form [$]\sigma(p^4)[/$] with [$]p\equiv1\pmod{5}[/$]. Moreover, [$]p\ge963121[/$] so that the abundancy is at most 2.
The case of [$]\sigma(p^{5^k-1})[/$] giving [$]5^k[/$] with [$]k\ge2[/$] is ruled out because it is suboptimal for our argument.
Now [$](963121^4)^{519}[/$] is greater than our bound [$]10^{2200}[/$].
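
Making that last size estimate explicit: [$](963121^4)^{519}=963121^{2076}>(10^{5.9})^{2076}>10^{12000}\gg 10^{2200}[/$].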

henryzz 2021-07-09 11:46

It might be a good exercise to push each of the forbidden factors as far as possible (assuming that the previous primes have been forbidden). Working out the logic to circumvent these roadblocks could be an important learning exercise. I know runtime will limit some of the primes, but AFAIK some are still quite quick runs.
The logic gained from these circumventions could potentially be used to shorten the current proof.

Unless I have made a mistake, 179^139-1 looks factorable via SNFS as 179*(179^23)^6 - 1 at 314 digits, which looks within reach of nfs@home (or maybe ryanp); nfs@home is currently sieving a couple of SNFS candidates at 326 digits.
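
As a quick check of that identity (just a sketch, nothing beyond what is stated above):

[CODE]# 179^139 - 1 as a degree-6 SNFS candidate: f(x) = 179*x^6 - 1 at x = 179^23
N = 179**139 - 1
m = 179**23                    # rational-side root
assert 179 * m**6 - 1 == N     # the sextic really does hit N
print(len(str(N)))             # 314 digits, the SNFS difficulty quoted above
[/CODE]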

wblipp 2021-07-11 05:27

[QUOTE=Pascal Ochem;582876]
The first two situations are due to a standard branching on [$]3^6[/$], ...

I fixed it by making an exact branching, after [$]127^{192}[/$] / [$]19^2[/$], on every component [$]3^{2i}[/$], that is, even when [$]2i+1[/$] is composite.[/QUOTE]

Could I persuade you to explain the rationale for choosing this fix?

In the paper for [$]10^{1500}[/$] you used exact branches on [$]3^2[/$] and [$]3^4[/$], which required additional standard branches at [$]3^8[/$], [$]3^{14}[/$], and [$]3^{24}[/$]. You could have added an exact branch at [$]3^6[/$] by adding standard branches at [$]3^{20}[/$], [$]3^{34}[/$], and [$]3^{48}[/$]. I have thought of two possible reasons that you didn't do this. First, perhaps this wouldn't have been sufficient and more complex exact branches would have been required - [$]3^8[/$] would be particularly complicated. Second, perhaps you judged this growing list of extra standard branches to be aesthetically ugly, and chose the "every component" approach as more elegant and simpler to explain.

Pascal Ochem 2021-07-11 13:17

Henry:
Luckily, the ad-hoc argument is simple (and computer-free) thanks to a Fermat prime raised to a large power (5^520) and a special prime that is already specified.
It is good enough for now; we don't need to rush the factorization of 179^139-1, which would cost dozens of dollars and grams of CO2.
Maybe more important feasible composites will show up after a closer look at the log files of this run.

William:
Assuming subtle mathematical or aesthetic motives is not a good bet this time.
The exact branch at 3^6 that you describe certainly works, but there is no suitable command option to do it, whereas I only had to replace "-X4 3" by "-Y 3".
The expected computational gain is not worth the effort of adding an "-X6" option and the risk of messing with Michael's code.

lavalamp 2021-07-11 23:26

[QUOTE=Pascal Ochem;583009]we don't need to rush the factorization of 179^139-1 that would cost dozens of dollars and grams of CO2.[/QUOTE]Has it been attacked much with ECM?

Alas it is just above the size that can be run with GPU-ECM.

RichD 2021-07-11 23:55

[QUOTE=lavalamp;583032]Has it been attacked much with ECM?

Alas it is just above the size that can be run with GPU-ECM.[/QUOTE]

I don't know of much ECM work except the above post by [B]ryanp[/B], who did a blanket ECM run over the entire file. Based upon the factors found, I am guessing about a t60 or t61.

ryanp 2021-07-12 03:34

[QUOTE=lavalamp;583032]Has it been attacked much with ECM?

Alas it is just above the size that can be run with GPU-ECM.[/QUOTE]

For that one in particular, I've hit it with 110K curves at B1=85e7. May try B1=29e8 next.

henryzz 2021-09-09 14:06

[QUOTE=Pascal Ochem;583009]Henry:
Luckily, the ad-hoc argument is simple (and computer-free) thanks to a Fermat prime raised to a large power (5^520) and a special prime that is already specified.
It is good enough for now; we don't need to rush the factorization of 179^139-1, which would cost dozens of dollars and grams of CO2.
Maybe more important feasible composites will show up after a closer look at the log files of this run.

William:
Assuming subtle mathematical or aesthetic motives is not a good bet this time.
The exact branch at 3^6 that you describe certainly works, but there is no suitable command option to do it, whereas I only had to replace "-X4 3" by "-Y 3".
The expected computational gain is not worth the effort of adding an "-X6" option and the risk of messing with Michael's code.[/QUOTE]

I have finally had time to think further on this and have a few more questions:

Am I correct in thinking that 5 being a Fermat prime made it easier to determine the optimal way of finding all the needed 5s, and that similar logic could be applied to other primes, although less effectively (maybe [$]963121^{519}[/$] rather than [$](963121^4)^{519}[/$])? Does the special prime always need to be specified, as long as it is accounted for (there may be examples of it providing more than one copy of the prime)?
If we can generalize this technique without it becoming ineffective, it could reduce the tree significantly. Surely there must be a useful argument for finding the needed 232 copies of 61. Applying this sort of argument to more than one prime at once could get complicated, as a single term could provide both needed primes; I will think on this further. It shouldn't be too difficult for a program to find a list of the 232 smallest [$]p^x[/$] such that [$]61 | \sigma(p^x)[/$]; however, it would also be necessary to check for [$]61^2 | \sigma(p^x)[/$] with [$]\sigma(p^x)[/$] up to twice the size (and [$]61^3[/$] etc.), which could be harder. Maybe a suboptimal case (where we underestimate the size) could be proven. (A rough sketch of such a search is at the end of this post.)

You mentioned a -Y option above. The version posted at [url]https://www.arthy.org/opn/[/url] doesn't contain this. Are you working with a later version (if so, would you be willing to share it)? Am I correct in thinking that the -X2, -X4 and -Y options are only needed for some of the primes to avoid issues, but could be removed for other forbidden factors? Theoretically, -Y could be used only for 127^192, reducing computation in the other trees where 3 appears, especially 3^x.

Are we still only forbidding the factors 127, 19, 7, 11, 331, 31, 97, 61, 13, 398581, 1093, 3, 5, 307, 17? [url]http://www.lirmm.fr/~ochem/opn/efficiency.txt[/url] suggests this set was only sufficient up to 1735.

The link to the BCR paper [url]http://wwwmaths.anu.edu.au/~brent/pub/pub116.html[/url] is now broken. [url]https://maths-people.anu.edu.au/~brent/pub/pub116.html[/url] seems to work.
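
A rough sketch of the kind of search I mean for the 232 copies of 61 (my own code; it takes one prime power per prime, ignores the forbidden factors, and ignores the [$]61^2 | \sigma(p^x)[/$] refinement mentioned above):

[CODE]# Rough sketch: the 232 smallest prime powers p^x with 61 | sigma(p^x),
# one per prime p. Ignores forbidden factors and the 61^2 refinement.
from math import log
from sympy import primerange

def sigma_pp_mod(p, x, m):
    # sigma(p^x) = 1 + p + ... + p^x, reduced modulo m
    total, term = 0, 1
    for _ in range(x + 1):
        total = (total + term) % m
        term = (term * p) % m
    return total

candidates = []
for p in primerange(3, 10**5):
    for x in range(1, 61):                        # x + 1 <= 61 suffices for every p != 61
        if sigma_pp_mod(p, x, 61) == 0:
            candidates.append((x * log(p), p, x))  # sort key ~ log(p^x)
            break                                  # smallest exponent for this p

candidates.sort()
smallest = candidates[:232]
for _, p, x in smallest:
    print(f"{p}^{x}")
print("product has about", round(sum(k / log(10) for k, _, _ in smallest)), "digits")
[/CODE]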

ThomRuley 2021-11-04 15:12

Is anybody doing the numbers in the t500 file or is there a better place to work on composites?

RichD 2021-11-04 19:17

[QUOTE=ThomRuley;592419]Is anybody doing the numbers in the t500 file or is there a better place to work on composites?[/QUOTE]

There is no t500 file. Did you mean t550 or t600?
I have a lot of ECM work completed on these latter files.

ThomRuley 2021-11-04 19:27

I mistyped; I meant t550. If most of the ECM has been done here, then I'll get started on the NFS, unless someone suggests a more useful set of composites.

RichD 2021-11-04 20:54

ECM Status for t550
 
31479823396757 18 - Ready for NFS. I haven't test sieved it yet, but it wouldn't surprise me if this needs the 16e siever with the larger coefficient.

1437263292796553 16 - 10K @ 260e6 completed.

125642407351 22 - Ready for NFS. Not test sieved yet.

310897033 30 - 10K @ 260e6 completed.

221110919 30 - 10K @ 260e6 completed.

8224356155341457 16 - 10K @ 260e6 completed.

331 108 - 0.5t55 completed.

