
2021-02-01, 16:08   #903
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

5,857 Posts

It looks to me as though wreck has pointed out some limits that will soon become a problem even if they aren't quite one yet. How difficult would it be to future-proof issues 1 and 4? Would changing everything to uint64 increase memory usage a lot in these cases? I am not sure where the bulk of the memory usage is.

2021-02-01, 17:41   #904
ryanp

Jun 2012
Boulder, CO

276 Posts

Thanks for the detailed investigation.

Quote:
 Originally Posted by wreck 5. A question: has ryanp tried a run with fewer than 1600M unique relations? If so, what was the result?
Here's the log from a run with ~1.44B uniques:

Code:
Fri Jan  1 17:10:48 2021  commencing relation filtering
Fri Jan  1 17:10:48 2021  setting target matrix density to 120.0
Fri Jan  1 17:10:48 2021  estimated available RAM is 192087.4 MB
Fri Jan  1 17:10:48 2021  commencing duplicate removal, pass 1
Fri Jan  1 19:20:20 2021  found 323664961 hash collisions in 1440338304 relations
Fri Jan  1 19:20:20 2021  added 1 free relations
Fri Jan  1 19:20:20 2021  commencing duplicate removal, pass 2
Fri Jan  1 19:33:53 2021  found 7 duplicates and 1440338298 unique relations
Fri Jan  1 19:33:53 2021  memory use: 16280.0 MB
Fri Jan  1 19:33:54 2021  reading ideals above 1128464384
Fri Jan  1 19:33:54 2021  commencing singleton removal, initial pass
Fri Jan  1 22:00:14 2021  memory use: 41024.0 MB
Fri Jan  1 22:00:16 2021  reading all ideals from disk
Fri Jan  1 22:00:50 2021  memory use: 28689.1 MB
Fri Jan  1 22:02:12 2021  commencing in-memory singleton removal
Fri Jan  1 22:03:39 2021  begin with 1440338298 relations and 1728567752 unique ideals
Fri Jan  1 22:08:40 2021  reduce to 54335106 relations and 30002747 ideals in 49 passes
Fri Jan  1 22:08:40 2021  max relations containing the same ideal: 12
Fri Jan  1 22:08:44 2021  reading ideals above 720000
Fri Jan  1 22:08:44 2021  commencing singleton removal, initial pass
Fri Jan  1 22:17:29 2021  memory use: 3012.0 MB
Fri Jan  1 22:17:30 2021  reading all ideals from disk
Fri Jan  1 22:17:33 2021  memory use: 2538.2 MB
Fri Jan  1 22:17:42 2021  keeping 118273914 ideals with weight <= 200, target excess is 278398
Fri Jan  1 22:17:50 2021  commencing in-memory singleton removal
Fri Jan  1 22:17:55 2021  begin with 54335106 relations and 118273914 unique ideals
Fri Jan  1 22:18:04 2021  reduce to 15293 relations and 0 ideals in 6 passes
Fri Jan  1 22:18:04 2021  max relations containing the same ideal: 0
Fri Jan  1 22:18:04 2021  filtering wants 1000000 more relations
Fri Jan  1 22:18:04 2021  elapsed time 05:07:18
And, for comparison, with 1.85B uniques:

Code:
Tue Jan  5 11:20:36 2021  commencing relation filtering
Tue Jan  5 11:20:36 2021  setting target matrix density to 120.0
Tue Jan  5 11:20:36 2021  estimated available RAM is 192087.4 MB
Tue Jan  5 11:20:36 2021  commencing duplicate removal, pass 1
Tue Jan  5 15:29:49 2021  found 487341305 hash collisions in 1858202629 relations
Tue Jan  5 15:29:49 2021  added 1 free relations
Tue Jan  5 15:29:49 2021  commencing duplicate removal, pass 2
Tue Jan  5 16:45:46 2021  found 9 duplicates and 1858202621 unique relations
Tue Jan  5 16:45:46 2021  memory use: 16280.0 MB
Tue Jan  5 16:45:49 2021  reading ideals above 1440940032
Tue Jan  5 16:45:49 2021  commencing singleton removal, initial pass
Tue Jan  5 22:40:54 2021  memory use: 41024.0 MB
Tue Jan  5 22:40:59 2021  reading all ideals from disk
Tue Jan  5 22:41:53 2021  memory use: 35228.4 MB
Tue Jan  5 22:46:54 2021  commencing in-memory singleton removal
Tue Jan  5 22:51:35 2021  begin with 1858202621 relations and 1929946709 unique ideals
Tue Jan  5 23:44:19 2021  reduce to 658487197 relations and 538436836 ideals in 28 passes
Tue Jan  5 23:44:19 2021  max relations containing the same ideal: 29
Tue Jan  5 23:45:56 2021  reading ideals above 720000
Tue Jan  5 23:45:58 2021  commencing singleton removal, initial pass
Wed Jan  6 03:03:10 2021  memory use: 21024.0 MB
Wed Jan  6 03:03:13 2021  reading all ideals from disk
Wed Jan  6 03:04:00 2021  memory use: 31064.1 MB
Wed Jan  6 03:08:34 2021  keeping 678560299 ideals with weight <= 200, target excess is 3509174
Wed Jan  6 03:12:51 2021  commencing in-memory singleton removal
Wed Jan  6 03:15:24 2021  begin with 658487197 relations and 678560299 unique ideals
Wed Jan  6 04:09:33 2021  reduce to 654548247 relations and 674618815 ideals in 22 passes
Wed Jan  6 04:09:33 2021  max relations containing the same ideal: 200
Wed Jan  6 04:11:48 2021  filtering wants 1000000 more relations
Wed Jan  6 04:11:48 2021  elapsed time 16:51:13
In both cases, filtering ran to completion and didn't get stuck. Where I started to run into problems was this run onwards:

Code:
Wed Jan  6 15:21:43 2021  found 535727162 hash collisions in 1974476487 relations
Wed Jan  6 15:21:43 2021  added 1 free relations
Wed Jan  6 15:21:43 2021  commencing duplicate removal, pass 2
Wed Jan  6 16:50:15 2021  found 9 duplicates and 1974476479 unique relations
From roughly 1.9B relations onward, it repeatedly got stuck in "commencing 2-way merge".

2021-02-01, 19:15   #905
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

5,857 Posts

It looks like reducing the large prime bound to 35 bits might be the best solution. Has this been tried yet?

2021-02-02, 00:38   #906
ryanp

Jun 2012
Boulder, CO

2²×3×23 Posts

Quote:
 Originally Posted by henryzz It looks like reducing the large prime bound to 35 bits might be the best solution. Has this been tried yet?
I'm running the grep now, and will try a run - but I think frmky may have already tried this?

2021-02-03, 06:54   #907
frmky

Jul 2003
So Cal

2,083 Posts

Quote:
 Originally Posted by henryzz It looks like reducing the large prime bound to 35 bits might be the best solution. Has this been tried yet?
Yes. No success. With the start of the semester I've been very busy. Hopefully later this week or next week I'll test out some of wreck's ideas.

2021-02-03, 07:37   #908
frmky

Jul 2003
So Cal

2,083 Posts

Quote:
 Originally Posted by wreck
1. From common/filter/filter_priv.h, the definition of ideal_map_t is

typedef struct {
    uint32 payload : 30;    /* offset in list of ideal_relation_t
                               structures where the linked list of
                               ideal_relation_t's for this ideal starts */
    uint32 clique : 1;      /* nonzero if this ideal can participate in
                               a clique */
    uint32 connected : 1;   /* nonzero if this ideal has already been
                               added to a clique under construction */
} ideal_map_t;

The maximum value of payload is 2^30 - 1, which is about 1000M. If the ideal count is more than 1000M in purge_cliques_core(), it is possible that the filter would not work properly. Here, on entry to purge_cliques_core(), the relation count is 992888838, which is less than 2^30, so this 30-bit field should not be the reason for the crash.

4. In purge_cliques_core(), line 370,

ideal_map[ideal].payload = num_reverse++;

the variable num_reverse may need to count past 2^32, while its type is uint32.

For 4, if num_reverse exceeds 2^32, then ideal_map[ideal].payload has definitely exceeded 2^30. As a quick and dirty check of the latter, I edited filter_priv.h to

typedef struct {
    uint64 payload : 32;    /* offset in list of ideal_relation_t
                               structures where the linked list of
                               ideal_relation_t's for this ideal starts */
    uint64 clique : 1;      /* nonzero if this ideal can participate in
                               a clique */
    uint64 connected : 1;   /* nonzero if this ideal has already been
                               added to a clique under construction */
} ideal_map_t;

and started an overnight run. I know this isn't optimal for memory usage, but let's see if it gets us there.
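
To make the trade-off concrete, here is a minimal sketch of what the two layouts do to a stored offset and to the per-entry size. The type and function names (map30_t, map32_t, store30, store32) are made up for illustration and are not msieve's; bit-fields declared with a 64-bit type are an implementation-defined extension that GCC and Clang accept, and the sizes mentioned below assume a typical 64-bit ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the original and the edited layout. */
typedef struct {
    uint32_t payload : 30;
    uint32_t clique : 1;
    uint32_t connected : 1;
} map30_t;

typedef struct {
    uint64_t payload : 32;   /* 64-bit bit-field type: compiler extension */
    uint64_t clique : 1;
    uint64_t connected : 1;
} map32_t;

/* Store an offset in the 30-bit field and read it back.
   Values of 2^30 or more are silently reduced modulo 2^30. */
uint64_t store30(uint64_t offset)
{
    map30_t m = {0};
    m.payload = offset;
    return m.payload;
}

/* Same round trip through the 32-bit field (wraps at 2^32 instead). */
uint64_t store32(uint64_t offset)
{
    map32_t m = {0};
    m.payload = offset;
    return m.payload;
}
```

With these definitions, store30((1ULL << 30) + 5) comes back as 5 with no warning at run time, while store32 returns the value intact. On common 64-bit ABIs sizeof(map30_t) is 4 and sizeof(map32_t) is 8, so the widened table costs roughly twice the memory per ideal.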

2021-02-04, 07:02   #909
frmky

Jul 2003
So Cal

2,083 Posts

And that didn't work. Same crash in the same place.

Code:
commencing duplicate removal, pass 2
found 9 duplicates and 2074342591 unique relations
memory use: 16280.0 MB
reading ideals above 1549860864
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
checking relations array at location 5
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
checking relations array at location 1
checking relations array at location 2
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
checking relations array at location 5
removing 8630643 relations and 8331224 ideals in 2000000 cliques
checking relations array at location 6
Loc 6: bad relation 983610390 of 983610391, num_ideals is 913238552
rel_index: 15834702, ideal_count: 36, gf2_factors: 69, connected: 156
Ideals: 885450581, 598542783, 158747510, 638930804, 786848709, 2057043263, 3845, 186587920, 18476918, 67526419, 598542783, 872055544, 2057043265, 2046824196, 3942562, 102078889, 58908383, 865042570, 2057043267, 872418055, 9125741, 85351335, 11880544, 43981132, 865042570, 873512089, 893921179, 2057043271, 2567, 93072473, 26460704, 33365801, 865042570, 517341201, 275602560, 862343378,
Found 1 bad relations in array.
commencing in-memory singleton removal

Program received signal SIGSEGV, Segmentation fault.
0x000000000044a176 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc0f0) at common/filter/singleton.c:445
445 freqtable[ideal]++;
The line number is a little different since I added a simple routine to check whether any relation contains an ideal with an index greater than num_ideals. This code is at https://github.com/gchilders/msieve_...ieve-nfsathome

So at the end of filter_purge_singletons_core(), the relation set appears fine, but something is going awry in purge_cliques_core(), since at the end of delete_relations() the last relation is bad.

Edit: At this point (uint32 *)curr_relation - (uint32 *)relation_array is nearly 2^35, so those relations at the end aren't even participating in the purge.

Last fiddled with by frmky on 2021-02-04 at 07:22

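
For readers following along, a range check of the kind described in this post might look roughly like the sketch below. It is not the routine from the linked repository: toy_relation_t, MAX_IDEALS and count_bad_relations are hypothetical names, the relation record is simplified to a fixed size, and treating an index equal to num_ideals as bad is an assumption (valid indices are taken to be 0 to num_ideals - 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDEALS 8   /* hypothetical cap for this toy record */

/* Simplified, fixed-size stand-in for a relation and its ideal list. */
typedef struct {
    uint32_t rel_index;
    uint8_t ideal_count;
    uint32_t ideal_list[MAX_IDEALS];
} toy_relation_t;

/* Count relations that reference an ideal index outside
   [0, num_ideals). Each bad relation is counted once. */
size_t count_bad_relations(const toy_relation_t *rels, size_t num_rels,
                           uint32_t num_ideals)
{
    size_t bad = 0;
    for (size_t i = 0; i < num_rels; i++) {
        assert(rels[i].ideal_count <= MAX_IDEALS);
        for (uint8_t j = 0; j < rels[i].ideal_count; j++) {
            if (rels[i].ideal_list[j] >= num_ideals) {
                bad++;
                break;   /* one out-of-range ideal is enough */
            }
        }
    }
    return bad;
}
```

Applied to the log above, a relation containing ideal 2057043263 with num_ideals at 913238552 would be flagged, which is the same condition that later makes freqtable[ideal]++ write out of bounds.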
2021-02-04, 07:58   #910
Happy5214

"Alexander"
Nov 2008
The Alamo City

2⁶·3² Posts

Just to be sure, can you put another check near the beginning of purge_cliques_core()? We want to rule out other code called between the singleton purge and the clique purge.

2021-02-04, 09:03   #911
frmky

Jul 2003
So Cal

2,083 Posts

Quote:
 Originally Posted by Happy5214 Just to be sure, can you put another check near the beginning of purge_cliques_core()? We want to rule out other code called between the singleton purge and the clique purge.
I added a check of the relations at the beginning and before each major loop in purge_cliques_core() and restarted a run. I'll report back tomorrow.

2021-02-05, 05:36   #912
frmky

Jul 2003
So Cal

2,083 Posts

Right before the call to delete_relations(), the relations array looks fine. At the end of delete_relations(), it's broken. I just noticed that in the loop in that function, the calculation of array_word will overflow. I don't think that should cause the issue, since it's only used in a comparison against delete_array[], but next I'll make the addresses in delete_array[] 64-bit, remove the check in purge_cliques_core() so that all relations participate in the purge, and fix this overflow.

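
As an illustration of the overflow mentioned here, the sketch below shows what happens to a word offset of the size reported earlier in the thread (nearly 2^35) when it is held in 32 bits. It does not reproduce the loop in delete_relations(); the helper names are hypothetical, and offsets are assumed to be counted in 32-bit words from the start of the relation array.

```c
#include <assert.h>
#include <stdint.h>

/* What a uint32 variable holds after being assigned the true
   offset: the value reduced modulo 2^32. */
uint32_t offset_as_uint32(uint64_t true_offset)
{
    return (uint32_t)true_offset;
}

/* Nonzero when the offset survives storage in 32 bits unchanged. */
int fits_in_uint32(uint64_t true_offset)
{
    return true_offset <= UINT32_MAX;
}

/* Pointer difference in 32-bit words, kept at full width. */
uint64_t word_offset64(const uint32_t *base, const uint32_t *p)
{
    assert(p >= base);
    return (uint64_t)(p - base);
}
```

Once offsets wrap, two different relations whose true offsets differ by a multiple of 2^32 look identical to a 32-bit comparison. Whether that aliasing is what corrupts the array in this run is not something the sketch can show; it only illustrates why widening the stored addresses to 64 bits removes the ambiguity.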
2021-02-06, 09:02   #913
frmky

Jul 2003
So Cal

2,083 Posts

So far, so good. It has gone past the place of the previous crash and is looping through clique and singleton removal now. Relations array checks are good so far.
