#892 |
Apr 2020
300₈ Posts |
How's CPU usage looking? Is msieve actually doing anything or is it just hanging? And while we're at it, how's memory usage?
#893 |
Jul 2003
So Cal
2²×11×47 Posts |
This is well outside of anything I've run. My largest has been about 1 billion relations. However, if the relations are available online to download I'll be happy to play with it.
#894 |
"Bo Chen"
Oct 2005
Wuhan,China
2×83 Posts |
If possible, could you give it another try with fewer than 1600M unique relations? As a comparison, VBCurtis did 2,2330L (gnfs 207) with 162M unique relations. And, as I recall, fivemack once finished an NFS job successfully with a relation count of 720M, while 800M failed (using lpb 33). A rough guess is that some time ago there was a barrier near 800M, and now it has jumped to 1600M for some reason.
#895 |
"Curtis"
Feb 2005
Riverside, CA
1243₁₆ Posts |
No, I didn't use 162M uniques. Did you miss a zero? This job is tougher than the GNFS-207 by quite a lot, and uses bounds which are expected to require more relations (36/33 should require more than 35/34). 2e9 relations may not be enough, but is quite surely not too many.
Citing relations counts for 33-lp jobs is totally irrelevant to this job, which is using much larger bounds. The number of relations left heading into merge shows rather clearly that this is not oversieved. There is no reason to think the old msieve large-dataset bug is the culprit here. However, Charybdis' idea to cull all 36-bit-large-prime relations from the dataset and try to filter as a 33/35 job has merit. |
#896 |
Jun 2012
Boulder, CO
107₁₆ Posts |
I'm willing to try culling the 36-bit large prime relations. Would you be able to construct the "grep" command? I don't quite know the msieve relation format well enough.
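For the culling step asked about above, here is a minimal sketch in Python rather than grep. It assumes the usual msieve/GGNFS text relation layout `a,b:rational primes:algebraic primes`, with the primes written as comma-separated hexadecimal; that layout, the 35-bit cutoff, and the function names are assumptions for illustration, not something confirmed in this thread. It drops a relation if a prime on either side is larger than 35 bits, and passes through any line it cannot parse, so inspect a few lines of the real file first.

```python
def max_prime_bits(line):
    """Bit length of the largest prime listed in one relation line.

    Returns None if the line does not look like 'a,b:hex,...:hex,...'
    (assumed format; comment or malformed lines end up here).
    """
    parts = line.strip().split(":")
    if len(parts) != 3:
        return None
    largest = 0
    for side in parts[1:]:            # parts[0] is 'a,b', which we ignore
        for tok in side.split(","):
            if not tok:
                continue              # tolerate an empty side
            try:
                largest = max(largest, int(tok, 16))
            except ValueError:
                return None
    return largest.bit_length()


def keep_relation(line, max_bits=35):
    """True if the line has no prime wider than max_bits (or is unparsable)."""
    bits = max_prime_bits(line)
    return bits is None or bits <= max_bits


def filter_relations(lines, max_bits=35):
    """Yield only the lines that pass keep_relation()."""
    for line in lines:
        if keep_relation(line, max_bits):
            yield line
```

To use it on a real data file, feed the file object to `filter_relations()` and write each yielded line to a new file. Streaming line by line keeps memory use flat even with a couple of billion relations.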
#897 | |
Apr 2020
192₁₀ Posts |
#898 |
Jul 2003
So Cal
2²·11·47 Posts |
There's definitely an msieve filtering bug. Good to have a data set that triggers it. Unfortunate that msieve needs to run for 15 hours to trigger it. Let's see what gdb says...
Code:
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
removing 8630643 relations and 8331224 ideals in 2000000 cliques
commencing in-memory singleton removal
[kepler-0-0:29616] *** Process received signal ***
[kepler-0-0:29616] Signal: Segmentation fault (11)
[kepler-0-0:29616] Signal code: Address not mapped (1)
[kepler-0-0:29616] Failing at address: 0x7f001013550c
[kepler-0-0:29616] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7eff125965e0]
[kepler-0-0:29616] [ 1] ./msieve993_new[0x43ffd0]
[kepler-0-0:29616] [ 2] ./msieve993_new[0x463ae7]
[kepler-0-0:29616] [ 3] ./msieve993_new[0x43c2fb]
[kepler-0-0:29616] [ 4] ./msieve993_new[0x4288dd]
[kepler-0-0:29616] [ 5] ./msieve993_new[0x415bc4]
[kepler-0-0:29616] [ 6] ./msieve993_new[0x405b1b]
[kepler-0-0:29616] [ 7] ./msieve993_new[0x404987]
[kepler-0-0:29616] [ 8] ./msieve993_new[0x40454c]
[kepler-0-0:29616] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7eff11f03c05]
[kepler-0-0:29616] [10] ./msieve993_new[0x4045f2]
[kepler-0-0:29616] *** End of error message ***
#899 |
"Bo Chen"
Oct 2005
Wuhan,China
2·83 Posts |
Could you compile a debug version to see which line it is crashing on?
#900 |
Jul 2003
So Cal
2²·11·47 Posts |
We’re in filter_purge_singletons_core() at common/filter/singleton.c:441, looping through the relations counting the number of times that each ideal occurs. But the last relation is broken! We’re at the last relation since i = num_relations-1. The ideal_list for this last relation contains entries greater than num_ideals. Don’t know why though… Probably an overflow somewhere. But where? Trying to track it down but might take a while since I don't have a lot of time to devote to this and individual tests take a day.
Code:
read 2050M relations
read 2060M relations
read 2070M relations
found 578077506 hash collisions in 2074342600 relations
commencing duplicate removal, pass 2
found 9 duplicates and 2074342591 unique relations
memory use: 16280.0 MB
reading ideals above 1549860864
commencing singleton removal, initial pass
memory use: 41024.0 MB
reading all ideals from disk
memory use: 39309.4 MB
commencing in-memory singleton removal
begin with 2074342591 relations and 1985137022 unique ideals
reduce to 992888838 relations and 765115141 ideals in 20 passes
max relations containing the same ideal: 35
reading ideals above 720000
commencing singleton removal, initial pass
memory use: 21024.0 MB
reading all ideals from disk
memory use: 46989.5 MB
keeping 913886427 ideals with weight <= 200, target excess is 5352837
commencing in-memory singleton removal
begin with 992888838 relations and 913886427 unique ideals
reduce to 992241034 relations and 913238552 ideals in 15 passes
max relations containing the same ideal: 200
removing 8630643 relations and 8331224 ideals in 2000000 cliques
commencing in-memory singleton removal

Program received signal SIGSEGV, Segmentation fault.
0x000000000044a178 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc710) at common/filter/singleton.c:441
441 freqtable[ideal]++;
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 gmp-6.0.0-15.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) backtrace
#0 0x000000000044a178 in filter_purge_singletons_core (obj=0x6de250, filter=0x7fffffffc710) at common/filter/singleton.c:441
#1 0x0000000000475e26 in filter_purge_cliques (obj=0x6de250, filter=0x7fffffffc710) at common/filter/clique.c:646
#2 0x0000000000443cf6 in filter_make_relsets (obj=0x6de250, filter=0x7fffffffc710, merge=0x7fffffffc6e0, min_cycles=5352837) at common/filter/filter.c:65
#3 0x000000000042f0fb in do_merge (obj=0x6de250, filter=0x7fffffffc710, merge=0x7fffffffc6e0, target_density=130) at gnfs/filter/filter.c:187
#4 0x000000000042fad0 in nfs_filter_relations (obj=0x6de250, n=0x7fffffffc960) at gnfs/filter/filter.c:411
#5 0x00000000004172ac in factor_gnfs (obj=0x6de250, input_n=0x7fffffffcb40, factor_list=0x7fffffffcbd0) at gnfs/gnfs.c:153
#6 0x0000000000404dcd in msieve_run_core (obj=0x6de250, n=0x7fffffffcb40, factor_list=0x7fffffffcbd0) at common/driver.c:158
#7 0x00000000004051b4 in msieve_run (obj=0x6de250) at common/driver.c:268
#8 0x00000000004038a4 in factor_integer ( buf=0x7fffffffd650 "38315657995194363034877423503084547947166751578940985843521212522635100246118059073205923746544331860205171086654671434719340358393954962433533212457600196112076644876654207767427267797808629935905445"..., flags=1027, savefile_name=0x0, logfile_name=0x0, nfs_fbfile_name=0x0, seed1=0x7fffffffd64c, seed2=0x7fffffffd648, max_relations=0, cpu=cpu_core, cache_size1=32768, cache_size2=20971520, num_threads=0, which_gpu=0, nfs_args=0x7fffffffdcee "target_density=130") at demo.c:235
#9 0x00000000004046bd in main (argc=4, argv=0x7fffffffd988) at demo.c:601
(gdb) info frame
Stack level 0, frame at 0x7fffffffc340:
rip = 0x44a178 in filter_purge_singletons_core (common/filter/singleton.c:441); saved rip 0x475e26
called by frame at 0x7fffffffc370
source language c.
Arglist at 0x7fffffffc2b8, args: obj=0x6de250, filter=0x7fffffffc710
Locals at 0x7fffffffc2b8, Previous frame's sp is 0x7fffffffc340
Saved registers:
rip at 0x7fffffffc338
(gdb) info locals
ideal = 2057043263
i = 983610390
j = 5
freqtable = 0x7fff1d2ad010
relation_array = 0x7ff47e0f1010
curr_relation = 0x7ffc79bde3a0
old_relation = 0x7f1fd8001e8480
orig_num_ideals = 913238552
num_passes = 32767
num_relations = 983610391
num_ideals = 913238552
new_num_relations = 8630643
(gdb) print *curr_relation
$2 = {rel_index = 15834702, ideal_count = 36 '$', gf2_factors = 69 'E', connected = 156 '\234', ideal_list = {885450581, 598542783, 158747510, 638930804, 786848709, 2057043263, 3845, 186587920, 18476918, 67526419, 598542783, 872055544, 2057043265, 2046824196, 3942562, 102078889, 58908383, 865042570, 2057043267, 872418055, 9125741, 85351335, 11880544, 43981132, 865042570, 873512089, 893921179, 2057043271, 2567, 93072473, 26460704, 33365801, 865042570, 517341201, 275602560, 862343378, 2057043273, 83889159, 66167424, 46818875, 59842776, 59333874, 194384291, 865042570, 172206968, 2057043276, 50334725, 905653709, 628443801, 865042570, 801305779, 869019178, 2057043277, 2046821898, 20184373, 101514515, 16353075, 87715774, 36505563, 58989284, 865042570, 598565998, 334060622, 469101029, 2057043280, 83889158, 73623668, 106612925, 359795440, 9473259, 157931537, 772472752, 2057043282, 218106376, 140592574, 157045250, 477152215, 866943502, 6146950, 41607604, 44380953, 772472752, 2057043284, 150998022, 105306193, 842728936, 7879065, 444703037, 772472752, 403730401, 2057043289, 83889414, 320662844, 329981033, 248067990, 772472752, 23316642, 631501233, 2057043290, 822087174}}
(gdb)
#901 |
"Bo Chen"
Oct 2005
Wuhan,China
2·83 Posts |
After reading the code (msieve r1030) for about eight hours (1767 lines of code read, in folder common/filter: singleton.c, clique.c, etc.), this filtering problem does not seem easy to solve. But here are some thoughts.

1. From common/filter/filter_priv.h, the definition of ideal_map_t is:

typedef struct {
	uint32 payload : 30; /* offset in list of ideal_relation_t structures where the linked list of ideal_relation_t's for this ideal starts */
	uint32 clique : 1; /* nonzero if this ideal can participate in a clique */
	uint32 connected : 1; /* nonzero if this ideal has already been added to a clique under construction */
} ideal_map_t;

The maximum value of payload is 2^30 - 1, which is about 1074M. If the count goes above that in purge_cliques_core(), the filter may not work properly. Here, when entering purge_cliques_core(), the relation count is 992888838, less than 2^30, so this 30-bit field should not be the reason for the crash.

2. 2057043265 = 0x7A9BFD41, and clearing bit 30 gives 0x3A9BFD41 = 983301441. This number is near num_relations (983610391). It is possible that the ideal_map_t.clique bit is not cleared properly in purge_cliques_core(). But this is only a guess.

3. In filter_purge_singletons_core(), curr_relation->ideal_count is 36, but three values in curr_relation->ideal_list are the same (865042570): curr_relation->ideal_list[17], curr_relation->ideal_list[24] and curr_relation->ideal_list[32]. That is a little strange.

4. In purge_cliques_core(), line 370 reads ideal_map[ideal].payload = num_reverse++; The variable num_reverse could possibly overflow past 2^32, since its type is uint32.

5. A question: did ryanp try using fewer than 1600M unique relations? If so, what was the result?
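The arithmetic behind points 1 and 2 can be checked in a few lines. This is only a sketch in Python: the 30-bit field width comes from the ideal_map_t definition quoted in the post and the two constants come from the gdb dump, while the helper names are made up for illustration.

```python
PAYLOAD_BITS = 30                       # width of ideal_map_t.payload
PAYLOAD_MAX = (1 << PAYLOAD_BITS) - 1   # largest offset the field can hold


def fits_in_payload(value):
    """True if value can be stored in a 30-bit unsigned bitfield unchanged."""
    return 0 <= value <= PAYLOAD_MAX


def clear_bit(value, bit):
    """Return value with the given bit forced to zero."""
    return value & ~(1 << bit)


bad_ideal = 2057043265      # ideal_list[12] in the gdb dump
num_relations = 983610391   # from 'info locals'

print(PAYLOAD_MAX)                    # 1073741823
print(hex(bad_ideal))                 # 0x7a9bfd41
print(clear_bit(bad_ideal, 30))       # 983301441
print(fits_in_payload(992888838))     # True: the relation count still fits
```

With bit 30 cleared, the bad entry lands just below num_relations. That fits the guess that these values are relation-sized numbers with a stray flag bit set, though it does not prove it.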
#902 |
"Alexander"
Nov 2008
The Alamo City
1F0₁₆ Posts |
The length of freqtable is num_ideals (line 430), and ideal (the index) is greater than that, so the array reference is out-of-bounds and thus we get the segfault. The real question is why there are so many entries in ideal_list that are above num_ideals.
Last fiddled with by Happy5214 on 2021-01-31 at 07:55 |
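To make the out-of-bounds point concrete, here is a small Python sketch that replays the check against the values from the gdb dump: the first 36 entries of ideal_list (ideal_count = 36) and num_ideals = 913238552. The function name is hypothetical, and this is a diagnostic illustration, not msieve's actual code.

```python
NUM_IDEALS = 913238552   # length of freqtable, from 'info locals'
IDEAL_COUNT = 36         # curr_relation->ideal_count

# First IDEAL_COUNT entries of curr_relation->ideal_list from the dump.
IDEAL_LIST = [
    885450581, 598542783, 158747510, 638930804, 786848709, 2057043263,
    3845, 186587920, 18476918, 67526419, 598542783, 872055544,
    2057043265, 2046824196, 3942562, 102078889, 58908383, 865042570,
    2057043267, 872418055, 9125741, 85351335, 11880544, 43981132,
    865042570, 873512089, 893921179, 2057043271, 2567, 93072473,
    26460704, 33365801, 865042570, 517341201, 275602560, 862343378,
]


def out_of_range_indices(ideal_list, ideal_count, num_ideals):
    """Indices j < ideal_count whose entry cannot index a num_ideals-long table."""
    return [j for j in range(ideal_count) if ideal_list[j] >= num_ideals]


bad = out_of_range_indices(IDEAL_LIST, IDEAL_COUNT, NUM_IDEALS)
print(bad)   # [5, 12, 13, 18, 27]
```

The first bad index is 5, which matches j = 5 and ideal = 2057043263 in the crash frame. An equivalent guard before `freqtable[ideal]++` would turn the segfault into a reportable error, at the cost of one comparison per ideal.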