mersenneforum.org Looking for algorithmic help...
 Register FAQ Search Today's Posts Mark Forums Read

 2010-11-03, 13:03 #1 WraithX     Mar 2006 23·59 Posts Looking for algorithmic help... I've been working on a project for a while now, and I've been solving it with a brute force algorithm. As I've increased the bounds in my search, brute force is taking longer and longer. I was wondering if anyone could help me learn about an existing (or discover a new) algorithm to help me solve this problem faster. The problem is this. I have a matrix of 1's and 0's. The number of rows is fixed, the number of columns can increase if needed/desired. I am looking for a mask, which is a combination of columns, that does not encounter a 1 in every column of the mask, for all rows in the matrix. I call the number of columns in a mask, its number of bits. If I were looking at combinations of 4 columns, I would be looking for a mask with 4 bits. I'm also trying to minimize the number of columns needed for a mask. Hmm, this is hard for me to describe. Let me try giving an example: (this isn't from my actual data, it is just a small example to convey the process) Code: 0011010001011 1000101100100 1101010101101 1010001000010 1010010100100 0001000001000 1000000000010 0000000000000 1000101001000 1010010000001 1001100000000 1000000000000 1010101010101 0100001010010 1000001000000 0101000000100 1000000100000 1010100010100 0000000100000 1010001000000 0000000000001 1001000001000 1010101010100 0100010001000 1111111000111 0001111011111 So, that should be 26 rows and 13 columns. From brute force: I know that there is no mask with 1 bit, as each column has at least one 1 in it. I also know that there is no mask with 2 bits in it (that I could tell), because every combination of 2 columns eventually has a 1 in both positions. For the given matrix, I found a mask with 3 bits, ie, {2,3,8}, counting from the left from one. There may be (and probably are) more, but I stopped looking when I found this one. So, the search space is described by how many columns there are, and how many bits we're looking for. Since this is a combination, I'll use the notation nCr(n,k) where n=number of columns, and k=number of bits in the mask. For the above example, there was no solution in nCr(13,1)=13. There was also no solution in nCr(13,2)=78. There is at least one solution in nCr(13,3)=286, probably many more, but one is enough. In my actual data, there are over 30e6 rows. I started out with 64 columns, and couldn't find any masks in nCr(64,a), with 1 <= a <= 6. However, I did find several masks in nCr(64,7). I've extended my search to nCr(128,a) and have found no mask with 1 <= a <= 5. I'm going to extend my search to either nCr(128,6) or nCr(256,a), with 1 <= a <= 5. I only search one a at a time and I am looking for an a < 7. Either way, this search is taking longer each time, and I was wondering if there is an algorithm to speed up this search? I know I've probably glossed over, or messed up, some details. Please feel free to ask for any clarification if you'd like to help me in trying to find a better algorithm to solve this. If this sounds familiar to anyone, could you let me know what this is called so I can know what to google. Thanks for your time. Any help is appreciated.
 2010-11-03, 13:14 #2 Chris Card     Aug 2004 3×43 Posts I'm not sure what your brute-force algorithm is, but one approach would be this (not tested!): Code: for each row for each mask size you are interested in find each combination of columns of this mask size with all ones and add it to set of excluded masks for this size for each mask size calculate maximum number of masks if number of excluded masks < maximum number of masks find a mask not in set of excluded masks and return it Last fiddled with by Chris Card on 2010-11-03 at 13:18
2010-11-03, 13:18   #3
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

554610 Posts

Quote:
 Originally Posted by WraithX Code: 0011010001011 1000101100100 1101010101101 1010001000010 1010010100100 0001000001000 1000000000010 0000000000000 1000101001000 1010010000001 1001100000000 1000000000000 1010101010101 0100001010010 1000001000000 0101000000100 1000000100000 1010100010100 0000000100000 1010001000000 0000000000001 1001000001000 1010101010100 0100010001000 1111111000111 0001111011111 ... I also know that there is no mask with 2 bits in it (that I could tell), because every
Why are columns 8 & 9 not in the set? I can't see any rows each with a one in adjacent columns there.

Last fiddled with by retina on 2010-11-03 at 13:18

 2010-11-03, 14:20 #4 axn     Jun 2003 4,637 Posts What is the measure of a "good" solution? Do you want to minimize the number of columns examined or the number of bits in the mask? Is a 32/32 solution better than a 4/64? EDIT:- Are you trying to get a minimal covering set of SPRP bases? Last fiddled with by axn on 2010-11-03 at 14:24
2010-11-03, 14:22   #5
WraithX

Mar 2006

23×59 Posts

Quote:
 Originally Posted by retina Why are columns 8 & 9 not in the set? I can't see any rows each with a one in adjacent columns there.
Ummm, haha, ummm, that was a test, to see if you were paying attention. Yeah... that's the ticket. That's what I get for creating data and trying to eyeball an answer in a few minutes. Thanks for spotting that. Yes, {8,9} should be counted as a solution.

2010-11-03, 14:29   #6
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

2×47×59 Posts

Quote:
 Originally Posted by WraithX Ummm, haha, ummm, that was a test, to see if you were paying attention. Yeah... that's the ticket. That's what I get for creating data and trying to eyeball an answer in a few minutes. Thanks for spotting that. Yes, {8,9} should be counted as a solution.
Okay.

So you can just generate all combinatorial sets to start.
Then for each row, eliminate all column sets that don't match.
If the number of remaining column sets get to zero you are done.
Else when the last row is finished you have a set of columns remaining that are valid.

If the number of columns sets is very large then you can do multiple passes with different sets on each pass.

2010-11-03, 14:35   #7
WraithX

Mar 2006

23×59 Posts

Quote:
 Originally Posted by Chris Card I'm not sure what your brute-force algorithm is, but one approach would be this (not tested!): Code: for each row for each mask size you are interested in find each combination of columns of this mask size with all ones and add it to set of excluded masks for this size for each mask size calculate maximum number of masks if number of excluded masks < maximum number of masks find a mask not in set of excluded masks and return it
This might be possible for some search spaces, but when I get to nCr(128,6) ~= 5.4e9 or nCr(256,5) ~= 8.8e9, I might have a hard time storing all the solutions to see which ones have been excluded and which ones haven't. I guess 8.8 billion bits would just be 1.1GB. I have 4GB of RAM so this should be possible. I'll look to see if I can implement this.

I wonder if this will speed up the search? Currently, I'm generating a mask, then checking it against every row in the matrix. If it fails, I move on to the next mask. Stepping through masks in order is pretty quick. Hmmm, I'll have to do some calculations to see if I can get a speedup this way when I get home tonight. Thank you very much for the suggestion!

 2010-11-03, 14:42 #8 retina Undefined     "The unspeakable one" Jun 2006 My evil lair 2×47×59 Posts Perhaps something like this also: Sort the rows by population count. Starting with the highest bit count, generate the allowable mask sets from the zero bits only. Then progressively move through the rows matching the masks.
2010-11-03, 14:51   #9
WraithX

Mar 2006

1110110002 Posts

Quote:
 Originally Posted by axn What is the measure of a "good" solution? Do you want to minimize the number of columns examined or the number of bits in the mask? Is a 32/32 solution better than a 4/64? EDIT:- Are you trying to get a minimal covering set of SPRP bases?
LOL, you saw right through my veiled attempt at obfuscation. Yes, that is the problem I'm working on. I'm looking for solutions where the columns represent the primes in order (starting at 3), and the rows are all the composites < 2^64 that fool the sprp test with base 2. I want to use the smallest primes possible, although I know that's not important to the speed of the sprp test, and I want to find the fewest number of tests. So, I'm doing exhaustive searches through the groups of the first x primes after 2. I found a lot of sets in the first 64 primes in which 8 tests ({2} and 7 other primes) can correctly identify all numbers < 2^64. I'm looking for a set of 7 total, or maybe even 6 total, primes to be the covering set. 7 total would be 6 bits in the mask. 6 total would be 5 bits in the mask. I've already searched through all sets of 5 from the first 128 primes. None of those worked. I'm now going to expand that up to the first 256 primes and keep searching.

All this is assuming that Jan Feitsma's search for psp's (and sprp's) < 2^64 is complete. I haven't heard that there has been a definitive confirmation that his list is complete, but I am proceeding as if it were.

2010-11-03, 15:19   #10
Chris Card

Aug 2004

3·43 Posts

Quote:
 Originally Posted by WraithX This might be possible for some search spaces, but when I get to nCr(128,6) ~= 5.4e9 or nCr(256,5) ~= 8.8e9, I might have a hard time storing all the solutions to see which ones have been excluded and which ones haven't. I guess 8.8 billion bits would just be 1.1GB. I have 4GB of RAM so this should be possible. I'll look to see if I can implement this. I wonder if this will speed up the search? Currently, I'm generating a mask, then checking it against every row in the matrix. If it fails, I move on to the next mask. Stepping through masks in order is pretty quick. Hmmm, I'll have to do some calculations to see if I can get a speedup this way when I get home tonight. Thank you very much for the suggestion!
I thought it might give a speed up, because for each row you only need to generate the masks for the combinations of columns with 1 set, not the combinations of all the columns.
Plus, if you store each row as sparse row (i.e. just record the column numbers which are set), you just have to generate the combinations of the column numbers. That's assuming the rows are fairly sparse of course.

Chrsi

2010-11-03, 15:20   #11
CRGreathouse

Aug 2006

133468 Posts

Quote:
 Originally Posted by WraithX All this is assuming that Jan Feitsma's search for psp's (and sprp's) < 2^64 is complete. I haven't heard that there has been a definitive confirmation that his list is complete, but I am proceeding as if it were.
It's complete and has been double-checked on different hardware with different (independently written) software. The same algorithm was used in both cases, but that's been checked fairly carefully, as I understand. (I did not look over it myself, though.)