20101103, 13:03  #1 
Mar 2006
2^{3}·59 Posts 
Looking for algorithmic help...
I've been working on a project for a while now, and I've been solving it with a brute force algorithm. As I've increased the bounds in my search, brute force is taking longer and longer. I was wondering if anyone could help me learn about an existing (or discover a new) algorithm to help me solve this problem faster.
The problem is this. I have a matrix of 1's and 0's. The number of rows is fixed, the number of columns can increase if needed/desired. I am looking for a mask, which is a combination of columns, that does not encounter a 1 in every column of the mask, for all rows in the matrix. I call the number of columns in a mask, its number of bits. If I were looking at combinations of 4 columns, I would be looking for a mask with 4 bits. I'm also trying to minimize the number of columns needed for a mask. Hmm, this is hard for me to describe. Let me try giving an example: (this isn't from my actual data, it is just a small example to convey the process) Code:
0011010001011 1000101100100 1101010101101 1010001000010 1010010100100 0001000001000 1000000000010 0000000000000 1000101001000 1010010000001 1001100000000 1000000000000 1010101010101 0100001010010 1000001000000 0101000000100 1000000100000 1010100010100 0000000100000 1010001000000 0000000000001 1001000001000 1010101010100 0100010001000 1111111000111 0001111011111 From brute force: I know that there is no mask with 1 bit, as each column has at least one 1 in it. I also know that there is no mask with 2 bits in it (that I could tell), because every combination of 2 columns eventually has a 1 in both positions. For the given matrix, I found a mask with 3 bits, ie, {2,3,8}, counting from the left from one. There may be (and probably are) more, but I stopped looking when I found this one. So, the search space is described by how many columns there are, and how many bits we're looking for. Since this is a combination, I'll use the notation nCr(n,k) where n=number of columns, and k=number of bits in the mask. For the above example, there was no solution in nCr(13,1)=13. There was also no solution in nCr(13,2)=78. There is at least one solution in nCr(13,3)=286, probably many more, but one is enough. In my actual data, there are over 30e6 rows. I started out with 64 columns, and couldn't find any masks in nCr(64,a), with 1 <= a <= 6. However, I did find several masks in nCr(64,7). I've extended my search to nCr(128,a) and have found no mask with 1 <= a <= 5. I'm going to extend my search to either nCr(128,6) or nCr(256,a), with 1 <= a <= 5. I only search one a at a time and I am looking for an a < 7. Either way, this search is taking longer each time, and I was wondering if there is an algorithm to speed up this search? I know I've probably glossed over, or messed up, some details. Please feel free to ask for any clarification if you'd like to help me in trying to find a better algorithm to solve this. If this sounds familiar to anyone, could you let me know what this is called so I can know what to google. Thanks for your time. Any help is appreciated. 
20101103, 13:14  #2 
Aug 2004
3×43 Posts 
I'm not sure what your bruteforce algorithm is, but one approach would be this (not tested!):
Code:
for each row for each mask size you are interested in find each combination of columns of this mask size with all ones and add it to set of excluded masks for this size for each mask size calculate maximum number of masks if number of excluded masks < maximum number of masks find a mask not in set of excluded masks and return it Last fiddled with by Chris Card on 20101103 at 13:18 
20101103, 13:18  #3  
Undefined
"The unspeakable one"
Jun 2006
My evil lair
5546_{10} Posts 
Quote:
Last fiddled with by retina on 20101103 at 13:18 

20101103, 14:20  #4 
Jun 2003
4,637 Posts 
What is the measure of a "good" solution? Do you want to minimize the number of columns examined or the number of bits in the mask? Is a 32/32 solution better than a 4/64?
EDIT: Are you trying to get a minimal covering set of SPRP bases? Last fiddled with by axn on 20101103 at 14:24 
20101103, 14:22  #5 
Mar 2006
2^{3}×59 Posts 
Ummm, haha, ummm, that was a test, to see if you were paying attention. Yeah... that's the ticket. That's what I get for creating data and trying to eyeball an answer in a few minutes. Thanks for spotting that. Yes, {8,9} should be counted as a solution.

20101103, 14:29  #6  
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2×47×59 Posts 
Quote:
So you can just generate all combinatorial sets to start. Then for each row, eliminate all column sets that don't match. If the number of remaining column sets get to zero you are done. Else when the last row is finished you have a set of columns remaining that are valid. If the number of columns sets is very large then you can do multiple passes with different sets on each pass. 

20101103, 14:35  #7  
Mar 2006
2^{3}×59 Posts 
Quote:
I wonder if this will speed up the search? Currently, I'm generating a mask, then checking it against every row in the matrix. If it fails, I move on to the next mask. Stepping through masks in order is pretty quick. Hmmm, I'll have to do some calculations to see if I can get a speedup this way when I get home tonight. Thank you very much for the suggestion! 

20101103, 14:42  #8 
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2×47×59 Posts 
Perhaps something like this also:
Sort the rows by population count. Starting with the highest bit count, generate the allowable mask sets from the zero bits only. Then progressively move through the rows matching the masks. 
20101103, 14:51  #9  
Mar 2006
111011000_{2} Posts 
Quote:
All this is assuming that Jan Feitsma's search for psp's (and sprp's) < 2^64 is complete. I haven't heard that there has been a definitive confirmation that his list is complete, but I am proceeding as if it were. 

20101103, 15:19  #10  
Aug 2004
3·43 Posts 
Quote:
Plus, if you store each row as sparse row (i.e. just record the column numbers which are set), you just have to generate the combinations of the column numbers. That's assuming the rows are fairly sparse of course. Chrsi 

20101103, 15:20  #11 
Aug 2006
13346_{8} Posts 
It's complete and has been doublechecked on different hardware with different (independently written) software. The same algorithm was used in both cases, but that's been checked fairly carefully, as I understand. (I did not look over it myself, though.)

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Algorithmic breakthroughs  davieddy  PrimeNet  17  20100721 00:07 