mersenneforum.org  

Go Back   mersenneforum.org > Math Stuff > Other Mathematical Topics

Reply
 
Thread Tools
Old 2010-11-03, 13:03   #1
WraithX
 
WraithX's Avatar
 
Mar 2006

23·59 Posts
Default Looking for algorithmic help...

I've been working on a project for a while now, and I've been solving it with a brute force algorithm. As I've increased the bounds in my search, brute force is taking longer and longer. I was wondering if anyone could help me learn about an existing (or discover a new) algorithm to help me solve this problem faster.

The problem is this. I have a matrix of 1's and 0's. The number of rows is fixed, the number of columns can increase if needed/desired. I am looking for a mask, which is a combination of columns, that does not encounter a 1 in every column of the mask, for all rows in the matrix. I call the number of columns in a mask, its number of bits. If I were looking at combinations of 4 columns, I would be looking for a mask with 4 bits. I'm also trying to minimize the number of columns needed for a mask. Hmm, this is hard for me to describe. Let me try giving an example: (this isn't from my actual data, it is just a small example to convey the process)

Code:
0011010001011
1000101100100
1101010101101
1010001000010
1010010100100
0001000001000
1000000000010
0000000000000
1000101001000
1010010000001
1001100000000
1000000000000
1010101010101
0100001010010
1000001000000
0101000000100
1000000100000
1010100010100
0000000100000
1010001000000
0000000000001
1001000001000
1010101010100
0100010001000
1111111000111
0001111011111
So, that should be 26 rows and 13 columns.
From brute force:
I know that there is no mask with 1 bit, as each column has at least one 1 in it.
I also know that there is no mask with 2 bits in it (that I could tell), because every combination of 2 columns eventually has a 1 in both positions.
For the given matrix, I found a mask with 3 bits, ie, {2,3,8}, counting from the left from one. There may be (and probably are) more, but I stopped looking when I found this one.
So, the search space is described by how many columns there are, and how many bits we're looking for. Since this is a combination, I'll use the notation nCr(n,k) where n=number of columns, and k=number of bits in the mask. For the above example, there was no solution in nCr(13,1)=13. There was also no solution in nCr(13,2)=78. There is at least one solution in nCr(13,3)=286, probably many more, but one is enough.

In my actual data, there are over 30e6 rows. I started out with 64 columns, and couldn't find any masks in nCr(64,a), with 1 <= a <= 6. However, I did find several masks in nCr(64,7). I've extended my search to nCr(128,a) and have found no mask with 1 <= a <= 5. I'm going to extend my search to either nCr(128,6) or nCr(256,a), with 1 <= a <= 5. I only search one a at a time and I am looking for an a < 7.

Either way, this search is taking longer each time, and I was wondering if there is an algorithm to speed up this search? I know I've probably glossed over, or messed up, some details. Please feel free to ask for any clarification if you'd like to help me in trying to find a better algorithm to solve this. If this sounds familiar to anyone, could you let me know what this is called so I can know what to google. Thanks for your time. Any help is appreciated.
WraithX is offline   Reply With Quote
Old 2010-11-03, 13:14   #2
Chris Card
 
Chris Card's Avatar
 
Aug 2004

3×43 Posts
Default

I'm not sure what your brute-force algorithm is, but one approach would be this (not tested!):

Code:
for each row
   for each mask size you are interested in 
       find each combination of columns of this mask size with all ones and add it to set of excluded masks for this size

for each mask size
     calculate maximum number of masks
     if number of excluded masks < maximum number of masks
          find a mask not in set of excluded masks and return it

Last fiddled with by Chris Card on 2010-11-03 at 13:18
Chris Card is offline   Reply With Quote
Old 2010-11-03, 13:18   #3
retina
Undefined
 
retina's Avatar
 
"The unspeakable one"
Jun 2006
My evil lair

554610 Posts
Default

Quote:
Originally Posted by WraithX View Post
Code:
0011010001011
1000101100100
1101010101101
1010001000010
1010010100100
0001000001000
1000000000010
0000000000000
1000101001000
1010010000001
1001100000000
1000000000000
1010101010101
0100001010010
1000001000000
0101000000100
1000000100000
1010100010100
0000000100000
1010001000000
0000000000001
1001000001000
1010101010100
0100010001000
1111111000111
0001111011111
...
I also know that there is no mask with 2 bits in it (that I could tell), because every
Why are columns 8 & 9 not in the set? I can't see any rows each with a one in adjacent columns there.

Last fiddled with by retina on 2010-11-03 at 13:18
retina is offline   Reply With Quote
Old 2010-11-03, 14:20   #4
axn
 
axn's Avatar
 
Jun 2003

4,637 Posts
Default

What is the measure of a "good" solution? Do you want to minimize the number of columns examined or the number of bits in the mask? Is a 32/32 solution better than a 4/64?

EDIT:- Are you trying to get a minimal covering set of SPRP bases?

Last fiddled with by axn on 2010-11-03 at 14:24
axn is offline   Reply With Quote
Old 2010-11-03, 14:22   #5
WraithX
 
WraithX's Avatar
 
Mar 2006

23×59 Posts
Default

Quote:
Originally Posted by retina View Post
Why are columns 8 & 9 not in the set? I can't see any rows each with a one in adjacent columns there.
Ummm, haha, ummm, that was a test, to see if you were paying attention. Yeah... that's the ticket. That's what I get for creating data and trying to eyeball an answer in a few minutes. Thanks for spotting that. Yes, {8,9} should be counted as a solution.
WraithX is offline   Reply With Quote
Old 2010-11-03, 14:29   #6
retina
Undefined
 
retina's Avatar
 
"The unspeakable one"
Jun 2006
My evil lair

2×47×59 Posts
Default

Quote:
Originally Posted by WraithX View Post
Ummm, haha, ummm, that was a test, to see if you were paying attention. Yeah... that's the ticket. That's what I get for creating data and trying to eyeball an answer in a few minutes. Thanks for spotting that. Yes, {8,9} should be counted as a solution.
Okay.

So you can just generate all combinatorial sets to start.
Then for each row, eliminate all column sets that don't match.
If the number of remaining column sets get to zero you are done.
Else when the last row is finished you have a set of columns remaining that are valid.

If the number of columns sets is very large then you can do multiple passes with different sets on each pass.
retina is offline   Reply With Quote
Old 2010-11-03, 14:35   #7
WraithX
 
WraithX's Avatar
 
Mar 2006

23×59 Posts
Default

Quote:
Originally Posted by Chris Card View Post
I'm not sure what your brute-force algorithm is, but one approach would be this (not tested!):

Code:
for each row
   for each mask size you are interested in 
       find each combination of columns of this mask size with all ones and add it to set of excluded masks for this size

for each mask size
     calculate maximum number of masks
     if number of excluded masks < maximum number of masks
          find a mask not in set of excluded masks and return it
This might be possible for some search spaces, but when I get to nCr(128,6) ~= 5.4e9 or nCr(256,5) ~= 8.8e9, I might have a hard time storing all the solutions to see which ones have been excluded and which ones haven't. I guess 8.8 billion bits would just be 1.1GB. I have 4GB of RAM so this should be possible. I'll look to see if I can implement this.

I wonder if this will speed up the search? Currently, I'm generating a mask, then checking it against every row in the matrix. If it fails, I move on to the next mask. Stepping through masks in order is pretty quick. Hmmm, I'll have to do some calculations to see if I can get a speedup this way when I get home tonight. Thank you very much for the suggestion!
WraithX is offline   Reply With Quote
Old 2010-11-03, 14:42   #8
retina
Undefined
 
retina's Avatar
 
"The unspeakable one"
Jun 2006
My evil lair

2×47×59 Posts
Default

Perhaps something like this also:

Sort the rows by population count.

Starting with the highest bit count, generate the allowable mask sets from the zero bits only.
Then progressively move through the rows matching the masks.
retina is offline   Reply With Quote
Old 2010-11-03, 14:51   #9
WraithX
 
WraithX's Avatar
 
Mar 2006

1110110002 Posts
Default

Quote:
Originally Posted by axn View Post
What is the measure of a "good" solution? Do you want to minimize the number of columns examined or the number of bits in the mask? Is a 32/32 solution better than a 4/64?

EDIT:- Are you trying to get a minimal covering set of SPRP bases?
LOL, you saw right through my veiled attempt at obfuscation. Yes, that is the problem I'm working on. I'm looking for solutions where the columns represent the primes in order (starting at 3), and the rows are all the composites < 2^64 that fool the sprp test with base 2. I want to use the smallest primes possible, although I know that's not important to the speed of the sprp test, and I want to find the fewest number of tests. So, I'm doing exhaustive searches through the groups of the first x primes after 2. I found a lot of sets in the first 64 primes in which 8 tests ({2} and 7 other primes) can correctly identify all numbers < 2^64. I'm looking for a set of 7 total, or maybe even 6 total, primes to be the covering set. 7 total would be 6 bits in the mask. 6 total would be 5 bits in the mask. I've already searched through all sets of 5 from the first 128 primes. None of those worked. I'm now going to expand that up to the first 256 primes and keep searching.

All this is assuming that Jan Feitsma's search for psp's (and sprp's) < 2^64 is complete. I haven't heard that there has been a definitive confirmation that his list is complete, but I am proceeding as if it were.
WraithX is offline   Reply With Quote
Old 2010-11-03, 15:19   #10
Chris Card
 
Chris Card's Avatar
 
Aug 2004

3·43 Posts
Default

Quote:
Originally Posted by WraithX View Post
This might be possible for some search spaces, but when I get to nCr(128,6) ~= 5.4e9 or nCr(256,5) ~= 8.8e9, I might have a hard time storing all the solutions to see which ones have been excluded and which ones haven't. I guess 8.8 billion bits would just be 1.1GB. I have 4GB of RAM so this should be possible. I'll look to see if I can implement this.

I wonder if this will speed up the search? Currently, I'm generating a mask, then checking it against every row in the matrix. If it fails, I move on to the next mask. Stepping through masks in order is pretty quick. Hmmm, I'll have to do some calculations to see if I can get a speedup this way when I get home tonight. Thank you very much for the suggestion!
I thought it might give a speed up, because for each row you only need to generate the masks for the combinations of columns with 1 set, not the combinations of all the columns.
Plus, if you store each row as sparse row (i.e. just record the column numbers which are set), you just have to generate the combinations of the column numbers. That's assuming the rows are fairly sparse of course.

Chrsi
Chris Card is offline   Reply With Quote
Old 2010-11-03, 15:20   #11
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

133468 Posts
Default

Quote:
Originally Posted by WraithX View Post
All this is assuming that Jan Feitsma's search for psp's (and sprp's) < 2^64 is complete. I haven't heard that there has been a definitive confirmation that his list is complete, but I am proceeding as if it were.
It's complete and has been double-checked on different hardware with different (independently written) software. The same algorithm was used in both cases, but that's been checked fairly carefully, as I understand. (I did not look over it myself, though.)
CRGreathouse is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Algorithmic breakthroughs davieddy PrimeNet 17 2010-07-21 00:07

All times are UTC. The time now is 17:12.

Thu Jul 16 17:12:50 UTC 2020 up 113 days, 14:45, 2 users, load averages: 1.23, 1.43, 1.53

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.