View Single Post
Old 2013-02-15, 00:20   #3
Ken_g6
 
Ken_g6's Avatar
 
Jan 2005
Caught in a sieve

5·79 Posts
Default

I have some experience in this area, at least on the CPU. First, let me say that I hope you're in a country that does not recognize software patents. Compression is an area that is highly patent-encumbered.

Your challenge appears to be to search your string for patterns of at least four characters that have occurred before. I used the inefficient Boyer-Moore search for the longest time, because, patents. I later found that this patent had expired, and used its search method (a hash of sets of, in your case, four characters). There are other methods, like search trees, out there, I'm sure.

Now, what can you do with the GPU? You can hash sets of four characters, but that's no great feat. The trick is to take each hash and link it to what the hash matched just before - easy in series but hard in parallel. I can think of an idea or two, but this is your challenge, not mine. You may want to search for alternative parallel string searching algorithms as well.

After doing that, I added an optimization, not in the patent, to ensure that the linked lists actually point to exact matches, rather than just inexact hash matches. I think it's commented out in the JSPack code, because, patents. This seems relatively more doable on the GPU.

Good luck!

P.S. Mods, might this be better in the Programming forum?
Ken_g6 is offline   Reply With Quote