![]() |
Intelligent Text Messaging...
My cell phone has a feature called T9WORD ... not sure what it means or if it is only Canadian ... but anyway:
This feature is for Text Messaging and uses intelligence to determine what it thinks I am trying to spell. To clarify, before T9WORD if I wanted to spell: "WORD" I would press: 9 once for W 6 three times for O (m-n-o) 7 three times for R (p-q-r) 3 for D --- 8 key strokes total. With T9WORD I press 9-6-7-3 (once each - 4 key strokes). As I press each number it determines what word it could be and "guesses" that. Now if it guesses wrong and I wanted "WORE" instead of "WORD" I just need to press a key labelled NEXT (0 on my phone) to have it change to the next option for the digits pressed. So WORD becomes WORE becomes YORE and back to WORD as I press NEXT repeatedly. If the digits entered do not form a word it recognizes there are other options but that is not relevant to this puzzle. Anyway to the puzzle portion: [B][U]What digit sequence produces the longest NEXT-CHAIN, i.e. the most possible words. [/U][/B] My example above had a chain of 3 words. I just made up this puzzle so honestly I don't have the answer but I thought collectively we could find it. If you prefer we can also find the longest chain for each number of digits from 2 to n. I don't know what dictionary my phone uses so for the sake of this puzzle let me just suggest we use the Webster dictionary: [url]http://www.webster.com[/url] Now you're probably thinking: Do you accept slang words, capatilized, abbreviations, etc.? There is no money involved, only pride, so I don't want to get into big debates about whether a word is acceptable or not. If you can find it in the dictionary, it's good enough for me. |
some clarification ...
in case you don't have a phone / cell phone.
The digit 2 represents a,b, and c 3 = d,e,f 4 = g,h,i 5 = j,k,l 6 = m,n,o 7 = p,q,r,s 8 = t,u,v 9 = w,x,y,z |
[QUOTE=petrw1;119620]With T9WORD I press 9-6-7-3 (once each - 4 key strokes). As I press each number it determines what word it could be and "guesses" that. Now if it guesses wrong and I wanted "WORE" instead of "WORD" I just need to press a key labelled NEXT (0 on my phone) to have it change to the next option for the digits pressed.[/QUOTE] Used to use this feature on my phone, but it took me more time to do it that way. 2 reasons: it doesn't predict common abbrevations, I used many words and names that aren't in the simple dictionary.
As to the puzzle. I am guessing it should be around 4 letters long, unless it ends in 3-3 (ed, de, fe, ee) or 3-3-3. |
If the feature is trainable, then it would simply be a matter of taking the time to train it. I remember a program I had on my Atari which worked as a spellchecker. it started out with about 100 words and you trained it by telling it whether or not a word it flagged was correct or not. Cool idea, but the first few documents you gave it took more time to spellcheck than they did to type.
Kind of reminds me of the whole concept of computer programmers being intelligently lazy people. A little work at the beginning to cut down on work later. |
This sounds like something that would be perfect for a regular expression or 153.4 million. Generate a list of numbers up to say 9 (have to go up to 2, up to 3, up to 4 etc. all individually) digits long (with leading zeroes) in base 8 (0-7 === 2-9), then from those create a list of regular expressions, then run them all on a wordlist.
Eg. 012 (=== 234) would become the reg exp /^[a|b|c][d|e|f][g|h|i]$/i. 000000012 (=== 222222234) would become the reg exp /^[a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][a|b|c][d|e|f][g|h|i]$/i. An alternative to creating a larger number of reg exps would be to make them more complicated, but that would drastically increase the time to run a test. Eg. 012345670 (=== 234567892) would become the reg exp /^[a|b|c]([d|e|f]([g|h|i]([j|k|l]([m|n|o]([p|q|r|s]([t|u|v]([w|x|y|z]([a|b|c])?)?)?)?)?)?)?)?$/i The major downside approaching the problem this way is that running _millions_ of regular expressions on a word list that contains 216,555 words (I don't know how many are in the webster list, so I'm going off [url=http://en.wikipedia.org/wiki/SOWPODS]SOWPODS[/url]), will take a REALLY long time and require running a splash over 33.2 trillion tests. Not to mention that longer reg exps take longer to run. The process could be sped significantly by separating out all of the two, three, four etc. letter words and only running the appropriate reg exps on them.[code]One letters: 8 RE combinations, don't know if there are any 1 letter words in the list, I can only think of "a", "I", and maybe "O". Two letters: 64 RE combinations, 121 words, 7,744 operations. Three letters: 512 RE combinations, 1,229 words, 629,248 operations. Four letters: 4,096 RE combinations, 5,155 words, 21,114,880 operations. Five letters: 32,768 RE combinations, 11,812 words, 387055616 operations. Six letters: 262,144 RE combinations, 20,964 words, 5495586816 operations. Seven letters: 2,097,152 RE combinations, 31,229 words, 65491959808 operations. Eight letters: 16,777,216 RE combinations, 38,043 words, 638255628288 operations. Nine letters: 134,217,728 RE combinations, 36,847 words, 4945520623616 operations.[/code]It seems like it would be feesable to scan up to five letter words in a relatively short time scale, but six letters and beyond is where it would get sticky. All combined, to test up to nine letter words in this way would require a little under 5.655 trillion tests. At least the memory usage could be kept small by only keeping a list of the top 10 reg exps. I should mention that I haven't really thought this through, so my numbers may well be wrong, and there are no doubt additional ways to speed up the search. However, I decided to throw together some JavaScript to see how quickly it's possible to run tests, here's the output from it.[code]To generate 64 regular expressions and check 124 words it took: 0 hours 0 mins 0.015 secs First: /^[m|n|o][m|n|o]$/i with 6 matches. Second: /^[a|b|c][g|h|i]$/i with 5 matches. Third: /^[d|e|f][d|e|f]$/i with 5 matches. Fourth: /^[g|h|i][m|n|o]$/i with 5 matches. Fifth: /^[m|n|o][d|e|f]$/i with 5 matches.[/code]A small number of words and simple regular expressions makes for a very quick completion time, but even generating the regular expressions for 5 letter words takes around 6 seconds. Oh, and though it doesn't output it, the words for the winner are "mm", "mo", "no", "om", "on" and "oo". Since I didn't spend much time on the code, it's pretty unoptimised. I might tidy it up at some point and post it on the forum if anyone is interested. Though if you seriously want to stand a chance of answering your question, I think C would be the way to go, JavaScript isn't going to set any speed records no matter how optimised. Perhaps an even faster way would be to run through all strings one character at a time and put them in a very multi-dimentional array (every character deeper adds an extra dimention). Eg. words[0] = first char abc words[0][0] = second char abc words[0][0][0] = third char abc In C this method will probably eat up quite a lot of RAM, but might turn out to be pretty speedy. Yet another method might be to split each word into an array of it's characters, then loop through and convert all the characters into the corresponding number on the phone keypad, then rejoin the array. At that point it would be a matter of comparing numbers to find the most common, and computers are pretty damn fast when it comes to numbers, especially integers. |
[QUOTE=lavalamp;119668]Yet another method might be to split each word into an array of it's characters, then loop through and convert all the characters into the corresponding number on the phone keypad, then rejoin the array.[/QUOTE]
Since the dictionary is of a reasonable size, if you can obtain a listing of its entries, it would be easiest to transform each word into its corresponding code and then count how many entries correspond to each code. Given a flat file of all the words, all of the remaining operations can be performed with simple Unix commands. |
It's possible to download the dictionary from [url=http://www.scrabulous.com/sowpods_dictionary.php]this page on Scrabulous[/url], it's just a simple word list, one word to a line.
|
The Scrabulous dictionary has 1/4 million words.
[Code] M21:Warehouse rkw$ wc -l ~/Downloads/sowpods.txt 267751 /Volumes/Home/Users/rkw/Downloads/sowpods.txt [/Code]and produces these results [Code] M21:Warehouse rkw$ cat ~/Downloads/sowpods.txt | dos2unix | awk '/[A-Z]*/ { print tolower($1), $1 }' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ 22233344455566677778889999 | sort -k 2 | uniq -c -f 1 | sort -r -k 1 | head -n 10 14 paper 72737 13 pomp 7667 13 acres 22737 12 purer 78737 12 popes 76737 12 poker 76537 12 pawer 72937 12 pater 72837 12 gomer 46637 11 pomes 76637 M21:Warehouse rkw$ cat ~/Downloads/sowpods.txt | dos2unix | awk '/[A-Z]*/ { print tolower($1), $1 }' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ 22233344455566677778889999 | sort -k 2 | grep " 72737$" paper 72737 papes 72737 pards 72737 parer 72737 pares 72737 pases 72737 raper 72737 rapes 72737 rarer 72737 rares 72737 raser 72737 rases 72737 sards 72737 saser 72737 [/Code] |
Heh, I have no idea what those commands mean, but WOW that code is so much shorter than what I came up with.
There's a very odd thing when I run my code, Firefox uses twice the RAM and takes 20 times as long to run it than Internet Explo[i]d[/i]er. Ah well, I guess even IE has to have [b]something[/b] going for it. Anyway, here's what mine outputs:[code]To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.531 secs To determine the top 5: 0 hours 0 mins 3.406 secs First: 72737 with 14 words. PAPER, PAPES, PARDS, PARER, PARES, PASES, RAPER, RAPES, RARER, RARES, RASER, RASES, SARDS, SASER Second: 22737 with 13 words. ACRES, BARDS, BARER, BARES, BARFS, BASER, BASES, CAPER, CAPES, CARDS, CARER, CARES, CASES Third: 7667 with 13 words. POMP, POMS, PONS, POOP, POOR, POOS, ROMP, ROMS, ROOP, ROOS, SOMS, SONS, SOOP Fourth: 46637 with 12 words. GOMER, GONER, GOODS, GOOFS, HOMER, HOMES, HONDS, HONER, HONES , HOODS, HOOFS, INNER Fifth: 72837 with 12 words. PATER, PATES, PAVER, PAVES, RATER, RATES, RAVER, RAVES, SATES, SAVER, SAVES, SCUDS[/code]And I also modified it to output the top 5 by length with a minimum of two words:[code]To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.453 secs To determine the top 5 by length (minimum 2 words): 0 hours 0 mins 3.360 secs First: 232735377637737 with 2 words and 16 characters. BEARDLESSNESSES, CEASELESSNESSES Second: 247666843727437 with 2 words and 16 characters. CHROMOTHERAPIES, CHRONOTHERAPIES Third: 268476767462427 with 2 words and 16 characters. ANTHROPOPHOBIAS, ANTHROPOPHOBICS Fourth: 333328483637737 with 2 words and 16 characters. DEFECTIVENESSES, EFFECTIVENESSES Fifth: 333378372362437 with 2 words and 16 characters. DEFERVESCENCIES, EFFERVESCENCIES [/code]With minimum of three words:[code]To convert all 267,751 words to their numeric counter parts: 0 hours 0 mins 3.469 secs To determine the top 5 by length: 0 hours 0 mins 3.281 secs First: 42774637737 with 4 words and 12 characters. GASPINESSES, GASSINESSES, HAPPINESSES, HARSHNESSES Second: 78664637737 with 4 words and 12 characters. RUMMINESSES, RUNNINESSES, STONINESSES, SUNNINESSES Third: 25663637737 with 3 words and 12 characters. ALONENESSES, ALOOFNESSES, BLONDNESSES Fourth: 28886646537 with 3 words and 12 characters. BUTTONHOLDS, BUTTONHOLER, BUTTONHOLES Fifth: 68372665464 with 3 words and 12 characters. OVERBOOKING, OVERCOOKING, OVERCOOLING[/code]And here are some other random "interesting" ones:[code]74335464 with 7 words and 9 characters. PIDDLING, PIFFLING, RIDDLING, RIFFLING, SHEELING, SIDELING, SIFFLING 7277464 with 10 words and 8 characters. PAPPING, PARPING, PARRING, PARSING, PASSING, RAPPING, RAPPINI, RASPING, SAPPING, SASSING 2666437 with 9 words and 8 characters. ANOMIES, BOMMIES, BONNIER, BONNIES, BOOMIER, BOONIES, COMMIES, CONOIDS, COOMIER[/code] |
I'm impressed how quickly the results have come in and how thorough they are!!!
By the way my phone apparently does not have that elaborate a dictionary. For example: When I enter 72737 I only get 4 of the 14: paper, rapes, rarer, rares 22737 gives me 7 of 13 (better): cases, cards, bases, acres, cares, bards, bares |
Yeah, SOWPODS is a word list that integrates lots and LOTS of words, many of which are bizarre and unusual, but nevertheless valid. Since texts are mainly used for quick and short messages the phone probably only has the most common words in it's list.
|
| All times are UTC. The time now is 15:02. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.