Knuth-Morris-Pratt (KMP)
KMP is a string searching method which is much more efficient than the "naive" string searching method. The "naive" method starts at the first symbol in the text and checks whether the string you are searching for occurs at that position. If it does not, you move to the second symbol of the text and check there, and so on. This is not a terribly efficient method.
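The naive method described above can be sketched roughly as follows. This is only an illustration; the function name `naive_search` is made up for this example and is not part of the course materials.

```python
def naive_search(text, pattern):
    """Return the first index where pattern occurs in text, or -1 if none."""
    n, m = len(text), len(pattern)
    # Try every possible starting position in the text, left to right.
    for start in range(n - m + 1):
        matched = 0
        # Compare symbol by symbol until a mismatch or a full match.
        while matched < m and text[start + matched] == pattern[matched]:
            matched += 1
        if matched == m:
            return start
    return -1
```

For example, `naive_search("abababb", "bab")` tries position 0, fails on the first symbol, then finds the match at position 1.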
The naive method is inefficient because you end up trying to match the string in positions where it could not possibly occur. For instance, if your text were "aabbbbb" and your string were "aaa", it would not make any sense to try to match your string against the part of the text which contains only 'b's.
KMP avoids these extraneous comparisons by finding patterns within the string which allow you to skip over parts of the text where a match is impossible. The theory behind why KMP works is somewhat tricky to explain, but running KMP by hand is actually quite easy.
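The first thing you need to do is create the failure function for the string that you plan to match. To do this, take your string, say "abababb", and build the following table. Each row covers one prefix of the string, and the failure value is the length of the longest proper prefix of that prefix which is also its suffix: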
prefix length | failure value | prefix substring |
0 | 0 | |
1 | 0 | a |
2 | 0 | ab |
3 | 1 | aba |
4 | 2 | abab |
5 | 3 | ababa |
6 | 4 | ababab |
7 | 0 | abababb |
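The table above can also be built mechanically. Below is a minimal sketch, assuming the failure value for a prefix is the length of its longest proper prefix that is also a suffix (the values in the table are consistent with that reading). The function name `failure_table` is made up for this example.

```python
def failure_table(pattern):
    """Return a list where entry j is the failure value for pattern[:j]."""
    m = len(pattern)
    table = [0] * (m + 1)  # entries 0 and 1 are always 0
    k = 0                  # failure value of the prefix handled so far
    for i in range(1, m):
        # On a mismatch, fall back to the next shorter candidate overlap.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k]
        # On a match, the overlap grows by one symbol.
        if pattern[i] == pattern[k]:
            k += 1
        table[i + 1] = k
    return table
```

Calling `failure_table("abababb")` gives `[0, 0, 0, 1, 2, 3, 4, 0]`, which matches the middle column of the table row by row.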
Actually, explaining the rest of it coherently on paper is evading me at the moment. Come to workshop and I'll explain it there.