Home
News
Profile
Contact
Half-Life
Music
PCASTL
Computer Science
Videos
Readings
OpenGL
Elements
C64 sids
Links
|
Boyer-Moore
Merge Sort
Computers
|
Boyer-Moore String Search
There is my implementation of the Boyer-Moore string search algorithm in C++. The way the good-match table (skip[]) is created is the fastest I could imagine. It sure is faster than the way it's done in the example implementation I used as a starting point. It was from en.wikipedia.org. #include <climits> #define max(a,b) ((a) < (b) ? (b) : (a)) /* word: word searched * wl : length of word in characters * text: text in which we search the word * tl : length of text in characters * * Search word in text and returns the index of the first * match, returns -1 if there is no match or invalid parameters. */ int search(const char * word, int wl, const char * text, int tl) { if (wl > tl || wl <= 1 || !word || !text) return -1; int * skip = new int[wl]; // good-match table int occ[UCHAR_MAX+1]; // shifts for different characters int i; for (i = 0; i < wl; i++) skip[i] = wl; for (int ends = 0; ends < wl - 1; ends++) { /* The pattern we examine goes from indexes wl-i-1 to wl-1 * the first character of the pattern is not word[wl-i-1]. * We try to match it with the subword going from indexes * ends-i to ends. */ i = 0; bool match = true; while (match && ends-i >= 0) { match = word[ends-i] == word[wl-i-1]; if (!match) skip[wl-1-i] = wl - ends - 1; // (wl - ends - 1) is (wl-1-i) - (ends-i) i++; } if (match && ends-i < 0) { while (i < wl) { skip[wl-1-i] = wl - ends - 1; i++; } } } for (i = 0; i < UCHAR_MAX+1; i++) occ[i] = -1; for (i = 0; i < wl-1; i++) occ[word[i]] = i; int hpos = 0; while (hpos <= tl-wl) { int npos = wl-1; while (word[npos] == text[npos+hpos]) { if (npos == 0) { delete [] skip; return hpos; } npos--; } // added cast or else index < 0 can happen hpos += max(skip[npos], npos - occ[(unsigned char) text[npos+hpos]]); } delete [] skip; return -1; } Algorithm from: Robert S. Boyer and J. Strother Moore. A Fast String Searching Algorithm. Communications of the ACM, vol. 20 no 10 , pp. 762-772 (1977). |
Mobile
|