Page Speed Optimization Libraries
1.13.35.1
|
#include "fast_wildcard_group.h"
Public Member Functions | |
FastWildcardGroup (const FastWildcardGroup &src) | |
FastWildcardGroup & | operator= (const FastWildcardGroup &other) |
bool | Match (const StringPiece &str, bool allow_by_default) const |
void | Allow (const StringPiece &wildcard) |
void | Disallow (const StringPiece &wildcard) |
void | CopyFrom (const FastWildcardGroup &src) |
void | AppendFrom (const FastWildcardGroup &src) |
void | Merge (const FastWildcardGroup &src) |
Alias for use of CopyOnWrite. | |
GoogleString | Signature () const |
int | num_wildcards () const |
Return the number of configured wildcards. | |
bool | empty () const |
Static Public Attributes | |
static const int | kMinPatterns = 11 |
This forms the basis of a wildcard selection mechanism, allowing a user to issue a sequence of commands like:
This sequence would yield the following results: Match("x.cc") –> true due to rule #1 Match("c.cc") –> false due to rule #5 which overrides rule #1 Match("y.h") –> true due to rule #2 Match("a.h") –> false due to rule #3 which overrides rule #2 Match("ab.h") –> true due to rule #4 which overrides rule #3 So order matters.
Note that concurrent calls to Match(...) are permitted, but modifications must not occur concurrently (as you would expect). A note on the algorithm used here:
Wildcard matching uses an O(nm) string search algorithm, where m is pattern length and n is string length (basically we search forward for first char in the next pattern chunk, then attempt a match at that position). This is not the asymptotically efficient O(n+m) as it ignores the effects of prefixes and repeated substrings, but the wildcards that occur in PageSpeed tend to contain chunks of diverse literals and so it's good enough in practice.
WildcardGroup simply iterates through wildcards in the group, attempting to match against each one in turn.
In FastWildcardGroup we attempt a Rabin-Karp string match for a fixed-size substring of each of the wildcards. We choose the largest possible substring size for a given group (for a single wildcard pattern, this will be the length of the longest literal in the pattern; for the group, it is the minimum such length). Note that in the worst case this is a single character (we treat all-wildcard patterns specially). We track the insertion index of the latest-inserted matched pattern (so the first pattern in the set has index 0, and initially our insertion index is -1). As in Rabin-Karp we traverse the string using a rolling hash; when we encounter a hash match, we retrieve the corresponding insertion index. If it's larger than our current insertion index (the pattern would override), we retrieve the pattern and attempt to match the whole string against it. If the match succeeds we update the insertion index. Our return value is the corresponding "allow" status.
We actually optimize this a little in two ways: rather than remembering the insertion index, we actually remember the insertion index just before the next change in "allow" status (the effective index). So for example, if we insert 10 "allow" patterns in a row and then a single "deny" pattern, matching against the first "allow" pattern means that we will subsequently check only against the "deny" pattern. The second optimization builds on this: if the effective index is the last pattern in the group (always true if the group is nothing but "allow" or "deny" entries) then we can immediately return.
We use a simple vector of indexes to store the hash table, dealing with collisions by linear probing. Metadata (eg a cached hash) is stored with the patterns. We make the table size >= 2x the number of patterns so that chains don't get long, and all failed probes terminate in an empty bucket.
void net_instaweb::FastWildcardGroup::Allow | ( | const StringPiece & | wildcard | ) |
Add an expression to Allow, potentially overriding previous calls to Disallow.
void net_instaweb::FastWildcardGroup::Disallow | ( | const StringPiece & | wildcard | ) |
Add an expression to Disallow, potentially overriding previous calls to Allow.
bool net_instaweb::FastWildcardGroup::Match | ( | const StringPiece & | str, |
bool | allow_by_default | ||
) | const |
Determines whether a string is allowed by the wildcard group. If none of the wildcards in the group matches, allow_by_default is returned.
|
static |
Don't generate a hash unless there are this many non-wildcard-only patterns. Exposed for testing purposes (we can't use FRIEND_TEST here for open-source dependency reasons).