Page Speed Optimization Libraries
1.13.35.1
#include "js_tokenizer.h"
Public Member Functions
JsTokenizer (const JsTokenizerPatterns *patterns, StringPiece input)
JsKeywords::Type NextToken (StringPiece *token_out)
bool has_error () const
GoogleString ParseStackForTest () const
    Return a string representing the current parse stack, for testing only.
This class accurately breaks up JavaScript code into a sequence of tokens. This includes tokens for comments and whitespace; every byte of the input is represented in the token stream, so that concatenating the text of each token will perfectly recover the original input, even in error cases (since the final, error token will contain the entire rest of the input). Also, each whitespace token is classified by the tokenizer as 1) not containing linebreaks, 2) containing linebreaks but not inducing semicolon insertion, or 3) inducing semicolon insertion.
To do all this, JsTokenizer keeps track of a minimal amount of parse state to allow it to accurately differentiate between division operators and regex literals, and to determine which linebreaks will result in semicolon insertion and which will not. If the given JavaScript code is syntactically incorrect such that this differentiation becomes impossible, this class will return an error, but will still tokenize as much as it can up to that point (note however that many other kinds of syntax errors will be ignored; being a complete parser or syntax checker is a non-goal of this class).
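The division-versus-regex ambiguity described above can be illustrated with a deliberately simplified sketch. The PrevToken enum and the SlashStartsRegex function are hypothetical names invented for this illustration and are not part of JsTokenizer. The rule shown, deciding from the previous significant token alone, is a common first approximation and is not the library's actual logic.

```cpp
// Hypothetical, simplified categories for the previous significant token
// (whitespace and comments skipped). Not the library's JsKeywords::Type.
enum class PrevToken {
  kStartOfInput,  // nothing precedes the slash
  kIdentifier,    // foo
  kNumber,        // 42
  kCloseParen,    // )
  kCloseBracket,  // ]
  kOperator,      // = + , ( [ { ! and similar
  kKeyword        // return, typeof, and other keywords that expect an expression
};

// Returns true if a '/' that follows `prev` would begin a regex literal under
// this one-token approximation, and false if it would be a division operator.
bool SlashStartsRegex(PrevToken prev) {
  switch (prev) {
    case PrevToken::kIdentifier:
    case PrevToken::kNumber:
    case PrevToken::kCloseParen:
    case PrevToken::kCloseBracket:
      // The previous token ends an expression, so '/' reads as division.
      return false;
    case PrevToken::kStartOfInput:
    case PrevToken::kOperator:
    case PrevToken::kKeyword:
      // An expression is expected next, so '/' reads as a regex literal.
      return true;
  }
  return true;  // Unreachable for valid enum values; keeps compilers quiet.
}
```

This one-token rule misclassifies inputs such as `if (x) /re/.test(y)`, where a regex literal follows a closing parenthesis. Cases like that suggest why a tokenizer needs some parse state and cannot rely on the previous token alone.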
This class can also be used to tokenize JSON. Note that a JSON object, such as {"foo":"bar"}, is NOT legal JavaScript code by itself (since, absent any context, the braces will be interpreted as a code block rather than as an object literal); however, JsTokenizer contains special logic to recognize this case and still tokenize it correctly.
This separation of tokens and classification of whitespace means that this class can be used to create a robust JavaScript minifier (see js_minify.h). It could also perhaps be used as the basis of a more complete JavaScript parser.
pagespeed::js::JsTokenizer::JsTokenizer (const JsTokenizerPatterns *patterns, StringPiece input)
Creates a tokenizer that will tokenize the given UTF-8-encoded input string. The input must outlive the JsTokenizer object.
bool pagespeed::js::JsTokenizer::has_error () const [inline]
True if an error has been encountered. All future calls to NextToken() will return JsKeywords::kError with an empty token string.
JsKeywords::Type pagespeed::js::JsTokenizer::NextToken (StringPiece *token_out)
Gets the next token type from the input, and stores the relevant substring of the original input in token_out (which must be non-NULL). If the end of input has been reached, returns kEndOfInput and sets token_out to the empty string. If an error is encountered, sets has_error() to true, returns kError, and sets token_out to the remainder of the input.
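The calling pattern implied by this contract can be sketched as a drain loop. To keep the example compilable on its own, it uses stand-ins that are assumptions and not the real API: TokenType replaces JsKeywords::Type, std::string_view replaces StringPiece, and ToyTokenizer is a toy that only splits whitespace from non-whitespace and treats '@' as an arbitrary error trigger. Only the shape of NextToken() and has_error() follows the documentation above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for JsKeywords::Type.
enum class TokenType { kWhitespace, kOther, kEndOfInput, kError };

// Toy stand-in for JsTokenizer. It mirrors only the documented contract of
// NextToken() and has_error(); it does not tokenize real JavaScript.
class ToyTokenizer {
 public:
  // As with the real class, the input must outlive the tokenizer.
  explicit ToyTokenizer(std::string_view input) : input_(input) {}

  TokenType NextToken(std::string_view* token_out) {
    if (error_) {  // After an error: kError with an empty token.
      *token_out = std::string_view();
      return TokenType::kError;
    }
    if (input_.empty()) {  // End of input: empty token.
      *token_out = std::string_view();
      return TokenType::kEndOfInput;
    }
    if (input_[0] == '@') {  // Toy error rule: token is the rest of the input.
      error_ = true;
      *token_out = input_;
      input_ = std::string_view();
      return TokenType::kError;
    }
    const bool space = IsSpace(input_[0]);
    std::size_t n = 1;
    while (n < input_.size() && input_[n] != '@' && IsSpace(input_[n]) == space) {
      ++n;
    }
    *token_out = input_.substr(0, n);
    input_.remove_prefix(n);
    return space ? TokenType::kWhitespace : TokenType::kOther;
  }

  bool has_error() const { return error_; }

 private:
  static bool IsSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
  }
  std::string_view input_;
  bool error_ = false;
};

// Drains the tokenizer and concatenates the text of every token, including
// the final error token if there is one.
std::string Reassemble(ToyTokenizer* tokenizer) {
  std::string out;
  std::string_view token;
  for (;;) {
    const TokenType type = tokenizer->NextToken(&token);
    if (type == TokenType::kEndOfInput) break;
    out.append(token);
    if (type == TokenType::kError) break;  // Error token already holds the rest.
  }
  return out;
}
```

Because the error token carries the remainder of the input, the loop can stop at the first kError and still reproduce the original text byte for byte. A caller then checks has_error() to learn whether tokenization completed cleanly.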