#include "js_tokenizer.h"

Public Member Functions
	JsTokenizer (const JsTokenizerPatterns *patterns, StringPiece input)

JsKeywords::Type	NextToken (StringPiece *token_out)

bool	has_error () const

GoogleString	ParseStackForTest () const
	Return a string representing the current parse stack, for testing only.

Detailed Description

This class breaks JavaScript code into a sequence of tokens, including tokens for comments and whitespace. Every byte of the input is represented in the token stream, so concatenating the text of each token exactly recovers the original input, even in error cases (the final error token contains the entire rest of the input). Each whitespace token is also classified by the tokenizer as 1) not containing linebreaks, 2) containing linebreaks but not inducing semicolon insertion, or 3) inducing semicolon insertion.

To do all this, JsTokenizer keeps track of a minimal amount of parse state to allow it to accurately differentiate between division operators and regex literals, and to determine which linebreaks will result in semicolon insertion and which will not. If the given JavaScript code is syntactically incorrect such that this differentiation becomes impossible, this class will return an error, but will still tokenize as much as it can up to that point (note however that many other kinds of syntax errors will be ignored; being a complete parser or syntax checker is a non-goal of this class).
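To illustrate why some parse state is needed, here is a minimal sketch of the division-versus-regex decision. It is not the library's implementation: the PrevToken enum and SlashStartsRegex function are hypothetical names invented for this example, and the rule is deliberately simplified. It looks only at the previous significant token, which is not sufficient in general. A preceding `)` or `}` is ambiguous (compare `(a) / b` with `if (x) /re/.test(y)`), and resolving such cases is what requires tracking surrounding context.

```cpp
#include <cassert>

// Hypothetical, simplified categories for the last token that was neither
// whitespace nor a comment. These are NOT the library's JsKeywords::Type values.
enum class PrevToken {
  kStartOfInput,   // nothing seen yet
  kIdentifier,     // foo
  kNumber,         // 42
  kCloseBracket,   // ]
  kOperator,       // = + , : and similar
  kOpenParen,      // (
  kReturnKeyword,  // return
};

// Returns true if a '/' that follows `prev` should be read as the start of a
// regex literal, and false if it should be read as a division operator.
// Assumption: `)` and `}` are left out because they cannot be decided from
// the previous token alone.
bool SlashStartsRegex(PrevToken prev) {
  switch (prev) {
    case PrevToken::kIdentifier:
    case PrevToken::kNumber:
    case PrevToken::kCloseBracket:
      return false;  // an expression just ended, so '/' divides it
    case PrevToken::kStartOfInput:
    case PrevToken::kOperator:
    case PrevToken::kOpenParen:
    case PrevToken::kReturnKeyword:
      return true;   // an expression is expected, so '/' opens a regex
  }
  return true;  // not reached; keeps compilers quiet about missing returns
}

// Small self-check: `a / b` is division, `x = /re/` is a regex literal.
void CheckSlashExamples() {
  assert(!SlashStartsRegex(PrevToken::kIdentifier));
  assert(SlashStartsRegex(PrevToken::kOperator));
}
```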

This class can also be used to tokenize JSON. Note that a JSON object, such as {"foo":"bar"}, is NOT legal JavaScript code by itself (since, absent any context, the braces will be interpreted as a code block rather than as an object literal); however, JsTokenizer contains special logic to recognize this case and still tokenize it correctly.

This separation of tokens and classification of whitespace means that this class can be used to create a robust JavaScript minifier (see js_minify.h). It could also perhaps be used as the basis of a more complete JavaScript parser.

Constructor & Destructor Documentation

pagespeed::js::JsTokenizer::JsTokenizer ( const JsTokenizerPatterns * patterns, StringPiece input )

Creates a tokenizer that will tokenize the given UTF-8-encoded input string (which must outlive the JsTokenizer object).

Member Function Documentation

bool pagespeed::js::JsTokenizer::has_error ( ) const [inline]

Returns true if an error has been encountered. Once an error has occurred, all subsequent calls to NextToken() will return JsKeywords::kError with an empty token string.

JsKeywords::Type pagespeed::js::JsTokenizer::NextToken ( StringPiece * token_out )

Gets the next token type from the input, and stores the relevant substring of the original input in token_out (which must be non-NULL). If the end of input has been reached, returns kEndOfInput and sets token_out to the empty string. If an error is encountered, returns kError, sets token_out to the remainder of the input, and causes has_error() to return true from then on.
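The calling pattern this contract implies can be sketched with a stand-in class. The block below does not use the real JsTokenizer: ToyTokenizer, TokenType, and Reassemble are hypothetical names made up for this example, std::string_view stands in for StringPiece, and the toy treats a NUL byte as its only "error". It aims only to mirror the documented behavior of NextToken() and has_error(), and to show that appending every token, including a final error token, reproduces the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for JsKeywords::Type. Only the names kEndOfInput and
// kError are taken from the documentation; the rest are invented.
enum class TokenType { kWhitespace, kWord, kOther, kError, kEndOfInput };

// Hypothetical toy tokenizer mimicking the documented NextToken() contract.
// It splits purely by character class and is NOT a JavaScript tokenizer.
class ToyTokenizer {
 public:
  explicit ToyTokenizer(std::string_view input) : rest_(input) {}

  TokenType NextToken(std::string_view* token_out) {
    assert(token_out != nullptr);  // token_out must be non-NULL
    if (error_) {                  // after an error: kError, empty token
      *token_out = std::string_view();
      return TokenType::kError;
    }
    if (rest_.empty()) {           // end of input: empty token
      *token_out = std::string_view();
      return TokenType::kEndOfInput;
    }
    if (rest_[0] == '\0') {        // toy error: token is the whole remainder
      error_ = true;
      *token_out = rest_;
      rest_ = std::string_view();
      return TokenType::kError;
    }
    const auto is_space = [](char c) {
      return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    const auto is_word = [](char c) {
      return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_' ||
             c == '$';
    };
    std::size_t n = 1;  // every token consumes at least one byte
    TokenType type = TokenType::kOther;
    if (is_space(rest_[0])) {
      type = TokenType::kWhitespace;
      while (n < rest_.size() && is_space(rest_[n])) ++n;
    } else if (is_word(rest_[0])) {
      type = TokenType::kWord;
      while (n < rest_.size() && is_word(rest_[n])) ++n;
    }
    *token_out = rest_.substr(0, n);
    rest_.remove_prefix(n);
    return type;
  }

  bool has_error() const { return error_; }

 private:
  std::string_view rest_;  // unconsumed input; must outlive the tokenizer
  bool error_ = false;
};

// Drains the tokenizer and concatenates every token's text, stopping at
// kEndOfInput or kError. Reports via had_error whether an error was hit.
std::string Reassemble(std::string_view input, bool* had_error) {
  ToyTokenizer tokenizer(input);
  std::string out;
  std::string_view token;
  for (;;) {
    const TokenType type = tokenizer.NextToken(&token);
    out.append(token);  // the error token still carries the remaining bytes
    if (type == TokenType::kEndOfInput || type == TokenType::kError) break;
  }
  *had_error = tokenizer.has_error();
  return out;
}
```

Stopping the loop on either kEndOfInput or kError matters: per the contract above, calls made after an error keep returning kError with an empty token, so a loop that waits only for kEndOfInput would never finish.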

The documentation for this class was generated from the following file:

pagespeed/kernel/js/js_tokenizer.h
