Page Speed Optimization Libraries
1.13.35.1
#include "html_lexer.h"
Public Member Functions

  HtmlLexer (HtmlParse *html_parse)
  void StartParse (const StringPiece &id, const ContentType &content_type)
      Initializes a new parse session; id is used only for error messages.
  void Parse (const char *text, int size)
  void FinishParse ()
      Completes the parse, reporting any leftover text as a final HtmlCharacterEvent.
  bool IsImplicitlyClosedTag (HtmlName::Keyword keyword) const
      Determines whether a tag is implicitly closed in HTML, i.e. it takes no explicit end tag (e.g. <meta>).
  bool TagAllowsBriefTermination (HtmlName::Keyword keyword) const
      Determines whether a tag can be terminated briefly (e.g. <tag/>).
  bool IsOptionallyClosedTag (HtmlName::Keyword keyword) const
      Determines whether it is OK to leave a tag unclosed.
  void DebugPrintStack ()
      Prints the element stack to stdout (for debugging).
  HtmlElement * Parent () const
  const DocType & doctype () const
  void set_size_limit (int64 x)
      Sets the limit on the maximum number of bytes that should be parsed.
  bool size_limit_exceeded () const

Static Public Member Functions

  static bool IsLiteralTag (HtmlName::Keyword keyword)
  static bool IsSometimesLiteralTag (HtmlName::Keyword keyword)
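
The tag predicates above consult per-tag tables that this reference does not list. As a rough illustration only, the sketch below assumes those tables follow the HTML specification (void elements for implicit closing, a subset of elements with omissible end tags for optional closing) and keys them by lowercase tag name rather than HtmlName::Keyword. Both function names are hypothetical and are not part of the library's API.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for the lexer's per-tag tables, keyed by lowercase
// tag name instead of HtmlName::Keyword. The sets follow the HTML
// specification, not the library's own tables.

// Void elements: they never take an end tag, so none should be expected
// or inserted.
bool IsImplicitlyClosedTagSketch(const std::string& tag) {
  static const std::set<std::string> kVoidElements = {
      "area",  "base", "br",   "col",    "embed", "hr", "img",
      "input", "link", "meta", "source", "track", "wbr"};
  return kVoidElements.count(tag) > 0;
}

// A subset of elements whose end tag may be omitted, so leaving them
// unclosed is acceptable.
bool IsOptionallyClosedTagSketch(const std::string& tag) {
  static const std::set<std::string> kOptionalEndTag = {
      "html", "head", "body", "p",  "li",    "dt",    "dd",
      "option", "tr", "td",   "th", "thead", "tbody", "tfoot"};
  return kOptionalEndTag.count(tag) > 0;
}
```

A std::set keeps the sketch short; since the real predicates take a small keyword enum, a lookup table indexed by keyword would be a natural alternative.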
Detailed Description

Constructs a re-entrant HTML lexer. This lexer minimally parses tags, attributes, and comments. It is intended to parse the Wild West of the Web. It's designed to be tolerant of syntactic transgressions, merely passing through unparseable chunks as Characters.
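
To make the "tolerant" behaviour concrete, here is a toy sketch: the hypothetical TolerantLex function recognizes tags and comments, and passes any '<' that does not start a well-formed tag through as ordinary characters instead of reporting an error. It is not the library's implementation, which adds events to HtmlParse and handles attributes, quoting, and directives that this sketch ignores.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token model for illustration only; the real lexer adds
// events to HtmlParse instead of returning a vector.
struct LexToken {
  enum Kind { kCharacters, kComment, kTag } kind;
  std::string text;
};

// Splits `html` into tags, comments, and character runs. A '<' that does
// not begin a recognizable tag or a terminated comment is kept as plain
// characters instead of being reported as an error. Directives such as
// <!DOCTYPE> fall through as characters, and a '>' inside a quoted
// attribute value would end the tag early in this sketch.
std::vector<LexToken> TolerantLex(const std::string& html) {
  std::vector<LexToken> out;
  std::string chars;  // pending run of character data
  auto flush = [&out, &chars]() {
    if (!chars.empty()) {
      out.push_back({LexToken::kCharacters, chars});
      chars.clear();
    }
  };
  std::size_t i = 0;
  while (i < html.size()) {
    if (html[i] != '<') {
      chars += html[i++];
      continue;
    }
    if (html.compare(i, 4, "<!--") == 0) {
      std::size_t end = html.find("-->", i + 4);
      if (end == std::string::npos) break;  // unterminated: left as characters
      flush();
      out.push_back({LexToken::kComment, html.substr(i, end + 3 - i)});
      i = end + 3;
      continue;
    }
    // A tag name must start with a letter, optionally after '/'.
    std::size_t name =
        (i + 1 < html.size() && html[i + 1] == '/') ? i + 2 : i + 1;
    std::size_t close = html.find('>', i);
    if (name < html.size() &&
        std::isalpha(static_cast<unsigned char>(html[name])) &&
        close != std::string::npos) {
      flush();
      out.push_back({LexToken::kTag, html.substr(i, close + 1 - i)});
      i = close + 1;
    } else {
      chars += html[i++];  // e.g. the '<' in "a < b" is just text
    }
  }
  chars += html.substr(i);  // an unterminated remainder becomes characters
  flush();
  return out;
}
```

For example, TolerantLex("a < b <p>hi</p>") yields the characters "a < b ", the tag "<p>", the characters "hi", and the tag "</p>".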
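
Member Function Documentation

const DocType& net_instaweb::HtmlLexer::doctype () const  [inline]

Returns the currently assumed doctype of the document, based on the content type and any HTML directives (such as <!DOCTYPE>) encountered so far.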

static bool net_instaweb::HtmlLexer::IsLiteralTag (HtmlName::Keyword keyword)  [static]

Determines whether a tag should be interpreted as a 'literal' tag, that is, a tag whose contents are not parsed until the matching end tag is encountered.
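
Inside a literal tag, the lexer stops interpreting markup until the matching end tag. A minimal sketch of that search, assuming <script> is treated as literal (as in HTML's raw-text rules) and using a hypothetical helper FindLiteralEnd that is not a library function, scans case-insensitively for "</script" followed by a delimiter:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the end tag that closes a
// literal element named `tag` (lowercase), searching `html` from `start`
// (just past the open tag). Everything before that index is raw text, even
// if it looks like markup. Returns std::string::npos if no end tag is found.
std::size_t FindLiteralEnd(const std::string& html, std::size_t start,
                           const std::string& tag) {
  assert(start <= html.size());
  const std::string needle = "</" + tag;
  for (std::size_t i = start; i + needle.size() <= html.size(); ++i) {
    bool match = true;
    for (std::size_t j = 0; j < needle.size() && match; ++j) {
      // Tag names are case-insensitive in HTML.
      match = std::tolower(static_cast<unsigned char>(html[i + j])) ==
              static_cast<unsigned char>(needle[j]);
    }
    if (!match) continue;
    // Require a delimiter (or end of input) after the name, so that
    // "</scripts>" does not close a <script> element.
    std::size_t after = i + needle.size();
    if (after == html.size() || html[after] == '>' || html[after] == '/' ||
        std::isspace(static_cast<unsigned char>(html[after]))) {
      return i;
    }
  }
  return std::string::npos;
}
```

Because the body between the open tag and the returned index is never tokenized, a string such as '<b>' inside a script stays plain text.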
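
static bool net_instaweb::HtmlLexer::IsSometimesLiteralTag (HtmlName::Keyword keyword)  [static]

Determines whether a tag is interpreted as a 'literal' tag by some user agents but not others. Since some user agents will parse the contents of these tags as markup, our lexer never treats them as literal tags.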

HtmlElement* net_instaweb::HtmlLexer::Parent () const
Returns the current lowest-level parent element in the element stack, or NULL if the stack is empty.
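
The element stack that Parent() reads can be pictured as a vector of open elements. In this hypothetical sketch (Element and ElementStack are illustrative stand-ins, not HtmlElement or the lexer's internals), Parent() returns the innermost open element or nullptr, and a close tag pops back to its matching element. How the real lexer recovers from mismatched tags is not described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical element type; the library's HtmlElement carries far more.
struct Element {
  std::string name;
};

// Illustrative element stack: open tags push, close tags pop.
class ElementStack {
 public:
  void Open(Element* e) {
    assert(e != nullptr);
    stack_.push_back(e);
  }

  // Pops back to and including the innermost open element named `name`,
  // implicitly closing anything opened inside it. A close tag with no
  // matching open element is ignored.
  void Close(const std::string& name) {
    for (std::size_t i = stack_.size(); i > 0; --i) {
      if (stack_[i - 1]->name == name) {
        stack_.resize(i - 1);
        return;
      }
    }
  }

  // Innermost open element, or nullptr if the stack is empty.
  Element* Parent() const { return stack_.empty() ? nullptr : stack_.back(); }

 private:
  std::vector<Element*> stack_;
};
```

Ignoring a stray close tag, rather than failing, is one way to stay tolerant of malformed input; it is shown here as a design choice for the sketch, not as the library's documented behaviour.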

void net_instaweb::HtmlLexer::Parse (const char *text, int size)
Parses a chunk of text, adding events to the parser by calling html_parse_->AddEvent(...).
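
Because the lexer is re-entrant, a document can arrive over several Parse() calls, with a tag split across a chunk boundary. The hypothetical ChunkedLexer below sketches that contract under simplifying assumptions: it records string events instead of calling html_parse_->AddEvent, models only tags and character runs, and buffers incomplete input until the next Parse() or FinishParse() call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chunk-fed lexer mirroring the StartParse / Parse /
// FinishParse call sequence. Events are recorded as strings rather than
// passed to html_parse_->AddEvent, and only tags and character runs exist.
class ChunkedLexer {
 public:
  void StartParse() {
    pending_.clear();
    events_.clear();
  }

  // Consumes one chunk. Complete tags and text runs that end at a '<' become
  // events; an unfinished tag or trailing text is buffered for later calls.
  void Parse(const char* text, int size) {
    assert(size >= 0);
    pending_.append(text, static_cast<std::size_t>(size));
    std::size_t pos = 0;
    while (pos < pending_.size()) {
      if (pending_[pos] == '<') {
        std::size_t close = pending_.find('>', pos);
        if (close == std::string::npos) break;  // tag continues in next chunk
        events_.push_back("tag:" + pending_.substr(pos, close + 1 - pos));
        pos = close + 1;
      } else {
        std::size_t lt = pending_.find('<', pos);
        if (lt == std::string::npos) break;  // text may continue in next chunk
        events_.push_back("chars:" + pending_.substr(pos, lt - pos));
        pos = lt;
      }
    }
    pending_.erase(0, pos);  // keep only the unconsumed tail
  }

  // Reports whatever is still buffered, including an unterminated tag, as a
  // final character event.
  void FinishParse() {
    if (!pending_.empty()) events_.push_back("chars:" + pending_);
    pending_.clear();
  }

  const std::vector<std::string>& events() const { return events_; }

 private:
  std::string pending_;              // input carried between Parse() calls
  std::vector<std::string> events_;  // recorded events, in order
};
```

Feeding "hi <di" and then "v>there" produces "chars:hi ", "tag:<div>", and, only after FinishParse(), "chars:there".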

bool net_instaweb::HtmlLexer::size_limit_exceeded () const  [inline]

Indicates whether we have exceeded the limit, set by set_size_limit(), on the maximum number of bytes that we should parse.
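
A minimal sketch of the bookkeeping behind set_size_limit() and size_limit_exceeded(). It assumes, hypothetically, that a negative limit means "unlimited" and that bytes past the limit are simply not lexed; neither behaviour is confirmed on this page. SizeBudget and Consume are illustrative names, and int64 is modeled as std::int64_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical byte-budget tracker modeled on set_size_limit() and
// size_limit_exceeded(). Assumption: a negative limit means "no limit",
// and that is the default here.
class SizeBudget {
 public:
  void set_size_limit(std::int64_t x) { limit_ = x; }

  // Records `size` more input bytes and returns how many of them still fit
  // under the limit; a caller would lex only that many bytes of the chunk.
  std::int64_t Consume(std::int64_t size) {
    assert(size >= 0);
    std::int64_t room = size;
    if (limit_ >= 0) {
      std::int64_t left = limit_ - consumed_;
      if (left < 0) left = 0;
      if (left < room) room = left;
    }
    consumed_ += size;
    return room;
  }

  bool size_limit_exceeded() const {
    return limit_ >= 0 && consumed_ > limit_;
  }

 private:
  std::int64_t limit_ = -1;    // assumed default: unlimited
  std::int64_t consumed_ = 0;  // total bytes offered so far
};
```

Counting every offered byte, even those past the limit, keeps size_limit_exceeded() true once the budget is overrun, which lets a caller report the condition after parsing ends.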