Page Speed Optimization Libraries
1.13.35.1
#include <cstddef>
#include "pagespeed/kernel/base/string.h"
#include "pagespeed/kernel/base/string_util.h"
Classes
class net_instaweb::UrlToFilenameEncoder
    Helper class for converting a URL into a filename.

Namespaces
net_instaweb
    Unit-test framework for wget fetcher.
Author: jmarantz@google.com (Joshua Marantz)
URL filename encoder goals:
We need an escape character to represent characters that are legal in URL paths but not in filenames, such as '?'.
We can pick any legal character as an escape, as long as we escape it too. But because we want filenames that humans can correlate with URLs, we should pick one that does not show up frequently in URLs. Candidates are ~`!#$%^&()-=_+{}[],. but we would prefer to avoid characters that are treated specially by tools such as shells or build tools. It turns out that ',' is neither frequent in URLs nor special anywhere else, so we use that.
The escaping algorithm is:
1) Escape all unfriendly symbols as ,XX, where XX is the hex code.
2) Add a ',' at the end. (We do not allow ',' at the end of any directory name, so this ensures that, e.g., /a and /a/b can coexist in the filesystem.)
3) Go through the path segment by segment (where a segment is one directory or leaf in the path) and:
   3a) If the segment is empty, escape the second slash; i.e., www.foo.com//a becomes www.foo.com/,2Fa,
   3b) If it is "." or "..", prepend ',' (so that we have a non-empty and non-reserved filename).
   3c) If it is over 128 characters, break it up into smaller segments by inserting ,-/ (Windows limits paths to 128 chars; other OSes also have limits that would restrict us).
For example:

URL                  File
/                    /,
/index.html          /index.html,
/.                   /.,
/a/b                 /a/b,
/a/b/                /a/b/,
/a/b/c               /a/b/c,          Note: no prefix problem
/u?foo=bar           /u,3Ffoo=bar,
//                   /,2F,
/./                  /,./,
/../                 /,../,
/,                   /,2C,
/,./                 /,2C./,
/very...longname/    /very...long,-/name    (if very...long is about 126 characters long)