Title: CSS Syntax Module Level 3
Shortname: css-syntax
Level: 3
Status: ED
Work Status: Testing
Group: csswg
ED: https://drafts.csswg.org/css-syntax/
TR: https://www.w3.org/TR/css-syntax-3/
Previous Version: https://www.w3.org/TR/2019/CR-css-syntax-3-20190716/
Previous Version: https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/
Previous Version: https://www.w3.org/TR/2013/WD-css-syntax-3-20131105/
Previous Version: https://www.w3.org/TR/2013/WD-css-syntax-3-20130919/
Editor: Tab Atkins Jr., Google, http://xanthir.com/contact/, w3cid 42199
Editor: Simon Sapin, Mozilla, http://exyr.org/about/, w3cid 58001
Abstract: This module describes, in general terms, the basic structure and syntax of CSS stylesheets. It defines, in detail, the syntax and parsing of CSS - how to turn a stream of bytes into a meaningful stylesheet.
Ignored Terms: , , , , 
Ignored Vars: +b, -b, foo

Introduction

This section is not normative. This module defines the abstract syntax and parsing of CSS stylesheets and other things which use CSS syntax (such as the HTML style attribute). It defines algorithms for converting a stream of Unicode code points (in other words, text) into a stream of CSS tokens, and then further into CSS objects such as stylesheets, rules, and declarations.

Module interactions

This module defines the syntax and parsing of CSS stylesheets. It supersedes the lexical scanner and grammar defined in CSS 2.1.

Description of CSS's Syntax

This section is not normative. A CSS document is a series of style rules (qualified rules that apply styles to elements in a document) and at-rules (which define special processing rules or values for the CSS document). A qualified rule starts with a prelude, then has a {}-wrapped block containing a sequence of declarations. The meaning of the prelude varies based on the context that the rule appears in; for style rules, it's a selector which specifies what elements the declarations will apply to. Each declaration has a name, followed by a colon and the declaration value. Declarations are separated by semicolons.
A typical rule might look something like this:
			p > a {
				color: blue;
				text-decoration: underline;
			}
		
In the above rule, "p > a" is the selector, which, if the source document is HTML, selects any <{a}> elements that are children of a <{p}> element. "color: blue" is a declaration specifying that, for the elements that match the selector, their 'color' property should have the value ''blue''. Similarly, their 'text-decoration' property should have the value ''underline''.
At-rules are all different, but they have a basic structure in common. They start with an "@" code point followed by their name as a CSS keyword. Some at-rules are simple statements, with their name followed by more CSS values to specify their behavior, and finally ended by a semicolon. Others are blocks; they can have CSS values following their name, but they end with a {}-wrapped block, similar to a qualified rule. Even the contents of these blocks are specific to the given at-rule: sometimes they contain a sequence of declarations, like a qualified rule; other times, they may contain additional blocks, or at-rules, or other structures altogether.
Here are several examples of at-rules that illustrate the varied syntax they may contain.
@import "my-styles.css";
The ''@import'' at-rule is a simple statement. After its name, it takes a single string or ''url()'' function to indicate the stylesheet that it should import.
			@page :left {
				margin-left: 4cm;
				margin-right: 3cm;
			}
		
The ''@page'' at-rule consists of an optional page selector (the '':left'' pseudoclass), followed by a block of properties that apply to the page when printed. In this way, it's very similar to a normal style rule, except that its properties don't apply to any "element", but rather the page itself.
			@media print {
				body { font-size: 10pt }
			}
		
The ''@media'' at-rule begins with a media type and a list of optional media queries. Its block contains entire rules, which are only applied when the ''@media''s conditions are fulfilled.
Property names and at-rule names are always ident sequences, which have to start with an ident-start code point, two hyphens, or a hyphen followed by an ident-start code point, and then can contain zero or more ident code points. You can include any code point at all, even ones that CSS uses in its syntax, by escaping it. The syntax of selectors is defined in the Selectors spec. Similarly, the syntax of the wide variety of CSS values is defined in the Values & Units spec. The special syntaxes of individual at-rules can be found in the specs that define them.

Escaping

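This section is not normative. Any Unicode code point can be included in an [=ident sequence=] or quoted string by escaping it. CSS escape sequences start with a backslash (\), and continue with:
  * Any Unicode code point that is not a hex digit or a newline. The escape sequence is replaced by that code point.
  * Or one to six hex digits, followed by an optional whitespace. The escape sequence is replaced by the Unicode code point whose value is given by the hexadecimal digits.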

Error Handling

This section is not normative. When errors occur in CSS, the parser attempts to recover gracefully, throwing away only the minimum amount of content before returning to parsing as normal. This is because errors aren't always mistakes-- new syntax looks like an error to an old parser, and it's useful to be able to add new syntax to the language without worrying about stylesheets that include it being completely broken in older UAs. The precise error-recovery behavior is detailed in the parser itself, but it's simple enough that a short description is fairly accurate. After each construct (declaration, style rule, at-rule) is parsed, the user agent checks it against its expected grammar. If it does not match the grammar, it's invalid, and gets ignored by the UA, which treats it as if it wasn't there at all.

Tokenizing and Parsing CSS

User agents must use the parsing rules described in this specification to generate the [[CSSOM]] trees from text/css resources. Together, these rules define what is referred to as the CSS parser. This specification defines the parsing rules for CSS documents, whether they are syntactically correct or not. Certain points in the parsing algorithm are said to be parse errors. The error handling for parse errors is well-defined: user agents must either act as described below when encountering such problems, or must abort processing at the first error that they encounter for which they do not wish to apply the rules described below. Conformance checkers must report at least one parse error condition to the user if one or more parse error conditions exist in the document and must not report parse error conditions if none exist in the document. Conformance checkers may report more than one parse error condition if more than one parse error condition exists in the document. Conformance checkers are not required to recover from parse errors, but if they do, they must recover in the same way as user agents.

Overview of the Parsing Model

The input to the CSS parsing process consists of a stream of Unicode code points, which is passed through a tokenization stage followed by a tree construction stage. The output is a CSSStyleSheet object. Note: Implementations that do not support scripting do not have to actually create a CSSOM CSSStyleSheet object, but the CSSOM tree in such cases is still used as the model for the rest of the specification.

The input byte stream

When parsing a stylesheet, the stream of Unicode code points that comprises the input to the tokenization stage might be initially seen by the user agent as a stream of bytes (typically coming over the network or from the local file system). If so, the user agent must decode these bytes into code points according to a particular character encoding.
To decode a |stylesheet|’s stream of bytes into a stream of code points: 1. [=Determine the fallback encoding=] of |stylesheet|, and let |fallback| be the result. 2. [=Decode=] |stylesheet|’s stream of bytes with fallback encoding |fallback|, and return the result. Note: The decode algorithm gives precedence to a byte order mark (BOM), and only uses the fallback when none is found.
To determine the fallback encoding of a |stylesheet|:
  1. If HTTP or equivalent protocol provides an |encoding label| (e.g. via the charset parameter of the Content-Type header) for the |stylesheet|, [=get an encoding=] from |encoding label|. If that does not return failure, return it.
  2. Otherwise, check |stylesheet|’s byte stream. If the first 1024 bytes of the stream begin with the hex sequence
    40 63 68 61 72 73 65 74 20 22 XX* 22 3B
    where each XX byte is a value between 0x00 and 0x21 inclusive or a value between 0x23 and 0x7F inclusive, then [=get an encoding=] from a string formed out of the sequence of XX bytes, interpreted as ASCII.
    What does that byte sequence mean? The byte sequence above, when decoded as ASCII, is the string "@charset "…";", where the "…" is the sequence of bytes corresponding to the encoding's label.
    If the return value was utf-16be or utf-16le, return utf-8; if it was anything else except failure, return it.
    Why use utf-8 when the declaration says utf-16? The bytes of the encoding declaration spell out “@charset "…";” in ASCII, but UTF-16 is not ASCII-compatible. Either you've typed in complete gibberish (like 䁣桡牳整•utf-16be∻) to get the right bytes in the document, which we don't want to encourage, or your document is actually in an ASCII-compatible encoding and your encoding declaration is lying. Either way, defaulting to UTF-8 is a decent answer. As well, this mimics the behavior of HTML's <meta charset> attribute.
    Note: The syntax of an encoding declaration looks like the syntax of an at-rule named ''@charset'', but no such rule actually exists, and the rules for how you can write it are much more restrictive than they would normally be for recognizing such a rule. A number of things you can do in CSS that would produce a valid ''@charset'' rule (if one existed), such as using multiple spaces, comments, or single quotes, will cause the encoding declaration to not be recognized. This behavior keeps the encoding declaration as simple as possible, and thus maximizes the likelihood of it being implemented correctly.
  3. Otherwise, if an environment encoding is provided by the referring document, return it.
  4. Otherwise, return utf-8.
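The steps above can be sketched in Python as follows. This is a minimal illustration, not a conforming implementation: the names `get_an_encoding`, `sniff_charset_label`, and `determine_fallback_encoding` are hypothetical, and the `LABELS` table is a small illustrative subset standing in for the full label table that [=get an encoding=] consults.

```python
from typing import Optional

# Assumption: a tiny illustrative subset of encoding labels. A real
# implementation would use the complete label table of the Encoding Standard.
LABELS = {
    "utf-8": "utf-8", "utf8": "utf-8",
    "utf-16be": "utf-16be", "utf-16le": "utf-16le", "utf-16": "utf-16le",
    "windows-1252": "windows-1252", "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
}

PREFIX = b'@charset "'  # bytes 40 63 68 61 72 73 65 74 20 22


def get_an_encoding(label: Optional[str]) -> Optional[str]:
    """Map a label to an encoding name; None stands for failure."""
    if label is None:
        return None
    return LABELS.get(label.strip(" \t\n\f\r").lower())


def sniff_charset_label(data: bytes) -> Optional[str]:
    """Return the XX bytes of a leading encoding declaration, if present."""
    head = data[:1024]
    if not head.startswith(PREFIX):
        return None
    close = head.find(b'"', len(PREFIX))  # XX bytes never include 0x22
    if close == -1 or head[close + 1:close + 2] != b";":
        return None
    label = head[len(PREFIX):close]
    if any(byte > 0x7F for byte in label):
        return None
    return label.decode("ascii")


def determine_fallback_encoding(
    data: bytes,
    http_label: Optional[str] = None,
    environment_encoding: Optional[str] = None,
) -> str:
    encoding = get_an_encoding(http_label)                 # step 1
    if encoding is not None:
        return encoding
    encoding = get_an_encoding(sniff_charset_label(data))  # step 2
    if encoding in ("utf-16be", "utf-16le"):
        return "utf-8"
    if encoding is not None:
        return encoding
    if environment_encoding is not None:                   # step 3
        return environment_encoding
    return "utf-8"                                         # step 4
```

Note that a declaration written with single quotes is not recognized by the sniffing step, so the sketch falls through to the environment encoding or to utf-8 in that case.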
Though UTF-8 is the default encoding for the web, and many newer web-based file formats assume or require UTF-8 encoding, CSS was created before it was clear which encoding would win, and thus can't automatically assume the stylesheet is UTF-8. Stylesheet authors should author their stylesheets in UTF-8, and ensure that either an HTTP header (or equivalent method) declares the encoding of the stylesheet to be UTF-8, or that the referring document declares its encoding to be UTF-8. (In HTML, this is done by adding a <meta charset=utf-8> element to the head of the document.) If neither of these options is available, authors should begin the stylesheet with a UTF-8 BOM or the exact characters
@charset "utf-8";
Document languages that refer to CSS stylesheets that are decoded from bytes may define an environment encoding for each such stylesheet, which is used as a fallback when other encoding hints are not available or cannot be used. The concept of environment encoding only exists for compatibility with legacy content. New formats and new linking mechanisms should not provide an environment encoding, so the stylesheet defaults to UTF-8 instead in the absence of more explicit information. Note: [[HTML]] defines the environment encoding for <link rel=stylesheet>. Note: [[CSSOM]] defines the environment encoding for <?xml-stylesheet?>. Note: [[CSS-CASCADE-3]] defines the environment encoding for @import.

Preprocessing the input stream

The input stream consists of the [=filtered code points=] pushed into it as the input byte stream is decoded.
To filter code points from a stream of (unfiltered) [=code points=] |input|:

Tokenization

To tokenize a stream of code points |input| into a stream of CSS tokens, repeatedly [=tokenizer/consume a token=] from |input| until an <EOF-token> is reached, pushing each of the returned tokens into a stream. Note: Each call to the [=tokenizer/consume a token=] algorithm returns a single token, so it can also be used "on-demand" to tokenize a stream of code points during parsing, if so desired. The output of the tokenization step is a stream of zero or more of the following tokens: <ident-token>, <function-token>, <at-keyword-token>, <hash-token>, <string-token>, <bad-string-token>, <url-token>, <bad-url-token>, <delim-token>, <number-token>, <percentage-token>, <dimension-token>, <unicode-range-token>, <whitespace-token>, <CDO-token>, <CDC-token>, <colon-token>, <semicolon-token>, <comma-token>, <[-token>, <]-token>, <(-token>, <)-token>, <{-token>, and <}-token>. Note: The type flag of hash tokens is used in the Selectors syntax [[SELECT]]. Only hash tokens with the "id" type are valid ID selectors.

Token Railroad Diagrams

This section is non-normative. It presents an informative view of the tokenizer, in the form of railroad diagrams. Railroad diagrams are more compact than an explicit parser, but often easier to read than a regular expression. These diagrams are informative and incomplete; they describe the grammar of "correct" tokens, but do not describe error-handling at all. They are provided solely to make it easier to get an intuitive grasp of the syntax of each token. Diagrams with names such as <foo-token> represent tokens. The rest are productions referred to by other diagrams.
comment
			T: /*
			Star:
				N: anything but * followed by /
			T: */
			
newline
			Choice:
				T: \n
				T: \r\n
				T: \r
				T: \f
			
whitespace
			Choice:
				T: space
				T: \t
				N: newline
			
hex digit
			N: 0-9 a-f or A-F
			
escape
			T: \
			Choice:
				N: not newline or hex digit
				Seq:
					Plus:
						N: hex digit
						C: 1-6 times
					Opt: skip
						N: whitespace
			
<whitespace-token>
			Plus:
				N: whitespace
			
ws*
			Star:
				N: <whitespace-token>
			
<ident-token>
			Or: 1
				T: --
				Seq:
					Opt: skip
						T: -
					Or:
						N: a-z A-Z _ or non-ASCII
						N: escape
			Star:
				Or:
					N: a-z A-Z 0-9 _ - or non-ASCII
					N: escape
			
<function-token>
			N: <ident-token>
			T: (
			
<at-keyword-token>
			T: @
			N: <ident-token>
			
<hash-token>
			T: #
			Plus:
				Choice:
					N: a-z A-Z 0-9 _ - or non-ASCII
					N: escape
			
<string-token>
			Choice:
				Seq:
					T: "
					Star:
						Choice:
							N: not " \ or newline
							N: escape
							Seq:
								T: \
								N: newline
					T: "
				Seq:
					T: '
					Star:
						Choice:
							N: not ' \ or newline
							N: escape
							Seq:
								T: \
								N: newline
					T: '
			
<url-token>
			N: <ident-token "url">
			T: (
			N: ws*
			Star:
				Choice:
					N: not " ' ( ) \ ws or non-printable
					N: escape
			N: ws*
			T: )
			
<number-token>
			Choice: 1
				T: +
				Skip:
				T: -
			Choice:
				Seq:
					Plus:
						N: digit
					T: .
					Plus:
						N: digit
				Plus:
					N: digit
				Seq:
					T: .
					Plus:
						N: digit
			Opt: skip
				Seq:
					Choice:
						T: e
						T: E
					Choice: 1
						T: +
						S:
						T: -
					Plus:
						N: digit
			
<dimension-token>
			N: <number-token>
			N: <ident-token>
			
<percentage-token>
			N: <number-token>
			T: %
			
<CDO-token>
			T: <!--
			
<CDC-token>
			T: -->
			
<unicode-range-token>
			Choice:
				T: U
				T: u
			T: +
			Choice:
				OneOrMore:
					N: hex digit
					C: 1-6 times
				Seq:
					ZeroOrMore:
						N: hex digit
						C: 1-5 times
					OneOrMore:
						T: ?
						C: 1 to (6-digits) times
				Seq:
					OneOrMore:
						N: hex digit
						C: 1-6 times
					T: -
					OneOrMore:
						N: hex digit
						C: 1-6 times
			

Definitions

This section defines several terms used during the tokenization phase.
next input code point
The first code point in the input stream that has not yet been consumed.
current input code point
The last code point to have been consumed.
reconsume the current input code point
Push the current input code point back onto the front of the input stream, so that the next time you are instructed to consume the next input code point, it will instead reconsume the current input code point.
EOF code point
A conceptual code point representing the end of the input stream. Whenever the input stream is empty, the next input code point is always an EOF code point.
digit
A code point between U+0030 DIGIT ZERO (0) and U+0039 DIGIT NINE (9) inclusive.
hex digit
A digit, or a code point between U+0041 LATIN CAPITAL LETTER A (A) and U+0046 LATIN CAPITAL LETTER F (F) inclusive, or a code point between U+0061 LATIN SMALL LETTER A (a) and U+0066 LATIN SMALL LETTER F (f) inclusive.
uppercase letter
A code point between U+0041 LATIN CAPITAL LETTER A (A) and U+005A LATIN CAPITAL LETTER Z (Z) inclusive.
lowercase letter
A code point between U+0061 LATIN SMALL LETTER A (a) and U+007A LATIN SMALL LETTER Z (z) inclusive.
letter
An uppercase letter or a lowercase letter.
non-ASCII ident code point
A code point whose value is any of: * U+00B7 * between U+00C0 and U+00D6 * between U+00D8 and U+00F6 * between U+00F8 and U+037D * between U+037F and U+1FFF * U+200C * U+200D * U+203F * U+2040 * between U+2070 and U+218F * between U+2C00 and U+2FEF * between U+3001 and U+D7FF * between U+F900 and U+FDCF * between U+FDF0 and U+FFFD * greater than or equal to U+10000 All of these ranges are inclusive.
Why these characters, specifically? This matches the list of non-ASCII codepoints allowed to be used in HTML [=valid custom element names=]. It excludes a number of characters that appear as whitespace, or that can cause rendering or parsing issues in some tools, such as the direction override codepoints. Note that this is a weaker set of restrictions than UAX 31 recommends for identifiers (used by languages such as JavaScript to restrict their identifier syntax), allowing things such as starting an identifier with a combining character. Consistency with HTML custom element names (and thus, the ability to write selectors for all custom elements without having to use escapes) was considered valuable, and the set of characters restricted by HTML covers the "high value" restrictions well. These restrictions do not avoid all possible confusing renderings; mixing characters from LTR and RTL scripts can still result in unexpected visual transposition in most text editors, for example. Source text can contain the restricted characters in non-ident contexts, as well: most of them are completely valid in strings, for example. Even when used in a way that creates invalid CSS, the parsing errors they cause might be limited to something unimportant, while their effect on rendering the source text in code review tools might be significant and/or malicious. For more details on these sorts of "source text attacks", see this Rust-lang blog post (archived).
ident-start code point
A letter, a non-ASCII ident code point, or U+005F LOW LINE (_).
ident code point
An ident-start code point, a digit, or U+002D HYPHEN-MINUS (-).
non-printable code point
A code point between U+0000 NULL and U+0008 BACKSPACE inclusive, or U+000B LINE TABULATION, or a code point between U+000E SHIFT OUT and U+001F INFORMATION SEPARATOR ONE inclusive, or U+007F DELETE.
newline
U+000A LINE FEED. Note that U+000D CARRIAGE RETURN and U+000C FORM FEED are not included in this definition, as they are converted to U+000A LINE FEED during preprocessing.
whitespace
A newline, U+0009 CHARACTER TABULATION, or U+0020 SPACE.
maximum allowed code point
The greatest code point defined by Unicode: U+10FFFF.
ident sequence
A sequence of [=code points=] that has the same syntax as an <ident-token>. Note: The part of an <at-keyword-token> after the "@", the part of a <hash-token> (with the "id" type flag) after the "#", the part of a <function-token> before the "(", and the unit of a <dimension-token> are all [=ident sequences=].
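The code point classes defined in this section map directly onto small predicates. The following Python sketch is illustrative only; the function names are hypothetical, each function assumes it is given a single code point as a one-character string, and the two pairs of adjacent single code points (U+200C/U+200D and U+203F/U+2040) are merged into ranges for brevity.

```python
# Inclusive ranges from the "non-ASCII ident code point" definition.
NON_ASCII_IDENT_RANGES = [
    (0x00B7, 0x00B7), (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x037D),
    (0x037F, 0x1FFF), (0x200C, 0x200D), (0x203F, 0x2040), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def is_hex_digit(ch: str) -> bool:
    return is_digit(ch) or "A" <= ch <= "F" or "a" <= ch <= "f"


def is_letter(ch: str) -> bool:
    return "A" <= ch <= "Z" or "a" <= ch <= "z"


def is_non_ascii_ident(ch: str) -> bool:
    return any(lo <= ord(ch) <= hi for lo, hi in NON_ASCII_IDENT_RANGES)


def is_ident_start(ch: str) -> bool:
    return is_letter(ch) or is_non_ascii_ident(ch) or ch == "_"


def is_ident_code_point(ch: str) -> bool:
    return is_ident_start(ch) or is_digit(ch) or ch == "-"


def is_non_printable(ch: str) -> bool:
    cp = ord(ch)
    return cp <= 0x08 or cp == 0x0B or 0x0E <= cp <= 0x1F or cp == 0x7F


def is_whitespace(ch: str) -> bool:
    # U+000D and U+000C are absent: preprocessing has already turned them
    # into U+000A by the time these predicates are applied.
    return ch in ("\n", "\t", " ")
```

For example, `is_ident_start("-")` is false while `is_ident_code_point("-")` is true, which is why a lone hyphen cannot begin an ident sequence but may appear inside one.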

Tokenizer Algorithms

The algorithms defined in this section transform a stream of code points into a stream of tokens.

Consume a token

This section describes how to consume a token from a stream of code points. It additionally takes an optional boolean |unicode ranges allowed|, defaulting to false. It will return a single token of any type. Consume comments. Consume the next input code point.
whitespace
Consume as much whitespace as possible. Return a <whitespace-token>.
U+0022 QUOTATION MARK (")
Consume a string token and return it.
U+0023 NUMBER SIGN (#)
If the next input code point is an ident code point or the next two input code points are a valid escape, then:
  1. Create a <hash-token>.
  2. If the next 3 input code points would start an ident sequence, set the <hash-token>’s type flag to "id".
  3. Consume an ident sequence, and set the <hash-token>’s value to the returned string.
  4. Return the <hash-token>.
Otherwise, return a <delim-token> with its value set to the current input code point.
U+0027 APOSTROPHE (')
Consume a string token and return it.
U+0028 LEFT PARENTHESIS (()
Return a <(-token>.
U+0029 RIGHT PARENTHESIS ())
Return a <)-token>.
U+002B PLUS SIGN (+)
If the input stream starts with a number, reconsume the current input code point, consume a numeric token, and return it. Otherwise, return a <delim-token> with its value set to the current input code point.
U+002C COMMA (,)
Return a <comma-token>.
U+002D HYPHEN-MINUS (-)
If the input stream starts with a number, reconsume the current input code point, consume a numeric token, and return it. Otherwise, if the next 2 input code points are U+002D HYPHEN-MINUS U+003E GREATER-THAN SIGN (->), consume them and return a <CDC-token>. Otherwise, if the input stream starts with an ident sequence, reconsume the current input code point, consume an ident-like token, and return it. Otherwise, return a <delim-token> with its value set to the current input code point.
U+002E FULL STOP (.)
If the input stream starts with a number, reconsume the current input code point, consume a numeric token, and return it. Otherwise, return a <delim-token> with its value set to the current input code point.
U+003A COLON (:)
Return a <colon-token>.
U+003B SEMICOLON (;)
Return a <semicolon-token>.
U+003C LESS-THAN SIGN (<)
If the next 3 input code points are U+0021 EXCLAMATION MARK U+002D HYPHEN-MINUS U+002D HYPHEN-MINUS (!--), consume them and return a <CDO-token>. Otherwise, return a <delim-token> with its value set to the current input code point.
U+0040 COMMERCIAL AT (@)
If the next 3 input code points would start an ident sequence, consume an ident sequence, create an <at-keyword-token> with its value set to the returned value, and return it. Otherwise, return a <delim-token> with its value set to the current input code point.
U+005B LEFT SQUARE BRACKET ([)
Return a <[-token>.
U+005C REVERSE SOLIDUS (\)
If the input stream starts with a valid escape, reconsume the current input code point, consume an ident-like token, and return it. Otherwise, this is a parse error. Return a <delim-token> with its value set to the current input code point.
U+005D RIGHT SQUARE BRACKET (])
Return a <]-token>.
U+007B LEFT CURLY BRACKET ({)
Return a <{-token>.
U+007D RIGHT CURLY BRACKET (})
Return a <}-token>.
digit
Reconsume the current input code point, consume a numeric token, and return it.
U+0055 LATIN CAPITAL LETTER U (U)
U+0075 LATIN SMALL LETTER U (u)
If |unicode ranges allowed| is true and the input stream [=would start a unicode-range=], [=reconsume the current input code point=], [=consume a unicode-range token=], and return it. Otherwise, [=reconsume the current input code point=], [=consume an ident-like token=], and return it.
ident-start code point
Reconsume the current input code point, consume an ident-like token, and return it.
EOF
Return an <EOF-token>.
anything else
Return a <delim-token> with its value set to the current input code point.

Consume comments

This section describes how to consume comments from a stream of code points. It returns nothing. If the next two input code points are U+002F SOLIDUS (/) followed by a U+002A ASTERISK (*), consume them and all following code points up to and including the first U+002A ASTERISK (*) followed by a U+002F SOLIDUS (/), or up to an EOF code point. Return to the start of this step. If the preceding paragraph ended by consuming an EOF code point, this is a parse error. Return nothing.
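As a rough illustration, the comment-skipping loop can be written over a string and an index. The function name `consume_comments` and its `(position, parse_error)` return shape are assumptions of this sketch; the algorithm itself returns nothing and simply advances the input stream.

```python
from typing import Tuple


def consume_comments(text: str, pos: int = 0) -> Tuple[int, bool]:
    """Skip any run of comments starting at pos.

    Returns the index of the first code point after the comments, plus a
    flag that is True when an unterminated comment ran into EOF.
    """
    while text.startswith("/*", pos):
        # Search after the opening "/*" so its "*" cannot close the comment.
        end = text.find("*/", pos + 2)
        if end == -1:
            return len(text), True  # consumed to EOF: parse error
        pos = end + 2
    return pos, False
```

Because the search for the closing delimiter starts after the opening `/*`, the input `/*/` is treated as an unterminated comment rather than a complete one.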

Consume a numeric token

This section describes how to consume a numeric token from a stream of code points. It returns either a <number-token>, <percentage-token>, or <dimension-token>. Consume a number and let |number| be the result. If the next 3 input code points would start an ident sequence, then:
  1. Create a <dimension-token> with the same value, type flag, and sign character as |number|, and a unit set initially to the empty string.
  2. Consume an ident sequence. Set the <dimension-token>’s unit to the returned value.
  3. Return the <dimension-token>.
Otherwise, if the next input code point is U+0025 PERCENTAGE SIGN (%), consume it. Create a <percentage-token> with the same value and sign character as |number|, and return it. Otherwise, create a <number-token> with the same value, type flag, and sign character as |number|, and return it.

Consume an ident-like token

This section describes how to consume an ident-like token from a stream of code points. It returns an <ident-token>, <function-token>, <url-token>, or <bad-url-token>. Consume an ident sequence, and let |string| be the result. If |string|’s value is an ASCII case-insensitive match for "url", and the next input code point is U+0028 LEFT PARENTHESIS ((), consume it. While the next two input code points are whitespace, consume the next input code point. If the next one or two input code points are U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), or whitespace followed by U+0022 QUOTATION MARK (") or U+0027 APOSTROPHE ('), then create a <function-token> with its value set to |string| and return it. Otherwise, consume a url token, and return it. Otherwise, if the next input code point is U+0028 LEFT PARENTHESIS ((), consume it. Create a <function-token> with its value set to |string| and return it. Otherwise, create an <ident-token> with its value set to |string| and return it.

Consume a string token

This section describes how to consume a string token from a stream of code points. It returns either a <string-token> or <bad-string-token>. This algorithm may be called with an ending code point, which denotes the code point that ends the string. If an ending code point is not specified, the current input code point is used. Initially create a <string-token> with its value set to the empty string. Repeatedly consume the next input code point from the stream:
ending code point
Return the <string-token>.
EOF
This is a parse error. Return the <string-token>.
newline
This is a parse error. Reconsume the current input code point, create a <bad-string-token>, and return it.
U+005C REVERSE SOLIDUS (\)
If the next input code point is EOF, do nothing. Otherwise, if the next input code point is a newline, consume it. Otherwise, (the stream starts with a valid escape) consume an escaped code point and append the returned code point to the <string-token>’s value.
anything else
Append the current input code point to the <string-token>’s value.

Consume a url token

This section describes how to consume a url token from a stream of code points. It returns either a <url-token> or a <bad-url-token>. Note: This algorithm assumes that the initial "url(" has already been consumed. This algorithm also assumes that it's being called to consume an "unquoted" value, like ''url(foo)''. A quoted value, like ''url("foo")'', is parsed as a <function-token>. Consume an ident-like token automatically handles this distinction; this algorithm shouldn't be called directly otherwise.
  1. Initially create a <url-token> with its value set to the empty string.
  2. Consume as much whitespace as possible.
  3. Repeatedly consume the next input code point from the stream:
    U+0029 RIGHT PARENTHESIS ())
    Return the <url-token>.
    EOF
    This is a parse error. Return the <url-token>.
    whitespace
    Consume as much whitespace as possible. If the next input code point is U+0029 RIGHT PARENTHESIS ()) or EOF, consume it and return the <url-token> (if EOF was encountered, this is a parse error); otherwise, consume the remnants of a bad url, create a <bad-url-token>, and return it.
    U+0022 QUOTATION MARK (")
    U+0027 APOSTROPHE (')
    U+0028 LEFT PARENTHESIS (()
    non-printable code point
    This is a parse error. Consume the remnants of a bad url, create a <bad-url-token>, and return it.
    U+005C REVERSE SOLIDUS (\)
    If the stream starts with a valid escape, consume an escaped code point and append the returned code point to the <url-token>’s value. Otherwise, this is a parse error. Consume the remnants of a bad url, create a <bad-url-token>, and return it.
    anything else
    Append the current input code point to the <url-token>’s value.

Consume an escaped code point

This section describes how to consume an escaped code point. It assumes that the U+005C REVERSE SOLIDUS (\) has already been consumed and that the next input code point has already been verified to be part of a valid escape. It will return a code point. Consume the next input code point.
hex digit
Consume as many hex digits as possible, but no more than 5. Note that this means 1-6 hex digits have been consumed in total. If the next input code point is whitespace, consume it as well. Interpret the hex digits as a hexadecimal number. If this number is zero, or is for a surrogate, or is greater than the maximum allowed code point, return U+FFFD REPLACEMENT CHARACTER (�). Otherwise, return the code point with that value.
EOF
This is a parse error. Return U+FFFD REPLACEMENT CHARACTER (�).
anything else
Return the current input code point.
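A minimal Python sketch of this algorithm follows. It assumes the input is a preprocessed string, that `pos` is the index just after the backslash, and that the caller wants the new position back alongside the code point; the name `consume_escaped_code_point` and that return shape are choices of this sketch.

```python
from typing import Tuple

HEX_DIGITS = "0123456789abcdefABCDEF"
REPLACEMENT = "\ufffd"


def consume_escaped_code_point(text: str, pos: int) -> Tuple[str, int]:
    """Return (code point, index after the escape)."""
    if pos >= len(text):
        return REPLACEMENT, pos  # EOF: parse error
    ch = text[pos]
    if ch not in HEX_DIGITS:
        return ch, pos + 1  # anything else: the code point stands for itself
    # One hex digit is already accepted; take up to 5 more (6 in total).
    end = pos + 1
    while end < len(text) and end - pos < 6 and text[end] in HEX_DIGITS:
        end += 1
    value = int(text[pos:end], 16)
    # A single trailing whitespace code point belongs to the escape.
    if end < len(text) and text[end] in " \t\n":
        end += 1
    if value == 0 or 0xD800 <= value <= 0xDFFF or value > 0x10FFFF:
        return REPLACEMENT, end
    return chr(value), end
```

For instance, in `\26 B` the escape consumes `26` and the following space and yields `&`, leaving `B` as an ordinary code point.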

Check if two code points are a valid escape

This section describes how to check if two code points are a valid escape. The algorithm described here can be called explicitly with two code points, or can be called with the input stream itself. In the latter case, the two code points in question are the current input code point and the next input code point, in that order. Note: This algorithm will not consume any additional code point. If the first code point is not U+005C REVERSE SOLIDUS (\), return false. Otherwise, if the second code point is a newline, return false. Otherwise, return true.

Check if three code points would start an ident sequence

This section describes how to check if three code points would start an [=ident sequence=]. The algorithm described here can be called explicitly with three code points, or can be called with the input stream itself. In the latter case, the three code points in question are the current input code point and the next two input code points, in that order. Note: This algorithm will not consume any additional code points. Look at the first code point:
U+002D HYPHEN-MINUS
If the second code point is an ident-start code point or a U+002D HYPHEN-MINUS, or the second and third code points are a valid escape, return true. Otherwise, return false.
ident-start code point
Return true.
U+005C REVERSE SOLIDUS (\)
If the first and second code points are a valid escape, return true. Otherwise, return false.
anything else
Return false.
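The two checks above (valid escape, and start of an ident sequence) can be sketched together in Python. The function names are hypothetical, and the empty string `""` is used here as a stand-in for the EOF code point, which is an assumption of this sketch.

```python
# Inclusive ranges from the non-ASCII ident code point definition.
_NON_ASCII_IDENT = [
    (0xB7, 0xB7), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x203F, 0x2040), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0x10FFFF),
]


def is_ident_start(ch: str) -> bool:
    if len(ch) != 1:  # "" stands for the EOF code point
        return False
    if ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z":
        return True
    return any(lo <= ord(ch) <= hi for lo, hi in _NON_ASCII_IDENT)


def is_valid_escape(first: str, second: str) -> bool:
    if first != "\\":
        return False
    return second != "\n"


def would_start_ident_sequence(first: str, second: str, third: str) -> bool:
    if first == "-":
        return (
            is_ident_start(second)
            or second == "-"
            or is_valid_escape(second, third)
        )
    if is_ident_start(first):
        return True
    if first == "\\":
        return is_valid_escape(first, second)
    return False
```

Neither function consumes anything; both only inspect the code points they are handed, which is what lets the tokenizer use them as lookahead.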

Check if three code points would start a number

This section describes how to check if three code points would start a number. The algorithm described here can be called explicitly with three code points, or can be called with the input stream itself. In the latter case, the three code points in question are the current input code point and the next two input code points, in that order. Note: This algorithm will not consume any additional code points. Look at the first code point:
U+002B PLUS SIGN (+)
U+002D HYPHEN-MINUS (-)
If the second code point is a digit, return true. Otherwise, if the second code point is a U+002E FULL STOP (.) and the third code point is a digit, return true. Otherwise, return false.
U+002E FULL STOP (.)
If the second code point is a digit, return true. Otherwise, return false.
digit
Return true.
anything else
Return false.
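The same lookahead can be sketched in Python. The function names are hypothetical, and `""` stands in for the EOF code point as an assumption of this sketch.

```python
def is_digit(ch: str) -> bool:
    return len(ch) == 1 and "0" <= ch <= "9"


def would_start_number(first: str, second: str, third: str) -> bool:
    if first in ("+", "-"):
        if is_digit(second):
            return True
        return second == "." and is_digit(third)
    if first == ".":
        return is_digit(second)
    return is_digit(first)
```

This is why `-.5` begins a number while `-.e` does not: after a sign, a full stop only counts when a digit follows it.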

Check if three code points would start a unicode-range

This section describes how to check if three code points would start a unicode-range. The algorithm described here can be called explicitly with three code points, or can be called with the input stream itself. In the latter case, the three code points in question are the current input code point and the next two input code points, in that order. Note: This algorithm will not consume any additional [=code points=]. If all of the following are true: * The first code point is either U+0055 LATIN CAPITAL LETTER U (U) or U+0075 LATIN SMALL LETTER U (u) * The second code point is U+002B PLUS SIGN (+). * The third code point is either U+003F QUESTION MARK (?) or a [=hex digit=] then return true. Otherwise return false.
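Expressed as a Python sketch (hypothetical function names, with `""` standing in for the EOF code point), the three conditions become a single boolean expression:

```python
def is_hex_digit(ch: str) -> bool:
    return len(ch) == 1 and (
        "0" <= ch <= "9" or "A" <= ch <= "F" or "a" <= ch <= "f"
    )


def would_start_unicode_range(first: str, second: str, third: str) -> bool:
    return (
        first in ("U", "u")
        and second == "+"
        and (third == "?" or is_hex_digit(third))
    )
```

The check is deliberately shallow: it only decides whether to attempt a unicode-range token, and does not validate the rest of the range.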

Consume an ident sequence

This section describes how to consume an ident sequence from a stream of code points. It returns a string containing the largest name that can be formed from adjacent code points in the stream, starting from the first. Note: This algorithm does not do the verification of the first few code points that are necessary to ensure the returned code points would constitute an <>. If that is the intended use, ensure that the stream starts with an ident sequence before calling this algorithm. Let result initially be an empty string. Repeatedly consume the next input code point from the stream:
ident code point
Append the code point to result.
the stream starts with a valid escape
Consume an escaped code point. Append the returned code point to result.
anything else
Reconsume the current input code point. Return result.

Consume a number

This section describes how to consume a number from a stream of code points. It returns a numeric |value|, a string |type| which is either "integer" or "number", and an optional |sign character| which is either "+", "-", or missing. Note: This algorithm does not do the verification of the first few code points that are necessary to ensure a number can be obtained from the stream. Ensure that the stream starts with a number before calling this algorithm. Execute the following steps in order:
  1. Let |type| be the string "integer". Let |number part| and |exponent part| be the empty string.
  2. If the next input code point is U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS (-), consume it. Append it to |number part| and set |sign character| to it.
  3. While the next input code point is a digit, consume it and append it to |number part|.
  4. If the next 2 input code points are U+002E FULL STOP (.) followed by a digit, then:
    1. Consume the [=next input code point=] and append it to |number part|.
    2. While the next input code point is a digit, consume it and append it to |number part|.
    3. Set |type| to "number".
  5. If the next 2 or 3 input code points are U+0045 LATIN CAPITAL LETTER E (E) or U+0065 LATIN SMALL LETTER E (e), optionally followed by U+002D HYPHEN-MINUS (-) or U+002B PLUS SIGN (+), followed by a digit, then:
    1. Consume the [=next input code point=].
    2. If the [=next input code point=] is "+" or "-", consume it and append it to |exponent part|.
    3. While the next input code point is a digit, consume it and append it to |exponent part|.
    4. Set |type| to "number".
  6. Let |value| be the result of interpreting |number part| as a base-10 number. If |exponent part| is non-empty, interpret it as a base-10 integer, then raise 10 to the power of the result, multiply it by |value|, and set |value| to that result.
  7. Return |value|, |type|, and |sign character|.
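The steps above can be sketched as follows. This is a non-authoritative illustration: the function name, the string-plus-position model of the input stream, and the extra returned position are all assumptions of the sketch. As in the algorithm, the caller is expected to have verified that the input starts with a number.

```python
DIGITS = set("0123456789")


def consume_number(text: str, pos: int = 0):
    """Return (value, type, sign_character, new_pos).

    Assumptions: `text[pos:]` already starts with a number, and a missing
    sign character is reported as None.
    """
    def peek(offset: int = 0) -> str:
        i = pos + offset
        return text[i] if i < len(text) else ""  # "" stands in for EOF

    number_type = "integer"                      # step 1
    number_part = ""
    exponent_part = ""
    sign_character = None

    if peek() in ("+", "-"):                     # step 2
        sign_character = peek()
        number_part += peek()
        pos += 1
    while peek() in DIGITS:                      # step 3
        number_part += peek()
        pos += 1
    if peek() == "." and peek(1) in DIGITS:      # step 4
        number_part += peek()
        pos += 1
        while peek() in DIGITS:
            number_part += peek()
            pos += 1
        number_type = "number"
    if peek() in ("e", "E") and (                # step 5
        peek(1) in DIGITS
        or (peek(1) in ("+", "-") and peek(2) in DIGITS)
    ):
        pos += 1                                 # the "e" itself is not kept
        if peek() in ("+", "-"):
            exponent_part += peek()
            pos += 1
        while peek() in DIGITS:
            exponent_part += peek()
            pos += 1
        number_type = "number"

    if number_type == "integer":                 # step 6
        value = int(number_part)
    else:
        value = float(number_part)
        if exponent_part:
            value *= 10 ** int(exponent_part)
    return value, number_type, sign_character, pos  # step 7
```

Note that an "e" is only treated as an exponent marker when a digit (optionally preceded by a sign) follows it, so an input such as 3em yields the integer 3 and leaves em for the caller.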

Consume a unicode-range token

This section describes how to consume a unicode-range token from a stream of [=code points=]. It returns a <unicode-range-token>. Note: This algorithm does not do the verification of the first few code points that are necessary to ensure the returned code points would constitute a <unicode-range-token>. Ensure that the stream [=would start a unicode-range=] before calling this algorithm. Note: This token is not produced by the tokenizer under normal circumstances. This algorithm is only called during [=consume the value of a unicode-range descriptor=], which itself is only called as a special case for parsing the '@font-face/unicode-range' descriptor; this single invocation in the entire language is due to a bad syntax design in early CSS.
  1. Consume the [=next input code point|next two input code points=] and discard them.
  2. Consume as many [=hex digits=] as possible, but no more than 6. If fewer than 6 hex digits were consumed, consume as many U+003F QUESTION MARK (?) code points as possible, but no more than enough to make the total of hex digits and U+003F QUESTION MARK (?) code points equal to 6. Let |first segment| be the consumed code points.
  3. If |first segment| contains any question mark code points, then:
    1. Replace the question marks in |first segment| with U+0030 DIGIT ZERO (0) [=code points=], and interpret the result as a hexadecimal number. Let this be |start of range|.
    2. Replace the question marks in |first segment| with U+0046 LATIN CAPITAL LETTER F (F) [=code points=], and interpret the result as a hexadecimal number. Let this be |end of range|.
    3. Return a new <unicode-range-token> starting at |start of range| and ending at |end of range|.
  4. Otherwise, interpret |first segment| as a hexadecimal number, and let the result be |start of range|.
  5. If the [=next input code point|next 2 input code points=] are U+002D HYPHEN-MINUS (-) followed by a [=hex digit=], then:
    1. Consume the [=next input code point=].
    2. Consume as many [=hex digits=] as possible, but no more than 6. Interpret the consumed code points as a hexadecimal number. Let this be |end of range|.
    3. Return a new <unicode-range-token> starting at |start of range| and ending at |end of range|.
  6. Otherwise, return a new <unicode-range-token> both starting and ending at |start of range|.
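A minimal sketch of this algorithm, under stated assumptions, is shown below. The function name, the string-plus-position input model, and the representation of the resulting token as a plain (start, end) pair are hypothetical choices for illustration; the caller is assumed to have checked that the input would start a unicode-range.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def consume_unicode_range(text: str, pos: int = 0):
    """Return ((start_of_range, end_of_range), new_pos).

    Assumption: `text[pos:]` begins with "U+" or "u+" followed by "?" or a
    hex digit, as required before calling the algorithm.
    """
    pos += 2  # step 1: discard the "U+" prefix

    # Step 2: up to 6 hex digits, then "?" padding up to 6 code points total.
    first_segment = ""
    while pos < len(text) and len(first_segment) < 6 and text[pos] in HEX_DIGITS:
        first_segment += text[pos]
        pos += 1
    while pos < len(text) and len(first_segment) < 6 and text[pos] == "?":
        first_segment += "?"
        pos += 1

    # Step 3: wildcards expand to the lowest and highest matching values.
    if "?" in first_segment:
        start = int(first_segment.replace("?", "0"), 16)
        end = int(first_segment.replace("?", "F"), 16)
        return (start, end), pos

    start = int(first_segment, 16)  # step 4

    # Step 5: an explicit "-<hex digits>" end of range.
    if pos + 1 < len(text) and text[pos] == "-" and text[pos + 1] in HEX_DIGITS:
        pos += 1
        second_segment = ""
        while pos < len(text) and len(second_segment) < 6 and text[pos] in HEX_DIGITS:
            second_segment += text[pos]
            pos += 1
        return (start, int(second_segment, 16)), pos

    return (start, start), pos  # step 6: a single code point
```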

Consume the remnants of a bad url

This section describes how to consume the remnants of a bad url from a stream of code points, "cleaning up" after the tokenizer realizes that it's in the middle of a <> rather than a <>. It returns nothing; its sole use is to consume enough of the input stream to reach a recovery point where normal tokenizing can resume. Repeatedly consume the next input code point from the stream:
U+0029 RIGHT PARENTHESIS ())
EOF
Return.
the input stream starts with a valid escape
Consume an escaped code point. This allows an escaped right parenthesis ("\)") to be encountered without ending the <>. This is otherwise identical to the "anything else" clause.
anything else
Do nothing.
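The recovery loop above might be sketched like this. The function name and the string-plus-position model are assumptions. The sketch also simplifies "consume an escaped code point" to skipping a single code point after the backslash; for the purpose of finding the recovery point this should end at the same position, since any remaining hex digits or whitespace of a longer escape fall under "anything else" and are skipped anyway.

```python
def consume_bad_url_remnants(text: str, pos: int) -> int:
    """Return the position at which normal tokenizing can resume.

    Assumption: escapes are simplified to "backslash plus one code point".
    """
    while pos < len(text):
        c = text[pos]
        pos += 1
        if c == ")":
            # Unescaped right parenthesis: recovery point reached.
            return pos
        if c == "\\" and pos < len(text) and text[pos] != "\n":
            # Valid escape: skip the escaped code point so that "\)" does
            # not end the bad url early.
            pos += 1
        # Anything else: do nothing and keep scanning.
    return pos  # EOF
```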

Parsing

The CSS parser converts a [=token stream=] (produced by the tokenization process, defined earlier in this spec) into one or more of several CSS constructs (depending on which parsing algorithm is invoked).

Parser Railroad Diagrams

This section is non-normative. This section presents an informative view of the parser, in the form of railroad diagrams. These diagrams are informative and incomplete; they describe the grammar of "correct" stylesheets, but do not describe error-handling at all. They are provided solely to make it easier to get an intuitive grasp of the syntax.
Stylesheet
			Star:
				Choice: 3
					N: 
					N: 
					N: 
					N: Qualified rule
					N: At-rule
			
At-rule
			N: 
			Star:
				N: Component value
			Choice:
				N: {} block
				T: ;
			
Qualified rule
			Star:
				N: Component value
			N: {} block
			
{} block
			T: {
			N: ws*
			Star:
				Choice:
					Seq:
						N: Declaration
						T: ;
					N: At-rule
					N: Qualified rule
				N: ws*
			N: ws*
			T: }
			
Declaration
			N: 
			N: ws*
			T: :
			Star:
				N: Component value
			Opt: skip
				N: !important
			
!important
			T: !
			N: ws*
			N: 
			N: ws*
			
Component value
			Choice:
				N: Preserved token
				N: Simple block
				N: Function block
			
Simple block
			Choice:
				Seq:
					T: {
					Star:
						N: Component value
					T: }
				Seq:
					T: (
					Star:
						N: Component value
					T: )
				Seq:
					T: [
					Star:
						N: Component value
					T: ]
			
Function block
			N: 
			Star:
				N: Component value
			T: )
			

CSS Parsing Results

The result of parsing can be any of the following (or lists of these):
stylesheet
A stylesheet has a list of [=rules=].
rule
A [=rule=] is either an [=at-rule=] or a [=qualified rule=].
[=at-rule=]
An [=at-rule=] has a name which is a [=string=], and a prelude consisting of a list of [=component values=]. [=Block at-rules=] (ending in a {}-block) will additionally have a list of [=declarations=] and a list of child [=rules=]. ([=Statement at-rules=], ending in a semicolon, will not.)
qualified rule
A qualified rule has a prelude consisting of a list of component values, a list of declarations, and a list of child rules. Note: Most qualified rules will be style rules, where the prelude is a selector [[selectors-4]] and its declarations are [=properties=].
declaration
A declaration has a name which is a [=string=], a value consisting of a list of [=component values=], and an important flag which is initially unset. It also has an optional |original text| which is a [=string=] (captured for only a few declarations). Declarations are further categorized as property declarations or descriptor declarations, with the former setting CSS [=properties=] and appearing most often in qualified rules and the latter setting CSS [=descriptors=], which appear only in at-rules. (This categorization does not occur at the Syntax level; instead, it is a product of where the declaration appears, and is defined by the respective specifications defining the given rule.)
component value
A component value is one of the [=preserved tokens=], a [=function=], or a [=simple block=].
preserved tokens
Any token produced by the tokenizer except for <>s, <{-token>s, <(-token>s, and <[-token>s. Note: The non-[=preserved tokens=] listed above are always consumed into higher-level objects, either functions or simple blocks, and so never appear in any parser output themselves. Note: The tokens <}-token>s, <)-token>s, <]-token>, <>, and <> are always parse errors, but they are preserved in the token stream by this specification to allow other specs, such as Media Queries, to define more fine-grained error-handling than just dropping an entire declaration or block.
function
A function has a name and a value consisting of a list of [=component values=].
simple block
{}-block
[]-block
()-block
A simple block has an associated token (either a <[-token>, <(-token>, or <{-token>) and a value consisting of a list of component values. [={}-block=], [=[]-block=], and [=()-block=] refer specifically to a [=simple block=] with that corresponding associated token.

Token Streams

A token stream is a [=struct=] representing a stream of [=tokens=] and/or [=component values=]. It has the following [=struct/items=]:
tokens
A [=list=] of [=tokens=] and/or [=component values=]. Note: This specification assumes, for simplicity, that the input stream has been fully tokenized before parsing begins. However, the parsing algorithms only use one token of "lookahead", so in practice tokenization and parsing can be done in lockstep.
index
An index into the [=token stream/tokens=], representing the progress of parsing. It starts at 0 initially. Note: Aside from [=token stream/marking=], the [=token stream/index=] never goes backwards. Thus the already-processed prefix of [=token stream/tokens=] can be eagerly discarded as it's processed.
marked indexes
A [=stack=] of index values, representing points that the parser might return to. It starts empty initially.
CSS has a small number of places that require referencing the precise text that was parsed for a declaration's value (not just what tokens were produced from that text). This is not explicitly described in the algorithmic structure here, but the [=token stream=] must, thus, have the ability to reproduce the original text of declarations on demand. See [=consume a declaration=] for details on when this is required.
Several operations can be performed on a [=token stream=]:
next token
The item of [=token stream/tokens=] at [=token stream/index=]. If that index would be out-of-bounds past the end of the list, it's instead an <>.
empty
A token stream is [=token stream/empty=] if the [=next token=] is an <>.
consume a token
Let |token| be the [=token stream/next token=]. Increment [=token stream/index=], then return |token|.
discard a token
If the [=token stream=] is not [=empty=], increment [=token stream/index=].
mark
Append [=token stream/index=] to [=token stream/marked indexes=].
restore a mark
[=stack/Pop=] from [=token stream/marked indexes=], and set [=token stream/index=] to the popped value.
discard a mark
[=stack/Pop=] from [=token stream/marked indexes=], and do nothing with the popped value.
discard whitespace
While the [=next token=] is a <>, [=discard a token=].
process
To [=token stream/process=], given a following list of token types and associated actions, perform the action associated with the [=next token=]. Repeat until one of the actions returns something, then return that.
An <> is a conceptual token, not actually produced by the tokenizer, used to indicate that the [=token stream=] has been exhausted.
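The struct and its operations could be modelled roughly as below. This is a sketch under assumptions, not a normative data structure: tokens are represented as hypothetical `(type, value)` tuples, the type names "whitespace" and "EOF" are invented for the example, and the process operation is left to the caller.

```python
# Conceptual end-of-input token; never produced by a tokenizer.
EOF = ("EOF", None)


class TokenStream:
    """Sketch of a token stream: tokens, an index, and a stack of marks."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.index = 0             # progress of parsing
        self.marked_indexes = []   # points the parser might return to

    def next_token(self):
        # Reading past the end yields the conceptual EOF token.
        if self.index >= len(self.tokens):
            return EOF
        return self.tokens[self.index]

    def is_empty(self) -> bool:
        return self.next_token() == EOF

    def consume_token(self):
        token = self.next_token()
        self.index += 1
        return token

    def discard_token(self) -> None:
        if not self.is_empty():
            self.index += 1

    def mark(self) -> None:
        self.marked_indexes.append(self.index)

    def restore_mark(self) -> None:
        self.index = self.marked_indexes.pop()

    def discard_mark(self) -> None:
        self.marked_indexes.pop()

    def discard_whitespace(self) -> None:
        while self.next_token()[0] == "whitespace":
            self.discard_token()
```

Marks form a stack, so speculative parsing can nest: a caller marks, attempts one interpretation, and then either restores the mark to retry from the same point or discards it to keep the progress made.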

Parser Entry Points

The algorithms defined in this section produce high-level CSS objects from lists of CSS tokens.
The algorithms here operate on a token stream as input, but for convenience can also be invoked with a number of other value types. To normalize into a token stream a given |input|:
  1. If |input| is already a [=token stream=], return it.
  2. If |input| is a list of CSS [=tokens=] and/or [=component values=], create a new [=token stream=] with |input| as its [=token stream/tokens=], and return it.
  3. If |input| is a [=string=], then [=filter code points=] from |input|, [=tokenize=] the result, then create a new [=token stream=] with those tokens as its [=token stream/tokens=], and return it.
  4. Assert: Only the preceding types should be passed as |input|.
Note: Other specs can define additional entry points for their own purposes.
The following notes should probably be translated into normative text in the relevant specs, hooking this spec's terms:
  • "Parse a stylesheet" is intended to be the normal parser entry point, for parsing stylesheets.
  • "Parse a stylesheet's contents" is intended for use by the {{CSSStyleSheet/replace()|CSSStyleSheet replace()}} method, and similar, which parse text into the contents of an existing stylesheet.
  • "Parse a rule" is intended for use by the {{CSSStyleSheet/insertRule()|CSSStyleSheet insertRule()}} method, and similar, which parse text into a single rule.
  • "Parse a declaration" is used in ''@supports'' conditions. [[CSS3-CONDITIONAL]]
  • "Parse a block's contents" is intended for parsing the contents of any block in CSS (including things like the style attribute), and APIs such as {{CSSStyleDeclaration/cssText|the CSSStyleDeclaration cssText attribute}}.
  • "Parse a component value" is for things that need to consume a single value, like the parsing rules for ''attr()''.
  • "Parse a list of component values" is for the contents of presentational attributes, which parse text into a single declaration's value, or for parsing a stand-alone selector [[SELECT]] or list of Media Queries [[MEDIAQ]], as in Selectors API or the media HTML attribute.

Parse something according to a CSS grammar

It is often desirable to parse a string or token list to see if it matches some CSS grammar, and if it does, to destructure it according to the grammar. This section provides a generic hook for this kind of operation. It should be invoked like "parse foo as a CSS <>", or similar. This algorithm returns either failure, if the input does not match the provided grammar, or the result of parsing the input according to the grammar, which is an unspecified structure corresponding to the provided grammar specification. The return value must only be interacted with by specification prose, where the representation ambiguity is not problematic. If it is meant to be exposed outside of spec language, the spec using the result must explicitly translate it into a well-specified representation, such as, for example, by invoking a CSS serialization algorithm (like "serialize as a CSS <> value"). Note: This algorithm, and [=parse a comma-separated list according to a CSS grammar=], are usually the only parsing algorithms other specs will want to call. The remaining parsing algorithms are meant mostly for [[CSSOM]] and related "explicitly constructing CSS structures" cases. Consult the CSSWG for guidance first if you think you need to use one of the other algorithms.
To parse something according to a CSS grammar (aka simply [=CSS/parse=]) given an |input| and a CSS |grammar| production:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. Parse a list of component values from |input|, and let result be the return value.
  3. Attempt to match result against |grammar|. If this is successful, return the matched result; otherwise, return failure.

Parse a comma-separated list according to a CSS grammar

While one can definitely [=CSS/parse=] a value according to a grammar with commas in it, if any part of the value fails to parse, the entire thing doesn't parse, and returns failure. Sometimes that's what's desired (such as in list-valued CSS properties); other times, it's better to let each comma-separated sub-part of the value parse separately, dealing with the parts that parse successfully one way, and the parts that fail to parse another way (typically ignoring them, such as in <{img/sizes|<img sizes>}>). This algorithm provides an easy hook to accomplish exactly that. It returns a list of values split by "top-level" commas, where each value is either failure (if it failed to parse) or the result of parsing (an unspecified structure, as described in the [=CSS/parse=] algorithm).
To parse a comma-separated list according to a CSS grammar (aka [=CSS/parse a list=]) given an |input| and a CSS |grammar| production:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. If |input| contains only <>s, return an empty [=list=].
  3. Parse a comma-separated list of component values from |input|, and let list be the return value.
  4. [=list/For each=] |item| of |list|, replace |item| with the result of [=CSS/parsing=] |item| with |grammar|.
  5. Return |list|.

Parse a stylesheet

To parse a stylesheet from an |input| given an optional [=/url=] |location|:
  1. If |input| is a byte stream for a stylesheet, [=decode bytes=] from |input|, and set |input| to the result.
  2. [=Normalize=] |input|, and set |input| to the result.
  3. Create a new stylesheet, with its location set to |location| (or null, if |location| was not passed).
  4. Consume a stylesheet's contents from |input|, and set the stylesheet's rules to the result.
  5. Return the stylesheet.

Parse a stylesheet's contents

To parse a stylesheet's contents from |input|:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. Consume a stylesheet's contents from |input|, and return the result.

Parse a block's contents

To parse a block's contents from |input|:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. Consume a block's contents from |input|, and return the result.

Parse a rule

To parse a rule from |input|:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. [=token stream/Discard whitespace=] from |input|.
  3. If the next token from |input| is an <>, return a syntax error. Otherwise, if the next token from |input| is an <>, consume an at-rule from |input|, and let rule be the return value. Otherwise, consume a qualified rule from |input| and let rule be the return value. If nothing was returned, return a syntax error.
  4. [=token stream/Discard whitespace=] from |input|.
  5. If the next token from |input| is an <>, return rule. Otherwise, return a syntax error.

Parse a declaration

Note: Unlike "Parse a list of declarations", this parses only a declaration and not an at-rule.
To parse a declaration from |input|:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. [=token stream/Discard whitespace=] from |input|.
  3. Consume a declaration from |input|. If anything was returned, return it. Otherwise, return a syntax error.

Parse a component value

To parse a component value from |input|:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. [=token stream/Discard whitespace=] from |input|.
  3. If |input| is [=token stream/empty=], return a syntax error.
  4. Consume a component value from |input| and let value be the return value.
  5. [=token stream/Discard whitespace=] from |input|.
  6. If |input| is [=token stream/empty=], return value. Otherwise, return a syntax error.

Parse a list of component values

To parse a list of component values from |input|:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. [=Consume a list of component values=] from |input|, and return the result.

Parse a comma-separated list of component values

To parse a comma-separated list of component values from |input|:
  1. [=Normalize=] |input|, and set |input| to the result.
  2. Let |groups| be an empty [=list=].
  3. While |input| is not [=token stream/empty=]:
    1. [=Consume a list of component values=] from |input|, with <> as the stop token, and append the result to |groups|.
    2. [=Discard a token=] from |input|.
  4. Return |groups|.
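The grouping behavior can be illustrated with a small sketch. It assumes the input has already been turned into a list of component values, so that commas nested inside functions or blocks are no longer top-level items. The `COMMA` marker, the tuple representation, and the function name are hypothetical.

```python
# Hypothetical representation of a top-level comma token.
COMMA = ("comma", ",")


def split_on_top_level_commas(values):
    """Group already-parsed component values by top-level commas."""
    groups = []
    i = 0
    while i < len(values):                 # while the input is not empty
        group = []
        # Collect values up to, but not including, the next top-level comma.
        while i < len(values) and values[i] != COMMA:
            group.append(values[i])
            i += 1
        groups.append(group)
        i += 1                             # discard the comma, if any
    return groups
```

Two edge cases follow from the loop structure: a leading comma produces an empty first group, while a trailing comma does not produce an empty final group, because the input is already empty once that comma has been discarded.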

Parser Algorithms

The following algorithms comprise the parser. They are called by the parser entry points above, and generally should not be called directly by other specifications. Note that CSS parsing is case-sensitive, and checking the validity of constructs in a given context must be done during parsing in at least some circumstances. This specification intentionally does not specify how sufficient context should be passed around to enable validity-checking.

Consume a stylesheet's contents

To consume a stylesheet's contents from a [=token stream=] |input|: Let |rules| be an initially empty [=list=] of rules. [=token stream/Process=] |input|:
<>
[=Discard a token=] from |input|.
<>
Return |rules|.
<>
<>
[=Discard a token=] from |input|.
What's this for? Back when CSS was first being introduced, the <{style}> element was treated as an unknown element by older browsers. To avoid having its contents displayed in the page for these legacy browsers, it became common practice to wrap the stylesheet in an HTML comment, and newer browsers would simply ignore the HTML comment syntax. This requirement carries over to today, decades later. The same practice was done for <{script}> elements, where HTML comment syntax is treated as a line comment in JS (similar to //) for the same reason.
<>
Consume an at-rule from |input|. If anything is returned, append it to |rules|.
anything else
Consume a qualified rule from |input|. If anything is returned, append it to |rules|.

Consume an at-rule

To consume an at-rule from a [=token stream=] |input|, given an optional bool |nested| (default false): Assert: The [=next token=] is an <>. [=token stream/Consume a token=] from |input|, and let |rule| be a new [=at-rule=] with its name set to the returned token's value, its prelude initially set to an empty [=list=], and no declarations or child rules. [=token stream/Process=] |input|:
<>
<>
[=Discard a token=] from |input|. If |rule| is valid in the current context, return it; otherwise return nothing.
<}-token>
If |nested| is true:
  • If |rule| is valid in the current context, return it.
  • Otherwise, return nothing.
Otherwise, [=token stream/consume a token=] and append the result to |rule|'s prelude.
<{-token>
[=Consume a block=] from |input|, and assign the results to |rule|'s lists of [=declarations=] and child [=rules=]. If |rule| is valid in the current context, return it. Otherwise, return nothing.
anything else
Consume a component value from |input| and append the returned value to |rule|'s prelude.

Consume a qualified rule

To consume a qualified rule, from a [=token stream=] |input|, given an optional [=token=] |stop token| and an optional bool |nested| (default false): Let |rule| be a new [=qualified rule=] with its prelude, declarations, and child rules all initially set to empty [=lists=]. [=token stream/Process=] |input|:
<>
|stop token| (if passed)
This is a parse error. Return nothing.
<}-token>
This is a parse error. If |nested| is true, return nothing. Otherwise, [=token stream/consume a token=] and append the result to |rule|'s prelude.
<{-token>
If the first two non-<> values of |rule|'s prelude are an <> whose value starts with "--" followed by a <>, then:
  • If |nested| is true, [=consume the remnants of a bad declaration=] from |input|, with |nested| set to true, and return nothing.
  • If |nested| is false, [=consume a block=] from |input|, and return nothing.
What's this check for? [=Declarations=] and [=qualified rules=] don't generally overlap in their allowed syntax. No currently-defined CSS property allows {}-blocks in its value, so foo:bar {}; is definitely a rule, and foo: bar; is definitely a property. Even if a future CSS property allows {}-blocks in its value, the allowed syntax is restricted to the {}-block being the whole value, such as foo: {...};, which is guaranteed to not be a valid rule, since the '':'' doesn't have an ident or function following it to mark it as a pseudo-class. This allows us to mix declarations and rules in the same context: we first try to parse something as a declaration, and if that doesn't result in a valid declaration, we reparse it as a rule instead. An accidentally-invalid declaration will parse as a rule instead, but that's fine: the parser will stop at the declaration's ending semicolon and consider it an invalid rule. (Or in the case of a property containing a {}-block, will stop just *before* the semicolon, still considering it an invalid rule, and then the next attempt to parse something will throw out the lone semicolon as invalid.) So the total number of tokens consumed is the same regardless. [=Custom properties=], however, don't have the CSSWG carefully vetting their syntax. Authors can write a custom property that takes a {}-block in its value, even combined with other things; if that custom property is then invalid (due to an invalidly-written ''var()'' function, for example), when it's reparsed as a rule it will stop early, at the {}-block. The remaining tokens of the custom property's value will then get parsed as a fresh construct, potentially causing unexpected declarations or rules to be created. 
To avoid this (admittedly very niche) corner-case, we subtract the syntax of a [=custom property=] from that of a [=qualified rule=]; if you're in a context that allows properties and rules to be mixed, and you somehow end up parsing a rule that looks like a [=custom property=], you've messed up, and need to instead consume an entire custom property (all the way to the ending semicolon). (If we're in a context that doesn't allow properties, we just throw away the rule if it looks like a custom property. This ensures that --foo:hover { color: blue; } is consistently invalid everywhere, without potentially consuming a ton of a stylesheet looking for the non-existent ending semicolon.)
Otherwise, [=consume a block=] from |input|, and assign the results to |rule|'s lists of [=declarations=] and child [=rules=]. If |rule| is valid in the current context, return it; otherwise return nothing.
anything else
Consume a component value from |input| and append the result to |rule|'s prelude.

Consume a block

To consume a block, from a [=token stream=] |input|: Assert: The [=next token=] is a <{-token>. Let |decls| be an empty [=list=] of [=declarations=], and |rules| be an empty [=list=] of [=rules=]. [=Discard a token=] from |input|. [=Consume a block's contents=] from |input| and assign the results to |decls| and |rules|. [=Discard a token=] from |input|. Return |decls| and |rules|.

Consume a block's contents

To consume a block's contents from a [=token stream=] |input|: Let |decls| be an empty [=list=] of [=declarations=], and |rules| be an empty [=list=] of [=rules=]. [=token stream/Process=] |input|:
<>
<>
[=Discard a token=] from |input|.
<>
<}-token>
Return |decls| and |rules|.
<>
[=Consume an at-rule=] from |input|, with |nested| set to true. If a [=rule=] was returned, append it to |rules|.
anything else
[=Mark=] |input|. [=Consume a declaration=] from |input|, with |nested| set to true. If a [=declaration=] was returned, append it to |decls|, and [=discard a mark=] from |input|. Otherwise, [=restore a mark=] from |input|, then [=consume a qualified rule=] from |input|, with |nested| set to true, and <> as the |stop token|. If a [=rule=] was returned, append it to |rules|.
Implementation note This spec, as with many CSS specs, has been written to prioritize understandability over efficiency. A number of algorithms, notably the above "parse as a declaration, then parse as a rule" behavior, can be fairly inefficient if implemented naively as described. However, the behavior has been carefully written to allow "early exits" as much as possible. In particular, and roughly in order of when the exit can occur:
  • If the first non-whitespace token isn't an <> for a recognized property name (or a custom property name), you can immediately stop parsing as a declaration and reparse as a rule instead. If the next non-whitespace token isn't a <>, you can similarly immediately stop parsing as a declaration. (That is, ''font+ ...'' is guaranteed to not be a property, nor is not-a-prop-name: ....)
  • If the first two non-whitespace tokens are a custom property name and a colon, it's definitely a custom property and won't ever produce a valid rule, so even if the custom property ends up invalid there's no need to try and reparse as a rule. (That is, ''--foo:hover {...}'' is guaranteed to be a custom property, not a rule.)
  • If the first three non-whitespace tokens are a valid property name, a colon, and anything other than a <{-token>, and then while parsing the declaration's value you encounter a <{-token>, you can immediately stop parsing as a declaration and reparse as a rule instead. (That is, ''font:bar {...'' is guaranteed to be an invalid property.)
  • If you see a recognized property name, a colon, and a {}-block, but the first non-whitespace token following that block isn't either the final semicolon or the !important followed by the semicolon, you can immediately stop parsing as a declaration and reparse as a rule instead. (That is, ''font: {} bar ...'' is guaranteed to be an invalid property; you don't need to keep parsing until you hit a semicolon.)
Similarly, even though the parsing requirements are specified to rely on checking the grammar of the declarations as you parse, a generic processor trying to implement a non-CSS language on top of the generic CSS syntax can still get away with just verifying that declarations start with an ident, a colon, and then either contain solely a {}-block or no {}-block at all. They'll just spend a little more time on parsing than an implementation with grammar knowledge in cases like ''foo:hover ... {}'', since they can't early-exit on the first token.

Consume a declaration

To consume a declaration from a [=token stream=] |input|, given an optional bool |nested| (default false): Let |decl| be a new [=declaration=], with an initially empty name and a value set to an empty [=list=].
  1. If the [=next token=] is an <>, [=token stream/consume a token=] from |input| and set |decl|'s name to the token's value. Otherwise, [=consume the remnants of a bad declaration=] from |input|, with |nested|, and return nothing.
  2. [=Discard whitespace=] from |input|.
  3. If the [=next token=] is a <>, [=discard a token=] from |input|. Otherwise, [=consume the remnants of a bad declaration=] from |input|, with |nested|, and return nothing.
  4. [=Discard whitespace=] from |input|.
  5. [=Consume a list of component values=] from |input|, with |nested|, and with <> as the stop token, and set |decl|'s value to the result.
  6. If the last two non-<>s in |decl|'s value are a <> with the value "!" followed by an <> with a value that is an ASCII case-insensitive match for "important", remove them from |decl|'s value and set |decl|'s important flag.
  7. While the last item in |decl|'s value is a <>, [=list/remove=] that token.
  8. If |decl|'s name is a [=custom property name string=], then set |decl|'s |original text| to the segment of the original source text string corresponding to the tokens of |decl|'s value. Otherwise, if |decl|'s value contains a top-level [=simple block=] with an associated token of <<{-token>>, and also contains any other non-<> value, return nothing. (That is, a top-level {}-block is only allowed as the entire value of a non-custom property.) Otherwise, if |decl|'s name is an [=ASCII case-insensitive=] match for "unicode-range", [=consume the value of a unicode-range descriptor=] from the segment of the original source text string corresponding to the tokens returned by the [=consume a list of component values=] call, and replace |decl|'s value with the result.
  9. If |decl| is valid in the current context, return it; otherwise return nothing.
To consume the remnants of a bad declaration from a [=token stream=] |input|, given a bool |nested|:
[=token stream/Process=] |input|:
: <>
: <>
:: [=Discard a token=] from |input|, and return nothing.
: <}-token>
:: If |nested| is true, return nothing. Otherwise, [=discard a token=].
: anything else
:: [=Consume a component value=] from |input|, and do nothing.
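Steps 6 and 7 of the declaration algorithm above (detecting the "!" plus "important" pair and trimming trailing whitespace) can be sketched as follows. This is only an illustration under stated assumptions: tokens are modeled as `(type, value)` tuples, and the type labels `"delim"`, `"ident"`, and `"whitespace"` as well as the function names are hypothetical choices made for this sketch.

```python
def ascii_lower(text):
    """Lowercase only A-Z, as an ASCII case-insensitive comparison needs."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def finish_declaration_value(value):
    """Apply steps 6 and 7 to a declaration value.

    `value` is a list of (type, value) tuples (an assumed token model).
    Returns (new_value, important_flag).
    """
    value = list(value)
    important = False
    # Positions of the items that are not whitespace, in source order.
    solid = [i for i, tok in enumerate(value) if tok[0] != "whitespace"]
    if len(solid) >= 2:
        bang, word = value[solid[-2]], value[solid[-1]]
        if (bang == ("delim", "!") and word[0] == "ident"
                and ascii_lower(word[1]) == "important"):
            # Delete the later position first so the earlier one stays valid.
            del value[solid[-1]]
            del value[solid[-2]]
            important = True
    # Step 7: drop whitespace left at the end of the value.
    while value and value[-1][0] == "whitespace":
        value.pop()
    return value, important
```

For a value such as `red ! IMPORTANT ` the sketch returns the single ident `red` with the flag set; whitespace between "!" and "important" is skipped because only the last two non-whitespace items are compared.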

Consume a list of component values

To consume a list of component values from a [=token stream=] |input|, given an optional [=token=] |stop token| and an optional boolean |nested| (default false):
Let |values| be an empty [=list=] of [=component values=].
[=token stream/Process=] |input|:
: <>
: |stop token| (if passed)
:: Return |values|.
: <}-token>
:: If |nested| is true, return |values|. Otherwise, this is a parse error. [=token stream/Consume a token=] from |input| and append the result to |values|.
: anything else
:: [=Consume a component value=] from |input|, and append the result to |values|.

Consume a component value

To consume a component value from a [=token stream=] |input|:
[=token stream/Process=] |input|:
: <{-token>
: <[-token>
: <(-token>
:: Consume a simple block from |input| and return the result.
: <>
:: [=Consume a function=] from |input| and return the result.
: anything else
:: [=token stream/Consume a token=] from |input| and return the result.

Consume a simple block

To consume a simple block from a [=token stream=] |input|:
Assert: the [=next token=] of |input| is <{-token>, <[-token>, or <(-token>.
Let |ending token| be the mirror variant of the next token. (E.g. if it was called with <[-token>, the |ending token| is <]-token>.)
Let |block| be a new simple block with its associated token set to the next token and with its value initially set to an empty [=list=].
[=Discard a token=] from |input|.
[=token stream/Process=] |input|:
: <>
: |ending token|
:: [=Discard a token=] from |input|. Return |block|.
: anything else
:: [=Consume a component value=] from |input| and append the result to |block|'s value.

Consume a function

To consume a function from a [=token stream=] |input|:
Assert: The [=next token=] is a <>.
[=token stream/Consume a token=] from |input|, and let |function| be a new function with its name equal to the returned token's value, and a value set to an empty [=list=].
[=token stream/Process=] |input|:
: <>
: <)-token>
:: [=Discard a token=] from |input|. Return |function|.
: anything else
:: [=Consume a component value=] from |input| and append the result to |function|'s value.
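The four algorithms above (list of component values, component value, simple block, function) call each other recursively. A rough sketch of that shape follows. It is not the normative algorithm: tokens are modeled as `(type, value)` tuples with hypothetical type labels, the `TokenStream` class is a simplified stand-in for the spec's token stream, and end of input is assumed to be one of the cases that stops each loop.

```python
# Tokens are (type, value) tuples; the type labels used here are assumptions.
MIRROR = {"{": "}", "[": "]", "(": ")"}
EOF = ("EOF", None)


class TokenStream:
    """A list of tokens plus an index (simplified stand-in)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.index = 0

    def next_token(self):
        if self.index < len(self.tokens):
            return self.tokens[self.index]
        return EOF

    def consume_token(self):
        token = self.next_token()
        self.index += 1
        return token


def consume_component_value(stream):
    kind = stream.next_token()[0]
    if kind in MIRROR:
        return consume_simple_block(stream)
    if kind == "function":
        return consume_function(stream)
    return stream.consume_token()


def _consume_until(stream, ending, container):
    # Shared loop for blocks and functions: stop at the ending token or at
    # end of input (assumed), otherwise recurse into a component value.
    while True:
        kind = stream.next_token()[0]
        if kind in ("EOF", ending):
            stream.consume_token()  # discard the ending token
            return container
        container["value"].append(consume_component_value(stream))


def consume_simple_block(stream):
    opener = stream.consume_token()
    assert opener[0] in MIRROR
    block = {"kind": "block", "token": opener[0], "value": []}
    return _consume_until(stream, MIRROR[opener[0]], block)


def consume_function(stream):
    name_token = stream.consume_token()
    assert name_token[0] == "function"
    function = {"kind": "function", "name": name_token[1], "value": []}
    return _consume_until(stream, ")", function)


def consume_list_of_component_values(stream, stop=None, nested=False):
    values = []
    while True:
        kind = stream.next_token()[0]
        if kind == "EOF" or (stop is not None and kind == stop):
            return values
        if kind == "}":
            if nested:
                return values
            # Parse error: the stray "}" is kept as an ordinary value.
            values.append(stream.consume_token())
        else:
            values.append(consume_component_value(stream))
```

Because running out of input ends a block or function the same way its closing token does, an unclosed function yields the same structure as a closed one, and the result does not record which of the two happened.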

Consume a '@font-face/unicode-range' value

To consume the value of a unicode-range descriptor from a string |input string|:
  1. Let |tokens| be the result of [=CSS/tokenizing=] |input string| with |unicode ranges allowed| set to true. Let |input| be a new [=token stream=] from |tokens|.
  2. [=Consume a list of component values=] from |input|, and return the result.
Note: The existence of this algorithm is due to a design mistake in early CSS. It should never be reproduced.

The An+B microsyntax

Several things in CSS, such as the '':nth-child()'' pseudo-class, need to indicate indexes in a list. The An+B microsyntax is useful for this, allowing an author to easily indicate single elements or all elements at regularly-spaced intervals in a list. The An+B notation defines an integer step (|A|) and offset (|B|), and represents the An+Bth elements in a list, for every positive integer or zero value of n, with the first element in the list having index 1 (not 0). For values of A and B greater than 0, this effectively divides the list into groups of A elements (the last group taking the remainder), and selects the Bth element of each group. The An+B notation also accepts the ''even'' and ''odd'' keywords, which have the same meaning as ''2n'' and ''2n+1'', respectively.

Examples:

2n+0   /* represents all of the even elements in the list */
even   /* same */
4n+1   /* represents the 1st, 5th, 9th, 13th, etc. elements in the list */
The values of A and B can be negative, but only the positive results of An+B, for n ≥ 0, are used.

Example:

-1n+6   /* represents the first 6 elements of the list */
-4n+10  /* represents the 2nd, 6th, and 10th elements of the list */
		
If both A and B are 0, the pseudo-class represents no element in the list.
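The selection rule described above can be checked numerically. The sketch below assumes |A| and |B| have already been obtained as integers, and tests whether a 1-based index equals An+B for some integer n ≥ 0; the function name is a hypothetical one chosen for this illustration.

```python
def matches_an_plus_b(a, b, index):
    """True if the 1-based `index` equals a*n + b for some integer n >= 0."""
    if index < 1:
        return False  # list positions start at 1
    diff = index - b
    if a == 0:
        # With no step, only the single position B can match.
        return diff == 0
    # n = diff / a must be a whole number and must not be negative.
    return diff % a == 0 and diff // a >= 0


# The examples from the text, over a 14-element list.
positions = range(1, 15)
every_fourth_from_one = [i for i in positions if matches_an_plus_b(4, 1, i)]
first_six = [i for i in positions if matches_an_plus_b(-1, 6, i)]
second_sixth_tenth = [i for i in positions if matches_an_plus_b(-4, 10, i)]
nothing = [i for i in positions if matches_an_plus_b(0, 0, i)]
```

The `index < 1` guard is what makes A = 0, B = 0 select nothing: n·0 + 0 is always 0, which is not a valid position.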

Informal Syntax Description

This section is non-normative. When A is 0, the An part may be omitted (unless the B part is already omitted). When An is not included and B is non-negative, the ''+'' sign before B (when allowed) may also be omitted. In this case the syntax simplifies to just B.

Examples:

0n+5   /* represents the 5th element in the list */
5      /* same */
When A is 1 or -1, the 1 may be omitted from the rule.

Examples:

The following notations are therefore equivalent:

1n+0   /* represents all elements in the list */
n+0    /* same */
n      /* same */
If B is 0, then every Ath element is picked. In such a case, the +B (or -B) part may be omitted unless the A part is already omitted.

Examples:

2n+0   /* represents every even element in the list */
2n     /* same */
When B is negative, its minus sign replaces the ''+'' sign.

Valid example:

3n-6

Invalid example:

3n + -6
Whitespace is permitted on either side of the ''+'' or ''-'' that separates the An and B parts when both are present.

Valid Examples with white space:

3n + 1
+3n - 2
-n+ 6
+6

Invalid Examples with white space:

3 n
+ 2n
+ 2
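The informal rules above can be approximated with a regular expression over the raw text. This is a simplified sketch, not the token-based definition of the notation: it ignores escapes and comments, and the pattern and function names are hypothetical.

```python
import re

# Informal pattern only: an optional sign and digits for A, then "n", then an
# optional sign-separated B; or a bare integer; or the odd/even keywords.
_AN_PLUS_B = re.compile(
    r"""
    \s*
    (?:
        (?P<keyword>odd|even)
      | (?P<a>[+-]?\d*) n (?: \s* (?P<sign>[+-]) \s* (?P<b>\d+) )?
      | (?P<integer>[+-]?\d+)
    )
    \s*
    """,
    re.ASCII | re.IGNORECASE | re.VERBOSE,
)


def parse_an_plus_b(text):
    """Return (A, B) for the informal An+B syntax, or None if it does not match."""
    match = _AN_PLUS_B.fullmatch(text)
    if match is None:
        return None
    if match["keyword"]:
        return (2, 1) if match["keyword"].lower() == "odd" else (2, 0)
    if match["integer"] is not None:
        return (0, int(match["integer"]))
    a_text = match["a"]
    if a_text in ("", "+"):
        a = 1          # "n" and "+n" mean A = 1
    elif a_text == "-":
        a = -1         # "-n" means A = -1
    else:
        a = int(a_text)
    b = int(match["b"]) if match["b"] is not None else 0
    if match["sign"] == "-":
        b = -b
    return (a, b)
```

With this pattern `3n + 1`, `+3n - 2`, `-n+ 6`, and `+6` are accepted, while `3 n`, `+ 2n`, `+ 2`, and `3n + -6` are rejected, because whitespace is only tolerated around the sign that separates the two parts.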

The <an+b> type

The An+B notation was originally defined using a slightly different tokenizer than the rest of CSS, resulting in a somewhat odd definition when expressed in terms of CSS tokens. This section describes how to recognize the An+B notation in terms of CSS tokens (thus defining the <an+b> type for CSS grammar purposes), and how to interpret the CSS tokens to obtain values for A and B. The <an+b> type is defined (using the Value Definition Syntax in the Values & Units spec) as:
		<an+b> =
		  odd | even |
		  <integer> |

		  <n-dimension> |
		  '+'? n |
		  -n |

		  <ndashdigit-dimension> |
		  '+'? <ndashdigit-ident> |
		  <dashndashdigit-ident> |

		  <n-dimension> <signed-integer> |
		  '+'? n <signed-integer> |
		  -n <signed-integer> |

		  <ndash-dimension> <signless-integer> |
		  '+'? n- <signless-integer> |
		  -n- <signless-integer> |

		  <n-dimension> ['+' | '-'] <signless-integer> |
		  '+'? n ['+' | '-'] <signless-integer> |
		  -n ['+' | '-'] <signless-integer>
	
where:
  • <n-dimension> is a <> with its type flag set to "integer", and a unit that is an ASCII case-insensitive match for "n"
  • <ndash-dimension> is a <> with its type flag set to "integer", and a unit that is an ASCII case-insensitive match for "n-"
  • <ndashdigit-dimension> is a <> with its type flag set to "integer", and a unit that is an ASCII case-insensitive match for "n-*", where "*" is a series of one or more digits
  • <ndashdigit-ident> is an <> whose value is an ASCII case-insensitive match for "n-*", where "*" is a series of one or more digits
  • <dashndashdigit-ident> is an <> whose value is an ASCII case-insensitive match for "-n-*", where "*" is a series of one or more digits
  • <integer> is a <> with its type flag set to "integer"
  • <signed-integer> is a <> with its type flag set to "integer", and a sign character
  • <signless-integer> is a <> with its type flag set to "integer", and no sign character

When a plus sign (+) precedes an ident starting with "n", as in the clauses above that begin with '+'?, there must be no whitespace between the two tokens, or else the tokens do not match the above grammar. Whitespace is valid (and ignored) between any other two tokens.
The clauses of the production are interpreted as follows:

''odd''
A is 2, B is 1.
''even''
A is 2, B is 0.
<integer>
A is 0, B is the integer’s value.
<n-dimension>
'+'? n
-n
A is the dimension's value, 1, or -1, respectively. B is 0.
<ndashdigit-dimension>
'+'? <ndashdigit-ident>
A is the dimension's value or 1, respectively. B is the dimension's unit or ident's value, respectively, with the first code point removed and the remainder interpreted as a base-10 number. B is negative.
<dashndashdigit-ident>
A is -1. B is the ident's value, with the first two code points removed and the remainder interpreted as a base-10 number. B is negative.
<n-dimension> <signed-integer>
'+'? n <signed-integer>
-n <signed-integer>
A is the dimension's value, 1, or -1, respectively. B is the integer’s value.
<ndash-dimension> <signless-integer>
'+'? n- <signless-integer>
-n- <signless-integer>
A is the dimension's value, 1, or -1, respectively. B is the negation of the integer’s value.
<n-dimension> ['+' | '-'] <signless-integer>
'+'? n ['+' | '-'] <signless-integer>
-n ['+' | '-'] <signless-integer>
A is the dimension's value, 1, or -1, respectively. B is the integer’s value. If a '-' was provided between the two, B is instead the negation of the integer’s value.
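The "ndashdigit" clauses above are the least obvious ones, because the minus sign and the digits of B arrive as part of a unit or an ident value. As an illustration, assuming that string is already available as plain text, B could be recovered like this (the function name and the `leading_dash` parameter are hypothetical):

```python
import re


def b_from_ndashdigit(text, leading_dash=False):
    """Recover B from an "n-<digits>" unit or ident value.

    With leading_dash=True the text is expected to look like "-n-<digits>",
    and two code points are skipped instead of one.
    """
    pattern = r"-n-\d+" if leading_dash else r"n-\d+"
    if not re.fullmatch(pattern, text, re.ASCII | re.IGNORECASE):
        raise ValueError(f"not an n-dash-digit string: {text!r}")
    skip = 2 if leading_dash else 1
    # What remains is "-<digits>", so the base-10 result is negative.
    return int(text[skip:])
```

For example, in `3n-6` the dimension's value is 3 and its unit is `n-6`, so A is 3 and `b_from_ndashdigit("n-6")` gives a B of -6.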

Defining Grammars for Rules and Other Values

[[css-values-4#value-defs]] defines how to specify a grammar for properties. This section extends those definitions to also allow specifying a grammar for rules. Non-terminals representing the entire grammar of an [=at-rule=] are written as an @ character followed by the at-rule's name, between < and >, e.g. <<@media>> to represent the ''@media'' rule. The [[css-values-4#numeric-ranges|bracketed range notation]] can be used on any of the numeric token non-terminals. Several types of tokens are written literally, without quotes:
  • <>s (such as auto, disc, etc), which are simply written as their value.
  • <>s, which are written as an @ character followed by the token's value, like @media.
  • <>s, which are written as the function name followed by a ( character, like translate(.
  • The <> (written as :), <> (written as ,), <> (written as ;), <(-token>, <)-token>, <{-token>, and <}-token>s.
Tokens match if their value is a match for the value defined in the grammar. Unless otherwise specified, all matches are ASCII case-insensitive.
Note: Although it is possible, with escaping, to construct an <> whose value ends with ( or starts with @, such a token is not a <> or an <> and does not match corresponding grammar definitions.
<>s are written with their value enclosed in single quotes. For example, a <> containing the "+" code point is written as '+'. Similarly, the <[-token> and <]-token>s must be written in single quotes, as they're used by the syntax of the grammar itself to group clauses.
<> is never indicated in the grammar; <>s are allowed before, after, and between any two tokens, unless explicitly specified otherwise in prose definitions. (For example, if the prelude of a rule is a selector, whitespace is significant.)
When defining a function or a block, the ending token must be specified in the grammar, but if it's not present in the eventual token stream, it still matches.
For example, the syntax of the ''translateX()'' function is:
translateX( <> )
However, the stylesheet may end with the function unclosed, like:
.foo { transform: translateX(50px
The CSS parser parses this as a style rule containing one declaration, whose value is a function named "translateX". This matches the above grammar, even though the ending token didn't appear in the token stream, because by the time the parser is finished, the presence of the ending token is no longer possible to determine; all you have is the fact that there's a block and a function.

Defining Block Contents: the <>, <>, <>, <>, and <> productions

The CSS parser is agnostic as to the contents of blocks-- they're all [=consume a block's contents|parsed with the same algorithm=], and differentiate themselves solely by what constructs are valid.
When writing a rule grammar, <block-contents> represents this agnostic parsing. It must only be used as the sole value in a block, and represents that no restrictions are implicitly placed on what the block can contain. Accompanying prose must define what is valid and invalid in this context. If any [=declarations=] are valid, and are [=property declarations=], it must define whether they interact with the cascade; if they do, it must define their specificity and how they use !important.
In many cases, however, a block can't validly contain any constructs of a given type. To represent these cases more explicitly, the following productions may be used:
  • <declaration-list>: only [=declarations=] are allowed; [=at-rules=] and [=qualified rules=] are automatically invalid.
  • <qualified-rule-list>: only [=qualified rules=] are allowed; [=declarations=] and [=at-rules=] are automatically invalid.
  • <at-rule-list>: only [=at-rules=] are allowed; [=declarations=] and [=qualified rules=] are automatically invalid.
  • <declaration-rule-list>: [=declarations=] and [=at-rules=] are allowed; [=qualified rules=] are automatically invalid.
  • <rule-list>: [=qualified rules=] and [=at-rules=] are allowed; [=declarations=] are automatically invalid.
All of these are exactly equivalent to <> in terms of parsing, but the accompanying prose only has to define validity for the categories that aren't automatically invalid.
Some examples of the various productions:
  • A top-level ''@media'' uses <> for its block, while a nested one [[CSS-NESTING-1]] uses <>.
  • [=Style rules=] use <>.
  • ''@font-face'' uses <>.
  • ''@page'' uses <>.
  • ''@keyframes'' uses <>.
For example, the grammar for ''@font-face'' can be written as:
<<@font-face>> = @font-face { <> }
and then accompanying prose defines the valid [=descriptors=] allowed in the block. The grammar for ''@keyframes'' can be written as:
			<<@keyframes>> = @keyframes { <> }
			<> = <> { <> }
		
and then accompanying prose defines that only <>s are allowed in ''@keyframes'', and that <>s accept all animatable CSS properties, plus the 'animation-timing-function' property, but they do not interact with the cascade.

Defining Arbitrary Contents: the <> and <> productions

In some grammars, it is useful to accept any reasonable input in the grammar, and do more specific error-handling on the contents manually (rather than simply invalidating the construct, as grammar mismatches tend to do). For example, custom properties allow any reasonable value, as they can contain arbitrary pieces of other CSS properties, or be used for things that aren't part of existing CSS at all. For another example, the <> production in Media Queries defines the bounds of what future syntax MQs will allow, and uses special logic to deal with "unknown" values.
To aid in this, two additional productions are defined:
The <declaration-value> production matches any sequence of one or more tokens, so long as the sequence does not contain <>, <>, unmatched <<)-token>>, <<]-token>>, or <<}-token>>, or top-level <> tokens or <> tokens with a value of "!". It represents the entirety of what a valid declaration can have as its value.
The <any-value> production is identical to <>, but also allows top-level <> tokens and <> tokens with a value of "!". It represents the entirety of what valid CSS can be in any context.

CSS stylesheets

To parse a CSS stylesheet, first parse a stylesheet. Interpret all of the resulting top-level qualified rules as style rules, defined below. If any style rule is invalid, or any at-rule is not recognized or is invalid according to its grammar or context, it's a parse error. Discard that rule.

Style rules

A style rule is a qualified rule that associates a selector list with a list of property declarations and possibly a list of nested rules. They are also called rule sets in [[CSS2]]. CSS Cascading and Inheritance [[CSS-CASCADE-3]] defines how the declarations inside of style rules participate in the cascade.
The prelude of the qualified rule is [=CSS/parsed=] as a <>. If this returns failure, the entire style rule is invalid.
The content of the qualified rule’s block is parsed as a <>. Qualified rules in this block are also [=style rules=]. Unless defined otherwise by another specification or a future level of this specification, at-rules in that list are invalid and must be ignored.
Note: [[CSS-NESTING-1]] defines that [=conditional group rules=] and some other [=at-rules=] are allowed inside of [=style rules=].
Declarations for an unknown CSS property or whose value does not match the syntax defined by the property are invalid and must be ignored. The validity of the style rule’s contents has no effect on the validity of the style rule itself. Unless otherwise specified, property names are ASCII case-insensitive.
Note: The names of Custom Properties [[CSS-VARIABLES]] are case-sensitive.
Qualified rules at the top-level of a CSS stylesheet are [=style rules=]. Qualified rules in other contexts may or may not be style rules, as defined by the context.

For example, qualified rules inside ''@media'' rules [[CSS3-CONDITIONAL]] are style rules, but qualified rules inside ''@keyframes'' rules [[CSS3-ANIMATIONS]] are not.

At-rules

An at-rule is a rule that begins with an at-keyword, and can thus be distinguished from [=style rules=] in the same context.
[=At-rules=] are used to:
  • group and structure style rules and other at-rules such as in [=conditional group rules=]
  • declare style information that is not associated with a particular element, such as defining [=counter styles=]
  • manage syntactic constructs such as [[css-cascade-3#at-import|imports]] and [[CSS-NAMESPACES-3|namespaces]] keyword mappings
  • and serve other miscellaneous purposes not served by a [=style rule=]
At-rules take many forms, depending on the specific rule and its purpose, but broadly speaking there are two kinds: statement at-rules which are simpler constructs that end in a semicolon, and block at-rules which end in a [={}-block=] that can contain nested [=qualified rules=], [=at-rules=], or [=declarations=].
[=Block at-rules=] will typically contain a collection of (generic or [=at-rule=]–specific) [=at-rules=], [=qualified rules=], and/or [=descriptor declarations=] subject to limitations defined by the [=at-rule=]. Descriptors are similar to [=properties=] (and are declared with the same syntax) but are associated with a particular type of [=at-rule=] rather than with elements and boxes in the tree.

The ''@charset'' Rule

The algorithm used to determine the fallback encoding for a stylesheet looks for a specific byte sequence as the very first few bytes in the file, which has the syntactic form of an at-rule named "@charset". However, there is no actual at-rule named @charset. When a stylesheet is actually parsed, any occurrences of an ''@charset'' rule must be treated as an unrecognized rule, and thus dropped as invalid when the stylesheet is grammar-checked. Note: In CSS 2.1, ''@charset'' was a valid rule. Some legacy specs may still refer to a ''@charset'' rule, and explicitly talk about its presence in the stylesheet.

Serialization

The tokenizer described in this specification does not produce tokens for comments, or otherwise preserve them in any way. Implementations may preserve the contents of comments and their location in the token stream. If they do, this preserved information must have no effect on the parsing step. This specification does not define how to serialize CSS in general, leaving that task to the [[CSSOM]] and individual feature specifications. In particular, the serialization of comments and whitespace is not defined. The only requirement for serialization is that it must "round-trip" with parsing, that is, parsing the stylesheet must produce the same data structures as parsing, serializing, and parsing again, except for consecutive <>s, which may be collapsed into a single token. Note: This exception can exist because CSS grammars always interpret any amount of whitespace as identical to a single space.
To satisfy this requirement:
  • A <> containing U+005C REVERSE SOLIDUS (\) must be serialized as U+005C REVERSE SOLIDUS followed by a newline. (The tokenizer only ever emits such a token followed by a <> that starts with a newline.)
  • A <> with the "unrestricted" type flag may not need as much escaping as the same token with the "id" type flag.
  • The unit of a <> may need escaping to disambiguate with scientific notation.
  • For any consecutive pair of tokens, if the first token shows up in the row headings of the following table, and the second token shows up in the column headings, and there's a ✗ in the cell denoted by the intersection of the chosen row and column, the pair of tokens must be serialized with a comment between them. If the tokenizer preserves comments, and there were comments originally between the token pair, the preserved comment(s) should be used; otherwise, an empty comment (/**/) must be inserted. (Preserved comments may be reinserted even if the following tables don't require a comment between two tokens.) Single characters in the row and column headings represent a <> with that value, except for "(", which represents a (-token.
Column headings (second token): ident, function, url, bad url, -, number, percentage, dimension, CDC, (, *, %
Row headings (first token): ident, at-keyword, hash, dimension, #, -, number, @, ., +, /

Serializing <an+b>

To serialize an <> value, with integer values |A| and |B|:
  1. If |A| is zero, return the serialization of |B|.
  2. Otherwise, let |result| initially be an empty [=string=].
  3.
     : |A| is 1
     :: Append "n" to |result|.
     : |A| is -1
     :: Append "-n" to |result|.
     : |A| is non-zero
     :: Serialize |A| and append it to |result|, then append "n" to |result|.
  4.
     : |B| is greater than zero
     :: Append "+" to |result|, then append the serialization of |B| to |result|.
     : |B| is less than zero
     :: Append the serialization of |B| to |result|.
  5. Return |result|.
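The serialization steps above translate almost line for line into code. The following is a minimal sketch, assuming |A| and |B| are plain integers and that "the serialization of" an integer is its ordinary base-10 form; the function name is hypothetical.

```python
def serialize_an_plus_b(a, b):
    """Serialize integer A and B following the five steps above."""
    # Step 1: with no step, only B is written.
    if a == 0:
        return str(b)
    # Steps 2 and 3: the digit is dropped when A is 1 or -1.
    if a == 1:
        result = "n"
    elif a == -1:
        result = "-n"
    else:
        result = f"{a}n"
    # Step 4: a positive B needs an explicit "+"; a negative B already
    # carries its own minus sign; a zero B is omitted.
    if b > 0:
        result += f"+{b}"
    elif b < 0:
        result += str(b)
    # Step 5.
    return result
```

Under these assumptions `serialize_an_plus_b(2, 1)` gives `2n+1`, `serialize_an_plus_b(-1, 6)` gives `-n+6`, and `serialize_an_plus_b(3, -6)` gives `3n-6`.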

Privacy Considerations

This specification introduces no new privacy concerns.

Security Considerations

This specification improves security, in that CSS parsing is now unambiguously defined for all inputs. Insofar as old parsers, such as whitelists/filters, parse differently from this specification, they are somewhat insecure, but the previous parsing specification left a lot of ambiguous corner cases which browsers interpreted differently, so those filters were potentially insecure already, and this specification does not worsen the situation.

Changes

This section is non-normative.

Changes from the 24 December 2021 Candidate Recommendation Draft

The following substantive changes were made:
  • The definition of [=non-ASCII code point=] was restricted to omit some potentially troublesome codepoints. (7129)
  • Defined the <foo()> and <@foo> productions. (5728)
  • Allow nested style rules. (7961)
  • As part of allowing Nesting, significantly rewrote the entire parsing section. Notably, removed "parse a list of rules" and "parse a list of declarations" in favor of "parse a stylesheet's contents" and "parse a block's contents". Only additional normative change is that semicolons trigger slightly different error-recovery now in some contexts, so that parsing of things like ''@media'' blocks is identical whether they're nested or not. (8834)
  • Removed the attempt at a <urange> production in terms of existing tokens, instead relying on a special re-parse from source specifically when you're parsing the '@font-face/unicode-range' descriptor. (8835)
  • Since the above removed the main need to preserve a token's "representation" (the original string it was parsed from), removed the rest of the references to "representation" in the An+B section and instead just recorded the few bits of information necessary for that (whether or not the number had an explicit sign).
  • Explicitly noted that some uses ([=custom properties=], the ''@font-face/unicode-range'' descriptor) require access to the original string of a declaration's entire value, and marked where that occurs in the parser.

Changes from the 16 August 2019 Candidate Recommendation

The following substantive changes were made:
  • Added a new [[#parse-comma-list]] algorithm.
  • Added a new "Parse a style block's content" algorithm and a corresponding <> production, and defined that [=style rules=] use it.
  • Aligned [=parse a stylesheet=] with the Fetch-related shenanigans. (See commit.)

To [=parse a stylesheet=] from an |input| given an optional [=/url=] |location|:

  1. ...
  2. Create a new stylesheet, with its location set to |location| (or null, if |location| was not passed).
  3. ...
The following editorial changes were made:
  • Added [[#at-rules]] to provide definitions for [=at-rules=], [=statement at-rules=], [=block at-rules=], and [=descriptors=]. (5633)
  • Improved the definition text for [=declaration=], and added definitions for [=property declarations=] and [=descriptor declarations=].
  • Switched to consistently refer to [=ident sequence=], rather than sometimes using the term “name”.
  • Explicitly named several of the pre-tokenizing processes, and explicitly referred to them in the parsing entry points (rather than relying on a blanket "do X at the start of these algorithms" statement).
  • Added more entries to the "put a comment between them" table, to properly handle the fact that idents can now start with --. (6874)

Changes from the 20 February 2014 Candidate Recommendation

The following substantive changes were made:
  • Removed <>, in favor of creating a <> production.
  • url() functions that contain a string are now parsed as normal <>s. url() functions that contain "raw" URLs are still specially parsed as <>s.
  • Fixed a bug in the "Consume a URL token" algorithm, where it didn't consume the quote character starting a string before attempting to consume the string.
  • Fixed a bug in several of the parser algorithms related to the current/next token and things getting consumed early/late.
  • Fix several bugs in the tokenization and parsing algorithms.
  • Change the definition of ident-like tokens to allow "--" to start an ident. As part of this, rearrange the ordering of the clauses in the "-" step of [=tokenizer/consume a token=] so that <>s are recognized as such instead of becoming a ''--'' <>.
  • Don't serialize the digit in an <> when A is 1 or -1.
  • Define all tokens to have a representation.
  • Fixed minor bug in check if two code points are a valid escape-- a \ followed by an EOF is now correctly reported as not a valid escape. A final \ in a stylesheet now just emits itself as a <>.
  • @charset is no longer a valid CSS rule (there's just an encoding declaration that looks like a rule named @charset)
  • Trimmed whitespace from the beginning/ending of a declaration's value during parsing.
  • Removed the Selectors-specific tokens, per WG resolution.
  • Filtered surrogates from the input stream, per WG resolution. Now the entire specification operates only on scalar values.
The following editorial changes were made:
  • The "Consume a string token" algorithm was changed to allow calling it without specifying an explicit ending token, so that it uses the current token instead. The three call-sites of the algorithm were changed to use that form.
  • Minor editorial restructuring of algorithms.
  • Added the [=CSS/parse=] and [=parse a comma-separated list of component values=] API entry points.
  • Added the <> and <> productions.
  • Removed "code point" and "surrogate code point" in favor of the identical definitions in the Infra Standard.
  • Clarified on every range that they are inclusive.
  • Added a column to the comment-insertion table to handle a number token appearing next to a "%" delim token.
A Disposition of Comments is available.

Changes from the 5 November 2013 Last Call Working Draft

  • The Serialization section has been rewritten to make only the "round-trip" requirement normative, and move the details of how to achieve it into a note. Some corner cases in these details have been fixed.
  • [[ENCODING]] has been added to the list of normative references. It was already referenced in normative text before, just not listed as such.
  • In the algorithm to determine the fallback encoding of a stylesheet, limit the @charset byte sequence to 1024 bytes. This aligns with what HTML does for <meta charset> and makes sure the size of the sequence is bounded. This only makes a difference with leading or trailing whitespace in the encoding label:
    @charset "   (lots of whitespace)   utf-8";

Changes from the 19 September 2013 Working Draft

  • The concept of environment encoding was added. The behavior does not change, but some of the definitions should be moved to the relevant specs.

Changes from CSS 2.1 and Selectors Level 3

Note: The point of this spec is to match reality; changes from CSS2.1 are nearly always because CSS 2.1 specified something that doesn't match actual browser behavior, or left something unspecified. If some detail doesn't match browsers, please let me know as it's almost certainly unintentional.
Changes in decoding from a byte stream:
  • Only detect ''@charset'' rules in ASCII-compatible byte patterns.
  • Ignore ''@charset'' rules that specify an ASCII-incompatible encoding, as that would cause the rule itself to not decode properly.
  • Refer to [[!ENCODING]] rather than the IANA registry for character encodings.
Tokenization changes:
  • Any U+0000 NULL code point in the CSS source is replaced with U+FFFD REPLACEMENT CHARACTER.
  • Any hexadecimal escape sequence such as ''\0'' that evaluates to zero produces U+FFFD REPLACEMENT CHARACTER rather than U+0000 NULL.
  • The definition of non-ASCII ident code point was changed to be consistent with HTML's [=valid custom element names=].
  • Tokenization does not emit COMMENT or BAD_COMMENT tokens anymore. BAD_COMMENT is now considered the same as a normal token (not an error). Serialization is responsible for inserting comments as necessary between tokens that need to be separated, e.g. two consecutive <>s.
  • The <> was removed, as it was low value and occasionally actively harmful. (''u+a { font-weight: bold; }'' was an invalid selector, for example...) Instead, a <> production was added, based on token patterns. It is technically looser than what 2.1 allowed (any number of digits and ? characters), but not in any way that should impact its use in practice.
  • Apply the EOF error handling rule in the tokenizer and emit normal <> and <> rather than BAD_STRING or BAD_URI on EOF.
  • The BAD_URI token (now <>) is "self-contained". In other words, once the tokenizer realizes it's in a <> rather than a <>, it just seeks forward to look for the closing ), ignoring everything else. This behavior is simpler than treating it like a <> and paying attention to opened blocks and such. Only WebKit exhibits this behavior, but it doesn't appear that we've gotten any compat bugs from it.
  • The <> has been added.
  • <>, <>, and <> have been changed to include the preceding +/- sign as part of their value (rather than as a separate <> that needs to be manually handled every time the token is mentioned in other specs). The only consequence of this is that comments can no longer be inserted between the sign and the number.
  • Scientific notation is supported for numbers/percentages/dimensions to match SVG, per WG resolution.
  • Hexadecimal escapes for surrogates now emit a replacement character rather than the surrogate. This allows implementations to safely use UTF-16 internally.
Parsing changes:
  • Any list of declarations now also accepts at-rules, like ''@page'', per WG resolution. This makes a difference in error handling even if no such at-rules are defined yet: an at-rule, valid or not, ends at a {} block without a <> and lets the next declaration begin.
  • The handling of some miscellaneous "special" tokens (like an unmatched <}-token>) showing up in various places in the grammar has been specified with some reasonable behavior shown by at least one browser. Previously, stylesheets with those tokens in those places just didn't match the stylesheet grammar at all, so their handling was totally undefined. Specifically:
    • [] blocks, () blocks and functions can now contain {} blocks, <>s or <>s
    • Qualified rule preludes can now contain semicolons
    • Qualified rule and at-rule preludes can now contain <>s
An+B changes from Selectors Level 3 [[SELECT]]:
  • The An+B microsyntax has now been formally defined in terms of CSS tokens, rather than with a separate tokenizer. This has resulted in minor differences:
    • In some cases, minus signs or digits can be escaped (when they appear as part of the unit of a <> or <>).

Acknowledgments

Thanks for feedback and contributions from Anne van Kesteren, David Baron, Elika J. Etemad (fantasai), Henri Sivonen, Johannes Koch, 呂康豪 (Kang-Hao Lu), Marc O'Morain, Raffaello Giulietti, Simon Pieter, Tyler Karaszewski, and Zack Weinberg.