This CSS3 module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.

CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, in speech, etc.

The following features are at-risk, and may be dropped during the CR period:

“At-risk” is a W3C Process term-of-art, and does not necessarily imply that the feature is in danger of being dropped or delayed. It means that the WG believes the feature may have difficulty being interoperably implemented in a timely manner, and marking it as such allows the WG to drop the feature if necessary when transitioning to the Proposed Rec stage, without having to publish a new Candidate Rec without the feature first.

1. Introduction

This module describes the typesetting controls of CSS; that is, the features of CSS that control the translation of source text to formatted, line-wrapped text. Various CSS properties provide control over case transformation, white space collapsing, text wrapping, line breaking rules and hyphenation, alignment and justification, spacing, and indentation.

Font selection is covered in CSS Fonts Level 3 [CSS3-FONTS].

Features for decorating text, such as underlines, emphasis marks, and shadows (previously part of this module), are covered in CSS Text Decoration Level 3 [CSS3-TEXT-DECOR].

Bidirectional and vertical text are addressed in CSS Writing Modes Level 3 [CSS3-WRITING-MODES].

1.1. Module Interactions

This module, together with [CSS3-TEXT-DECOR], replaces and extends the text-level features defined in [CSS21] chapter 16.

1.2. Values

This specification follows the CSS property definition conventions from [CSS21]. Value types not defined in this specification are defined in CSS Level 2 Revision 1 [CSS21]. Other CSS modules may expand the definitions of these value types: for example [CSS3VAL], when combined with this module, expands the definition of the <length> value type as used in this specification.

In addition to the property-specific values listed in their definitions, all properties defined in this specification also accept the inherit keyword as their property value. For readability it has not been repeated explicitly.

1.3. Terminology

In addition to the terms defined below, other terminology and concepts used in this specification are defined in [CSS21] and [CSS3-WRITING-MODES].

1.3.1. Characters and Letters

The basic unit of typesetting is the character. However, because writing systems are not always as simple as the basic English alphabet, what a character actually is depends on the context in which the term is used. For example, in Hangul (the Korean writing system), each square representation of a syllable (e.g. 한=Han) can be considered a character. However, the square symbol is really composed of multiple letters, each representing a phoneme (e.g. ㅎ=h, ㅏ=a, ㄴ=n), and these could also each be considered a character.

A basic unit of computer text encoding, for any given encoding, is also called a character, and depending on the encoding, a single encoding character might correspond to the entire pre-composed syllabic character (e.g. 한), to the individual phonemic character (e.g. ㅎ), or to smaller units such as a base letterform (e.g. ㅇ) and any combining marks that vary it (e.g. extra strokes that represent aspiration).

In turn, a single encoding character can be represented in the data stream as one or more bytes; and in programming environments one byte is sometimes also called a character.

Therefore the term character is fairly ambiguous where technical precision is required.

For text layout, we will refer to the typographic character unit as the basic unit of text. Even within the realm of text layout, the relevant character unit depends on the operation. For example, line-breaking and letter-spacing will segment a sequence of Thai characters that include U+0E33 THAI CHARACTER SARA AM differently; or the behaviour of a conjunct consonant in a script such as Devanagari may depend on the font in use. So the typographic character unit represents a unit of the writing system (such as a Latin alphabetic letter with its diacritics, a Hangul syllable, a Chinese ideographic character, or a Myanmar syllable cluster) that is indivisible with respect to a particular typographic operation (line-breaking, first-letter effects, tracking, justification, vertical arrangement, etc.).

Unicode Standard Annex #29: Text Segmentation defines a unit called the grapheme cluster which approximates the typographic character. A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in [UAX29], as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—and is expected to tailor them differently depending on the operation as needed.

The rules for such tailorings are out of scope for CSS.

The following are some examples of typographic character unit tailorings required by standard typesetting practice:

A typographic letter unit or letter for the purpose of this specification is a typographic character unit belonging to one of the Letter or Number general categories in Unicode. [UAX44] See Character Properties for how to determine the Unicode properties of a typographic character unit.

The rendering characteristics of a typographic character unit divided by an element boundary are undefined: it may be rendered as belonging to either side of the boundary, or as some approximation of belonging to both. Authors are forewarned that dividing grapheme clusters by element boundaries may give inconsistent or undesired results.

1.3.2. Languages and Typesetting

Many typographic effects vary by linguistic context. In CSS, language-specific typographic tailorings are only applied when the content language is known (declared).

Authors should language-tag their content accurately for the best typographic behavior.

The content language of an element is the (human) language the element is declared to be in, according to the rules of the document language. For example, the rules for determining the content language of an HTML element use the lang attribute and are defined in [HTML5], and the rules for determining the content language of an XML element use the xml:lang attribute and are defined in [XML10]. Note that it is possible for the content language of an element to be unknown.

2. Transforming Text

2.1. Case Transforms: the text-transform property

Name: text-transform
Value: none | capitalize | uppercase | lowercase | full-width
Initial: none
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
Animatable: no
Canonical order: N/A

This property transforms text for styling purposes. (It has no effect on the underlying content.) Values have the following meanings:

none: No effects.
capitalize: Puts the first typographic letter unit of each word in titlecase; other characters are unaffected.
uppercase: Puts all letters in uppercase.
lowercase: Puts all letters in lowercase.
full-width: Puts all typographic character units in fullwidth form. If a character does not have a corresponding fullwidth form, it is left as is. This value is typically used to typeset Latin letters and digits as if they were ideographic characters.

For capitalize, what constitutes a “word” is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries. Authors should not expect capitalize to follow language-specific titlecasing conventions (such as skipping articles in English).
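As a minimal sketch of this limitation, the rule below applies capitalize to a heading. The selector and the sample text are hypothetical, and the rendering shown in the comment assumes the UA treats each space-separated run of letters as a word:

```css
/* Hypothetical selector; any element can take text-transform. */
h2.title { text-transform: capitalize; }

/* Source text:        the lord of the rings
   Expected rendering: The Lord Of The Rings
   Articles and prepositions are capitalized too, because capitalize
   does not apply English titlecasing conventions. The underlying
   content is unchanged. */
```

Authors who need true titlecasing would have to supply it in the content itself rather than rely on this property.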

The following example converts the ASCII characters used in abbreviations in Japanese text to their fullwidth variants so that they lay out and line break like ideographs:

abbr:lang(ja) { text-transform: full-width; }

Note that, as defined in Text Processing Order of Operations, transforming text affects line-breaking and other formatting operations.

The UA must use the full case mappings for Unicode characters, including any conditional casing rules, as defined in the Default Case Algorithms section of The Unicode Standard [UNICODE]. If (and only if) the content language of the element is, according to the rules of the document language, known, then any appropriate language-specific rules must be applied as well. These minimally include, but are not limited to, the language-specific rules in Unicode’s SpecialCasing.txt.

For example, in Turkish there are two “i”s, one with a dot (“İ” and “i”) and one without (“I” and “ı”). Thus the usual case mappings between “I” and “i” are replaced with a different set of mappings to their respective undotted/dotted counterparts, which do not exist in English. This mapping must only take effect if the content language is Turkish (or another Turkic language that uses Turkish casing rules); in other languages, the usual mapping of “I” and “i” is required. This rule is thus conditionally defined in Unicode’s SpecialCasing.txt file.
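The sketch below illustrates how the same declaration is expected to produce different results depending on the declared content language. The class name and the HTML fragments in the comment are hypothetical:

```css
/* Hypothetical class name. The rule itself is language-neutral;
   the language-specific casing comes from the content language. */
.shout { text-transform: uppercase; }

/* <p class="shout" lang="tr">istanbul</p>  is expected to render as  İSTANBUL
   <p class="shout" lang="en">istanbul</p>  is expected to render as  ISTANBUL
   If the lang attribute is omitted and the content language is unknown,
   only the default (non-Turkic) mapping applies. */
```

No :lang() selector is needed for this; the tailoring depends on the element's content language, not on how the rule is selected.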

The definition of fullwidth and halfwidth forms can be found on the Unicode consortium web site at [UAX11]. The mapping to fullwidth form is defined by taking code points with the <wide> or the <narrow> tag in their Decomposition_Mapping in [UAX44]. For the <narrow> tag, the mapping is from the code point to the decomposition (minus <narrow> tag), and for the <wide> tag, the mapping is from the decomposition (minus the <wide> tag) back to the original code point.

Text transformation happens after white space processing, which means that full-width only transforms U+0020 spaces to U+3000 within preserved white space.
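A brief sketch of this interaction, using hypothetical class names: the two rules differ only in whether white space is preserved, which determines whether spaces are eligible for conversion.

```css
/* Hypothetical class names. */

/* white-space is normal (the initial value): spaces are collapsible,
   so they remain U+0020 even though letters and digits are widened. */
.fw-collapsed { text-transform: full-width; }

/* White space is preserved, so each U+0020 is transformed to
   U+3000 IDEOGRAPHIC SPACE along with the rest of the text. */
.fw-preserved { white-space: pre-wrap; text-transform: full-width; }
```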

A future level of CSS may introduce the ability to create custom mapping tables for less common text transforms, such as by an @text-transform rule similar to @counter-style from [CSS-COUNTER-STYLES-3].

3. White Space and Wrapping: the white-space property

Name: white-space
Value: normal | pre | nowrap | pre-wrap | pre-line
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
Animatable: no
Canonical order: N/A

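This property specifies two things: whether and how white space inside the element is collapsed, and whether lines may wrap at unforced soft wrap opportunities.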

Values have the following meanings, which must be interpreted according to the White Space Processing and Line Breaking rules:

normal: This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character). Lines may wrap at allowed soft wrap opportunities, as determined by the line-breaking rules in effect, in order to minimize inline-axis overflow.
pre: This value prevents user agents from collapsing sequences of white space. Segment breaks such as line feeds and carriage returns are preserved as forced line breaks. Lines only break at forced line breaks; content that does not fit within the block container overflows it.
nowrap: Like normal, this value collapses white space; but like pre, it does not allow wrapping.
pre-wrap: Like pre, this value preserves white space; but like normal, it allows wrapping.
pre-line: Like normal, this value collapses consecutive spaces and allows wrapping, but preserves segment breaks in the source as forced line breaks.

The following informative table summarizes the behavior of various white-space values:

Value     New Lines  Spaces and Tabs  Text Wrapping
normal    Collapse   Collapse         Wrap
pre       Preserve   Preserve         No wrap
nowrap    Collapse   Collapse         No wrap
pre-wrap  Preserve   Preserve         Wrap
pre-line  Preserve   Collapse         Wrap
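The summary above can be sketched as one rule per value. The class names are hypothetical, and the comments simply restate the informative table rather than adding any normative detail:

```css
/* Hypothetical class names, one per white-space value. */
.ws-normal   { white-space: normal; }   /* new lines and spaces collapse; lines wrap */
.ws-pre      { white-space: pre; }      /* new lines and spaces preserved; no wrapping */
.ws-nowrap   { white-space: nowrap; }   /* new lines and spaces collapse; no wrapping */
.ws-pre-wrap { white-space: pre-wrap; } /* new lines and spaces preserved; lines wrap */
.ws-pre-line { white-space: pre-line; } /* new lines preserved, spaces collapse; lines wrap */
```

For instance, applying .ws-pre-line to a plain-text block such as a postal address would keep each source line on its own line while still collapsing runs of spaces.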

See White Space Processing Rules for details on how white space collapses. An informative summary of collapsing (normal and nowrap) is presented below:

See Line Breaking for details on wrapping behavior.

4. White Space Processing Details

The source text of a document often contains formatting that is not relevant to the final rendering: for example, breaking the source into segments (lines) for ease of editing or adding white space characters such as tabs and spaces to indent the source code. CSS white space processing allows the author to control interpretation of such formatting: to preserve or collapse it away when rendering the document. White space processing in CSS interprets white space characters only for rendering: it has no effect on the underlying document data.

White space processing in CSS is controlled with the white-space property.

CSS does not define document segmentation rules. Segments can be separated by a particular newline sequence (such as a line feed or CRLF pair), or delimited by some other mechanism, such as the SGML RECORD-START and RECORD-END tokens. For CSS processing, each document language–defined segment break, CRLF sequence (U+000D U+000A), carriage return (U+000D), and line feed (U+000A) in the text is treated as a segment break, which is then interpreted for rendering as specified by the white-space property.

Note that a document parser might not only normalize any segment breaks, but also collapse other space characters or otherwise process white space according to markup rules. Because CSS processing occurs after the parsing stage, it is not possible to restore these characters for styling. Therefore, some of the behavior specified below can be affected by these limitations and may be user agent dependent.

Note that anonymous blocks consisting entirely of collapsible white space are removed from the rendering tree. Thus any such white space surrounding a block-level element is collapsed away. See [CSS21] section

Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), form feed (U+000C), and carriage return (U+000D) must be rendered as a visible glyph and otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its codepoint value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.

4.1. The White Space Processing Rules

White space processing in CSS affects only the document white space characters: spaces (U+0020), tabs (U+0009), and segment breaks.

Note that the set of characters considered document white space (part of the document content) and that considered syntactic white space (part of the CSS syntax) are not necessarily identical. However, since both include spaces (U+0020), tabs (U+0009), line feeds (U+000A), and carriage returns (U+000D), most authors won’t notice any differences.

4.1.1. Phase I: Collapsing and Transformation

For each inline (including anonymous inlines; see [CSS21] section within an inline formatting context, white space characters are handled as follows, ignoring bidi formatting characters as if they were not there:


