CSS Text Module Level 3

Editor’s Draft

This version:
https://drafts.csswg.org/css-text-3/
Latest published version:
https://www.w3.org/TR/css-text-3/
Previous Versions:
https://www.w3.org/TR/2013/WD-css-text-3-20131010/
https://www.w3.org/TR/2012/WD-css3-text-20121113/
Test Suite:
http://test.csswg.org/suites/css3-text/nightly-unstable/
Issue Tracking:
GitHub
Inline In Spec
http://www.w3.org/Style/CSS/Tracker/products/10
Editors:
Elika J. Etemad / fantasai (Invited Expert)
(Invited Expert)

Abstract

This CSS3 module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.

CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, in speech, etc.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

GitHub Issues are preferred for discussion of this specification. When filing an issue, please put the text “css-text” in the title, preferably like this: “[css-text] …summary of comment…”. All issues and comments are archived, and there is also a historical archive.

This document was produced by the CSS Working Group (part of the Style Activity).

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

The following features are at-risk, and may be dropped during the CR period:

“At-risk” is a W3C Process term-of-art, and does not necessarily imply that the feature is in danger of being dropped or delayed. It means that the WG believes the feature may have difficulty being interoperably implemented in a timely manner, and marking it as such allows the WG to drop the feature if necessary when transitioning to the Proposed Rec stage, without having to publish a new Candidate Rec without the feature first.

1. Introduction

This module describes the typesetting controls of CSS; that is, the features of CSS that control the translation of source text to formatted, line-wrapped text. Various CSS properties provide control over case transformation, white space collapsing, text wrapping, line breaking rules and hyphenation, alignment and justification, spacing, and indentation.

Font selection is covered in CSS Fonts Level 3 [CSS3-FONTS].

Features for decorating text, such as underlines, emphasis marks, and shadows (previously part of this module), are covered in CSS Text Decoration Level 3 [CSS3-TEXT-DECOR].

Bidirectional and vertical text are addressed in CSS Writing Modes Level 3 [CSS3-WRITING-MODES].

1.1. Module Interactions

This module, together with [CSS3-TEXT-DECOR], replaces and extends the text-level features defined in [CSS21] chapter 16.

1.2. Values

This specification follows the CSS property definition conventions from [CSS21]. Value types not defined in this specification are defined in CSS Level 2 Revision 1 [CSS21]. Other CSS modules may expand the definitions of these value types: for example [CSS3VAL], when combined with this module, expands the definition of the <length> value type as used in this specification.

In addition to the property-specific values listed in their definitions, all properties defined in this specification also accept the inherit keyword as their property value. For readability it has not been repeated explicitly.
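As a minimal sketch of the inherit keyword applied to a property from this module (the .banner class name is hypothetical and chosen only for illustration):

```css
/* Base rule: code samples are never case-transformed. */
code { text-transform: none; }

/* Hypothetical banner component that uppercases its text. */
.banner { text-transform: uppercase; }

/* Explicitly take the parent's value, overriding the base rule above
   because this selector is more specific than "code". */
.banner code { text-transform: inherit; }
```

Here inherit is useful even though text-transform is already an inherited property, because a competing declaration would otherwise win in the cascade.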

1.3. Terminology

In addition to the terms defined below, other terminology and concepts used in this specification are defined in [CSS21] and [CSS3-WRITING-MODES].

1.3.1. Characters and Letters

The basic unit of typesetting is the character. However, because writing systems are not always as simple as the basic English alphabet, what a character actually is depends on the context in which the term is used. For example, in Hangul (the Korean writing system), each square representation of a syllable (e.g. 한=Han) can be considered a character. However, the square symbol is really composed of multiple letters each representing a phoneme (e.g. ㅎ=h, ㅏ=a, ㄴ=n) and these also could each be considered a character.

A basic unit of computer text encoding, for any given encoding, is also called a character, and depending on the encoding, a single encoding character might correspond to the entire pre-composed syllabic character (e.g. 한), to the individual phonemic character (e.g. ㅎ), or to smaller units such as a base letterform (e.g. ㅇ) and any combining marks that vary it (e.g. extra strokes that represent aspiration).

In turn, a single encoding character can be represented in the data stream as one or more bytes; and in programming environments one byte is sometimes also called a character.

Therefore the term character is fairly ambiguous where technical precision is required.

For text layout, we will refer to the typographic character unit as the basic unit of text. Even within the realm of text layout, the relevant character unit depends on the operation. For example, line-breaking and letter-spacing will segment a sequence of Thai characters that include U+0E33 THAI CHARACTER SARA AM differently; or the behaviour of a conjunct consonant in a script such as Devanagari may depend on the font in use. So the typographic character represents a unit of the writing system (such as a Latin alphabetic letter including its diacritics, a Hangul syllable, a Chinese ideographic character, or a Myanmar syllable cluster) that is indivisible with respect to a particular typographic operation (line-breaking, first-letter effects, tracking, justification, vertical arrangement, etc.).

Unicode Standard Annex #29: Text Segmentation defines a unit called the grapheme cluster which approximates the typographic character. A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in [UAX29], as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—and is expected to tailor them differently depending on the operation as needed.

The rules for such tailorings are out of scope for CSS.

The following are some examples of typographic character unit tailorings required by standard typesetting practice:

A typographic letter unit or letter for the purpose of this specification is a typographic character unit belonging to one of the Letter or Number general categories in Unicode. [UAX44] See Character Properties for how to determine the Unicode properties of a typographic character unit.

The rendering characteristics of a typographic character unit divided by an element boundary are undefined: it may be rendered as belonging to either side of the boundary, or as some approximation of belonging to both. Authors are forewarned that dividing grapheme clusters by element boundaries may give inconsistent or undesired results.

1.3.2. Languages and Typesetting

Many typographic effects vary by linguistic context. In CSS, language-specific typographic tailorings are only applied when the content language is known (declared).

Authors should language-tag their content accurately for the best typographic behavior.

The content language of an element is the (human) language the element is declared to be in, according to the rules of the document language. For example, the rules for determining the content language of an HTML element use the lang attribute and are defined in [HTML5], and the rules for determining the content language of an XML element use the xml:lang attribute and are defined in [XML10]. Note that it is possible for the content language of an element to be unknown.

2. Transforming Text

2.1. Case Transforms: the text-transform property

Name: text-transform
Value: none | capitalize | uppercase | lowercase | full-width
Initial: none
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
Animatable: no
Canonical order: N/A

This property transforms text for styling purposes. (It has no effect on the underlying content.) Values have the following meanings:

none
No effects.
capitalize
Puts the first typographic letter unit of each word in titlecase; other characters are unaffected.
uppercase
Puts all letters in uppercase.
lowercase
Puts all letters in lowercase.
full-width
Puts all typographic character units in fullwidth form. If a character does not have a corresponding fullwidth form, it is left as is. This value is typically used to typeset Latin letters and digits as if they were ideographic characters.

For capitalize, what constitutes a “word” is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries. Authors should not expect capitalize to follow language-specific titlecasing conventions (such as skipping articles in English).
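A minimal sketch of capitalize, assuming a UA that finds word boundaries at the spaces (the selector and the sample text are illustrative only):

```css
/* Every word gets its first letter titlecased, including short
   words that English titlecasing conventions would leave alone. */
h2 { text-transform: capitalize; }

/* Source text:   the art of war
   Rendered text: The Art Of War   (not "The Art of War") */
```

Authors who need language-specific titlecasing would have to apply it in the content itself rather than rely on this value.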

The following example converts the ASCII characters used in abbreviations in Japanese text to their fullwidth variants so that they lay out and line break like ideographs:

abbr:lang(ja) { text-transform: full-width; }

Note that, as defined in Text Processing Order of Operations, transforming text affects line-breaking and other formatting operations.

The UA must use the full case mappings for Unicode characters, including any conditional casing rules, as defined in the Default Case Algorithm section of The Unicode Standard [UNICODE]. If (and only if) the content language of the element is, according to the rules of the document language, known, then any appropriate language-specific rules must be applied as well. These minimally include, but are not limited to, the language-specific rules in Unicode’s SpecialCasing.txt.

For example, in Turkish there are two “i”s, one with a dot—“İ” and “i”— and one without—“I” and “ı”. Thus the usual case mappings between “I” and “i” are replaced with a different set of mappings to their respective undotted/dotted counterparts, which do not exist in English. This mapping must only take effect if the content language is Turkish (or another Turkic language that uses Turkish casing rules); in other languages, the usual mapping of “I” and “i” is required. This rule is thus conditionally defined in Unicode’s SpecialCasing.txt file.
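The Turkish case can be sketched as follows. The .shout class name is hypothetical, and the comments assume HTML markup that declares the content language with the lang attribute:

```css
/* Hypothetical class; the same rule yields different results
   depending on the declared content language of the element. */
.shout { text-transform: uppercase; }

/* <p lang="tr" class="shout">istanbul</p>  renders as  İSTANBUL  (dotted capital I, U+0130)
   <p lang="en" class="shout">istanbul</p>  renders as  ISTANBUL
   <p class="shout">istanbul</p>            renders as  ISTANBUL  (language unknown, default mapping) */
```

The third case illustrates why accurate language tagging matters: without a declared language, the Turkish-specific mapping is not applied.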

The definition of fullwidth and halfwidth forms can be found on the Unicode consortium web site at [UAX11]. The mapping to fullwidth form is defined by taking code points with the <wide> or the <narrow> tag in their Decomposition_Mapping in [UAX44]. For the <narrow> tag, the mapping is from the code point to the decomposition (minus <narrow> tag), and for the <wide> tag, the mapping is from the decomposition (minus the <wide> tag) back to the original code point.
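To illustrate the two mapping directions described above, here is a small sketch with a hypothetical selector; the code points in the comments come from the Unicode Character Database:

```css
/* Hypothetical selector. */
.fullwidth { text-transform: full-width; }

/* <wide> direction:   U+0041 "A" maps to U+FF21 "Ａ",
                       because U+FF21 decomposes to <wide> 0041.
   <narrow> direction: U+FF76 "ｶ" maps to U+30AB "カ",
                       because U+FF76 decomposes to <narrow> 30AB.
   No mapping:         U+00E9 "é" has no fullwidth form and is left as is. */
```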

Text transformation happens after white space processing, which means that full-width only transforms U+0020 spaces to U+3000 within preserved white space.
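A minimal sketch of this interaction, using hypothetical class names:

```css
/* Spaces are collapsible here, so they are processed as white space
   before the transform and stay U+0020. */
.fw { text-transform: full-width; }

/* Spaces are preserved here, so the transform sees them and maps
   each U+0020 to U+3000 IDEOGRAPHIC SPACE. */
.fw-pre {
  white-space: pre-wrap;
  text-transform: full-width;
}
```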

A future level of CSS may introduce the ability to create custom mapping tables for less common text transforms, such as by an @text-transform rule similar to @counter-style from [CSS-COUNTER-STYLES-3].

3. White Space and Wrapping: the white-space property

Name: white-space
Value: normal | pre | nowrap | pre-wrap | pre-line
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
Animatable: no
Canonical order: N/A

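This property specifies two things: whether and how white space inside the element is collapsed, and whether lines may wrap at soft wrap opportunities.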

Values have the following meanings, which must be interpreted according to the White Space Processing and Line Breaking rules:

normal
This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character). Lines may wrap at allowed soft wrap opportunities, as determined by the line-breaking rules in effect, in order to minimize inline-axis overflow.
pre
This value prevents user agents from collapsing sequences of white space. Segment breaks such as line feeds and carriage returns are preserved as forced line breaks. Lines only break at forced line breaks; content that does not fit within the block container overflows it.
nowrap
Like normal, this value collapses white space; but like pre, it does not allow wrapping.
pre-wrap
Like pre, this value preserves white space; but like normal, it allows wrapping.
pre-line
Like normal, this value collapses consecutive spaces and allows wrapping, but preserves segment breaks in the source as forced line breaks.

The following informative table summarizes the behavior of various white-space values:

            New Lines    Spaces and Tabs    Text Wrapping
normal      Collapse     Collapse           Wrap
pre         Preserve     Preserve           No wrap
nowrap      Collapse     Collapse           No wrap
pre-wrap    Preserve     Preserve           Wrap
pre-line    Preserve     Collapse           Wrap
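
As an informal illustration of how these values might be chosen in practice (all selectors below are hypothetical):

```css
/* Source listings: keep every space and line break, never wrap. */
pre.listing { white-space: pre; }

/* User-entered messages: keep spaces and line breaks, but wrap long lines. */
.message { white-space: pre-wrap; }

/* Verse: honor the line breaks in the source, collapse runs of spaces, wrap. */
.poem { white-space: pre-line; }

/* Short labels: collapse white space and keep the text on one line. */
.label { white-space: nowrap; }
```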

See White Space Processing Rules for details on how white space collapses. An informative summary of collapsing (normal and nowrap) is presented below:

See Line Breaking for details on wrapping behavior.

4. White Space Processing Details

The source text of a document often contains formatting that is not relevant to the final rendering: for example, breaking the source into segments (lines) for ease of editing or adding white space characters such as tabs and spaces to indent the source code. CSS white space processing allows the author to control interpretation of such formatting: to preserve or collapse it away when rendering the document. White space processing in CSS interprets white space characters only for rendering: it has no effect on the underlying document data.

White space processing in CSS is controlled with the white-space property.

CSS does not define document segmentation rules. Segments can be separated by a particular newline sequence (such as a line feed or CRLF pair), or delimited by some other mechanism, such as the SGML RECORD-START and RECORD-END tokens. For CSS processing, each document language–defined segment break, CRLF sequence (U+000D U+000A), carriage return (U+000D), and line feed (U+000A) in the text is treated as a segment break, which is then interpreted for rendering as specified by the white-space property.

Note that a document parser might not only normalize any segment breaks, but also collapse other space characters or otherwise process white space according to markup rules. Because CSS processing occurs after the parsing stage, it is not possible to restore these characters for styling. Therefore, some of the behavior specified below can be affected by these limitations and may be user agent dependent.

Note that anonymous blocks consisting entirely of collapsible white space are removed from the rendering tree. Thus any such white space surrounding a block-level element is collapsed away. See [CSS21] section 9.2.2.1.

Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), form feed (U+000C), and carriage return (U+000D) must be rendered as a visible glyph and otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its codepoint value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.

4.1. The White Space Processing Rules

White space processing in CSS affects only the document white space characters: spaces (U+0020), tabs (U+0009), and segment breaks.

Note that the set of characters considered document white space (part of the document content) and that considered syntactic white space (part of the CSS syntax) are not necessarily identical. However, since both include spaces (U+0020), tabs (U+0009), line feeds (U+000A), and carriage returns (U+000D), most authors won’t notice any differences.

4.1.1. Phase I: Collapsing and Transformation

For each inline (including anonymous inlines; see [CSS21] section 9.2.2.1) within an inline formatting context, white space characters are handled as follows, ignoring bidi formatting characters as if they were not there:

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Advisements are normative sections styled to evoke special attention and are set apart from other normative text with <strong class="advisement">, like this: UAs MUST provide an accessible alternative.

Conformance classes

Conformance to this specification is defined for three conformance classes:

style sheet
A CSS style sheet.
renderer
A UA that interprets the semantics of a style sheet and renders documents that use them.
authoring tool
A UA that writes a style sheet.

A style sheet is conformant to this specification if all of its statements that use syntax defined in this module are valid according to the generic CSS grammar and the individual grammars of each feature defined in this module.

A renderer is conformant to this specification if, in addition to interpreting the style sheet as defined by the appropriate specifications, it supports all the features defined by this specification by parsing them correctly and rendering the document accordingly. However, the inability of a UA to correctly render a document due to limitations of the device does not make the UA non-conformant. (For example, a UA is not required to render color on a monochrome monitor.)

An authoring tool is conformant to this specification if it writes style sheets that are syntactically correct according to the generic CSS grammar and the individual grammars of each feature in this module, and meet all other conformance requirements of style sheets as described in this module.

Requirements for Responsible Implementation of CSS

The following sections define several conformance requirements for implementing CSS responsibly, in a way that promotes interoperability in the present and future.

Partial Implementations

So that authors can exploit the forward-compatible parsing rules to assign fallback values, CSS renderers must treat as invalid (and ignore as appropriate) any at-rules, properties, property values, keywords, and other syntactic constructs for which they have no usable level of support. In particular, user agents must not selectively ignore unsupported property values and honor supported values in a single multi-value property declaration: if any value is considered invalid (as unsupported values must be), CSS requires that the entire declaration be ignored.
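
The fallback pattern this rule enables can be sketched as follows, using properties from this module (the selectors are illustrative, and which values a given UA supports is an assumption):

```css
p {
  text-align: left;    /* fallback for UAs without support for "start" */
  text-align: start;   /* wins wherever it is supported; otherwise the whole
                          declaration is ignored and "left" stays in effect */
}

li {
  text-indent: 2em;           /* fallback */
  text-indent: 2em hanging;   /* a UA that does not support "hanging" must
                                 ignore this entire declaration, not just
                                 the unsupported keyword */
}
```

If a UA honored only the supported part of the second text-indent declaration, the author could no longer predict which declaration takes effect.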

Implementations of Unstable and Proprietary Features

To avoid clashes with future stable CSS features, the CSSWG recommends following best practices for the implementation of unstable features and proprietary extensions to CSS.

Implementations of CR-level Features

Once a specification reaches the Candidate Recommendation stage, implementers should release an unprefixed implementation of any CR-level feature they can demonstrate to be correctly implemented according to spec, and should avoid exposing a prefixed variant of that feature.

To establish and maintain the interoperability of CSS across implementations, the CSS Working Group requests that non-experimental CSS renderers submit an implementation report (and, if necessary, the testcases used for that implementation report) to the W3C before releasing an unprefixed implementation of any CSS features. Testcases submitted to W3C are subject to review and correction by the CSS Working Group.

Further information on submitting testcases and implementation reports can be found on the CSS Working Group’s website at http://www.w3.org/Style/CSS/Test/. Questions should be directed to the public-css-testsuite@w3.org mailing list.

References

Normative References

[CSS-BACKGROUNDS-3]
CSS Backgrounds and Borders Module Level 3 URL: https://drafts.csswg.org/css-backgrounds-3/
[CSS-CASCADE-4]
Elika Etemad; Tab Atkins Jr. CSS Cascading and Inheritance Level 4. 14 January 2016. CR. URL: http://dev.w3.org/csswg/css-cascade/
[CSS-POSITION-3]
Rossen Atanassov; Arron Eicholz. CSS Positioned Layout Module Level 3. 17 May 2016. WD. URL: https://drafts.csswg.org/css-position/
[CSS-SIZING-4]
CSS Intrinsic & Extrinsic Sizing Module Level 4 URL: https://drafts.csswg.org/css-sizing-4/
[CSS21]
Bert Bos; et al. Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification. 7 June 2011. REC. URL: http://www.w3.org/TR/CSS2
[CSS3-FONTS]
John Daggett. CSS Fonts Module Level 3. 3 October 2013. CR. URL: http://dev.w3.org/csswg/css-fonts/
[CSS3-WRITING-MODES]
Elika Etemad; Koji Ishii. CSS Writing Modes Level 3. 15 December 2015. CR. URL: http://dev.w3.org/csswg/css-writing-modes-3/
[CSS3RUBY]
Elika Etemad; Koji Ishii. CSS Ruby Layout Module Level 1. 5 August 2014. WD. URL: http://dev.w3.org/csswg/css-ruby-1/
[CSS3VAL]
Tab Atkins Jr.; Elika Etemad. CSS Values and Units Module Level 3. 11 June 2015. CR. URL: http://dev.w3.org/csswg/css-values/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[UAX11]
Asmus Freytag. East Asian Width. 23 March 2001. Unicode Standard Annex #11. URL: http://www.unicode.org/unicode/reports/tr11/tr11-8.html
[UAX14]
Asmus Freytag. Line Breaking Properties. 29 March 2005. Unicode Standard Annex #14. URL: http://www.unicode.org/unicode/reports/tr14/tr14-17.html
[UAX24]
Mark Davis. Script Names. 28 March 2005. Unicode Standard Annex #24. URL: http://www.unicode.org/unicode/reports/tr24/tr24-7.html
[UAX29]
Mark Davis. Text Boundaries. 25 March 2005. Unicode Standard Annex #29. URL: http://www.unicode.org/unicode/reports/tr29/tr29-9.html
[UAX44]
Mark Davis; Ken Whistler. Unicode Character Database. 25 September 2013. URL: http://www.unicode.org/reports/tr44/
[UNICODE]
The Unicode Standard. URL: http://www.unicode.org/versions/latest/
[UTR50]
Koji Ishii. Unicode Properties for Vertical Text Layout. 31 August 2013. URL: http://www.unicode.org/reports/tr50/

Informative References

[CSS-COUNTER-STYLES-3]
Tab Atkins Jr. CSS Counter Styles Level 3. 11 June 2015. CR. URL: http://dev.w3.org/csswg/css-counter-styles/
[CSS3-TEXT-DECOR]
Elika Etemad; Koji Ishii. CSS Text Decoration Module Level 3. 1 August 2013. CR. URL: http://dev.w3.org/csswg/css-text-decor-3/
[HTML5]
Ian Hickson; et al. HTML5. 28 October 2014. REC. URL: http://www.w3.org/html/wg/drafts/html/master/
[JIS4051]
Formatting rules for Japanese documents (『日本語文書の組版方法』). Japanese Standards Association. 2004. JIS X 4051:2004. In Japanese
[JLREQ]
Yasuhiro Anan; et al. Requirements for Japanese Text Layout. 3 April 2012. NOTE. URL: http://www.w3.org/TR/jlreq/
[XML10]
Tim Bray; et al. Extensible Markup Language (XML) 1.0 (Fifth Edition). 26 November 2008. REC. URL: http://www.w3.org/TR/xml
[ZHMARK]
标点符号用法 (Punctuation Mark Usage). 1995. 中华人民共和国国家标准

Property Index

Name; Value; Initial; Applies to; Inherited; Percentages; Media; Animatable; Canonical order; Computed value
text-transform; none | capitalize | uppercase | lowercase | full-width; none; all elements; yes; N/A; visual; no; N/A; as specified
white-space; normal | pre | nowrap | pre-wrap | pre-line; normal; all elements; yes; N/A; visual; no; N/A; as specified
tab-size; <integer> | <length>; 8; block containers; yes; N/A; visual; as length; N/A; the specified integer or length made absolute
word-break; normal | keep-all | break-all; normal; all elements; yes; N/A; visual; no; N/A; specified value
line-break; auto | loose | normal | strict; auto; all elements; yes; N/A; visual; no; N/A; specified value
hyphens; none | manual | auto; manual; all elements; yes; N/A; visual; no; N/A; specified value
overflow-wrap; normal | break-word; normal; all elements; yes; N/A; visual; no; N/A; specified value
word-wrap; normal | break-word; normal; all elements; yes; N/A; visual; no; N/A; specified value
text-align; start | end | left | right | center | justify | match-parent | justify-all; start; block containers; yes; N/A; visual; no; N/A; specified value, except for match-parent which computes as defined below
text-align-all; start | end | left | right | center | justify | match-parent; start; block containers; yes; N/A; visual; no; N/A; specified value
text-align-last; auto | start | end | left | right | center | justify; auto; block containers; yes; N/A; visual; no; N/A; specified value
text-justify; auto | none | inter-word | inter-character; auto; block containers and, optionally, inline elements; yes; N/A; visual; no; N/A; specified value
word-spacing; normal | <length> | <percentage>; normal; all elements; yes; refers to width of the affected glyph; visual; as length, percentage, or calc; N/A; an absolute length
letter-spacing; normal | <length>; normal; all elements; yes; N/A; visual; as length; N/A; an absolute length
text-indent; [ <length> | <percentage> ] && hanging? && each-line?; 0; block containers; yes; refers to width of containing block; visual; as length, percentage, or calc, but only if keywords match; per grammar; the percentage as specified or the absolute length, plus any keywords as specified
hanging-punctuation; none | [ first || [ force-end | allow-end ] || last ]; none; inline elements; yes; N/A; visual; no; per grammar; as specified

Issues Index

Comments on how well this would work in practice would be very much appreciated, particularly from people who work with Thai and similar scripts. Note that browser implementations do not currently follow these rules (although IE does in some cases transform the break).
Any guidance for appropriate references here would be much appreciated.
The rules here are following guidelines from KLREQ for Korean, which don’t allow the Chinese/Japanese-specific breaks. However, the resulting behavior could use some review and feedback to make sure they are correct, particularly when “word basis” breaking is used (word-break: keep-all) in Korean.
If you find any issues, recommendations to add, or corrections, please send the information to www-style@w3.org with [css-text] in the subject line.
Should block and cluster scripts be merged? They have different tolerances for space-justification vs inter-character justification, but both admit both.