CSS Text Module Level 4

Editor’s Draft,

This version:
https://drafts.csswg.org/css-text-4/
Latest published version:
https://www.w3.org/TR/css-text-4/
Previous Versions:
Feedback:
CSSWG Issues Repository
Inline In Spec
Editors:
Elika J. Etemad / fantasai (Invited Expert)
(Invited Expert)
(Adobe Systems)
Florian Rivoal (Invited Expert)
Test Suite:
https://wpt.fyi/results/css/css-text/

Abstract

This CSS module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, and text transformation.

CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, etc.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

Please send feedback by filing issues in GitHub (preferred), including the spec code “css-text” in the title, like this: “[css-text] …summary of comment…”. All issues and comments are archived. Alternatively, feedback can be sent to the (archived) public mailing list www-style@w3.org.

This document is governed by the 2 November 2021 W3C Process Document.

1. Introduction

Tests

The test coverage information in this specification covers wpt/css/css-text/ and subdirectories, as well as those tests in wpt/css/CSS2/ and subdirectories that relate to this specification.

Missing tests:


This module describes the typesetting controls of CSS; that is, the features of CSS that control the translation of source text to formatted, line-wrapped text. Various CSS properties provide control over case transformation, white space collapsing, text wrapping, line breaking rules and hyphenation, alignment and justification, spacing, and indentation. See Additions Since Level 3 for additions since Level 3.

Note: Font selection is covered in the CSS Fonts Module. [CSS-FONTS-3]

Features for decorating text, such as underlines, emphasis marks, and shadows (previously part of this module), are covered in the CSS Text Decoration Module. [CSS-TEXT-DECOR-3]

Bidirectional and vertical text are addressed in the CSS Writing Modes Module. [CSS-WRITING-MODES-4]

Further information about the typesetting requirements of various languages and writing systems around the world can be found in the Internationalization Working Group’s Language Enablement Index. [TYPOGRAPHY]

Tests

The following tests are crash tests that relate to general usage of the features described in this specification but are not tied to any particular normative statement.


1.1. Module Interactions

Tests

Tests not needed for this section.


This module, together with the CSS Text Decoration Module, replaces and extends the text-level features defined in Cascading Style Sheets Level 2 chapter 16. [CSS-TEXT-DECOR-3] [CSS2]

In addition to the terms defined below, other terminology and concepts used in this specification are defined in Cascading Style Sheets Level 2 and the CSS Writing Modes Module. [CSS2] [CSS-WRITING-MODES-4]

1.2. Value Definitions

Tests

Tests not really needed for this section; could possibly test that CSS-wide keywords apply to every property.


This specification follows the CSS property definition conventions from [CSS2] using the value definition syntax from [CSS-VALUES-3]. Value types not defined in this specification are defined in CSS Values & Units [CSS-VALUES-3]. Combination with other CSS modules may expand the definitions of these value types.

In addition to the property-specific values listed in their definitions, all properties defined in this specification also accept the CSS-wide keywords as their property value. For readability they have not been repeated explicitly.

1.3. Languages and Typesetting

Tests

Tests not needed for this section: these are definitions, they get tested through their application, not by themselves.


Authors should accurately language-tag their content for the best typographic behavior.

Many typographic effects vary by linguistic context. Language and writing system conventions can affect line breaking, hyphenation, justification, glyph selection, and many other typographic effects. In CSS, language-specific typographic tailorings are only applied when the content language is known (declared). Therefore, higher quality typography requires authors to communicate to the UA the correct linguistic context of the text in the document.

The content language of an element is the (human) language the element is declared to be in, according to the rules of the document language. Note that it is possible for the content language of an element to be unknown—e.g. untagged content, or content in a document language that does not have a language-tagging facility, is considered to have an unknown content language.

Note: Authors can declare the content language using the global lang attribute in HTML or the universal xml:lang attribute in XML. See the rules for determining the content language of an HTML element in HTML, and the rules for determining the content language of an XML element in XML 1.0. [HTML] [XML10]

The content language an element is declared to be in also identifies the specific written form of that language used in that element, known as the content writing system. Depending on the document language’s facilities for identifying the content language, this information can be explicit or implied. See the normative Appendix F: Identifying the Content Writing System.

Note: Some languages have more than one writing system tradition; in other cases a language can be transliterated into a foreign writing system. Authors should subtag such cases so that the UA can adapt appropriately.

For example, Korean (ko) can be written in Hangul (-Hang), Hanja (-Hani), or a combination (-Kore). Historical documents written solely in Hanja do not use word spaces and are formatted more like modern Chinese than modern Korean. In other words, for typographic purposes ko-Hani behaves more like zh-Hant than ko (ko-Kore).

As another example, Japanese (ja) is typically written in a combination (-Jpan) of Hiragana (-Hira), Katakana (-Kana), and Kanji (-Hani). However, it can also be “romanized” into Latin (-Latn) for special purposes like language-learning textbooks, in which case it should be formatted more like English than Japanese.

As a third example, contemporary Mongolian is written in two scripts: Cyrillic (-Cyrl, officially used in Mongolia) and Mongolian (-Mong, more common in Inner Mongolia, part of China). These have very different formatting requirements, with Cyrillic behaving similarly to Latin and Greek, and Mongolian deriving from both Arabic and Chinese writing conventions.

1.4. Characters and Letters

Tests

For the most part, tests not really needed for this section: these are definitions, they get tested through their applications, not by themselves. The few testable assertions that are made have coverage.

Possible additions:


The basic unit of typesetting is the character. However, because writing systems are not always as simple as the basic English alphabet, what a character actually is depends on the context in which the term is used. For example, in Hangul (the Korean writing system), each square representation of a syllable (e.g. 한=Han) can be considered a character. However, the square symbol is really composed of multiple letters each representing a phoneme (e.g. ㅎ=h, ㅏ=a, ㄴ=n) and these also could each be considered a character.

A basic unit of computer text encoding, for any given encoding, is also called a character, and depending on the encoding, a single encoding character might correspond to the entire pre-composed syllabic character (e.g. 한), to the individual phonemic character (e.g. ㅎ), or to smaller units such as a base letterform and any combining marks that vary it (e.g. extra strokes that represent aspiration).

In turn, a single encoding character can be represented in the data stream as one or more bytes; and in programming environments one byte is sometimes also called a character.

Therefore the term character is fairly ambiguous where technical precision is required.

For text layout, we will refer to the typographic character unit as the basic unit of text. Even within the realm of text layout, the relevant character unit depends on the operation. For example, line-breaking and letter-spacing will segment a sequence of Thai characters that include U+0E33  ำ THAI CHARACTER SARA AM differently; or the behavior of a conjunct consonant in a script such as Devanagari may depend on the font in use. So the typographic character represents a unit of the writing system—such as a Latin alphabetic letter (including its diacritics), Hangul syllable, Chinese ideographic character, Myanmar syllable cluster—that is indivisible with respect to a particular typographic operation (line-breaking, first-letter effects, tracking, justification, vertical arrangement, etc.).

Unicode Standard Annex #29: Text Segmentation defines a unit called the grapheme cluster which approximates the typographic character. [UAX29] A UA must use the extended grapheme cluster (not legacy grapheme cluster), as defined in UAX29, as the basis for its typographic character unit. However, the UA should tailor the definitions as required by typographic tradition since the default rules are not always appropriate or ideal—and is expected to tailor them differently depending on the operation as needed.

Note: The rules for such tailorings are out of scope for CSS.

The following are some examples of typographic character unit tailorings required by standard typesetting practice:

A typographic letter unit (or letter for the purpose of this specification) is a typographic character unit belonging to one of the Letter or Number general categories. See Appendix E: Characters and Properties for how to determine the Unicode properties of a typographic character unit.
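As an illustration only, the check below sketches this definition in Python. It assumes that the typographic character unit has already been segmented, and that its general category is that of its first code point (its base character); Appendix E gives the normative rule. The function name is hypothetical.

```python
import unicodedata


def is_typographic_letter_unit(unit: str) -> bool:
    """Return True if a typographic character unit counts as a letter.

    Assumption: the unit's general category is taken from its first code
    point (the base character); any following combining marks are ignored.
    """
    if not unit:
        return False
    major_class = unicodedata.category(unit[0])[0]
    # "L*" covers the Letter categories, "N*" the Number categories.
    return major_class in ("L", "N")
```

For example, “é” written as e + U+0301 COMBINING ACUTE ACCENT, a Hangul syllable, and a digit all qualify, while punctuation and spaces do not.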

The rendering characteristics of a typographic character unit divided by an element boundary are undefined. Ideally each component should be rendered according to the formatting requirements of its respective element’s properties while maintaining correct shaping and positioning of the typographic character unit as a whole. However, depending on the nature of the formatting differences between its parts and the capabilities of the font technology in use, this is not always possible. Therefore such a typographic character unit may be rendered as belonging to either side of the boundary, or as some approximation of belonging to both. Authors are forewarned that dividing grapheme clusters or ligatures by element boundaries may give inconsistent or undesired results.

1.5. Text Processing

Tests

This section has adequate coverage. Exhaustive coverage unrealistic, since this section is effectively a dependency on all of Unicode. Some tests nonetheless provided for key functionality (such as the effect of certain control characters on Arabic shaping).


CSS is built on Unicode. [UNICODE] UAs that support Unicode must adhere to all normative requirements of the Unicode Core Standard, except where explicitly overridden by CSS. UAs implemented on the basis of a non-Unicode text encoding model are still expected to fulfill the same text handling requirements by assuming an appropriate mapping and analogous behavior.

For the purpose of determining adjacency for text processing (such as white space processing, text transformation, line-breaking, etc.), and thus in general within this specification, intervening inline box boundaries and out-of-flow elements must be ignored. With respect to text shaping, however, see § 8.7 Shaping Across Element Boundaries.

2. Transforming Text

Tests

This section and its subsections have good test coverage overall, and very good i18n coverage in particular.

Missing tests:

Possible additions:


2.1. Case Transforms: the text-transform property

Name: text-transform
Value: none | [capitalize | uppercase | lowercase ] || full-width || full-size-kana
Initial: none
Applies to: text
Inherited: yes
Percentages: n/a
Computed value: specified keyword
Canonical order: n/a
Animation type: discrete

This property transforms text for styling purposes. It has no effect on the underlying content, and must not affect the content of a plain text copy & paste operation.

Authors must not rely on text-transform for semantic purposes; rather the correct casing and semantics should be encoded in the source document text and markup.

Values have the following meanings:

none
No effects.
capitalize
Puts the first typographic letter unit of each word, if lowercase, in titlecase; other characters are unaffected.
uppercase
Puts all letters in uppercase.
lowercase
Puts all letters in lowercase.
full-width
Puts all typographic character units in full-width form. If a character does not have a corresponding full-width form, it is left as is. This value is typically used to typeset Latin letters and digits as if they were ideographic characters.
full-size-kana
Converts all small Kana characters to the equivalent full-size Kana. This value is typically used for ruby annotation text, where authors may want all small Kana to be drawn as large Kana to compensate for legibility issues at the small font sizes typically used in ruby.
The following example converts the ASCII characters used in abbreviations in Japanese text to their full-width variants so that they lay out and line break like ideographs:
abbr:lang(ja) { text-transform: full-width; }

Note: The purpose of text-transform is to allow for presentational casing transformations without affecting the semantics of the document. Note in particular that text-transform casing operations are lossy, and can distort the meaning of a text. While accessibility interfaces may wish to convey the apparent casing of the rendered text to the user, the transformed text cannot be relied on to accurately represent the underlying meaning of the document.

In this example, the first line of text is converted to uppercase as a visual effect.
section > p:first-of-type::first-line {
  text-transform: uppercase;
}

This effect cannot be written into the source document because the position of the line break depends on layout. Moreover, the capitalization does not reflect a semantic distinction and is not intended to affect the paragraph’s reading; therefore it belongs in the presentation layer.

In this example, the ruby annotations, which are half the size of the main paragraph text, are transformed to use regular-size kana in place of small kana.
rt { font-size: 50%; text-transform: full-size-kana; }
:is(h1, h2, h3, h4) rt { text-transform: none; /* unset for large text*/ }

Note that while this makes such letters easier to see at small type sizes, the transformation distorts the text: the reader needs to mentally substitute small kana in the appropriate places—not unlike reading a Latin inscription where all “U”s look like “V”s.

For example, if text-transform: full-size-kana were applied to the following source, the annotation would read “じゆう” (jiyū), which means “liberty”, instead of “じゅう” (jū), which means “ten”, the correct reading and meaning for the annotated “十”.

<ruby>十<rt>じゅう</ruby>

2.1.1. Mapping Rules

For capitalize, what constitutes a “word” is UA-dependent; [UAX29] is suggested (but not required) for determining such word boundaries. Out-of-flow elements and inline element boundaries must not introduce a text-transform word boundary and must be ignored when determining such word boundaries.

Note: Authors cannot depend on capitalize to follow language-specific titlecasing conventions (such as skipping articles in English).
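The following Python sketch illustrates the capitalize mapping under deliberately simplified assumptions: words are delimited only by white space (not by [UAX29] or a UA-specific algorithm), each code point stands in for a typographic character unit, and no language-specific rules are applied. Calling str.title() on a single character applies the Unicode titlecase mapping, so a digraph such as “ǆ” becomes “ǅ” rather than “Ǆ”. The function name is hypothetical.

```python
import unicodedata


def capitalize(text: str) -> str:
    """Sketch of text-transform: capitalize.

    Simplifications: words are delimited by white space only, and each
    code point is treated as its own typographic character unit.
    """
    out = []
    seen_letter = False  # True once the current word's first letter unit has passed
    for ch in text:
        if ch.isspace():
            seen_letter = False  # white space starts a new word
            out.append(ch)
        elif not seen_letter and unicodedata.category(ch)[0] in ("L", "N"):
            seen_letter = True
            # Only a lowercase first letter changes; others are left as is.
            out.append(ch.title() if unicodedata.category(ch) == "Ll" else ch)
        else:
            out.append(ch)
    return "".join(out)
```

Because a leading digit is itself a typographic letter unit, “3rd” is left unchanged, and leading punctuation such as “(” is skipped over.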

The UA must use the full case mappings for Unicode characters, including any conditional casing rules, as defined in the Default Case Algorithms section of The Unicode Standard. [UNICODE] If (and only if) the content language of the element is, according to the rules of the document language, known, then any appropriate language-specific rules must be applied as well. These minimally include, but are not limited to, the language-specific rules in Unicode’s SpecialCasing.txt.

For example, in Turkish there are two “i”s, one with a dot—“İ” and “i”—and one without—“I” and “ı”. Thus the usual case mappings between “I” and “i” are replaced with a different set of mappings to their respective dotless/dotted counterparts, which do not exist in English. This mapping must only take effect if the content language is Turkish written in its modern Latin-based writing system (or another Turkic language that uses Turkish casing rules); in other languages, the usual mapping of “I” and “i” is required. This rule is thus conditionally defined in Unicode’s SpecialCasing.txt file.
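A minimal Python sketch of this conditional behavior follows. Python’s str.upper() and str.lower() apply the language-independent full case mappings, so the Turkic i/ı rules are layered on top only when the content language is known. The helper names are hypothetical. The language check is a simplification (primary subtag tr or az, with no script subtag other than Latn), and the sketch does not implement every rule in SpecialCasing.txt. For example, it omits the removal of U+0307 COMBINING DOT ABOVE after I.

```python
def uses_turkic_casing(lang):
    """Hypothetical helper: does this BCP 47 tag call for Turkic i/ı casing?

    Simplification: primary subtag "tr" or "az", and no script subtag
    other than Latin.
    """
    if not lang:
        return False  # unknown content language: no language-specific rules
    subtags = lang.lower().split("-")
    if subtags[0] not in ("tr", "az"):
        return False
    scripts = [s for s in subtags[1:] if len(s) == 4 and s.isalpha()]
    return not scripts or scripts[0] == "latn"


def to_upper(text, lang=None):
    if uses_turkic_casing(lang):
        text = text.replace("i", "\u0130")  # i -> İ (dotted capital I)
    return text.upper()  # ı -> I is already the default mapping


def to_lower(text, lang=None):
    if uses_turkic_casing(lang):
        text = text.replace("\u0130", "i").replace("I", "\u0131")  # İ -> i, I -> ı
    return text.lower()
```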

The definition of full-width and half-width forms can be found in Unicode Standard Annex #11: East Asian Width. [UAX11] The mapping to full-width form is defined by taking code points with the <wide> or the <narrow> tag in their Decomposition_Mapping in Unicode Standard Annex #44: Unicode Character Database. [UAX44] For the <narrow> tag, the mapping is from the code point to the decomposition (minus <narrow> tag), and for the <wide> tag, the mapping is from the decomposition (minus the <wide> tag) back to the original code point.
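The derivation above can be sketched in Python using the standard unicodedata module. Its decomposition() function returns the Decomposition_Mapping field, including any <wide> or <narrow> tag, for the Unicode version bundled with the interpreter. This sketch is illustrative: it maps single code points, whereas a UA applies the transform to typographic character units. The function and variable names are hypothetical.

```python
import sys
import unicodedata


def build_full_width_map() -> dict:
    """Derive the full-width mapping from <wide>/<narrow> decompositions."""
    mapping = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        tag, _, target = unicodedata.decomposition(ch).partition(" ")
        if tag not in ("<wide>", "<narrow>") or " " in target:
            continue  # only single-code-point <wide>/<narrow> decompositions apply
        target_ch = chr(int(target, 16))
        if tag == "<narrow>":
            mapping[ch] = target_ch  # half-width form -> its decomposition
        else:
            mapping[target_ch] = ch  # decomposition -> the <wide> code point
    return mapping


FULL_WIDTH = build_full_width_map()


def to_full_width(text: str) -> str:
    # Characters without a full-width form are left as is.
    return "".join(FULL_WIDTH.get(ch, ch) for ch in text)
```

Because U+3000 IDEOGRAPHIC SPACE has the decomposition <wide> 0020, this mapping also turns U+0020 SPACE into U+3000.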

The mappings for small Kana to full-size Kana are defined in Appendix G: Small Kana Mappings.
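Appendix G is the normative source for these mappings. As a rough, non-normative approximation, the Python sketch below derives a similar table from Unicode character names, pairing each “… LETTER SMALL X” Kana with the character named “… LETTER X”. Its coverage depends on the Unicode version of the interpreter and may differ from Appendix G, for example in how half-width small Katakana are treated. The function names are hypothetical.

```python
import sys
import unicodedata


def build_small_kana_map() -> dict:
    """Approximate small-to-full-size Kana mapping derived from character names."""
    mapping = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if not name.startswith(("HIRAGANA ", "KATAKANA ", "HALFWIDTH KATAKANA ")):
            continue
        if "LETTER SMALL " not in name:
            continue
        try:
            # e.g. "HIRAGANA LETTER SMALL YU" -> "HIRAGANA LETTER YU"
            mapping[ch] = unicodedata.lookup(name.replace("LETTER SMALL ", "LETTER ", 1))
        except KeyError:
            pass  # no full-size counterpart with the expected name
    return mapping


SMALL_KANA = build_small_kana_map()


def to_full_size_kana(text: str) -> str:
    return "".join(SMALL_KANA.get(ch, ch) for ch in text)
```

Applied to “じゅう”, this yields “じゆう”, the distorted reading discussed in the full-size-kana example.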

2.1.2. Order of Operations

When multiple values are specified and therefore multiple transformations need to be applied, they are applied in the following order:

  1. capitalize, uppercase, and lowercase
  2. full-width
  3. full-size-kana

Text transformation happens after § 4.3.1 Phase I: Collapsing and Transformation but before § 4.3.2 Phase II: Trimming and Positioning. This means that full-width only transforms spaces (U+0020) to U+3000 IDEOGRAPHIC SPACE within preserved white space.

Note: As defined in Appendix A: Text Processing Order of Operations, transforming text affects line-breaking and other formatting operations.

2.2. Word Boundaries

In a number of languages and writing systems, such as Japanese or Thai, words are not delimited by spaces (or any other character) as is the case in English (see Approaches to line breaking for a discussion of the approach various languages take to word separation and line breaking).

However, even if text without spaces is the dominant style in such languages, there are cases where making word boundaries (or phrase boundaries) visible through the use of spaces is desired. This is a purely stylistic effect, with no implication on the semantics of the text.

In Japan for instance, this is commonly done in books for people learning the language—young children or foreign students. People with dyslexia also tend to find this style easier to read.

The mechanism described in this specification builds upon the existing use of the wbr element or of U+200B ZERO WIDTH SPACE (See [UNICODE]) in the document markup as a word (or phrase) delimiter.

Should we have a shorthand for the following two properties?

2.2.1. Detecting Word Boundaries: the word-boundary-detection property

Name: word-boundary-detection
Value: normal | manual | auto(<lang>)
Initial: normal
Applies to: text
Inherited: yes
Percentages: N/A
Computed value: as specified (However, see special provision for unsupported <lang>)
Canonical order: per grammar
Animation type: discrete

The design of this property is still being worked out. Don’t implement it just yet! You can ask the editors about status if this is blocking you.

This property allows the author to decide whether and how the user agent must analyse the content to determine where word boundaries are, and to insert virtual word boundaries accordingly.

A virtual word boundary is similar to the presence of the ZERO WIDTH SPACE (U+200B) character: it introduces a soft wrap opportunity and is affected by the word-boundary-expansion property. However, its presence alone has no effect on text shaping, spacing, or justification. Inserting virtual word boundaries must have no effect on the underlying content, and must not affect the content of a plain text copy & paste operation.

manual
Linguistic analysis is not used in any language or writing system to determine line wrapping opportunities not indicated by the markup or characters of the element.

The user agent must not insert virtual word boundaries.

Typographic character units with class SA in [UAX14] must be treated as if they had class AL (i.e. assuming word-break: normal and a value of line-break other than anywhere, there is no soft wrap opportunity between pairs of such characters).

Authors using this value for Southeast Asian languages are expected to manually indicate word boundaries, for instance using wbr or U+200B. Otherwise, there will be no soft wrap opportunity and the text may overflow.
normal
The user agent must not insert virtual word boundaries, except within runs of characters belonging to Southeast Asian languages, where content analysis must be performed to determine where to insert virtual word boundaries.

As with manual, typographic character units with class SA in [UAX14] must be treated as if they had class AL; however, the user agent must additionally analyse the content of a run of such characters and insert virtual word boundaries where appropriate. Within the constraints set by this specification, the specific algorithm used is UA-dependent.

As various languages can be written in scripts which use the characters with class SA, if the content language is known, the user agent should use this information to tailor its analysis.

In order to avoid unexpected overflow, if the user agent is unable to perform this analysis for any subset of the characters with class SA—for example due to lacking a dictionary for certain languages—there must be a soft wrap opportunity between pairs of typographic letter units in that subset.

Note: This soft wrap opportunity is not a virtual word boundary, and is ignored by word-boundary-expansion.

Note: This provision is not triggered merely when the UA fails to find a word boundary in a particular text run; the text run may well be a single unbreakable word. It applies for example when a text run is composed of Khmer characters (U+1780 to U+17FF) if the user agent does not know how to determine word boundaries in Khmer.

auto(<lang>)
This value directs the user agent to perform language-specific content analysis to determine where to insert virtual word boundaries.

<lang> must be a valid CSS <ident> or <string>. It represents an IETF BCP 47 language range (see [BCP47]). If the UA does not support word-boundary detection for all languages represented by the specified range, that specified value is invalid (and will cause the declaration to be ignored).

Note: Wildcards in the language subtag would imply support for detecting word boundaries in an undefined and effectively unlimited set of languages. As this is not possible, wildcards in the language subtag always result in the declaration being treated as invalid.

Note: Whether a word boundary detection system designed for one language is suitable for some or all dialects of that language is somewhat subjective, and this specification leaves it at the discretion of the user agent. Even if a detection system is not able to cope with all nuances of a particular dialect, it may be reasonable to claim support if the detection correctly recognizes word boundaries most of the time. However, the user agent would do a disservice to authors and users if it claimed support for languages where it fails to detect most word boundaries or has a high error rate.

If the element’s content language, as represented in BCP 47 syntax [BCP47], does not match the language range described by the computed value’s <lang> in an extended filtering operation per [RFC4647] Matching of Language Tags (section 3.3.2) with both the content language and <lang> then the used value is normal, and this property has no effect on this element. Otherwise, the user agent must insert a virtual word boundary at each detected word boundary within the text sequence children of this element. Within the constraints set by this specification, the specific algorithm used is UA-dependent.
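For illustration, the matching step can be sketched in Python as a direct transcription of the extended filtering algorithm in [RFC4647] section 3.3.2. The function names, and the choice to treat an unknown content language as never matching, are assumptions of this sketch rather than requirements stated here.

```python
def extended_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 section 3.3.2 extended filtering of one tag against one range."""
    rng = language_range.lower().split("-")
    tg = tag.lower().split("-")
    # Step 2: the first subtags must match ("*" matches anything).
    if rng[0] != "*" and rng[0] != tg[0]:
        return False
    i = j = 1
    while i < len(rng):          # Step 3: subtags remain in the range
        if rng[i] == "*":        # 3.A: wildcard, skip it
            i += 1
        elif j >= len(tg):       # 3.B: tag exhausted, fail
            return False
        elif rng[i] == tg[j]:    # 3.C: match, advance both
            i += 1
            j += 1
        elif len(tg[j]) == 1:    # 3.D: singleton in tag, fail
            return False
        else:                    # 3.E: skip this subtag of the tag
            j += 1
    return True                  # Step 4: range exhausted, success


def used_word_boundary_detection(content_language, lang_range):
    """Hypothetical helper: used value of auto(<lang>) for one element."""
    if content_language is None or not extended_filter_match(lang_range, content_language):
        return "normal"
    return f"auto({lang_range})"
```

With this matching, auto(zh-HK) applies to content tagged zh-Hant-HK, while auto(zh-Hant) does not apply to zh-Hans-SG.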

Note: This is the same matching logic as the one used for the :lang() selector.

If a user agent has a word-boundary detection system for Cantonese that is not suitable for the broader set of Chinese languages, it is expected to accept auto(yue), auto(zh-yue), or auto(zh-HK), but not auto(zh) or auto(zh-Hant).

However, if the user agent supports a generic word-boundary detection system that is suitable for Chinese in general, it is expected to accept the broad auto(zh) characterization, as well as any more specific ones, such as auto(zh-yue), auto(zh-Hant-HK), auto(zh-Hans-SG), or auto(zh-hak).

Specifying the language for which the word boundary detection is to be performed and making unsupported language ranges invalid is required in order to make this feature meaningfully testable with @supports.

For example, Japanese text normally allows line breaking between letters of a word (see word-break: normal). The following code disables that in h1 elements, and only allows line breaking at autodetected word boundaries instead, without requiring the author to manually indicate word boundaries in the markup. However, if word boundary detection is not supported for Japanese, this change is not applied, as word-break: keep-all could remove all soft wrap opportunities from the element, and risk causing overflow.

@supports (word-boundary-detection: auto(ja)) {
  h1:lang(ja) {
    word-boundary-detection: auto(ja);
    word-break: keep-all;
  }
}

User agents may activate language-specific content analysis in response to user preferences. User agents with this behavior must do this by setting the declared value of word-boundary-detection to auto(<lang>) in the User Origin. User agents that do not support the User Origin may use the User-Agent Origin instead.

Manual analysis of the content can be more reliable than UA heuristics. For best results, authors who can perform this analysis are encouraged to mark up their documents using wbr or U+200B to exhaustively indicate word boundaries.

Authors who prepare their content in this manner should not rely on the initial value, and should explicitly specify word-boundary-detection: manual on the relevant parts of the content, in order to override a potential word-boundary-detection: auto(<lang>) in the User Origin or User-Agent Origin.

Virtual word boundary insertion happens before CSS Text 3 § 4.1.1 Phase I: Collapsing and Transformation and before § 2.2.2 Making Word Boundaries Visible: the word-boundary-expansion property. Later operations (including CSS Text 3 § 4.1 The White Space Processing Rules, line breaking, and intrinsic sizing) must take the presence of the virtual word boundary into account. Selectors are not affected.

Inline box boundaries and out-of-flow elements must be ignored when determining word boundaries.

If a word boundary is found at the same position as one or more inline box boundaries, the virtual word boundary must be inserted in the outermost element that participates in this inline box boundary.

In the following example, the red “|” indicates reasonable positions for a user agent to insert virtual word boundaries:
กรุงเทพ|คือ|สวยงาม

If that sentence had contained some inline markup, the following example shows the correct position to insert the virtual word boundaries:

กรุงเทพ|คือ|<em>สวยงาม</em>

The following example shows incorrect positions:

กรุงเทพ|คือ<em>|สวยงาม</em>

The following shows the correct positions in a more contrived situation:

กรุงเทพ|<b><u>คือ</u>|<em>สวยงาม</em></b>

The user agent may tailor its word boundary detection algorithm depending on whether line-break is loose/normal/strict.

The user agent must not insert a virtual word boundary:

The user agent should not insert a virtual word boundary:

2.2.2. Making Word Boundaries Visible: the word-boundary-expansion property

Name: word-boundary-expansion
Value: none | space | ideographic-space
Initial: none
Applies to: text
Inherited: yes
Percentages: N/A
Computed value: as specified
Canonical order: per grammar
Animation type: discrete

The design of this property is still being worked out. Don’t implement it just yet! You can ask the editors about status if this is blocking you.

This name is quite long, we may want to find a better one. We should also consider how we may want to add values to this property, so that the name is compatible with them. For example, it has been suggested that we may want to use this to turn visible “spaces” such as the ETHIOPIC WORD SPACE (U+1361) into an ordinary SPACE (U+0020).

This property allows transforming certain word-separating characters into other word-separating characters, to accommodate variant typesetting styles.

none
This property has no effect.
space
Instances of U+200B ZERO WIDTH SPACE within the child text of this element are replaced by U+0020 SPACE.
ideographic-space
Instances of U+200B ZERO WIDTH SPACE within the child text of this element are replaced by U+3000 IDEOGRAPHIC SPACE.

The user agent must not replace instances of U+200B immediately preceding or following a forced line break (ignoring any intervening inline box boundaries, and associated margin/border/padding).

Instances of wbr are considered equivalent to U+200B, and are also replaced, as are virtual word boundaries inserted by word-boundary-detection.
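The substitution can be sketched in Python on a flattened text run. The sketch assumes that wbr elements and virtual word boundaries have already been represented as U+200B, that inline box boundaries have been removed, and that forced line breaks appear as U+000A. It also ignores the per-element scoping of the property. The names are hypothetical.

```python
ZWSP = "\u200b"
REPLACEMENT = {"none": None, "space": "\u0020", "ideographic-space": "\u3000"}


def expand_word_boundaries(text: str, value: str) -> str:
    """Sketch of word-boundary-expansion on flattened element text.

    Assumptions: wbr and virtual word boundaries are already U+200B,
    inline box boundaries are removed, and forced line breaks are "\\n".
    """
    replacement = REPLACEMENT[value]
    if replacement is None:
        return text  # "none": the property has no effect
    out = []
    for i, ch in enumerate(text):
        if ch != ZWSP:
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        # U+200B next to a forced line break is left untouched.
        out.append(ch if "\n" in (before, after) else replacement)
    return "".join(out)
```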

Unlike text-transform, this substitution happens before CSS Text 3 § 4.1.1 Phase I: Collapsing and Transformation so that later operations that depend on the characters in the content (including CSS Text 3 § 4.1 The White Space Processing Rules, line breaking, and intrinsic sizing) use that character instead of the original U+200B.

Like text-transform, this property transforms text for styling purposes. It has no effect on the underlying content, and must not affect the content of a plain text copy & paste operation.

Note: The effects of this property are similar to those of the text-transform property. However, it is defined as a separate property rather than additional values to text-transform because:

Unlike books for adults, Japanese books for young children often feature spaces between sentence segments, to facilitate reading.

Absent any particular styling, the following sentence would be rendered as depicted below.

<p>むかしむかし、<wbr>あるところに、<wbr>おじいさんと<wbr>おばあさんが<wbr>すんでいました。

むかしむかし、あるところに、おじいさんとおばあさんがすんでいました。


Phrase-based spacing can be achieved with the following CSS:

p {
  word-boundary-expansion: ideographic-space;
}

むかしむかし、 あるところに、 おじいさんと おばあさんが すんでいました。


Another common variant additionally restricts the allowable line breaks to these phrase boundaries. Using the same markup, this is easily achieved with the following CSS:

p {
  word-break: keep-all;
  word-boundary-expansion: ideographic-space;
}

むかしむかし、 あるところに、 おじいさんと おばあさんが すんでいました。

In addition to making the source code more readable, using wbr rather than U+200B in the markup also allows authors to classify the delimiters into different groups.

In the following example, wbr elements are either unmarked when they delimit a word, or marked with class p when they also delimit a phrase.

<p>らいしゅう<wbr>の<wbr>じゅぎょう<wbr>に<wbr class=p
>たいこ<wbr>と<wbr>ばち<wbr>を<wbr class=p
>もって<wbr>きて<wbr>ください。

This makes it possible to enable not only the rather common phrase-based spacing, but also word-by-word spacing, which people with dyslexia are likely to prefer because it reduces ambiguity, as well as other variants such as a combination of phrase-based spacing and word-based wrapping.

Usual rendering

らいしゅうのじゅぎょうにたいことばちをもってきてください。


Phrase spacing
p wbr.p {
  word-boundary-expansion: ideographic-space;
}

らいしゅうのじゅぎょうに たいことばちを もってきてください。


Word spacing
p wbr {
  word-boundary-expansion: ideographic-space;
}

らいしゅう の じゅぎょう に たいこ と ばち を もって きて ください。


Phrase spacing, word wrapping
p {
  word-break: keep-all;
}
p wbr.p {
  word-boundary-expansion: ideographic-space;
}

らいしゅうのじゅぎょうに たいことばちを もってきてください。


Word spacing and wrapping
p {
  word-break: keep-all;
}
p wbr {
  word-boundary-expansion: ideographic-space;
}

らいしゅう の じゅぎょう に たいこ と ばち を もって きて ください。

3. White Space and Wrapping: the white-space property

This section has good overall test coverage, particularly through tests for § 4 White Space Processing & Control Characters and subsections.

Missing tests:


Name: white-space
Value: normal | pre | nowrap | pre-wrap | pre-line | <'white-space-collapse'> || <'text-wrap'> || <'white-space-trim'>
Initial: normal
Applies to: text
Inherited: yes
Percentages: n/a
Computed value: specified keyword
Canonical order: n/a
Animation type: discrete

This property is a shorthand for white-space-collapse, text-wrap, and white-space-trim. It specifies two things:

Note: This shorthand combines both inheritable and non-inheritable properties. If this is a problem, please inform the CSSWG.

Unless otherwise specified, any omitted longhand is set to its initial value.

The following table gives the normative mapping of the values of the shorthand’s special keywords to their equivalent longhand values.

white-space   white-space-collapse   text-wrap   white-space-trim
normal        collapse               wrap        none
pre           preserve               nowrap      none
pre-wrap      preserve               wrap        none
pre-line      preserve-breaks        wrap        none
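As a hedged sketch of the mapping table above, assuming a user agent that supports the longhands as defined in this draft (support for them varies), the first two rules below are meant to be equivalent, and the third shows the shorthand accepting longhand values directly. The class names are hypothetical.

```css
/* Special keyword form of the shorthand. */
.shorthand {
  white-space: pre-wrap;
}

/* The same thing spelled out with the longhands, per the mapping table. */
.longhands {
  white-space-collapse: preserve;
  text-wrap: wrap;
  white-space-trim: none;
}

/* Longhand values combined in the shorthand: keep source line breaks,
   but do not wrap. The omitted white-space-trim is set to its initial value. */
.combined {
  white-space: preserve-breaks nowrap;
}
```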

These keywords have the following informative definitions:

Remove these definitions once the test annotations have been redistributed.

normal
This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character). Lines may wrap at allowed soft wrap opportunities, as determined by the line-breaking rules in effect, in order to minimize inline-axis overflow.
pre
This value prevents user agents from collapsing sequences of white space. Segment breaks such as line feeds are preserved as forced line breaks. Lines only break at forced line breaks; content that does not fit within the block container overflows it.
nowrap
Like normal, this value collapses white space; but like pre, it does not allow wrapping.
pre-wrap
Like pre, this value preserves white space; but like normal, it allows wrapping.
pre-line
Like normal, this value collapses consecutive white space characters and allows wrapping, but it preserves segment breaks in the source as forced line breaks.

Note: In some cases, preserved white space and other space separators can hang when at the end of the line; this can affect whether they are measured for intrinsic sizing.

The following informative table summarizes the behavior of various white-space values:

white-space    New Lines  Spaces and Tabs  Text Wrapping  End-of-line spaces  End-of-line other space separators
normal         Collapse   Collapse         Wrap           Remove              Hang
pre            Preserve   Preserve         No wrap        Preserve            No wrap
nowrap         Collapse   Collapse         No wrap        Remove              Hang
pre-wrap       Preserve   Preserve         Wrap           Hang                Hang
break-spaces   Preserve   Preserve         Wrap           Wrap                Wrap
pre-line       Preserve   Collapse         Wrap           Remove              Hang
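A few typical uses of the keywords summarized above, offered as a hedged illustration; the selectors are hypothetical examples rather than recommendations from this specification.

```css
/* Code listings: keep all spaces, tabs, and line breaks; overflow rather than wrap. */
pre.listing { white-space: pre; }

/* User-entered text: keep white space, but still wrap long lines. */
.user-comment { white-space: pre-wrap; }

/* Postal addresses or poetry: keep source line breaks, collapse other spacing. */
.address { white-space: pre-line; }

/* Short UI labels: collapse white space and keep on one line. */
.tab-label { white-space: nowrap; }

/* Like pre-wrap, but end-of-line spaces wrap instead of hanging. */
.editor-like { white-space: break-spaces; }
```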

4. White Space Processing & Control Characters

This section has reasonably good test coverage.

Missing tests:


The source text of a document often contains formatting that is not relevant to the final rendering: for example, breaking the source into segments (lines) for ease of editing or adding white space characters such as tabs and spaces to indent the source code. CSS white space processing allows the author to control interpretation of such formatting: to preserve or collapse it away when rendering the document. White space processing in CSS (which is controlled with the white-space-collapse and white-space-trim properties) interprets white space characters only for rendering: it has no effect on the underlying document data.

Note: Depending on the document language, segments can be separated by a particular newline sequence (such as a line feed or CRLF pair), or delimited by some other mechanism, such as the SGML RECORD-START and RECORD-END tokens.

For CSS processing, each document language–defined “segment break” or “newline sequence”—or if none are defined, each line feed (U+000A)—in the text is treated as a segment break, which is then interpreted for rendering as specified by the white-space property.

In the case of HTML, each newline sequence is normalized to a single line feed (U+000A) for representation in the DOM, so when an HTML document is represented as a DOM tree each line feed (U+000A) is treated as a segment break. [HTML] [DOM]

Note: In most common CSS implementations, HTML does not get styled directly. Instead, it is processed into a DOM tree, which is then styled. Unlike HTML, the DOM does not give any particular meaning to carriage returns (U+000D), so they are not treated as segment breaks. If carriage returns (U+000D) are inserted into the DOM by means other than HTML parsing, they then get treated as defined below.

Note: A document parser might not only normalize any segment breaks, but also collapse other space characters or otherwise process white space according to markup rules. Because CSS processing occurs after the parsing stage, it is not possible to restore these characters for styling. Therefore, some of the behavior specified below can be affected by these limitations and may be user agent dependent.

Note: Anonymous blocks consisting entirely of collapsible white space are removed from the rendering tree. Thus any such white space surrounding a block-level element is collapsed away. See CSS 2.1 § 9.2.2.1 Anonymous inline boxes. [CSS2]

Control characters (Unicode category Cc), other than tabs (U+0009), line feeds (U+000A), carriage returns (U+000D), and sequences that form a segment break, must be rendered as a visible glyph, which the UA must synthesize if the glyphs found in the font are not visible, and must otherwise be treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its code point value, or use some other method to provide an appropriate visible glyph. As required by Unicode, unsupported Default_ignorable characters must be ignored for text rendering. [UNICODE]

Carriage returns (U+000D) are treated identically to spaces (U+0020) in all respects.

Note: For HTML documents, carriage returns present in the source code are converted to line feeds at the parsing stage (see HTML § 13.2.3.5 Preprocessing the input stream and the definition of normalize newlines in Infra) and therefore do not appear as U+000D CARRIAGE RETURN to CSS. [HTML] [INFRA] However, the character is preserved, and the above rule observable, when encoded using an escape sequence (&#x0d;).

4.1. White Space Collapsing: the white-space-collapse property

This section is still under discussion and may change in future drafts.

Name: white-space-collapse
Value: collapse | discard | preserve | preserve-breaks | preserve-spaces | break-spaces
Initial: collapse
Applies to: text
Inherited: yes
Percentages: n/a
Computed value: specified keyword
Canonical order: per grammar
Animation type: discrete

This property specifies whether and how white space is collapsed. Values have the following meanings, which must be interpreted according to the White Space Processing Rules:

collapse
This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character).
preserve
This value prevents user agents from collapsing sequences of white space. Segment breaks such as line feeds are preserved as forced line breaks.
preserve-breaks
Like collapse, this value collapses consecutive white space characters, but preserves segment breaks in the source as forced line breaks.
preserve-spaces
This value prevents user agents from collapsing sequences of white space, and converts tabs and segment breaks to spaces. (This value is intended to represent the behavior of xml:space="preserve" in SVG.)
break-spaces
The behavior is identical to that of preserve, except that:

Note: This value does not guarantee that there will never be any overflow due to white space: for example, if the line length is so short that even a single white space character does not fit, overflow is unavoidable.

discard
This value directs user agents to “discard” all white space in the element.

Does this preserve line break opportunities or not? Do we need a distinct "hide" value? If it preserves line break opportunities, maybe it should be replaced with a word-boundary-expansion value?
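The following fragment sketches how the white-space-collapse longhand might be set on its own, assuming support for it as defined in this draft. The class names are hypothetical, and the comments only restate the value definitions above and the white-space summary table.

```css
/* Default: collapse runs of white space (segment breaks per the transformation rules). */
.prose { white-space-collapse: collapse; }

/* Keep every space, tab, and segment break; segment breaks become forced line breaks. */
.verbatim { white-space-collapse: preserve; }

/* Collapse spaces and tabs, but keep segment breaks as forced line breaks. */
.verse { white-space-collapse: preserve-breaks; }

/* Keep spaces uncollapsed, converting tabs and segment breaks to spaces
   (intended to represent SVG's xml:space="preserve"). */
.svg-preserve { white-space-collapse: preserve-spaces; }

/* Like preserve, but end-of-line spaces wrap instead of hanging. */
.spaces-wrap { white-space-collapse: break-spaces; }
```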

White space that was not removed or collapsed due to white space processing is called preserved white space.

The following style rules implement MathML’s white space processing:

@namespace m "http://www.w3.org/1998/Math/MathML";
m|* {
  white-space-collapse: discard;
}
m|mi, m|mn, m|mo, m|ms, m|mtext {
  white-space-trim: discard-inner;
}

4.2. White Space Trimming: the white-space-trim property

Name: white-space-trim
Value: none | discard-before || discard-after || discard-inner
Initial: none
Applies to: inline boxes and block containers
Inherited: no
Percentages: n/a
Computed value: specified keyword(s)
Canonical order: per grammar
Animation type: discrete

This property allows authors to specify trimming behavior at the beginning and end of a box. Values have the following meanings:

discard-before
This value directs the UA to collapse all collapsible white space immediately before the start of the element.
discard-after
This value directs the UA to collapse all collapsible white space immediately after the end of the element.
discard-inner
For block containers, this value directs UAs to discard all white space at the beginning of the element up to and including the last segment break before the first non-white-space character in the element, as well as all white space at the end of the element starting with the first segment break after the last non-white-space character in the element. For other elements, this value directs UAs to discard all white space at the beginning and end of the element.

Note: Discarding document white space using white-space-trim can change where soft wrap opportunities occur in the text.

The following style rules render DT elements as a comma-separated list, even if they are coded on separate lines of the source document:

dt { display: inline; }
dt + dt:before { content: ", "; white-space-trim: discard-before; }

The following style rule removes source-formatting white space adjacent to the opening/closing tags of a preformatted block, but not any indentation or interleaved white space applied to the actual contents of the element:

pre { white-space: pre; white-space-trim: discard-inner; }

This results in the following two source-code snippets:

<pre>

  some
preformatted

  text

</pre>
<pre>  some
preformatted

  text</pre>
rendering identically as:
  some
preformatted

  text

If instead we apply it to an inline element:

span { white-space: normal; white-space-trim: discard-inner; }
start[<span>

  some
inline
  text

</span>]end
start[<span>  some
inline
  text</span>]end
this directs the UA to discard all of the leading and trailing white space around the actual contents of the element:
start[some inline text]end

White space processing for white-space-trim takes place before § 4.3.1 Phase I: Collapsing and Transformation.
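As a further hedged illustration (support for white-space-trim may be limited, and the class name is hypothetical), discard-before could remove the space that source formatting leaves before a footnote reference, so that the marker sits directly against the preceding word:

```css
/* Source markup: ...as noted earlier <sup class="fn-ref">3</sup> */
/* Collapse the collapsible space before the marker so it attaches to "earlier". */
sup.fn-ref {
  white-space-trim: discard-before;
}
```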

4.3. The White Space Processing Rules

This section has good test coverage; all parts are well exercised. Most tests are found in the subsections.


Except where specified otherwise, white space processing in CSS affects only the document white space characters: spaces (U+0020), tabs (U+0009), and segment breaks.

Note: The set of characters considered document white space (part of the document content) and those considered syntactic white space (part of the CSS syntax) are not necessarily identical. However, since both include spaces (U+0020), tabs (U+0009), and line feeds (U+000A), most authors won’t notice any differences.

Besides space (U+0020) and no-break space (U+00A0), Unicode defines a number of additional space separator characters. [UNICODE] In this specification all characters in the Unicode general category Zs except space (U+0020) and no-break space (U+00A0) are collectively referred to as other space separators.

4.3.1. Phase I: Collapsing and Transformation

This section has good test coverage; all parts are well exercised.


Note: white-space-trim is taken into account prior to this phase.

For each inline (including anonymous inlines; see CSS 2.1 § 9.2.2.1 Anonymous inline boxes [CSS2]) within an inline formatting context, white space characters are processed as follows prior to line breaking and bidi reordering, ignoring bidi formatting characters (characters with the Bidi_Control property [UAX9]) as if they were not there:

The following example illustrates the interaction of white-space collapsing and bidirectionality. Consider the following markup fragment, taking special note of the spaces:
<ltr>A <rtl> B </rtl> C</ltr>

where the <ltr> element represents a left-to-right embedding and the <rtl> element represents a right-to-left embedding. If the white-space property is set to normal, the white-space processing model will result in the following:

This will leave two spaces, one after the A in the left-to-right embedding level, and one after the B in the right-to-left embedding level. The text will then be ordered according to the Unicode bidirectional algorithm, with the end result being:

A  BC

Note that there will be two spaces between A and B, and none between B and C. This is best avoided by putting spaces outside the element instead of just inside the opening and closing tags and, where practical, by relying on implicit bidirectionality instead of explicit embedding levels.

4.3.2. Phase II: Trimming and Positioning

This section has good test coverage; all parts are well exercised.


Then, the entire block is rendered. Inlines are laid out, taking bidi reordering into account, and wrapping as specified by the text-wrap property. As each line is laid out,

  1. A sequence of collapsible spaces at the beginning of a line is removed.
  2. If the tab size is zero, preserved tabs are not rendered. Otherwise, each preserved tab is rendered as a horizontal shift that lines up the start edge of the next glyph with the next tab stop. If this distance is less than 0.5ch, then the subsequent tab stop is used instead. Tab stops occur at points that are multiples of the tab size from the starting content edge of the preserved tab’s nearest block container ancestor. The tab size is given by the tab-size property.

    Note: See the Unicode rules on how tabulation (U+0009) interacts with bidi. [UAX9]

  3. A sequence of collapsible spaces at the end of a line is removed, as well as any trailing U+1680 OGHAM SPACE MARK whose white-space-collapse property is collapse or preserve-breaks.

    Note: Due to Unicode Bidirectional Algorithm rule L1, a sequence of collapsible spaces located at the end of the line prior to bidi reordering will also be at the end of the line after reordering. [UAX9] [CSS-WRITING-MODES-4]

  4. If there remains any sequence of white space, other space separators, and/or preserved tabs at the end of a line (after bidi reordering [CSS-WRITING-MODES-4]):

    What should happen here for white-space-collapse: preserve-spaces?

This example shows that conditionally hanging white space at the end of lines with forced breaks provides symmetry with the start of the line. An underline is added to help visualize the spaces.
p {
  white-space: pre-wrap;
  width: 5ch;
  border: solid 1px;
  font-family: monospace;
  text-align: center;
}
<p> 0 </p>

The sample above would be rendered as follows:

0

Since the final space is before a forced line break and does not overflow, it does not hang, and centering works as expected.

This example illustrates the difference between hanging spaces at the end of lines without forced breaks, and conditionally hanging them at the end of lines with forced breaks. An underline is added to help visualize the spaces.
p {
  white-space: pre-wrap;
  width: 3ch;
  border: solid 1px;
  font-family: monospace;
}
<p> 0 0 0 0 </p>

The sample above would be rendered as follows:

0
0 0
0

If p { text-align: right; } was added, the result would be as follows:

0
0 0
0

As the preserved spaces at the end of lines without a forced break must hang, they are not considered when placing the rest of the line during text alignment. When aligning towards the end, this means any such spaces will overflow, and will not prevent the rest of the line’s content from being flush with the edge of the line. On the other hand, preserved spaces at the end of a line with a forced break conditionally hang. Since the space at the end of the last line would not overflow in this example, it does not hang and therefore is considered during text alignment.

In the following example, there is not enough room on any line to fit the end-of-line spaces, so they hang on all lines: the one on the line without a forced break because it must, as well as the one on the line with a forced break, because it conditionally hangs and overflows. An underline is added to help visualize the spaces.
p {
  white-space: pre-wrap;
  width: 3ch;
  border: solid 1px;
  font-family: monospace;
}
<p>0 0 0 0 </p>
0 0
0 0

The last line is not wrapped before the last 0 because characters that conditionally hang are not considered when measuring the line’s contents for fit.

4.3.3. Segment Break Transformation Rules

This section has reasonable test coverage, though some assertions are only tested indirectly through tests for other features that rely on this, rather than by dedicated tests.


When white-space-collapse is not collapse, segment breaks are not collapsible. For values other than collapse or preserve-spaces (which transforms them into spaces), segment breaks are instead transformed into a preserved line feed (U+000A).

When white-space-collapse is collapse, segment breaks are collapsible, and are collapsed as follows:

  1. First, any collapsible segment break immediately following another collapsible segment break is removed.
  2. Then any remaining segment break is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level.

    Should we define this for Level 4?

    Note: The white space processing rules have already removed any tabs and spaces around the segment break before this context is evaluated.

The purpose of the segment break transformation rules (and white space collapsing in general) is to “unbreak” text that has been broken into segments to make the document source code easier to work with. In languages that use word separators, such as English and Korean, “unbreaking” a line requires joining the two lines with a space.
Here is an English paragraph
that is broken into multiple lines
in the source code so that it can
be more easily read and edited
in a text editor.

Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read and edited in a text editor.

Eliminating a line break in English requires maintaining a space in its place.

In languages that have no word separators, such as Chinese, “unbreaking” a line requires joining the two lines with no intervening space.

這個段落是那麼長,
在一行寫不行。最好
用三行寫。

這個段落是那麼長,在一行寫不行。最好用三行寫。

Eliminating a line break in Chinese requires eliminating any intervening white space.

The segment break transformation rules can use adjacent context to either transform the segment break into a space or eliminate it entirely.

Note: Historically, HTML and CSS have unconditionally converted segment breaks to spaces, which has prevented content authored in languages such as Chinese from being able to break lines within the source. Thus UA heuristics need to be conservative about where they discard segment breaks even as they strive to improve support for such languages.

4.4. Tab Character Size: the tab-size property

This section has good test coverage.


Name: tab-size
Value: <number> | <length>
Initial: 8
Applies to: text
Inherited: yes
Percentages: n/a
Computed value: the specified number or absolute length
Canonical order: n/a
Animation type: by computed value type
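For illustration, both value types in this property's grammar can be used as follows. A <number> scales with the advance of the space character, while a <length> fixes the tab size directly. A minimal sketch with hypothetical selectors:

```css
/* Tab stops every 4 space advances; tabs must be preserved for this to matter. */
pre.code {
  white-space: pre;
  tab-size: 4;
}

/* Tab stops every 2em, independent of the width of the space glyph. */
pre.table-data {
  white-space: pre;
  tab-size: 2em;
}

/* A tab size of zero: preserved tabs are not rendered at all. */
pre.no-tabs {
  white-space: pre;
  tab-size: 0;
}
```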

This property determines the tab size used to render preserved tab characters (U+0009). A <number> represents the measure as a multiple of the advance width of the space character (U+0020) of the nearest block container ancestor of the preserved tab, including its associated letter-spacing and word-spacing. Negative values are not allowed.
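Combining this definition with the tab-stop rule in Phase II, the position reached by a preserved tab can be worked out by hand. The notation (a for the space advance, T for the tab size, x for the position where the tab starts, s(x) for the first tab stop strictly after x) is introduced only for this illustration, which assumes a monospace font where both the space advance and 1ch are 8px, no letter-spacing or word-spacing, tab-size: 4, and positions measured from the start content edge of the tab's block container:

```latex
\begin{aligned}
T &= 4\,(a + \text{letter-spacing} + \text{word-spacing}) = 4 \times 8\,\text{px} = 32\,\text{px} \\
s(x) &= \left(\left\lfloor x / T \right\rfloor + 1\right) T \\
\operatorname{stop}(x) &=
  \begin{cases}
    s(x)     & \text{if } s(x) - x \ge 0.5\,\text{ch} = 4\,\text{px} \\
    s(x) + T & \text{otherwise}
  \end{cases} \\
x = 40\,\text{px}: \quad s &= (1 + 1) \cdot 32 = 64\,\text{px}, \quad 64 - 40 = 24 \ge 4 \;\Rightarrow\; \operatorname{stop} = 64\,\text{px} \\
x = 62\,\text{px}: \quad s &= (1 + 1) \cdot 32 = 64\,\text{px}, \quad 64 - 62 = 2 < 4 \;\Rightarrow\; \operatorname{stop} = 96\,\text{px}
\end{aligned}
```

In the second case the next stop is closer than 0.5ch, so the tab advances to the following stop instead.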


5. Line Breaking and Word Boundaries

Tests are mostly not needed for this section: these are definitions, which get tested through their application rather than by themselves.

This can be a good section to host tests for i18n requirements not covered in detail by the spec.

Possible additions:


When inline-level content is laid out into lines, it is broken across line boxes. Such a break is called a line break. When a line is broken due to explicit line-breaking controls (such as a preserved newline character), or due to the start or end of a block, it is a forced line break. When a line is broken due to content wrapping (i.e. when the UA creates unforced line breaks in order to fit the content within the measure), it is a soft wrap break. The process of breaking inline-level content into lines is called line breaking.

Wrapping is only performed at an allowed break point, called a soft wrap opportunity. When wrapping is enabled (see white-space), the UA must minimize the amount of content overflowing a line by wrapping the line at a soft wrap opportunity, if one exists.

In most writing systems, in the absence of hyphenation a soft wrap opportunity occurs only at word boundaries. Many such systems use spaces or punctuation to explicitly separate words, and soft wrap opportunities can be identified by these characters. Scripts such as Thai, Lao, and Khmer, however, do not use spaces or punctuation to separate words. Although the zero width space (U+200B) can be used as an explicit word delimiter in these scripts, this practice is not common. As a result, a lexical resource is needed to correctly identify soft wrap opportunities in such texts.

In some other writing systems, soft wrap opportunities are based on orthographic syllable boundaries, not word boundaries. Some of these systems, such as Javanese and Balinese, are similar to Thai and Lao in that they require analysis of the text to find breaking opportunities. In others, such as Chinese (as well as Japanese, Yi, and sometimes also Korean), each syllable tends to correspond to a single typographic letter unit, and thus line breaking conventions allow the line to break anywhere except between certain character combinations. Additionally, the level of strictness in these restrictions varies with the typesetting style.

While CSS does not fully define where soft wrap opportunities occur, some controls are provided to distinguish common variations:

Note: Unicode Standard Annex #14: Unicode Line Breaking Algorithm defines a baseline behavior for line breaking for all scripts in Unicode, which is expected to be further tailored. [UAX14] More information on line breaking conventions can be found in Requirements for Japanese Text Layout [JLREQ] and Formatting Rules for Japanese Documents [JIS4051] for Japanese, Requirements for Chinese Text Layout [CLREQ] and General Rules for Punctuation [ZHMARK] for Chinese. See also the Internationalization Working Group’s Language Enablement Index which includes more information on additional languages. [TYPOGRAPHY] Any guidance on additional appropriate references would be much appreciated.

5.1. Line Breaking Details

This section has partial test coverage.

Missing tests:

Untestable(?):


When determining line breaks:

5.2. Breaking Rules for Letters: the word-break property

This section has partial test coverage.

Missing tests:


Name: word-break
Value: normal | keep-all | break-all | break-word
Initial: normal
Applies to: text
Inherited: yes
Percentages: n/a
Computed value: specified keyword
Canonical order: n/a
Animation type: discrete

This property specifies soft wrap opportunities between letters, i.e. where it is “normal” and permissible to break lines of text. Specifically it controls whether a soft wrap opportunity generally exists between adjacent typographic letter units, treating non-letter typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes as typographic letter units for this purpose (only). [UAX14] It does not affect rules governing the soft wrap opportunities created by white space (as well as by other space separators) and around punctuation. (See line-break for controls affecting punctuation and small kana.)

For example, in some styles of CJK typesetting, English words are allowed to break between any two letters, rather than only at spaces or hyphenation points; this can be enabled with word-break:break-all.
Figure: An example of English text embedded in Japanese being broken at an arbitrary point in the word; in a snippet of Japanese text, the English word “caption” is broken into “capt” and “ion” across two lines.

As another example, Korean has two styles of line-breaking: between any two Korean syllables (word-break: normal) or, like English, mainly at spaces (word-break: keep-all).

각 줄의 마지막에 한글이 올 때 줄 나눔 기
준을 “글자” 또는 “어절” 단위로 한다.
각 줄의 마지막에 한글이 올 때 줄 나눔
기준을 “글자” 또는 “어절” 단위로 한다.

Ethiopic similarly has two styles of line-breaking, either only breaking at word separators (word-break: normal), or also allowing breaks between letters within a word (word-break: break-all).

ተወልዱ፡ኵሉ፡ሰብእ፡ግዑዛን፡ወዕሩያን፡
በማዕረግ፡ወብሕግ።ቦሙ፡ኅሊና፡ወዐቅል፡
ወይትጌበሩ፡አሐዱ፡ምስለ፡አሀዱ፡
በመንፈሰ፡እኍና።
ተወልዱ፡ኵሉ፡ሰብእ፡ግዑዛን፡ወዕሩያን፡በማ
ዕረግ፡ወብሕግ።ቦሙ፡ኅሊና፡ወዐቅል፡ወይትጌ
በሩ፡አሐዱ፡ምስለ፡አሀዱ፡በመንፈሰ፡እኍና።
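The two Korean styles and the two Ethiopic styles above could be selected as follows. This is a hedged sketch: the :lang(ko) selector assumes the content is tagged with the ko language code, and the .eojeol and .ethiopic-letter-breaks classes are hypothetical.

```css
/* Korean, default: breaks allowed between any two Hangul syllables. */
:lang(ko) { word-break: normal; }

/* Korean, alternative style: break mainly at spaces, like English. */
:lang(ko).eojeol { word-break: keep-all; }

/* Ethiopic: the default (normal) breaks only at word separators;
   this opt-in style also allows breaks between letters within a word. */
.ethiopic-letter-breaks { word-break: break-all; }
```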

Note: To enable additional break opportunities only in the case of overflow, see overflow-wrap.

Values have the following meanings:

normal
Words break according to their customary rules, as described above. Korean, which commonly exhibits two different behaviors, allows breaks between any two consecutive Hangul/Hanja. For Ethiopic, which also exhibits two different behaviors, such breaks within words are not allowed.
break-all
Breaking is allowed within “words”: specifically, in addition to soft wrap opportunities allowed for normal, any typographic letter units (and any typographic character units resolving to the NU (“numeric”), AL (“alphabetic”), or SA (“Southeast Asian”) line breaking classes [UAX14]) are instead treated as ID (“ideographic characters”) for the purpose of line-breaking. Hyphenation is not applied.

Note: This value does not affect whether there are soft wrap opportunities around punctuation characters. To allow breaks anywhere, see line-break: anywhere.

Note: This option enables the other common behavior for Ethiopic. It is also often used in a context where the text consists predominantly of CJK characters with only short non-CJK excerpts, and it is desired that the text be better distributed on each line.

keep-all
Breaking is forbidden within “words”: implicit soft wrap opportunities between typographic letter units (or other typographic character units belonging to the NU, AL, AI, or ID Unicode line breaking classes [UAX14]) are suppressed, i.e. breaks are prohibited between pairs of such characters (regardless of line-break settings other than anywhere) except where opportunities exist due to dictionary-based breaking. Otherwise this option is equivalent to normal. In this style, sequences of CJK characters do not break.

Note: This is the other common behavior for Korean (which uses spaces between words), and is also useful for mixed-script text where CJK snippets are mixed into another language that uses spaces for separation.
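Because break-all leaves the soft wrap opportunities around punctuation unchanged, it is easy to confuse with line-break: anywhere. A minimal sketch contrasting them with keep-all; the class names are hypothetical:

```css
/* Mostly CJK text with short Latin excerpts: Latin words may break between
   letters, but punctuation rules (controlled by line-break) still apply. */
.cjk-dense { word-break: break-all; }

/* A soft wrap opportunity around every typographic character unit,
   including around punctuation. */
.break-anywhere { line-break: anywhere; }

/* Korean, or CJK snippets mixed into space-separated text: suppress breaks
   between letters, so sequences of CJK characters do not break either. */
.cjk-keep { word-break: keep-all; }
```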

Symbols that line-break the same way as letters of a particular category are affected the same way as those letters.

Here’s a mixed-script sample text:
这是一些汉字 and some Latin و کمی خط عربی และตัวอย่างการเขียนภาษาไทย በጽሑፍ፡ማራዘሙን፡አንዳንድ፡

The break-points are determined as follows (indicated by ‘·’):

word-break: normal
这·是·一·些·汉·字·and·some·Latin·و·کمی·خط·عربی·และ·ตัวอย่าง·การเขียน·ภาษาไทย·በጽሑፍ፡·ማራዘሙን፡·አንዳንድ፡
word-break: break-all
这·是·一·些·汉·字·a·n·d·s·o·m·e·L·a·t·i·n·و·ﮐ·ﻤ·ﻰ·ﺧ·ﻁ·ﻋ·ﺮ·ﺑ·ﻰ·แ·ล·ะ·ตั·ว·อ·ย่·า·ง·ก·า·ร·เ·ขี·ย·น·ภ·า·ษ·า·ไ·ท·ย·በ·ጽ·ሑ·ፍ፡·ማ·ራ·ዘ·ሙ·ን፡·አ·ን·ዳ·ን·ድ፡
word-break: keep-all
这是一些汉字·and·some·Latin·و·کمی·خط·عربی·และ·ตัวอย่าง·การเขียน·ภาษาไทย·በጽሑፍ፡·ማራዘሙን፡·አንዳንድ፡

Japanese is usually typeset allowing line breaks within words. However, it is sometimes preferred to suppress these wrapping opportunities and to only allow wrapping at the end of certain sentence fragments. This is most commonly done in very short pieces of text, such as headings and table or figure captions.

This can be achieved by marking the allowed wrapping points with wbr or U+200B ZERO WIDTH SPACE, and suppressing the other ones using word-break: keep-all.

For instance, the following markup can produce either of the renderings below, depending on the value of the word-break property:

<h1>窓ぎわの<wbr>トットちゃん</h1>
Expected rendering with h1 { word-break: normal }:
窓ぎわのトットちゃ
ん

Expected rendering with h1 { word-break: keep-all }:
窓ぎわの
トットちゃん

When shaping scripts such as Arabic are allowed to break within words due to break-all, the characters must still be shaped as if the word were not broken (see § 6.4 Shaping Across Intra-word Breaks).

For compatibility with legacy content, the word-break property also supports a deprecated break-word keyword. When specified, this has the same effect as word-break: normal and overflow-wrap: anywhere, regardless of the actual value of the overflow-wrap property.
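As a hedged illustration of the legacy keyword (class names hypothetical), the first two rules below are described as having the same effect; the second is the form to prefer in new style sheets, since break-word is deprecated. The third shows that break-word ignores the actual overflow-wrap value.

```css
/* Deprecated legacy keyword. */
.legacy {
  word-break: break-word;
}

/* Equivalent: normal word breaking, plus breaks at arbitrary points when an
   otherwise unbreakable sequence has no acceptable break point in the line. */
.current {
  word-break: normal;
  overflow-wrap: anywhere;
}

/* break-word still behaves as overflow-wrap: anywhere here,
   regardless of the overflow-wrap declaration. */
.legacy-override {
  word-break: break-word;
  overflow-wrap: normal;
}
```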
