Text Justification

Unofficial Proposal Draft,

This version:
https://drafts.csswg.org/css-text-3/text-justify-i18n
Latest published version:
https://www.w3.org/TR/text-justify-i18n/
Editor:
Elika J. Etemad / fantasai (Invited Expert)
Issue Tracking:
GitHub Issues

Abstract

This Note serves as a clearinghouse for further information on worldwide conventions for text justification: the process of stretching text to fill a line.

CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, in speech, etc.

Status of this document

1. Introduction

Since the amount of content on a line tends to vary, even if minutely, from line to line within a paragraph, typographers have come up with various methods for effective full justification (causing the text to completely fill the line) in order to create visual alignment on both edges of a paragraph.

Typographic conventions for full text justification depend on the writing system, the content language, and the calligraphic style of the text. Results also tend to vary based on the capabilities of the layout engine and a given typographer’s preferences for weighing its various detrimental effects on typographic color and readability.

This document collects together references for further information on the typographic conventions for full justification as they apply to the various writing systems around the world, together with some guidance for implementers handling unpredictable Web content. (General information and technical requirements for CSS are described under the Justification section of [CSS3TEXT].)

Additional information and references are hereby solicited; please send any suggestions for additions, clarifications, corrections, and other improvements to the W3C Internationalization Working Group at www-international@w3.org.

Note: Information on which languages use which writing systems is maintained in the Unicode CLDR.

2. References

2.1. Chinese Writing System (Han Ideographs)

Historically, Chinese was written as Han ideographs, with no punctuation. Under this system, justification was automatic, as the characters fit perfectly into a square grid. However, the introduction of punctuation in recent centuries, plus the increase in mixed-script text (such as the inclusion of European numbers and/or words, phrases, names, and trademarks) has created a need for adjustments within a line.

Chinese notably does not use word spaces, so these do not provide a justification opportunity within the lines; thus justification techniques focus on adjustments to spacing around punctuation, script-change boundaries, and inter-character spacing.

2.2. Japanese Writing System

Like Chinese, Japanese was historically written in Han ideographs; however, it has since developed its own phonetic scripts, Hiragana and Katakana (collectively, Kana). While pure kana texts do exist, particularly in children’s literature, Han ideographs (Kanji, in Japanese) continue to be an integral part of normal Japanese text, and are interspersed with kana within a sentence.

Like Chinese, Japanese has embraced European-inspired punctuation, numerals, and other foreign snippets that don’t conform to the standard full-width character grid. The Japanese writing system also does not use word spaces, and similarly focuses on adjustments to spacing around punctuation, script-change boundaries, and inter-character spacing, with a notable preference for compression of intra-glyph spacing over expansion between glyphs.

2.3. Korean Writing System

Like Japanese, Korean was historically written in pure Han ideographs, and has since developed its own phonetic script, Hangul. Also like Japanese, it has adopted punctuation and numerals. However, unlike Japanese, Korean has also adopted word spaces, and tends towards narrow (Western-style, rather than full-width) punctuation. This allows it to use inter-word justification: as in English publications, this method stretches the spaces between words in order to fill the line.

While Han ideographs (Hanja, in Korean) were kept as part of the writing system, they have become increasingly scarce over time such that many documents are written in pure Hangul, and some only use Hanja as inline annotations for disambiguation among homophones rather than as part of the main text. However, Hanja and Hangul together remain important components of Korean writing.

2.4. Latin (Roman) Writing System

Quite possibly the writing system familiar to more people than any other, the Latin writing system derives from the Roman alphabet, including a few additional characters and diacritic marks to accommodate languages such as Icelandic and modern Vietnamese. Thanks to the Europeans in the Age of Exploration, their missionaries, and the Western-dominated global scholastic culture of the modern age, most languages in the world have one or more Latin transcriptions, even those that do not use it as their primary writing system.

The Latin alphabet is a phonetic system with disjoint letterforms, and typically uses spaces between words. This allows it to use inter-word justification, although it can and sometimes does increase the spacing between individual letters as well. Since it is frequently adopted into other writing systems, it can sometimes adopt characteristics of that system; for example, some styles of Japanese typesetting treat Latin letters the same as Japanese characters for the purpose of line-breaking and justification.
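
As a rough illustration of inter-word justification, the following Python sketch distributes the leftover width of a line across its word spaces. It assumes a monospaced grid where every character is one unit wide, which is a simplification; the function name `justify_inter_word` is hypothetical and not part of any specification.

```python
def justify_inter_word(words, line_width):
    """Pad the gaps between words so the line is exactly line_width units wide.

    Assumption: every character occupies one unit (monospaced text).
    """
    if len(words) < 2:
        # A single word offers no inter-word justification opportunity.
        return " ".join(words)
    gaps = len(words) - 1
    total_spaces = line_width - sum(len(w) for w in words)
    if total_spaces < gaps:
        raise ValueError("words do not fit on the line")
    # Every gap gets `base` spaces; the first `extra` gaps get one more.
    base, extra = divmod(total_spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word)
        parts.append(" " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)
```

For instance, `justify_inter_word(["the", "quick", "fox"], 16)` yields `"the   quick  fox"`. A real layout engine would work with measured glyph advances rather than character counts, and might also distribute part of the space between letters.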

2.5. Ethiopic Writing System

Like Latin, the Ethiopic writing system uses an alphabet of disjoint letters and uses punctuation to indicate the break between words. Unlike Latin, Ethiopic traditionally uses a visible word separator—the Ethiopic Word Space U+1361 “፡”—although modern documents sometimes use a regular space U+0020 “ ” instead. Justification strategies are as for Latin: increasing the space at the word separator, and/or distributing space between letters.

2.6. Arabic Writing System (and Other Cursive Systems)

Arabic is a cursive script, meaning its letters are typically joined together within a word. This creates additional challenges, as the usual method for stretching out text—inserting spaces between glyphs—does not work.

Since Arabic uses spaces between words, one method for justification is inter-word justification—stretching out the spaces within the line to fill it. However, most styles of Arabic writing prefer calligraphic elongation or compression, distorting the shapes and connections between letters in order to fill the line while preserving its typographic color. This is often called “kashida”, meaning “stretched”. A simplistic variant of this technique inserts elongation marks (sometimes represented with U+0640 “ـ” TATWEEL) at appropriate points in the text.
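
The simplistic tatweel variant can be sketched in Python as follows. This is only an illustration under stated assumptions: the list of letters that do not join to the following letter is partial, ligatures such as lam-alef are ignored, and the choice to elongate the last available connection is arbitrary. The names `elongation_points` and `add_tatweel` are hypothetical.

```python
TATWEEL = "\u0640"
HAMZA = "\u0621"  # stands alone and never joins

# Letters that join only to the preceding letter (partial list, for
# illustration): alef forms, waw forms, teh marbuta, dal, thal, reh, zain.
NO_FORWARD_JOIN = set(
    "\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648"
) | {HAMZA}


def is_arabic_letter(ch):
    """Coarse check: basic Arabic letters, excluding the tatweel itself."""
    return "\u0620" <= ch <= "\u064A" and ch != TATWEEL


def elongation_points(word):
    """Return the indices at which a tatweel could be inserted."""
    points = []
    for i in range(len(word) - 1):
        left, right = word[i], word[i + 1]
        if (is_arabic_letter(left) and is_arabic_letter(right)
                and left not in NO_FORWARD_JOIN and right != HAMZA):
            points.append(i + 1)
    return points


def add_tatweel(word, count):
    """Insert `count` tatweels at the last joining point, if there is one."""
    points = elongation_points(word)
    if not points or count <= 0:
        return word
    pos = points[-1]
    return word[:pos] + TATWEEL * count + word[pos:]
```

Proper kashida justification is usually handled by the font and the shaping engine, which know where elongation is calligraphically acceptable; inserting U+0640 characters is at best an approximation.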

Syriac and Mongolian have properties similar to Arabic, and in the absence of additional information should be given similar treatment for justification.

2.7. Tibetan Writing System

Tibetan is a Brahmic writing system related to Indic scripts like Devanagari and Gujarati; however, unlike these systems, it does not use Western-style punctuation nor spaces between words, and instead uses the Tibetan Tsheg Mark U+0F0B “་” between syllables and its own punctuation marks such as the Tibetan Shad U+0F0D “།” and Tibetan Nyis Shad U+0F0E “༎”, which indicate the end of longer segments.

Justification techniques used in Tibetan include stretching the space after a shad, minutely increasing the spaces after tsheg marks, and simply filling the remaining space on a line with tsheg marks.
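
The last of these techniques, filling the remaining space with tsheg marks, might be sketched in Python as below. The function name `fill_with_tsheg` and the `measure` parameter are hypothetical; the default measure simply counts code points, which does not reflect real Tibetan glyph widths (stacked letters and vowel signs do not advance the line), so a real implementation would pass in a measurement based on font metrics.

```python
TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG


def fill_with_tsheg(line, target_width, measure=len):
    """Append as many tsheg marks as fit in the space left on the line.

    `measure` maps a string to its width. The default (code point count)
    is a placeholder assumption, not a real width measurement.
    """
    tsheg_width = measure(TSHEG)
    gap = target_width - measure(line)
    if gap <= 0 or tsheg_width <= 0:
        return line
    return line + TSHEG * int(gap // tsheg_width)
```

Any space left over after the last whole tsheg would still need to be absorbed by one of the other techniques, such as stretching the space after a shad.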

2.8. Southeast Asian Writing Systems

In Southeast Asian systems such as Thai and Lao, letters are merged together into “clusters”. There are no spaces between words (lines must be broken by dictionary), but spaces serve to separate larger units of text.

Techniques for justification include stretching spaces on the line (if it happens to have any) and interspersing extra space between clusters.

Scripts in this category include Khmer, Myanmar, Lao, and Thai.

2.9. Other Writing Systems

Most (but not all) writing systems not mentioned here have discrete letters, like Latin, and in the absence of more specific information may be assumed to justify in a similar manner.

Note: Readers who wish to provide such “more specific information” are invited (and strongly encouraged) to contact the W3C Internationalization Working Group so that this document may be updated.

3. Guidance for Authors and Implementers

3.1. Tagging Content By Writing System

While most languages have a preferred writing system, many can be transcribed into a different system. As a common example, most languages have a Latin transcription, and can thus be written in the Latin writing system. In these cases the document typically adopts the typographic conventions of the Latin writing system: for example Japanese “romaji” and Chinese Pinyin use word spaces and justify accordingly. As another example, historical ideographic Korean (ko-Hant) does not use word spaces, and should therefore be justified as for Chinese.

Authors can indicate the use of the Latin writing system with the -Latn language subtag, e.g. ja-Latn for Japanese romaji. Other subtags exist for other writing systems, see ????. Some common/historical examples follow:

zh-Latn
Chinese, written in Latin transcription
ko-Hant
Korean, written in Hanja (Chinese ideographic characters)
??-Arab
Turkish, written in Arabic script
??-???
Mongolian, written in Cyrillic
??-???
Mongolian, written in traditional Mongolian script.
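
To show how a script subtag can be recovered from a language tag, here is a minimal Python sketch. It relies on the fact that, in a well-formed language tag, the script subtag is the only four-letter alphabetic subtag that can appear before any extension or private-use singleton. The function name `script_subtag` is hypothetical, and the sketch does not validate the tag or check the subtag against the registry.

```python
def script_subtag(lang_tag):
    """Return the script subtag of a language tag (e.g. "Latn"), or None."""
    subtags = lang_tag.split("-")
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            # A singleton starts an extension or private-use sequence;
            # nothing after it is a script subtag.
            break
        if len(subtag) == 4 and subtag.isascii() and subtag.isalpha():
            # Script subtags are conventionally written in title case.
            return subtag.title()
    return None
```

With this sketch, `script_subtag("ja-Latn")` returns `"Latn"`, while `script_subtag("ko")` returns `None`, leaving the choice of writing system to a per-language default.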

UAs should assume the most common writing system for a given language when choosing a justification strategy, but must not assume that writing system if the author has explicitly indicated a different one.
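
One way a UA might apply this rule is sketched below in Python. The strategy names, both lookup tables, and the function `choose_strategy` are illustrative assumptions rather than anything defined by this document; in particular, the default-script table is a tiny hand-written sample, and a real implementation would take that data from a source such as the Unicode CLDR.

```python
# Hypothetical sample of "most common script" per language.
DEFAULT_SCRIPT = {
    "zh": "Hani", "ja": "Jpan", "ko": "Kore", "en": "Latn",
    "am": "Ethi", "ar": "Arab", "bo": "Tibt", "th": "Thai",
}

# Hypothetical strategy names, grouped as in the References section.
STRATEGY_BY_SCRIPT = {
    "Hani": "inter-character", "Hans": "inter-character",
    "Hant": "inter-character", "Jpan": "inter-character",
    "Kore": "inter-word", "Hang": "inter-word",
    "Latn": "inter-word", "Ethi": "inter-word",
    "Arab": "cursive", "Syrc": "cursive", "Mong": "cursive",
    "Tibt": "tsheg",
    "Thai": "cluster", "Laoo": "cluster", "Khmr": "cluster", "Mymr": "cluster",
}


def choose_strategy(language, script=None):
    """Pick a justification strategy for a language and optional script.

    An explicit script always wins; otherwise fall back to the language's
    most common script, and finally to Latin-like behavior.
    """
    if script is None:
        script = DEFAULT_SCRIPT.get(language, "Latn")
    return STRATEGY_BY_SCRIPT.get(script, "inter-word")
```

Under these assumptions, `choose_strategy("ja")` gives `"inter-character"`, whereas `choose_strategy("ja", "Latn")` gives `"inter-word"`, because the author's explicit script indication overrides the default.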

3.2. Justifying Untagged Content

Web browsers frequently have to deal with untagged, potentially mixed-script content. The following are some guidelines for designing a strategy to deal with such content.

Authors should use (correct) language tags in order to get the best possible typographic behavior. For example, if Japanese text is tagged as Japanese, the UA knows to preferentially compress the space rather than expand it.
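
When no language tag is available, one possible fallback is to infer the writing system from the characters themselves. The Python sketch below splits text into runs by script using a small table of code point ranges. This is an assumption about how a UA could proceed, not a guideline stated by this document; the range table is deliberately coarse and incomplete, and the names `script_of` and `script_runs` are hypothetical. Characters with no listed script, such as spaces and punctuation, are attached to the neighboring run.

```python
# Coarse, incomplete code point ranges (illustration only).
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"), (0x0061, 0x007A, "Latn"),
    (0x00C0, 0x024F, "Latn"), (0x0600, 0x06FF, "Arab"),
    (0x0E00, 0x0E7F, "Thai"), (0x0E80, 0x0EFF, "Laoo"),
    (0x0F00, 0x0FFF, "Tibt"), (0x1000, 0x109F, "Mymr"),
    (0x1200, 0x137F, "Ethi"), (0x1780, 0x17FF, "Khmr"),
    (0x3040, 0x30FF, "Kana"), (0x4E00, 0x9FFF, "Hani"),
    (0xAC00, 0xD7A3, "Hang"),
]


def script_of(ch):
    """Return a script code for a character, or None if it is not listed."""
    cp = ord(ch)
    for low, high, script in SCRIPT_RANGES:
        if low <= cp <= high:
            return script
    return None


def script_runs(text):
    """Split text into (script, substring) runs."""
    runs = []  # list of [script, substring]
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            runs[-1][1] += ch
            if runs[-1][0] is None:
                # A run of neutral characters adopts the first real script.
                runs[-1][0] = script
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]
```

Character-based detection cannot distinguish languages that share a script (for example, Han ideographs in Chinese versus Japanese text), which is why correct language tags remain preferable.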

4. Acknowledgements

This document was compiled with guidance from the W3C Internationalization and CSS Working Groups, and the W3C Japanese, Chinese, Korean, and Ethiopic Language Task Forces.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Advisements are normative sections styled to evoke special attention and are set apart from other normative text with <strong class="advisement">, like this: UAs MUST provide an accessible alternative.

Conformance classes

Conformance to this specification is defined for three conformance classes:

style sheet
A CSS style sheet.
renderer
A UA that interprets the semantics of a style sheet and renders documents that use them.
authoring tool
A UA that writes a style sheet.

A style sheet is conformant to this specification if all of its statements that use syntax defined in this module are valid according to the generic CSS grammar and the individual grammars of each feature defined in this module.

A renderer is conformant to this specification if, in addition to interpreting the style sheet as defined by the appropriate specifications, it supports all the features defined by this specification by parsing them correctly and rendering the document accordingly. However, the inability of a UA to correctly render a document due to limitations of the device does not make the UA non-conformant. (For example, a UA is not required to render color on a monochrome monitor.)

An authoring tool is conformant to this specification if it writes style sheets that are syntactically correct according to the generic CSS grammar and the individual grammars of each feature in this module, and meet all other conformance requirements of style sheets as described in this module.

Requirements for Responsible Implementation of CSS

The following sections define several conformance requirements for implementing CSS responsibly, in a way that promotes interoperability in the present and future.

Partial Implementations

So that authors can exploit the forward-compatible parsing rules to assign fallback values, CSS renderers must treat as invalid (and ignore as appropriate) any at-rules, properties, property values, keywords, and other syntactic constructs for which they have no usable level of support. In particular, user agents must not selectively ignore unsupported property values and honor supported values in a single multi-value property declaration: if any value is considered invalid (as unsupported values must be), CSS requires that the entire declaration be ignored.
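
The effect of this rule on fallback values can be illustrated with a small Python sketch. It models declarations as (property, value) pairs and a renderer's support as a table of accepted keywords per property; that model, the table contents, and the function name `apply_declarations` are simplifying assumptions and do not reflect how CSS values are actually parsed.

```python
def apply_declarations(declarations, supported):
    """Apply declarations in order, ignoring any that are not fully supported.

    `declarations` is a list of (property, value) pairs; `supported` maps
    each supported property to the set of keywords the renderer accepts.
    """
    computed = {}
    for prop, value in declarations:
        allowed = supported.get(prop)
        if allowed is None:
            continue  # unsupported property: ignore the declaration
        keywords = value.split()
        if not keywords or any(k not in allowed for k in keywords):
            # A single unsupported keyword invalidates the whole declaration;
            # the supported keywords in it are NOT honored on their own.
            continue
        computed[prop] = keywords  # later valid declarations override earlier
    return computed
```

For a hypothetical renderer that supports only `auto` and `inter-word` for `text-justify`, the pair of declarations `text-justify: inter-word` followed by `text-justify: inter-character` leaves `inter-word` in effect, which is exactly the fallback behavior authors rely on.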

Implementations of Unstable and Proprietary Features

To avoid clashes with future stable CSS features, the CSSWG recommends following best practices for the implementation of unstable features and proprietary extensions to CSS.

Implementations of CR-level Features

Once a specification reaches the Candidate Recommendation stage, implementers should release an unprefixed implementation of any CR-level feature they can demonstrate to be correctly implemented according to spec, and should avoid exposing a prefixed variant of that feature.

To establish and maintain the interoperability of CSS across implementations, the CSS Working Group requests that non-experimental CSS renderers submit an implementation report (and, if necessary, the testcases used for that implementation report) to the W3C before releasing an unprefixed implementation of any CSS features. Testcases submitted to W3C are subject to review and correction by the CSS Working Group.

Further information on submitting testcases and implementation reports can be found on the CSS Working Group’s website at http://www.w3.org/Style/CSS/Test/. Questions should be directed to the public-css-testsuite@w3.org mailing list.

Index

Terms defined by this specification

References

Normative References

[CSS3TEXT]
Elika Etemad; Koji Ishii. CSS Text Module Level 3. 10 October 2013. LCWD. URL: http://dev.w3.org/csswg/css-text-3/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119