Word joiner

Character in text processing
For the ZERO WIDTH NO-BREAK SPACE (U+FEFF), see Byte order mark.

The word joiner (WJ) is a Unicode format character that indicates that no line break should occur at its position.[1] It does not affect the formation of ligatures or cursive joining, and it is ignored for the purpose of text segmentation.[1] It has been encoded since Unicode version 3.2 (released in 2002) as U+2060 WORD JOINER (⁠).
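
As a minimal sketch, the character's identity can be checked with Python's standard `unicodedata` module, and a word joiner can be placed between two strings to request that no line break occur there. The helper names `describe` and `join_without_break` are hypothetical and chosen only for illustration. Whether a renderer honours the request depends on its line-breaking implementation.

```python
import unicodedata

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER


def describe(ch):
    """Return the Unicode name and general category of one character."""
    return unicodedata.name(ch), unicodedata.category(ch)


def join_without_break(left, right):
    """Hypothetical helper: glue two strings with a word joiner so that
    a conforming line breaker does not break between them."""
    return left + WORD_JOINER + right


name, category = describe(WORD_JOINER)
print(name, category)  # the category "Cf" denotes a format character

glued = join_without_break("100", "%")
print(len(glued))  # the joiner is invisible but still counts as one code point
```

The joiner has no width, so the glued string displays like the plain concatenation while containing one extra code point.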

The word joiner takes over the line-break-inhibiting function of the zero-width no-break space (ZWNBSP, U+FEFF). U+FEFF was originally used, and is still used, as the byte order mark (BOM) at the start of a file. However, if it is encountered elsewhere, it should, according to Unicode, be treated as a word joiner, that is, as a no-break space of zero width.

The deliberate use of U+FEFF for this purpose is deprecated as of Unicode 3.2, with the word joiner strongly preferred.[1][2]
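
The distinction between a leading U+FEFF (a byte order mark) and an interior one (a zero-width no-break space) can be illustrated with a short sketch. The function `normalize_zwnbsp` is a hypothetical helper, not part of any standard library. It assumes one possible migration policy: keep a leading U+FEFF untouched and rewrite every other occurrence as U+2060. Other policies, such as stripping the BOM, are equally plausible.

```python
BOM = "\ufeff"          # U+FEFF, byte order mark when at the start of text
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER


def normalize_zwnbsp(text):
    """Hypothetical helper: keep a leading U+FEFF as a BOM and replace
    every other U+FEFF with the preferred U+2060."""
    if text.startswith(BOM):
        # Preserve the first character, rewrite the remainder only.
        return BOM + text[1:].replace(BOM, WORD_JOINER)
    return text.replace(BOM, WORD_JOINER)


sample = "\ufeffkm\ufeff/h"
result = normalize_zwnbsp(sample)
print([f"U+{ord(c):04X}" for c in result])
```

In this sketch the first code point stays U+FEFF, while the interior one between "km" and "/h" becomes U+2060.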

See also

  • Byte order mark, which uses the U+FEFF ZERO WIDTH NO-BREAK SPACE (ZWNBSP) character
  • Zero-width space
  • Zero-width joiner, which in scripts such as Arabic or the Indic scripts causes two characters to be shown in a connected form even where they otherwise would not be

References

  1. "Layout Controls" (PDF). The Unicode Standard, Version 12.0.0. The Unicode Consortium. p. 871.
  2. FAQ - UTF-8, UTF-16, UTF-32 & BOM, "What should I do with U+FEFF in the middle of a file?".