Unicode Normalizer
Normalize Unicode text using NFC, NFD, NFKC, and NFKD forms.
What is Unicode normalization?
Unicode defines multiple ways to represent the same visible character. For example, the character "é" can be stored as a single precomposed code point (U+00E9) or as the letter "e" followed by a combining acute accent (U+0065 U+0301). Both render identically but are byte-for-byte different, which causes problems for string comparison, searching, and text processing.
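The mismatch described above can be reproduced in a few lines of JavaScript. This is only an illustrative sketch; the variable names are made up for the example.

```javascript
// Two representations of the same visible character "é".
const precomposed = "\u00E9"; // single code point U+00E9
const decomposed = "e\u0301"; // "e" (U+0065) + combining acute accent (U+0301)

// They render the same but do not compare equal.
console.log(precomposed === decomposed); // false

// They differ in length, both in UTF-16 code units and in UTF-8 bytes.
console.log(precomposed.length, decomposed.length); // 1 2

const utf8 = new TextEncoder();
console.log(utf8.encode(precomposed).length); // 2
console.log(utf8.encode(decomposed).length); // 3
```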
Unicode normalization is the process of converting text to a canonical representation so that equivalent strings become identical. The Unicode Standard defines four normalization forms:
- NFC (Canonical Decomposition, followed by Canonical Composition): Composed form; characters are combined into precomposed code points wherever one exists. Usually the most compact of the four forms, and the one the W3C recommends for web content.
- NFD (Canonical Decomposition): Fully decomposed form; each character is broken into base letter plus combining marks.
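- NFKC (Compatibility Decomposition, followed by Canonical Composition): Like NFC but also replaces compatibility characters (e.g., ligatures, superscripts, full-width variants) with their compatibility equivalents, so "ﬁ" becomes "fi". This step is lossy: the original distinction cannot be recovered afterwards.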
- NFKD (Compatibility Decomposition): Like NFD but also applies compatibility decomposition.
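To make the difference between the four forms concrete, the sketch below normalizes one short sample string into each form and prints the resulting code points. The `codePoints` helper and the sample string are assumptions made for this illustration.

```javascript
// Render a string as a list of code points, e.g. "U+00E9 U+FB01".
function codePoints(text) {
  return [...text]
    .map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"))
    .join(" ");
}

// Sample: precomposed "é" (U+00E9) followed by the "ﬁ" ligature (U+FB01).
const sample = "\u00E9\uFB01";

for (const form of ["NFC", "NFD", "NFKC", "NFKD"]) {
  console.log(form.padEnd(4), codePoints(sample.normalize(form)));
}
// NFC  U+00E9 U+FB01
// NFD  U+0065 U+0301 U+FB01
// NFKC U+00E9 U+0066 U+0069
// NFKD U+0065 U+0301 U+0066 U+0069
```

The canonical forms (NFC, NFD) leave the ligature alone and only change how "é" is stored, while the compatibility forms (NFKC, NFKD) also expand the ligature into "f" and "i".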
Tool description
This tool normalizes Unicode text from one normalization form to another. Select the source and target forms, paste your text, and the converted output appears instantly.
Features
- All four normalization forms: Supports NFC, NFD, NFKC, and NFKD as both source and target.
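- Bidirectional conversion: Convert between any pair of forms. Note that NFKC and NFKD are lossy, so converting their output back to NFC or NFD does not restore the original compatibility characters.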
- Real-time output: Text is normalized instantly as you type or paste.
- Handles any Unicode script: Works with Latin, CJK, Arabic, Cyrillic, and any other Unicode-encoded text.
How it works
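The tool calls the standard JavaScript String.prototype.normalize() method with the selected target form. The ECMAScript specification defines this method in terms of the normalization forms in Unicode Standard Annex #15, so the output follows the standard; which characters are covered depends on the Unicode version supported by the browser's JavaScript engine. normalize() takes only the target form as its argument and accepts input in any form, including unnormalized or mixed text.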
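A minimal sketch of such a call is shown below. The `normalizeText` function and the `FORMS` list are hypothetical names chosen for illustration, not the tool's actual source; only `String.prototype.normalize()` itself is the standard API.

```javascript
// The four form names accepted by String.prototype.normalize().
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

// Hypothetical wrapper: validates the form name, then delegates to normalize().
function normalizeText(text, targetForm = "NFC") {
  if (!FORMS.includes(targetForm)) {
    throw new RangeError(`Unsupported normalization form: ${targetForm}`);
  }
  return text.normalize(targetForm);
}

// Decomposed input is composed by NFC, and composed input is split by NFD.
console.log(normalizeText("e\u0301", "NFC") === "\u00E9"); // true
console.log(normalizeText("\u00E9", "NFD") === "e\u0301"); // true

// Normalizing text that is already in the target form leaves it unchanged.
console.log(normalizeText("\u00E9", "NFC") === "\u00E9"); // true
```

The explicit check mirrors what the built-in method does on its own: `normalize()` throws a `RangeError` when given a form name other than these four.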
Use cases
- String comparison fixes: Normalize text before comparing or indexing it to ensure that visually identical strings match correctly.
- Search and database consistency: Standardize user input to a single form (typically NFC) before storing it in a database to prevent duplicate entries that differ only in normalization form.
- Compatibility folding: Use NFKC to collapse ligatures, superscripts, and full-width characters into their standard equivalents for search indexing or natural language processing.
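The use cases above could be sketched as follows. All three function names are hypothetical, and the added lowercasing in `searchKey` is an assumption about what a search index might want, not part of Unicode normalization.

```javascript
// Compare two strings by canonical equivalence rather than by code points.
function equalsNormalized(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Drop entries that differ only in normalization form, keeping the NFC spelling.
function dedupe(values) {
  const seen = new Set();
  for (const value of values) {
    seen.add(value.normalize("NFC"));
  }
  return [...seen];
}

// Build a search key: fold compatibility characters, then lowercase.
function searchKey(text) {
  return text.normalize("NFKC").toLowerCase();
}

console.log(equalsNormalized("caf\u00E9", "cafe\u0301")); // true
console.log(dedupe(["caf\u00E9", "cafe\u0301", "tea"]).length); // 2
// Full-width "Ｆｕｌｌ" plus "x²" folds to plain ASCII.
console.log(searchKey("\uFF26\uFF55\uFF4C\uFF4C x\u00B2")); // "full x2"
```

Because NFKC discards information, a key like this is best kept alongside the original text rather than stored in its place.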