What is OCR?

Optical Character Recognition (OCR) is the technology that converts images containing text into machine-readable characters. It works by analyzing the shapes and patterns in an image and matching them against known character representations. OCR is widely used to digitize printed documents, extract text from photos, and make scanned content searchable or editable.

Modern browser-based OCR uses trained neural network models that run entirely on your device. Your images are never sent to a server: recognition happens locally on your CPU.

Tool description

This tool extracts text from images directly in your browser using the Tesseract.js OCR engine. Select a photo, screenshot, or scanned document, choose the language of the text, and click Extract Text. The recognized text appears in the output area, where you can copy it to the clipboard or download it as a .txt file. Nothing needs to be installed and your image is never sent to a server. Once the page and the model for the selected language have loaded, no internet connection is required.
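
The recognition flow described above can be sketched as follows. This is a minimal illustration, not the tool's actual source: `OcrWorker`, `WorkerFactory`, and `extractText` are hypothetical names, and the worker shape is only assumed to resemble the `recognize` and `terminate` methods of a Tesseract.js worker. The factory is passed in so the sketch runs without any library installed.

```typescript
// Hypothetical minimal shape of an OCR worker. It is assumed to mirror the
// parts of a Tesseract.js worker used here, and is declared locally so the
// sketch stays self-contained.
interface OcrWorker<Img> {
  recognize(image: Img): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Hypothetical factory type: creates a worker for one language code.
type WorkerFactory<Img> = (lang: string) => Promise<OcrWorker<Img>>;

// Runs one recognition pass and always releases the worker afterwards,
// even if recognition throws.
async function extractText<Img>(
  createWorker: WorkerFactory<Img>,
  image: Img,
  lang: string,
): Promise<string> {
  const worker = await createWorker(lang);
  try {
    const result = await worker.recognize(image);
    return result.data.text.trim();
  } finally {
    await worker.terminate();
  }
}
```

In a browser, the factory would presumably wrap the `createWorker` function exported by Tesseract.js, with the selected file passed as the image and a code such as `eng` as the language.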

Features

  • Runs entirely in the browser: images never leave your device
  • Supports 19 languages including English, Russian, Chinese (Simplified and Traditional), Japanese, Korean, Arabic, Hindi, and major European languages
  • Accepts JPEG, PNG, WebP, GIF, BMP, and TIFF image formats
  • Real-time progress indicator during recognition
  • Download extracted text as a .txt file named after the source image
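
Two of the features above, the progress indicator and the download file name, can be sketched as small helpers. Both function names are hypothetical, and the sketch assumes that progress arrives as a fraction between 0 and 1 alongside a status string, which is how Tesseract.js logger messages are commonly shaped. The tool's real implementation may differ.

```typescript
// Converts a progress fraction (assumed to lie in 0..1) into a
// whole-number percentage label, clamping out-of-range values.
function formatProgress(status: string, progress: number): string {
  const clamped = Math.min(1, Math.max(0, progress));
  return `${status}: ${Math.round(clamped * 100)}%`;
}

// Replaces the final extension of an image file name with ".txt".
// Names without an extension (or with only a leading dot) get ".txt" appended.
function textFileNameFor(imageName: string): string {
  const dot = imageName.lastIndexOf(".");
  const base = dot > 0 ? imageName.slice(0, dot) : imageName;
  return `${base}.txt`;
}
```

For example, `textFileNameFor("scan.page1.png")` yields `scan.page1.txt`, since only the last extension is replaced.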

Use cases

  • Digitizing printed documents: Scan a page with your phone and extract all text for editing or searching without manually retyping it.
  • Copying text from screenshots: Extract code snippets, error messages, or quotes from screenshots where the text cannot be selected normally.
  • Processing images with foreign language text: Use the language selector to recognize text in languages that use non-Latin scripts, such as Arabic, Japanese, or Russian.

Supported formats

Format  Extensions
JPEG    .jpg, .jpeg
PNG     .png
WebP    .webp
GIF     .gif
BMP     .bmp
TIFF    .tif, .tiff
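
As an illustration of how the extension list above could be checked before recognition starts, here is a minimal sketch. `SUPPORTED_EXTENSIONS` and `isSupportedImage` are hypothetical names. Checking the file name alone is an assumption made for brevity, and a real tool might inspect the MIME type or file contents instead.

```typescript
// Extensions taken from the table above, stored in lower case.
const SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([
  "jpg", "jpeg", "png", "webp", "gif", "bmp", "tif", "tiff",
]);

// Returns true when the file name ends in one of the listed extensions.
// The comparison is case-insensitive; names without a dot are rejected.
function isSupportedImage(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false;
  return SUPPORTED_EXTENSIONS.has(fileName.slice(dot + 1).toLowerCase());
}
```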

Supported languages

Language               Code
English                eng
Russian                rus
French                 fra
German                 deu
Italian                ita
Spanish                spa
Portuguese             por
Dutch                  nld
Polish                 pol
Arabic                 ara
Chinese (Simplified)   chi_sim
Chinese (Traditional)  chi_tra
Japanese               jpn
Korean                 kor
Hindi                  hin
Turkish                tur
Swedish                swe
Norwegian              nor
Finnish                fin
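
The table above maps naturally onto a lookup structure that a language selector could use. The sketch below is illustrative only: `LANGUAGE_CODES` and `languageCode` are hypothetical names, and the entries simply restate the table.

```typescript
// Display name -> language code, copied from the table above.
const LANGUAGE_CODES: ReadonlyMap<string, string> = new Map([
  ["English", "eng"],
  ["Russian", "rus"],
  ["French", "fra"],
  ["German", "deu"],
  ["Italian", "ita"],
  ["Spanish", "spa"],
  ["Portuguese", "por"],
  ["Dutch", "nld"],
  ["Polish", "pol"],
  ["Arabic", "ara"],
  ["Chinese (Simplified)", "chi_sim"],
  ["Chinese (Traditional)", "chi_tra"],
  ["Japanese", "jpn"],
  ["Korean", "kor"],
  ["Hindi", "hin"],
  ["Turkish", "tur"],
  ["Swedish", "swe"],
  ["Norwegian", "nor"],
  ["Finnish", "fin"],
]);

// Returns the code for a display name, or undefined for unlisted languages.
function languageCode(name: string): string | undefined {
  return LANGUAGE_CODES.get(name);
}
```

A `Map` is used rather than a plain object so that names such as `constructor` cannot accidentally resolve to inherited properties.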

Tips

  • Better images produce better results: Use high-contrast images with sharp, evenly lit text. Blurry or low-resolution photos will reduce accuracy.
  • Select the correct language: Recognition accuracy drops significantly when the wrong language is selected, especially for non-Latin scripts.
  • Dark text on a light background works best: If your image has light text on a dark background, try inverting its colors before selecting it.
  • Scanned documents: Scan at 300 DPI or higher for best results with printed text.
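
The inversion tip above can be illustrated with a short sketch. It assumes pixel data in the 4-bytes-per-pixel RGBA layout that a canvas 2D context exposes through `getImageData().data`. `invertRgba` is a hypothetical helper, not part of the tool or of Tesseract.js, and any image editor achieves the same result.

```typescript
// Returns a new RGBA buffer with the red, green, and blue channels inverted.
// Every fourth byte (alpha) is copied unchanged, and the input is not modified.
function invertRgba(pixels: Uint8ClampedArray): Uint8ClampedArray {
  return Uint8ClampedArray.from(pixels, (value, index) =>
    index % 4 === 3 ? value : 255 - value,
  );
}
```

In a browser, the returned buffer could be wrapped in an `ImageData` object and drawn back onto a canvas before the image is passed to recognition.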

Limitations

  • Recognition accuracy depends heavily on image quality, font style, and text size. Handwriting, decorative fonts, and very small text may not be recognized well.
  • The model file for each language (a few megabytes) is downloaded the first time that language is used, so the first extraction in a new language takes longer and needs an internet connection.
  • Multi-column layouts may produce text in an unexpected reading order.