OCR Image Text Extractor
Extract text from images directly in your browser using the Tesseract.js OCR engine. Supports 19 languages, including English, Russian, Chinese, Japanese, and Arabic.
What is OCR?
Optical Character Recognition (OCR) is the technology that converts images containing text into machine-readable characters. It works by analyzing the shapes and patterns in an image and matching them against known character representations. OCR is widely used to digitize printed documents, extract text from photos, and make scanned content searchable or editable.
Modern browser-based OCR engines use trained neural network models that run entirely on your device. Your images are never sent to a server; recognition happens locally on your CPU.
Tool description
This tool extracts text from images directly in your browser using the Tesseract.js OCR engine. Upload a photo, screenshot, or scanned document, choose the language of the text, and click Extract Text. The recognized text appears in the output area, where you can copy it to the clipboard or download it as a .txt file. No installation or uploads are needed, and once the page and the language data for your chosen language have loaded, recognition does not require an internet connection.
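The upload, choose language, extract flow can be sketched roughly as below. This is a minimal illustration, not this tool's actual source: `OcrWorker` and `extractText` are hypothetical names, and the worker shape is an assumption modelled on the `createWorker` / `recognize` / `terminate` calls of recent Tesseract.js releases. The worker factory is passed in as a parameter so the sketch has no dependencies of its own.

```typescript
// Hypothetical minimal shape of an OCR worker (assumed, modelled on Tesseract.js).
interface OcrWorker<I> {
  recognize(image: I): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Create a worker for one language, recognize one image, and always
// release the worker afterwards, even if recognition throws.
async function extractText<I>(
  image: I,
  lang: string,
  createWorker: (lang: string) => Promise<OcrWorker<I>>,
): Promise<string> {
  const worker = await createWorker(lang);
  try {
    const { data } = await worker.recognize(image);
    return data.text.trim();
  } finally {
    await worker.terminate();
  }
}
```

In a browser, the third argument would presumably be the `createWorker` function imported from `tesseract.js`, and `image` a `File` taken from a file input.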
Features
- Runs entirely in the browser — no file uploads, full privacy
- Supports 19 languages including English, Russian, Chinese (Simplified and Traditional), Japanese, Korean, Arabic, Hindi, and major European languages
- Accepts JPEG, PNG, WebP, GIF, BMP, and TIFF image formats
- Real-time progress indicator during recognition
- Download extracted text as a .txt file named after the source image
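As an illustration of the last feature, a download name could be derived from the source image name as sketched below. The exact naming rule this tool uses is not documented here, so `txtNameFor` is a hypothetical helper that assumes the final extension is simply swapped for `.txt`.

```typescript
// Hypothetical helper: replace the last extension of the image name with ".txt".
// Names without an extension (or dot-files like ".hidden") just get ".txt" appended.
function txtNameFor(imageName: string): string {
  const dot = imageName.lastIndexOf(".");
  const base = dot > 0 ? imageName.slice(0, dot) : imageName;
  return `${base}.txt`;
}
```

For example, `txtNameFor("receipt.png")` yields `receipt.txt`. In a browser, the resulting name would typically be assigned to the `download` attribute of a link whose `href` is an object URL created from a text `Blob`.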
Use cases
- Digitizing printed documents: Scan a page with your phone and extract all text for editing or searching without manually retyping it.
- Copying text from screenshots: Extract code snippets, error messages, or quotes from screenshots where the text cannot be selected normally.
- Processing images with foreign language text: Use the language selector to recognize text in non-Latin scripts such as Arabic, Japanese, or Cyrillic.
Supported formats
| Format | Extensions |
|---|---|
| JPEG | .jpg, .jpeg |
| PNG | .png |
| WebP | .webp |
| GIF | .gif |
| BMP | .bmp |
| TIFF | .tif, .tiff |
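A file picker could pre-check a file name against the table above before starting recognition. The sketch below is an assumption about how such a check might look, not this tool's code; `SUPPORTED_EXTENSIONS` and `isSupportedImage` are hypothetical names, and a check by extension alone does not verify the actual file contents.

```typescript
// Extensions taken from the supported formats table, without the leading dot.
const SUPPORTED_EXTENSIONS = new Set([
  "jpg", "jpeg", "png", "webp", "gif", "bmp", "tif", "tiff",
]);

// Case-insensitive check of the file name's last extension.
function isSupportedImage(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  const ext = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(ext);
}
```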
Supported languages
| Language | Code |
|---|---|
| English | eng |
| Russian | rus |
| French | fra |
| German | deu |
| Italian | ita |
| Spanish | spa |
| Portuguese | por |
| Dutch | nld |
| Polish | pol |
| Arabic | ara |
| Chinese (Simplified) | chi_sim |
| Chinese (Traditional) | chi_tra |
| Japanese | jpn |
| Korean | kor |
| Hindi | hin |
| Turkish | tur |
| Swedish | swe |
| Norwegian | nor |
| Finnish | fin |
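The codes in the table are the identifiers handed to the OCR engine. A language selector could keep them in a lookup like the one sketched below; `LANGUAGE_CODES` and `languageCode` are hypothetical names used only for illustration.

```typescript
// Display name -> engine language code, copied from the table above.
const LANGUAGE_CODES = new Map<string, string>([
  ["English", "eng"], ["Russian", "rus"], ["French", "fra"],
  ["German", "deu"], ["Italian", "ita"], ["Spanish", "spa"],
  ["Portuguese", "por"], ["Dutch", "nld"], ["Polish", "pol"],
  ["Arabic", "ara"], ["Chinese (Simplified)", "chi_sim"],
  ["Chinese (Traditional)", "chi_tra"], ["Japanese", "jpn"],
  ["Korean", "kor"], ["Hindi", "hin"], ["Turkish", "tur"],
  ["Swedish", "swe"], ["Norwegian", "nor"], ["Finnish", "fin"],
]);

// Returns the code for a display name, or undefined if it is not in the table.
function languageCode(name: string): string | undefined {
  return LANGUAGE_CODES.get(name);
}
```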
Tips
- Better images produce better results: Use high-contrast images with sharp, evenly lit text. Blurry or low-resolution photos will reduce accuracy.
- Select the correct language: Recognition accuracy drops significantly when the wrong language is selected, especially for non-Latin scripts.
- Dark text on light background works best: If your image has light text on a dark background, try inverting it before uploading.
- Scanned documents: Scan at 300 DPI or higher for best results with printed text.
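The inversion tip can be applied in code before an image is handed to the OCR engine. The sketch below works on a raw RGBA pixel buffer; the function names and the brightness threshold of 128 are assumptions chosen for illustration, and the tool itself may not perform this step.

```typescript
// Average perceived brightness (0..255) using the common 0.299/0.587/0.114 weights.
// An empty buffer is reported as fully light so it is never flagged for inversion.
function meanLuminance(pixels: Uint8ClampedArray): number {
  const count = Math.floor(pixels.length / 4);
  if (count === 0) return 255;
  let sum = 0;
  for (let i = 0; i < count * 4; i += 4) {
    sum += 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
  }
  return sum / count;
}

// Heuristic (assumed threshold): a mostly dark image probably has light text on it.
function looksLightOnDark(pixels: Uint8ClampedArray, threshold = 128): boolean {
  return meanLuminance(pixels) < threshold;
}

// Invert the red, green, and blue channels in place; alpha is left unchanged.
function invertRgba(pixels: Uint8ClampedArray): Uint8ClampedArray {
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    pixels[i] = 255 - pixels[i];
    pixels[i + 1] = 255 - pixels[i + 1];
    pixels[i + 2] = 255 - pixels[i + 2];
  }
  return pixels;
}
```

In a browser, the buffer would typically come from the `data` field of `getImageData` on a 2D canvas context and be written back with `putImageData`.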
Limitations
- Recognition accuracy depends heavily on image quality, font style, and text size. Handwriting, decorative fonts, and very small text may not be recognized well.
- The language model files are downloaded on first use (a few megabytes each), so the first extraction may take longer.
- Multi-column layouts may produce text in an unexpected reading order.
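For the multi-column limitation, one possible post-processing workaround is to reorder recognized lines by their bounding boxes, reading each column top to bottom before moving right. The sketch below is a simple heuristic under stated assumptions: the `LineBox` shape is hypothetical (OCR engines commonly report a box per line, but the exact fields vary), and a full-width heading that overlaps both columns would merge them into one.

```typescript
// Hypothetical shape of a recognized line with its bounding box in pixels.
interface LineBox {
  text: string;
  x0: number; // left edge
  y0: number; // top edge
  x1: number; // right edge
}

// Group lines whose horizontal extents overlap into columns, then read
// columns left to right and each column top to bottom.
function orderByColumns(lines: LineBox[]): LineBox[] {
  const columns: { x0: number; x1: number; items: LineBox[] }[] = [];
  for (const line of [...lines].sort((a, b) => a.x0 - b.x0)) {
    const col = columns.find((c) => line.x0 < c.x1 && line.x1 > c.x0);
    if (col) {
      col.items.push(line);
      col.x0 = Math.min(col.x0, line.x0);
      col.x1 = Math.max(col.x1, line.x1);
    } else {
      columns.push({ x0: line.x0, x1: line.x1, items: [line] });
    }
  }
  columns.sort((a, b) => a.x0 - b.x0);
  return columns.flatMap((c) => c.items.sort((a, b) => a.y0 - b.y0));
}
```

Given two left-column lines and two right-column lines that arrive interleaved row by row, this returns both left-column lines first, followed by the right-column lines.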