Help & Support

Comprehensive guides for using Insurgent Ink

OCR Processing

Understand how Insurgent Ink processes scanned documents and images using Optical Character Recognition (OCR) technology to extract translatable text.

What is OCR?

Optical Character Recognition (OCR) is a technology that converts scanned documents, images, or photos containing text into machine-readable text that can be translated.

Image Recognition

Analyzes visual patterns to identify text characters in images

Text Extraction

Converts visual text into digital, editable text format

Processing

Prepares extracted text for translation and analysis

When OCR is Used

Insurgent Ink automatically detects when documents require OCR processing and applies the appropriate technology:

Automatic OCR Detection

OCR is applied automatically to the following kinds of documents:

Scanned PDFs

PDFs containing images of text rather than searchable text require OCR processing to extract readable content.

Image Files

JPG, PNG, and other image formats uploaded with text content are automatically processed through OCR.

Mixed Content PDFs

Documents containing both searchable text and scanned images receive combined processing for complete text extraction.

Legacy Documents

Older documents that were scanned without OCR processing benefit from modern text extraction technology.
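
The detection described above can be pictured with a small sketch. It is only an illustration, not Insurgent Ink's actual logic: the `PageInfo` fields, the `min_chars` threshold, and the three labels are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-page summary; a real pipeline would fill this from a PDF parser."""
    text_chars: int   # characters found in the page's searchable text layer
    image_count: int  # raster images embedded in the page


def page_needs_ocr(page: PageInfo, min_chars: int = 20) -> bool:
    # Assumption: a page with images but almost no text layer is a scan.
    return page.image_count > 0 and page.text_chars < min_chars


def classify_document(pages: list[PageInfo]) -> str:
    """Label a document as 'searchable', 'scanned', or 'mixed'."""
    flags = [page_needs_ocr(p) for p in pages]
    if not any(flags):
        return "searchable"  # no page needs OCR (also covers an empty document)
    if all(flags):
        return "scanned"     # every page needs OCR
    return "mixed"           # combine the text layer with OCR output


doc = [PageInfo(text_chars=1500, image_count=0), PageInfo(text_chars=0, image_count=1)]
print(classify_document(doc))
```

In this sketch a document with one searchable page and one scanned page is labeled "mixed", which corresponds to the combined processing described for mixed content PDFs.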

OCR Processing Steps

Understanding the OCR processing workflow helps you optimize your documents for the best results:

1. Document Analysis

The system analyzes the uploaded document to determine content type and optimal processing approach.

  • Detection of text vs. image content
  • Page layout and structure analysis
  • Image quality assessment
  • Language detection (when possible)
2. Image Enhancement

Images are automatically enhanced to improve OCR accuracy and text recognition.

  • Contrast and brightness optimization
  • Noise reduction and clarity improvement
  • Skew correction to straighten tilted text
  • Resolution enhancement when needed
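
To show why contrast optimization matters, here is a minimal sketch of two common techniques, linear contrast stretching followed by fixed-threshold binarization, applied to a tiny grayscale image stored as nested lists. The function names and the threshold of 128 are assumptions for the example; the source does not say which enhancement algorithms Insurgent Ink uses.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest becomes 0 and the lightest 255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Map each pixel to pure black (0) or pure white (255)."""
    return [[0 if v < threshold else 255 for v in row] for row in pixels]


# A faded scan: the "ink" (150, 160) is barely darker than the "paper" (180, 190).
faded = [[150, 160],
         [180, 190]]

print(binarize(faded))                    # every pixel becomes white: the text is lost
print(binarize(stretch_contrast(faded)))  # ink and paper are separated
```

Without stretching, every pixel in the faded sample is above the threshold and the text disappears; after stretching, the darker row falls below it and survives binarization.
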
3. Text Recognition

Advanced OCR algorithms extract text while preserving structure and formatting.

  • Character recognition and assembly
  • Word and sentence boundary detection
  • Paragraph and section structure preservation
  • Special character and symbol handling
4. Text Processing

Extracted text is cleaned, validated, and prepared for segmentation and translation.

  • Text validation and error correction
  • Formatting normalization
  • Segmentation into translatable units
  • Quality assessment and confidence scoring
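
The clean-up and segmentation in this step can be sketched roughly as follows. This is a simplified illustration using only the Python standard library; the function names and the specific rules (rejoining hyphenated line breaks, splitting on sentence-ending punctuation, averaging per-word confidences) are assumptions, not a description of Insurgent Ink's internals.

```python
from __future__ import annotations

import re
import unicodedata


def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # fold ligatures such as "ﬁ" into "fi"
    # Rejoin words split across a line break. Naive: a genuinely hyphenated
    # compound broken at the hyphen would be joined as well.
    text = re.sub(r"-\n(?=[a-z])", "", text)
    text = re.sub(r"\s+", " ", text)            # collapse line breaks and repeated spaces
    return text.strip()


def segment(text: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace; abbreviations are not handled.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def mean_confidence(word_confidences: list[float]) -> float:
    """Average per-word confidence in [0, 1]; 0.0 if there are no words."""
    if not word_confidences:
        return 0.0
    return sum(word_confidences) / len(word_confidences)


raw = "The ﬁrst para-\ngraph ends here.  The second\nstarts now."
for unit in segment(normalize(raw)):
    print(unit)
```

On the sample input, the ligature and the hyphenated line break are repaired and the text is split into two translatable units.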

Factors Affecting OCR Quality

Several factors influence the accuracy and quality of OCR text extraction:

Optimal Conditions

  • High Resolution: 300 DPI or higher for best results
  • Clear Text: Sharp, well-defined characters without blurring
  • Good Contrast: Dark text on light backgrounds
  • Standard Fonts: Common typefaces are recognized better
  • Proper Alignment: Straight, non-skewed text orientation
  • Clean Background: Minimal noise, stains, or artifacts
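
As a quick way to check the 300 DPI guideline, the resolution of a scan can be estimated from its pixel dimensions and the physical page size. The helper names below are made up for this example.

```python
def effective_dpi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixel densities."""
    return min(width_px / width_in, height_px / height_in)


def meets_ocr_target(dpi: float, target: float = 300.0) -> bool:
    # 300 DPI is the guideline given in the list above.
    return dpi >= target


# A US Letter page (8.5 x 11 in) scanned to 2550 x 3300 pixels.
dpi = effective_dpi(2550, 3300, 8.5, 11)
print(dpi, meets_ocr_target(dpi))
```

The same page captured at 1275 x 1650 pixels works out to 150 DPI, half the recommended resolution.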

Challenging Conditions

  • Low Resolution: Pixelated or compressed images
  • Poor Lighting: Shadows, glare, or uneven illumination
  • Handwritten Text: Cursive or irregular handwriting
  • Decorative Fonts: Stylized or artistic typefaces
  • Complex Layouts: Multi-column or irregular text arrangement
  • Damaged Documents: Stains, tears, or faded text

OCR Best Practices

Follow these guidelines to achieve the best OCR results for your documents:

📷 Document Preparation

  • Scan or photograph at high resolution (300+ DPI)
  • Ensure adequate lighting and avoid shadows
  • Keep documents flat and properly aligned
  • Clean scanner glass or camera lens
  • Use auto-crop features when available

📄 File Optimization

  • Save images in PNG or high-quality JPEG format
  • Avoid excessive compression that reduces quality
  • Consider converting to grayscale for text-only documents
  • Remove unnecessary margins and borders
  • Split multi-page documents if needed

⚙️ Processing Tips

  • Allow extra time for OCR processing to complete
  • Review extracted text for accuracy before translation
  • Consider manual correction for critical documents
  • Test with small sections before processing large documents
  • Keep original files as backup references

🔍 Quality Control

  • Compare extracted text with original document
  • Check for missing paragraphs or sections
  • Verify proper handling of special characters
  • Confirm table and list formatting preservation
  • Report consistent issues for system improvement
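
If you have a hand-typed sample of the original, the comparison above can be partly automated. The sketch below uses Python's standard `difflib` module; the helper names are made up for the example, and the approach assumes you copy the extracted text out of Insurgent Ink yourself.

```python
import difflib


def similarity(reference: str, extracted: str) -> float:
    """Return a 0..1 similarity ratio between the reference and the OCR output."""
    return difflib.SequenceMatcher(None, reference, extracted).ratio()


def differing_spans(reference: str, extracted: str):
    """List (reference_part, extracted_part) pairs wherever the two texts differ."""
    matcher = difflib.SequenceMatcher(None, reference, extracted)
    return [
        (reference[i1:i2], extracted[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# "rn" read as "m" is a classic OCR confusion.
print(similarity("modern", "modem"))
print(differing_spans("modern", "modem"))
```

A low ratio on a sample page is a hint to re-scan or correct manually before translating the full document.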

OCR Limitations

While OCR technology is highly advanced, there are some limitations to be aware of:

Text Recognition Challenges

Certain types of text and layouts may produce less accurate results:

  • Handwritten content, especially cursive writing
  • Very small font sizes (under 8pt)
  • Heavily stylized or decorative fonts
  • Text on complex or patterned backgrounds
  • Rotated or significantly skewed text
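
One way to see why small fonts are difficult is to work out how many pixels a character receives at a given scan resolution. The sketch relies only on the standard definition of a point as 1/72 inch; the function name is made up for the example.

```python
def glyph_height_px(font_size_pt: float, dpi: float) -> float:
    """Approximate pixel height of a font's em box (1 point = 1/72 inch)."""
    return font_size_pt / 72 * dpi


for dpi in (150, 300):
    print(dpi, round(glyph_height_px(8, dpi), 1))
```

At 150 DPI an 8pt character has roughly half the pixel height it has at 300 DPI, which leaves less detail for telling similar characters apart.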

Formatting Preservation

Complex formatting may not be perfectly preserved during OCR processing:

  • Multi-column layouts may merge or reorder
  • Tables and charts might lose structure
  • Font styling (bold, italic) may not be detected
  • Page headers and footers might be misplaced
  • Graphics and images are not processed
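
The multi-column problem can be pictured with a toy example. The coordinates, the `gutter_x` parameter, and both functions are hypothetical and serve only to illustrate how reading order can go wrong; they do not describe how Insurgent Ink orders text.

```python
# Each tuple is (x, y, text): left edge, top edge, and content of a recognized line.
lines = [
    (50, 100, "Left 1"), (350, 100, "Right 1"),
    (50, 120, "Left 2"), (350, 120, "Right 2"),
]


def naive_order(lines):
    # Sort top-to-bottom, then left-to-right: the two columns get interleaved.
    return [text for _, _, text in sorted(lines, key=lambda l: (l[1], l[0]))]


def column_order(lines, gutter_x):
    # Read everything left of the gutter first, then everything right of it.
    return [text for _, _, text in sorted(lines, key=lambda l: (l[0] >= gutter_x, l[1], l[0]))]


print(naive_order(lines))
print(column_order(lines, gutter_x=200))
```

The naive ordering alternates between columns, which is the kind of merged output to look for when reviewing a multi-column page.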

Language Considerations

Some languages and writing systems present unique OCR challenges:

  • Right-to-left scripts (Arabic, Hebrew) may need manual review
  • Vertical text layouts (some Asian languages)
  • Mixed language documents with different scripts
  • Languages with complex character combinations
  • Rare or regional language variants

Troubleshooting OCR Issues

No Text Extracted

If OCR fails to extract any text from your document:

  • Verify the document contains actual text (not only pictures or graphics)
  • Check image quality and resolution
  • Ensure proper contrast between text and background
  • Try uploading individual pages separately
  • Consider manual text entry for very poor quality documents

Inaccurate Text Recognition

When extracted text contains numerous errors:

  • Review and manually correct critical sections
  • Re-scan with better quality settings if possible
  • Check for document skew or rotation issues
  • Consider alternative scanning methods or software
  • Report persistent issues for system improvement

Slow Processing

If OCR processing takes unusually long:

  • Large or high-resolution documents require more time
  • Complex layouts increase processing duration
  • System load may temporarily slow processing
  • Consider breaking large documents into smaller sections
  • Contact support if processing exceeds reasonable timeframes
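
When breaking a large document into smaller sections, it helps to plan the page ranges first. The helper below is a small sketch for that planning step; its name and the chunk size of 10 are arbitrary choices for the example, and the actual splitting would be done in your PDF tool of choice.

```python
from __future__ import annotations


def page_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (first, last) page ranges covering the document."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size must be >= 1")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


print(page_ranges(23, 10))
```

For a 23-page document this yields three sections: pages 1 to 10, 11 to 20, and 21 to 23.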