Image Table OCR Guide

Practical tips for converting table screenshots to Excel/CSV with in-browser Korean OCR.

1. Upload the image

Drop a PNG/JPG/WEBP image into the "Upload table image" zone. The image is never uploaded: tesseract.js runs the Korean (kor) + English (eng) WebAssembly models inside your browser. On first use, the Korean model (~12MB) and the English model (~8MB) download once and are cached in your browser thereafter.

2. Capture for best accuracy

  • Resolution — 1600px on the long edge is ideal. We upscale automatically, but crisp originals help.
  • Perspective — keep the camera perpendicular to the table. A tilt of 4–5° is tolerated; beyond that, column boundaries scatter.
  • Contrast — we grayscale + histogram-stretch internally; still, avoid dark or patterned backgrounds.
  • Crop margin — leave 30–50px padding around the table so the first and last rows don't get clipped.
  • Grid lines — optional. We do not use lines; row/column split is derived purely from word bounding boxes.
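
The resolution and contrast tips above can be illustrated with a small sketch. This is not the tool's actual code: the function names, the Rec. 601 luma weights, and the min/max form of the histogram stretch are assumptions chosen for illustration. Only the 1600px target comes from this guide.

```typescript
// Hypothetical helpers illustrating the pre-processing described above.

/** Scale factor that brings the long edge up to `target` px (never downscales). */
function upscaleFactor(width: number, height: number, target: number = 1600): number {
  const longEdge = Math.max(width, height);
  if (longEdge <= 0) throw new RangeError("image dimensions must be positive");
  return Math.max(1, target / longEdge);
}

/** Convert RGBA pixels to grayscale, then linearly stretch the values to 0..255. */
function grayscaleStretch(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const n = Math.floor(rgba.length / 4);
  const gray = new Float64Array(n);
  let lo = Infinity;
  let hi = -Infinity;
  for (let i = 0; i < n; i++) {
    // Assumed luma weights (Rec. 601); alpha is ignored.
    const g = 0.299 * rgba[4 * i] + 0.587 * rgba[4 * i + 1] + 0.114 * rgba[4 * i + 2];
    gray[i] = g;
    if (g < lo) lo = g;
    if (g > hi) hi = g;
  }
  const out = new Uint8ClampedArray(n);
  const range = hi - lo;
  for (let i = 0; i < n; i++) {
    // A flat image (range 0) is left at its gray level.
    out[i] = range > 0 ? ((gray[i] - lo) / range) * 255 : gray[i];
  }
  return out;
}
```

For an 800×600 screenshot, upscaleFactor(800, 600) gives 2, so the image would be drawn at 1600×1200 before recognition. A low-contrast image whose gray levels span only 50 to 150 is stretched so that the darkest pixel becomes 0 and the brightest 255.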

3. How rows and columns are inferred

We take each word's vertical centre to cluster into rows, and use 1-D clustering on left-edge X values to find column starts. Words in the same row falling in the same column stripe are joined with spaces. Heavy skew or multi-line cells degrade accuracy; use the preview to edit cells or delete mis-split rows.
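
The inference described above can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: the Word shape, the function names, the gap-based clustering rule, and the tolerance values are all assumptions.

```typescript
// Hypothetical word box; tesseract.js exposes similar bounding-box fields.
interface Word { text: string; x0: number; y0: number; x1: number; y1: number; }

/** Sort values and open a new cluster whenever the gap to the previous value exceeds `tol`.
 *  Returns the smallest value of each cluster (the stripe start). */
function cluster1D(values: number[], tol: number): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const starts: number[] = [];
  let prev = -Infinity;
  for (const v of sorted) {
    if (v - prev > tol) starts.push(v);
    prev = v;
  }
  return starts;
}

/** Index of the last stripe whose start is <= v. */
function stripeIndex(starts: number[], v: number): number {
  let idx = 0;
  starts.forEach((s, i) => { if (s <= v) idx = i; });
  return idx;
}

/** Rows from vertical centres, columns from left edges; words sharing a cell are joined with spaces. */
function inferTable(words: Word[], rowTol: number, colTol: number): string[][] {
  const centre = (w: Word): number => (w.y0 + w.y1) / 2;
  const rowStarts = cluster1D(words.map(centre), rowTol);
  const colStarts = cluster1D(words.map((w) => w.x0), colTol);
  const cells: Word[][][] = rowStarts.map(() => colStarts.map((): Word[] => []));
  for (const w of words) {
    cells[stripeIndex(rowStarts, centre(w))][stripeIndex(colStarts, w.x0)].push(w);
  }
  return cells.map((row) =>
    row.map((cell) => cell.sort((a, b) => a.x0 - b.x0).map((w) => w.text).join(" ")),
  );
}
```

With a column tolerance wider than the gap between words inside one cell, "1,200" and "원" land in the same stripe and become the single cell "1,200 원". The sketch also shows why skew hurts: once vertical centres of neighbouring rows drift closer than the row tolerance, the rows merge.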

4. Edit & export

  • Edit a cell — double-click any cell to type; Enter or blur to save.
  • Select rows — checkbox on the left; then "Delete selected rows" to bulk-remove.
  • Add a row — appends an empty row with the same column count.
  • Download XLSX — single-sheet "OCR" workbook.
  • Download CSV — UTF-8 with BOM so Korean/Japanese headers render in Excel.
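
As an illustration of the CSV export, here is a minimal sketch of building a BOM-prefixed CSV string. The function name toCsv and the quoting rules (RFC 4180 style) are assumptions, not the tool's actual code.

```typescript
/** Serialise rows to CSV, prefixed with U+FEFF so Excel detects UTF-8.
 *  Cells containing a comma, quote, or line break are quoted, with quotes doubled. */
function toCsv(rows: string[][]): string {
  const escapeCell = (cell: string): string =>
    /[",\r\n]/.test(cell) ? `"${cell.replace(/"/g, '""')}"` : cell;
  const body = rows.map((row) => row.map(escapeCell).join(",")).join("\r\n");
  return "\uFEFF" + body;
}
```

In a browser, wrapping the result in a Blob with type "text/csv;charset=utf-8" encodes it as UTF-8, so the leading U+FEFF becomes the three BOM bytes EF BB BF. Without that prefix, Excel may read Korean headers in a legacy code page.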

5. Korean-aware post-processing

Each cell runs through quick clean-ups:

  • 12,345 원 → 12,345원 (no space before the common currency/unit suffixes 원 · 만원 · 억원 · % · 개 · 건 · 명 · 회)
  • Fullwidth comma , → ,
  • Standalone noise cells like | / · → removed (often grid-line artefacts)

These clean-ups cannot recover text the underlying OCR already lost. If accuracy is poor, retake the image at a higher resolution — or edit cells manually.
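
The clean-ups listed above could look roughly like the sketch below. The regular expressions and the names UNIT_SUFFIXES, UNIT_RE, and cleanCell are assumptions for illustration; the tool's real rules may differ in detail.

```typescript
// Suffixes from the list above; longer ones first so "만원" is tried before "원".
const UNIT_SUFFIXES = ["억원", "만원", "원", "%", "개", "건", "명", "회"];
const UNIT_RE = new RegExp(`(\\d)\\s+(${UNIT_SUFFIXES.join("|")})`, "g");

/** Apply the three clean-ups to one cell of OCR text. */
function cleanCell(raw: string): string {
  // Fullwidth comma (U+FF0C) becomes an ASCII comma.
  const s = raw.trim().replace(/\uFF0C/g, ",");
  // A cell made only of |, / or middle dots (U+00B7) is treated as grid-line noise.
  if (/^[|\/\u00B7\s]+$/.test(s)) return "";
  // Remove whitespace between a digit and a known unit suffix.
  return s.replace(UNIT_RE, "$1$2");
}
```

For example, cleanCell("12,345 원") gives "12,345원" and cleanCell(" | ") gives an empty string. Because this sketch has no word-boundary check, a phrase such as "3 회사" would also lose its space; a stricter rule would need a lookahead.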

6. Numbers & dates

  • Thousand separators stay as-is. To treat as numbers in Excel, use Text-to-Columns → remove comma, or paste as values.
  • Dates (e.g. 2025-03-01) come through as strings. Convert with =DATEVALUE(A2) or cell format "Date".
  • When Korean unit suffixes are mixed in with numeric values, splitting number and unit into separate columns generally helps downstream analysis.
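
If you post-process the exported CSV in a script rather than in Excel, the number/unit split suggested above might be sketched like this. The Amount shape and the splitAmount name are hypothetical, and the pattern only covers plain decimal numbers with optional thousand separators.

```typescript
interface Amount { value: number; unit: string; }

/** Split a cell such as "12,345원" into a numeric value and its unit suffix.
 *  Returns null when the cell does not start with a number followed only by non-digits,
 *  so date strings like "2025-03-01" are left alone. */
function splitAmount(cell: string): Amount | null {
  const m = /^(-?\d[\d,]*(?:\.\d+)?)\s*(\D*)$/.exec(cell.trim());
  if (m === null) return null;
  const value = Number(m[1].replace(/,/g, ""));
  if (Number.isNaN(value)) return null;
  return { value, unit: m[2].trim() };
}
```

splitAmount("12,345원") yields the value 12345 with unit "원", which can then go into two separate columns. Cells that do not match are returned as null so the caller can keep the original string.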

7. Performance

Typical laptop: 3–8s for 1000×800 images, 10–20s for 1600px+. First-run model download adds another 10–20s depending on bandwidth. Very large images (4000px+) hit memory limits — crop to just the table region for faster and more accurate recognition.

8. Privacy

The site is served as static files from S3, so no upload endpoint exists. Open DevTools → Network, run OCR, and confirm that upload traffic is zero bytes. Safe to use under enterprise "no-upload" policies.
