1. Upload the image
Drop a PNG, JPG, or WEBP file into the "Upload table image" zone. Despite the name, the image never leaves your device: tesseract.js runs the Korean (kor) and English (eng) models as WebAssembly inside your browser. On first use, the Korean model (~12MB) and the English model (~8MB) are downloaded once and cached by your browser thereafter.
2. Capture for best accuracy
- Resolution — 1600px on the long edge is ideal. We upscale automatically, but crisp originals help.
- Perspective — keep the camera perpendicular to the table. A tilt of up to 4–5° is tolerated; beyond that, column boundaries scatter.
- Contrast — we grayscale + histogram-stretch internally; still, avoid dark or patterned backgrounds.
- Crop margin — leave 30–50px padding around the table so the first and last rows don't get clipped.
- Grid lines — optional. We do not use lines; row/column split is derived purely from word bounding boxes.
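As a rough illustration of the grayscale + histogram-stretch step mentioned under Contrast, here is a minimal sketch. The function name grayscaleStretch, the Rec. 601 luma weights, and the plain min/max stretch are assumptions made for this example; the tool's actual preprocessing may differ.

```typescript
// Hypothetical helper (not the tool's actual code): converts flat RGBA pixel
// data to grayscale, then stretches the histogram so the darkest pixel maps
// to 0 and the brightest to 255.
function grayscaleStretch(rgba: number[]): number[] {
  const gray: number[] = [];
  for (let i = 0; i + 3 < rgba.length; i += 4) {
    // Rec. 601 luma weights (an assumption); alpha at rgba[i + 3] is ignored.
    gray.push(0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2]);
  }

  let lo = Infinity;
  let hi = -Infinity;
  for (const g of gray) {
    if (g < lo) lo = g;
    if (g > hi) hi = g;
  }

  // A flat (or empty) image has no range to stretch; just round the values.
  if (hi <= lo) return gray.map((g) => Math.round(g));

  const scale = 255 / (hi - lo);
  return gray.map((g) => Math.round((g - lo) * scale));
}
```

A low-contrast photo whose gray levels span only 50 to 150 is spread over the full 0 to 255 range, which is why a plain, light background still gives the stretch more to work with than a dark or patterned one.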
3. How rows and columns are inferred
We cluster words into rows by their vertical centres, and run 1-D clustering on left-edge X values to find column starts. Words in the same row that fall in the same column stripe are joined with spaces. Heavy skew or multi-line cells degrade accuracy; use the preview to edit cells or delete mis-split rows.
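The row and column inference described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the Word shape, the gap-based clustering in cluster1D, and the rowGap, colGap, and minSupport parameters are all hypothetical. The minSupport filter is an added guess at how the second word of a multi-word cell is kept from opening a column of its own.

```typescript
// Hypothetical word box; field names are assumptions for this sketch.
interface Word { text: string; x0: number; y0: number; x1: number; y1: number; }

// Gap-based 1-D clustering: sort the values and start a new cluster
// whenever the distance to the previous value exceeds `gap`.
function cluster1D(values: number[], gap: number): number[][] {
  const sorted = values.slice().sort((a, b) => a - b);
  const clusters: number[][] = [];
  for (const v of sorted) {
    const last = clusters[clusters.length - 1];
    if (last && v - last[last.length - 1] <= gap) last.push(v);
    else clusters.push([v]);
  }
  return clusters;
}

function wordsToGrid(words: Word[], rowGap: number, colGap: number, minSupport: number): string[][] {
  // Column starts: smallest left edge of each sufficiently populated cluster.
  const colStarts = cluster1D(words.map((w) => w.x0), colGap)
    .filter((c) => c.length >= minSupport)
    .map((c) => c[0]);
  if (colStarts.length === 0) colStarts.push(0); // fall back to one column

  // Rows: cluster the vertical centres; keep each cluster's largest centre.
  const rowLimits = cluster1D(words.map((w) => (w.y0 + w.y1) / 2), rowGap)
    .map((c) => c[c.length - 1]);

  const grid: string[][] = rowLimits.map(() => colStarts.map(() => ""));

  // Visit words left to right so joined text keeps its reading order.
  const ordered = words.slice().sort((a, b) => a.x0 - b.x0);
  for (const w of ordered) {
    const cy = (w.y0 + w.y1) / 2;
    let r = 0;
    while (r < rowLimits.length - 1 && cy > rowLimits[r]) r++;
    let c = 0;
    while (c < colStarts.length - 1 && w.x0 >= colStarts[c + 1]) c++;
    grid[r][c] = grid[r][c] ? grid[r][c] + " " + w.text : w.text;
  }
  return grid;
}
```

Because both steps depend only on box positions, a tilted photo shifts left edges row by row and can push them past colGap, which matches the advice to keep the camera perpendicular to the table.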
4. Edit & export
- Edit a cell — double-click any cell to type; Enter or blur to save.
- Select rows — checkbox on the left; then "Delete selected rows" to bulk-remove.
- Add a row — appends an empty row with the same column count.
- Download XLSX — single-sheet "OCR" workbook.
- Download CSV — UTF-8 with BOM so Korean/Japanese headers render in Excel.
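For readers curious about the CSV export, here is a minimal sketch of producing CSV text with a leading byte order mark. The function name toCsv and the quoting rules (quote any cell containing a comma, quote, or line break) are assumptions for this example, not the tool's confirmed implementation.

```typescript
// Hypothetical helper: serialises a grid of cells as CSV text with a
// leading U+FEFF. When the string is written out as UTF-8, that character
// becomes the 3-byte BOM that Excel uses to detect the encoding.
function toCsv(rows: string[][]): string {
  const escapeCell = (cell: string): string =>
    /[",\r\n]/.test(cell) ? '"' + cell.replace(/"/g, '""') + '"' : cell;
  const body = rows.map((row) => row.map(escapeCell).join(",")).join("\r\n");
  return "\uFEFF" + body;
}
```

Note that cells with thousand separators, such as 12,345원, contain a comma and are therefore quoted in the output.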
5. Korean-aware post-processing
Each cell runs through quick clean-ups:
- 12,345 원 → 12,345원 (no space before the common currency/unit suffixes 원 · 만원 · 억원 · % · 개 · 건 · 명 · 회)
- Fullwidth comma ， → ,
- Standalone noise cells like | / · removed (often grid-line artefacts)
These clean-ups cannot recover text the underlying OCR already lost. If accuracy is poor, retake the image at a higher resolution — or edit cells manually.
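The clean-ups in this section could be sketched roughly as below. The function name cleanCell is hypothetical. The code assumes the noise characters are | and ·, and the check that a suffix is not followed by further Hangul is an extra guard added for this example, not documented behaviour.

```typescript
// Unit suffixes from the list in this section. Longer suffixes come first
// so that "만원" is tried before "원".
const UNIT_SUFFIXES = ["만원", "억원", "원", "%", "개", "건", "명", "회"];

// Hypothetical per-cell clean-up; a sketch, not the tool's actual code.
function cleanCell(raw: string): string {
  let s = raw.trim();

  // Standalone noise cell (assumed to consist only of "|" or "·").
  if (/^[|·]+$/.test(s)) return "";

  // Fullwidth comma (U+FF0C) to ASCII comma.
  s = s.replace(/\uFF0C/g, ",");

  // Drop whitespace between a digit and a unit suffix, unless more Hangul
  // follows (so "3 개월" is left alone; this guard is an assumption).
  const pattern = new RegExp("(\\d)\\s+(" + UNIT_SUFFIXES.join("|") + ")(?![가-힣])", "g");
  return s.replace(pattern, "$1$2");
}
```

Each rule only rewrites characters the OCR already produced, which is why none of them can restore a digit or syllable that was misread in the first place.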
6. Numbers & dates
- Thousand separators stay as-is. To treat as numbers in Excel, use Text-to-Columns → remove comma, or paste as values.
- Dates (e.g. 2025-03-01) come through as strings. Convert with =DATEVALUE(A2), then apply the "Date" cell format to the result.
- When Korean unit suffixes are mixed in with numeric values, splitting number and unit into separate columns generally helps downstream analysis.
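If you post-process the exported cells in code rather than in Excel, splitting a value into number and unit might look like the sketch below. The Amount shape and the splitAmount name are hypothetical, and the pattern only accepts a leading number followed by non-digit text, so date strings are deliberately rejected.

```typescript
// Hypothetical result shape for this sketch.
interface Amount { value: number; unit: string; }

// Splits cells such as "12,345원" into a numeric value and a unit suffix.
// Returns null when the cell is not a number followed by non-digit text.
function splitAmount(cell: string): Amount | null {
  const pattern = /^\s*(-?\d{1,3}(?:,\d{3})*(?:\.\d+)?|-?\d+(?:\.\d+)?)\s*(\D*)$/;
  const m = pattern.exec(cell);
  if (!m) return null;
  // Strip the thousand separators before parsing the number.
  return { value: parseFloat(m[1].replace(/,/g, "")), unit: m[2].trim() };
}
```

With this, "12,345원" yields the value 12345 and the unit 원, while a date such as 2025-03-01 returns null and can be handled separately.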
7. Performance
On a typical laptop, recognition takes 3–8s for a 1000×800 image and 10–20s for images of 1600px or more. The first-run model download adds another 10–20s, depending on bandwidth. Very large images (4000px+) can hit browser memory limits; crop to just the table region for faster and more accurate recognition.
8. Privacy
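The site is served as static files from S3, so no upload endpoint exists. To verify, open DevTools → Network, run OCR, and confirm that upload traffic is zero bytes; the only requests you should see are downloads, such as the one-time model files. This makes the tool safe to use under enterprise "no-upload" policies.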