hOCR

hOCR is an open standard for representing formatted text obtained from optical character recognition (OCR). The format encodes text, style, layout information, recognition confidence metrics and other information by embedding them in standard Hypertext Markup Language (HTML) or XHTML, the variant of HTML based on Extensible Markup Language (XML).[1]
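
As an illustration of how this information is carried in ordinary HTML, the following minimal sketch reads recognized words, their bounding boxes and their confidence values from a small hOCR fragment using only the Python standard library. The sample markup is invented for this example, and the helper names (parse_title, WordExtractor) are hypothetical. The sketch assumes the class names ocr_page, ocr_line and ocrx_word and the title properties bbox and x_wconf, as defined in the hOCR specification; real OCR output may use additional elements and properties that are ignored here.

```python
from html.parser import HTMLParser

# Invented sample: one page, one line, two words.
SAMPLE_HOCR = """<html>
 <body>
  <div class='ocr_page' title='bbox 0 0 600 800'>
   <span class='ocr_line' title='bbox 10 20 300 50'>
    <span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 96'>Hello</span>
    <span class='ocrx_word' title='bbox 100 20 200 50; x_wconf 88'>world</span>
   </span>
  </div>
 </body>
</html>"""


def parse_title(title):
    """Split an hOCR title attribute into a {property: [values]} dict."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class WordExtractor(HTMLParser):
    """Collect text, bounding box and confidence of each ocrx_word element.

    Simplification: a word is assumed to contain no nested <span> elements.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None outside a word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            props = parse_title(attrs.get("title") or "")
            conf = props.get("x_wconf")
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in props.get("bbox", [])),
                "conf": float(conf[0]) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


extractor = WordExtractor()
extractor.feed(SAMPLE_HOCR)
for word in extractor.words:
    print(word["text"], word["bbox"], word["conf"])
```

Because the layout and confidence data sit in class and title attributes, the same file still displays as a normal web page in a browser while remaining readable by a parser such as the one sketched above.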

  1. Breuel, T. (2007-09-01). "The hOCR Microformat for OCR Workflow and Results" (PDF). Ninth International Conference on Document Analysis and Recognition (ICDAR 2007). Vol. 2. pp. 1063–1067. doi:10.1109/ICDAR.2007.4377078. ISBN 978-0-7695-2822-9. S2CID 7565957.