Many tables with invisible borders, a complex structure, many graphics, a broken character encoding, documents that were originally laid out in specialized programs (for instance InDesign or CorelDRAW), etc.
Steps we take:
- OCR is run with the appropriate settings and recognition language.
- Odd section breaks are removed.
- Segmentation is correct and no sentence is broken, which improves translation quality, including machine translation.
- Odd tags are removed.
- Tables are formatted as tables, and not as texts with tabs.
- Text in pictures is placed only in editable text boxes.
- Equations and formulas are inserted with an equation editor or as tables.
- Handwritten text, seals, stamps, etc. are typed in (if possible).
- Multi-column documents are in columns or tables according to the structure of the document.
- OCR Style Guide is followed (if it was provided).
- Fonts, sizes, and styles are consistent.
- All text is visible in tables and text boxes.
- Instructions are checked before delivery.
- Headers and footers are correct and consistent with the source.
- Page numbering is automatic.
- If text is illegible, it is marked as [illegible].
- All TOCs are working as expected. They are ALWAYS updated automatically and match the source.
- An automatic spellcheck is run.
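Two of the checks above, unbroken sentences and tables that are not faked with tabs, lend themselves to a quick automated pre-check. The following is only a minimal sketch, assuming the document text has already been extracted into a list of paragraph strings. The function names and the heuristics are hypothetical illustrations, not a description of any specific tool used in the process.

```python
import re

# Characters that normally close a sentence or a complete paragraph.
TERMINAL = (".", "!", "?", ":", ";", '"', ")", "]")


def find_broken_sentences(paragraphs):
    """Return indices i where paragraph i looks like it continues in paragraph i + 1.

    Heuristic (an assumption): the paragraph does not end with terminal
    punctuation and the next paragraph starts with a lowercase letter.
    """
    broken = []
    for i in range(len(paragraphs) - 1):
        current = paragraphs[i].rstrip()
        following = paragraphs[i + 1].lstrip()
        if not current or not following:
            continue  # skip empty paragraphs
        if not current.endswith(TERMINAL) and following[0].islower():
            broken.append(i)
    return broken


def find_tabbed_lines(paragraphs):
    """Return indices of paragraphs with tabs between text, a sign of a fake table."""
    return [i for i, p in enumerate(paragraphs) if re.search(r"\S\t+\S", p)]


sample = [
    "The device must be",
    "switched off before cleaning.",
    "Name\tValue",
    "Done.",
]
print(find_broken_sentences(sample))  # [0]
print(find_tabbed_lines(sample))      # [2]
```

Flagged paragraphs still need a human look: headings and list items often lack terminal punctuation on purpose, so a check like this can only point to candidates.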