You can improve the accuracy and relevance of automated processing by uploading a small JSON file with additional metadata about your documents. The fields below are optional โ include only what you know.
Tip: If youโre unsure about a field, you can leave it out. The system will use defaults automatically.
writing_styleDescribes the visual form of the text in the document.
Options:
"handwritten""printed""typed"This helps the system choose between transcription strategies (e.g., handwriting recognition vs. printed text).
languageThe main language(s) used in the document.
Options:
"english""spanish""portuguese""french""german""mixed"time_periodGives a rough estimate of when the document was created. This can influence how the system interprets spelling, abbreviations, and writing style.
Options:
"contemporary""mid_20th_century""early_20th_century""19th_century_or_earlier"transcription_preferencesSpecify how youโd like the system to handle certain aspects of transcription. Each setting is optional โ include only the ones you want to customize.
expand_abbreviations: true to expand abbreviations like โDr.โ to โDoctorโpreserve_line_breaks: true to keep the original line formattingretain_punctuation_and_spelling: true to preserve original punctuation and spellingnormalize_to_modern_language: true to modernize spelling and phrasingignore_marginalia: true to skip notes in the marginsDefaults:
{
  "expand_abbreviations": false,
  "preserve_line_breaks": true,
  "retain_punctuation_and_spelling": true,
  "normalize_to_modern_language": false,
  "ignore_marginalia": false
}
layout_structureDescribes how the text is arranged on the page. This helps guide interpretation of complex layouts.
Options:
"free_form""paragraphs""lists""tables""forms""mixed"non_textual_elementsList any non-text content that appears in the document. This information can help avoid misinterpreting these as text.
Options (use an array of values):
"illustrations""stamps_or_seals""handwritten_notes""diagrams""charts_or_graphs"Example:
"non_textual_elements": ["illustrations", "handwritten_notes"]
color_formatIndicates the color style of the scanned image. This helps the system calibrate processing.
Options:
"color""grayscale""black_and_white""mixed"orientationSpecifies how the page is oriented. This helps avoid mistakes when the image isnโt upright.
Options:
"portrait""landscape""square"image_quality_notesList any known image quality issues so the system can adapt or flag the content.
Options (use an array of values):
"blurry""low_contrast""partial_page""includes_fingers""includes_color_checker""damaged_or_stained"Example:
"image_quality_notes": ["blurry", "includes_fingers"]
descriptionA free-text description of the document set. This can provide general context to improve results for both transcription and metadata generation.
Example:
"description": "These are scanned forms from a late 20th-century survey project on urban housing."