Ingest
Gather quotations into the corpus from sources.
Overview
The Ingest stage is the entry point of the editorial pipeline. It receives raw quotation submissions and performs initial validation, normalization, and lemma discovery.
This stage ensures that incoming text is properly formatted, linguistically valid, and ready for deeper analysis in subsequent stages.
Responsibilities
- Validate the quotation (does it reflect authentic language use?)
- Normalize text (Unicode NFC, whitespace)
- Discover lemmas (tokenize, lemmatize, rank by significance)
- Score significance across 34 dimensions (once for the entire quotation)
- Create lemma affiliations (metadata linking lemma to quotation)
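The normalization step listed above can be sketched in a few lines. This is only an illustration, not the stage's actual implementation: the function name normalize_text is hypothetical, and the whitespace policy (collapse every run of whitespace to a single space and trim the ends) is an assumption, since the list only says "Unicode NFC, whitespace".

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Hypothetical helper: NFC-normalize, then tidy whitespace.

    Assumption: runs of whitespace collapse to one space and the
    result is trimmed. The real stage may apply a different policy.
    """
    # Compose characters into Unicode Normalization Form C (NFC),
    # e.g. "e" + U+0301 (combining acute) becomes U+00E9.
    composed = unicodedata.normalize("NFC", raw)
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    collapsed = re.sub(r"\s+", " ", composed)
    return collapsed.strip()
```

For example, normalize_text("Cafe\u0301  au\tlait \n") returns "Caf\u00e9 au lait". Composing to NFC before later steps means that visually identical quotations compare equal regardless of how the source encoded accented characters.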
Output
IngestResponse
- normalizedText
- quotationSignificance (34 dimensions)
- lemmas[] with affiliation metadata
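One possible shape for this output is sketched below. Only the three field names and the dimension count come from the list above. The LemmaAffiliation class, the float type for scores, the free-form metadata dictionary, and the length check are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# From the documentation: significance is scored on 34 dimensions.
SIGNIFICANCE_DIMENSIONS = 34


@dataclass
class LemmaAffiliation:
    """Hypothetical entry in lemmas[]: a lemma plus its affiliation metadata."""
    lemma: str
    # Assumption: metadata is a free-form mapping; the real schema is not documented.
    affiliation: Dict[str, object] = field(default_factory=dict)


@dataclass
class IngestResponse:
    """Sketch of the stage output, keeping the documented field names."""
    normalizedText: str
    # Assumption: each dimension is a float score.
    quotationSignificance: List[float]
    lemmas: List[LemmaAffiliation] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject responses that do not carry exactly 34 scores.
        if len(self.quotationSignificance) != SIGNIFICANCE_DIMENSIONS:
            raise ValueError(
                f"expected {SIGNIFICANCE_DIMENSIONS} significance scores, "
                f"got {len(self.quotationSignificance)}"
            )
```

Because significance is scored once for the entire quotation, the sketch stores it at the top level of the response and not on each lemma.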
Coming Soon
Detailed documentation, metrics, and live stage monitoring will be available here.