An intelligent, two-pass workflow using the Gemini LLM to ensure consistent, high-quality translation of large documents.
Standard LLM calls often translate key terms inconsistently across a large document. Because each call has no memory of earlier translations, the final text reads as disjointed.
This pipeline solves the problem by building a glossary of key terms first and supplying it to the LLM as context during the final translation.
The first pass analyzes the entire document with Gemini to identify and extract key terms. It builds a comprehensive translation glossary.
Output: A comprehensive glossary file, built and refined as the script processes the book.
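As a rough illustration of how the glossary could be accumulated block by block, here is a minimal sketch. The `extract_terms` callable is a hypothetical stand-in for the Gemini call, and the "first translation wins" merge policy is an assumption, not necessarily what the script does.

```python
from typing import Callable, Dict, Iterable

# Hypothetical signature: takes one text block and returns
# {source_term: translated_term}. In the real pipeline this would wrap a Gemini call.
TermExtractor = Callable[[str], Dict[str, str]]


def build_glossary(blocks: Iterable[str], extract_terms: TermExtractor) -> Dict[str, str]:
    """Accumulate a glossary across all blocks of the document."""
    glossary: Dict[str, str] = {}
    for block in blocks:
        for term, translation in extract_terms(block).items():
            key = term.strip()
            # Assumed merge policy: the first translation seen for a term is kept,
            # so later blocks cannot introduce a conflicting rendering.
            if key and key not in glossary:
                glossary[key] = translation.strip()
    return glossary
```

Keeping the model call behind a plain callable makes the merge logic testable without network access.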
The second pass re-translates the whole document. For every block, it sends the entire glossary to Gemini and instructs the model to use the pre-approved translated terms.
Output: A final translated file with consistent translations.
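The second pass can be sketched as a prompt builder plus a loop over blocks. This is an illustrative sketch only: the prompt wording, the function names, and the `call_model` callable (a stand-in for the Gemini request) are all assumptions.

```python
from typing import Callable, Dict, Iterable, List


def build_prompt(block: str, glossary: Dict[str, str], target_language: str) -> str:
    """Combine the full glossary and one block into a single prompt string."""
    glossary_lines = "\n".join(f"- {src} => {dst}" for src, dst in glossary.items())
    return (
        f"Translate the text below into {target_language}.\n"
        "Use exactly these translations for the listed terms:\n"
        f"{glossary_lines}\n\n"
        f"Text:\n{block}"
    )


def translate_blocks(
    blocks: Iterable[str],
    glossary: Dict[str, str],
    call_model: Callable[[str], str],  # hypothetical wrapper around the Gemini call
    target_language: str,
) -> List[str]:
    """Translate every block, sending the entire glossary with each request."""
    return [call_model(build_prompt(block, glossary, target_language)) for block in blocks]
```

Sending the whole glossary with every block costs extra tokens per request, but it means each call sees the same term mappings regardless of which block it handles.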
1. Extract: PDF to JSON
2. Translate (Two-Pass): Generates Glossary & Final Text
3. Load: JSON to Firestore
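The three stages can be chained through intermediate JSON files, roughly as below. This is a sketch under stated assumptions: the stage functions are passed in as hypothetical callables (PDF parsing, the two-pass translation, and the Firestore write are not implemented here), and the file names `extracted.json` and `translated.json` are illustrative.

```python
import json
from pathlib import Path
from typing import Any, Callable


def run_pipeline(
    pdf_path: str,
    work_dir: str,
    extract: Callable[[str], Any],    # hypothetical: PDF path -> JSON-serializable blocks
    translate: Callable[[Any], Any],  # hypothetical: blocks -> translated blocks
    load: Callable[[Any], None],      # hypothetical: writes translated data to Firestore
) -> Path:
    """Run Extract, Translate, and Load, handing data over as JSON files."""
    out_dir = Path(work_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # 1. Extract: PDF to JSON
    extracted_file = out_dir / "extracted.json"
    extracted_file.write_text(
        json.dumps(extract(pdf_path), ensure_ascii=False), encoding="utf-8"
    )

    # 2. Translate (Two-Pass): read the extracted JSON, write the translated JSON
    blocks = json.loads(extracted_file.read_text(encoding="utf-8"))
    translated_file = out_dir / "translated.json"
    translated_file.write_text(
        json.dumps(translate(blocks), ensure_ascii=False), encoding="utf-8"
    )

    # 3. Load: JSON to the datastore
    load(json.loads(translated_file.read_text(encoding="utf-8")))
    return translated_file
```

Writing each stage's output to disk means a failed translation run can restart from the extracted JSON without re-parsing the PDF.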