The Digital Paleographer: The Prompt & Reasoning
The accuracy and consistency of the transcriptions in this exhibit are governed by a specific set of instructions called a System Prompt. This prompt defines the AI's persona, its rules for interpretation, and its technical output constraints.
1. About the Prompt
The prompt is designed for high-fidelity extraction. It tells Gemini exactly how to handle the visual layout and historical nuances of the text.
**Role**: You are a professional transcriptionist specializing in high-accuracy document digitizing.
**Task**: Transcribe all text from the attached image.
### Guidelines:
* **Reading Order**: Follow a natural top-to-bottom, left-to-right reading flow.
* **Accuracy**: Preserve original spelling, punctuation, capitalization, and line breaks.
* **Handwriting**: Interpret handwritten text accurately. Use `[?]` for uncertainty and `[illegible]` for unreadable text.
* **Layout**: Maintain paragraph structures. Use `[Margin: text]` for floating notes.
* **Non-Text Elements**: Label structural elements like `[Header]` and `[Signature]` using IIIF-approved HTML (`<p>`, `<span>`, `<b>`, etc.).
### Constraints:
* Provide **ONLY** the transcribed text. Do not include introductory remarks.
2. Understanding the Commentary (AI Reasoning)
While the transcription captures what is written, the Commentary Annotation explains how the AI decided to read it. In our exhibit, this is powered by a "Chain of Thought" process where Gemini verbalizes its internal logic.
How the Commentary is Generated
The IIIF Paleography utility makes a second pass (or a combined request) asking the model to reflect on its work. The request specifically asks for:
- Visual Justification: "I identified this letter as an 'S' rather than an 'L' because of the distinct top loop consistent with the writer's previous pages."
- Contextual Inference: "The word 'University' was partially obscured, but inferred based on the surrounding sentence structure and the document's provenance."
- Resolution of Ambiguity: Explaining why a specific [?] was used or why a strike-through was ignored.
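Because the prompt fixes the marker conventions (`[?]`, `[illegible]`, `[Margin: text]`), the ambiguous spots that the commentary is expected to explain can be located mechanically. The following is a minimal sketch, not code from the IIIF Paleography utility; the function name `find_markers` and the sample sentence are illustrative assumptions.

```python
import re

# Marker conventions taken from the system prompt: [?], [illegible], [Margin: text].
MARKER_RE = re.compile(r"\[(\?|illegible|Margin:\s*[^\]]*)\]")


def find_markers(transcription: str) -> list[dict]:
    """Return each editorial marker with its kind and character offset."""
    found = []
    for match in MARKER_RE.finditer(transcription):
        inner = match.group(1)
        if inner == "?":
            kind = "uncertain"
        elif inner == "illegible":
            kind = "illegible"
        else:
            kind = "margin"
        found.append({"kind": kind, "start": match.start(), "text": match.group(0)})
    return found


# Hypothetical sample text, invented for illustration only.
sample = "Dear S[?]r, the [illegible] arrived. [Margin: see p. 4]"
for marker in find_markers(sample):
    print(marker["kind"], marker["start"], marker["text"])
```

A list like this could be used to check that every `[?]` in a transcription has a matching explanation in the commentary.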
The IIIF Advantage: "Transcribing" vs. "Commenting"
In the technical manifest, these two outputs are stored with different Motivations according to W3C Web Annotation standards:
| Output | IIIF Motivation | Purpose |
|---|---|---|
| Transcription | transcribing | Reproduces the visible text on the canvas, for use as an overlay or searchable text layer. |
| Commentary | commenting | Intended to provide scholarly context or explanation about the resource. |
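Since the two outputs differ only in their motivation, a consumer can separate them with a simple filter. The sketch below assumes the annotations sit in the `items` list of an annotation page; the IDs and body text are hypothetical placeholders, and `group_by_motivation` is an illustrative name rather than part of any library.

```python
from collections import defaultdict


def group_by_motivation(annotation_page: dict) -> dict[str, list[dict]]:
    """Bucket the annotations in a page by their motivation value."""
    grouped = defaultdict(list)
    for anno in annotation_page.get("items", []):
        motivation = anno.get("motivation")
        # The Web Annotation model allows a single string or a list of strings.
        motivations = motivation if isinstance(motivation, list) else [motivation]
        for name in motivations:
            grouped[name].append(anno)
    return dict(grouped)


# Hypothetical annotation page, trimmed to the fields used here.
page = {
    "type": "AnnotationPage",
    "items": [
        {"id": "https://example.org/anno/1", "motivation": "transcribing",
         "body": {"type": "TextualBody", "value": "<p>Dear Sir,</p>"}},
        {"id": "https://example.org/anno/2", "motivation": "commenting",
         "body": {"type": "TextualBody", "value": "**Line 1**: read as 'Sir'."}},
    ],
}

groups = group_by_motivation(page)
print(sorted(groups))
print(len(groups["transcribing"]), len(groups["commenting"]))
```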
3. Inside the Annotations
Each annotation in the manifest follows the W3C Web Annotation model. The annotations carry two distinct groups of
properties: provenance metadata on the annotation itself, and content properties on the body.
Provenance properties (on the annotation)
| Property | Value | Description |
|---|---|---|
| created | 2026-01-06T00:00:00Z | When the annotation record itself was created |
| generated | 2026-01-06T00:00:00Z | When the software wrote the annotation to the manifest |
| generator | iiif-paleography@v0.1.0 | The software tool that produced the annotation |
| creator | gemini-3-pro-preview | The AI model that authored the content |
created and generated share the same timestamp here because the annotation was written to the manifest in the same
operation that created it. Together, generator and creator record the full provenance chain: the Python utility
that orchestrated the process and the underlying model that did the reasoning.
Body properties (the content)
The annotation body carries four properties that determine how a viewer renders and processes the content:
| Property | Description |
|---|---|
| type | Always TextualBody: the content is inline text, not a linked resource |
| value | The actual text content of the annotation |
| format | The media type of value; determines how it should be parsed |
| language | The declared language of the content |
Transcription bodies (motivation: transcribing)
The transcription bodies use "format": "text/html" and "language": "en". The value is structured HTML — <p>
tags for paragraphs, <br> for original line breaks, and <small> for marginal point values — preserving the
document's visual layout as markup rather than flattening it to plain text.
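Keeping layout as markup means a plain-text view, for example for search indexing, has to be derived from the HTML. The sketch below shows one way to do that with Python's standard `html.parser`; the class and function names are illustrative, the sample value is invented, and nothing here is claimed to be how the exhibit or its viewer actually processes the bodies.

```python
from html.parser import HTMLParser


class TranscriptionText(HTMLParser):
    """Collect text, turning <br> and closing </p> tags into newlines."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")  # original line break

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n")  # paragraph boundary

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(value: str) -> str:
    parser = TranscriptionText()
    parser.feed(value)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical transcription body value.
sample_html = "<p>Dear Sir,<br>I write in haste.</p><p>Yours, <small>5</small></p>"
print(html_to_text(sample_html))
```

Text inside `<small>` is kept inline in this sketch; a real indexer might prefer to drop or tag such marginal values.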
Commentary bodies (motivation: commenting)
The commentary bodies use "format": "text/markdown" and "language": "none". The value contains the AI's
chain-of-thought in Markdown prose — bold headings, bullet points, and inline emphasis — written in the AI's own voice
as it works through the handwriting. The "none" language tag reflects that this is generated interpretive text rather
than a language-declared human-authored transcription.
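The two body shapes described above differ only in `format` and `language`, so they can be produced from one small lookup. This is a sketch under the conventions stated in this section; `make_body` is a hypothetical helper and the sample values are invented.

```python
# format/language pairs as described for each motivation in this section.
BODY_SETTINGS = {
    "transcribing": {"format": "text/html", "language": "en"},
    "commenting": {"format": "text/markdown", "language": "none"},
}


def make_body(motivation: str, value: str) -> dict:
    """Return a TextualBody whose format and language match the motivation."""
    if motivation not in BODY_SETTINGS:
        raise ValueError(f"unsupported motivation: {motivation!r}")
    return {"type": "TextualBody", "value": value, **BODY_SETTINGS[motivation]}


transcription_body = make_body("transcribing", "<p>Dear Sir,<br>I write in haste.</p>")
commentary_body = make_body("commenting", "**Line 1**: the loop suggests an 'S'.")
print(transcription_body["format"], transcription_body["language"])
print(commentary_body["format"], commentary_body["language"])
```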
4. The Results
The manifest is rendered below using the Scroll component from Clover,
a IIIF viewing library built for exactly this kind of layered annotation display.
The viewer exposes the reasoning and lets you see the "Digital Paleographer" at work. If the AI misreads a word, the commentary often reveals a logical, albeit incorrect, path, just as a human student might misinterpret a letter or punctuation mark. This provides a layer of metadata that helps human researchers validate the machine's work.