The Digital Paleographer: The Prompt & Reasoning

The accuracy and consistency of the transcriptions in this exhibit are governed by a specific set of instructions called a System Prompt. This prompt defines the AI's persona, its rules for interpretation, and its technical output constraints.


1. About the Prompt

The prompt is designed for high-fidelity extraction. It tells Gemini exactly how to handle the visual layout and historical nuances of the text.

**Role**: You are a professional transcriptionist specializing in high-accuracy document digitizing.

**Task**: Transcribe all text from the attached image.

### Guidelines:
* **Reading Order**: Follow a natural top-to-bottom, left-to-right reading flow.
* **Accuracy**: Preserve original spelling, punctuation, capitalization, and line breaks.
* **Handwriting**: Interpret handwritten text accurately. Use `[?]` for uncertainty and `[illegible]` for unreadable text.
* **Layout**: Maintain paragraph structures. Use `[Margin: text]` for floating notes.
* **Non-Text Elements**: Label structural elements like `[Header]` and `[Signature]` using IIIF-approved HTML (`<p>`, `<span>`, `<b>`, etc.).

### Constraints:
* Provide **ONLY** the transcribed text. Do not include introductory remarks.
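Because the prompt fixes a small vocabulary of bracketed markers, a returned transcription can be checked mechanically before it is written to a manifest. The following is a minimal sketch, not part of the exhibit's utility: the function name `summarize_markers` and the sample string are hypothetical, and only the three marker forms come from the prompt itself.

```python
import re

# Marker forms taken from the prompt's guidelines.
UNCERTAIN = re.compile(r"\[\?\]")
ILLEGIBLE = re.compile(r"\[illegible\]")
MARGIN = re.compile(r"\[Margin:\s*(.*?)\]")


def summarize_markers(transcription: str) -> dict:
    """Count uncertainty markers and collect margin notes (hypothetical helper)."""
    return {
        "uncertain": len(UNCERTAIN.findall(transcription)),
        "illegible": len(ILLEGIBLE.findall(transcription)),
        "margin_notes": MARGIN.findall(transcription),
    }


# Invented sample output, for illustration only.
sample = "<p>Dear S[?]r,<br>I write from [illegible] [Margin: rec'd 4 May]</p>"
summary = summarize_markers(sample)
print(summary)
```

A summary like this could help a reviewer prioritize pages with many `[?]` or `[illegible]` markers for a closer human check.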

2. Understanding the Commentary (AI Reasoning)

While the transcription captures what is written, the Commentary Annotation explains how the AI decided to read it. In our exhibit, this is powered by a "Chain of Thought" process where Gemini verbalizes its internal logic.

How the Commentary is Generated

The IIIF Paleography utility makes a second pass (or a combined request) asking the model to reflect on its work. It specifically looks for:

  • Visual Justification: "I identified this letter as an 'S' rather than an 'L' because of the distinct top loop consistent with the writer's previous pages."
  • Contextual Inference: "The word 'University' was partially obscured, but inferred based on the surrounding sentence structure and the document's provenance."
  • Resolution of Ambiguity: Explaining why a specific [?] was used or why a strike-through was ignored.

The IIIF Advantage: "Transcribing" vs. "Commenting"

In the technical manifest, these two outputs are stored with different Motivations according to W3C Web Annotation standards:

| Output | IIIF Motivation | Purpose |
| --- | --- | --- |
| Transcription | transcribing | Intended to "transcribe" the visible text on the canvas as an overlay or searchable text layer. |
| Commentary | commenting | Intended to provide scholarly context or explanation about the resource. |
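Since the two outputs differ only in their motivation, a client can separate them with a simple filter. The sketch below assumes an annotation page shaped like a W3C/IIIF `AnnotationPage` with an `items` list and a single `TextualBody` per annotation; the helper name, the `example.org` identifiers, and the sample text are all hypothetical.

```python
def bodies_by_motivation(annotation_page: dict, motivation: str) -> list:
    """Return the body values of annotations carrying the given motivation."""
    values = []
    for anno in annotation_page.get("items", []):
        found = anno.get("motivation")
        # A motivation may be a single string or a list of strings.
        motivations = found if isinstance(found, list) else [found]
        if motivation in motivations:
            values.append(anno.get("body", {}).get("value", ""))
    return values


# Hypothetical page with one annotation of each kind.
page = {
    "id": "https://example.org/page/1",
    "type": "AnnotationPage",
    "items": [
        {
            "type": "Annotation",
            "motivation": "transcribing",
            "body": {"type": "TextualBody", "value": "<p>Dear Sir,</p>"},
        },
        {
            "type": "Annotation",
            "motivation": "commenting",
            "body": {"type": "TextualBody", "value": "**Line 1**: read as 'Sir'."},
        },
    ],
}

transcriptions = bodies_by_motivation(page, "transcribing")
commentary = bodies_by_motivation(page, "commenting")
```

Keeping the filter keyed on motivation alone means a viewer can show the text layer without the commentary, or the reverse, without inspecting the body content.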

3. Inside the Annotations

Each annotation in the manifest follows the W3C Web Annotation model. The annotations carry two distinct groups of properties: provenance metadata on the annotation itself, and content properties on the body.

Provenance properties (on the annotation)

| Property | Value | Description |
| --- | --- | --- |
| created | 2026-01-06T00:00:00Z | When the annotation record itself was created |
| generated | 2026-01-06T00:00:00Z | When the software wrote the annotation to the manifest |
| generator | iiif-paleography@v0.1.0 | The software tool that produced the annotation |
| creator | gemini-3-pro-preview | The AI model that authored the content |

created and generated share the same timestamp here because the annotation was written to the manifest in the same operation that created it. Together, generator and creator record the full provenance chain: the Python utility that orchestrated the process and the underlying model that did the reasoning.
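As an illustration of how these four properties sit on an annotation, here is a minimal sketch that assembles one as a Python dictionary. The property values are the ones listed above; the function name `make_annotation`, the `example.org` identifiers, and the sample body are hypothetical, and the utility's real code may structure this differently.

```python
import json


def make_annotation(anno_id: str, target: str, motivation: str,
                    body: dict, timestamp: str) -> dict:
    """Assemble an annotation with provenance metadata (hypothetical helper)."""
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": motivation,
        # One timestamp for both: created and written in the same operation.
        "created": timestamp,
        "generated": timestamp,
        "generator": "iiif-paleography@v0.1.0",  # the orchestrating tool
        "creator": "gemini-3-pro-preview",       # the model that authored the text
        "body": body,
        "target": target,
    }


annotation = make_annotation(
    anno_id="https://example.org/anno/1",      # hypothetical identifier
    target="https://example.org/canvas/1",     # hypothetical canvas
    motivation="transcribing",
    body={"type": "TextualBody", "value": "<p>Dear Sir,</p>"},
    timestamp="2026-01-06T00:00:00Z",
)
print(json.dumps(annotation, indent=2))
```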

Body properties (the content)

The annotation body carries four properties that determine how a viewer renders and processes the content:

| Property | Description |
| --- | --- |
| type | Always TextualBody: the content is inline text, not a linked resource |
| value | The actual text content of the annotation |
| format | The media type of value, which determines how it should be parsed |
| language | The declared language of the content |

Transcription bodies (motivation: transcribing)

The transcription bodies use "format": "text/html" and "language": "en". The value is structured HTML — <p> tags for paragraphs, <br> for original line breaks, and <small> for marginal point values — preserving the document's visual layout as markup rather than flattening it to plain text.

Commentary bodies (motivation: commenting)

The commentary bodies use "format": "text/markdown" and "language": "none". The value contains the AI's chain-of-thought in Markdown prose (bold headings, bullet points, and inline emphasis), written in the AI's own voice as it works through the handwriting. The "none" language tag signals that this is generated interpretive text, not a transcription of the document with a declared language.
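The pairing of motivation with format and language described above can be captured in a small lookup. This is a sketch under the assumption that each annotation has exactly one body; `BODY_SETTINGS` and `make_body` are hypothetical names, while the format and language values are the ones stated in this section.

```python
# Format and language per motivation, as described in this section.
BODY_SETTINGS = {
    "transcribing": {"format": "text/html", "language": "en"},
    "commenting": {"format": "text/markdown", "language": "none"},
}


def make_body(motivation: str, value: str) -> dict:
    """Build a TextualBody whose format and language match the motivation."""
    if motivation not in BODY_SETTINGS:
        raise ValueError(f"unsupported motivation: {motivation}")
    return {"type": "TextualBody", "value": value, **BODY_SETTINGS[motivation]}


# Invented sample values, for illustration only.
transcription_body = make_body("transcribing", "<p>Dear Sir,<br>I write to you</p>")
commentary_body = make_body("commenting", "**Line 1**: the loop suggests an 'S'.")
```

A viewer can then dispatch on `format` alone: render `text/html` values as markup and pass `text/markdown` values through a Markdown renderer.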


4. The Results

The manifest is rendered below using the Scroll component from Clover, a IIIF viewing library built for exactly this kind of layered annotation display.

Displaying the commentary alongside the transcription exposes the reasoning and lets you see the "Digital Paleographer" at work. If the AI misreads a word, the commentary often reveals a logical, albeit incorrect, path, just as a human student might misinterpret a letter or punctuation mark. The commentary thus provides a layer of metadata that helps human researchers validate the machine's work.