Verification - Penquify

The Verification Pipeline

Penquify’s verification system ensures that generated photos contain the correct document data. The key design principle: the extraction model never sees ground truth values.

Generate photo

Gemini generates a photorealistic image from the clean document and the variation config.

Blind extraction

A separate Gemini 2.5 Flash call receives the generated photo and a list of field names. It extracts values with confidence scores. It does NOT know the expected values.

Programmatic comparison

Extracted values are compared against the source schema in Python code. No model is involved. Both sides are normalized first: whitespace is stripped, text is lowercased, and $, commas, and periods are removed, so "$1,250.00" and "1250.00" compare equal.

Classify results

Each field gets a status:

match — extracted value matches ground truth
mismatch — extracted value differs (image gen error)
illegible — model can’t read it (confidence below 0.5)
not_visible — field not in frame (cropped, occluded)

Retry on mismatch

Mismatches indicate the image generator rendered text incorrectly. Penquify retries up to max_retries times, emphasizing the mismatched fields in the regeneration prompt.

Only mismatch triggers retries. illegible and not_visible are expected outcomes of intentional variation effects (blur, crop, stains) and are documented in the occlusion manifest.
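The retry rule above can be sketched as a small filter plus a prompt addendum. This is an illustration only: the helper names `mismatched_fields` and `emphasis_note`, the `total` field, and the wording of the emphasis text are hypothetical and are not taken from Penquify's code.

```python
from typing import Dict, List


def mismatched_fields(statuses: Dict[str, str]) -> List[str]:
    """Return the fields that should trigger a retry (status 'mismatch' only)."""
    return sorted(name for name, status in statuses.items() if status == "mismatch")


def emphasis_note(fields: List[str], schema: Dict[str, str]) -> str:
    """Build a hypothetical prompt addendum listing the exact text to render."""
    lines = [f'- {name}: "{schema[name]}"' for name in fields]
    return "Render these fields exactly as written:\n" + "\n".join(lines)


# Illustrative statuses: only doc_number was rendered incorrectly.
statuses = {"doc_number": "mismatch", "emitter_name": "not_visible", "total": "illegible"}
schema = {"doc_number": "01182034", "emitter_name": "ACME FOODS S.A.", "total": "$1,250.00"}

retry = mismatched_fields(statuses)
note = emphasis_note(retry, schema)
print(note)
```

Fields marked illegible or not_visible are left out of the note on purpose, since they reflect the requested variation rather than a rendering error.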

Extraction Prompt

The extraction model receives:

The generated photo
A JSON list of field names to look for

For each field, it returns:

value — what it read (or null)
confidence — 0.0 to 1.0
reason — null, "blurry", "cropped", "occluded", or "not_in_frame"

{
  "extractions": {
    "doc_number": {
      "value": "01182034",
      "confidence": 0.95,
      "reason": null
    },
    "emitter_name": {
      "value": null,
      "confidence": 0.0,
      "reason": "cropped"
    }
  }
}
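A response in the shape shown above could be parsed into typed records before comparison. The sketch below assumes the model's reply is a JSON string with exactly this layout; the `Extraction` dataclass and `parse_extractions` helper are hypothetical names introduced for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Extraction:
    value: Optional[str]      # what the model read, or None
    confidence: float         # 0.0 to 1.0
    reason: Optional[str]     # None, "blurry", "cropped", "occluded", "not_in_frame"


def parse_extractions(raw: str) -> Dict[str, Extraction]:
    """Turn the raw JSON reply into one Extraction per field name."""
    payload = json.loads(raw)
    return {
        name: Extraction(
            value=item.get("value"),
            confidence=float(item.get("confidence", 0.0)),
            reason=item.get("reason"),
        )
        for name, item in payload["extractions"].items()
    }


raw = """
{
  "extractions": {
    "doc_number": {"value": "01182034", "confidence": 0.95, "reason": null},
    "emitter_name": {"value": null, "confidence": 0.0, "reason": "cropped"}
  }
}
"""
parsed = parse_extractions(raw)
print(parsed["doc_number"])
```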

Comparison Logic

Comparison is pure Python — no model involved:

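def _normalize(val: object) -> str:
    # Coerce to text so non-string values are handled too.
    s = str(val).strip().lower()
    # Drop currency and separator characters; strip again in case
    # removing "$" exposed leading whitespace (e.g. "$ 100").
    s = s.replace("$", "").replace(",", "").replace(".", "").strip()
    return s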

| Condition (rows are checked top to bottom) | Status |
| --- | --- |
| (`value is None` or `confidence == 0`) and reason is `cropped` or `not_in_frame` | `not_visible` |
| (`value is None` or `confidence == 0`) and any other reason | `illegible` |
| `confidence < 0.5` | `illegible` |
| `_normalize(extracted) == _normalize(expected)` | `match` |
| Otherwise | `mismatch` |
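The decision table can be read as a chain of checks applied in order. The following is a minimal sketch under that assumption; the function name `classify` and the constant `NOT_VISIBLE_REASONS` are hypothetical, and the rules are taken literally from the table, so a reason of "occluded" falls under illegible here.

```python
from typing import Optional

# Reasons that mean the field is outside the frame (per the table's first row).
NOT_VISIBLE_REASONS = {"cropped", "not_in_frame"}


def _normalize(val: object) -> str:
    s = str(val).strip().lower()
    return s.replace("$", "").replace(",", "").replace(".", "").strip()


def classify(value: Optional[str], confidence: float,
             reason: Optional[str], expected: str) -> str:
    """Apply the table's rows top to bottom and return the first matching status."""
    if value is None or confidence == 0:
        return "not_visible" if reason in NOT_VISIBLE_REASONS else "illegible"
    if confidence < 0.5:
        return "illegible"
    if _normalize(value) == _normalize(expected):
        return "match"
    return "mismatch"


print(classify("01182034", 0.95, None, "01182034"))
print(classify(None, 0.0, "cropped", "ACME FOODS S.A."))
print(classify("01182084", 0.9, None, "01182034"))
```

Because normalization removes periods and commas, a confident reading of "acme foods sa" still matches "ACME FOODS S.A.", while a single wrong digit in a document number does not.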

Verification Result

{
  "fields": {
    "doc_number": {
      "source_value": "01182034",
      "extracted_value": "01182034",
      "status": "match",
      "confidence": 0.95,
      "extraction_reason": null
    }
  },
  "summary": {
    "total_fields": 25,
    "matched": 20,
    "mismatched": 0,
    "illegible": 3,
    "not_visible": 2
  }
}
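The summary block is a tally of the per-field statuses, so the four counts always add up to total_fields (20 + 0 + 3 + 2 = 25 in the example above). One way to derive it, assuming each field entry carries a "status" key as shown; the `summarize` helper and the `total` field are hypothetical:

```python
from collections import Counter
from typing import Dict


def summarize(fields: Dict[str, dict]) -> Dict[str, int]:
    """Count statuses across all fields and return the summary dictionary."""
    counts = Counter(entry["status"] for entry in fields.values())
    return {
        "total_fields": len(fields),
        "matched": counts["match"],
        "mismatched": counts["mismatch"],
        "illegible": counts["illegible"],
        "not_visible": counts["not_visible"],
    }


fields = {
    "doc_number": {"status": "match"},
    "emitter_name": {"status": "not_visible"},
    "total": {"status": "illegible"},
}
summary = summarize(fields)
print(summary)
```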

Verified Generation

The generate_verified_photo() function combines generation, verification, and retry:

Generate photo
Verify against schema
If mismatches exist and retries remain, regenerate with emphasis on wrong fields
Return result with verified: true/false, attempt count, verification details, and occlusion manifest

result = await generate_verified_photo(
    reference_image_path="output/doc.png",
    variation=PhotoVariation(name="full_picture"),
    output_path="output/photo_full.png",
    schema={"doc_number": "01182034", "emitter_name": "ACME FOODS S.A."},
    max_retries=3,
)
# result["verified"] == True
# result["attempts"] == 1
# result["occlusion_manifest"]["doc_number"] == "visible"
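The control flow behind a call like this can be sketched with the generator and verifier passed in as callables. This is not Penquify's implementation: `verified_loop`, `fake_generate`, and `fake_verify` are hypothetical, and the sketch assumes max_retries counts regenerations after one initial attempt.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def verified_loop(
    generate: Callable[[List[str]], Awaitable[Any]],
    verify: Callable[[Any], Awaitable[Dict[str, Any]]],
    max_retries: int = 3,
) -> Dict[str, Any]:
    """Generate, verify, and regenerate while mismatches remain."""
    emphasize: List[str] = []          # fields to stress in the next prompt
    attempts = 0
    verification: Dict[str, Any] = {}
    # Assumption: one initial attempt plus up to max_retries retries.
    for attempts in range(1, max_retries + 2):
        photo = await generate(emphasize)
        verification = await verify(photo)
        emphasize = [
            name for name, field in verification["fields"].items()
            if field["status"] == "mismatch"   # only mismatches trigger a retry
        ]
        if not emphasize:
            break
    return {"verified": not emphasize, "attempts": attempts, "verification": verification}


async def _demo() -> Dict[str, Any]:
    calls = {"n": 0}

    async def fake_generate(emphasize: List[str]) -> str:
        # Stand-in generator: wrong on the first attempt, right afterwards.
        calls["n"] += 1
        return "wrong" if calls["n"] == 1 else "right"

    async def fake_verify(photo: str) -> Dict[str, Any]:
        status = "match" if photo == "right" else "mismatch"
        return {"fields": {"doc_number": {"status": status}}}

    return await verified_loop(fake_generate, fake_verify, max_retries=3)


result = asyncio.run(_demo())
print(result["verified"], result["attempts"])
```

Injecting the two callables keeps the loop testable without calling an image model.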
