Overview

If you already have a document as a PDF or image, penquify can detect its schema automatically and generate realistic photo variations with ground-truth verification.

The Upload Pipeline

1. Upload PDF or image
   Provide a file path (CLI/Python) or upload via the API.

2. Convert to image
   If the input is a PDF, penquify converts the first page to PNG using Playwright.

3. Detect schema
   Gemini 2.5 Flash extracts every field (document type, header fields, line items, totals) and returns structured JSON with a confidence score.

4. Flatten for verification
   The detected schema is flattened to a field_name -> value dict that serves as ground truth.

5. Generate verified photos
   Photo variations are generated and verified against the detected schema.
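
The flattening in step 4 can be pictured with a short Python sketch. The function name `flatten_schema` and the key naming (dots for nesting, `[i]` for list positions) are assumptions made for illustration; the keys penquify actually writes to ground_truth.json may look different.

```python
from typing import Any


def flatten_schema(schema: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into one field_name -> value dict.

    Hypothetical helper: the dotted / indexed key format is an assumption,
    not necessarily what penquify uses internally.
    """
    flat: dict[str, Any] = {}
    for key, value in schema.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse with the extended prefix.
            flat.update(flatten_schema(value, name))
        elif isinstance(value, list):
            # List (e.g. line items): index each entry.
            for i, item in enumerate(value):
                item_name = f"{name}[{i}]"
                if isinstance(item, dict):
                    flat.update(flatten_schema(item, item_name))
                else:
                    flat[item_name] = item
        else:
            # Scalar leaf: store it under the full field name.
            flat[name] = value
    return flat
```

Under these assumptions, a header field such as doc_number would end up under the key `header.doc_number`, and the quantity of the first line item under `items[0].qty`.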

Usage

penquify upload --image /path/to/invoice.pdf --output output/ \
  --presets full_picture folded_skewed blurry
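
To drive the same command from a script, one option is to shell out to the CLI. The sketch below uses only the flags shown in the command above; the helper names `build_upload_command` and `run_upload` are hypothetical, and this is not penquify's Python API, which this page does not show.

```python
import subprocess
from typing import Sequence

# Presets this page names as the defaults when none are specified.
DEFAULT_PRESETS = ("full_picture", "folded_skewed", "blurry")


def build_upload_command(source: str, output_dir: str,
                         presets: Sequence[str] = DEFAULT_PRESETS) -> list[str]:
    """Build the argv list for `penquify upload` (hypothetical helper)."""
    return ["penquify", "upload", "--image", source,
            "--output", output_dir, "--presets", *presets]


def run_upload(source: str, output_dir: str,
               presets: Sequence[str] = DEFAULT_PRESETS) -> subprocess.CompletedProcess:
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_upload_command(source, output_dir, presets),
                          check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.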

Detected Schema Format

The vision model returns a structured schema:
{
  "document_type": "dispatch_guide",
  "header": {
    "doc_number": "00098765",
    "date": "20/04/2026",
    "emitter_name": "ACME LOGISTICS S.A.",
    "receiver_name": "HOTEL PACIFIC S.A."
  },
  "items": [
    {
      "pos": 1,
      "code": "AL-3001",
      "description": "SALMON FRESCO FILETE",
      "qty": 25,
      "unit": "KG",
      "unit_price": 14500,
      "total": 362500
    }
  ],
  "totals": {
    "subtotal": 551500,
    "tax": 104785,
    "total": 656285
  },
  "confidence": 0.92
}
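
Before trusting a detected schema as ground truth, a quick arithmetic check can catch misread digits. The sketch below is an illustration only: `check_schema_arithmetic` is a hypothetical helper, and nothing on this page says penquify performs this check. It assumes integer currency amounts, as in the sample. The sample lists a single line item whose total is smaller than the subtotal, so the sketch does not compare the sum of line totals with the subtotal.

```python
from typing import Any


def check_schema_arithmetic(schema: dict[str, Any]) -> list[str]:
    """Return a list of human-readable inconsistencies (empty if none found).

    Hypothetical helper; exact comparison assumes integer amounts.
    """
    problems: list[str] = []

    # Each line item: qty * unit_price should equal the line total.
    for item in schema.get("items", []):
        expected = item["qty"] * item["unit_price"]
        if item["total"] != expected:
            problems.append(
                f"item {item.get('pos')}: total {item['total']} != "
                f"qty * unit_price = {expected}"
            )

    # Totals block: subtotal + tax should equal the grand total.
    totals = schema.get("totals")
    if totals and totals["subtotal"] + totals["tax"] != totals["total"]:
        problems.append(
            f"totals: subtotal + tax = {totals['subtotal'] + totals['tax']} "
            f"!= total {totals['total']}"
        )
    return problems
```

For the sample schema, 25 × 14500 = 362500 and 551500 + 104785 = 656285, so the function returns an empty list.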

Output Files

output/uploaded/
  source.png                    # converted from PDF (if PDF input)
  detected_schema.json          # full detected schema
  ground_truth.json             # flattened field -> value dict
  photo_full_picture.png        # generated photos
  photo_full_picture_verification.json
  photo_full_picture_occlusion.json
  photo_folded_skewed.png
  ...
If no presets are specified, the upload pipeline defaults to full_picture, folded_skewed, and blurry.
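
To post-process a run, the files can be grouped by preset using the naming pattern in the listing above (photo_<preset>.png plus matching _verification.json and _occlusion.json files). This is a minimal sketch: `collect_photos` is a hypothetical helper, and it only locates the files because the internal structure of the verification and occlusion JSON is not described here.

```python
from __future__ import annotations

from pathlib import Path


def collect_photos(upload_dir: Path) -> dict[str, dict[str, Path | None]]:
    """Map each preset name to its photo and sidecar files (hypothetical helper).

    Relies on the naming pattern from the listing; a sidecar that does not
    exist on disk is reported as None.
    """
    results: dict[str, dict[str, Path | None]] = {}
    for photo in sorted(upload_dir.glob("photo_*.png")):
        preset = photo.stem.removeprefix("photo_")  # requires Python 3.9+
        verification = upload_dir / f"{photo.stem}_verification.json"
        occlusion = upload_dir / f"{photo.stem}_occlusion.json"
        results[preset] = {
            "photo": photo,
            "verification": verification if verification.exists() else None,
            "occlusion": occlusion if occlusion.exists() else None,
        }
    return results
```

Calling `collect_photos(Path("output/uploaded"))` after the command from the Usage section would be expected to yield one entry per generated preset; source.png is skipped because it does not match the photo_ prefix.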