What Is an Occlusion Manifest?
An occlusion manifest is a per-photo JSON file that records the visibility status of every field in the source document. For each field, it states whether the field is visible or gives the specific reason it failed extraction.
This is what makes penquify datasets useful for benchmarking: you know exactly which fields are extractable and which aren’t, and why.
Format
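This is a minimal sketch of what one manifest might look like. It assumes the manifest maps each field name to either `"visible"` or a list of reason strings drawn from the Occlusion Reasons tables. The field names (`po_number`, `vendor_name`, `total_amount`) and the overall layout are hypothetical, and penquify's actual schema may differ.

```python
import json

# Hypothetical manifest for a single photo. Field names and layout are
# illustrative only; they are not taken from penquify's actual schema.
manifest = {
    "po_number": "visible",
    "vendor_name": ["occluded_by_crop(header)"],
    "total_amount": ["obscured_by_coffee_stain(upper_right)"],
}

# Serialize it the way a per-photo JSON file would be written.
text = json.dumps(manifest, indent=2)
print(text)
```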
How It Works
The `build_occlusion_manifest()` function cross-references the verification result with the `PhotoVariation` config:
- If a field has status `match`, it is marked `"visible"`.
- If a field is `not_visible` or `illegible`, the function checks which variation settings explain it.
- If a field is `mismatch`, it is marked as an image generation error.
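The three branches above can be sketched as a status dispatch. This is not penquify's implementation: `build_manifest_sketch` and `explain_occlusion` are hypothetical names, the verification result is assumed to be a dict of field name to status string, and the `PhotoVariation` config is assumed to be a plain dict keyed by the setting names used in the tables below. Only two of the occlusion rules are included to keep the example short.

```python
VISIBLE = "visible"
IMAGE_GEN_ERROR = "hallucinated_or_garbled_by_image_gen"


def explain_occlusion(variation: dict, status: str) -> list[str]:
    """Hypothetical helper: list the variation settings that explain a missing field.

    Only two rules from the reason tables are shown; the others follow the same pattern.
    Returns an empty list when no setting explains the field (how penquify handles
    that case isn't described here).
    """
    reasons = []
    if variation.get("cropped_header"):
        reasons.append("occluded_by_crop(header)")
    # The finger reason applies only to not_visible fields.
    if variation.get("hand_visible") and status == "not_visible":
        reasons.append("possible_finger_occlusion")
    return reasons


def build_manifest_sketch(verification: dict[str, str], variation: dict) -> dict[str, object]:
    """Map each field's verification status to "visible" or a list of reasons."""
    manifest: dict[str, object] = {}
    for field, status in verification.items():
        if status == "match":
            manifest[field] = VISIBLE
        elif status in ("not_visible", "illegible"):
            manifest[field] = explain_occlusion(variation, status)
        elif status == "mismatch":
            manifest[field] = [IMAGE_GEN_ERROR]
        else:
            # The sketch rejects statuses it doesn't recognize.
            raise ValueError(f"unknown verification status: {status!r}")
    return manifest
```

For example, `build_manifest_sketch({"po_number": "match", "vendor": "not_visible"}, {"cropped_header": True})` marks `po_number` as visible and attributes `vendor` to the header crop.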
Occlusion Reasons
Crop / Framing
| Reason | Triggered When |
|---|---|
| `occluded_by_crop(header)` | `cropped_header=True` |
| `occluded_by_crop(top 10-15%)` | `missing_area="top 10-15%"` |
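The crop rules translate directly into condition checks. This sketch assumes the variation settings are available as a plain dict keyed by the names in the table; `crop_reasons` is a hypothetical helper, and the reason strings are copied verbatim from the table.

```python
def crop_reasons(variation: dict) -> list[str]:
    """Hypothetical helper: crop/framing reasons implied by a variation config."""
    reasons = []
    if variation.get("cropped_header"):
        reasons.append("occluded_by_crop(header)")
    if variation.get("missing_area") == "top 10-15%":
        reasons.append("occluded_by_crop(top 10-15%)")
    return reasons
```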
Stains / Contamination
| Reason | Triggered When |
|---|---|
| `obscured_by_coffee_stain(upper_right)` | `stain.type="coffee"` and `text_obstruction` is `partial` or `severe` |
| `obscured_by_grease_stain(center)` | Same, with a grease stain |
| `obscured_by_water_stain(lower_left)` | Same, with a water stain |
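A stain only counts as a reason when it actually covers text. The sketch below assumes the stain settings are a dict with `type` and `text_obstruction` keys, mirroring `stain.type` in the table. It also assumes a hypothetical `location` key supplies the parenthesized position (`upper_right`, `center`, `lower_left`); the table doesn't say where that value comes from. `stain_reasons` is a hypothetical name.

```python
STAIN_TYPES = ("coffee", "grease", "water")
OBSTRUCTING = ("partial", "severe")


def stain_reasons(variation: dict) -> list[str]:
    """Hypothetical helper: stain/contamination reasons implied by a variation config."""
    stain = variation.get("stain")
    if not stain:
        return []
    if stain.get("type") not in STAIN_TYPES:
        return []
    # Any other obstruction level means the stain doesn't explain a missing field.
    if stain.get("text_obstruction") not in OBSTRUCTING:
        return []
    # "location" is an assumed key, not confirmed by the docs.
    location = stain.get("location", "unknown")
    return [f"obscured_by_{stain['type']}_stain({location})"]
```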
Blur / Degradation
| Reason | Triggered When |
|---|---|
| `blurred_by_motion(horizontal)` | `motion_blur=True` |
| `degraded_by_compression(heavy)` | `jpeg_compression` is `moderate` or `heavy` |
| `washed_out_by_glare(general)` | `glare="strong"` |
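The blur and degradation rules can be sketched the same way, again assuming a plain dict of settings and a hypothetical `degradation_reasons` helper. The reason labels are copied exactly as listed. In particular, both compression levels emit `degraded_by_compression(heavy)` here, although the real function may fill the parenthesized part from the setting's value.

```python
def degradation_reasons(variation: dict) -> list[str]:
    """Hypothetical helper: blur/degradation reasons implied by a variation config."""
    reasons = []
    if variation.get("motion_blur"):
        reasons.append("blurred_by_motion(horizontal)")
    if variation.get("jpeg_compression") in ("moderate", "heavy"):
        # Label reproduced verbatim from the table for both levels.
        reasons.append("degraded_by_compression(heavy)")
    if variation.get("glare") == "strong":
        reasons.append("washed_out_by_glare(general)")
    return reasons
```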
Angle / Distortion
| Reason | Triggered When |
|---|---|
| `distorted_by_extreme_angle` | Angle contains `"45"` or `skew="strong"` |
| `warped_by_paper_curvature` | `curvature="strong"` |
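The angle rule is a substring test rather than a numeric comparison. The sketch below assumes the camera angle is stored as a free-text string under a hypothetical `angle` key. `distortion_reasons` is likewise a hypothetical name.

```python
def distortion_reasons(variation: dict) -> list[str]:
    """Hypothetical helper: angle/distortion reasons implied by a variation config."""
    reasons = []
    # Mirrors 'Angle contains "45"': a substring check on the angle description.
    angle = str(variation.get("angle", ""))
    if "45" in angle or variation.get("skew") == "strong":
        reasons.append("distorted_by_extreme_angle")
    if variation.get("curvature") == "strong":
        reasons.append("warped_by_paper_curvature")
    return reasons
```

Because the check matches substrings, any angle description mentioning "45" triggers the reason, whatever the surrounding wording.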
Multi-page / Hand
| Reason | Triggered When |
|---|---|
| `hidden_behind_stacked_page` | `stapled=True` and `stacked_sheets_behind > 0` |
| `possible_finger_occlusion` | `hand_visible=True` (only for `not_visible` fields) |
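Unlike the other categories, the hand rule also depends on the field's verification status. This sketch passes that status in explicitly and assumes, as before, that the settings are a plain dict. `page_and_hand_reasons` is a hypothetical helper.

```python
def page_and_hand_reasons(variation: dict, status: str) -> list[str]:
    """Hypothetical helper: multi-page and hand reasons for one field."""
    reasons = []
    if variation.get("stapled") and variation.get("stacked_sheets_behind", 0) > 0:
        reasons.append("hidden_behind_stacked_page")
    # Per the table, the finger reason is attached only to not_visible fields.
    if variation.get("hand_visible") and status == "not_visible":
        reasons.append("possible_finger_occlusion")
    return reasons
```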
Image Generation Error
| Reason | Triggered When |
|---|---|
| `hallucinated_or_garbled_by_image_gen` | Status is `mismatch` |
Using Manifests for Benchmarking
The occlusion manifest enables precise OCR benchmarking:
- Recall on visible fields: can your model extract what’s readable?
- Robustness on degraded fields: can your model handle partial visibility?
- False positive rate: does your model hallucinate values for hidden fields?
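Two of these metrics can be computed from a manifest with a few lines of code. The sketch assumes a manifest shaped as field name to `"visible"` or a list of reasons, a ground-truth dict of source values, and a predictions dict where `None` means the model returned nothing. It treats every non-visible field as hidden. A fuller benchmark would separate partially degraded fields by reason to measure robustness, which is not shown. `score_against_manifest` is a hypothetical name.

```python
from typing import Optional


def score_against_manifest(
    manifest: dict[str, object],
    ground_truth: dict[str, str],
    predictions: dict[str, Optional[str]],
) -> dict[str, float]:
    """Hypothetical scorer: visible-field recall and hidden-field false positive rate."""
    visible = [f for f, entry in manifest.items() if entry == "visible"]
    hidden = [f for f, entry in manifest.items() if entry != "visible"]

    # Recall: share of visible fields extracted with exactly the source value.
    correct = sum(
        1 for f in visible
        if predictions.get(f) is not None and predictions.get(f) == ground_truth.get(f)
    )
    # False positive rate: share of hidden fields where the model still returned a value.
    hallucinated = sum(1 for f in hidden if predictions.get(f) is not None)

    # Both rates fall back to 0.0 when there are no fields of that kind.
    return {
        "visible_recall": correct / len(visible) if visible else 0.0,
        "hidden_false_positive_rate": hallucinated / len(hidden) if hidden else 0.0,
    }
```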