┌─────────────────────────────┐
│  S H E D P R O T O T Y P E  │
│  ─────────────────────────  │
│  1. Upload a document left  │
│  2. Click ▶ EXTRACT         │
│  3. View results here       │
└─────────────────────────────┘
Upload a delivery note PDF or image, click ▶ EXTRACT, then select the document to view structured results.
SHED is a local web tool for extracting structured data from construction documents (Delivery Notes, Invoices, Certificates, and more) using AI vision models. Upload a scanned PDF or image, run the extraction command, and get all fields populated as structured JSON — ready to review, edit, and compare side-by-side with the original document. Field schemas are configurable per doc type in the FIELDS tab; a 57-field Delivery Note schema is included as the default.
Each document passes through a preprocessing pipeline before any AI is involved, improving accuracy and minimising token cost.
Each uploaded file is saved to uploads/ and appears in the document list immediately. PDFs are rasterised at 200 dpi before processing.

Multi-document split: when a PDF is detected to contain several separate documents, it is split into _part1.pdf, _part2.pdf, etc. Each part is tagged auto-split, gets meta.split_from set to the original stem, and a note recording its page range. Each part then continues through the pipeline independently, and the original PDF is deleted. If detection fails for any reason, the file is treated as a single document (silent fallback).

Chunking: long PDFs are split into page-range segments whose files are kept out of uploads/, so they are invisible to the browser and batch runner. Each chunk is extracted independently, then the results are merged: top-level fields come from the highest-confidence chunk, materials are concatenated in page order, and costs are summed. The merged document is tagged auto-chunked with a note listing the segment page ranges.
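The merge rules above (highest-confidence chunk wins for top-level fields, materials concatenated in page order, costs summed) could be sketched roughly like this; the chunk/field layout and all names here are illustrative assumptions, not SHED's actual code:

```python
def merge_chunks(chunk_results):
    """Merge per-chunk extraction results into one document.
    Assumed layout per chunk: {"page_start": int, "fields": {...},
    "materials": [...], "total_cost": float}. Sketch only."""

    def avg_conf(chunk):
        # Mean confidence over the chunk's filled fields.
        confs = [f["conf"] for f in chunk["fields"].values()
                 if f and f.get("value") is not None]
        return sum(confs) / len(confs) if confs else 0.0

    # Top-level fields: take them wholesale from the best chunk.
    best = max(chunk_results, key=avg_conf)
    merged = {"fields": dict(best["fields"])}

    # Materials: concatenate in page order.
    merged["materials"] = [
        row
        for chunk in sorted(chunk_results, key=lambda c: c["page_start"])
        for row in chunk.get("materials", [])
    ]

    # Costs: sum across chunks, treating missing costs as zero.
    merged["total_cost"] = sum(c.get("total_cost") or 0.0 for c in chunk_results)
    return merged
```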
Gemini provider exception: When the active provider is Gemini, PDFs are passed natively as application/pdf bytes directly to the Gemini API — rasterisation, image enhancement, and OCR are all skipped. Gemini handles text reading internally with no page limit in this mode.
Skipped for Gemini when processing PDFs natively.
Image enhancement: the document is deskewed (rotation corrected), converted to grayscale, then noise-checked. A median filter is applied only if the image is detected as noisy (RMS pixel deviation from the local neighbourhood > 4.0); this avoids blurring fine text on clean scans while still removing salt-and-pepper artefacts from poor-quality ones. Adaptive contrast is then applied (histogram stretch to full range), followed by sharpen ×2. These steps improve both OCR and AI vision accuracy across the full quality range.

Output contract: every field is returned as {"value": "...", "conf": 0.97}, with confidence baked in rather than in a separate block. Not-found fields are null. No explanation, no markdown: JSON only.

Number normalisation: numbers written with a decimal comma and dot thousands separator are converted after extraction: "40,000" → 40.0, "1.200" → 1200.0. Because this runs after extraction, it never depends on the model getting the format right. The script also computes fill_rate (the fraction of fields with a value) and avg_confidence (the mean conf across all filled fields) and injects them into meta. Finally, a webhook notification is POSTed to the configured URL on extraction complete (configured in ⚙ Settings).

Model configuration: the extraction model and prescreen model are independently configurable in ⚙ Settings. Each slot can use a different provider, which is useful for mixing a free-tier Gemini Flash prescreen with a Claude CLI extraction, for example.
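The post-extraction normalisation and metric injection described above could look roughly like this; the function names and field layout are assumptions for illustration, not the actual script:

```python
def normalise_number(raw):
    """Convert a comma-decimal / dot-thousands string to a float.
    "40,000" -> 40.0 and "1.200" -> 1200.0. Illustrative sketch."""
    if raw is None:
        return None
    s = raw.strip().replace(".", "")   # drop thousands separators
    s = s.replace(",", ".")            # comma is the decimal separator
    return float(s)

def inject_metrics(doc, schema_fields):
    """Compute fill_rate and avg_confidence in Python and store them
    in the document's meta block (assumed document shape)."""
    filled = [doc["fields"].get(name) for name in schema_fields]
    filled = [f for f in filled if f and f.get("value") is not None]
    doc.setdefault("meta", {})
    doc["meta"]["fill_rate"] = len(filled) / len(schema_fields)
    confs = [f["conf"] for f in filled]
    doc["meta"]["avg_confidence"] = sum(confs) / len(confs) if confs else 0.0
    return doc
```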
Claude CLI provider: runs claude -p <prompt> --allowedTools Read as a subprocess. Requires a Claude Code CLI login; no API key needed. Images are passed via the Read tool.

Output contract: every field is {"value": "...", "conf": 0.97}; not-found fields are null. meta includes fill_rate and avg_confidence. No markdown, no code fences.

Error handling: on failure, extract() raises ExtractionError or ClassificationError instead of calling sys.exit(), making it safely callable from Python code (API routes, tests, batch scripts).

Token cost: image tokens dominate the cost per extraction. Four targeted optimisations bring the total down by roughly 70–75% compared to a naive implementation, with no loss in extraction quality.
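A minimal sketch of how such a CLI provider could be wired up, with the raised-not-exited error style described above; ExtractionError, the response-cleaning rules, and the helper names are illustrative assumptions:

```python
import json
import subprocess

class ExtractionError(Exception):
    """Raised instead of calling sys.exit(), so API routes, tests,
    and batch scripts can catch failures."""

def parse_model_json(text):
    """Parse the model's reply, tolerating a stray ```json fence."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("```")[1]
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise ExtractionError(f"model did not return valid JSON: {exc}")

def extract_via_claude_cli(prompt):
    """Run the Claude Code CLI as a subprocess (requires a CLI login;
    images would be read by the model via the Read tool)."""
    proc = subprocess.run(
        ["claude", "-p", prompt, "--allowedTools", "Read"],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        raise ExtractionError(proc.stderr.strip())
    return parse_model_json(proc.stdout)
```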
Inline confidence: a naive prompt would duplicate all 57 field names a second time in a separate _confidence schema block. Instead, one instruction sentence tells the model to return {"value": "...", "conf": 0.0–1.0} per field. Same output, half the schema tokens.

Review workflow: after extraction, every field can be reviewed and annotated directly in the browser without touching the JSON file. Documents move through workflow phases as they are reviewed and approved.
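The token saving above can be illustrated with a hypothetical prompt builder that lists each field name once; the exact wording and schema layout are assumptions, not SHED's real prompt:

```python
import json

def build_prompt(field_names):
    """Build an extraction prompt in which every field name appears
    exactly once; confidence is requested with a single instruction
    sentence instead of a duplicated _confidence schema block."""
    schema = {name: None for name in field_names}
    return (
        "Extract the following fields from the document and reply with "
        'JSON only. For every field return {"value": ..., "conf": 0.0-1.0}; '
        "use null when a field is not found.\n"
        + json.dumps(schema, indent=2)
    )
```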
Metrics: fill_rate is the fraction of all schema fields that have a non-null value; avg_confidence is the mean confidence across all filled fields. Both are computed in Python after extraction and stored in meta.

Flagging: flagging a field stores "error": "wrong_value" inside the field object. Flagged fields show a coloured badge in view mode, and material rows have a row-level flag.

Phases: each document's phase is stored in meta.status. Phase tabs in the left panel filter the document list, and the workflow bar at the bottom of each FIELDS view advances or resets the phase with one click.

The ANALYTICS tab gives a live overview of extraction quality across all loaded documents. All metrics are computed client-side from the JSON already in memory, with no extra server calls. Four sub-tabs:
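The flag and phase bookkeeping above amounts to small mutations of the document JSON. A sketch of the assumed shapes follows; the phase names in PHASES and both helper names are hypothetical, chosen only to make the example runnable:

```python
PHASES = ["extracted", "reviewed", "approved"]  # hypothetical phase names

def flag_field(doc, name, error="wrong_value"):
    """Record a review flag inside the field object itself."""
    doc["fields"][name]["error"] = error
    return doc

def advance_phase(doc):
    """Move meta.status one step forward, clamping at the last phase."""
    current = doc["meta"].get("status", PHASES[0])
    idx = min(PHASES.index(current) + 1, len(PHASES) - 1)
    doc["meta"]["status"] = PHASES[idx]
    return doc
```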
CSV exports (docs and materials) have moved to the TOOLS tab.
OCR runs with Danish (dan) and English (eng) language packs. The recognised text provides character-level hints to the AI model.