quack-check
Overview
quack-check is a deterministic PDF transcript orchestrator built around Docling. It classifies PDF quality, chooses an extraction policy, splits large files into safe chunks, runs Docling (or native text extraction) on each chunk, and merges the results into a stable transcript with optional post-processing.
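As a rough illustration of the chunk-then-merge idea, here is a minimal sketch. The names `chunk_pages` and `merge_chunks`, the zero-based half-open page ranges, and the newline separator are assumptions made for this example and are not taken from the quack-check source.

```rust
use std::ops::Range;

/// Split `total_pages` into consecutive, non-overlapping page ranges of at
/// most `max_pages` pages each. Ranges are zero-based and half-open.
/// (Hypothetical helper, not quack-check's actual API.)
pub fn chunk_pages(total_pages: usize, max_pages: usize) -> Vec<Range<usize>> {
    assert!(max_pages > 0, "max_pages must be positive");
    (0..total_pages)
        .step_by(max_pages)
        .map(|start| start..(start + max_pages).min(total_pages))
        .collect()
}

/// Merge per-chunk text in page order, so the transcript does not depend on
/// the order in which chunks happened to finish.
/// (Hypothetical helper; the newline separator is an assumption.)
pub fn merge_chunks(mut parts: Vec<(Range<usize>, String)>) -> String {
    parts.sort_by_key(|(range, _)| range.start);
    parts
        .into_iter()
        .map(|(_, text)| text)
        .collect::<Vec<_>>()
        .join("\n")
}
```

Sorting by the starting page before joining is what keeps the output stable in this sketch: the same input chunks always produce the same transcript, regardless of processing order.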
Ambition
I’m building this to become a sharp, reusable tool that I can rely on in real workflows: fast, well-scoped, and easy to operate.
What’s novel
- Opinionated defaults with room for power-user control.
- Tight scope + strong ergonomics (the “small tool, big leverage” approach).
Highlights
- PDFs are not uniform. Some have clean embedded text, some have partial/broken layers, and some are image-only scans.
- Docling can do a lot, but deterministic orchestration, chunking, and policy decisions are on you.
- quack-check makes those decisions explicit and configurable.
- Preflight probe for text quality (chars/page, garbage ratio, whitespace ratio).
- Policy-driven extraction tiers: high-text, mixed-text, scan.
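The preflight probe and tier selection could look roughly like the sketch below. Everything here is illustrative: the `ProbeStats`, `Tier`, `probe`, and `classify` names, the definition of "garbage" (control characters and U+FFFD), and all threshold values are assumptions, not quack-check's actual types or defaults.

```rust
/// Text-quality metrics gathered before extraction (hypothetical struct).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ProbeStats {
    pub chars_per_page: f64,
    pub garbage_ratio: f64,
    pub whitespace_ratio: f64,
}

/// The three extraction tiers named above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    HighText,
    MixedText,
    Scan,
}

/// Compute probe metrics from the embedded text of each page.
/// Assumption: "garbage" means non-whitespace control characters and U+FFFD.
pub fn probe(pages: &[&str]) -> ProbeStats {
    let (mut total, mut garbage, mut whitespace) = (0usize, 0usize, 0usize);
    for page in pages {
        for c in page.chars() {
            total += 1;
            if c.is_whitespace() {
                whitespace += 1;
            } else if c.is_control() || c == '\u{FFFD}' {
                garbage += 1;
            }
        }
    }
    // Avoid dividing by zero when there is no embedded text at all.
    let ratio = |n: usize| if total == 0 { 0.0 } else { n as f64 / total as f64 };
    ProbeStats {
        chars_per_page: if pages.is_empty() { 0.0 } else { total as f64 / pages.len() as f64 },
        garbage_ratio: ratio(garbage),
        whitespace_ratio: ratio(whitespace),
    }
}

/// Map probe metrics to a tier. All thresholds are made-up placeholders.
pub fn classify(stats: &ProbeStats) -> Tier {
    const MIN_CHARS_SCAN: f64 = 50.0;
    const MIN_CHARS_HIGH: f64 = 500.0;
    const MAX_GARBAGE_HIGH: f64 = 0.02;
    const MAX_WHITESPACE_HIGH: f64 = 0.5;

    if stats.chars_per_page < MIN_CHARS_SCAN {
        Tier::Scan
    } else if stats.chars_per_page >= MIN_CHARS_HIGH
        && stats.garbage_ratio <= MAX_GARBAGE_HIGH
        && stats.whitespace_ratio <= MAX_WHITESPACE_HIGH
    {
        Tier::HighText
    } else {
        Tier::MixedText
    }
}
```

Keeping the thresholds as named constants (or, in a real tool, configuration values) is what makes the policy decision explicit: the same probe numbers always map to the same tier.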
Stats
- Project page: /projects/quack-check/
- Primary language: Rust
- Commits: 15
- Created: 2026-02-13T20:05:07Z
- Last updated: 2026-02-14T04:48:05Z
Links
- Repo: https://github.com/sguzman/quack-check
- README: /projects/readme/quack-check/
- DeepWiki: https://deepwiki.com/sguzman/quack-check/