cfr-to-text
Overview
Extract text from CFR XML files (Code of Federal Regulations) into plain text or JSONL.
Ambition
A robust, industrial-grade extraction tool for the Code of Federal Regulations that reduces its complex XML markup to semantically meaningful text.
What’s novel
- Sophisticated CLI with TOML configuration for fine-grained control over element exclusion and whitespace normalization.
- High-speed XML event processing capable of handling the entire Code of Federal Regulations corpus.
- Automated output splitting and file management for massive, multi-part datasets.
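The element exclusion and whitespace normalization mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the function names `extract_text` and `normalize_whitespace` are hypothetical, the scanner is hand-rolled rather than a real XML event parser, and it ignores entities, CDATA, and comments that contain `>`.

```rust
/// Collapse every run of whitespace into a single space and trim both ends.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Extract text from an XML string, dropping the contents of any element
/// whose name appears in `excluded` (hypothetical helper, illustration only).
fn extract_text(xml: &str, excluded: &[&str]) -> String {
    let mut out = String::new();
    let mut skip_depth = 0usize; // > 0 while inside an excluded element
    let mut rest = xml;

    while let Some(open) = rest.find('<') {
        if skip_depth == 0 {
            // Keep the text before the tag; the space keeps adjacent
            // elements from running together.
            out.push_str(&rest[..open]);
            out.push(' ');
        }
        let after = &rest[open + 1..];
        let close = match after.find('>') {
            Some(i) => i,
            None => {
                // Unterminated tag: stop and discard the remainder.
                rest = "";
                break;
            }
        };
        let tag = &after[..close];
        rest = &after[close + 1..];

        // Ignore declarations, processing instructions, and comments.
        if tag.starts_with('?') || tag.starts_with('!') {
            continue;
        }
        let is_end = tag.starts_with('/');
        let is_empty = tag.ends_with('/');
        let name = tag
            .trim_start_matches('/')
            .split(|c: char| c.is_whitespace() || c == '/')
            .next()
            .unwrap_or("");

        if is_end {
            if skip_depth > 0 {
                skip_depth -= 1;
            }
        } else if !is_empty {
            // Track nesting so the matching end tag closes the skip region.
            if skip_depth > 0 || excluded.contains(&name) {
                skip_depth += 1;
            }
        }
    }
    if skip_depth == 0 {
        out.push_str(rest);
    }
    normalize_whitespace(&out)
}
```

Calling `extract_text` with an exclusion list such as `["FTNT"]` would drop footnote bodies while keeping the surrounding paragraph text; the element names used here are illustrative. A production tool would rely on a streaming XML parser instead of string scanning.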
Highlights
- --config <FILE>: Config file path (default cfr-to-text.toml)
- --input-dir <DIR> / positional inputs
- --recursive / --no-recursive
- --glob <GLOB> (repeatable)
- --output-dir <DIR> or --output <FILE>
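To make the semantics of these flags concrete, here is a rough sketch of how they might be collected into an options struct using only the standard library. The `Options` struct, `parse_args`, and `value` are hypothetical names, the default for `recursive` is an assumption, and the actual tool may parse its arguments quite differently.

```rust
/// Hypothetical container for the flags listed above.
#[derive(Debug, PartialEq)]
struct Options {
    config: String,
    input_dir: Option<String>,
    inputs: Vec<String>, // positional inputs
    recursive: bool,
    globs: Vec<String>, // --glob may be repeated
    output_dir: Option<String>,
    output: Option<String>,
}

/// Fetch the value that must follow a flag, or report which flag lacked one.
fn value(it: &mut impl Iterator<Item = String>, flag: &str) -> Result<String, String> {
    it.next().ok_or_else(|| format!("{flag} expects a value"))
}

/// Parse arguments (without the program name) into `Options`.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Options, String> {
    let mut opts = Options {
        config: "cfr-to-text.toml".to_string(), // documented default
        input_dir: None,
        inputs: Vec::new(),
        recursive: true, // assumed default; the source does not state it
        globs: Vec::new(),
        output_dir: None,
        output: None,
    };
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--config" => opts.config = value(&mut it, "--config")?,
            "--input-dir" => opts.input_dir = Some(value(&mut it, "--input-dir")?),
            "--recursive" => opts.recursive = true,
            "--no-recursive" => opts.recursive = false,
            "--glob" => opts.globs.push(value(&mut it, "--glob")?),
            "--output-dir" => opts.output_dir = Some(value(&mut it, "--output-dir")?),
            "--output" => opts.output = Some(value(&mut it, "--output")?),
            flag if flag.starts_with("--") => return Err(format!("unknown flag: {flag}")),
            other => opts.inputs.push(other.to_string()),
        }
    }
    Ok(opts)
}
```

In a real binary the iterator would typically come from `std::env::args().skip(1)`. Storing globs in a `Vec` reflects the "repeatable" note, and the paired `--recursive` / `--no-recursive` flags simply overwrite one boolean, so the last one given wins in this sketch.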
Stats
- Project page: /projects/cfr-to-text/
- Primary language: Rust
- Commits: 5
- Created: 2026-01-29T11:54:35Z
- Last updated: 2026-01-29T12:49:25Z
Links
- Repo: https://github.com/sguzman/cfr-to-text
- README: /projects/readme/cfr-to-text/
- DeepWiki: https://deepwiki.com/sguzman/cfr-to-text/