Production · Internal
Document Intelligence Platform
Enterprise platform that classifies, extracts, and analyzes financial documents using LLM-powered workflows with async FastAPI and AWS Bedrock.
Python · FastAPI · AWS Bedrock · Claude · PostgreSQL · S3 · Textract · Docker
What it does
An AI-powered platform that automates the processing of complex financial documents — classification, structured data extraction, summarization, and conversational Q&A — replacing hours of manual review with seconds of intelligent processing.
Architecture highlights
- Fully async FastAPI backend with aioboto3 for non-blocking AWS calls
- AWS Bedrock Claude integration with prompt caching (80% cost reduction on repeated document patterns)
- Multi-category document classifiers with confidence scoring and regex-based validation
- Structured data extraction using Claude's tool-use API with JSON schema validation per document type
- Multi-format document processing: PDF (PyMuPDF), images (Textract OCR), Excel, and Word
- Hybrid caching layer — PostgreSQL for metadata lookups, S3 for large extraction results (keyed by SHA256 file hash)
- Conversational Q&A with threaded chat history stored in PostgreSQL
- Token counting with tiktoken for cost tracking and cache threshold decisions
- Export pipeline generating styled PDF and Excel reports from extraction results
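The hybrid caching idea above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the `HybridCache` class, its field names, and the `extractions/<hash>.json` key layout are all assumptions, and plain dictionaries stand in for the PostgreSQL metadata table and the S3 bucket so the example runs without any AWS or database setup.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def file_sha256(data: bytes) -> str:
    """Return the hex SHA256 digest of the file content, used as the cache key."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class HybridCache:
    # Hypothetical stand-ins: `metadata` plays the role of a PostgreSQL table
    # (small rows, fast lookups); `blobs` plays the role of an S3 bucket
    # (large extraction payloads).
    metadata: dict = field(default_factory=dict)
    blobs: dict = field(default_factory=dict)

    def put(self, data: bytes, doc_type: str, extraction: dict) -> str:
        """Store the large result in the blob store and a pointer row in metadata."""
        key = file_sha256(data)
        blob_key = f"extractions/{key}.json"  # assumed key layout
        self.blobs[blob_key] = json.dumps(extraction)
        self.metadata[key] = {"doc_type": doc_type, "blob_key": blob_key}
        return key

    def get(self, data: bytes) -> Optional[dict]:
        """Look up metadata by content hash; fetch the blob only on a hit."""
        row = self.metadata.get(file_sha256(data))
        if row is None:
            return None
        return json.loads(self.blobs[row["blob_key"]])


cache = HybridCache()
cache.put(b"%PDF-1.7 example bytes", "invoice", {"total": "1250.00"})
cached = cache.get(b"%PDF-1.7 example bytes")  # same bytes, so a cache hit
```

Keying on the content hash rather than the filename means a re-uploaded or renamed copy of the same document resolves to the same cached result, and the metadata lookup stays cheap because the large payload is fetched only after a hit.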
Backend patterns
- Service layer architecture: Claude, Textract, S3, and cache services decoupled from route handlers
- SQLAlchemy ORM with Alembic migrations
- JWT authentication with user context dependency injection
- Smart OCR fallback: if natively extracted text falls below quality thresholds, the document is automatically routed through AWS Textract
- Model fallback: automatic switchover between Claude models on service unavailability
- Load testing with Locust using mocked AWS services
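The two fallback patterns listed above might look something like the sketch below. Everything here is illustrative: the quality heuristic (minimum length plus share of alphanumeric characters), the threshold values, the `ModelUnavailable` exception, and the `run_ocr` / `invoke` callables are assumptions standing in for the real Textract and Bedrock service calls.

```python
from typing import Callable, Sequence, Tuple


def text_quality_ok(text: str, min_chars: int = 200,
                    min_alnum_ratio: float = 0.6) -> bool:
    """Assumed heuristic: enough text, and mostly letters, digits, or whitespace."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        return False
    readable = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    return readable / len(stripped) >= min_alnum_ratio


def extract_with_ocr_fallback(native_text: str,
                              run_ocr: Callable[[], str]) -> Tuple[str, str]:
    """Keep the native text if it passes the quality check, otherwise call OCR."""
    if text_quality_ok(native_text):
        return native_text, "native"
    return run_ocr(), "ocr"


class ModelUnavailable(Exception):
    """Hypothetical error raised by `invoke` when a model cannot serve requests."""


def invoke_with_model_fallback(prompt: str, model_ids: Sequence[str],
                               invoke: Callable[[str, str], str]) -> str:
    """Try each model in order, moving on only when a model is unavailable."""
    last_error = None
    for model_id in model_ids:
        try:
            return invoke(model_id, prompt)
        except ModelUnavailable as exc:
            last_error = exc
    raise RuntimeError("all configured models are unavailable") from last_error
```

Passing the OCR and model calls in as callables keeps the fallback logic independent of the AWS clients, which is also what makes it straightforward to exercise with mocked services in a load test.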