← Back to Projects
Production · Internal

Document Intelligence Platform

Enterprise platform that classifies, extracts, and analyzes financial documents using LLM-powered workflows with async FastAPI and AWS Bedrock.

PythonFastAPIAWS BedrockClaudePostgreSQLS3TextractDocker

What it does

An AI-powered platform that automates the processing of complex financial documents — classification, structured data extraction, summarization, and conversational Q&A — replacing hours of manual review with seconds of intelligent processing.

Architecture highlights

  • Fully async FastAPI backend with aioboto3 for non-blocking AWS calls
  • AWS Bedrock Claude integration with prompt caching (80% cost reduction on repeated document patterns)
  • Multi-category document classifiers with confidence scoring and regex-based validation
  • Structured data extraction using Claude's tool-use API with JSON schema validation per document type
  • Multi-format document processing: PDF (pymupdf), images (Textract OCR), Excel, Word
  • Hybrid caching layer — PostgreSQL for metadata lookups, S3 for large extraction results (keyed by SHA256 file hash)
  • Conversational Q&A with threaded chat history stored in PostgreSQL
  • Token counting with tiktoken for cost tracking and cache threshold decisions
  • Export pipeline generating styled PDF and Excel reports from extraction results

Backend patterns

  • Service layer architecture: Claude, Textract, S3, and cache services decoupled from route handlers
  • SQLAlchemy ORM with Alembic migrations
  • JWT authentication with user context dependency injection
  • Smart OCR fallback: if extracted text falls below quality thresholds, automatically routes through AWS Textract
  • Model fallback: automatic switchover between Claude models on service unavailability
  • Load testing with Locust using mocked AWS services