Production · Internal
E-Commerce Data Ingestion Pipeline
High-throughput data pipeline processing 15M+ product SKUs daily from hundreds of retailers via FTP, SFTP, and direct API feeds into a unified commerce data platform.
Python · Go · FastAPI · PostgreSQL · Data Pipelines · ETL
What it does
An end-to-end data ingestion system that pulls product data from hundreds of retailers through multiple protocols — FTP, SFTP, and direct APIs — normalizes it, and feeds it into a commerce content platform. Replaced unreliable crawl-based data collection with structured, real-time feeds.
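The normalization step can be sketched as follows: records arrive in different formats with different field names, and each format-specific parser maps them into one canonical product record. This is a minimal illustration, not the platform's actual schema — the field names (`sku`, `productTitle`, `priceCents`) are assumptions for the example.

```python
# Sketch: heterogeneous retailer feeds (CSV, JSON, ...) mapped into one
# unified product record. Field and schema names are illustrative only.
import csv
import io
import json
from dataclasses import dataclass

@dataclass
class Product:
    sku: str
    title: str
    price_cents: int  # store money as integer cents, never floats

def from_csv(text: str) -> list[Product]:
    # Hypothetical retailer A: columns sku,name,price (decimal dollars)
    rows = csv.DictReader(io.StringIO(text))
    return [
        Product(r["sku"], r["name"].strip(), round(float(r["price"]) * 100))
        for r in rows
    ]

def from_json(text: str) -> list[Product]:
    # Hypothetical retailer B: [{"id": ..., "productTitle": ..., "priceCents": ...}]
    return [
        Product(str(o["id"]), o["productTitle"].strip(), int(o["priceCents"]))
        for o in json.loads(text)
    ]

csv_feed = "sku,name,price\nA-1, Widget ,9.99\n"
json_feed = '[{"id": "A-1", "productTitle": "Widget", "priceCents": 999}]'
assert from_csv(csv_feed) == from_json(json_feed)  # same canonical record
```

Both parsers converge on the same `Product` value, so everything downstream (dedup, upsert, serving) only ever sees one shape.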
Architecture highlights
- Multi-protocol ingestion layer handling FTP, SFTP, and REST API feeds with varied data formats (CSV, XML, JSON, and proprietary formats)
- Processing throughput of 15M+ SKUs per day across hundreds of retailer feeds
- User-configurable pipeline: new data sources added via configuration, not code changes
- Data normalization layer handling unstructured and inconsistent data across retailers
- PostgreSQL as the core data store with optimized bulk upsert strategies
- Reduced commerce article creation time by 20% by providing reliable, up-to-date product data
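The bulk-upsert pattern behind the PostgreSQL store can be sketched with `INSERT ... ON CONFLICT ... DO UPDATE`, which lets re-processed feeds update rows in place rather than duplicate them. The demo below uses SQLite (which shares the upsert syntax) so it runs standalone; the real pipeline would batch against PostgreSQL via `executemany` or `COPY`. Table and column names are illustrative.

```python
# Sketch of the bulk-upsert strategy: primary key on sku, conflicting
# inserts become updates. SQLite stands in for PostgreSQL here; the
# ON CONFLICT syntax is shared. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (sku TEXT PRIMARY KEY, title TEXT, price_cents INTEGER)"
)

def bulk_upsert(rows: list[tuple[str, str, int]]) -> None:
    conn.executemany(
        """
        INSERT INTO products (sku, title, price_cents)
        VALUES (?, ?, ?)
        ON CONFLICT (sku) DO UPDATE SET
            title = excluded.title,
            price_cents = excluded.price_cents
        """,
        rows,
    )

bulk_upsert([("A-1", "Widget", 999), ("B-2", "Gadget", 1500)])
bulk_upsert([("A-1", "Widget v2", 1099)])  # re-run: updates, no duplicate row
count, = conn.execute("SELECT COUNT(*) FROM products").fetchone()
price, = conn.execute("SELECT price_cents FROM products WHERE sku = 'A-1'").fetchone()
assert (count, price) == (2, 1099)
```

Because the upsert is keyed on the natural identifier, replaying an entire feed file is harmless — which is also what makes the idempotent re-runs described below cheap to support.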
Backend patterns
- Go and Python services working together — Go for high-throughput processing, Python for orchestration and API layers
- FastAPI serving layer for internal APIs and configuration management
- Idempotent processing: safe to re-run feeds without duplicating data
- Monitoring and alerting for feed health, data freshness, and processing lag
- Graceful handling of upstream failures — partial feed processing with retry logic
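The partial-processing-with-retry pattern above can be sketched as: process each file in a feed independently, keep whatever succeeded, and retry only the failures with exponential backoff. The in-memory `done` set is a stand-in for persisted checkpoints; all names here are illustrative, not the production API.

```python
# Sketch: partial feed processing with retry and backoff. Completed
# files are skipped on re-runs (idempotency); only failures are
# retried. The seen-set would be a persisted checkpoint in production.
import time

def process_feed(files, process_one, max_attempts=3, base_delay=0.01):
    done: set[str] = set()
    pending = list(files)
    for attempt in range(max_attempts):
        failed = []
        for name in pending:
            if name in done:         # idempotent: skip completed work
                continue
            try:
                process_one(name)
                done.add(name)
            except Exception:
                failed.append(name)  # partial progress is kept
        if not failed:
            break
        pending = failed
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    return done, set(pending) - done

# A flaky processor that fails the first time it sees "b.csv":
calls = {"b.csv": 0}
def flaky(name):
    if name == "b.csv":
        calls[name] += 1
        if calls[name] == 1:
            raise IOError("transient upstream failure")

done, failed = process_feed(["a.csv", "b.csv", "c.csv"], flaky)
assert done == {"a.csv", "b.csv", "c.csv"} and not failed
```

One transient failure does not poison the whole feed: `a.csv` and `c.csv` land on the first pass, and `b.csv` succeeds on the retry.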