Production · Internal

E-Commerce Data Ingestion Pipeline

High-throughput data pipeline processing 15M+ product SKUs daily from hundreds of retailers via FTP, SFTP, and direct API feeds into a unified commerce data platform.

Python · Go · FastAPI · PostgreSQL · Data Pipelines · ETL

What it does

An end-to-end data ingestion system that pulls product data from hundreds of retailers over multiple protocols (FTP, SFTP, and direct APIs), normalizes it, and feeds it into a commerce content platform. It replaced unreliable crawl-based data collection with structured, real-time feeds.
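The normalization step can be sketched as mapping each retailer's format into one canonical record type. This is a minimal illustration, not the production schema: the field names (`item_id`, `productTitle`, etc.) and the `ProductRecord` shape are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass

@dataclass
class ProductRecord:
    """Canonical product record all feeds normalize into (illustrative)."""
    sku: str
    title: str
    price_cents: int

def from_csv_row(row: dict) -> ProductRecord:
    # Hypothetical retailer CSV columns: item_id, name, price (dollars)
    return ProductRecord(
        sku=row["item_id"].strip(),
        title=row["name"].strip(),
        price_cents=round(float(row["price"]) * 100),
    )

def from_json_obj(obj: dict) -> ProductRecord:
    # Hypothetical JSON feed keys: sku, productTitle, priceCents
    return ProductRecord(
        sku=obj["sku"],
        title=obj["productTitle"],
        price_cents=int(obj["priceCents"]),
    )

# Two feeds in different formats resolve to the same canonical record.
csv_feed = "item_id,name,price\nAB-123, Red Mug ,4.99\n"
row = next(csv.DictReader(io.StringIO(csv_feed)))
json_feed = '{"sku": "AB-123", "productTitle": "Red Mug", "priceCents": 499}'
assert from_csv_row(row) == from_json_obj(json.loads(json_feed))
```

In practice each source format would get its own adapter, selected by the pipeline configuration rather than hard-coded per retailer.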

Architecture highlights

  • Multi-protocol ingestion layer handling FTP, SFTP, and REST API feeds with varied data formats (CSV, XML, JSON, proprietary)
  • Processing throughput of 15M+ SKUs per day across hundreds of retailer feeds
  • User-configurable pipeline: new data sources added via configuration, not code changes
  • Data normalization layer handling unstructured and inconsistent data across retailers
  • PostgreSQL as the core data store with optimized bulk upsert strategies
  • Reduced commerce article creation time by 20% by providing reliable, up-to-date product data
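The bulk-upsert strategy above keeps re-runs idempotent: inserting the same feed twice updates rows in place instead of duplicating them. A minimal sketch, using Python's stdlib `sqlite3` as a stand-in for PostgreSQL (both accept the same `INSERT ... ON CONFLICT ... DO UPDATE` syntax); the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    sku TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    price_cents INTEGER NOT NULL)""")

# On a SKU collision, overwrite the row with the incoming feed values.
UPSERT = """INSERT INTO products (sku, title, price_cents)
VALUES (?, ?, ?)
ON CONFLICT(sku) DO UPDATE SET
    title = excluded.title,
    price_cents = excluded.price_cents"""

feed = [("AB-123", "Red Mug", 499), ("CD-456", "Blue Mug", 599)]
conn.executemany(UPSERT, feed)  # first run: inserts both rows
conn.executemany(UPSERT, feed)  # re-run: updates in place, no duplicates

count, = conn.execute("SELECT COUNT(*) FROM products").fetchone()
assert count == 2
```

In production the same statement would typically run against PostgreSQL in large batches (e.g. via `executemany` or `COPY` into a staging table) to sustain the daily SKU volume.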

Backend patterns

  • Go and Python services working together — Go for high-throughput processing, Python for orchestration and API layers
  • FastAPI serving layer for internal APIs and configuration management
  • Idempotent processing: safe to re-run feeds without duplicating data
  • Monitoring and alerting for feed health, data freshness, and processing lag
  • Graceful handling of upstream failures — partial feed processing with retry logic
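The partial-processing-with-retry pattern can be sketched as: process each record independently, keep the successes, and retry only the failures with exponential backoff. This is an illustrative sketch; the function and handler names are hypothetical:

```python
import time

def process_feed(records, handler, max_retries=3, base_delay=0.0):
    """Process records one by one; retry only the failures.

    A transient upstream error on one record should not discard
    the rest of the feed (assumed behavior; names are illustrative).
    Returns (processed_results, records_that_never_succeeded).
    """
    pending = list(records)
    done = []
    for attempt in range(max_retries):
        failed = []
        for rec in pending:
            try:
                done.append(handler(rec))
            except Exception:
                failed.append(rec)  # keep for the next attempt
        if not failed:
            break
        pending = failed
        time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    else:
        return done, pending
    return done, []

# Flaky handler: fails the first time it sees each record,
# simulating a transient upstream error.
seen = set()
def flaky(rec):
    if rec not in seen:
        seen.add(rec)
        raise IOError("transient upstream error")
    return rec.upper()
```

Records that still fail after the final attempt would be surfaced through the monitoring layer rather than silently dropped.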