Data Infrastructure & Backend Services for Large-Scale E-Commerce

High-performance Python services for crawling, search-volume analysis, and automated indexing, migrated from batch jobs to a Kafka-based streaming architecture.

Client: Otto
Services: Web Development
Technologies: Python, FastAPI, Vue, AWS

About the project

As part of a long-term collaboration with Otto, one of Germany’s largest e-commerce platforms, I was responsible for building and scaling backend services that process vast amounts of product and search-related data. The goal was to support internal teams with actionable insights, automated classification, and keyword analytics for millions of items. I worked closely with other backend developers and data engineers to improve performance and modernize the underlying architecture.

What was the problem?

In the context of large-scale e-commerce, timely and accurate data processing is critical — from crawling third-party sources to analyzing trends and keeping listings up to date. The challenge was twofold: first, legacy batch-based processes could no longer meet performance demands; second, relevant data (like search volume, trends, and classifications) needed to be enriched and delivered across different internal services reliably and at scale.

Project execution

We developed a collection of high-performance backend services in Python, each focused on a specific task:

  • A crawler optimization layer to reduce runtime and resource usage during product data extraction (a concurrency sketch follows this list)
  • A search volume keyword service that retrieved, processed, and stored real-time demand signals across multiple categories
  • An automated indexing service to classify and tag new or updated products for the shop’s internal structure
  • Several microservices for enriching product data with internal and external information (e.g., supplier metadata, trend scores)
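
To make the first point concrete, below is a minimal sketch of the kind of concurrency-limited fetching the crawler optimization layer relied on, using asyncio with aiohttp. The library choice, example URL, and concurrency limit are illustrative assumptions, not the production configuration.

```python
import asyncio

import aiohttp

CONCURRENCY = 20  # assumed limit; in practice tuned per source


async def fetch(session: aiohttp.ClientSession, sem: asyncio.Semaphore, url: str):
    # cap in-flight requests so a single source is never overwhelmed
    async with sem:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
            resp.raise_for_status()
            return url, await resp.text()


async def crawl(urls: list[str]):
    sem = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, sem, u) for u in urls]
        # return_exceptions keeps one failed page from aborting the whole run
        return await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    results = asyncio.run(crawl(["https://example.com/product/1"]))
```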

Initially designed as batch-processing jobs, all critical services were eventually refactored into a Kafka-based architecture, allowing for real-time data streaming, easier scaling, and better fault tolerance.
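
The streaming backbone of that refactor can be sketched as a small consume-enrich-produce loop, shown here with the confluent-kafka client; the broker address, topic names, and the enrichment step are hypothetical placeholders rather than the real pipeline.

```python
import json

from confluent_kafka import Consumer, Producer

BROKER = "localhost:9092"    # assumed broker address
IN_TOPIC = "products.raw"    # hypothetical topic names
OUT_TOPIC = "products.enriched"

consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": "enrichment-service",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": BROKER})
consumer.subscribe([IN_TOPIC])


def enrich(product: dict) -> dict:
    # hypothetical enrichment step (e.g., attaching a trend score)
    product["trend_score"] = 0.0
    return product


try:
    while True:
        msg = consumer.poll(1.0)  # block up to 1s for the next record
        if msg is None or msg.error():
            continue
        product = json.loads(msg.value())
        producer.produce(OUT_TOPIC, json.dumps(enrich(product)).encode())
        producer.poll(0)  # serve delivery callbacks
finally:
    producer.flush()
    consumer.close()
```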

Technical details

All services were written in Python, with a strong focus on performance (e.g., asynchronous tasks, optimized data pipelines, multiprocessing where needed).
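
Where a step was CPU-bound, for instance classification during indexing, work was fanned out across processes; the sketch below shows the general pattern with a made-up classify step, not the production tagger.

```python
from multiprocessing import Pool


def classify(product: dict) -> dict:
    # hypothetical CPU-bound step: derive tags from the product record
    product["tags"] = sorted(set(product.get("title", "").lower().split()))
    return product


def classify_batch(products: list[dict]) -> list[dict]:
    # fan the work out across worker processes; chunksize amortizes IPC overhead
    with Pool() as pool:
        return list(pool.imap_unordered(classify, products, chunksize=64))


if __name__ == "__main__":
    print(classify_batch([{"title": "Red Cotton Shirt"}]))
```
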
The original system relied on scheduled batch jobs using Celery and cron-like orchestration. As demand grew, we transitioned to a Kafka-based microservice architecture:

  • Apache Kafka handled real-time messaging between services
  • Services were containerized with Docker for consistent deployment
  • Internal APIs were developed with FastAPI for speed and maintainability
  • Monitoring and logging were handled with tools like Prometheus and Grafana to surface bottlenecks and guide throughput improvements (a brief API and metrics sketch follows this list)
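
As a flavor of how the FastAPI and Prometheus pieces fit together, here is a minimal sketch of an internal enrichment endpoint that exposes a scrape-able request counter; the route, metric name, and response shape are assumptions made for illustration.

```python
from fastapi import FastAPI
from prometheus_client import Counter, make_asgi_app

app = FastAPI()

# hypothetical metric; the real services tracked throughput and latency per stage
ENRICHMENT_REQUESTS = Counter(
    "enrichment_requests_total",
    "Number of product enrichment lookups",
    ["category"],
)

# expose metrics under /metrics for Prometheus to scrape
app.mount("/metrics", make_asgi_app())


@app.get("/products/{sku}/enrichment")
async def read_enrichment(sku: str, category: str = "unknown"):
    ENRICHMENT_REQUESTS.labels(category=category).inc()
    # placeholder payload; the real service joined supplier metadata and trend scores
    return {"sku": sku, "category": category, "trend_score": None}
```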

The result was a highly modular, scalable backend landscape that enabled Otto to make faster decisions and deliver richer product data across the platform.
