Moving Data in Production: The Three Architectures Every ML Engineer Must Know

When you build a machine learning system, the model gets all the glory. But for that model to do its job, it needs data—and that data needs to move. It flows from a database to a feature engineering service, from a prediction service to a user interface, and from a sensor to a real-time analytics dashboard.
How that data moves is a critical design decision. In a production environment, different services don't share memory. Instead, they communicate using one of three primary data flow modes. Understanding these is key to building a system that’s not just smart, but also fast, reliable, and scalable.
This article will break down the three fundamental ways data moves in a distributed ML system: through databases, through services, and through real-time transports.
1. Data Flow Through a Database
This is the most straightforward way for two processes to communicate. If Process A needs to send data to Process B, it simply writes that data to a shared database. Process B then reads the data from that same database.
Think of it like two people communicating through a shared notebook. One person writes a message, and the other person reads it later. It's an easy and reliable way to get a message across.
When it works: This method is a good fit for batch-oriented tasks where latency isn't a concern. For example, a nightly job might generate a new dataset and write it to a data warehouse for another process to use the next day. It's simple, reliable, and requires minimal engineering overhead.
Where it breaks down: The biggest problem is latency. Every handoff costs a write to the database followed by a read from it, and the reader only sees new data the next time it checks. That makes this mode a poor fit for most consumer-facing applications that need predictions in milliseconds. It also requires both processes to have access to the same database, which isn't always possible when the processes belong to different organizations.
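The write-then-read handoff described in this section can be sketched in a few lines. This is a minimal illustration, not a production setup: it assumes SQLite from the Python standard library as a stand-in for the shared database, and the table name user_features, the column avg_order_value, and both function names are hypothetical.

```python
import sqlite3


def write_features(conn, rows):
    """Process A: a batch job persists computed features to the shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_features ("
        "user_id INTEGER PRIMARY KEY, avg_order_value REAL)"
    )
    # Upsert so that re-running the job overwrites stale values.
    conn.executemany(
        "INSERT OR REPLACE INTO user_features (user_id, avg_order_value) "
        "VALUES (?, ?)",
        rows,
    )
    conn.commit()


def read_features(conn, user_id):
    """Process B: a later job reads whatever the last batch run wrote."""
    row = conn.execute(
        "SELECT avg_order_value FROM user_features WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    # One in-memory connection stands in for a database both processes can reach.
    conn = sqlite3.connect(":memory:")
    write_features(conn, [(1, 42.5), (2, 17.0)])
    print(read_features(conn, 1))  # 42.5
    conn.close()
```

Neither function knows about the other; the table is the only contract between them. That is also why the reader cannot learn about new data any faster than it chooses to query.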
2. The Microservices Approach: Data Flow Through Services
In a distributed system, services are designed to be independent. They talk to each other directly via a network, using APIs. This is a request-driven model: Process A sends a direct request to Process B, and Process B sends the requested data back.
This is the backbone of the microservices architecture, where an application is broken down into small, independent services. A ride-sharing app, for instance, might have separate services for managing drivers, handling rides, and optimizing prices.
Popular request styles:
REST (Representational State Transfer): The dominant style for public APIs on the web. Online prediction, where a user's request for a recommendation is sent to a model and a response is returned immediately, often uses RESTful APIs.
RPC (Remote Procedure Call): This style makes a network request look like a local function call in your code. It's often used for efficient communication between internal services within the same data center.
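To make the request-driven model concrete, here is a rough sketch of an online prediction call over HTTP using only the Python standard library. The /predict path, the JSON payload shape, and the placeholder "model" (a plain average) are assumptions for illustration; a real service would more likely be built on a framework such as FastAPI.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PredictHandler(BaseHTTPRequestHandler):
    """Process B: answers POST /predict with a JSON score."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length))
        # Placeholder "model": the mean of the feature values.
        values = features["values"]
        body = json.dumps({"score": sum(values) / len(values)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_prediction(port, values, timeout=2.0):
    """Process A: sends features and blocks until the response arrives."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"values": values}).encode(),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())["score"]
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    print(request_prediction(server.server_address[1], [1.0, 2.0, 3.0]))  # 2.0
    server.shutdown()
    server.server_close()
```

The caller in request_prediction does nothing else until the response comes back, which is the defining trait of this mode.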
While powerful, this approach creates a chain of dependencies: the requesting service must wait for the target service to respond. If one service is slow or down, every caller waiting on it stalls too, so a single failure can cascade through the system. With hundreds of services, tracing and managing these dependencies becomes a major operational burden.
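One common way to limit this kind of cascade is to put a timeout and a fallback value around each downstream call. The sketch below is one possible shape for that idea, not a prescribed pattern: call_with_fallback and the two pricing functions are hypothetical names, and the "services" are plain local functions so the example stays runnable without a network.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_fallback(fn, timeout_s, fallback):
    """Call a downstream dependency, but stop waiting after timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The caller gives up and degrades gracefully; the slow call itself
        # is left to finish in the background.
        return fallback
    except Exception:
        # A hard failure in the dependency is contained the same way.
        return fallback
    finally:
        pool.shutdown(wait=False)


def slow_pricing_service():
    time.sleep(0.3)  # hypothetical downstream that is responding slowly
    return 1.8


def healthy_pricing_service():
    return 1.2


if __name__ == "__main__":
    # Default surge multiplier of 1.0 is used when pricing cannot answer in time.
    print(call_with_fallback(healthy_pricing_service, 0.5, 1.0))  # 1.2
    print(call_with_fallback(slow_pricing_service, 0.05, 1.0))    # 1.0
```

The trade-off is that the caller may act on a default instead of a fresh answer, so this only suits calls where a degraded result is acceptable.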
3. Data Flow Through Real-Time Transports
In many modern ML applications, data isn’t requested; it streams in continuously from users, sensors, or other systems. This requires a completely different architecture.
This is where real-time transports like Apache Kafka and Amazon Kinesis come in. They act as a central broker or event bus. Services don't talk to each other directly; they simply publish events to a central broker, and any service that is interested in those events can subscribe to them.
This creates a powerful pub-sub (publish-subscribe) model that completely decouples data producers from data consumers. A service publishing data doesn't need to know who is consuming it, and a consumer doesn't need to know who is producing it.
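The decoupling is easier to see in code. The following is a toy, in-memory event bus written for illustration only; it is not the Kafka or Kinesis API, and InMemoryBroker, the "payments" topic, and the event fields are all made-up names. Real brokers also persist events and deliver them asynchronously, which this sketch does not attempt.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy event bus: routes each published event to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that will receive every event on this topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver an event to all current subscribers of the topic."""
        # The producer only names a topic; it never references a consumer.
        for handler in list(self._subscribers[topic]):
            handler(event)


if __name__ == "__main__":
    broker = InMemoryBroker()
    fraud_flags, dashboard_amounts = [], []

    # Two independent consumers of the same stream.
    broker.subscribe("payments", lambda e: fraud_flags.append(e["amount"] > 1000))
    broker.subscribe("payments", lambda e: dashboard_amounts.append(e["amount"]))

    # The producer publishes once; both consumers receive the event.
    broker.publish("payments", {"user_id": 7, "amount": 1500})
    print(fraud_flags, dashboard_amounts)  # [True] [1500]
```

Adding a third consumer means one more subscribe call and no change to the producer.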
Why it's a game-changer:
Asynchronous & Fast: These transports are designed for high-volume, low-latency data streams, making them perfect for real-time applications like fraud detection or live recommendations.
Resilience & Scalability: Because producers and consumers are decoupled, a failure in one consumer does not block the producer or the other consumers, and each side can be scaled independently.
Enables Continual Learning: For models that need to be updated frequently, real-time transports allow engineers to pull fresh training data directly from the stream, enabling faster and more agile model updates.
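The continual-learning point can be illustrated with a toy append-only log in which each consumer tracks its own read position, loosely modeled on how Kafka consumers track offsets. Everything here is a simplified assumption: EventLog, poll, and update_ctr are hypothetical names, and the "model" is just a running click-through-rate count.

```python
class EventLog:
    """Toy append-only log; each consumer remembers its own read position."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer name -> index of the next unread event

    def append(self, event):
        self._events.append(event)

    def poll(self, consumer, max_events):
        """Return up to max_events unread events and advance the consumer's offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


def update_ctr(state, batch):
    """Fold a batch of fresh events into a running click-through-rate estimate."""
    return {
        "clicks": state["clicks"] + sum(e["clicked"] for e in batch),
        "shown": state["shown"] + len(batch),
    }


if __name__ == "__main__":
    log = EventLog()
    for clicked in (1, 0, 1, 1):
        log.append({"clicked": clicked})

    # The training job pulls only the events it has not seen yet.
    state = {"clicks": 0, "shown": 0}
    state = update_ctr(state, log.poll("trainer", max_events=3))
    state = update_ctr(state, log.poll("trainer", max_events=3))
    print(state)  # {'clicks': 3, 'shown': 4}
```

Because offsets are kept per consumer, a monitoring job could read the same events from the beginning without affecting what the trainer sees next.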
This architecture is the backbone of modern, data-heavy systems and is indispensable for any organization that relies on up-to-the-minute data to drive its business.
The Right Tool for the Job
In the end, there is no single best way to move data. The most effective ML systems are those that use a combination of these approaches. A system might use:
Databases for historical data and periodic batch jobs.
Services for direct, low-volume communication between key components.
Real-time transports for high-volume, asynchronous data streams that power real-time decisions.
Understanding the strengths and weaknesses of each data flow mode is essential. It's the difference between building a system that's just a model and building a system that truly works at scale.



