In Part 1 of our series – https://www.vamsitalkstech.com/ai/engineering-the-ai-factory-blueprint-for-industrial-scale-ai-infrastructure/ – we explored the foundational concepts of AI factories and their critical infrastructure components. I intend to write a series of blogs on how webscale leaders leverage AI/GenAI in their businesses. So let’s dive deep into one of the most sophisticated implementations of an AI factory in production: Uber’s machine learning infrastructure.
In this blog post, we’ll examine how Uber architected its AI factory to operate at massive scale, the technical challenges it overcame, and the lessons any organization can apply. We’ll explore the architecture of Michelangelo, Uber’s core ML platform; how Uber handles real-time inference at global scale; its approach to GPU resource management; its implementation of ML observability; and its latest advances in generative AI integration. What makes Uber’s implementation particularly interesting is its evolution through three distinct phases: Initial ML Infrastructure (2016-2019), when the foundational Michelangelo platform was built; Deep Learning Transformation (2019-2023), when the platform was scaled to handle complex neural networks; and Generative AI Integration (2023-present), incorporating large language models and new AI capabilities.
Uber’s journey from a ride-hailing startup to a global technology powerhouse is intrinsically linked to its mastery of artificial intelligence at scale. Processing over 10 million real-time predictions per second through its AI factory, Uber has created what might be considered the world’s largest real-time decision engine, powering everything from ride matching and pricing to fraud detection and food delivery optimization. The idea is to convey how theoretical AI factory concepts translate into real-world implementation at one of the world’s most advanced technology companies.
Uber, a company synonymous with connecting the physical and digital worlds, relies heavily on Artificial Intelligence (AI) and Machine Learning (ML) to orchestrate its complex global operations. From matching riders and drivers in real-time to predicting arrival times with remarkable accuracy and ensuring platform safety, AI is not just a feature—it’s the engine driving the core business. Powering these sophisticated AI capabilities requires an equally sophisticated, scalable, and constantly evolving infrastructure.

(ML Impact at Uber – Credit Uber AI Blog)
Let us now delve into the key components and strategies behind Uber’s AI infrastructure, drawing insights from their public engineering communications at the Uber AI blog.
The Evolutionary Journey: From Notebooks to a Global Platform
Uber’s AI journey began around 2015-2016, initially involving applied scientists developing models in notebooks and engineers crafting custom pipelines for deployment. Recognizing the need for standardization and scalability, Uber embarked on building a centralized platform. This evolution can be broadly categorized:
- Predictive ML Era (2016-2019): Focused on building foundational capabilities for predictive modeling using tabular data (e.g., XGBoost for ETAs, pricing, risk assessment). The cornerstone platform, Michelangelo, was launched during this period.
- Deep Learning Ascendance (2019-2023): Marked by a significant shift towards Deep Learning (DL) for enhanced model performance across critical applications. Over 60% of Uber’s top-tier models incorporated DL by the end of this phase. Infrastructure was enhanced to support DL training, deployment, and monitoring. Project Canvas was introduced, bringing software engineering rigor (like code reviews, testing, and monorepo management) to ML development.
- Generative AI Integration (2023-Present): Characterized by active exploration and integration of Generative AI (GenAI) and Large Language Models (LLMs) to boost internal productivity and enhance user experiences. This phase necessitated significant infrastructure upgrades.
Michelangelo: The Central Nervous System

(Michelangelo Architecture – Credit Uber AI Blog)
At the heart of Uber’s AI infrastructure lies Michelangelo, their end-to-end ML platform developed in-house since 2015-2016. Designed to empower hundreds of internal projects and manage tens of thousands of monthly training jobs, Michelangelo covers the entire ML lifecycle:
- Data Management: Integrating with data sources and managing feature pipelines.
- Model Training: Supporting various frameworks (like XGBoost, TensorFlow, PyTorch, Spark MLlib) and scaling techniques.
- Evaluation & Versioning: Tracking experiments and model versions.
- Deployment: Enabling models to be deployed offline (batch), online (real-time serving), or as libraries.
- Prediction & Monitoring: Serving predictions and monitoring model performance and data drift.
Michelangelo aims to abstract away infrastructure complexities, allowing product teams to focus on building and iterating on models. It includes components like a model registry (Gallery), debugging tools (Manifold), and frameworks for faster prototyping (PyML). Recent architectural improvements focus on a plug-and-play component model, allowing integration of both in-house and third-party/open-source tools.
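To make the lifecycle above concrete, here is a minimal, self-contained sketch of the train-evaluate-version-deploy loop that a platform like Michelangelo automates at much larger scale. It uses scikit-learn and a local directory as a stand-in registry; the function names and registry layout are my own illustrative assumptions, not Michelangelo’s APIs.

```python
import pickle
import time
from pathlib import Path

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical local stand-in for a model registry such as Michelangelo's Gallery.
MODEL_REGISTRY = Path("./model_registry")

def train_and_register(features: np.ndarray, labels: np.ndarray, name: str) -> Path:
    """Train a model, evaluate it, and write a versioned artifact plus metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=0
    )
    model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))

    version = time.strftime("%Y%m%d-%H%M%S")  # simple timestamp-based model versioning
    artifact_dir = MODEL_REGISTRY / name / version
    artifact_dir.mkdir(parents=True, exist_ok=True)
    (artifact_dir / "model.pkl").write_bytes(pickle.dumps(model))
    (artifact_dir / "metrics.txt").write_text(f"mae={mae:.4f}\n")
    return artifact_dir

if __name__ == "__main__":
    # Synthetic "ETA-like" regression data purely for demonstration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1_000, 5))
    y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=1_000)
    print("registered:", train_and_register(X, y, "eta_demo"))
```

In production, each of these steps (data access, training, evaluation, versioning, deployment) becomes a managed service, and that separation is precisely the abstraction the platform offers product teams.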
Palette: The Feature Store

(Feature Store Architecture – Credit Uber AI Blog)
A critical component integrated with Michelangelo is Palette, Uber’s feature store. Launched to address challenges in managing and sharing feature pipelines, Palette serves as a central repository for curated, reusable features.
- Scale: It houses over 20,000 features used across various Uber teams.
- Computation: Supports both batch feature computation (using tools like Spark on HDFS/GCS) and near-real-time computation (using Kafka and Samza); a minimal sketch of this dual-store pattern follows this list.
- Sharing & Discovery: Enables teams to discover and leverage existing features, reducing redundant work and improving data quality.
- Metadata Management: Recent re-architecture focused on improving metadata management for better scalability and maintainability, significantly reducing update times.
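The essence of a feature store like Palette is pairing an offline store (bulk features computed in batch) with an online store (fresh values written by streaming jobs) behind a single retrieval interface. The in-memory sketch below, promised in the list above, illustrates only that pattern; the class and method names are hypothetical and not Palette’s API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class FeatureStore:
    # Offline store: bulk features computed by batch jobs (e.g., daily Spark pipelines).
    offline: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Online store: low-latency values refreshed by streaming jobs (e.g., Kafka/Samza).
    online: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def put_batch(self, entity_id: str, features: Dict[str, Any]) -> None:
        self.offline.setdefault(entity_id, {}).update(features)

    def put_streaming(self, entity_id: str, features: Dict[str, Any]) -> None:
        self.online.setdefault(entity_id, {}).update(features)

    def get_features(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        """Serve a feature vector, preferring the fresher (online) value per feature."""
        merged = {**self.offline.get(entity_id, {}), **self.online.get(entity_id, {})}
        return {n: merged.get(n) for n in names}

if __name__ == "__main__":
    store = FeatureStore()
    store.put_batch("driver:42", {"avg_rating_30d": 4.8, "trips_30d": 412})
    store.put_streaming("driver:42", {"minutes_since_last_trip": 7})
    print(store.get_features("driver:42", ["avg_rating_30d", "minutes_since_last_trip"]))
```

The important property is that the same feature definitions feed both training and low-latency serving, which is what curbs training/serving skew and redundant pipelines.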
Gearing Up for Generative AI
The rise of GenAI demanded further evolution of Uber’s infrastructure:
- GenAI Gateway: Uber developed an internal gateway to provide a unified, secure, and managed interface for accessing various external and internal LLMs. It handles logging, cost management, safety policies, and PII redaction; the sketch after this list illustrates the general pattern.
- LLMOps on Michelangelo: The platform was extended to support LLMOps workflows, including fine-tuning data preparation, prompt engineering, LLM deployment, serving, and monitoring. Techniques like memory offloading are used to improve LLM training efficiency.
- Hardware Upgrades: Significant investments were made in acquiring high-performance GPUs like Nvidia H100s, essential for meeting the low-latency requirements of many GenAI applications.
- Network Enhancements: Recognizing the bandwidth demands of large models, Uber focused on scaling network infrastructure. This includes upgrades to 100 Gbps links, implementing full mesh NVLink connectivity between GPUs, improving congestion control, and establishing dedicated network topologies. Benchmarking showed that improved network bandwidth and congestion control nearly doubled LLM training speed.
- Memory Upgrades: To handle the larger memory footprint of newer AI workloads, server memory capacity (DIMMs) was doubled (e.g., 16GB to 32GB per channel).
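The GenAI Gateway described at the top of this list can be pictured as a thin policy layer in front of every model provider. The sketch below shows that pattern (PII redaction, a crude cost guardrail, audit logging, then routing) with placeholder provider calls; it is an illustration of the general idea, not Uber’s implementation.

```python
import re
import time

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # deliberately naive PII pattern

def redact_pii(text: str) -> str:
    """Mask obvious PII before the prompt ever leaves the gateway."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def call_provider(provider: str, prompt: str) -> str:
    # Placeholder for a real SDK call (a hosted LLM API or an in-house model endpoint).
    return f"[{provider} response to a {len(prompt)}-character prompt]"

def generate(prompt: str, provider: str = "internal-llm", max_cost_usd: float = 0.01) -> str:
    safe_prompt = redact_pii(prompt)
    # Crude cost guardrail: ~4 characters per token, made-up $/1K-token rate.
    est_cost = len(safe_prompt) / 4 / 1000 * 0.002
    if est_cost > max_cost_usd:
        raise ValueError(f"estimated cost ${est_cost:.4f} exceeds budget ${max_cost_usd}")
    start = time.time()
    response = call_provider(provider, safe_prompt)
    # Central audit log: every call is attributable, measurable, and billable.
    print(f"audit: provider={provider} latency={time.time() - start:.3f}s est_cost=${est_cost:.5f}")
    return response

if __name__ == "__main__":
    print(generate("Summarize this support ticket from rider jane.doe@example.com ..."))
```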
Cloud and Compute Strategy
- Multi-Cloud Adoption: In February 2023, Uber began a strategic shift from primarily on-premise data centers to a multi-cloud environment. They leverage Google Cloud Platform (GCP) for batch data analytics and ML training, utilizing Google Cloud Storage (GCS) for their data lake and IaaS for migrating Hadoop/Spark/Presto workloads. Concurrently, a seven-year deal with Oracle Cloud Infrastructure (OCI) suggests OCI hosts other critical workloads, potentially chosen for price-performance advantages, especially for network-intensive tasks. This multi-cloud approach aims to optimize costs, access best-of-breed services, and mitigate vendor lock-in.
- Hardware Benchmarking: Uber conducts rigorous price-performance evaluations of various CPU and GPU models (including Nvidia A10, A100, H100) available both on-premise and from cloud providers. This benchmarking informs hardware selection for different workloads, from traditional tree-based models to DL and LLMs.
- Resource Optimization: A key goal is maximizing infrastructure utilization, particularly for expensive GPU resources. Techniques like reactive scaling (using idle serving capacity for training during off-peak hours) and measuring Model Flops Utilization (MFU) are employed.
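Model Flops Utilization, mentioned in the last bullet, is simply the ratio of the training FLOPs a job actually achieves to the hardware’s theoretical peak. A back-of-the-envelope version, using the common approximation of roughly 6 FLOPs per parameter per trained token for transformers, looks like this (every number in the example is illustrative):

```python
def mfu(params: float, tokens_per_second: float, peak_flops_per_second: float) -> float:
    """Model FLOPs Utilization: achieved training FLOPs/s divided by hardware peak FLOPs/s.

    Uses the common ~6 FLOPs-per-parameter-per-token approximation for a transformer
    forward + backward pass.
    """
    achieved_flops_per_second = 6.0 * params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second

if __name__ == "__main__":
    # Example: a 7B-parameter model training at 2,500 tokens/s per GPU on hardware with
    # ~312 TFLOPs of peak throughput (all numbers illustrative).
    print(f"MFU = {mfu(7e9, 2_500, 312e12):.1%}")
```

Tracking a number like this across jobs quickly surfaces when expensive GPUs are stalled on data loading, communication, or poorly tuned parallelism.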
MLOps and Data Management
- Frameworks: Uber utilizes a mix of industry-standard frameworks including TensorFlow, PyTorch, XGBoost, Apache Spark, and increasingly, Ray for distributed computing, especially on GPU clusters.
- Orchestration & Workflow: The Michelangelo Job Controller acts as a federation layer for scheduling batch workloads (Ray, Spark) across multiple Kubernetes clusters, abstracting infrastructure details. Project Canvas introduced software engineering practices (monorepo, CI/CD, code reviews, testing) into the ML development workflow.
- Data Pipelines & Governance: Data pipelines leverage tools like Kafka and Samza for streaming. On GCP, Uber uses GCS for storage and employs DataMesh, a service built internally, to manage and govern cloud resources (projects, buckets, IAM policies, monitoring) based on data ownership within the organization. The D3 system automatically monitors dataset columns for data drift to proactively catch quality issues.
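To illustrate the kind of per-column check a system like D3 automates, here is a small drift test using the Population Stability Index. The metric choice, the comparison windows, and the 0.2 alert threshold are common rules of thumb and my own assumptions, not documented details of D3:

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference window and the current window."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # avoid log(0) and division by zero
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    baseline = rng.normal(loc=12.0, scale=3.0, size=50_000)  # e.g., last week's ETA column
    today = rng.normal(loc=14.5, scale=3.0, size=50_000)     # today's values, shifted upward
    score = psi(baseline, today)
    # 0.2 is a widely used rule-of-thumb alert threshold, not a D3-specific value.
    print(f"PSI={score:.3f} -> {'ALERT: drift detected' if score > 0.2 else 'ok'}")
```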
The Impact of AI at Uber
Uber’s AI Factory stands as a prime example of a sophisticated real-time decision-making ecosystem, designed to handle the complexities of ride-hailing at scale. The core of this system lies in its ability to process an immense volume of data points concurrently, optimizing the pairing of riders and drivers through a mechanism known as “batched matching.”
This batched matching system doesn’t simply operate in a vacuum; it continuously learns and evolves. By tapping into historical trip data, analyzing traffic patterns, and factoring in real-time conditions, the AI Factory constantly refines its dispatch efficiency. It’s a dynamic system that adapts to the ebb and flow of rider demand and driver availability.
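Conceptually, batched matching replaces greedy nearest-driver dispatch with a windowed global assignment: collect requests for a short interval, build a rider-by-driver cost matrix (for example, predicted pickup ETAs), and solve the assignment jointly. The toy sketch below uses SciPy’s assignment solver to show the idea; Uber’s production dispatch naturally optimizes far more objectives and constraints.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def batch_match(pickup_eta_minutes: np.ndarray) -> list:
    """pickup_eta_minutes[i, j] = predicted pickup time if rider i is served by driver j."""
    riders, drivers = linear_sum_assignment(pickup_eta_minutes)  # optimal assignment
    return list(zip(riders.tolist(), drivers.tolist()))

if __name__ == "__main__":
    # One dispatch window: 3 waiting riders, 4 candidate drivers (ETAs in minutes, illustrative).
    etas = np.array([
        [4.0, 9.0, 6.0, 12.0],
        [3.0, 2.5, 8.0, 7.0],
        [10.0, 4.0, 3.5, 5.0],
    ])
    for rider, driver in batch_match(etas):
        print(f"rider {rider} -> driver {driver} ({etas[rider, driver]} min pickup)")
```

Batching over even a few seconds lets the system trade an individual rider’s closest car for an assignment that lowers total wait time across the whole batch.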
One of the most notable applications of machine learning within the AI Factory is Uber’s dynamic pricing model, often referred to as “surge pricing.” This model leverages sophisticated algorithms to forecast spikes in demand and modify prices accordingly. While sometimes controversial, surge pricing aims to incentivize drivers to get on the road during peak times, theoretically reducing wait times for riders.
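At its simplest, a surge signal can be thought of as a function of the local demand-to-supply imbalance; the toy function below captures only that intuition. The linear form, the cap, and every number are invented for illustration, since Uber’s actual pricing models are learned from data and far more nuanced.

```python
def surge_multiplier(open_requests: int, available_drivers: int,
                     base: float = 1.0, cap: float = 3.0) -> float:
    """Map a local demand/supply imbalance to a capped price multiplier."""
    if available_drivers == 0:
        return cap
    imbalance = open_requests / available_drivers
    # Price rises only when demand outstrips supply, and is capped.
    return round(min(cap, max(base, base * imbalance)), 2)

if __name__ == "__main__":
    for requests, drivers in [(50, 100), (100, 100), (180, 100), (400, 100)]:
        print(f"{requests} requests / {drivers} drivers -> {surge_multiplier(requests, drivers)}x")
```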
The impact of Uber’s AI Factory on the overall ride-hailing experience has been significant. Reports suggest that the implementation of these advanced technologies has led to a notable decrease in wait times, potentially by as much as 30%. Furthermore, it appears that driver utilization has seen a boost, possibly around 20%. These figures highlight the potential for AI and machine learning to optimize complex logistical operations.
Conclusion
In conclusion, Uber’s AI infrastructure stands as a testament to the transformative power of AI when applied at scale. The company’s strategic investments in platforms, feature stores, and cutting-edge technologies have enabled it to navigate the complexities of real-world marketplaces and deliver AI-driven solutions that optimize its operations. However, the journey is not without its challenges. Balancing the efficiency of algorithmic systems with the human element remains an ongoing struggle, highlighting the need for companies to consider both technological advancements and the impact on individuals within their ecosystems. As Uber’s experience shows, AI infrastructure is not merely a supporting function but a strategic asset that can drive innovation and shape the future of businesses.