Nebius Group N.V.
NBIS · Netherlands
Rents NVIDIA GPU clusters interconnected via InfiniBand across jurisdictionally compliant data centers so AI developers can run distributed training workloads without assembling the infrastructure themselves.
Nebius rents fixed GPU clusters interconnected via InfiniBand because distributed AI training demands microsecond-level latency synchronization that commodity Ethernet cannot provide, which binds the entire compute topology to specific data center facilities with sufficient power and cooling. Those facilities must in turn sit in jurisdictionally compliant locations under European data sovereignty regulations, so the geographic footprint of eligible facilities becomes a hard boundary on which workloads the Kubernetes and Slurm orchestration layers can legally serve. NVIDIA's allocation decisions, made in favor of hyperscale cloud providers, set an external ceiling on total GPU count that capital cannot overcome, and facility construction timelines resist compression independently, so neither compute supply nor hosting capacity can be expanded to meet demand even when funding is available. Customers who have customized orchestration configurations, optimized InfiniBand topologies, and accumulated MLflow experiment records face weeks of migration testing to switch providers, a lock-in that sustains utilization. Yet if Avride's internal training cycles grow large enough, they consume the spare capacity on which external customers depend: the same internal demand intensity that deepens the platform's differentiation becomes the mechanism by which it displaces the external customer base that differentiation is meant to attract.
How does this company make money?
Nebius charges hourly rates for GPU compute capacity allocation, monthly subscriptions for managed Kubernetes and Slurm orchestration services, and per-seat licensing for MLflow, PostgreSQL, and Apache Spark managed services accessed through the Nebius AI Cloud platform.
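The hourly-rate component of this model is straightforward arithmetic. The sketch below illustrates how a customer's monthly compute bill scales with cluster size, rate, and utilization; the $2.50/GPU-hour rate, 64-GPU cluster, and 90% utilization are illustrative assumptions, not Nebius's published pricing.

```python
# Hypothetical illustration of hourly GPU-cluster billing.
# Rate, cluster size, and utilization below are assumptions.

def monthly_compute_cost(gpus: int, hourly_rate_per_gpu: float,
                         utilization: float, hours: int = 730) -> float:
    """Monthly cost of a reserved cluster, billed on utilized GPU-hours."""
    return gpus * hourly_rate_per_gpu * utilization * hours

# A 64-GPU training cluster at an assumed $2.50/GPU-hour, 90% utilized:
cost = monthly_compute_cost(gpus=64, hourly_rate_per_gpu=2.50, utilization=0.9)
print(f"${cost:,.0f} per month")
```

At these assumed figures the hourly component alone runs to six figures per month, which is why the managed-service subscriptions stack on top of, rather than replace, compute revenue.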
What makes this company hard to replace?
Kubernetes and Slurm orchestration configurations customized for specific AI model architectures require weeks of migration testing before a customer can move to a different provider. InfiniBand networking topology optimizations built for distributed training workloads cannot be easily replicated across different data center environments. MLflow experiment tracking and model versioning create data gravity: the accumulated records of experiments and model versions make switching compute providers operationally costly.
What limits this company?
NVIDIA determines how many H100, H200, B200, and GB200 units Nebius receives through allocation decisions made in favor of hyperscale cloud providers, so the total GPU count in any cluster is a ceiling set externally and cannot be expanded through capital deployment alone. Facility construction timelines and chip allocation cycles compound independently, meaning neither compute supply nor hosting capacity can be accelerated to meet demand even when funding is available.
What does this company depend on?
The infrastructure depends on NVIDIA GPU chip allocations for the H100, H200, B200, and GB200 series; InfiniBand networking hardware for cluster interconnects; European and North American data center facilities with sufficient power and cooling capacity; Kubernetes and Slurm orchestration software licenses; and MLflow and Apache Spark managed service frameworks.
Who depends on this company?
AI model developers training large language models would lose access to the synchronized GPU clusters required for distributed training workloads. Healthcare and life sciences companies running AI inference would experience service disruptions in real-time diagnostic and drug discovery applications. Autonomous driving companies, including Avride, would lose the compute infrastructure needed for training perception and decision-making models.
How does this company scale?
GPU cluster utilization and managed service software can serve additional AI workloads with minimal marginal cost once infrastructure is deployed. Physical data center expansion and NVIDIA GPU procurement cannot be accelerated through capital alone, because chip allocation constraints and facility construction timelines resist compression regardless of available funding.
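The operating-leverage half of this claim can be made concrete with a toy unit-economics sketch: once a cluster's fixed cost is sunk, average cost per workload falls with volume while marginal cost stays flat. All dollar figures are illustrative assumptions, not Nebius financials.

```python
# Toy model of operating leverage on a deployed cluster.
# Both cost figures below are assumptions for illustration only.

FIXED_MONTHLY_COST = 1_000_000    # facility, power, depreciation (assumed)
MARGINAL_COST_PER_WORKLOAD = 50   # scheduling/support overhead (assumed)

def avg_cost_per_workload(workloads: int) -> float:
    """Average cost: fixed cost amortized over volume, plus marginal cost."""
    return FIXED_MONTHLY_COST / workloads + MARGINAL_COST_PER_WORKLOAD

for n in (10, 100, 1000):
    print(n, avg_cost_per_workload(n))
```

Average cost drops an order of magnitude for each order of magnitude of volume, while the physical-expansion side of the business enjoys no such leverage, since chips and facilities arrive on externally set timelines.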
What external forces can significantly affect this company?
U.S.-China semiconductor export controls affect NVIDIA GPU availability across different regions. European data sovereignty regulations require AI model training to occur within specific jurisdictions. Climate regulations increase power costs and carbon reporting requirements for energy-intensive GPU compute operations.
Where is this company structurally vulnerable?
If Avride's training cycles grow large enough to crowd out external customer GPU allocations, the captive workload that produces the optimization depth also consumes the spare capacity on which external customers depend. The same internal demand intensity that differentiates the platform thus becomes the mechanism by which it cannibalizes the external customer base that differentiation is meant to attract.
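The crowd-out dynamic follows from simple arithmetic against a fixed ceiling: if internal demand compounds while total GPU supply is externally capped, external capacity shrinks even though the platform itself grows. The ceiling, starting allocation, and growth rate below are illustrative assumptions.

```python
# Minimal sketch of the crowd-out dynamic: internal (Avride-style) demand
# grows against a fixed GPU ceiling, squeezing external capacity.
# All three parameters are assumptions for illustration only.

TOTAL_GPUS = 10_000        # externally set supply ceiling (assumed)
internal_demand = 2_000    # starting internal allocation (assumed)
GROWTH = 1.5               # internal demand growth per training cycle (assumed)

for cycle in range(4):
    external_capacity = max(TOTAL_GPUS - internal_demand, 0)
    print(f"cycle {cycle}: internal={internal_demand:.0f}, "
          f"external={external_capacity:.0f}")
    internal_demand *= GROWTH
```

Within a few compounding cycles the internal allocation exceeds the ceiling entirely, at which point external customers are not merely squeezed but displaced, which is the structural vulnerability described above.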