Microsoft Launches Maia 200: Custom AI Chip Revolutionizes Azure Inference for LLMs
Saturday, January 31, 2026 · 4 min read

Microsoft has introduced the Maia 200, an internally developed AI accelerator engineered specifically for inference in Azure datacenters. The chip targets token generation for large language models (LLMs) and other reasoning workloads, combining narrow-precision compute, a tiered on-chip memory architecture, and an Ethernet-based scale-up interconnect.

The decision to build a dedicated inference chip stems from the distinct demands that training and inference place on hardware. Training requires extensive all-to-all communication and long job durations, while inference prioritizes tokens per second, low latency, and cost efficiency. Microsoft positions the Maia 200 as its most performant inference system, claiming roughly 30 percent better performance per dollar than the latest hardware currently deployed in its cloud infrastructure. The company also claims FP4 performance three times that of Amazon's third-generation Trainium and higher FP8 performance than Google's TPU v7 at the accelerator level.

The Maia 200 will be deployed alongside the rest of Azure's hardware fleet. It is slated to support a wide range of models, including OpenAI's upcoming GPT 5.2 models, and will power workloads in Microsoft Foundry and Microsoft 365 Copilot. Microsoft's internal Superintelligence team also plans to use the chip for synthetic data generation and reinforcement learning to refine in-house models.

The Maia 200 silicon is manufactured on TSMC's 3-nanometer process and integrates more than 140 billion transistors. Its compute pipeline natively supports FP8 and FP4 tensor operations: each chip delivers more than 10 petaFLOPS in FP4 and more than 5 petaFLOPS in FP8 within a 750W system-on-chip thermal design power envelope.
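
For a rough sense of scale, those peak figures work out to roughly 13 FP4 teraFLOPS per watt at the stated TDP. The sketch below reproduces that back-of-the-envelope arithmetic using only the numbers quoted above; sustained efficiency in production will depend on workload and utilization.

```python
# Back-of-the-envelope efficiency from the reported peak figures.
# Real sustained efficiency depends on utilization and workload mix.
FP4_PFLOPS = 10    # >10 petaFLOPS in FP4 (reported peak)
FP8_PFLOPS = 5     # >5 petaFLOPS in FP8 (reported peak)
TDP_WATTS = 750    # SoC thermal design power (reported)

fp4_tflops_per_watt = FP4_PFLOPS * 1_000 / TDP_WATTS   # ~13.3 TFLOPS/W
fp8_tflops_per_watt = FP8_PFLOPS * 1_000 / TDP_WATTS   # ~6.7 TFLOPS/W

print(f"FP4: ~{fp4_tflops_per_watt:.1f} TFLOPS/W")
print(f"FP8: ~{fp8_tflops_per_watt:.1f} TFLOPS/W")
```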

Memory is split between high-bandwidth memory (HBM) and on-die static random-access memory (SRAM). The Maia 200 pairs 216 GB of HBM3e, providing approximately 7 terabytes per second of bandwidth, with 272 MB of on-die SRAM. The SRAM is organized into tile-level and cluster-level tiers and is fully software-managed, so compilers and runtimes can place working sets precisely and keep critical attention and General Matrix Multiply (GEMM) kernels close to the compute units.
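
One way to read these memory figures is through the usual bandwidth-bound decode model: generating a token requires streaming the model weights from HBM at least once, so HBM bandwidth caps single-sequence throughput. The sketch below applies that rule of thumb to a hypothetical 70-billion-parameter model in FP4; the model size is an illustrative assumption, not a figure from the announcement.

```python
# Decode-throughput ceiling for a bandwidth-bound LLM on a single Maia 200.
# The 70B model size and FP4 weight format are illustrative assumptions;
# KV-cache traffic, batching, and interconnect effects are ignored.
HBM_BANDWIDTH_BPS = 7e12   # ~7 TB/s reported HBM3e bandwidth
PARAMS = 70e9              # hypothetical model size
BYTES_PER_PARAM = 0.5      # FP4 weights

weight_bytes = PARAMS * BYTES_PER_PARAM          # ~35 GB streamed per token
tokens_per_second = HBM_BANDWIDTH_BPS / weight_bytes

print(f"Weights streamed per token: {weight_bytes / 1e9:.0f} GB")
print(f"Throughput ceiling: ~{tokens_per_second:.0f} tokens/s per sequence")
```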

The microarchitecture of the Maia 200 is characterized by its hierarchical, tile-based design. The fundamental computational and storage unit is the tile, each containing a Tile Tensor Unit for high-throughput matrix operations, a Tile Vector Processor for programmable SIMD tasks, and dedicated Tile SRAM. Data movement within the tile is managed by Tile DMA engines, preventing computational stalls. A Tile Control Processor orchestrates the sequence of operations. Multiple tiles form a cluster, sharing a larger, multi-banked Cluster SRAM. Cluster-level DMA engines facilitate data transfer between Cluster SRAM and the co-packaged HBM stacks. A cluster core manages multi-tile execution and employs redundancy schemes to enhance yield without altering the programming model.
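
To make the hierarchy easier to picture, the sketch below models it as a simple data structure. Component names follow the description above; the tile count and per-tier SRAM sizes are placeholders, since the announcement does not give a per-tile breakdown.

```python
# Illustrative model of the tile/cluster hierarchy described above.
# Component names follow the article; counts and per-tier sizes are
# placeholders, not published specifications.
from dataclasses import dataclass, field

@dataclass
class Tile:
    tensor_unit: str = "Tile Tensor Unit"            # high-throughput matrix ops
    vector_processor: str = "Tile Vector Processor"  # programmable SIMD
    control: str = "Tile Control Processor"          # sequences operations
    dma_engines: int = 2                             # keep data moving, avoid stalls
    sram_kb: int = 0                                 # per-tile SRAM (not published)

@dataclass
class Cluster:
    tiles: list = field(default_factory=list)
    cluster_sram_mb: int = 0   # multi-banked, shared by all tiles (not published)
    cluster_dma: int = 2       # shuttles data between Cluster SRAM and HBM

# Placeholder: eight tiles per cluster, purely for illustration.
cluster = Cluster(tiles=[Tile() for _ in range(8)])
print(f"{len(cluster.tiles)} tiles sharing one Cluster SRAM")
```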

Efficient data movement is paramount for inference performance. The Maia 200 employs a custom Network on Chip (NoC) alongside its hierarchy of DMA engines. This NoC interconnects tiles, clusters, memory controllers, and I/O units, featuring separate planes for large tensor traffic and smaller control messages to prevent bottlenecks. Beyond the chip, the Maia 200 integrates its own network interface controller (NIC) and an Ethernet-based scale-up network utilizing the AI Transport Layer protocol. This on-die NIC provides approximately 1.4 terabytes per second of bandwidth in each direction, capable of scaling up to 6,144 accelerators across a two-tier domain. Within each tray, four Maia accelerators form a Fully Connected Quad (FCQ), leveraging direct, non-switched links to optimize tensor parallel traffic and reduce reliance on external switches.
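
Those interconnect numbers can also be turned into a rough latency estimate. The sketch below computes the ideal transfer time for a payload over the on-die NIC at the reported per-direction bandwidth; the payload size is a hypothetical example, and protocol overhead, congestion, and propagation latency are ignored.

```python
# Ideal transfer time over the on-die NIC at the reported bandwidth.
# Payload sizes are hypothetical; overhead and congestion are ignored.
NIC_BANDWIDTH_BPS = 1.4e12   # ~1.4 TB/s per direction (reported)

def transfer_time_us(payload_bytes: float,
                     bandwidth_bps: float = NIC_BANDWIDTH_BPS) -> float:
    """Microseconds to move a payload at the given link bandwidth."""
    return payload_bytes / bandwidth_bps * 1e6

# Example: a 256 MB activation shard exchanged between FCQ peers.
print(f"~{transfer_time_us(256e6):.0f} us")   # ~183 us
```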

At the system level, the Maia 200 adheres to Azure's established rack, power, and mechanical standards. It supports both air-cooled and liquid-cooled configurations, featuring a second-generation closed-loop liquid cooling Heat Exchanger Unit for high-density rack deployments. This flexibility enables seamless integration and mixed deployments of GPUs and Maia accelerators within the same datacenter footprint. The accelerator is deeply integrated with the Azure control plane, leveraging existing workflows for firmware management, health monitoring, and telemetry, ensuring smooth fleet-wide rollouts and maintenance without disrupting ongoing AI workloads.

This article is a rewritten summary based on publicly available reporting. For the original story, visit the source.

Source: MarkTechPost