Samsung Unveils 576-Core Mach-1 CPU for AI Servers with 15× H100 Inference Throughput

Samsung has begun mass production of the Mach-1, a 576-core server processor built on its 1.8 nm SF1.8 node that delivers 15 times the large-language-model inference throughput of an Nvidia H100 while consuming 40 percent less power. The chip ships to hyperscalers on December 12 under the Exynos AI Server brand, with Google Cloud and Naver already committed to 400,000 units for 2026 deployment. The processor marks Samsung’s decisive re-entry into high-performance computing after a decade focused on mobile silicon.

Mach-1 integrates 576 custom RISC-V vector cores clocked at 4.2 GHz with 1.5 terabytes of on-package HBM4 memory delivering 3.2 terabytes per second of bandwidth. Each core cluster includes a dedicated tensor unit; together they deliver 8.4 petaOPS of INT4 performance, enabling a single socket to serve 12,000 concurrent Llama-405B queries at 1,200 tokens per second. Peak package power stays under 720 watts thanks to backside power delivery and chiplet-level liquid-cooling loops.

Google Cloud selected Mach-1 for its next-generation TPU replacement, citing 62 percent lower total cost of ownership versus H100 clusters on 70-billion-parameter workloads. Naver plans to deploy 120,000 processors in its new Busan data center, replacing 480,000 H100 equivalents and cutting annual electricity costs by $340 million. Samsung quotes list pricing at $28,000 per processor, 30 percent below Nvidia’s H200 street price.
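The Naver figures can be sanity-checked with some quick arithmetic. This sketch uses only the numbers quoted above; the idea of measuring payback purely against electricity savings is our simplification, ignoring cooling, rack space, and software costs:

```python
# Back-of-envelope check of the Naver deployment figures quoted above.
mach1_units = 120_000
h100_equivalents_replaced = 480_000
mach1_list_price = 28_000           # USD per processor, Samsung's quoted list price
annual_electricity_savings = 340e6  # USD per year, per the article

consolidation_ratio = h100_equivalents_replaced / mach1_units
capex = mach1_units * mach1_list_price

print(f"consolidation ratio: {consolidation_ratio:.0f}:1")  # 4:1
print(f"capex at list price: ${capex / 1e9:.2f}B")          # $3.36B
print(f"years of power savings to offset capex: {capex / annual_electricity_savings:.1f}")
```

At list pricing, the deployment costs about $3.36 billion, so the quoted power savings alone would take roughly a decade to pay it back; the 62 percent TCO advantage Google cites presumably rests mostly on the 4:1 hardware consolidation rather than electricity.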

The design leverages Samsung’s full-stack control of 1.8 nm GAA transistors, HBM4 stacking, and 2.5D interposer technology developed jointly with SK hynix. The on-chip mesh delivers 48 terabytes per second of cross-chiplet bandwidth, eliminating PCIe bottlenecks. End-to-end memory latency measures 38 nanoseconds, 55 percent lower than discrete H100-plus-HBM3E configurations.
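The two mesh figures combine into a bandwidth-delay product, i.e. how much data is in flight between chiplets at any instant. This is our arithmetic on the quoted numbers, not a figure Samsung publishes:

```python
# Bandwidth-delay product of the cross-chiplet mesh, from the figures above.
mesh_bandwidth = 48e12   # bytes per second, cross-chiplet
memory_latency = 38e-9   # seconds, end-to-end

in_flight_bytes = mesh_bandwidth * memory_latency
print(f"{in_flight_bytes / 1e6:.2f} MB in flight")  # 1.82 MB
```

Keeping roughly 1.8 MB of outstanding transfers per direction is what lets the mesh hide that 38 ns latency behind sustained streaming traffic.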

Samsung validates performance in the MLPerf Inference 4.1 closed division, scoring 18.7× higher throughput than the Nvidia H200 SXM on Llama-405B in the offline scenario. Power efficiency reaches 26.4 tokens per joule, beating the H100 by 14.8× and Trainium2 by 3.1×. The company ships reference motherboards supporting eight Mach-1 sockets with 12 terabytes of HBM4 and 96 terabytes per second of NVLink-C2C-equivalent interconnect.
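The efficiency figure can be restated in more familiar operational units. This is our arithmetic, under the assumption (ours, not Samsung's) that the chip sustains its 720-watt peak during inference:

```python
# Converting the quoted MLPerf efficiency into throughput and energy cost.
tokens_per_joule = 26.4  # MLPerf-derived efficiency quoted above
package_power_w = 720    # peak package power from the spec sheet

# Sustained throughput implied if the package runs at peak power:
tokens_per_second = tokens_per_joule * package_power_w
# Energy to generate one million tokens, in watt-hours:
wh_per_million_tokens = 1e6 / tokens_per_joule / 3600

print(f"{tokens_per_second:,.0f} tokens/s at peak power")   # 19,008 tokens/s
print(f"{wh_per_million_tokens:.1f} Wh per million tokens") # 10.5 Wh
```

At roughly 10.5 watt-hours per million tokens, electricity becomes a rounding error per query, which is consistent with the article's framing of power savings mattering mainly at Naver-scale fleet sizes.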

Software support includes full PyTorch 2.5 and JAX compatibility through Samsung’s OneAPI fork, with compiler optimizations yielding 97 percent of peak theoretical performance on transformer workloads. The chip runs unmodified Linux 6.11 kernels and supports standard container runtimes. Samsung releases an INT4 quantization toolkit that converts existing GGUF models to Mach-1’s native format with zero accuracy loss.
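Samsung has not published the toolkit's internals, so as a hypothetical illustration only, the core transform in any GGUF-to-INT4 converter looks something like symmetric quantization. This sketch is ours, not Samsung's; production converters use per-group scales and calibration data to push accuracy loss toward zero:

```python
# Minimal sketch of symmetric per-tensor INT4 quantization -- an illustration
# of the transform class, NOT Samsung's actual toolkit. Real converters use
# per-group scales and calibration to minimize accuracy loss.

def quantize_int4(weights):
    """Map floats to signed 4-bit integers in [-7, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid zero scale
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from INT4 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.9, 0.33, 0.7, -0.05]
q, s = quantize_int4(weights)
restored = dequantize_int4(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                       # [1, -7, 3, 5, 0]
print(f"max error: {max_err:.4f}")
```

The round-trip error is bounded by half the scale step, which is why narrow per-group scales (rather than the single per-tensor scale shown here) are what make near-lossless INT4 plausible.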

Production occurs at Samsung’s new Pyeongtaek Line S6, with initial capacity of 60,000 wafers per month starting in January. Yields on the 1.8 nm node exceed 82 percent on high-performance libraries, allowing Samsung to price aggressively while maintaining 64 percent gross margins. The company plans a volume ramp to 180,000 wafers monthly by Q4 2026.

The launch ends Nvidia’s monopoly in AI inference acceleration outside hyperscaler custom silicon. Analysts project Samsung capturing 18 percent of the 2027 server CPU market, displacing $42 billion in annual Nvidia revenue. Shares in Seoul rise 14 percent, pushing Samsung Electronics market capitalization above $620 billion for the first time since 2021.

Samsung confirms Mach-2 development on SF1.4 (1.4 nm) with 1,024 cores and HBM5 integration scheduled for 2027. The roadmap includes direct optical compute-in-memory links targeting 100× inference efficiency by 2029. The Mach-1 debut re-establishes Samsung as a credible third force in the post-GPU era of specialized AI silicon.
