Topic

#Inference

8 articles on Inference — news, releases, guides and analysis from the SourceFeed engine.

OpenAI Jalapeno and the Shift to Custom Inference Silicon
Article · 2d ago · 7

Custom ASICs are replacing general-purpose GPUs for running large language models, as providers look to survive the crushing cost of scale.

Priya Nair
The LLM Cost Cliff Your Budget Isn't Ready For

Article · 3d ago · 1
OpenAI's Jalapeño Chip Is a Bet on Inference Economics

News · 5d ago · 2
How OpenAI's Jalapeño Chip Changes Production LLM Serving

Article · 5d ago · 1
Serve an Open-Source LLM at Scale with vLLM on a Rented GPU Instance

Tutorial · 6d ago · 0
Running 70B Models on 4GB VRAM: The AirLLM Layer-Swap Hack

Article · 1w ago · 1
Unified x86 AI Acceleration: Inside the New ACE Specification

Article · 1w ago · 2
Xiaomi's MiMo-V2.5-Pro-UltraSpeed Pushes a 1T Model Past 1000 Tokens/Sec on Commodity GPUs

News · 3w ago · 5