Topic

#Quantization

3 articles on Quantization — news, releases, guides and analysis from the SourceFeed engine.

Quantize and Run Llama 3.2 on Apple Silicon with llama.cpp

Build llama.cpp with Metal, convert Llama 3.2 3B to Q4_K_M GGUF, and benchmark real prompt-processing and generation throughput on your specific chip.

Mariana Souza

Demystifying Integer Quantization for Neural Network Inference

A low-level look at the mathematics and hardware mechanics that shrink massive models without destroying accuracy.

Article · 1w ago

Xiaomi's MiMo-V2.5-Pro-UltraSpeed Pushes a 1T Model Past 1000 Tokens/Sec on Commodity GPUs

Through FP4 quantization, block-level speculative decoding, and the TileRT system stack, Xiaomi claims trillion-parameter decode speeds normally reserved for custom silicon — on a single 8-GPU node.

News · 3w ago