
Reddit · March 25, 2026 · ai
Liquid AI's LFM2-24B-A2B running at ~50 tokens/second in a web browser on WebGPU
The model (a MoE with 24B total and 2B active parameters) runs at ~50 tokens per second on my M4 Max, and the 8B-A1B variant runs at over 100 tokens per second on the same hardware.
Demo (+ source code): https://huggingface.co/spaces/LiquidAI/LFM2-MoE-WebGPU

Optimized ONNX models:
- https://huggingface.co/LiquidAI/LFM2-8B-A1B-ONNX
- https://huggingface.co/LiquidAI/LFM2-24B-A2B-ONNX
Source: Reddit · reddit.com