Liquid AI has introduced LFM2.5-1.2B-Thinking, a 1.2 billion-parameter reasoning model notable for running entirely on-device, fitting within roughly 900 MB on contemporary smartphones. The release marks a meaningful shift: reasoning workloads that previously demanded data center resources can now run offline on everyday consumer hardware, with an emphasis on structured reasoning, tool integration, and mathematical problem-solving rather than general conversational ability.
The LFM2.5 Family and Core Specifications
LFM2.5-1.2B-Thinking is an integral component of the LFM2.5 series of Liquid Foundation Models. This family builds upon the earlier LFM2 architecture, incorporating enhanced pre-training and a multi-stage reinforcement learning approach optimized for deployment at the edge.
The model operates exclusively with text and serves general-purpose applications, featuring the following configuration:
- 1.17 billion total parameters, placing it in the 1.2B class.
- 16 layers: 10 double-gated LIV convolution blocks and 6 grouped-query attention (GQA) blocks.
- A pre-training budget of 28 trillion tokens.
- A 32,768-token context length.
- A 65,536-entry vocabulary.
- Support for eight languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
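For readers who want to try the model locally, the sketch below loads it with Hugging Face transformers and runs a single chat turn. The repository id and chat format are assumptions based on the family name and should be verified against Liquid AI's official Hugging Face listing.

```python
# Minimal sketch: loading the model with Hugging Face transformers.
# The repository id below is an assumption based on the family name;
# verify it against Liquid AI's official Hugging Face listing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Thinking"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

messages = [{"role": "user", "content": "What is 17 * 24? Show your steps."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```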
Enhanced Reasoning Through Thinking Traces
The 'Thinking' variant of the model has been specifically engineered for advanced reasoning processes. During inference, it generates internal 'thinking traces' – a sequence of intermediate steps – before producing its final output. These traces enable the model to formulate plans for tool invocations, validate partial outcomes, and meticulously execute multi-step instructions.
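In practice, an application usually needs to separate the thinking trace from the final answer before showing output to a user. The snippet below is a minimal sketch that assumes the trace is delimited by <think>...</think> tags, a common convention for reasoning models; the exact delimiters should be confirmed against the LFM2.5 model card.

```python
import re

def split_thinking_trace(text: str) -> tuple[str, str]:
    """Separate a reasoning trace from the final answer.

    Assumes the trace is wrapped in <think>...</think> tags, a common
    convention for reasoning models; confirm the exact delimiters
    against the LFM2.5 model card.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), text[match.end():].strip()
    return "", text.strip()

raw = "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408</think>The answer is 408."
trace, answer = split_thinking_trace(raw)
print("trace:", trace)
print("answer:", answer)
```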
Liquid AI suggests leveraging this model for agentic applications, data extraction pipelines, and retrieval-augmented generation (RAG) workflows where transparent reasoning and verifiable intermediate actions are critical. Essentially, LFM2.5-1.2B-Thinking can function as the strategic core for agents and specialized tools, complementing other models for broader general knowledge or computationally intensive code tasks.
Benchmark Performance Against 1B-Class Models
Liquid AI's evaluations pit LFM2.5-1.2B-Thinking against other models in the 1 billion parameter category across a range of reasoning and instruction benchmarks. Significant improvements are observed when compared to LFM2.5-1.2B-Instruct:
- Math reasoning scores on MATH 500 increased from approximately 63 to 88.
- Instruction following capabilities on Multi IF rose from about 61 to 69.
- Tool use performance on BFCLv3 improved from around 49 to 57.
The model also demonstrates competitive performance against Qwen3-1.7B in its thinking mode on most reasoning benchmarks, while requiring roughly 40 percent fewer parameters and generating fewer output tokens on average. Furthermore, it surpasses other 1B-class baselines, including Granite-4.0-H-1B, Granite-4.0-1B, Gemma-3-1B-IT, and Llama-3.2-1B Instruct, across various tasks.
Mitigating 'Doom Looping' with Advanced Training
Reasoning models commonly encounter 'doom looping,' a phenomenon where the model repeatedly outputs fragments of its chain of thought instead of completing a response. LFM2.5-1.2B-Thinking employs a sophisticated multi-stage training pipeline to effectively mitigate this issue.
The process begins with mid-training that incorporates reasoning traces, teaching the model to adopt a 'reason first, then answer' methodology. This is followed by supervised fine-tuning on synthetic chains to refine chain-of-thought generation. Preference alignment and reinforcement learning with verifiable rewards (RLVR) are then applied. During preference alignment, the team generates multiple candidate responses and uses an LLM judge to label preferred and rejected outputs, explicitly flagging looping instances. RLVR then introduces an n-gram repetition penalty early in training. Together, these stages reduce the doom loop rate from 15.74 percent during mid-training to 0.36 percent after RLVR on a set of representative prompts.
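To make the repetition signal concrete, here is a minimal sketch of how an n-gram repetition rate can be computed over generated token ids and subtracted from an RL reward as a penalty. The exact penalty used for LFM2.5 is not public, so treat this as illustrative only.

```python
from collections import Counter

def ngram_repetition_rate(token_ids: list[int], n: int = 4) -> float:
    """Fraction of n-grams in a sequence that repeat earlier ones.

    Illustrative only: a signal like this can be subtracted from an RL
    reward to discourage doom looping; the exact penalty used for
    LFM2.5 is not public.
    """
    if len(token_ids) < n:
        return 0.0
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())  # occurrences beyond the first
    return repeated / len(ngrams)

# A looping sequence scores high; a varied one scores zero.
print(ngram_repetition_rate([1, 2, 3, 4] * 8))  # ~0.86
print(ngram_repetition_rate(list(range(32))))   # 0.0
```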
The outcome is a compact reasoning model capable of generating coherent thinking traces without succumbing to lengthy, repetitive outputs, which is crucial for interactive agents and on-device user experiences.
Efficient Inference and Minimal Hardware Footprint
A primary design objective for LFM2.5-1.2B-Thinking was fast inference with a minimal memory footprint on both CPUs and NPUs. The model decodes approximately 239 tokens per second on an AMD CPU and about 82 tokens per second on a mobile NPU. Crucially, it runs in less than 1 GB of memory and is supported from launch in popular runtimes such as llama.cpp, MLX, and vLLM.
These performance figures confirm the model's suitability for mobile and embedded devices, providing practical throughput even with extended contexts while maintaining a sub-1 GB memory profile.
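As a concrete example of the on-device path, the sketch below runs a GGUF build of the model through llama-cpp-python on a CPU. The file name and thread settings are hypothetical; download the actual quantized weights from the model's Hugging Face page and tune the parameters to the device.

```python
# Minimal sketch: CPU inference with llama-cpp-python on a GGUF build.
# The GGUF file name is hypothetical; download the actual quantized
# weights from the model's Hugging Face page.
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q4_K_M.gguf",  # hypothetical file name
    n_ctx=32768,    # the model's full context window
    n_threads=8,    # tune to the device's CPU core count
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "List the steps to extract invoice dates from a PDF."}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```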
Key Highlights and Deployment Options
LFM2.5-1.2B-Thinking offers a compelling blend of features for edge AI:
- A 1.17 billion parameter reasoning model with a 32,768 context length, operating under 1 GB on phones and laptops.
- Optimization for explicit thinking traces, agentic workflows, data extraction, and RAG.
- Achieves strong scores for a 1B-class model, including 87.96 on MATH 500 and 85.60 on GSM8K, demonstrating competitive performance with Qwen3 1.7B in thinking mode using fewer parameters.
- Its advanced training pipeline significantly reduces doom loops from 15.74 percent to 0.36 percent through mid-training, supervised fine-tuning, preference alignment, and RLVR with n-gram penalties.
- The model runs efficiently on AMD and Qualcomm NPUs and CPUs, is compatible with runtimes such as llama.cpp, FastFlowLM, and NexaML, and is available in GGUF, ONNX, and MLX formats for easy on-device deployment from Hugging Face.
For those seeking to access or host the model, various platforms and repositories are available:
Cloud & API Providers:
- OpenRouter
- Liquid AI Playground
- LEAP (Liquid's Edge AI Platform)
Model Repositories (Self-Hosting):
- Hugging Face hosts the model weights in multiple formats, facilitating local or private infrastructure deployment.
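For self-hosting, a minimal sketch using huggingface_hub can fetch the weights to local storage; the repository id below is an assumption and should be confirmed on Hugging Face before use.

```python
# Minimal sketch: fetching the weights for self-hosting with huggingface_hub.
# The repository id is an assumption; confirm it on Hugging Face first.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="LiquidAI/LFM2.5-1.2B-Thinking",  # assumed repo id
    allow_patterns=["*.gguf"],                # e.g. restrict to GGUF builds
)
print("Downloaded to:", local_dir)
```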
This article is a rewritten summary based on publicly available reporting. For the original story, visit the source.
Source: MarkTechPost