JetBrains has released Mellum2 as an open model designed for use in AI tools for software development. The model is published under the Apache 2.0 license, and the weights are available on Hugging Face. JetBrains emphasizes that Mellum2 was trained from scratch and is designed for working with text and code rather than for multimodal tasks: request routing, RAG pipelines, summarization, auxiliary agents, and private deployment in company infrastructure.
Mellum2 is built on a Mixture-of-Experts architecture. Of its 12 billion total parameters, only about 2.5 billion are activated per token, which should reduce computational costs and latency during inference. According to JetBrains, the model's benchmark performance is comparable to open-source models of similar size, while inference is more than twice as fast.
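To put the parameter figures in perspective, the following back-of-the-envelope sketch computes the share of parameters active per token and a rough compute ratio against a hypothetical dense 12-billion-parameter model. The "about 2 FLOPs per active parameter per token" rule of thumb is an assumption introduced here, not something JetBrains states, and it ignores routing overhead, attention cost, and memory bandwidth. The "more than twice as fast" figure is JetBrains' own claim and is not derived from this arithmetic.

```python
# Figures quoted in the article; everything else below is illustrative.
TOTAL_PARAMS = 12e9      # total parameters
ACTIVE_PARAMS = 2.5e9    # parameters activated per token (approximate)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def approx_flops_per_token(params: float) -> float:
    """Rule-of-thumb assumption: ~2 forward-pass FLOPs per active parameter."""
    return 2.0 * params


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
dense_flops = approx_flops_per_token(TOTAL_PARAMS)   # hypothetical dense 12B model
moe_flops = approx_flops_per_token(ACTIVE_PARAMS)    # MoE with ~2.5B active
ratio = dense_flops / moe_flops

print(f"active share per token: {fraction:.1%}")
print(f"estimated FLOPs per token, dense vs. MoE: {ratio:.1f}x")
```

Under these assumptions roughly a fifth of the parameters take part in each token. Real-world speedups depend on hardware, batching, and the serving stack, so the ratio is an upper-level estimate rather than a latency prediction.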
JetBrains describes Mellum2 as an evolution of the original Mellum model, which was initially created for code completion. The new version expands to a broader class of tasks that require working with both program code and natural language. The company positions Mellum2 as a "focused" model—not a replacement for large, general-purpose LLMs, but a fast, specialized component for frequent intermediate operations within complex AI systems.
The proposed use cases include classification and routing of requests between models and tools, compression and processing of context in RAG systems, preparation of data for agents, scheduling, validation of intermediate results, and local execution in environments where source code or internal data cannot be sent to external APIs.
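The routing use case can be illustrated with a minimal sketch: a small, fast classifier assigns each request a label, and a dispatcher forwards it to the matching tool or larger model. All names below (`make_router`, `keyword_classifier`, the route labels, and the handler strings) are hypothetical and do not come from JetBrains. The keyword classifier is only a stand-in for a call to a small model such as Mellum2, whose actual API is not shown here.

```python
from typing import Callable, Dict

# A handler takes the request text and returns a result (here, just a tag).
Handler = Callable[[str], str]


def make_router(
    classify: Callable[[str], str],
    handlers: Dict[str, Handler],
    default: str,
) -> Handler:
    """Build a router that asks `classify` for a label and dispatches on it."""

    def route(request: str) -> str:
        label = classify(request)
        # Unknown labels fall back to the default handler.
        handler = handlers.get(label, handlers[default])
        return handler(request)

    return route


def keyword_classifier(request: str) -> str:
    """Stand-in for a small-model call; returns a hypothetical route label."""
    text = request.lower()
    if "def " in text or "traceback" in text:
        return "code"
    if "summarize" in text:
        return "summary"
    return "general"


# Hypothetical downstream targets, represented as simple tags.
handlers: Dict[str, Handler] = {
    "code": lambda request: "code-tool",
    "summary": lambda request: "summarizer",
    "general": lambda request: "large-llm",
}

route = make_router(keyword_classifier, handlers, default="general")
print(route("Summarize this design doc"))
print(route("Why does def parse() raise?"))
```

Passing the classifier in as a function keeps the dispatcher independent of any particular model, so the stub could be swapped for a locally hosted model call without changing the routing logic.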
A Mellum 2 collection has been published on Hugging Face. It includes several model variants: Thinking, Instruct, Thinking-SFT, Instruct-SFT, Base, and Base-Pretrain. The models are distributed in the Safetensors format under the Apache 2.0 license.
Usage examples are provided for running the models via Transformers, vLLM, SGLang, and Docker Model Runner.
From a technical standpoint, the more interesting point is the niche JetBrains has chosen rather than the appearance of yet another open-source model. The company is focusing on low-cost, fast components that can be integrated directly into IDEs, internal assistants, corporate RAG systems, and agent pipelines, instead of competing with the largest general-purpose models. For developers and companies, this means the ability to run some AI logic locally or on their own servers while keeping control over code, data, and inference costs.
Source: linux.org.ru




