Mistral Introduces Pixtral 12B, a Game-Changer in Multimodal AI and Computer Vision
Mistral Releases Pixtral 12B
Mistral has officially released its first multimodal artificial intelligence (AI) model, dubbed Pixtral 12B. The company, known for its open-source large language models (LLMs), has made the model available on GitHub and Hugging Face. While Pixtral uses advanced computer vision technology, it is currently limited to processing images and answering queries about them; it does not generate images the way dedicated image-generation models do.
Exclusive Functionalities of Pixtral 12B
- Main features: 12 billion parameters and the ability to interpret images.
- Users can upload image files or URLs and query the model to identify and count objects.
- Retains the traditional text-based capabilities of the Nemo 12B model on which it is built.
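As a sketch of how such an image-plus-text query might be assembled, the snippet below builds a chat-style request body pairing a prompt with an image URL. It assumes an OpenAI-style multimodal chat schema, which Mistral's API and common local serving stacks also use; the model name and image URL are illustrative placeholders, not confirmed endpoints.

```python
import json

def build_image_query(prompt: str, image_url: str, model: str = "pixtral-12b") -> dict:
    """Build a chat-completions request body that pairs a text prompt
    with an image URL (schema and model name are assumptions)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Example: ask the model to count objects in an image, as described above.
payload = build_image_query(
    "How many cars are visible in this photo?",
    "https://example.com/street.jpg",
)
print(json.dumps(payload, indent=2))
```

The same payload shape works whether the request is sent to a hosted API or to a locally served copy of the weights; only the endpoint and authentication differ.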
The release was announced on Mistral's X account, where the company shared a magnet link for downloading the weights. The download totals 24GB, so running the model locally requires a powerful GPU or an NPU.
A New Benchmark in Multimodal AI
Recent benchmarks show that Pixtral 12B outperforms both Claude-3 and Phi-3 on multimodal tasks. It also scores well on the Massive Multitask Language Understanding (MMLU) benchmark.
According to TechCrunch, the model is released under the Apache 2.0 license, so it can be fine-tuned and used without restriction for personal or commercial purposes. Mistral aims to make Pixtral available on more platforms shortly.
This article was prepared using information from open sources in accordance with the principles of Ethical Policy. The editorial team is not responsible for absolute accuracy, as it relies on data from the sources referenced.