NVIDIA Introduces Powerful 8-Billion-Parameter AI Model Optimized for RTX Workstations

August 23, 2024 – On August 21st, NVIDIA announced the release of Mistral-NeMo-Minitron 8B, a compact language model notable for its high accuracy and computational efficiency. Designed to run on GPU-accelerated data centers, clouds, and workstations, the model joins NVIDIA's growing portfolio of AI offerings.

Following its collaboration with Mistral AI last month to release the open-source Mistral NeMo 12B model, NVIDIA has now introduced a smaller counterpart, Mistral-NeMo-Minitron 8B. With 8 billion parameters, the model is compact enough to run on workstations equipped with NVIDIA RTX graphics cards.

According to NVIDIA, Mistral-NeMo-Minitron 8B was created by width-pruning the larger Mistral NeMo 12B model, then applying knowledge distillation with a light retraining phase. The methods are detailed in the company's paper titled “Compact Language Models via Pruning and Knowledge Distillation.”

Pruning, as NVIDIA explains, removes the model weights that contribute least to accuracy, shrinking the neural network. Distillation then retrains the pruned model on a small dataset, with the original model serving as a teacher, to recover the accuracy lost during pruning.
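For readers curious how these two steps fit together, the following is a minimal PyTorch sketch of the general idea. The single linear layer, the layer sizes, the L2-norm importance score, and the training loop are all illustrative assumptions for the sake of a runnable toy example; they are not NVIDIA's actual architecture or pipeline, which uses activation-based importance estimation across a full transformer.

```python
import torch
import torch.nn.functional as F

# Toy "teacher" layer standing in for one hidden dimension of the
# larger model (hypothetical sizes, not the actual NeMo architecture).
teacher = torch.nn.Linear(in_features=16, out_features=8)

# --- Width pruning: rank output neurons by an importance score. ---
# The L2 norm of each neuron's weight row is a simple stand-in for
# the activation-based importance scores used in the paper.
importance = teacher.weight.norm(dim=1)            # one score per neuron
keep = importance.topk(k=4).indices.sort().values  # keep the top half

# The pruned "student" inherits the surviving neurons' weights.
student = torch.nn.Linear(in_features=16, out_features=4)
with torch.no_grad():
    student.weight.copy_(teacher.weight[keep])
    student.bias.copy_(teacher.bias[keep])

# --- Knowledge distillation: retrain the pruned model to mimic ---
# the teacher's output distribution on a small batch of data.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(32, 16)             # stand-in for a small dataset
    with torch.no_grad():
        # Use the teacher's surviving-neuron outputs as soft targets.
        teacher_logits = teacher(x)[:, keep]
    student_logits = student(x)
    # KL divergence pulls the student's distribution toward the teacher's.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key design point the sketch captures is that distillation trains against the teacher's soft output distribution rather than hard labels, which is why a relatively small dataset can restore much of the pruned model's accuracy.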

Remarkably for its size, Mistral-NeMo-Minitron 8B leads on nine popular language-model benchmarks. These span a wide range of tasks, including language understanding, common-sense reasoning, mathematical reasoning, summarization, coding, and the ability to generate truthful answers.
