xAI Debuts Grok-1.5 with Major Upgrades in Long Context Understanding and Reasoning Abilities

March 29, 2024 – In a move that echoes its founder Elon Musk’s penchant for grand announcements with minimal fuss, the artificial intelligence startup xAI has unveiled its latest language model, Grok-1.5. The official announcement was characteristically succinct, merely providing a link to the new model without any accompanying text, reflecting a “less is more” approach.

The upgrades in Grok-1.5 center on two areas: long context understanding and stronger reasoning, particularly in programming and mathematics.

Firstly, in terms of long context understanding, Grok-1.5 has expanded its context window to 128K tokens, sixteen times its predecessor’s 8,192 tokens. This brings it on par with GPT-4, another leading language model, in context length. The larger window means that Grok-1.5 can process longer and more complex prompts while still following instructions accurately.
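
A quick arithmetic check of the context-window figures, using only the numbers quoted above: sixteen times 8,192 is 131,072 tokens, which is 128 × 1,024. The "128K" label therefore reads most naturally in binary thousands, and a round 128,000 would be a slight understatement. The variable names below are illustrative.

```python
# Context-window arithmetic, using only the figures quoted in the article.
PREVIOUS_CONTEXT_TOKENS = 8_192  # predecessor's context window
SCALE_FACTOR = 16                # the "sixteen-fold" increase

new_context_tokens = PREVIOUS_CONTEXT_TOKENS * SCALE_FACTOR

print(new_context_tokens)          # exact token count implied by a 16x increase
print(new_context_tokens // 1024)  # the same count expressed in units of 1,024 tokens
print(128_000 / PREVIOUS_CONTEXT_TOKENS)  # ratio if the window were exactly 128,000
```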

Moreover, in the Needle In A Haystack (NIAH) evaluation, which hides a short piece of text inside a long document and asks the model to retrieve it, Grok-1.5 demonstrated impressive retrieval capabilities. It located the embedded text at context lengths of up to 128K tokens, achieving perfect retrieval results.
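
The general shape of a needle-in-a-haystack test can be sketched in a few lines of Python. This is only an illustration of the idea, not xAI's evaluation harness, whose details the announcement does not describe. The filler text, the needle sentence, and the function names are all hypothetical, context length is approximated by word count rather than real tokens, and `toy_model` is a trivial string search standing in for an actual language model.

```python
from typing import Callable, List, Tuple

# All strings below are made up for illustration.
FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret passphrase is 'blue-giraffe-42'."
EXPECTED = "blue-giraffe-42"


def build_haystack(total_words: int, depth: float) -> str:
    """Build roughly `total_words` words of filler with NEEDLE inserted
    at relative position `depth` (0.0 = start, 1.0 = end)."""
    n_sentences = max(1, total_words // len(FILLER.split()))
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), NEEDLE)
    return " ".join(sentences)


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for a language model: it ignores the question
    and simply scans the context for the quoted passphrase."""
    marker = "The secret passphrase is '"
    start = context.find(marker)
    if start == -1:
        return ""
    start += len(marker)
    return context[start:context.find("'", start)]


def run_niah(
    model: Callable[[str, str], str],
    lengths: List[int],
    depths: List[float],
) -> List[Tuple[int, float, bool]]:
    """Query the model once per (length, depth) pair and record whether
    the expected answer appears in its reply."""
    results = []
    for length in lengths:
        for depth in depths:
            context = build_haystack(length, depth)
            answer = model(context, "What is the secret passphrase?")
            results.append((length, depth, EXPECTED in answer))
    return results


results = run_niah(
    toy_model,
    lengths=[1_000, 8_000, 32_000],
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
)
accuracy = sum(ok for _, _, ok in results) / len(results)
print(f"retrieval accuracy: {accuracy:.0%}")
```

In a real evaluation, `toy_model` would be replaced by a call to the model under test, and "perfect retrieval" would mean every cell of the length-by-depth grid comes back correct.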

Secondly, Grok-1.5 has substantially improved its reasoning abilities, particularly on programming and math-related tasks. In this regard it has surpassed its predecessor, Grok-1, as well as the competing models Mistral Large and Claude 2.

Specifically, in math-related benchmarks, Grok-1.5 scored 50.6% on the MATH test, outperforming the “medium-sized” Claude 3 Sonnet. It also achieved a score of 90% on the GSM8K benchmark. Together, the two benchmarks cover a wide range of math problems, from grade school level to high school competitions.

In programming, Grok-1.5 achieved a score of 74.1% on the HumanEval benchmark, surpassing Claude 3 Sonnet, Gemini Pro 1.5, and even GPT-4. It was second only to the “large-sized” Claude 3 Opus in this category.

Overall, Grok-1.5’s gains are concentrated in long context understanding and in programming and mathematical reasoning, where it now competes with the leading models on several benchmarks. With these enhancements, xAI continues to push the boundaries of what is possible in the field of artificial intelligence.

