September 13, 2024

Understanding Grok: How it sets itself apart from ChatGPT


Elon Musk on Saturday announced the launch of a new generative AI large language model, Grok, which is said to be modelled after “The Hitchhiker’s Guide to the Galaxy” and is intended to “answer almost anything and, far harder, even suggest what questions to ask!”

Grok will be incorporated within X, formerly known as Twitter, and has been designed to answer questions with a sense of humour. In fact, the company advises against using Grok if you hate humour.

With access to real-time knowledge of the world through the X platform, Grok is even capable of answering “spicy questions” that most other AI models reject. This four-month-old generative AI model, which underwent two months of training, is still in the beta phase, and the company says it will keep improving it in the coming days.

Purpose of Grok AI

According to xAI, Grok has been created to assist humanity in understanding and gaining knowledge. It is powered by the Grok-1 LLM, which was developed over a period of four months. The prototype, Grok-0, was trained with 33 billion parameters and is said to approach the capabilities of Meta’s LLaMA 2, which has 70 billion parameters.

Grok capabilities

In terms of benchmarks, Grok-1 achieves 63.2 per cent on the HumanEval coding task and 73 per cent on MMLU. While it is still not as capable as something like GPT-4, xAI claims that, within a limited time, it has been able to significantly improve Grok-1’s performance over Grok-0.


According to the benchmark numbers, on GSM8k (Cobbe et al. 2021), a benchmark built around middle school math word problems, Grok-1 achieved 62.9 per cent, which is higher than GPT-3.5 and LLaMA 2 but lower than Palm 2, Claude 2, and GPT-4.

The same goes for other benchmarks: MMLU (Hendrycks et al. 2021), a multiple-choice question test; HumanEval (Chen et al. 2021), a Python code-generation test; and MATH (Hendrycks et al. 2021), a set of middle school and high school mathematics problems written in LaTeX.

Benchmark (shots)    Grok-0 (33B)  LLaMA 2 70B  Inflection-1  GPT-3.5  Grok-1  Palm 2  Claude 2     GPT-4
GSM8k (8-shot)       56.8%         56.8%        62.9%         57.1%    62.9%   80.7%   88.0%        92.0%
MMLU (5-shot)        65.7%         68.9%        72.7%         70.0%    73.0%   78.0%   75.0% + CoT  86.4%
HumanEval (0-shot)   39.7%         29.9%        35.4%         48.1%    63.2%   –       70%          67%
MATH (4-shot)        15.7%         13.5%        16.0%         23.5%    23.9%   34.6%   –            42.5%
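As a rough illustration of what the percentages above mean: each benchmark score is simply the share of test items the model answers correctly, reported as a percentage. The sketch below is a hypothetical, simplified scorer with made-up data; real evaluations run the model on thousands of problems and use task-specific matching rules (e.g. unit-test execution for HumanEval).

```python
def benchmark_accuracy(predictions, references):
    """Return the percentage of predictions that exactly match references."""
    if len(predictions) != len(references):
        raise ValueError("prediction/reference lists must be the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Toy example: 5 model answers, 3 of which match the reference answers
preds = ["72", "16", "9", "41", "5"]
refs  = ["72", "16", "8", "41", "7"]
print(f"{benchmark_accuracy(preds, refs):.1f}%")  # 60.0%
```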

Similarly, xAI also hand-graded Grok-1 on the 2023 Hungarian national high school finals in mathematics, which it cleared with a C grade (59 per cent), surpassing Claude 2 (55 per cent), while GPT-4 scored a B grade with 68 per cent.

These numbers indicate that Grok-1 is already more capable than OpenAI’s GPT-3.5, but not as capable as the latest model, GPT-4. The company also claims that Grok-1, despite being trained on less data, can surpass models that were trained on larger amounts of data and that demand greater computing capability.

Grok-1 was trained using a custom training and inference stack based on Kubernetes, Rust, and JAX. Even though Grok has real-time access to the latest information via the internet, the company cautions that it can still “generate false or contradictory information.”
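xAI’s actual stack is proprietary, but the core operation any such stack automates at scale is the gradient-descent training step. The sketch below shows a single stochastic gradient descent step for a one-parameter linear model in plain Python; frameworks like JAX compute these gradients automatically and distribute them across accelerators. The function names and numbers here are illustrative, not xAI’s.

```python
def sgd_step(w, x, y, lr=0.1):
    """One SGD step minimising the squared error (w*x - y)**2."""
    pred = w * x
    grad = 2 * (pred - y) * x   # derivative of the squared error w.r.t. w
    return w - lr * grad        # move the weight against the gradient

# Repeatedly fit the target y = 3*x at the single data point x = 1.0
w = 0.0
for _ in range(50):
    w = sgd_step(w, x=1.0, y=3.0)
print(round(w, 3))  # 3.0 — the weight converges to the target slope
```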

To mitigate these issues in future models, xAI is exploring human feedback, contextual understanding, multimodal capabilities, and adversarial robustness.

The beta version of Grok is currently available to a limited number of users in the US. In the coming days, it will be made available to X Premium+ subscribers; the subscription costs Rs 1,300 per month when purchased from a desktop.
