Meta launches Llama 3.1 405B, its largest AI model yet

Meta has taken a new step in the AI arena with the launch of its largest AI model yet, Llama 3.1 405B. With 405 billion parameters, the model positions itself as one of the most capable openly available models today.

Llama 3.1 405B: Innovation and Capability

A model's parameter count matters because it roughly tracks its ability to solve complex problems. With 405 billion parameters, Llama 3.1 405B ranks among the most powerful open models released in recent years. Trained on 16,000 Nvidia H100 GPUs, it uses advanced development and training techniques that, according to Meta, make it competitive with proprietary models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.

Meta has made Llama 3.1 405B available for both download and use on cloud platforms such as AWS, Azure and Google Cloud. In addition, this model is already used by WhatsApp and Meta.ai, providing chatbot experiences for users in the United States.

Capabilities and applications

Llama 3.1 405B is large not only in parameter count but also in functionality. The model can perform a wide variety of tasks, from coding and mathematical problem solving to document summarization in eight languages, including English, Spanish, German and French. Although it is text-only and cannot analyze images, its versatility with text makes it suitable for a broad range of applications, such as analyzing PDF files and spreadsheets.
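As an illustration, a summarization request for the model might be assembled as below. This is a minimal sketch: the eight-language list and the model name string are assumptions based on Meta's published model card, and the OpenAI-style message format is the one many hosting providers expose, not something specific to this article.

```python
# Sketch: building a summarization request for Llama 3.1 405B.
# The language list and model identifier are assumptions to verify
# against Meta's model card and your hosting provider's docs.
SUPPORTED_LANGUAGES = {"English", "Spanish", "German", "French",
                       "Italian", "Portuguese", "Hindi", "Thai"}

def summarization_request(document: str, language: str) -> dict:
    """Build an OpenAI-style chat request asking for a summary."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"{language} is not among the model's supported languages")
    return {
        "model": "meta-llama/Llama-3.1-405B-Instruct",  # exact name varies by provider
        "messages": [
            {"role": "system",
             "content": f"Summarize the user's document in {language}."},
            {"role": "user", "content": document},
        ],
    }

req = summarization_request("Long quarterly report text...", "Spanish")
print(req["messages"][0]["content"])  # → Summarize the user's document in Spanish.
```

The request dictionary would then be sent to whichever endpoint (AWS, Azure, Google Cloud or a local deployment) serves the model.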

Meta is exploring multimodality, i.e. the ability to recognize and generate content in different formats, such as images and videos. Although these models are not yet ready for launch, the company continues to move in this direction.

Data and training

To train Llama 3.1 405B, Meta used a dataset of roughly 15 trillion tokens; at the commonly cited ratio of about 0.75 words per token, that corresponds to on the order of 11 trillion words. Although this is not a new dataset, Meta has refined its curation and filtering processes to improve the quality of the data. Synthetic data generated by other AI models was also used to fine-tune Llama 3.1 405B, a common practice among leading AI vendors.

Meta says it has carefully balanced the training data, although it has not disclosed specifics about its origin. Transparency around training data is a sensitive subject, since disclosure can raise intellectual property questions and invite lawsuits.

Context window and tools

A notable feature of Llama 3.1 405B is its large context window of 128,000 tokens, allowing the model to handle long and complex text more efficiently. This is especially useful for tasks such as summarizing long texts and improving coherence in chatbot dialogues.
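To make the figure concrete, here is a rough sketch of checking whether a prompt fits the 128,000-token window. The 4-characters-per-token ratio is a common heuristic for English text, not the model's actual tokenizer, so real use would count tokens with the model's own tokenizer.

```python
# Rough sketch: will a long document fit in Llama 3.1's context window?
# CHARS_PER_TOKEN is a heuristic average, not an exact tokenizer.
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # rough average for English text

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(prompt: str, reserved_for_output: int = 4_000) -> bool:
    """Check whether a prompt leaves room for the model's reply."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("word " * 100_000))  # ~125k estimated tokens → False
```

When a document exceeds the window, the usual approach is to split it into chunks that each pass this check and summarize them separately.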

Meta has also released updated versions of its smaller models, Llama 3.1 8B and Llama 3.1 70B, which share this large context window. These models are designed for more general applications, such as code generation and interaction with chatbots.

In addition, all Llama 3.1 models can call third-party tools and applications to perform specific tasks. By default, they are trained to use Brave Search for queries about recent events, the Wolfram Alpha API for math and science questions, and a Python interpreter to validate code.
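The built-in tool setup can be sketched as a system prompt in the format Meta documents for the 3.1 models; treat the exact strings (the "Environment: ipython" header, the tool names) as assumptions to check against the official prompt guide.

```python
# Sketch: assembling a system prompt that enables Llama 3.1's built-in
# tools. The header lines follow Meta's documented prompt format; verify
# the exact strings against the official Llama 3.1 prompt guide.
def build_system_prompt(use_code_interpreter: bool = True,
                        tools: tuple = ("brave_search", "wolfram_alpha")) -> str:
    lines = []
    if use_code_interpreter:
        lines.append("Environment: ipython")  # enables the Python interpreter tool
    if tools:
        lines.append("Tools: " + ", ".join(tools))
    lines.append("You are a helpful assistant.")
    return "\n".join(lines)

print(build_system_prompt())
```

With this prompt in place, the model emits structured tool calls that the serving layer executes and feeds back, rather than answering from memory alone.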

Ecosystem and licenses

Meta is encouraging the use of synthetic data by updating the Llama license, allowing developers to use the outputs of Llama 3.1 models to develop their own generative AI models. However, there are restrictions for developers of applications with more than 700 million monthly active users, who must apply for a special license.

To support the Llama ecosystem, Meta has launched new security tools and a “reference system” to facilitate its integration into applications. The company is also working on Llama Stack, an API for tools to tune models, generate synthetic data and build “agent” applications.

Vision for the future

Mark Zuckerberg, CEO of Meta, has expressed his vision of democratizing access to AI tools and models, ensuring that more developers around the world can benefit from these technologies. This strategy includes offering tools for free to foster an ecosystem, gradually adding additional products and services.

Meta is investing large sums in these models, allowing it to undercut competitors’ prices and spread its approach to AI widely. According to Meta, Llama models have been downloaded more than 300 million times, and more than 20,000 derivative models have been created.

Challenges and sustainability

Training such large models as Llama 3.1 405B involves significant challenges, especially in terms of power consumption and grid stability. Meta is working to address these issues as it scales the training of even larger models in the future.

In summary, the launch of Llama 3.1 405B marks an important milestone for Meta in its goal to become a leader in the field of generative AI. With its focus on openness and collaboration, the company is well positioned to influence the future of this technology.

Date
April 2, 2024
