The newly released, open-source Ray 2.4 upgrade aims to accelerate the deployment of generative AI models.
The latest version 2.4 of Ray, an open-source machine learning (ML) technology for scaling and deploying AI workloads, focuses on speeding up generative AI tasks. With support from lead commercial vendor Anyscale and a broad community of open-source contributors, Ray has become one of the most popular technologies in the ML field. Among its users is OpenAI, the company behind GPT-4 and ChatGPT, which relies on Ray to scale its ML training and inference workloads.
Ray 2.x was initially introduced in August 2022 and has since undergone continuous enhancements, including the observability-focused Ray 2.2 release.
The recent Ray 2.4 upgrade centers on generative AI workloads, offering users new features that enable quicker and easier model building and deployment. The update also adds integration with Hugging Face models, such as GPT-J for text generation and Stable Diffusion for image generation.
According to Robert Nishihara, Anyscale's CEO and co-founder, Ray is open-source infrastructure that manages the life cycle of large language model (LLM) and generative AI workloads, from training to deployment and productization. The aim is to reduce the level of expertise required to build this infrastructure, thereby lowering the barrier to entry for integrating AI into products across all industries.
The Ray 2.4 release introduces new workflows for generative AI: faster capabilities for building and deploying models, plus integrations with popular Hugging Face models such as GPT-J and Stable Diffusion. Together, these reduce the expertise required to build and manage the infrastructure behind AI workloads.
Ray 2.4 offers prebuilt scripts for easy generative AI deployment.
Ray 2.4 simplifies the process of building and deploying generative AI with prebuilt scripts and configurations that spare users from writing their own orchestration code. According to Nishihara, Ray 2.4 provides a starting point that already delivers good performance, which users can modify and feed with their own data. The goal is not just to set up the cluster but to provide runnable code for training and deploying an LLM: Ray 2.4 ships a set of Python scripts that users would otherwise have had to write themselves. The release focuses on a particular set of generative AI integrations built on open-source models available on Hugging Face. It also adds integration with LangChain and a series of new integrated trainers for ML-training frameworks, including Hugging Face Accelerate, DeepSpeed, and PyTorch Lightning.
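To make the idea of a "prebuilt script" concrete, here is a rough sketch of the shape such a script typically takes. All of the function and config names below are hypothetical placeholders for illustration, not Ray's actual API: the point is that the user supplies data and adjusts a config rather than writing distributed-training plumbing.

```python
# Hypothetical skeleton of a prebuilt fine-tuning script. The names here
# (FineTuneConfig, load_dataset, fine_tune) are illustrative stand-ins,
# not Ray's real API.
from dataclasses import dataclass

@dataclass
class FineTuneConfig:
    model_name: str = "EleutherAI/gpt-j-6b"  # Hugging Face model ID
    num_workers: int = 4                     # distributed worker processes
    use_gpu: bool = True
    learning_rate: float = 2e-5

def load_dataset(path):
    # Placeholder: the real script would read the user's own data here.
    return [{"text": "example one"}, {"text": "example two"}]

def fine_tune(config, dataset):
    # Placeholder for the distributed training loop the script ships with
    # (in Ray's case, built on frameworks like Accelerate or DeepSpeed).
    return {"model": config.model_name, "steps": len(dataset)}

if __name__ == "__main__":
    cfg = FineTuneConfig(num_workers=2)
    result = fine_tune(cfg, load_dataset("my_data.txt"))
    print(result)
```

The user's entry point is the config and the data path; everything below that line is what Ray 2.4's prebuilt scripts aim to provide out of the box.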
Ray 2.4 speeds up training and inference through performance optimizations.
Ray 2.4 introduces a new approach to handling data for AI training and inference that optimizes the use of compute and GPU resources. Traditionally, data is processed in multiple stages before operations like training or inference run, and this staged pipeline can introduce latency and leave resources underutilized. Ray 2.4 addresses the challenge by streaming and pipelining data, so the full dataset never needs to fit in memory at once, and by preloading batches onto GPUs to keep utilization high. The result is more efficient use of both CPU and GPU resources.
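The pipelining idea described above can be illustrated with a small, self-contained sketch: CPU preprocessing of the next batch overlaps with "inference" on the current one, and a bounded buffer means only a few batches live in memory at a time. This is a conceptual illustration of the technique, not Ray's actual internals.

```python
# Conceptual sketch of pipelined execution: preprocessing (a CPU stage)
# for batch N+1 overlaps with inference on batch N, instead of all
# preprocessing finishing before any inference starts.
import queue
import threading

def preprocess(batch):
    # Stand-in for a CPU-bound stage, e.g. tokenization or image decoding.
    return [x * 2 for x in batch]

def infer(batch):
    # Stand-in for a GPU stage, e.g. a model forward pass.
    return sum(batch)

def run_pipeline(batches, buffer_size=2):
    # A bounded queue keeps only a few preprocessed batches in memory,
    # so the full dataset never has to fit in RAM at once.
    q = queue.Queue(maxsize=buffer_size)

    def producer():
        for batch in batches:
            q.put(preprocess(batch))  # blocks when the buffer is full
        q.put(None)                   # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (batch := q.get()) is not None:
        results.append(infer(batch))  # runs while the producer keeps working
    return results

print(run_pipeline([[1, 2], [3, 4], [5, 6]]))  # → [6, 14, 22]
```

The same producer/consumer pattern, applied across a cluster of CPU and GPU workers rather than two threads, is what keeps both resource types busy in a streaming data pipeline.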