OctoML Introduces OctoAI, a Self-Optimizing Compute Service Designed for AI Applications
OctoML has launched OctoAI, a self-optimizing compute service tailored for AI applications. Founded in 2019 with a primary focus on optimizing machine learning (ML) models, OctoML has since expanded from model optimization into ML model deployment, and has raised $132 million in funding along the way. The new service is the next phase of that work: a refinement of the company's approach rather than a break from it. The core focus shifts from model optimization alone to helping enterprises put existing open-source models to use. Businesses can fine-tune these models on their own proprietary data, or host their own custom models on the service.
Branded as OctoAI, the new platform is a self-optimizing compute service dedicated to AI, with a particular emphasis on generative AI. It helps businesses build ML-based applications and deploy them to production without having to manage the underlying infrastructure.
Luis Ceze, co-founder and CEO of OctoML, explained the transition, noting that the previous platform catered to ML engineers and focused on optimizing models and packaging them into deployable containers compatible with a range of hardware configurations. The lessons learned in that phase drove the move to a fully managed compute service that abstracts away the complexities of ML infrastructure.
With OctoAI, users define their priorities, such as latency versus cost, and OctoAI automatically determines the best hardware to run on. The choice between Nvidia GPUs and AWS's Inferentia machines, for example, is made by the service. The service also automatically tunes the deployed models, which translates into lower costs and better performance. This removes much of the complexity of model deployment, a common bottleneck in ML projects. Users who want full control can still configure their model's behavior and hardware specifications, but Ceze anticipates that many will opt for OctoAI's automated management.
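To make the latency-versus-cost trade-off concrete, here is a minimal sketch of how a priority setting could drive a hardware choice. It is not OctoAI's API or selection logic, which the article does not describe. The names `HardwareOption`, `pick_hardware`, and `latency_weight` are hypothetical, and the latency and cost figures are invented for illustration only, not benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareOption:
    """A candidate hardware target (all figures below are made up)."""
    name: str
    latency_ms: float          # hypothetical latency per request
    cost_per_1k_requests: float  # hypothetical cost in dollars


def pick_hardware(options, latency_weight):
    """Return the option with the lowest weighted score.

    latency_weight = 1.0 means "only latency matters",
    latency_weight = 0.0 means "only cost matters".
    """
    if not 0.0 <= latency_weight <= 1.0:
        raise ValueError("latency_weight must be between 0 and 1")

    # Normalize each metric by its maximum so the two are comparable.
    max_latency = max(o.latency_ms for o in options)
    max_cost = max(o.cost_per_1k_requests for o in options)

    def score(option):
        return (latency_weight * option.latency_ms / max_latency
                + (1.0 - latency_weight) * option.cost_per_1k_requests / max_cost)

    return min(options, key=score)


# Illustrative numbers only; they do not describe real hardware performance.
OPTIONS = [
    HardwareOption("nvidia-gpu", latency_ms=40.0, cost_per_1k_requests=0.50),
    HardwareOption("aws-inferentia", latency_ms=70.0, cost_per_1k_requests=0.20),
]

if __name__ == "__main__":
    print(pick_hardware(OPTIONS, latency_weight=1.0).name)
    print(pick_hardware(OPTIONS, latency_weight=0.0).name)
```

With these invented figures, a user who weights latency fully gets the faster option and a user who weights cost fully gets the cheaper one. A real service would presumably base the decision on measured performance of the specific model rather than fixed numbers.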
OctoML also offers ready-to-use accelerated versions of prominent foundation models, including Dolly 2, Whisper, FILM, FLAN-UL2, and Stable Diffusion. The company says its version of Stable Diffusion runs three times faster and costs five times less than the conventional model runtime.
OctoML will continue to serve existing customers who want its optimization services. Going forward, however, the company's strategic focus will be on the new compute platform.