Join top executives in San Francisco on July 11-12 to hear how leaders are integrating and optimizing AI investments for success.
Seattle-based startup OctoML today released OctoAI, a new self-optimizing infrastructure service designed to help organizations build and deploy generative AI applications.
OctoML got its start in 2019 as a spinout from the University of Washington, with its technology stack built on the open-source Apache TVM machine learning (ML) compiler framework. Its original focus was helping organizations optimize ML models for deployment, an effort that has helped the company raise a total of $131.9 million to date, including an $85 million Series C round in 2021. In June 2022, OctoML added technology to help transform ML models into software functions. Now, the company is going a step further with its OctoAI service, which optimizes how ML is deployed on infrastructure to improve performance and manage costs.
“The demand for compute is just absurd,” Luis Ceze, OctoML CEO, told VentureBeat. “Because generative AI models use a lot of compute, making compute efficient for AI is at the very core of the value proposition for OctoML.”
Solving the last mile problem with AI
With its new platform, OctoML is helping to solve the last mile problem with AI: Getting models deployed so users can benefit from the power of generative AI.
Ceze, still a professor at the University of Washington, said that when OctoML was founded, the focus was on data scientists building ML systems. From there, the company evolved into a platform with a model optimization service that takes models as input, then optimizes and packages them into containers.
Even with model optimization, Ceze said, organizations still had to take the container and find the right hosting infrastructure and configuration for deployment. The new OctoAI platform addresses that challenge with a fully managed compute service.
“We can abstract away all the complexities of optimizing the model, packaging and deploying with a fully managed infrastructure,” Ceze said.
Part of the new service is a library of popular open-source large language models (LLMs) that developers can use and build on. At launch, supported models include Stable Diffusion 2.1, Dolly v2, LLaMA 65B, Whisper, Flan-UL2 and Vicuna.
How the OctoAI service works
OctoML is not the only vendor looking to help developers deploy common open-source LLMs.
Among the vendors that have recently offered similar types of services is Anyscale, the lead commercial sponsor behind the open source Ray ML framework for workload scaling. At the end of May, Anyscale launched its Aviary open source project as a technology to help developers deploy and scale open-source LLMs.
Ceze explained that the OctoAI service is not using Ray for scaling workloads; it has developed its own proprietary approach. The Apache TVM project continues to play a foundational role, helping turn a model into code that will run efficiently on GPU infrastructure.
“We basically built an engine that for any given model, we deeply optimize the model for the hardware target and produce a deployable artifact,” Ceze said.
The service also abstracts the physical cloud infrastructure on which the models run. At launch, the OctoAI service runs on Amazon Web Services (AWS), with plans to expand to other cloud providers. Ceze said he doesn’t want users to have to deal with the underlying complexity of choosing a specific type of processor or cloud instance to run a workload.
“We want to make sure that users tell us the expected performance, then we’re going to go and choose the right hardware that works for them and has the right cost structure,” Ceze said.
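To make the idea concrete, here is a minimal Python sketch of performance-driven hardware selection: given a latency target, pick the cheapest option that meets it. This is an illustration under stated assumptions, not OctoAI's actual method, which the company describes only as proprietary. The `HardwareOption` type, the `choose_hardware` function, the option names, and all latency and price figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HardwareOption:
    """One candidate deployment target (all values are illustrative)."""
    name: str                 # hypothetical label, not a real instance type
    p95_latency_ms: float     # assumed benchmark latency for a given model
    cost_per_hour_usd: float  # made-up hourly price


def choose_hardware(
    options: Sequence[HardwareOption], max_latency_ms: float
) -> Optional[HardwareOption]:
    """Return the cheapest option whose latency meets the target, or None."""
    eligible = [o for o in options if o.p95_latency_ms <= max_latency_ms]
    return min(eligible, key=lambda o: o.cost_per_hour_usd, default=None)


# Hypothetical catalog: faster hardware costs more per hour.
OPTIONS = [
    HardwareOption("small-gpu", p95_latency_ms=900.0, cost_per_hour_usd=0.50),
    HardwareOption("medium-gpu", p95_latency_ms=400.0, cost_per_hour_usd=1.20),
    HardwareOption("large-gpu", p95_latency_ms=150.0, cost_per_hour_usd=3.00),
]

if __name__ == "__main__":
    # The user states only the expected performance (a 500 ms latency target).
    print(choose_hardware(OPTIONS, max_latency_ms=500.0))
```

In this sketch the user never names a processor or cloud instance; the selection falls out of the stated target. A real service would presumably also weigh throughput, availability and model-specific benchmarks.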