To combat GPU shortage for generative AI, startup works to optimize hardware


Jun 8, 2023

AI startup CentML, which optimizes machine learning models to work faster and lower compute costs, emerged from stealth today. The Toronto-based company aims to help address the worldwide shortage of GPUs for training and inference of generative AI models.

According to the company, access to compute is one of the biggest obstacles to AI development, and the scarcity will only grow as inference workloads accelerate. By getting more out of the current AI chip supply and legacy inventory without affecting model accuracy, CentML says it can increase access to compute in what it calls a "broken" marketplace for GPUs.

Hard for smaller companies to access GPUs

CentML raised a $3.5 million seed round in 2022 led by AI-focused Radical Ventures. Co-founder and CEO Gennady Pekhimenko, a leading systems architect, told VentureBeat in an interview that once he saw how quickly large language models were growing, it was clear that whoever owned the hardware, and the software stack running on it, would hold a dominant position.

"It was very transparent what was happening," he said, adding with a laugh that even he had put his money into Nvidia, which controls about 80% of the GPU market. But Nvidia, he explained, always wants to sell its most expensive chips, such as its latest A100 and H100 GPUs, which has made it hard for smaller companies to get access. Meanwhile, Nvidia's other, less expensive chips are poorly utilized: "We build software that optimizes those models efficiently on all the GPUs available, not just on the most expensive available in the cloud," he said. "We're essentially serving a larger part of the market."


Transform 2023

Join us in San Francisco on July 11-12, where top executives will share how they have integrated and optimized AI investments for success and avoided common pitfalls.


Register Now

As the cost of inference grows "exponentially" (models like ChatGPT cost millions of dollars to run), CentML uses an open-source compiler to automatically tune optimizations for a company's specific inference pipeline and hardware.

Pekhimenko said competitor OctoML also uses compiler technology to automatically maximize model performance, but an older generation of it. "Their solution is not competitive in the cloud. We knew what the deficiencies were and built a new technology that doesn't have those deficiencies," he said. "So we have the benefit of coming second."

Race to access AI chips has become like “Game of Thrones”

David Katz, partner at Radical Ventures, says the battle to get access to AI chips has become like “Game of Thrones” — but less gory. “There’s this insatiable appetite for compute that’s required in order to run these models and large models,” he told VentureBeat, adding that Radical invested in CentML last year.

CentML's offering, he said, creates "a little bit more efficiency" in the market. It also demonstrates that complex models with more than a billion parameters can run on legacy hardware.

“So you don’t need the same volume of GPUs or you don’t need the A100s necessarily,” he said. “From that perspective, it is essentially increasing the capacity or the supply of chips in the market.”