
Skyflow launches ‘privacy vault’ for building LLMs


May 18, 2023



Palo Alto, California-based Skyflow, a company that makes it easier for developers to embed data privacy into their applications, today announced the launch of a “privacy vault” for large language models.

The solution, as the name suggests, provides enterprises with a layer of data privacy and security throughout the entire lifecycle of their LLMs, beginning with data collection and continuing through model training and deployment.

It comes as enterprises across sectors continue to race to embed LLMs, like the GPT series of models, into their workflows to simplify processes and boost productivity. 

Why a privacy vault for GPT models?

LLMs are all the rage today, helping with tasks like text generation, image generation and summarization. However, most of the available models have been trained on publicly available data. That makes them suitable for general public use, but less so for enterprise use.



To make LLMs work in specific enterprise settings, companies need to train them on their internal knowledge. A few have already done so or are in the process of doing so, but the task is not easy, because the internal, business-critical data used to train the model must be protected at every stage of the process.

This is exactly where Skyflow’s GPT privacy vault comes in. 

Delivered via API, the solution establishes a secure environment in which users define a dictionary of their sensitive data and have that information protected at every stage of the model lifecycle: data collection, preparation, model training, interaction and deployment. Once fully integrated, the vault uses the dictionary to automatically redact or tokenize the chosen information as it flows through GPT. According to Skyflow, this does not lessen the value of the output.
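To make the dictionary-driven redaction described above concrete, here is a minimal Python sketch. It is not Skyflow's API: the `SENSITIVE_DICTIONARY` name, the two regular expressions and the `redact` helper are all hypothetical, chosen only to illustrate how a user-defined list of sensitive fields could drive automatic redaction before text reaches a model.

```python
import re

# Hypothetical "sensitive data dictionary": field label -> pattern to detect it.
# These patterns are illustrative assumptions, not Skyflow's definitions.
SENSITIVE_DICTIONARY = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str, dictionary=SENSITIVE_DICTIONARY) -> str:
    """Replace every match of every dictionary pattern with a [FIELD] label."""
    for field, pattern in dictionary.items():
        text = pattern.sub(f"[{field}]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact jane@example.com, SSN 123-45-6789."))
```

Redaction of this kind is one-way: the label replaces the original value for good. Tokenization, the other option the vault offers, keeps a protected mapping so that authorized users can recover the original value later.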

“Skyflow’s proprietary polymorphic encryption technique enables the model to seamlessly handle protected data as if it were plaintext,” Anshu Sharma, Skyflow cofounder and CEO, told VentureBeat. “It will protect all sensitive data flowing into GPT models and only reveal sensitive information to authorized parties once it has been processed by the model and returned.”

For example, Sharma explained, plaintext sensitive data elements such as email addresses and Social Security numbers are swapped for Skyflow-managed tokens before inputs are provided to GPT models. This information is protected by multiple layers of encryption and fine-grained access control throughout model training, and is de-tokenized only after the GPT model returns its output. As a result, authorized end users get a seamless output experience, while the plaintext sensitive data never reaches the GPT model.
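The round trip Sharma describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Skyflow's implementation: the `TokenVault` class, the `tok_` token format and the `fake_model` stand-in are hypothetical, the mapping lives in a plain in-memory dictionary rather than behind Skyflow's polymorphic encryption, and the access check is reduced to a single boolean flag.

```python
import re
import secrets

# Illustrative patterns (assumptions): what counts as an email, what a token looks like.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TOKEN = re.compile(r"tok_[0-9a-f]{16}")


class TokenVault:
    """Hypothetical in-memory vault that maps sensitive values to opaque tokens."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, text: str) -> str:
        """Swap each email address for a token; the same value always gets the same token."""
        def swap(match):
            value = match.group(0)
            if value not in self._value_to_token:
                token = "tok_" + secrets.token_hex(8)  # 16 random hex characters
                self._value_to_token[value] = token
                self._token_to_value[token] = value
            return self._value_to_token[value]
        return EMAIL.sub(swap, text)

    def detokenize(self, text: str, authorized: bool) -> str:
        """Restore original values for authorized callers; others keep seeing tokens."""
        if not authorized:
            return text
        return TOKEN.sub(
            lambda m: self._token_to_value.get(m.group(0), m.group(0)), text
        )


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call; it only ever sees the tokenized prompt."""
    return "Summary: " + prompt


if __name__ == "__main__":
    vault = TokenVault()
    safe_prompt = vault.tokenize("Email jane@example.com about the renewal.")
    output = fake_model(safe_prompt)
    print(vault.detokenize(output, authorized=True))
```

Because a given value always maps to the same token in this sketch, repeated mentions of one customer stay linked in the model's input even though the plaintext is gone.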

“This works because GPT LLMs already break down inputs to analyze patterns and relationships between them and then make predictions about what comes next in the sequence. So, tokenizing or redacting sensitive data with Skyflow before inputs are provided to the LLM doesn’t impact the quality of GPT LLM output — the patterns and relationships remain the same as before plaintext sensitive data is tokenized by Skyflow,” Sharma added.

(Image: Skyflow GPT privacy vault for LLMs)

The offering can be integrated into an enterprise’s existing data infrastructure. It also supports multi-party training, where two or more entities could share anonymized datasets and train models to unlock insights.

Multiple use cases

While the Skyflow CEO didn’t share how many companies are using the GPT privacy vault, he did note that the offering, which is an extension of the company’s existing privacy-focused solutions, is helping protect sensitive clinical trial data in the drug development cycle as well as customer data used by travel platforms for improving customer experiences.

IBM is also a Skyflow customer and has been using the company’s products to de-identify sensitive information in large datasets before analyzing it via AI/ML.

There are also alternative approaches to the privacy problem, such as running individual models in a private cloud environment or using a private instance of ChatGPT. But those could prove far more expensive than Skyflow’s solution.

Currently, in the data privacy and encryption space, the company competes with players like Immuta, Securiti, Vaultree, Privitar and Basis Theory. 

