Generative AI chatbots like ChatGPT, Microsoft’s Bing/CoPilot, and Google’s Gemini are the vanguard of a significant advance in computing. Among much else, they can be compelling tools for finding just the right word, drafting simple legal documents, starting awkward emails, and coding in unfamiliar languages. Much has been written about how AI chatbots “hallucinate,” making up plausible details that are completely wrong. That’s a real concern, but worries about privacy and confidentiality have gotten less attention.
To be sure, many conversations aren’t sensitive, such as asking for a recommendation of bands similar to The Guess Who or help writing an AppleScript. But increasingly, we’re hearing about people who’ve asked an AI chatbot to analyze or summarize some information and then pasted in the contents of an entire file. Plus, services like ChatPDF and features in Adobe Acrobat let you ask questions about a PDF you provide—it can be a good way to extract content from a lengthy document.
While potentially useful from a productivity standpoint, such situations provide a troubling opportunity to reveal personally sensitive data or confidential corporate information. We’re not talking hypothetically here: Samsung engineers inadvertently leaked confidential information while using ChatGPT to fix errors in their code. What might go wrong?
The most significant concern is that sensitive personal and business information might be used to train future versions of the large language models used by the chatbots. That information could then be regurgitated to other users in unpredictable contexts. People worry about this partly because early large language models were trained on text that was publicly accessible online but without the knowledge or permission of the authors of that text. As we all know, lots of stuff can unintentionally end up on the Internet.
Although the privacy policies for the best-known AI chatbots say the right things about how uploaded data won’t be used to train future versions, there’s no guarantee that companies will adhere to those policies. Even if they intend to, there’s room for error—conversation history could accidentally be added to a training model. Worse, because chatbot prompts aren’t simple database queries, there’s no easy way to determine if confidential information has made its way into a large language model.
More down to earth, because chatbots store conversation history (some let you turn off that feature), anything added to a conversation is in an uncontrolled environment where at least employees of the chatbot service could see it, and it could be shared with other partners. Such information could also be vulnerable should attackers compromise the service and steal data. These privacy considerations are the main reason to avoid sharing sensitive information with chatbots.
Adding emphasis to that recommendation is the fact that many companies operate under master services agreements that specify how client data must be handled. For instance, a marketing agency tasked with generating an ad campaign for a manufacturer’s new product should avoid using any details about the product in AI-based brainstorming or content generation. If those details were revealed in any way, the agency could be in violation of its contract with the manufacturer and be subject to significant legal and financial penalties.
In the end, although it may feel like you’re having a private conversation with an AI chatbot, don’t share anything you wouldn’t tell a stranger. As Samsung’s engineers discovered, loose lips sink chips.
(Featured image by iStock.com/Ilya Lukichev)