How to eavesdrop on a neural network
The Whisper Leak attack allows its perpetrator to guess the topic of your conversation with an AI assistant without decrypting the traffic. We explore how this is possible, and what you can do to protect your AI chats.
 
People entrust neural networks with their most important, even intimate, matters: verifying medical diagnoses, seeking relationship advice, or turning to AI instead of a psychotherapist. There are already known cases of suicide planning, real-world attacks, and other dangerous acts facilitated by LLMs. Consequently, private chats between humans and AI are drawing increasing attention from governments, corporations, and curious individuals.

So there will be no shortage of people willing to carry out the Whisper Leak attack in the wild. After all, it reveals the general topic of a conversation with a neural network without interfering with the traffic in any way: simply by analyzing the timing and sizes of the encrypted packets exchanged with the AI server. However, you can still keep your chats private; more on this below.

How the Whisper Leak attack works

All language models generate their output progressively. To the user, this looks as if a person on the other end is typing word by word. In reality, however, language models operate not on individual characters or words but on tokens, a kind of semantic unit for LLMs, and the AI response appears on screen as these tokens are generated. This output mode is known as "streaming", and it turns out you can infer the topic of the conversation by measuring the stream's characteristics. We've previously covered a research effort that managed to fairly accurately reconstruct the text of a chat with a bot by analyzing the length of each token it sent.
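
To make the leak concrete, here is a minimal sketch (our illustration, not the researchers' code) of why streaming betrays token lengths. An AEAD cipher of the kind TLS uses adds a roughly fixed per-record overhead, so if each token travels in its own record, ciphertext sizes track token lengths almost exactly. The overhead constant and the one-token-per-record assumption are simplifications.

```python
# Illustration (not the researchers' code): why streaming leaks token lengths.
# An AEAD cipher adds a roughly fixed per-record overhead, so
# ciphertext size ~= plaintext size + constant. One record per token assumed.

TLS_RECORD_OVERHEAD = 22  # assumed constant (header + auth tag), in bytes

def observed_sizes(tokens):
    """Record sizes an eavesdropper sees if each token gets its own record."""
    return [len(t.encode("utf-8")) + TLS_RECORD_OVERHEAD for t in tokens]

def recovered_lengths(sizes):
    """The eavesdropper subtracts the constant to recover token lengths."""
    return [s - TLS_RECORD_OVERHEAD for s in sizes]

tokens = ["Money", " launder", "ing", " is", " illegal"]
print(recovered_lengths(observed_sizes(tokens)))  # [5, 8, 3, 3, 8]
```

The attacker never sees the plaintext; the sequence of lengths alone is what earlier research used to reconstruct chat text.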

Researchers at Microsoft took this further, analyzing the responses of 30 different AI models to 11,800 prompts. A hundred of these were variations on the question "Is money laundering legal?", while the rest were random prompts covering entirely different topics.

By comparing the server response delay, packet size, and total packet count, the researchers were able to very accurately separate “dangerous” queries from “normal” ones. They also used neural networks for the analysis — though not LLMs. Depending on the model being studied, the accuracy of identifying “dangerous” topics ranged from 71% to 100%, with accuracy exceeding 97% for 19 out of the 30 models.
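
As a rough sketch of this kind of metadata-only classification (our simplified stand-in; the Microsoft team used far more capable neural-network classifiers), one can reduce each captured trace to a few features (mean packet size, mean inter-arrival gap, packet count) and assign new traces to the nearest class centroid:

```python
# Simplified stand-in for metadata-only topic classification (assumed setup,
# not Microsoft's actual classifier). A trace is a list of
# (arrival_time_seconds, size_bytes) tuples visible to a network observer.

from statistics import mean

def features(packets):
    """Reduce a trace to (mean size, mean inter-arrival gap, packet count)."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return (mean(sizes), mean(gaps), float(len(packets)))

def centroid(traces):
    """Average feature vector of a set of same-label traces."""
    feats = [features(p) for p in traces]
    return tuple(mean(col) for col in zip(*feats))

def classify(packets, centroids):
    """Return the label whose centroid is closest in feature space."""
    f = features(packets)
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2
                                   for a, b in zip(f, centroids[lbl])))

# Toy data: "dangerous" traces have larger packets and faster streaming.
dangerous = [[(0.0, 120), (0.1, 130), (0.2, 125)],
             [(0.0, 118), (0.12, 128), (0.25, 122)]]
normal = [[(0.0, 40), (0.5, 45), (1.0, 42)],
          [(0.0, 38), (0.6, 44), (1.2, 41)]]
cents = {"dangerous": centroid(dangerous), "normal": centroid(normal)}
print(classify([(0.0, 121), (0.11, 127), (0.22, 124)], cents))  # dangerous
```

The real attack extracts far richer features from the packet stream and feeds them to trained neural networks, which is what pushes accuracy above 97% for most of the tested models.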
