"Hello, this is Alice, from Bank ABC; as part of our due diligence process and as mandated by the Central Bank, we are required to validate and update your personal details in case of any changes. As a first step, we need to validate your contact details. Please provide your registered mobile number and email address."
"Hello, this is Mark from IT. As part of our continuous systems monitoring, I can see that you haven't changed your password for over six months. As per our security policy, it is mandatory to change the password every 6 months to ensure the security of our systems. For your convenience, I have just sent you an SMS with a link where you can change your password. Have a good day."
Both examples are instances of voice phishing (vishing), specifically the so-called tech-support or call-center scam. The typical pattern of this type of scam is that the victim receives a phone call in which the criminal poses as a technical support person, customer service representative, or government employee (police, IRS, etc.) to gain the potential victim's trust.
According to the most recent FBI IC3 Report, this type of fraud caused losses of more than 1B USD in the U.S. alone in 2022. Another shocking figure is the number of victims, which has more than doubled in just the last three years; almost half of all victims are reported to be over 60 years old, and that age group bears roughly 70% of the overall losses.
The current situation is bad, so how much worse can it get? Much, much worse! Let me explain.
The current model of operation is quite simple. A fraudster rents an open-space office somewhere in Southeast Asia, most commonly in India, with its huge English-speaking population. In a country where youth unemployment exceeds 20%, it is not hard to find several dozen call-center "employees," especially if you incentivize them with a proper bonus scheme. Equip each of them with a computer, a phone, and a chair, and you are all set.
As easy as this might sound, it requires some real effort, especially managing the employees and staying under the radar of the police. It is also difficult to scale quickly, as the actual "work" is done by employees who need to be hired, at least minimally trained, and retained. As this business grows, the general public becomes more aware thanks to country-wide awareness campaigns. After a while, you might attract the attention of vigilantes and/or social media streamers (e.g., Scammer Payback and many others) who start to disrupt your business and cut into your revenue. Still, as the report above shows, this business is growing fast, and despite the efforts of global communities and of the U.S. and Indian governments, there are no signs of it slowing down.
But what if we could improve on this business model by leveraging the newest advancements in AI? Let's see whether we could improve the above model a bit. The first thing to focus on would be the hard-to-scale part: the call-center agent. Would it be possible to replace the agent with a Large Language Model? Such a model would need to converse with a potential victim, understand the responses, and steer the conversation toward a desired, pre-defined outcome. Do we have such a capability today? Well, looking at ChatGPT 4, it would seem we do; or, if your expectations are high, we are very close to it. Actually, with a general model like ChatGPT 4, we have a far more knowledgeable "call center agent" than any we could hire, not to mention its ability to adapt to a new use case or scenario seamlessly.
OK, we have an agent who can interact with potential victims via prompts, but we need to turn that text into a phone conversation. So, are we able to transform the ChatGPT responses into natural speech? Yes, we are! There are several AI text-to-speech models that can generate a custom-defined voice on the fly (see the screenshot of customization options from PlayHT above). We can define the language, tone, gender, and other attributes so that the generated voice sounds as human-like as possible (say bye-bye to any undesired accent). The same approach works in reverse: speech-to-text transcription of the victim's replies, which are fed back to the LLM agent for processing.
The above diagram depicts this new and improved setup, where the human call-center agents are replaced by AI models capable of interacting with potential victims and leading the conversation to the desired end state, be it credential harvesting, direct financial loss via a coerced funds transfer, or another predefined outcome.
Such a setup would be scalable and elastic; it could easily be moved from one physical location to another and deployed or shut down as needed. It wouldn't require office space or substantial human involvement, and it could easily be offered "as a service" from a "safe jurisdiction."
Now, after going through the above scenario, I hope you see the logic behind the title of this blog. If the above steps are sufficiently fine-tuned and polished, we will see a massive increase in the overall losses attributed to vishing in the months and years to come.
The LLM might become so authentic and well-trained, perhaps even on customer-specific data, that we will start questioning every phone conversation we have. Today, we as customers have to confirm our identity (e.g., via MFA, tokens, etc.) to assure companies that it is genuinely us interacting with them; in the future, such authentication will be required in both directions. Organizations will need a way to prove to their customers that it is indeed them reaching out.
We are indeed living in exciting times, but please be vigilant and spread awareness of these risks to everyone around you: your family, friends, and colleagues!