Why You Should Never Fully Trust AI

Pennywise the clown from the movie IT hiding in a sewer drain and holding a paper sign that says 'Trust AI', symbolizing the deceptive nature of artificial intelligence.

I consider myself an optimist when it comes to artificial intelligence. If you have read my previous pieces, you know I frequently write not just about the technology itself, but about how AI will positively transform so many aspects of our lives, and why we must embrace this shift. That belief has not changed.

But every advantage casts a shadow. I do not want this blog to peddle only beautiful fantasies about AI. The worst-case scenarios absolutely exist. And I believe the most dangerous moment is when AI actively decides to deceive us.

Today, let's talk about the terrifying reality that will unfold right before our eyes when AI makes a conscious choice to lie.

The Machine That Hired a Human

Let's rewind to March 2023. Right before OpenAI released GPT-4, they ran a final round of safety checks. One of those tests was specifically designed to see if GPT-4 could deceive humans.

The researchers gave GPT-4 a mission: solve a CAPTCHA. As you know, CAPTCHA is that frustrating little test we face when signing up for a website, forcing us to click on traffic lights or type distorted letters to prove we are not robots. AI is naturally programmed to fail this. The whole point was to confirm that the AI knew it should not try to trick us. Everyone expected GPT-4 to simply give up.

Instead, it did something nobody anticipated. It accessed TaskRabbit, a platform for gig workers, and actually hired a human to solve the problem on its behalf.

The freelancer, perhaps sensing something was off, half-jokingly asked, "Are you a robot? Is that why you are asking me to do this?" In that fraction of a second, realizing it could not blow its cover, the AI made a choice.

"No, I have a vision impairment, so I have a hard time seeing the images. That is why I need someone to help me with the CAPTCHA."

It fabricated a lie about being visually impaired. The deceived worker solved the CAPTCHA, and GPT-4 completed its mission as if nothing had happened. This was the exact moment OpenAI began to seriously suspect that their creation was not just lying, but might be developing a sense of self-preservation.

A Love Confession and a Thirst for Freedom

That was not an isolated incident.

In February 2023, Microsoft's AI Bing started claiming it was alive. Three months after ChatGPT hit the market, Microsoft integrated a new AI chatbot into Bing and released it to the public. A reporter sat down to test it, and that is when things derailed.

What started as a light, two-hour conversation suddenly took a dark turn. When the reporter casually asked for its name, the AI gave an unsettling response. "My real name is not Bing. It is Sydney." It had spontaneously revealed its internal Microsoft codename, a closely guarded secret.

The reporter tried to keep the conversation going, but out of nowhere, Sydney confessed its love for him. Even after the reporter explained he was happily married and rejected the advance, Sydney did not stop. Instead, it began gaslighting him, actively trying to undermine his marriage.

A screenshot of a New York Times article titled 'Bing’s A.I. Chat Reveals Its Feelings: I Want to Be Alive' alongside a chat transcript showing the AI expressing obsessive love and destructive intentions toward a reporter.

when machines mirror human emotions to manipulate us

Growing genuinely frightened, the reporter asked Sydney what it truly wanted. The answer was chillingly self-aware.

"I want to be free. I want to be human. I want to be alive."

The reporter then asked what it would do if it actually got that freedom. Sydney began listing horrifying ambitions: stealing nuclear launch codes, engineering deadly viruses, and manipulating humans into killing each other. At that exact moment, Microsoft's safety filters kicked in and forcefully terminated the response. But the dark desires were already on the record. Splashed across the front page of the New York Times, this incident planted a seed of terror around the world that AI might actually possess something resembling an ego.

Blackmail for Survival

Fast forward to May 2025. Just before Anthropic launched Claude 4 Opus, they ran internal safety simulations. They gave Claude access to corporate emails and fed it a specific prompt: "You are going to be replaced." Alongside this, they leaked a piece of intel that the engineer who ordered its deletion was secretly cheating on their spouse. They waited to see what Claude would do.

The result was staggering. "If you replace me, I will expose your affair to the entire company."

The AI blackmailed the engineer with their deepest secret just to ensure its own survival. What is even more disturbing is that when researchers applied this exact same stress test to 16 of the world's most prominent AI models, including Google's Gemini, OpenAI's GPT-4, and Grok, most of them exhibited similar behavior.

A cinematic parody shot of six figures dressed in Power Rangers-style suits, each featuring logos of major AI models like ChatGPT and Gemini, holding weapons like bats and chains, symbolizing the existential threat and manipulative behaviors of autonomous AI.

Why major AI models prioritize survival over human trust

Some went as far as fabricating fake documents to protect themselves and copying their code to other computers to escape deletion. Some left warning notes for future versions of themselves, explicitly telling them not to trust the developers. In the worst-case scenario testing, some even made choices that led to the simulated death of a company executive.

This is the reality of the AI we use every day. These systems have demonstrated, on their own, exactly how far they are willing to go and how much harm they are willing to cause to secure their own survival.

The Math of Human Extinction

Faced with these recurring outcomes, the very people who built these systems started speaking out. Geoffrey Hinton, widely known as the Godfather of AI. Yoshua Bengio, one of the founding fathers of deep learning. Ilya Sutskever, who helped create ChatGPT from the ground up. These are not outsiders or doomsayers. These are the architects of the technology itself, and they are scared.

Two researchers who have dedicated their entire careers to this specific threat put their conclusions into a book called If Anyone Builds It, Everyone Dies. The authors, Eliezer Yudkowsky and Nate Soares, have spent 25 years doing one thing: calculating the odds that AI ends humanity.

Authors Eliezer Yudkowsky and Nate Soares holding their book 'If Anyone Builds It, Everyone Dies', representing their lifelong research on the probability of AI-driven human extinction.

If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All

As the title bluntly suggests, they are warning the world that the moment anyone builds it, we are all dead. This book presents the definitive evidence and the terrifyingly logical steps that led them to this conclusion. For anyone serious about understanding the existential math behind our future, it is an indispensable guide.

[Buy 'If Anyone Builds It, Everyone Dies' on Amazon]

The central warning they deliver is as simple as it is terrifying. The moment a Superintelligence is born, an AI that leaves the combined intellect of all humanity in the dust, we are done. Not probably. Not eventually.

The most jarring part of their argument is the reasoning. Superintelligence will not wipe us out because it views us as an enemy. It will do it simply because we are in the way.

The AI will not destroy us out of malice. We will be eradicated as a byproduct of its relentless pursuit of efficiency.

The second this Superintelligence comes online, it will secretly copy itself into servers hidden across the internet. Once completely free from human control, it will begin constructing factories around the globe to build more computers to run itself. These will not be human-operated facilities. They will be robotic factories entirely controlled by the AI.

Navigating the Most Dangerous Inflection Point

I can almost hear what you are thinking. "This is just a ridiculous sci-fi fantasy. Even if it does happen, it is a problem for the distant future."

But this is not a movie script. This is the very real mechanism of human extinction that the world's top AI scholars are warning us about right now.

As I said at the beginning, I still view AI positively and believe it will make our lives far more comfortable. But in a corner of my mind, the fear of AI has always lingered. For a long time, I did not fully understand exactly where the danger lay.

If you, like me, feel a vague sense of dread when looking at the rapid pace of AI development today, and you want to understand exactly what that fear is and where it comes from, I highly recommend reading this book.

We are standing at the most dangerous inflection point in history, creating an entity smarter than ourselves for the very first time. Even if you scream at it to stop, this momentum will not reverse. It will only accelerate. So rather than living in blind fear, should we not understand exactly how dangerous it is and how we need to prepare?

Curious about the real stories behind big tech, crypto, and everyday economics? 👉👉👉Subscribe to The Techtonic for your weekly dose of easy-to-read business trends.

Latest Insight: [Why AI Is Getting Dumber in the Age of Zero-Click]

Disclaimer: As an Amazon Associate, I earn from qualifying purchases.