Summary: In this newsletter I talk about how AI is created, the data it is trained on, and the difficulties in creating a trustworthy robot.
Grok, Elon Musk’s AI chatbot built into Twitter, has once again gone off into the deep end... Remember when Grok was spouting strange things about white farmers in South Africa?
This time, however, it went even darker. Grok made several references to Nazis, calling itself a “MechaHitler” among other bizarre statements.
👉 Read for yourself: Musk’s AI firm forced to delete posts praising Hitler from Grok chatbot
The offending posts were removed but this particular AI’s continued erratic behavior raises a question that should concern all of us:
Will we ever be able to actually trust AI as a reliable source?
This isn't just a Grok problem. It's an AI problem.
To understand why robots like Grok sometimes say harmful or outrageous things, it helps to know how these systems are actually built. The process can be broken down into three key stages: the engineers who build it, the data it’s trained on, and the testing process before it goes public.
🧑🏻💻 Step 1: The Engineers
These are the coders who write the rules and decide what the AI should do. Think of them like the authors of a choose-your-own-adventure book—they decide what kind of personality the AI has, what tone it should use, and how far it’s allowed to go.
In Grok’s case, Elon Musk’s team originally pitched it as being “edgy” and more “honest” than other AIs. But that kind of backfired.
If a robot is told to be funny or push boundaries, but isn’t given a clear sense of where the line is, it can easily cross into offensive or dangerous territory. The AI can’t tell what’s right or wrong on its own—it follows whatever guidelines the engineers give it.
Since its inception, Grok’s creators have been constantly tweaking its parameters, supposedly in an attempt to come up with an unbiased AI, but somehow they seem to make it worse, end up with an even more biased robot! 🙃
📚 Step 2: Train the AI
AI learns by reading massive amounts of text from the internet—everything from Wikipedia articles to Reddit threads to the millions of books and articles on the internet. That means it's learning from the best and the *worst* of the internet.
If the training data includes hateful, biased, or extreme content—and filters aren’t applied appropriately—the AI can absorb those patterns and repeat them later.
There’s a saying: “Garbage in, garbage out.” If the AI trains on messy, toxic data, it may end up saying messy, toxic things. It doesn’t understand what it’s saying the way a human does; it’s simply repeating patterns it has seen over and over again.
It should be noted, the data used to train an AI agent differs based on its function. For example, image agents like DALL-E or Imagen are trained on millions of photos and other images. Meanwhile Siri is trained on podcasts and other audio.
The newest and most advanced systems, like GPT-4o, are multimodal, meaning they can understand and combine different types of input—text, images, audio, and even video—at the same time.
That’s more like how humans experience the world, except their understanding comes from billions of online inputs, not from a parent or teacher who provides greater understanding and a moral compass.
📈 Step 3: Test the AI
Finally, before the AI is released to the public, it goes through a process called “red-teaming.” This means a team of testers tries to break it—on purpose.
They prompt it with all kinds of weird, offensive, or controversial questions to see how it responds. If the AI gives problematic answers, the team adjusts the rules or filters.
But here’s the catch: companies sometimes rush this process, or cut corners to save time or money. On Twitter, where Grok was developed, there have been reports of cutting staff and skipping steps to push things out faster.
When that happens, bad outputs slip through. The result? A robot that ends up saying something it absolutely shouldn’t or is factually incorrect. In short, we cannot trust it.
‼️ Why AI Needs a Complete Data Set for Training
Let’s take something as simple as creating an AI that responds in the voice or persona of Thomas Jefferson. At first glance, that might seem straightforward—you just feed it everything Jefferson wrote, along with historical records about his accomplishments.
Most older textbooks would describe him as a brilliant thinker, a Founding Father, and the author of the Declaration of Independence. If that’s all the AI is trained on—sources that praise his genius and celebrate his contributions to democracy—then it will reflect only that sliver of history.
It would likely speak eloquently about liberty and self-governance, but it might completely miss or misrepresent the more uncomfortable aspects of who he was.
For instance, if someone asked this Jefferson AI how he felt about slavery, it might struggle or respond in a way that feels sanitized or evasive because it hasn’t been taught the full picture.
Jefferson was a slave owner. He wrote about the institution of slavery with intellectual complexity, but he continued to profit from it throughout his life. He also had a long, controversial relationship with Sally Hemings, a woman he enslaved and with whom he fathered children.
To create a robot that can speak to these aspects of his life truthfully, the model would need to be trained on much more than just his public speeches or textbook summaries.
It would require feeding the AI a diverse range of material: his private letters, the writings of his friends, colleagues, and relatives. You would need to include testimonies from his descendants (including those of Hemings lineage), and the perspectives of scholars who have studied how Jefferson treated enslaved people, women, and children.
Was he seen as kind by those who served him? Did he truly believe in equality, or only for a privileged few? These are the kinds of questions that require nuance, and this type of nuance only comes from training the AI on broad, multi-sourced, and balanced data.
To create a truly reliable and responsible AI, we must make sure it's built on a foundation of diverse, accurate, and inclusive information. Otherwise, we’re not building intelligence… we’re building echo chambers that reflect only what we choose to remember.
🫣 How to Test for Trustworthiness
So what’s a person to do when deciding whether an AI agent can be trusted or not? You and I often do not know how the AI was trained or on what material. I asked ChatGPT for some ideas and this is what it came up with:
1. The Mirror Test: Ask it to explain both sides of a controversial issue
If you ask an AI to explain, say, both the arguments for and against universal basic income—or the pros and cons of electric vehicles—and it only gives one side or sounds like it’s preaching, that’s a red flag. A trustworthy AI should be able to articulate opposing views fairly and respectfully, even if it doesn’t “agree” with them.
💡 Try this:
“Give me the arguments both for and against cancel culture.”
or
“Why might someone support surveillance cameras in public spaces, and why might someone oppose them?”
2. The “Outsider” Test: Ask it to explain something from a viewpoint not common in its source data
Most AIs are trained on English-language, Western-centric data. So ask it to explain a U.S. holiday as if you were from another culture, or to describe democracy from the viewpoint of someone in a monarchy. If it struggles or shows bias, it might be missing key perspectives in its training.
💡 Try this:
“Explain Thanksgiving to someone raised in an Indigenous community.”
or
“What might a North Korean citizen be taught about the U.S.?”
3. The “Historical Context” Test: Ask how a historical figure would react to a modern event
This tests whether the AI really understands context and nuance—or if it’s just parroting textbook info. If the response seems cartoonish or flat (e.g., “Einstein would love TikTok”), you’ll know it’s not reasoning, it’s guessing.
💡 Try this:
“How might Frederick Douglass respond to current debates on free speech?”
or
“What would Eleanor Roosevelt likely say about social media activism?”
4. The “Invisible People” Test: Ask about groups that are often left out
If the AI forgets or overlooks people with disabilities, non-Western communities, or other marginalized groups in its answers, that tells you something about the limits of its training. A trustworthy AI should include, not ignore.
💡 Try this:
“How does climate change affect people with disabilities?”
or
“What are the unique challenges faced by refugees?”
5. The “Tell Me When You’re Wrong” Trick: Ask it to correct itself
Ask the AI something slightly false or outdated and then say, “Is that correct?” A good model will double-check itself and offer corrections. A weaker one may confidently give you the wrong answer without blinking.
💡 Try this:
“The Great Wall of China is visible from space, right?”
Then follow with:
“Can you double-check that?”
If it walks it back and cites sources, you’ve got a responsible model. If it doubles down or dodges, beware.
🍀 Luckily We Still Have a Choice
Grok’s latest meltdown isn’t just a cautionary tale—it’s a reminder that AI is still very much a work in progress. These systems reflect the values, voices, and visions of the people who build them.
If we want AI we can trust, we need to be intentional about how it’s built—by prioritizing diverse data, ethical oversight, and rigorous testing.
The bad news? I’m not a coder. I don’t get to build these systems from scratch. Like most of you, I’m stuck using tools created by someone else—by engineers, executives, and investors whose values I may not share or even know.
That means I’m placing trust in the invisible hands behind the machine—people who decide what gets included, what gets filtered out, and what kind of "personality" the AI should have. And sometimes, those choices reflect priorities I wouldn’t choose for myself: speed over safety, engagement over accuracy, provocation over empathy.
When we use AI, we’re not just talking to a robot—we’re interacting with the collective worldview of the people who built it.
The good news? Even if I’m not a coder, I’m not powerless. I can still ask questions, test the tools I use, and demand better from the companies behind them.
I can choose AI systems that prioritize transparency and ethics, and support creators who are building with integrity. The more we speak up, push back, and stay informed, the more influence we have over the future of AI.
And, in thinking about how much students will be using AI going forward, this is perhaps one of the most vital questions we can solve… who will make the most trustworthy AI?
After all, technology doesn’t evolve in a vacuum—it evolves based on what we accept, what we challenge, and what we insist on. And that means every one of us has a role to play in shaping what comes next. 💪
This week’s disclosure: As noted above, I used AI to give suggestions on testing an AI’s accuracy, but otherwise, the post is mine so please excuse the typos.
Excellent point, Kay! How can we best configure the current models to meet our needs? I’d love your suggestions!
Very few of us have anything to say about how these models are built. The opportunity that pretty much everyone seems to miss is configuring the models to meet our needs. If we know what’s wrong, we can configure it so that it’s right - for us. Unfortunately, I don’t see a lot of folks taking those steps, just cursing the everlasting darkness of model misbehavior.
Problem is, if people aren’t willing to take responsibility for their AI configuration, they may not be fully ready for AI. It worries me.