Concerns surrounding artificial intelligence (AI) technology have intensified as the workers who help train these systems speak out. A recent article by The Guardian highlights the experiences of AI testers who caution against placing blind trust in the tools they help build. They describe unchecked biases, inadequate training, and unrealistic time constraints, and many now advise friends and family to limit their use of AI.
These workers, often overlooked in discussions about AI’s risks, offer a crucial perspective. Many are tasked with reviewing AI-generated content, yet they describe doing so with little guidance or support. One AI worker stated, “We’re expected to help make the model better, yet we’re often given vague or incomplete instructions, minimal training, and unrealistic time limits to complete tasks.” The sentiment underscores the pressures faced by the people shaping these technologies.
The Pause AI campaign group has compiled an “AI Probability of Doom” list, reflecting various experts’ assessments of the potential for severe negative outcomes from AI. Even prominent figures in the AI field, including Sam Altman, CEO of OpenAI, have urged caution. In a podcast from June 2025, Altman remarked, “People have a very high degree of trust in ChatGPT, which is interesting because AI hallucinates. It should be the tech that you don’t trust that much.” Unlike the AI workers interviewed, Altman does not discourage the use of ChatGPT, illustrating the nuanced views within the industry.
The experiences of these testers point to a broader need to improve AI training practices. Many, including freelance writers, have taken on similar low-paid jobs, contributing to AI development with little knowledge of the products they were shaping. The work involves evaluating AI responses against established benchmarks and writing prompts designed to probe the models’ capabilities, yet testers are often pressed to deliver results quickly, raising concerns about the quality and safety of AI outputs.
Training a GPT-style large language model involves two primary stages: language modeling (pretraining) and fine-tuning. In the language modeling phase, vast amounts of text, such as web pages and books, teach the model to predict the next word in a sequence, which is how it absorbs language patterns. In the fine-tuning phase, human testers review and rank the model’s responses, and those rankings are used to steer it toward safer, more user-friendly outputs. The process is ongoing, as continual assessment is what refines the model’s performance.
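To make that two-stage division concrete, the sketch below is a deliberately toy illustration rather than how any production system is built: a bigram word-count “language model” stands in for large-scale next-word prediction, and a simple reweighting of its choices stands in for the reviewer rankings used in fine-tuning. The corpus, the rankings, and every name in the code are invented for illustration.

```python
from collections import defaultdict
import random

# Stage 1: "language modeling" — a toy bigram model learned by counting,
# standing in for next-word prediction over web-scale text.
corpus = "the model writes text the model ranks text the reviewer ranks answers"
tokens = corpus.split()

bigram_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(tokens, tokens[1:]):
    bigram_counts[prev][nxt] += 1

def candidates(prev_word):
    """Possible next words and their raw counts after prev_word."""
    return dict(bigram_counts[prev_word])

# Stage 2: "fine-tuning" — hypothetical reviewer rankings reweight the model's
# choices, standing in for training on human preference data.
# Here a reviewer ranked the continuation "ranks" above "writes" after "model".
human_rankings = {"model": ["ranks", "writes"]}  # best first

preference_weights = defaultdict(lambda: defaultdict(float))
for context, ranked in human_rankings.items():
    for rank, word in enumerate(ranked):
        # Higher-ranked continuations get larger multiplicative boosts.
        preference_weights[context][word] = 2.0 / (rank + 1)

def sample_next(prev_word, rng=random.Random(0)):
    """Sample the next word, blending corpus counts with reviewer preferences."""
    weighted = {
        word: count * preference_weights[prev_word].get(word, 1.0)
        for word, count in candidates(prev_word).items()
    }
    words, weights = zip(*weighted.items())
    return rng.choices(words, weights=weights, k=1)[0]

print(sample_next("model"))  # the reviewer's ranking nudges this toward "ranks"
```

In real systems both stages operate on neural networks with billions of parameters and the rankings typically train a separate reward model, but the division of labor is the same: statistics from raw text first, human judgments layered on top afterwards.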
Despite these measures, errors in AI outputs persist. A recent investigation by The Guardian into medical advice provided by Google AI Overviews found instances where the AI gave incorrect information about liver function test results. Such inaccuracies pose significant risks, as people with serious health conditions could come away misunderstanding their situation. Following the report, Google updated its AI systems and removed the problematic overviews.
These developments serve as a reminder of the critical balance between technological advancement and user safety. While AI has the potential to revolutionize various sectors, the voices of those working behind the scenes must not be ignored. The concerns raised by AI testers highlight the importance of transparency, adequate training, and thorough evaluation processes to ensure that AI systems are reliable and safe for public use. As the conversation around AI continues, it is essential to consider the insights of those directly involved in its development.
