Post #5331

@EverythingScience

Posted Mar 21, 2026, 10:30 AM

ChatGPT Was Asked the Same Question 10 Times. The Answers Kept Changing

Washington State University professor Mesut Cicek and his team repeatedly evaluated ChatGPT by giving it hypotheses drawn from scientific studies. The AI was asked to decide whether each statement was supported by research, essentially judging if it was true or false. In total, the researchers tested more than 700 hypotheses and submitted each one 10 times to examine how consistent the responses would be.

Accuracy Results and Performance Limits

In the initial 2024 experiment, ChatGPT answered correctly 76.5% of the time. When the study was repeated in 2025, accuracy rose slightly to 80%. However, once the results were adjusted for random guessing, the performance looked far less reliable. The AI was only about 60% better than chance, which the researchers described as closer to a low D than strong performance.

The system had particular difficulty identifying false statements, correctly labeling them only 16.4% of the time. It also showed inconsistency. When given the exact same prompt 10 times, ChatGPT produced consistent results for only about 73% of the cases.

Inconsistent Answers to Identical Questions

“We’re not just talking about accuracy, we’re talking about inconsistency, because if you ask the same question again and again, you come up with different answers,” said Cicek, an associate professor in the Department of Marketing and International Business in WSU’s Carson College of Business and lead author of the new publication.

“We used 10 prompts with the same exact question. Everything was identical. It would answer true. Next, it says it’s false. It’s true, it’s false, false, true. There were several cases where there were five true, five false.”

AI Fluency Versus Real Understanding

The study, published in the Rutgers Business Review, highlights the importance of caution when using AI for important decisions, especially those involving nuance or complex reasoning.
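
For readers curious about the arithmetic behind the "adjusted for random guessing" and consistency figures above, here is a minimal Python sketch. It is not the researchers' code. It assumes a 50% chance baseline (a coin flip on a true/false question) and assumes a hypothesis counts as "consistent" only when all 10 repeated answers match; the study may define both differently. The function names and the toy answer lists are made up for illustration.

```python
def chance_adjusted_accuracy(accuracy: float, chance: float = 0.5) -> float:
    """Rescale accuracy so the chance level maps to 0 and a perfect score to 1.

    Assumption: chance = 0.5, i.e. random guessing on a true/false question.
    """
    return (accuracy - chance) / (1.0 - chance)


def is_consistent(answers: list[str]) -> bool:
    """Assumed definition: every repeated answer to one hypothesis is identical."""
    return len(set(answers)) == 1


def consistency_rate(runs: list[list[str]]) -> float:
    """Share of hypotheses whose repeated answers all agree."""
    return sum(is_consistent(answers) for answers in runs) / len(runs)


if __name__ == "__main__":
    # Raw accuracies reported in the article for the 2024 and 2025 runs.
    for year, acc in [(2024, 0.765), (2025, 0.80)]:
        print(year, round(chance_adjusted_accuracy(acc), 2))

    # Hypothetical toy data: 4 hypotheses, each asked 10 times.
    toy_runs = [
        ["true"] * 10,                  # always the same answer
        ["true"] * 5 + ["false"] * 5,   # the "five true, five false" case
        ["false"] * 10,                 # always the same answer
        ["true"] * 9 + ["false"],       # one stray answer breaks consistency
    ]
    print(consistency_rate(toy_runs))
```

Under these assumptions, 80% raw accuracy rescales to roughly 0.6, which lines up with the "about 60% better than chance" figure, and a single stray answer out of 10 is enough to mark a hypothesis as inconsistent.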
While generative AI can produce fluent and convincing language, it does not necessarily demonstrate true understanding. Cicek said the findings suggest that artificial general intelligence capable of genuine reasoning may still be further away than some expect.

“Current AI tools don’t understand the world the way we do; they don’t have a ‘brain,’” Cicek said. “They just memorize, and they can give you some insight, but they don’t understand what they’re talking about.”

Source: SciTechDaily

Always verify AI/LLM outputs!

@EverythingScience