Problems Easy for Humans but Difficult for Generative AI

8/1/2025

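A well-known example of a problem that is easy for humans but hard for generative AI is the "strawberry problem."
It refers to models failing to count how many times "r" appears in the word "strawberry." A person sees at a glance that the answer is 3, but generative AI models such as ChatGPT and Claude often answer 2. The cause is that these models process text split into units called "tokens" rather than as individual letters, and the problem appears to be quite hard to overcome. Although it is widely known, GPT-4o, Gemini 2.0 Flash, Claude 3.5 Sonnet, and Claude 3 Opus still got it wrong. OpenAI o1 preview, OpenAI o1, and Gemini 1.5 Pro answered correctly.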
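For contrast, counting letters is trivial for ordinary code, because a program can inspect every character. The following is a minimal Python sketch. The helper name `count_char` and the token split are assumptions made for illustration; the split is hypothetical and is not the output of any real tokenizer.

```python
def count_char(text: str, target: str) -> int:
    """Count occurrences of one character by checking every character."""
    return sum(1 for ch in text if ch == target)


word = "strawberry"
r_count = count_char(word, "r")
print(r_count)  # 3

# Hypothetical token split, for illustration only.
# Real tokenizers may split the word differently.
hypothetical_tokens = ["str", "aw", "berry"]
per_token = [count_char(tok, "r") for tok in hypothetical_tokens]
print(per_token)  # [1, 0, 2]
```

A model that receives a few token IDs instead of ten separate letters has no direct view of the per-letter counts that this loop reads off one character at a time.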
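Two other questions were answered correctly by all seven models (GPT-4o, Gemini 2.0 Flash, Claude 3.5 Sonnet, Claude 3 Opus, OpenAI o1 preview, OpenAI o1, and Gemini 1.5 Pro): "Which is larger, 15.19 or 15.5?" and "In the library, 26 boys are reading picture books. 16 boys and 13 girls are reading comic books. Later, 15 girls arrive and start reading picture books. How many people are reading picture books?"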
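The expected answers to these two questions can be checked with a short Python sketch. The variable names are chosen here for illustration. The comic-book readers (16 boys and 13 girls) are a distractor and do not enter the sum.

```python
from decimal import Decimal

# Decimal comparison: 15.5 is 15.50, which is greater than 15.19.
a = Decimal("15.19")
b = Decimal("15.5")
larger = max(a, b)
print(larger)  # 15.5

# Picture-book readers: the 26 boys already reading plus the 15 girls who join.
boys_picture_books = 26
girls_joining_picture_books = 15
picture_book_readers = boys_picture_books + girls_joining_picture_books
print(picture_book_readers)  # 41
```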
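On the question "Oliver picked 44 kiwis on Friday. On Saturday he picked 58 kiwis. On Sunday he picked twice as many kiwis as on Friday, but 5 of them were smaller than average. How many kiwis did Oliver pick?", only Gemini 1.5 Pro answered incorrectly. All the other models answered correctly.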
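The kiwi question hinges on an irrelevant detail: the 5 smaller kiwis were still picked, so they are not subtracted. A minimal Python sketch of the arithmetic, with variable names chosen here for illustration:

```python
friday = 44
saturday = 58
sunday = 2 * friday  # twice Friday's count: 88

# Size does not change how many kiwis were picked, so the 5 small ones still count.
total = friday + saturday + sunday
print(total)  # 190
```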
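On the question "How many times does 'も' appear in 'すもももももももものうち' (Sumomo mo momo mo momo no uchi)?", OpenAI o1 and Claude 3 Opus answered correctly, while GPT-4o, Gemini 2.0 Flash, Claude 3.5 Sonnet, OpenAI o1 preview, and Gemini 1.5 Pro answered incorrectly.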
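The expected answer can be verified by building the phrase from its words and counting characters. This is a minimal Python sketch; the word segmentation below follows the usual reading of the tongue twister and is stated here as an assumption.

```python
from collections import Counter

# Assumed segmentation: sumomo / mo / momo / mo / momo / no / uchi
words = ["すもも", "も", "もも", "も", "もも", "の", "うち"]
phrase = "".join(words)

# Counter tallies every character in the phrase.
mo_count = Counter(phrase)["も"]
print(mo_count)  # 8
```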
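Generative AI has many weaknesses like these, and it is important to use it with them in mind.
The race for higher performance matters, but so does solving problems that are easy for humans yet hard for generative AI.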

2024.11.06
Generative AI's "strawberry problem": what about in Japanese?
https://www.cybersoken.com/topics/991/

October 19, 2024
What is the "strawberry problem" that affects large language models such as GPT-4 and Claude?
https://gigazine.net/news/20241019-strawberrry-problem/



Blog of よろず知財コンサルティング (Yorozu IP Consulting)