Is OpenAI o1's IQ 133?

15/12/2024

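OpenAI's "OpenAI o1-preview," released in September 2024, was updated on December 5 to the official release, "OpenAI o1," and "OpenAI o1 Pro" was added alongside it.
According to TRACKING AI, the official "OpenAI o1" scored an IQ of 133 on the Mensa Norway IQ test, 13 points higher than "OpenAI o1-preview" (120). However, "OpenAI o1 Pro," which is supposed to be the more capable model, scored 118, lower than the official "OpenAI o1," and it is not clear why.
On a separate evaluation, the "Offline Test," the scores are lower overall: "OpenAI o1 Pro" scored 110, "OpenAI o1-preview" 97, and the official "OpenAI o1" 90.
The figures are a little dated, but the ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) scores published in September 2024 for several AI models are as follows: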

OpenAI o1-preview: 21.2% (public evaluation)
Claude 3.5 Sonnet: 21% (public evaluation)
OpenAI o1-mini: 12.8% (public evaluation)
GPT-4o: 9% (public evaluation)
Gemini 1.5: 8% (public evaluation)

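IQ test results vary widely with the test method and the data used, so I think it is important to judge AI capabilities against multiple evaluation criteria. And because some of those measures still show low scores, I expect performance to keep improving for some time yet.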
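
As a rough illustration of why a single number can mislead, the short Python sketch below tabulates the TRACKING AI figures quoted in this post and compares how the models rank under each test. The dictionary layout and the function names `ranking` and `gap` are my own, made up for illustration; they are not part of anything TRACKING AI provides.

```python
# IQ scores as quoted in this post (TRACKING AI, December 2024).
# The keys "mensa_norway" and "offline" are illustrative labels only.
SCORES = {
    "OpenAI o1": {"mensa_norway": 133, "offline": 90},
    "OpenAI o1-preview": {"mensa_norway": 120, "offline": 97},
    "OpenAI o1 Pro": {"mensa_norway": 118, "offline": 110},
}


def ranking(test):
    """Return model names ordered from highest to lowest score on one test."""
    return sorted(SCORES, key=lambda model: SCORES[model][test], reverse=True)


def gap(model):
    """Return the Mensa Norway score minus the Offline Test score."""
    scores = SCORES[model]
    return scores["mensa_norway"] - scores["offline"]


if __name__ == "__main__":
    for test in ("mensa_norway", "offline"):
        print(test, ranking(test))
    for model in SCORES:
        print(model, gap(model))
```

With these figures the two tests rank the three models in exactly opposite order, and the gap between the two scores ranges from 8 points (OpenAI o1 Pro) to 43 points (OpenAI o1).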

TRACKING AI
Monitoring Bias in Artificial Intelligence Chatbots
https://trackingai.org/IQ

OpenAI o1 Results on ARC-AGI-Pub
https://arcprize.org/blog/openai-o1-results-arc-prize

よろず知財コンサルティングのブログ