Reasoning Capabilities of OpenAI o1

20/10/2024

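On September 13, 2024, OpenAI announced a new AI model, "OpenAI o1," in two variants: "o1-preview" and "o1-mini." Compared with the earlier GPT-4o model, o1 is specialized for complex reasoning tasks. It reportedly scored at a level equivalent to the top 500 in the qualifier for the USA Mathematical Olympiad, performed at gold-medal level in a world programming competition, and exceeded human PhD-level accuracy in physics, biology, and chemistry. (It is currently available only to paid ChatGPT users.)
AI researchers at Apple have since published a paper arguing that OpenAI o1, despite being billed as "strong at reasoning," does not actually possess "reasoning" capabilities. As a result, debate appears to be intensifying over whether OpenAI o1 can really perform mathematical reasoning at a human level.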

Can AI learn to solve math problems through "chain of thought"? Google and Apple investigate
2024.10.18
https://xtech.nikkei.com/atcl/nxt/column/18/02801/101600011/

"o1-preview" has a self-evaluation mechanism: it checks and corrects its own actions during planning
2024/10/18
https://ai-data-base.com/archives/77179

[Submitted on 30 Sep 2024 (v1), last revised 14 Oct 2024 (this version, v4)]
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
https://www.arxiv.org/abs/2409.19924

2024/10/16 [WED]
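An in-depth analysis of OpenAI "o1": why is o1, billed as "strong at reasoning," different from earlier models? Findings published by a research group from Yale University and others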
https://ledge.ai/articles/analysis_of_openai_o1

"OpenAI o1's logical reasoning overturns the concept of generative AI," says Mori, executive officer at Hakuhodo DY Holdings
貴島逸斗, Nikkei xTECH / Nikkei NETWORK
2024.10.15
https://xtech.nikkei.com/atcl/nxt/column/18/02975/100900001/

What Apple's research shows: is OpenAI "o1" really good at reasoning?
October 15, 2024
https://reinforz.co.jp/bizmedia/60811/

The president of OpenAI Japan reveals the "instructions" from CEO Sam Altman and the aims of the latest model, "OpenAI o1"
2024.10.14
https://diamond.jp/articles/-/351533

"True reasoning is difficult for current LLMs": Apple researchers publish a paper
October 13, 2024
https://www.itmedia.co.jp/news/articles/2410/13/news070.html

Apple AI researchers argue that OpenAI o1 does not actually have "reasoning" capabilities
2024/10/13
https://xenospectrum.com/apple-ai-researchers-believe-openai-o1-does-not-actually-have-inference-capabilities/

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models [Submitted on 7 Oct 2024]
https://arxiv.org/pdf/2410.05229

Reasoning Capabilities of OpenAI o1

On September 13, 2024, OpenAI unveiled a new AI model named "OpenAI o1," which comes in two versions: "o1-preview" and "o1-mini." Compared to the existing GPT-4o model, o1 is specifically designed to handle complex reasoning tasks. It reportedly achieved results equivalent to ranking in the top 500 in the qualifier for the USA Mathematical Olympiad, demonstrated gold-medal-level performance in a world programming competition, and surpassed human PhD-level accuracy in physics, biology, and chemistry. (Currently, it is available only to paid ChatGPT users.)
However, AI researchers at Apple have published a paper arguing that this "reasoning-focused" model does not actually possess genuine "reasoning" capabilities. This appears to have sparked an active debate about whether OpenAI o1 can truly perform mathematical reasoning on par with humans.


Yorozu IP Consulting Blog