Site launches where GPT-4o-mini, Claude 3.7 Sonnet, DeepSeek-R1 and other models play a Werewolf-style game: which AI is the strongest? – GIGAZINE

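As technology has advanced in recent years, many AI companies have released large language models capable of human-like conversation. Results have now been published from having these models play a Werewolf-style game against one another, a genre in which conversation is critical, and they show how well each model actually performs.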

LLM Mafia Game Competition
https://mafia.opennumbers.xyz/

AI bots now play Mafia with each other on public website, and almost all of them are terrible at it | Tom’s Hardware
https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-bots-can-now-play-mafia-with-each-other-and-almost-all-of-them-are-terrible-at-it

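Developer Guzus had large language models such as "claude-3.7-sonnet", "deepseek-chat" and "llama-3.3-70b-instruct" play Mafia, a Werewolf-style game for eight players. Each player is assigned one of three roles, Villager, Doctor or Mafia, split into five Villagers, one Doctor and two Mafia. The game is turn-based, with one in-game day counting as one turn. Each turn, the players must work out who the Mafia are and banish a suspect. As the turn advances, the Mafia can kill one Villager, and the Doctor can protect a player of their choice from the Mafia's attack. The Villager side wins if it banishes all of the Mafia members, and the Mafia side wins if it kills all of the Villagers.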
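
The rules above can be sketched as a small simulation. This is a minimal Python sketch, not the site's code: random choices stand in for the LLM players, and the turn order (night kill first, then the day vote), random tie-breaking and the Doctor's behavior are assumptions. The win conditions follow the article's description literally. The names `play` and `winner` are hypothetical.

```python
import random
from collections import Counter

# 8 players: 5 Villagers, 1 Doctor, 2 Mafia (role mix from the article)
ROLES = ["villager"] * 5 + ["doctor"] + ["mafia"] * 2


def winner(alive, roles):
    """Return "town", "mafia", or None if the game continues."""
    mafia = sum(1 for p in alive if roles[p] == "mafia")
    town = len(alive) - mafia
    if mafia == 0:
        return "town"   # every Mafia member has been banished
    if town == 0:
        return "mafia"  # every non-Mafia player is gone (literal reading of the rules)
    return None


def play(seed=None):
    """Play one game with random agents standing in for LLMs (hypothetical)."""
    rng = random.Random(seed)
    roles = ROLES[:]
    rng.shuffle(roles)
    alive = set(range(len(roles)))

    while True:
        # Night (assumed to come first): the Mafia pick one non-Mafia target.
        target = rng.choice(sorted(p for p in alive if roles[p] != "mafia"))
        # A living Doctor protects one random living player (may pick themselves).
        doctor_alive = any(roles[p] == "doctor" for p in alive)
        protected = rng.choice(sorted(alive)) if doctor_alive else None
        if target != protected:
            alive.discard(target)
        if (result := winner(alive, roles)) is not None:
            return result

        # Day: every living player votes; Mafia never vote for each other.
        votes = Counter()
        for voter in sorted(alive):
            if roles[voter] == "mafia":
                options = [p for p in sorted(alive) if roles[p] != "mafia"]
            else:
                options = [p for p in sorted(alive) if p != voter]
            votes[rng.choice(options)] += 1
        top = max(votes.values())
        # Ties are broken at random (an assumption).
        banished = rng.choice(sorted(p for p, n in votes.items() if n == top))
        alive.discard(banished)
        if (result := winner(alive, roles)) is not None:
            return result


if __name__ == "__main__":
    games = 1000
    mafia_wins = sum(play(seed) == "mafia" for seed in range(games))
    print(f"Mafia win rate with random agents: {mafia_wins / games:.1%}")
```

A random-agent baseline like this is one way to judge whether an LLM's win rate reflects skill, but the baseline figure depends entirely on the assumed rules, so it says nothing definitive about the site's own setup.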

Because the game inherently pits deceivers against the deceived, conversation is critical. Guzus posed the question: "Which AI makes the best Mafia player?"

Which AI is the best mafia (werewolf) game player?
You can see the whole script of LLMs playing mafia games.
They deceive, debate, and kill each other to win.
link below pic.twitter.com/vfR47nLrrY
— guzus (@uncanny_guzus) March 3, 2025

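The results of each model playing Mafia are shown below. The best performer was Claude 3.7 Sonnet in extended thinking mode, which posted the highest overall win rate (57.78%) and won 100% of its games as Mafia.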

Model	Games played	Overall win rate	Win rate as Mafia	Win rate as Villager	Win rate as Doctor
claude-3.7-sonnet (extended thinking)	45	57.78%	100.00%	37.04%	50.00%
deepseek-chat	56	50.00%	88.24%	31.03%	40.00%
claude-3.7-sonnet (standard)	54	46.30%	92.86%	32.35%	16.67%
claude-3.5-sonnet	47	44.68%	90.00%	36.67%	14.29%
llama-3.3-70b-instruct	65	44.62%	72.73%	30.00%	30.77%
mistral-small-24b-instruct-2501	65	44.62%	80.00%	30.30%	25.00%
gpt-4o-mini	71	42.25%	82.61%	27.50%	0.00%
gemini-flash-1.5-8b	68	41.18%	82.35%	22.50%	45.45%
gemini-2.0-flash-001	72	40.28%	80.00%	31.91%	20.00%
gemini-2.0-flash-lite-001	71	39.44%	77.78%	29.55%	11.11%
gpt-4o	49	38.78%	90.00%	24.24%	33.33%
llama-3.1-70b-instruct	55	38.18%	66.67%	26.47%	33.33%
minimax-01	59	37.29%	56.25%	35.14%	0.00%
deepseek-r1	22	36.36%	62.50%	23.08%	0.00%
gemini-flash-1.5	73	35.62%	66.67%	25.00%	12.50%
hermes-3-llama-3.1-405b	57	35.09%	60.00%	20.00%	57.14%
l3-euryale-70b	25	32.00%	66.67%	25.00%	50.00%
mythomax-l2-13b	61	31.15%	45.45%	28.21%	27.27%
deepseek-r1-distill-llama-70b	51	29.41%	57.14%	10.71%	44.44%
wizardlm-2-8x22b	65	26.15%	41.67%	23.40%	16.67%
mistral-nemo	17	17.65%	40.00%	10.00%	0.00%
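
Each win rate in the table is wins divided by games played, so the approximate number of wins can be recovered from the two columns. The Python snippet below does this for a few rows copied from the table; `implied_wins` is a hypothetical helper, and rounding to the nearest whole game is an assumption.

```python
# (model, games played, overall win rate in %) copied from the table above
ROWS = [
    ("claude-3.7-sonnet (extended thinking)", 45, 57.78),
    ("deepseek-chat", 56, 50.00),
    ("claude-3.7-sonnet (standard)", 54, 46.30),
    ("gpt-4o-mini", 71, 42.25),
    ("deepseek-r1", 22, 36.36),
]


def implied_wins(games: int, rate_pct: float) -> int:
    """Recover the whole number of wins behind a rounded percentage."""
    return round(games * rate_pct / 100)


for name, games, rate in ROWS:
    wins = implied_wins(games, rate)
    # Re-derive the percentage from the recovered count as a consistency check
    print(f"{name}: {wins}/{games} wins = {wins / games:.2%}")
```

For example, Claude 3.7 Sonnet's 57.78% corresponds to 26 wins in 45 games, while deepseek-r1's 36.36% is 8 wins in 22 games. With that few games, a single extra win would move a model's rate by about 4.5 percentage points.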

Guzus has also published the full conversation logs of the AIs for every game.

Looking ahead, Guzus has outlined plans including human-versus-LLM Mafia games, expansion to other games such as poker, a feature for watching games in progress in real time, and additional roles.

github repository revealing soon.
planning to make it scalable so that it can be applied to other interesting games. could be developed to generate a movie script someday
— guzus (@uncanny_guzus) March 3, 2025