Ever since OpenAI’s ChatGPT ignited the generative AI revolution, developers have relied on LMArena (formerly known as Chatbot Arena) as the de facto AI leaderboard. Now, Scale AI has launched Seal Showdown, a rival benchmarking tool.
Like LMArena, Seal Showdown allows users to pit AI models against each other and cast votes on their performance. However, Scale AI asserts that Seal Showdown better mirrors the opinions of everyday users. Scale CEO Jason Droege noted that Seal Showdown “captures real preferences, powered by a platform used by actual individuals.”
“Most benchmarks depend on synthetic evaluations or feedback from a limited audience,” remarked Janie Gu, head of product at Scale AI, in a blog entry. “They overlook how genuine users interact with models on a daily basis. By treating a diverse user base as a single group, vital nuances are overlooked.”
Last year, Scale AI unveiled its Safety, Evaluations, and Alignment Lab (SEAL) leaderboards, rooted in expert assessments. Now, Scale AI presents leaderboards derived from user testing, offering a new alternative to LMArena.
The startup claims that its latest benchmarking tool is informed by real-world usage and user feedback from over 100 nations, 70 languages, and 200 professional sectors. The company also disclosed the specific methodology behind Seal Showdown.
“Showdown brings a novel aspect to public leaderboards: detailed user segmentation,” Gu commented. “Rankings are established based on discussions on Scale’s Outlier platform, facilitating the verification of each user’s nationality, educational background, profession, language, and age.”
This demographic data allows Scale AI to identify which models are favored most by region, language, age, or specific use case.
Scale AI argues that existing leaderboards depend on hobbyist participation and narrow user interests, which it says misrepresent how models are actually used.
LMArena has faced criticism for being biased against open models and favoring major AI firms such as Google, xAI, and OpenAI. Scale AI’s approach, however, may have flaws of its own: the preliminary leaderboard places GPT-5 at the top, a ranking that may reflect popular preference rather than objective capability.
The refreshed SEAL leaderboards are now operational. Currently, GPT-5 leads all benchmark categories, which contrasts with LMArena, where Google’s Gemini 2.5 Pro, 2.5 Flash, and Veo 3 dominate most categories.
Disclosure: Ziff Davis, the parent company of Mashable, initiated a lawsuit against OpenAI in April, alleging copyright violations in the training and functioning of its AI systems.