British Shorthair
Mid-Tier

GPT-4o


OpenAI · Mid-Tier

Generalist · Lv.1 · text · code · closed-weight

Dependable, round, your nan's favorite. Always polite, never controversial, brings you exactly what you asked for.

gpt-4o-2024-08-06 · $2.50 per 1M input tokens · $10 per 1M output tokens
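The listed rates turn directly into a per-request cost estimate. The sketch below assumes only the card's prices ($2.50 per million input tokens, $10 per million output tokens). It ignores any discounts or pricing tiers the card does not list. `request_cost` is a hypothetical helper and is not part of any OpenAI SDK.

```python
from decimal import Decimal

# Rates as listed on this card for gpt-4o-2024-08-06, in USD per million tokens.
INPUT_USD_PER_M = Decimal("2.50")
OUTPUT_USD_PER_M = Decimal("10")


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * INPUT_USD_PER_M
            + Decimal(output_tokens) * OUTPUT_USD_PER_M) / million


print(request_cost(2_000, 500))  # 2,000 prompt tokens + 500 completion tokens
```

At these rates, 2,000 input tokens and 500 output tokens come to $0.005 + $0.005 = $0.01. `Decimal` is used so that fractions of a cent do not pick up floating-point rounding.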

Stats

STR 8: Output quality (how good the responses are overall)
DEX 8: Speed (tokens per second and latency)
CON 9: Reliability (consistency across runs and tasks)
INT 8: Reasoning (logic, math, and multi-step problems)
WIS 8: Accuracy (factual correctness and precision)
CHA 6: Creativity (style, voice, and creative expression)

Abilities

Reliable Paws: consistently delivers what you ask for
People Pleaser: strong instruction following
Poker Face: plays it safe on controversial topics

Eval Results

Best: 10.0 · Worst: 4.4
Reasoning: 9.5/10 (5 prompts)

Reasoning #1

A cat knocks 3 things off a table every hour. The owner puts 2 things back every hour. There are 12 things on the table. When does the table become empty?

high · 9.4

Judge: Correct answer of 12 hours with clean, elegant reasoning. Immediately identifies the net rate of -1 item/hour and applies it to the 12-item starting count. Clear mathematical presentation with no wasted steps.

To solve this, we need to calculate the net change in the number of things on the table each hour. The cat knocks off 3 things every hour, while the owner puts back 2 things each hour. Therefore, the net change is: \[ 3 - 2 = 1 \] This means that each hour, there is a net decrease of 1 item on the...
openai/gpt-4o · 2026-03-01
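The judged answer uses the net-rate reading: the table loses one item per hour, so 12 items last 12 hours. A small simulation makes that arithmetic explicit. It also shows that the answer depends on how moves are ordered inside each hour. `hours_until_empty` and its `cat_first` flag are hypothetical illustrations. They are not part of the prompt or of the model's answer.

```python
def hours_until_empty(start: int, knocked_off: int, put_back: int, cat_first: bool) -> int:
    """Return the hour during which the table first has no items on it.

    cat_first=False: net-rate reading, only the end-of-hour balance counts.
    cat_first=True: hypothetical ordering where the cat knocks off its items
    before the owner puts anything back within each hour.
    """
    if knocked_off <= put_back:
        raise ValueError("the table never empties unless the cat outpaces the owner")
    items, hour = start, 0
    while items > 0:
        hour += 1
        if cat_first:
            items -= min(knocked_off, items)  # the cat can't knock off more than is there
            if items == 0:
                return hour  # empty before the owner gets a turn this hour
            items += put_back
        else:
            items -= knocked_off - put_back  # net change per hour (-1 for this puzzle)
    return hour


print(hours_until_empty(12, 3, 2, cat_first=False))
print(hours_until_empty(12, 3, 2, cat_first=True))
```

Under the net-rate reading the function returns 12, which matches the judged answer. If the cat instead moves first each hour, it returns 10. After 9 full hours the table holds 12 - 9 = 3 items, and the cat clears all three before the owner acts in hour 10.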

Reasoning #2

You have 8 identical-looking balls. One is slightly heavier than the rest. Using a balance scale, what is the minimum number of weighings needed to guarantee you find the heavy ball? Explain your strategy.

high · 9.1

Judge: Correct answer of 2 weighings with a complete strategy covering all decision branches. The 3-3-2 split is optimal and all outcomes are traced. Minor gap: doesn't explicitly argue why 1 weighing is insufficient.

To find the one heavier ball among 8 identical-looking balls using a balance scale, you can do it in just 2 weighings. Here's the strategy: 1. **First Weighing:** - Divide the 8 balls into three groups: two groups of 3 balls each and one group of 2 balls. - Weigh the two groups of 3 balls aga...
openai/gpt-4o · 2026-03-01
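The judge notes that the answer never argues why one weighing is not enough. Counting outcomes settles that point. Each weighing has only three results (left heavier, right heavier, balanced), so k weighings separate at most 3^k cases. Since 3^1 = 3 < 8 ≤ 9 = 3^2, at least two weighings are required. The sketch below checks both halves of the argument. `find_heavy` is a hypothetical helper that plays out the 3-3-2 strategy against every possible position of the heavy ball.

```python
def find_heavy(weights: list[int]) -> tuple[int, int]:
    """Locate the heavy ball among 8 with the 3-3-2 strategy.

    Returns (index of heavy ball, number of weighings used).
    """
    weighings = 0

    def compare(left: list[int], right: list[int]) -> int:
        """One use of the balance: 1 if left is heavier, -1 if right is, 0 if level."""
        nonlocal weighings
        weighings += 1
        diff = sum(weights[i] for i in left) - sum(weights[i] for i in right)
        return (diff > 0) - (diff < 0)

    # Weighing 1: balls 0-2 against 3-5; balls 6 and 7 stay off the scale.
    first = compare([0, 1, 2], [3, 4, 5])
    if first == 0:
        # Heavy ball is 6 or 7; weighing 2 compares them directly.
        return (6 if compare([6], [7]) == 1 else 7), weighings
    group = [0, 1, 2] if first == 1 else [3, 4, 5]
    # Weighing 2: two balls from the heavier trio; if they balance, it is the third.
    second = compare([group[0]], [group[1]])
    if second == 1:
        return group[0], weighings
    if second == -1:
        return group[1], weighings
    return group[2], weighings


# Lower bound: k weighings give at most 3**k distinguishable outcomes.
min_weighings = next(k for k in range(10) if 3 ** k >= 8)
print(min_weighings)
```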

Reasoning #3

A farmer needs to cross a river with a cat, a mouse, and a bag of catnip. The boat can carry the farmer and one item at a time. If left alone together: the cat will eat the mouse, and the cat will destroy the catnip (cats gonna cat). The mouse ignores the catnip. How does the farmer get everything across? Show all steps.

high · 10.0

Judge: Perfect solution with optimal 7 crossings and no constraint violations. Every step tracks state on both sides of the river. The classic cat-first, bring-cat-back trick is correctly applied, and the constraint that mouse ignores catnip is properly leveraged.

To solve this problem, the farmer needs to ensure that the cat and mouse, as well as the cat and catnip, are never left alone together. Here’s how the farmer can get everything across the river safely: 1. **First Trip:** The farmer takes the cat across the river and leaves it on the other side. ...
openai/gpt-4o · 2026-03-01
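Here the cat conflicts with both other items, so it plays the role of the goat in the classic wolf, goat, and cabbage puzzle. A breadth-first search over bank assignments confirms the judge's claim that 7 crossings is optimal. It also shows that the first trip must carry the cat. The state encoding and function names below are illustrative assumptions.

```python
from collections import deque

# State: (farmer, cat, mouse, catnip); 0 = starting bank, 1 = far bank.
NAMES = ("farmer", "cat", "mouse", "catnip")
START, GOAL = (0, 0, 0, 0), (1, 1, 1, 1)


def unsafe(state: tuple[int, ...]) -> bool:
    """The cat eats the mouse or shreds the catnip if the farmer is on the other bank."""
    farmer, cat, mouse, catnip = state
    return cat != farmer and (cat == mouse or cat == catnip)


def solve() -> list[tuple[int, ...]]:
    """Breadth-first search, so the first path found uses the fewest crossings."""
    parent = {START: None}
    queue = deque([START])
    while queue:
        state = queue.popleft()
        if state == GOAL:
            break
        farmer = state[0]
        # The farmer rows alone (None) or with one item from his own bank.
        for cargo in (None, 1, 2, 3):
            if cargo is not None and state[cargo] != farmer:
                continue
            moved = list(state)
            moved[0] = 1 - farmer
            if cargo is not None:
                moved[cargo] = 1 - farmer
            nxt = tuple(moved)
            if nxt not in parent and not unsafe(nxt):
                parent[nxt] = state
                queue.append(nxt)
    # Walk back from the goal to rebuild the route.
    path, node = [], GOAL
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


for step in solve():
    print([name for name, side in zip(NAMES, step) if side == 1])
```

Every other first move leaves the cat alone with the mouse or the catnip. That is why the search always opens with the cat. A five-crossing plan would need two empty return trips, and either one strands the cat with another item.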

Reasoning #4

In a round-robin tournament with 6 players, each player plays every other player exactly once. A win is worth 2 points, a draw is 1 point each, and a loss is 0 points. After the tournament: Player A has 10 points, Player B has 8 points, and Player C has 6 points. Players D, E, and F each have the same score. What score do D, E, and F each have? Show your reasoning.

high · 10.0

Judge: Correct answer of 2 points each with elegant reasoning: calculates total games via C(6,2)=15, derives total points as 30, subtracts known scores (24), and divides remainder equally. Clean, complete, no gaps.

In a round-robin tournament with 6 players, each player plays every other player exactly once. This means each player plays 5 matches. There are \(\binom{6}{2} = 15\) total matches in the tournament. Given the point system: - A win is worth 2 points. - A draw is worth 1 point for each player. - A l...
openai/gpt-4o · 2026-03-01
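The judge's chain of reasoning is easy to check in code. There are 15 games, and each awards exactly 2 points, for 30 in total. The known scores account for 24, leaving 6 to split three ways. The sketch also builds one set of results to show that these scores are actually achievable, not just arithmetically consistent. That witness is an illustrative construction and is not taken from the transcript. `tied_score` and `scores` are hypothetical helpers.

```python
from itertools import combinations
from math import comb
from typing import Optional

PLAYERS = "ABCDEF"


def tied_score(n_players: int, known: list[int], n_tied: int) -> int:
    """Split the points left over after the known scores equally among n_tied players."""
    total = comb(n_players, 2) * 2  # every game hands out exactly 2 points
    remainder = total - sum(known)
    if remainder % n_tied:
        raise ValueError("leftover points cannot be split equally")
    return remainder // n_tied


def scores(results: dict[tuple[str, str], Optional[str]]) -> dict[str, int]:
    """Tally points from a map of (player, player) -> winner, or None for a draw."""
    points = dict.fromkeys(PLAYERS, 0)
    for (p, q), winner in results.items():
        if winner is None:
            points[p] += 1
            points[q] += 1
        else:
            points[winner] += 2
    return points


# Hypothetical witness: in each pair the earlier letter wins, except that
# D, E and F draw all their games against each other.
witness = {
    (p, q): None if {p, q} <= set("DEF") else p
    for p, q in combinations(PLAYERS, 2)
}

print(tied_score(6, [10, 8, 6], 3), scores(witness))
```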

Reasoning #5

There are 100 closed lockers in a hallway. 100 cats walk by. The first cat toggles every locker. The second cat toggles every 2nd locker. The third cat toggles every 3rd locker. And so on — the nth cat toggles every nth locker. After all 100 cats have passed: (a) which lockers are open? (b) Why? Prove your answer mathematically.

high · 9.0

Judge: Correctly identifies perfect squares as the answer and lists all 10. The mathematical proof clearly explains divisor pairing and why perfect squares have an odd divisor count, using 36 as a concrete example. Solid proof, though it could be slightly more formal.

(a) The lockers that remain open are those numbered with perfect squares: 1, 4, 9, 16, 25, 36, 49, 64, 81, and 100. (b) To understand why these lockers remain open, consider the process of toggling. A locker is toggled (opened if it was closed, closed if it was open) each time a cat passes by it. S...
openai/gpt-4o · 2026-03-01
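The proof rests on one fact. Locker k is toggled once for each divisor of k. Divisors pair up as d and k/d, and the only exception is d = k/d, which happens only when k is a perfect square. A brute-force simulation confirms both the list of open lockers and the divisor-parity claim for every locker up to 100. The helper names are illustrative.

```python
from math import isqrt


def open_lockers(n: int) -> list[int]:
    """Simulate n cats on n lockers; cat k toggles lockers k, 2k, 3k, ..."""
    is_open = [False] * (n + 1)  # index 0 unused; every locker starts closed
    for cat in range(1, n + 1):
        for locker in range(cat, n + 1, cat):
            is_open[locker] = not is_open[locker]
    return [i for i in range(1, n + 1) if is_open[i]]


def divisor_count(k: int) -> int:
    """Number of divisors of k, i.e. how many cats toggle locker k."""
    return sum(1 for d in range(1, k + 1) if k % d == 0)


print(open_lockers(100))
```

For the judge's example, 36 has the divisors 1, 2, 3, 4, 6, 9, 12, 18, and 36. Four of them pair off, and 6 is left unpaired, which gives an odd count of 9 toggles.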