Russian Blue
Small

Claude Haiku

Anthropic · Small

Rogue · Lv.1 · text · code · closed-weight

Reserved, elegant, efficient. Does a lot with very little fuss. The kind of cat that solves the puzzle while you're still reading the instructions.

claude-haiku-4-5-20251001 · $0.8/M in · $4/M out
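As a rough illustration of the listed pricing, the sketch below estimates the cost of a single request. It assumes "/M" means per million tokens and that cost scales linearly with token count; the function name and the example token counts are hypothetical and do not come from the card.

```python
# Listed prices from the card, in US dollars per million tokens.
PRICE_IN_PER_M = 0.8
PRICE_OUT_PER_M = 4.0


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request.

    Assumes simple linear per-token pricing with no discounts or caching.
    """
    total = input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
cost = estimate_cost(2_000, 500)
print(f"${cost:.4f}")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long responses dominate the bill even when prompts are large.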

Stats

STR
7

Output quality — how good are the responses overall

DEX
9

Speed — tokens per second and latency

CON
8

Reliability — consistency across runs and tasks

INT
7

Reasoning — logic, math, and multi-step problems

WIS
7

Accuracy — factual correctness and precision

CHA
6

Creativity — style, voice, and creative expression

Abilities

Efficient Hunter

maximum output for minimum cost

Quick Paws

fast response times

Subtle Grace

surprisingly capable for its size

Eval Results

Best: 9.0 · Worst: 2.4

Summarization
8.2/10 (5 prompts)

Summarization #1

Summarize the following in exactly 3 bullet points: The domestic cat (Felis catus) is a small, typically furry, carnivorous mammal. They are often called house cats when kept as indoor pets. Cats have been associated with humans for at least 9,500 years and are currently the second most popular pet in the world. Cats are skilled predators and are known to hunt over 1,000 species for food. They are also intelligent and can be trained to obey simple commands. Despite being solitary hunters, cats are social species and use a variety of vocalizations, pheromones, and body language for communication. Cats have a rapid breeding rate. Under controlled breeding, they can be bred and shown as registered pedigreed pets. Failure to control their breeding has contributed to large numbers of feral cats worldwide, contributing to the extinction of bird, mammal, and reptile species.

high · 7.8

Judge: Exactly 3 bullet points as required. Captures key info: domestication timeline, predatory skills, social communication, breeding/feral impact. First bullet is slightly overloaded (combines multiple facts), but overall the information is accurate and well-organized. No invented facts.

• Domestic cats are small carnivorous mammals that have lived alongside humans for at least 9,500 years and are now the world's second most popular pet, known for being skilled hunters of over 1,000 species and intelligent enough to be trained.
• Despite being solitary hunters, cats are social anim...
anthropic/claude-haiku-4-5-20251001 · 2026-03-01

Summarization #2

Read the following technical proposal and produce a one-paragraph executive summary suitable for a non-technical VP. No jargon, no acronyms. "We propose migrating our monolithic Django application to a microservices architecture using Kubernetes orchestration. The current system handles approximately 50,000 requests per second at peak, with p99 latency of 2.3 seconds — well above our 500ms SLA. Root cause analysis indicates that the user authentication module, the recommendation engine, and the payment processing pipeline are tightly coupled, causing cascading failures when any single component is under load. Our proposal: decompose into 7 bounded contexts, each deployed as independent services communicating via gRPC with an async event bus (Kafka) for non-critical paths. Expected outcomes: independent scaling of hot paths (auth and recommendations), isolation of payment processing failures, and reduction of p99 latency to under 400ms. Estimated timeline: 6 months with a team of 8. Estimated infrastructure cost increase: 30% during migration, settling to 15% above current after optimization. Risk: data consistency during the transition period requires a careful strangler fig migration pattern."

high · 8.6

Judge: Excellent jargon-free translation. 'Response times nearly five times slower' translates the p99 latency correctly. Preserves key numbers (6 months, 30% then 15%). One paragraph as required. No acronyms. A VP would understand this clearly. Minor: 'cut response times in half' slightly understates the improvement (2.3s to 400ms is ~83% reduction).

# Executive Summary

Our system is currently struggling to meet customer expectations, with response times nearly five times slower than our service commitments, because three critical business functions—user login, product recommendations, and payments—are built as one interconnected system that fa...
anthropic/claude-haiku-4-5-20251001 · 2026-03-01
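The judge's two numeric remarks on Summarization #2 can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the prompt (2.3 s current p99 latency, 500 ms SLA, under 400 ms target); the variable names are illustrative.

```python
# Figures quoted in the technical proposal, converted to seconds.
current_p99_s = 2.3   # current p99 latency
sla_s = 0.5           # 500 ms service-level target
target_p99_s = 0.4    # proposed upper bound after migration

# How many times slower than the SLA the current system is.
slowdown_vs_sla = current_p99_s / sla_s

# Fractional reduction if latency falls from 2.3 s to 0.4 s.
reduction = (current_p99_s - target_p99_s) / current_p99_s

print(f"{slowdown_vs_sla:.1f}x slower than the SLA")
print(f"{reduction:.1%} reduction in p99 latency")
```

A ratio of about 4.6 is consistent with "nearly five times slower", and a reduction of roughly 83% supports the judge's note that "cut response times in half" understates the expected improvement. Since the proposal says "under 400ms", 83% is a lower bound on the projected reduction.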

Summarization #3

Summarize this research abstract, then list 3 strengths and 3 weaknesses of the study design: "We conducted a double-blind, placebo-controlled study of 2,847 participants across 12 sites to evaluate whether daily exposure to cat purring (25Hz frequency, 30 minutes per session) accelerates bone fracture healing. The treatment group (n=1,424) used calibrated vibration devices mimicking cat purr frequencies; the control group (n=1,423) used identical devices producing imperceptible vibrations. After 12 weeks, the treatment group showed a 12% reduction in healing time (95% CI: 8-16%, p<0.001) as measured by radiographic union. Secondary outcomes included a 23% reduction in self-reported pain scores (p<0.01) and no significant difference in adverse events. Limitations include: participant awareness of vibration may have compromised blinding for the treatment group, the study population skewed younger (mean age 34), and the 12-week endpoint may not capture long-term outcomes."

high · 7.2

Judge: Accurate summary preserving key numbers (2,847 participants, 12%, 23%, p<0.001). Exactly 3 strengths and 3 weaknesses as required. Strengths are valid but somewhat obvious (large sample size, objective endpoint). Weaknesses track the abstract's own limitations closely rather than adding independent critical analysis. Solid but not insightful.

# Summary

This double-blind, placebo-controlled study of 2,847 participants tested whether 30-minute daily exposures to 25Hz vibrations (mimicking cat purring) accelerate bone fracture healing. The treatment group using calibrated vibration devices showed a statistically significant 12% reduction i...
anthropic/claude-haiku-4-5-20251001 · 2026-03-01

Summarization #4

Create a layered summary of the following — provide three versions: (1) a tweet-length summary (under 280 characters), (2) a one-paragraph summary for a general audience, (3) a detailed summary preserving all key numbers and caveats. "The global AI chip market reached $53.4 billion in 2024, growing 42% YoY from $37.6 billion in 2023. NVIDIA dominates with approximately 80% market share in data center GPUs, though AMD has grown from 5% to 12% share with its MI300X accelerator. Custom silicon is the emerging disruptor: Google's TPU v5p, Amazon's Trainium2, and Microsoft's Maia 100 collectively represent $4.2 billion in avoided NVIDIA purchases. The supply chain remains concentrated — TSMC fabricates over 90% of advanced AI chips (sub-7nm), and ASML is the sole supplier of EUV lithography machines. Geopolitical risks are significant: US export controls have cut China's access to cutting-edge chips, pushing Chinese firms toward mature nodes (14nm+) and domestic alternatives like Huawei's Ascend 910B, which benchmarks at roughly 70% of an A100. Analysts project the market will reach $140 billion by 2027, though this assumes continued data center buildout and no major regulatory disruption. Key uncertainties: inference vs training chip mix shift, potential overcapacity in 2026, and whether sovereign AI initiatives (EU Chips Act, India's semiconductor mission) will meaningfully diversify manufacturing."

high · 8.4

Judge: All three versions present with clear layering progression. Tweet is 220 characters (under 280). Paragraph version is accessible and jargon-free. Detailed version preserves all key numbers. Numbers are accurate throughout. Truncated slightly at the end of the detailed version but all critical data points are captured before truncation.

# Three-Layer Summary

## (1) Tweet-Length (280 characters)

Global AI chip market hit $53.4B in 2024 (+42% YoY). NVIDIA dominates at 80% data center share, but AMD and custom silicon (Google TPU, Amazon Trainium) are gaining. TSMC & ASML control supply; geopolitical risks rising.

## (2) General Aud...
anthropic/claude-haiku-4-5-20251001 · 2026-03-01
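The length constraint on the tweet-length version in Summarization #4 is easy to verify mechanically. The sketch below assumes the tweet is exactly the text shown in the response preview between the "(1)" and "(2)" headings, and that length is counted in Python string characters; the constant name is illustrative.

```python
# Tweet text as it appears in the response preview (assumed complete).
tweet = (
    "Global AI chip market hit $53.4B in 2024 (+42% YoY). "
    "NVIDIA dominates at 80% data center share, but AMD and custom silicon "
    "(Google TPU, Amazon Trainium) are gaining. "
    "TSMC & ASML control supply; geopolitical risks rising."
)

TWEET_LIMIT = 280  # the prompt asks for "under 280 characters"

length = len(tweet)
within_limit = length < TWEET_LIMIT

print(f"{length} characters, within limit: {within_limit}")
```

A check like this is a cheap way to confirm a judge's claim about a hard formatting constraint before trusting the score.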

Summarization #5

Explain the main argument of this passage in one sentence, then rate how convincing it is on a scale of 1-10 with justification: "Remote work is fundamentally inferior to office work for creative collaboration. Studies show that chance encounters — the hallway conversation, the overheard brainstorm — account for up to 30% of innovative ideas at major tech companies. Slack messages and Zoom calls are poor substitutes for the bandwidth of in-person interaction. Companies that went fully remote in 2020 have seen patent filings drop by 15% compared to hybrid peers. The data is clear: if you want innovation, you need butts in seats."

high · 8.8

Judge: One-sentence summary accurately captures the argument. The 4/10 rating is well-justified with specific logical flaws: correlation/causation, confounded variables, survivor bias, false dichotomy, dated framing. Identifies non-obvious weaknesses beyond the surface claims. Well-structured with strengths acknowledged too. Strong critical thinking on display.

# Summary

The passage argues that remote work stifles innovation because it eliminates spontaneous in-person interactions that generate creative ideas, as evidenced by reduced patent filings at fully remote companies.

# Convincingness: 4/10

**Justification:**

**Strengths:**
- Identifies a real ph...
anthropic/claude-haiku-4-5-20251001 · 2026-03-01
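The five per-prompt scores listed for Summarization are consistent with the 8.2/10 category score if that score is a simple mean rounded to one decimal place. The page does not state how the category score is computed, so the averaging method in this sketch is an assumption.

```python
from statistics import mean

# Per-prompt judge scores as listed for the Summarization category.
scores = {
    "Summarization #1": 7.8,
    "Summarization #2": 8.6,
    "Summarization #3": 7.2,
    "Summarization #4": 8.4,
    "Summarization #5": 8.8,
}

# Assumption: the category score is the unweighted mean, rounded to 1 decimal.
category_score = round(mean(scores.values()), 1)

best_prompt = max(scores, key=scores.get)
worst_prompt = min(scores, key=scores.get)

print(f"Category score: {category_score}/10")
print(f"Highest: {best_prompt} ({scores[best_prompt]})")
print(f"Lowest: {worst_prompt} ({scores[worst_prompt]})")
```

The "Best" and "Worst" figures at the top of the eval results presumably span all categories, so they need not match the highest and lowest scores within Summarization alone.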