sunilkumardash9

joined 4 months ago
[–] sunilkumardash9@lemmy.world -1 points 1 week ago (1 children)

You need to try it to see.

[–] sunilkumardash9@lemmy.world 0 points 1 week ago

TL;DR

  • Claude Opus 4 leads in raw performance and prompt adherence.
  • It understands user intentions better, reminiscent of 3.6 Sonnet.
  • High taste: the generated outputs are tasteful, and it retains the Opus 3 personality to an extent.
  • Though unrelated to code, the model is pleasant to talk to; I never enjoyed conversing with Gemini or o3.
  • Gemini 2.5 is cheaper and consumes fewer API credits than Opus.
  • Its one-million-token context is unmatched for large-codebase understanding.
  • Opus is the slowest in time to first token. You have to be patient with the thinking mode.
[–] sunilkumardash9@lemmy.world 0 points 2 months ago* (last edited 2 months ago) (1 children)

Have you tried doing the same?

 

Google has finally arrived

Some observations on the model

  • Gemini 2.5 Pro is absolutely a beast at coding, perhaps the best model right now
  • They spent all their compute on training it with coding data and forgot to give it a distinct personality
  • Doesn't reason as well as Grok 3 (Think) and Claude 3.7 Sonnet (Thinking)
  • On par with o3-mini-high in general mathematics

If you're a coder, you'll absolutely love it; otherwise, you'll be fine with other frontier reasoning models (DeepSeek R1, if you ask me)

 

DeepSeek V3 0324 is the first open-source model to match SOTA coding performance

  • Understands user intention better than before; I'd say it's better than Claude 3.7 Sonnet base and thinking. 3.5 is still better at this (perhaps the best)
  • Again, in raw code-generation quality, it beats 3.7 and is on par with 3.5, sometimes better.
  • Great at reasoning, much better than any non-reasoning model available right now.
  • Better at instruction following than 3.7 Sonnet but below 3.5 Sonnet.