Which AI Studio model should you pick? We tested all 6 so you don't have to

Hi all,

I’m Filippo, an Asana Solutions Partner at i.DO, and I work with AI Studio daily to build automated workflows for teams of all sizes.

One of the most common questions I hear from clients is: “Which model should I pick in AI Studio?” The dropdown gives you six options, but the descriptions are too vague to make an informed decision, especially when credits are on the line.

So I ran a structured test: same workflows, same tasks, all 6 models, side by side. Here’s what I found.


The problem

When you create an AI Studio rule, you need to choose a model from the dropdown.

The built-in descriptions don’t tell you much about real-world differences in quality, style, or cost. And since each model consumes credits at a different rate, picking the wrong one means either overpaying or getting unsatisfactory results.


How we tested

Setup

We created 6 identical Asana projects (one per model), each with the same AI Studio rules but configured with a different model. The projects were isolated in a sandbox team.

Models tested: GPT 5 mini | GPT 5 | GPT 5.2 | Claude Sonnet 4.5 | Claude Opus 4.6 | Claude Haiku 4.5
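
If you want to replicate the setup, the project scaffolding is easy to script against the Asana REST API. Here's a minimal sketch; the token and team GID are placeholders, and the AI Studio rules themselves still have to be added to each project through the UI:

```python
import requests

# Placeholders: substitute your own personal access token and sandbox team GID.
ASANA_TOKEN = "0/your-personal-access-token"
SANDBOX_TEAM_GID = "1200000000000000"

MODELS = [
    "GPT 5 mini", "GPT 5", "GPT 5.2",
    "Claude Sonnet 4.5", "Claude Opus 4.6", "Claude Haiku 4.5",
]

headers = {"Authorization": f"Bearer {ASANA_TOKEN}"}

# One project per model, all created in the same isolated sandbox team.
for model in MODELS:
    resp = requests.post(
        "https://app.asana.com/api/1.0/projects",
        headers=headers,
        json={"data": {"name": f"AI Studio test - {model}", "team": SANDBOX_TEAM_GID}},
    )
    resp.raise_for_status()
    print(f"Created project for {model}: {resp.json()['data']['gid']}")
```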

Test tasks

We created the same 4 tasks independently in each of the 6 projects (not multi-homed), with identical titles and descriptions. The tasks were designed to cover a range of complexity and urgency, including one deliberately borderline case.


Two workflows tested

  1. Test 1 - Classification: When a task is added, the AI reads the title/description and sets a custom field “AI Priority” to Low, Medium, High, or Urgent.

  2. Test 2 - Summary generation: When a task is moved to “Done”, the AI generates a 2-3 sentence summary comment.
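
For anyone who wants to recreate these rules, the instructions can be kept very short. The sketch below is an illustrative paraphrase, not our exact prompts, and the field name has to match a custom field that actually exists in your project:

```python
# Illustrative prompt sketches for the two AI Studio rules (paraphrased, not our exact wording).

CLASSIFICATION_INSTRUCTION = (
    "Read the task title and description and set the custom field 'AI Priority' "
    "to exactly one of: Low, Medium, High, Urgent. Judge by the urgency and "
    "impact described in the task; do not leave the field empty."
)

SUMMARY_INSTRUCTION = (
    "Write a comment of 2-3 sentences summarising the task: what was delivered, "
    "its current status, and any follow-up still needed. Keep it factual and concise."
)
```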


Test 1: Classification results

Key finding: Results were almost entirely consistent. The only difference was on the deliberately borderline task, where Claude Sonnet 4.5 classified it as Low while the other 5 models chose Medium. For clear-cut tasks, all six models agreed perfectly.

Takeaway: For simple classification, all models perform equivalently. Model choice should be driven by cost. GPT 5 mini is the cheapest at 218 credits per run: roughly half the cost of Claude Opus 4.6 (452).


Test 2: Summary generation results

What we noticed

  1. GPT models use a structured format with labels (e.g., “Key deliverable: / Status:”), while Claude models write in narrative prose.

  2. GPT 5 was the most factually accurate; it flagged that a task in the “Done” section was still technically marked incomplete.

  3. Claude Sonnet 4.5 produced the smoothest prose but made one factual error (stated a task was “completed” when it wasn’t).

  4. Claude Haiku 4.5 consistently generated the longest responses, which drives up output credit costs.

  5. GPT 5 mini surprised us by adding useful calls-to-action on urgent tasks (e.g., “please confirm the fix has been deployed”), showing unexpected sophistication for the cheapest model.

Examples (screenshots): a GPT 5 summary and a Claude Haiku 4.5 summary.


Credit consumption: the full picture

GPT 5 mini uses roughly half the credits of Opus 4.6, and about a third less than the next cheapest option (Haiku 4.5). The gap widens on summary tasks where output tokens matter more.
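
To put rough numbers on that, here's a back-of-the-envelope comparison using the per-run classification figures above (218 credits for GPT 5 mini, 452 for Claude Opus 4.6); the monthly run volume is just an illustrative assumption:

```python
# Back-of-the-envelope cost comparison for the classification rule.
# Per-run credit figures are from our test; the monthly run volume is a made-up example.
CREDITS_PER_RUN = {
    "GPT 5 mini": 218,
    "Claude Opus 4.6": 452,
}
RUNS_PER_MONTH = 1_000  # hypothetical volume

for model, per_run in CREDITS_PER_RUN.items():
    print(f"{model}: {per_run * RUNS_PER_MONTH:,} credits/month")

# GPT 5 mini:      218,000 credits/month
# Claude Opus 4.6: 452,000 credits/month (about 2.1x more for equivalent output)
```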


Our recommendation

Start with GPT 5 mini. It delivered accurate classifications and surprisingly good summaries at roughly half the cost of the most expensive model. Only upgrade if the output quality is demonstrably insufficient for your specific use case.

Here’s a simple decision framework:
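
Expressed as code, the logic looks roughly like this; it's only a sketch distilled from the findings above, not an official rule:

```python
def pick_model(workflow: str, needs_narrative_prose: bool = False,
               accuracy_critical: bool = False) -> str:
    """Rough model-picking heuristic distilled from our test results (a sketch, not a rule)."""
    if workflow == "classification":
        # All models agreed on clear-cut tasks, so take the cheapest.
        return "GPT 5 mini"
    if workflow == "summary":
        if accuracy_critical:
            # GPT 5 was the most factually accurate in our summary test.
            return "GPT 5"
        if needs_narrative_prose:
            # Claude Sonnet 4.5 wrote the smoothest prose (watch for occasional factual slips).
            return "Claude Sonnet 4.5"
        return "GPT 5 mini"
    # Anything more complex: start cheap and upgrade only if the output falls short.
    return "GPT 5 mini"
```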


Bonus: Automatic Model Selection

Asana recently introduced a “Recommended” option in the model picker. When selected, Asana analyzes your prompt and automatically selects the best model for your use case, balancing performance and cost.

This is a great option if you don’t want to think about models at all, and it aligns with our finding that for most standard workflows, model choice matters far less than you’d expect.


Limitations of this test

  1. We tested only 2 workflow types (classification and summarization). More complex use cases (e.g., multi-step analysis, writing long text, interacting with external data) may show bigger differences.

  2. Credit consumption can vary based on prompt length, task complexity, and output verbosity.

  3. Asana may update model versions or pricing over time.


Filippo | Asana Solutions Partner at i.DO

We help teams get more from Asana + AI - learn more


Thanks @Filippo_Baj_iDO, great article :wink:

If I have to remember only 1 thing, that would be this:
