Prompting Basics

Data Analysis

By Dan Lee
Dec 20, 2025

Data Analysis with AI (That You Can Actually Trust)

AI is great at explaining charts, generating SQL, and summarizing findings—but it’s also great at sounding confident while being subtly wrong. Data analysis lives in the land of details, so your prompts need guardrails.

This guide shows how to prompt for analysis in a way that works for AI engineers and data folks, and also for non-technical teams (sales, ops, finance, marketing, execs) who want clear insights without getting buried in jargon.

The Big Idea

Don’t ask for “insights.” Ask for a method, an output format, and evidence tied to the input data.

Ask Better Questions

A strong analysis prompt states:

  • Goal: what decision are you trying to make?
  • Metric: what matters (conversion, churn, revenue, latency)?
  • Slice: which segment/time window?
  • Output shape: table, bullets, dashboard notes, SQL, etc.
  • Confidence: what’s certain vs assumed?

That last one—confidence—is what separates analysis from storytelling.
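
Put together, the checklist becomes a reusable skeleton. The bracketed fields below are placeholders to fill in, not required wording:

Text
Context: [who you are and what decision this analysis feeds]
Instruction: [the method: compare X vs. Y, compute Z, rank drivers of W]
Input Data: [tables, schemas, or pasted numbers; nothing else may be used]
Output Indicator: [the shape: table, bullets, SQL, plus an Evidence column]
Constraints: [state assumptions explicitly; say “Unknown” where the data is silent]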

Two Habits That Prevent Bad Takes

  1. Require evidence. Ask for an “Evidence” column or direct quotes from the data behind every claim.

  2. Force unknowns. Instruct the model: “If the data can’t support a claim, say ‘Unknown’ and list what’s missing.”
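
Both habits compress into a constraint you can paste into any analysis prompt:

Text
Constraints: Support every claim with specific rows, fields, or quotes from the input data. If the data can’t support a claim, say “Unknown” and list what’s missing.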

Example 1: Technical (SQL + Findings + Checks)

You want month-over-month retention analysis, and you want it reproducible.

Text
Context: You are a senior data scientist. We need a retention readout for a product review.
Instruction: Write BigQuery SQL to compute 30-day retention by signup cohort. Then summarize the results and flag data quality concerns.
Input Data:
Tables:
- users(user_id STRING, signup_ts TIMESTAMP, acquisition_channel STRING)
- events(user_id STRING, event_ts TIMESTAMP, event_name STRING)
Definitions:
- Active in day 30 window = user has >=1 event between day 30 and day 37 after signup.
Output Indicator:
1) FINAL_SQL (single query)
2) RESULTS_SUMMARY (bullets: trend, biggest cohort drop, channel differences)
3) VALIDATION_CHECKS (5 bullets: pitfalls, missing data risks, join/dup risks)
Constraints: Use only the columns provided. If a definition is ambiguous, list assumptions explicitly.

This is the analysis trifecta: compute → interpret → validate. The “use only the columns provided” constraint also blocks “phantom columns” and sloppy joins.
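
For reference, here is one shape a correct FINAL_SQL could take. This is a sketch under stated assumptions, not the model’s guaranteed output: the monthly cohort grain and the half-open [day 30, day 37) window are choices the prompt leaves ambiguous, so a good answer should list them as assumptions.

SQL
-- Sketch only: assumes monthly cohorts and a [signup+30d, signup+37d)
-- activity window, per the definition in the prompt.
WITH cohorts AS (
  SELECT
    user_id,
    signup_ts,
    DATE_TRUNC(DATE(signup_ts), MONTH) AS cohort_month
  FROM users
  -- Exclude cohorts too young to have a complete day-37 window.
  WHERE DATE(signup_ts) <= DATE_SUB(CURRENT_DATE(), INTERVAL 37 DAY)
),
retained AS (
  -- DISTINCT guards against duplicate events inflating the join.
  SELECT DISTINCT c.user_id
  FROM cohorts AS c
  JOIN events AS e
    ON e.user_id = c.user_id
   AND e.event_ts >= TIMESTAMP_ADD(c.signup_ts, INTERVAL 30 DAY)
   AND e.event_ts < TIMESTAMP_ADD(c.signup_ts, INTERVAL 37 DAY)
)
SELECT
  c.cohort_month,
  COUNT(DISTINCT c.user_id) AS cohort_size,
  COUNT(DISTINCT r.user_id) AS retained_users,
  ROUND(COUNT(DISTINCT r.user_id) / COUNT(DISTINCT c.user_id), 3) AS retention_30d
FROM cohorts AS c
LEFT JOIN retained AS r
  ON r.user_id = c.user_id
GROUP BY cohort_month
ORDER BY cohort_month;

If the model’s query differs, that’s fine; the VALIDATION_CHECKS section is where it should defend the differences.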

Example 2: Non-Technical (Sales Funnel Diagnosis)

A sales leader wants to know why conversion is down. Great—make the AI behave like an analyst.

Text
Context: I’m a sales operations manager. We’re trying to diagnose a conversion dip without blaming individuals.
Instruction: Analyze the funnel table and identify the most likely drivers. Provide 3 hypotheses and what data would confirm each.
Input Data:
Funnel (last 2 months):
- Visitors: 120,000 -> 118,000
- Trial signups: 6,000 -> 4,500
- Activated (did key action): 3,300 -> 2,100
- Paid conversions: 660 -> 630
Notes: Website redesign launched mid-month 2. Trial onboarding email sequence was updated week 3 of month 2.
Output Indicator:
- Provide a table: Stage | Change | Likely Cause | Evidence | Next Test
- Then provide 3 experiments (max 1 sentence each) to validate top hypotheses
Constraints: Avoid jargon. Do not invent numbers beyond what’s provided.

This keeps the model grounded: it can hypothesize, but it must propose tests instead of “declaring truth.”
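
Before trusting any narrative, you can also sanity-check the hypotheses with simple arithmetic. Here is a quick query in the same BigQuery dialect as Example 1, with the funnel numbers inlined; the stage-to-stage conversion rates show where the leak actually is:

SQL
-- Stage-to-stage conversion rates, month 1 vs. month 2,
-- using only the numbers given in the prompt.
WITH funnel AS (
  SELECT 1 AS step, 'Visitors' AS stage, 120000 AS m1, 118000 AS m2 UNION ALL
  SELECT 2, 'Trial signups', 6000, 4500 UNION ALL
  SELECT 3, 'Activated', 3300, 2100 UNION ALL
  SELECT 4, 'Paid conversions', 660, 630
)
SELECT
  stage,
  ROUND(100 * m1 / LAG(m1) OVER (ORDER BY step), 1) AS m1_rate_pct,
  ROUND(100 * m2 / LAG(m2) OVER (ORDER BY step), 1) AS m2_rate_pct
FROM funnel
ORDER BY step;
-- Trial signup rate: 5.0% -> 3.8%. Activation rate: 55.0% -> 46.7%.
-- Paid rate: 20.0% -> 30.0%. The drop is upstream of payment, which is
-- consistent with the redesign and onboarding-email timing in the notes.

Note that the activated-to-paid rate actually improved; that is exactly the kind of non-obvious evidence the Evidence column should surface.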

Make It Audit-Friendly

Add “Evidence” and “Next Test” columns to every output format you request. If the model can’t point to the data and propose a check, it’s probably guessing.

Takeaway

AI can accelerate data analysis—if you treat it like an analyst-in-training, not an oracle. Give it clear goals, explicit definitions, and structured outputs. Require evidence, force unknowns, and add a validation step. Do that, and the model won’t just generate insights—it’ll produce analysis you can defend in a meeting.

Dan Lee

DataInterview Founder (Ex-Google)

Dan Lee is an AI tech lead with 10+ years of industry experience across data engineering, machine learning, and applied AI. He founded DataInterview and previously worked as an engineer at Google.