Data Modeling — User Guide

Audience: Data engineers, analysts, and admins who design schemas, explore existing databases, and generate test data with AI assistance.


Overview

The Data Model experience (/data-model) gives you a structured view of your connected databases: databases, schemas, tables, columns, keys, and relationships. Beyond browsing, you can:

  • Generate or evolve models from natural language (star schemas, CRM shapes, e-commerce, analytics marts)
  • Explore data quality angles: validation rules, profiling, and anomaly detection
  • Produce synthetic test data that respects constraints and foreign keys
  • Tie models to Data Ingestion targets, Building Pipelines (dbt, etc.), and Data Insights

Start by selecting a database from your database connections. If tables are missing, run Extract Metadata on the connection first.


Key capabilities

1. Schema visualization

  • Tree navigation — databases → schemas → tables (and views where applicable).
  • Column detail — data types, nullability, defaults, comments where exposed.
  • Keys — primary and foreign keys surfaced so you understand join paths.
  • Search and filter — find tables by name; filter by schema or type; sort by name, size, or recency (exact controls depend on UI version).

Use cases: onboarding onto an unfamiliar warehouse; documenting what already exists before refactoring; choosing fact and dimension candidates for a star schema.

2. AI-powered model generation

The AI can analyze requirements, propose entities and relationships, emit DDL, suggest indexes and constraints, and iterate on normalization vs denormalization tradeoffs.

Typical flow

  1. Analysis — interprets your goals (OLTP vs analytics, volume, access patterns).
  2. Generation — proposes tables, keys, and constraints.
  3. Optimization — suggests indexes, partitioning ideas, and naming cleanups.

Example prompts (natural language)

Create a customer relationship management schema with customers, orders, and products tables
Design a normalized e-commerce database with proper indexing for high-volume transactions
Analyze my requirements and create an optimized table structure for a subscription billing system
Design a star schema for sales analytics with a fact table and dimensions for time, product, customer, and location
Create an e-commerce schema with user management, product catalog, order processing, and payments — include indexing and constraints
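
A star-schema prompt like the one above might yield DDL along these lines. This is a minimal sketch run against an in-memory SQLite database; the table and column names are illustrative, not the exact output the generator will produce.

```python
import sqlite3

# Illustrative star schema for sales analytics: one fact table plus
# time, product, and customer dimensions. Names are hypothetical.
DDL = """
CREATE TABLE dim_time (
    time_id     INTEGER PRIMARY KEY,
    date        TEXT NOT NULL,
    year        INTEGER NOT NULL,
    quarter     INTEGER NOT NULL CHECK (quarter BETWEEN 1 AND 4)
);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT
);
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    country     TEXT
);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    amount      REAL NOT NULL
);
-- Index the fact table's foreign keys for join performance.
CREATE INDEX idx_fact_sales_time    ON fact_sales(time_id);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # the four tables created above
```

Note the pattern: dimensions carry descriptive attributes, the fact table carries measures plus foreign keys, and those keys are indexed for the joins analytics queries rely on.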

3. Test data generation

Generate realistic rows for development, QA, and demos — respecting foreign keys, uniqueness, and check constraints where the platform enforces them.

Basic

Generate 100 test users with realistic names, emails, and US phone numbers

At scale with relationships

Generate test data for the entire schema: 1000 customers, 5000 products, 10000 orders with line items — maintain referential integrity
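
The key to relationship-aware generation is ordering: parent rows first, then children that reference only existing keys. A minimal sketch of that idea, using SQLite and illustrative `customers`/`orders` tables:

```python
import random
import sqlite3

# Sketch of FK-aware test data: insert parents (customers) before
# children (orders) so every order references an existing customer.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL CHECK (total >= 0))""")

customer_ids = list(range(1, 101))  # 100 parent rows
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in customer_ids])

# 1000 child rows, each picking a valid parent key.
orders = [(None, random.choice(customer_ids), round(random.uniform(5, 500), 2))
          for _ in range(1000)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

orphans = conn.execute("""SELECT COUNT(*) FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL""").fetchone()[0]
print(orphans)  # 0: referential integrity holds
```

The same ordering explains the troubleshooting advice later in this guide: if test data generation hits FK errors, generate parent rows first.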

Distributions and patterns

Generate customers where 70% are from USA, 20% are premium members, age follows a normal curve, and registration dates span the last 2 years
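
Under the hood, a distribution prompt like this maps to per-column sampling rules. A sketch of what that could look like, with hypothetical field names and stdlib sampling only:

```python
import random
from datetime import date, timedelta

random.seed(7)  # reproducible sketch

def make_customer():
    # 70% USA, 20% premium, age drawn from a normal curve (mean 40,
    # sd 12) clamped to a plausible range, registration date uniform
    # over the last 2 years. All rules are illustrative.
    country = "USA" if random.random() < 0.70 else random.choice(["UK", "DE", "FR"])
    premium = random.random() < 0.20
    age = max(18, min(90, round(random.gauss(40, 12))))
    registered = date.today() - timedelta(days=random.randrange(730))
    return {"country": country, "premium": premium, "age": age,
            "registered": registered.isoformat()}

rows = [make_customer() for _ in range(10_000)]
usa_share = sum(r["country"] == "USA" for r in rows) / len(rows)
print(round(usa_share, 2))  # roughly 0.70
```

With enough rows, the sampled shares converge on the requested proportions; for small volumes, expect visible deviation from the stated percentages.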

Workflow

  1. Select the target table (or ask the agent to target multiple related tables).
  2. Describe volume, locale, and patterns.
  3. Use advanced options where available: distributions, custom formats, FK-aware generation.

Schema design best practices (reference)

  • Normalization — Prefer 3NF for transactional systems; consider denormalization for heavy read/analytics workloads.
  • Naming — Use consistent conventions; avoid reserved words; prefer clarity over abbreviations.
  • Data types — Match real cardinality and range; plan for growth (e.g. BIGINT vs INT).
  • Indexing — Index foreign keys and high-selectivity filter columns; use composite indexes for common multi-column predicates.
  • Constraints — Primary keys, foreign keys, and CHECK constraints document intent and catch bad data early.
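
The point about constraints catching bad data early can be seen in a few lines. A sketch with an illustrative `subscriptions` table, where a CHECK constraint rejects an invalid row at write time:

```python
import sqlite3

# A CHECK constraint both documents intent (valid plans, positive seats)
# and rejects bad rows at insert time. Table and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE subscriptions (
    id    INTEGER PRIMARY KEY,
    plan  TEXT NOT NULL CHECK (plan IN ('free', 'pro', 'enterprise')),
    seats INTEGER NOT NULL CHECK (seats > 0))""")

conn.execute("INSERT INTO subscriptions VALUES (1, 'pro', 10)")  # valid row

rejected = False
try:
    # Invalid plan and non-positive seat count: the database refuses it.
    conn.execute("INSERT INTO subscriptions VALUES (2, 'gold', 0)")
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)
```

Pushing these rules into the schema means every client of the database inherits them, rather than each application re-implementing the checks.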

Data quality angles you can explore with AI

  • Validation rules — Constraint violations, orphaned foreign keys, duplicates, invalid patterns.
  • Profiling — Column stats, null rates, cardinality, value distributions.
  • Anomaly framing — Outliers, suspicious spikes, inconsistent categories; often a starting point for cleansing pipelines.
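
To make the profiling angle concrete, here is a minimal sketch of the kind of per-column stats a profiling pass reports (null rate and cardinality), computed over a tiny in-memory sample with made-up values:

```python
# Illustrative rows; in practice these come from the connected database.
rows = [
    {"email": "a@x.com", "country": "USA"},
    {"email": None,      "country": "USA"},
    {"email": "b@x.com", "country": "DE"},
    {"email": "b@x.com", "country": None},
]

def profile(rows):
    # For each column: fraction of nulls and count of distinct non-null values.
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "cardinality": len(set(non_null)),
        }
    return stats

print(profile(rows))
# email: 25% null, 2 distinct; country: 25% null, 2 distinct
```

Stats like these are what make the other two angles actionable: a high null rate suggests a validation rule, and an unexpected cardinality (say, 50 spellings of one country) suggests an anomaly worth framing.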

Use these together with the Data Cleansing specialist (via AI Chat) and with Building Pipelines tests.


Integration with other features

  • Data Ingestion — Target tables and validation during load; map sources to modeled schemas.
  • Pipelines (dbt, etc.) — Reference models in transformations; generate or refine dbt models from intent.
  • Data Insights — Explore relationships; optional ER-style thinking for metrics and dimensions.
  • Database Explorer — Run ad hoc SQL against the same connections you model.

Troubleshooting

  • Cannot see tables — likely cause: permissions or stale metadata. Try: confirm DB grants; run Extract Metadata on the connection.
  • Model generation fails — likely cause: naming conflict, unsupported type, or policy. Try: read the error; simplify the ask; fix conflicts in the existing schema.
  • Test data fails — likely cause: FK or unique constraint. Try: reduce volume; generate parent rows first; relax patterns.
  • Slow tree on huge catalogs — likely cause: wide schemas. Try: use search/filter; work schema-by-schema.

Security notes

  • Schema visibility follows workspace RBAC and connection permissions.
  • Treat test data as non-production; avoid real PII unless policy allows.
  • Sensitive columns may be masked or restricted by policy in some deployments.

Related