Data Modeling — User Guide

Audience: Data engineers, analysts, and admins who design schemas, explore existing databases, and generate test data with AI assistance.


Overview

The Data Model experience (/data-model) gives you a structured view of your connected databases: databases, schemas, tables, columns, keys, and relationships. Beyond browsing, you can:

  • Generate or evolve models from natural language (star schemas, CRM shapes, e-commerce, analytics marts)
  • Explore data quality angles: validation rules, profiling, and anomaly detection
  • Produce synthetic test data that respects constraints and foreign keys
  • Tie models to Data Ingestion targets, Building Pipelines (dbt, etc.), and Data Insights

Start by selecting a database from your database connections. If tables are missing, run Extract Metadata on the connection first.


Key capabilities

1. Schema visualization

  • Tree navigation — databases → schemas → tables (and views where applicable).
  • Column detail — data types, nullability, defaults, comments where exposed.
  • Keys — primary and foreign keys surfaced so you understand join paths.
  • Search and filter — find tables by name; filter by schema or type; sort by name, size, or recency (exact controls depend on UI version).

Use cases: onboarding onto an unfamiliar warehouse; documenting what already exists before refactoring; choosing fact and dimension candidates for a star schema.

2. AI-powered model generation

The AI can analyze requirements, propose entities and relationships, emit DDL, suggest indexes and constraints, and iterate on normalization vs denormalization tradeoffs.

Typical flow

  1. Analysis — interprets your goals (OLTP vs analytics, volume, access patterns).
  2. Generation — proposes tables, keys, and constraints.
  3. Optimization — suggests indexes, partitioning ideas, and naming cleanups.

Example prompts (natural language)

Create a customer relationship management schema with customers, orders, and products tables
Design a normalized e-commerce database with proper indexing for high-volume transactions
Analyze my requirements and create an optimized table structure for a subscription billing system
Design a star schema for sales analytics with a fact table and dimensions for time, product, customer, and location
Create an e-commerce schema with user management, product catalog, order processing, and payments — include indexing and constraints
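
A star-schema prompt like the one above might yield DDL along these lines. This is a minimal sketch run against an in-memory SQLite database; the table and column names are illustrative, not the exact output the generator will produce.

```python
import sqlite3

# Illustrative star schema for sales analytics: one fact table plus
# time, product, and customer dimensions. Names are hypothetical.
DDL = """
CREATE TABLE dim_time (
    time_id     INTEGER PRIMARY KEY,
    date        TEXT NOT NULL,
    year        INTEGER NOT NULL,
    quarter     INTEGER NOT NULL CHECK (quarter BETWEEN 1 AND 4)
);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT
);
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    country     TEXT
);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    time_id     INTEGER NOT NULL REFERENCES dim_time(time_id),
    product_id  INTEGER NOT NULL REFERENCES dim_product(product_id),
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    amount      REAL NOT NULL
);
-- Index the fact table's foreign keys for join performance.
CREATE INDEX idx_fact_sales_time    ON fact_sales(time_id);
CREATE INDEX idx_fact_sales_product ON fact_sales(product_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # the four tables created above
```

Note the pattern: dimensions carry descriptive attributes, the fact table carries measures plus foreign keys, and those keys are indexed for the joins analytics queries rely on.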

3. Test data generation

Generate realistic rows for development, QA, and demos — respecting foreign keys, uniqueness, and check constraints where the platform enforces them.

Basic

Generate 100 test users with realistic names, emails, and US phone numbers

At scale with relationships

Generate test data for the entire schema: 1000 customers, 5000 products, 10000 orders with line items — maintain referential integrity
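
The key to relationship-aware generation is ordering: parent rows first, then children that reference only existing keys. A minimal sketch of that idea, using SQLite and illustrative `customers`/`orders` tables:

```python
import random
import sqlite3

# Sketch of FK-aware test data: insert parents (customers) before
# children (orders) so every order references an existing customer.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL CHECK (total >= 0))""")

customer_ids = list(range(1, 101))  # 100 parent rows
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in customer_ids])

# 1000 child rows, each picking a valid parent key.
orders = [(None, random.choice(customer_ids), round(random.uniform(5, 500), 2))
          for _ in range(1000)]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

orphans = conn.execute("""SELECT COUNT(*) FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.id
    WHERE c.id IS NULL""").fetchone()[0]
print(orphans)  # 0: referential integrity holds
```

The same ordering explains the troubleshooting advice later in this guide: if test data generation hits FK errors, generate parent rows first.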

Distributions and patterns

Generate customers where 70% are from USA, 20% are premium members, age follows a normal curve, and registration dates span the last 2 years
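
Under the hood, a distribution prompt like this maps to per-column sampling rules. A sketch of what that could look like, with hypothetical field names and stdlib sampling only:

```python
import random
from datetime import date, timedelta

random.seed(7)  # reproducible sketch

def make_customer():
    # 70% USA, 20% premium, age drawn from a normal curve (mean 40,
    # sd 12) clamped to a plausible range, registration date uniform
    # over the last 2 years. All rules are illustrative.
    country = "USA" if random.random() < 0.70 else random.choice(["UK", "DE", "FR"])
    premium = random.random() < 0.20
    age = max(18, min(90, round(random.gauss(40, 12))))
    registered = date.today() - timedelta(days=random.randrange(730))
    return {"country": country, "premium": premium, "age": age,
            "registered": registered.isoformat()}

rows = [make_customer() for _ in range(10_000)]
usa_share = sum(r["country"] == "USA" for r in rows) / len(rows)
print(round(usa_share, 2))  # roughly 0.70
```

With enough rows, the sampled shares converge on the requested proportions; for small volumes, expect visible deviation from the stated percentages.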

Workflow

  1. Select the target table (or ask the agent to target multiple related tables).
  2. Describe volume, locale, and patterns.
  3. Use advanced options where available: distributions, custom formats, FK-aware generation.

Schema design best practices (reference)

  • Normalization — Prefer 3NF for transactional systems; consider denormalization for heavy read/analytics workloads.
  • Naming — Use consistent conventions; avoid reserved words; prefer clarity over abbreviations.
  • Data types — Match real cardinality and range; plan for growth (e.g. BIGINT vs INT).
  • Indexing — Index foreign keys and high-selectivity filter columns; use composite indexes for common multi-column predicates.
  • Constraints — Primary keys, foreign keys, and CHECK constraints document intent and catch bad data early.
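
The point about constraints catching bad data early can be seen in a few lines. A sketch with an illustrative `subscriptions` table, where a CHECK constraint rejects an invalid row at write time:

```python
import sqlite3

# A CHECK constraint both documents intent (valid plans, positive seats)
# and rejects bad rows at insert time. Table and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE subscriptions (
    id    INTEGER PRIMARY KEY,
    plan  TEXT NOT NULL CHECK (plan IN ('free', 'pro', 'enterprise')),
    seats INTEGER NOT NULL CHECK (seats > 0))""")

conn.execute("INSERT INTO subscriptions VALUES (1, 'pro', 10)")  # valid row

rejected = False
try:
    # Invalid plan and non-positive seat count: the database refuses it.
    conn.execute("INSERT INTO subscriptions VALUES (2, 'gold', 0)")
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)
```

Pushing these rules into the schema means every client of the database inherits them, rather than each application re-implementing the checks.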

Data quality angles you can explore with AI

  • Validation rules — Constraint violations, orphaned foreign keys, duplicates, invalid patterns.
  • Profiling — Column stats, null rates, cardinality, value distributions.
  • Anomaly framing — Outliers, suspicious spikes, inconsistent categories; often a starting point for cleansing pipelines.
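
To make the profiling angle concrete, here is a minimal sketch of the kind of per-column stats a profiling pass reports (null rate and cardinality), computed over a tiny in-memory sample with made-up values:

```python
# Illustrative rows; in practice these come from the connected database.
rows = [
    {"email": "a@x.com", "country": "USA"},
    {"email": None,      "country": "USA"},
    {"email": "b@x.com", "country": "DE"},
    {"email": "b@x.com", "country": None},
]

def profile(rows):
    # For each column: fraction of nulls and count of distinct non-null values.
    stats = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "cardinality": len(set(non_null)),
        }
    return stats

print(profile(rows))
# email: 25% null, 2 distinct; country: 25% null, 2 distinct
```

Stats like these are what make the other two angles actionable: a high null rate suggests a validation rule, and an unexpected cardinality (say, 50 spellings of one country) suggests an anomaly worth framing.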

Use these together with the Data Cleansing specialist (via AI Chat) and with Building Pipelines tests.


Integration with other features

  • Data Ingestion — Target tables and validation during load; map sources to modeled schemas.
  • Pipelines (dbt, etc.) — Reference models in transformations; generate or refine dbt models from intent.
  • Data Insights — Explore relationships; optional ER-style thinking for metrics and dimensions.
  • Database Explorer — Run ad hoc SQL against the same connections you model.

Troubleshooting

  • Cannot see tables — likely cause: permissions or stale metadata. Try: confirm DB grants; run Extract Metadata on the connection.
  • Model generation fails — likely cause: naming conflict, unsupported type, or policy. Try: read the error; simplify the ask; fix conflicts in the existing schema.
  • Test data fails — likely cause: FK or unique constraint. Try: reduce volume; generate parent rows first; relax patterns.
  • Slow tree on huge catalogs — likely cause: wide schemas. Try: use search/filter; work schema-by-schema.

Security notes

  • Schema visibility follows workspace RBAC and connection permissions.
  • Treat test data as non-production; avoid real PII unless policy allows.
  • Sensitive columns may be masked or restricted by policy in some deployments.

Related