
Get up and running in minutes

Everything you need to deploy Dagen, connect your stack, and run your first autonomous pipeline.

Welcome to Dagen

Dagen introduces the agentic pipeline: data infrastructure where pipelines are intent-driven — they learn, adapt, and improve. Specialist AI agents design, build, monitor, and refine the stack from ingestion to business-ready KPIs, alongside your existing SQL, Spark, and warehouse investments.

Key Capabilities

  • Agent Intelligence — Workspace-wide instructions, skills (loaded on demand via read_skill), rules, and lessons so agents behave like your data team.
  • Skills — Packaged expertise with a trigger and body, saving context compared with pasting large prompts every time.
  • Git Reviews — AI review on GitHub pull requests (SQL, dbt, PySpark, YAML) with webhooks and optional auto-post.
  • Self-healing Pipelines — Schema drift awareness, quality signals, freshness expectations, and autonomous remediation.
  • Tri-layer Memory — Working memory, episodic memory, and institutional knowledge that compounds over time.

Core Design Principle: Intent-Awareness

Every pipeline should understand why it exists: the business outcome, who consumes the data, which decisions depend on it, and what quality means for that use case — not only which tasks it runs.

Getting Started

  • Authentication — Sign in with email, Google, or GitHub
  • Connect your ecosystem — Database connections and source repositories
  • Declare intent — Describe the business purpose of your pipeline
  • Let agents build — Specialists design, build, and test autonomously
  • Choose autonomy — Guided, Semi-Autonomous, or Autonomous in AI Chat

Authentication

Sign up at app.dagen.ai. Dagen supports email/password, Google SSO, and GitHub sign-in. Your free account includes $25 in agent compute credits.

For enterprise deployments, SSO via SAML 2.0 and SCIM provisioning are available. Contact your account team to configure identity provider integration.

Database Connections

From the dashboard, navigate to Connections and click New Connection. Dagen uses Airbyte under the hood — over 500 connectors are available. Common starting points:

  • Databases: Postgres, MySQL, Microsoft SQL Server, Oracle, MongoDB
  • SaaS: Salesforce, HubSpot, Stripe, Google Sheets
  • Warehouses: Snowflake, BigQuery, Redshift, Azure Synapse, Databricks
  • Files & Object Storage: S3, GCS, Azure Blob, SFTP, Apache Iceberg
  • Streaming: Kafka, Kinesis

Credentials are encrypted at rest and never exposed in agent outputs or logs.

Source Repositories

Connect your GitHub repositories to enable AI-powered pull request review and pipeline code generation. Navigate to Settings → Source Repositories and authenticate with GitHub. Dagen will request read access to the repositories you select.

Once connected, the Git Reviews feature will automatically review PRs containing SQL, dbt models, PySpark scripts, and YAML pipeline definitions.

Agent Intelligence & Skills

Configure workspace-wide agent behaviour at /agent-intelligence. This is where you define instructions, skills, rules, and lessons that shape how agents operate across all pipelines.

  • Instructions — Persistent context loaded into every agent session
  • Skills — On-demand playbooks triggered by keyword; loaded via read_skill
  • Rules — Guardrails that prevent repeated failure modes automatically
  • Lessons — Structured outcomes from past runs that inform future decisions

Skills save context compared to pasting large prompts each time. Use them to encode team standards, architecture preferences, and data quality thresholds.
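The mechanism behind this saving is that a skill's body stays out of the agent's context until its trigger matches. As an illustration only (the skill names, triggers, and record structure below are hypothetical, not Dagen's actual skill format), the idea can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """Hypothetical skill record: a trigger keyword plus an on-demand body."""
    name: str
    trigger: str   # keyword that causes the skill to be loaded
    body: str      # the full playbook, kept out of context until needed

SKILLS = [
    Skill("dbt-standards", "dbt", "Prefix staging models with stg_; add not_null tests."),
    Skill("pii-handling", "pii", "Hash email columns; never land raw PII in Bronze."),
]

def read_skill(message: str) -> list[str]:
    """Return the bodies of skills whose trigger appears in the message.

    Only matching skills enter the agent's context, so unused playbooks
    cost no tokens.
    """
    text = message.lower()
    return [s.body for s in SKILLS if s.trigger in text]

loaded = read_skill("Generate dbt staging models for the orders table")
# Only the dbt-standards body is loaded; pii-handling stays out of context.
```

Because only the trigger line is always resident, a workspace can accumulate many skills without bloating every session.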

Data Ingestion

The Data Ingestion Agent manages the full connector lifecycle. It handles rate limits, retries, schema evolution, and CDC (change data capture) patterns automatically. Ingestion jobs are observable from the Platform Dashboard.

Declare your ingestion intent in AI Chat and the agent will configure the appropriate Airbyte connector, define landing tables, and set up incremental sync logic.
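The incremental sync logic the agent sets up typically tracks a cursor column (for example an `updated_at` timestamp) so each run pulls only rows changed since the last high-water mark. A minimal sketch of that pattern, with hypothetical row and column names:

```python
def incremental_sync(rows, last_cursor):
    """Pull only rows whose cursor value is past the saved high-water mark.

    rows: iterable of dicts with an 'updated_at' cursor column (hypothetical).
    last_cursor: high-water mark persisted from the previous run (None on first run).
    Returns the new batch and the cursor to persist for the next run.
    """
    if last_cursor is None:
        batch = list(rows)                      # first run: full refresh
    else:
        batch = [r for r in rows if r["updated_at"] > last_cursor]
    new_cursor = max((r["updated_at"] for r in batch), default=last_cursor)
    return batch, new_cursor

rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-03"},
]
batch, cursor = incremental_sync(rows, "2024-01-02")
# Only id=2 is synced; the cursor advances to "2024-01-03".
```

Persisting the cursor between runs is what makes the sync resumable after retries or rate-limit backoffs.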

Data Modeling

The Data Model Generation Agent supports medallion architecture (Bronze → Silver → Gold), star schema, Data Vault 2.0, and AI/RAG-ready semantic outputs. Describe your business KPIs and the agent will generate the appropriate dimensional models and dbt transformations.

  • Medallion / layered warehouse patterns
  • Star schema with facts and dimensions
  • Data Vault 2.0
  • Semantically rich outputs for AI and RAG applications
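In a star schema, fact rows reference dimensions through surrogate keys rather than natural business keys. The lookup step a generated transformation performs can be sketched as follows; all table and column names here are illustrative, not output the agent necessarily produces:

```python
# Illustrative star-schema load: resolve natural keys to surrogate keys.
dim_account = {            # natural key -> surrogate key (hypothetical dimension)
    "ACME": 101,
    "GLOBEX": 102,
}

raw_opportunities = [
    {"opportunity_id": "OP-1", "account": "ACME", "amount": 50000},
    {"opportunity_id": "OP-2", "account": "GLOBEX", "amount": 12000},
]

fact_revenue = [
    {
        "opportunity_id": row["opportunity_id"],
        "account_sk": dim_account[row["account"]],  # surrogate-key lookup
        "amount": row["amount"],
    }
    for row in raw_opportunities
]
# fact_revenue rows now join to dim_account on account_sk.
```

The same resolution step appears in dbt as a join against the dimension model; the sketch above only shows the shape of the data, not the generated SQL.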

Platform Dashboard

The Platform Dashboard provides an overview of all active pipelines, recent agent runs, usage metrics, and system health. Use it to monitor pipeline status, review agent activity, and manage your workspace from a single pane of glass.

AI Chat

AI Chat is the primary interface for interacting with Dagen's agent system. Use it to build pipelines, diagnose failures, explore your data, and issue natural language commands to the specialist agent hierarchy.

Autonomy Levels

  • Guided — Requires human approval at each step
  • Semi-Autonomous — Handles routine decisions automatically, escalates ambiguous ones
  • Autonomous — Runs end-to-end within defined policy guardrails

Autonomy levels are configurable per pipeline and per environment.
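The escalation behaviour of the three levels can be summarised in a small decision sketch. The level names follow the list above; the policy function itself is hypothetical, for illustration only:

```python
from enum import Enum

class Autonomy(Enum):
    GUIDED = "guided"
    SEMI_AUTONOMOUS = "semi-autonomous"
    AUTONOMOUS = "autonomous"

def needs_human_approval(level: Autonomy, decision_is_routine: bool) -> bool:
    """Guided always escalates; Semi-Autonomous escalates only ambiguous
    decisions; Autonomous acts within its policy guardrails."""
    if level is Autonomy.GUIDED:
        return True
    if level is Autonomy.SEMI_AUTONOMOUS:
        return not decision_is_routine
    return False  # Autonomous: guardrails apply instead of per-step approvals

needs_human_approval(Autonomy.SEMI_AUTONOMOUS, decision_is_routine=False)  # True
```

Setting the level per pipeline and per environment means, for example, production can stay Guided while development runs Autonomous.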

Database Explorer

The Database Explorer provides a SQL console and schema browser for all connected databases and warehouses. Use it to inspect tables, run ad hoc queries, and explore data lineage without leaving the Dagen interface.

Building Pipelines

Pipelines in Dagen are built by declaring intent in AI Chat. The super-agent coordinates specialist agents to produce ingestion configs, dbt models, tests, orchestration DAGs, and monitoring rules. Dagen supports dbt, Spark, Dataform, and custom workflow patterns.

Example intent declaration

  // Intent: Salesforce → Snowflake revenue pipeline
  "Sync Salesforce Opportunity records daily. Filter to Closed Won.
   Join with Account for ARR segment enrichment.
   Load to ANALYTICS.REVENUE_FACT in Snowflake.
   Support daily refresh for the CFO revenue dashboard."

Git Reviews (AI on PRs)

Git Reviews automatically reviews GitHub pull requests containing SQL, dbt models, PySpark scripts, and YAML pipeline definitions. Configure webhooks in Settings → Source Repositories and optionally enable auto-post to have review comments posted directly to PRs.

Use Git Reviews to shift-left on data quality issues — catching bad SQL, schema mismatches, and logic errors before they merge to production.
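GitHub delivers webhook events with an X-Hub-Signature-256 header: an HMAC-SHA256 of the payload, keyed by the webhook secret, which a receiver should verify before acting on the event. The check below is standard GitHub webhook practice rather than Dagen-specific code:

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw payload bytes."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

secret = b"my-webhook-secret"
payload = b'{"action": "opened", "pull_request": {"number": 7}}'
header = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
verify_github_signature(secret, payload, header)  # True for a genuine delivery
```

Using `hmac.compare_digest` rather than `==` avoids leaking the signature through timing differences.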

Data Insights

Data Insights enables conversational KPI exploration and chart generation directly from AI Chat. Ask business questions in natural language and get charts, summaries, and SQL-backed answers against your connected warehouses.

Administration

Manage your workspace from the Administration panel: team members, roles and permissions (RBAC), usage and billing, runtime configuration, and audit logs. Enterprise deployments support SSO, SCIM, and custom data residency settings.

Model Settings

Configure which LLM each agent uses from the Model Settings panel. Dagen supports Anthropic (Claude), OpenAI (GPT), Google (Gemini), and open-source models. Enterprise deployments can bring their own model endpoints for data sovereignty requirements.

Supported Data Sources

Dagen provides connections to PostgreSQL, MySQL, Oracle, Snowflake, BigQuery, Redshift, Databricks, Kafka, Salesforce, Azure Blob Storage, Amazon S3, Apache Ozone, Apache Iceberg, Teradata, and Hive. You also have access to over 500 additional sources and APIs through the full Airbyte connector catalog.

The full connector catalog is available inside the Dagen app. Open the app →

API Keys

Generate API keys from Settings → API Keys. Keys are scoped to your workspace and can be restricted to specific operations. Rotate or revoke keys at any time from the same panel.

  # All API requests require Bearer token authentication
  Authorization: Bearer YOUR_API_KEY
  Content-Type: application/json

External API

The Dagen REST API allows you to trigger pipeline runs, query pipeline status, read lineage, and manage connections programmatically.

Trigger a pipeline run

  POST /v1/pipelines/{pipeline_id}/runs
  {
    "autonomy_level": "semi-autonomous",
    "dry_run": false
  }
  → Returns run_id, status, agent_dispatch_plan

Get pipeline status

  GET /v1/pipelines/{pipeline_id}/runs/{run_id}
  → Returns status, steps_completed, active_agents, schema_drift_events, remediation_log
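A hedged sketch of triggering a run from Python using only the standard library; the base URL `https://api.dagen.ai` is an assumption for illustration, and the request is constructed but deliberately not sent:

```python
import json
import urllib.request

BASE_URL = "https://api.dagen.ai"   # assumed base URL, for illustration only

def build_trigger_run(api_key: str, pipeline_id: str,
                      autonomy_level: str = "semi-autonomous",
                      dry_run: bool = False) -> urllib.request.Request:
    """Construct (but do not send) the POST that triggers a pipeline run."""
    body = {"autonomy_level": autonomy_level, "dry_run": dry_run}
    return urllib.request.Request(
        f"{BASE_URL}/v1/pipelines/{pipeline_id}/runs",
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_trigger_run("YOUR_API_KEY", "pl_123")
# Sending req with urllib.request.urlopen(req) returns the run_id and status;
# then poll GET /v1/pipelines/{pipeline_id}/runs/{run_id} until the status is terminal.
```

The same pattern (authenticated Request plus a JSON body) applies to the other endpoints; only the method, path, and payload change.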

The complete API reference is available inside the Dagen app. Open the app →

Magical Guide

The Magical Guide is an in-app interactive assistant that provides context-aware help as you work. Access it from any screen in the Dagen app for step-by-step guidance, feature explanations, and troubleshooting support.

Slack Integration

Connect Dagen to your Slack workspace to receive pipeline alerts, agent run summaries, and self-healing notifications directly in your channels. Configure the integration from Settings → Integrations → Slack.

Self-Hosted Deployment

Dagen is available as a self-hosted deployment for organisations with strict data residency or compliance requirements. The self-hosted edition runs entirely within your infrastructure — no data leaves your environment.

  • Available as AMI (AWS Marketplace) and container-based deployment
  • Supports GDPR, NIS2, and EU AI Act compliance postures
  • Data residency tracking per pipeline
  • Encrypted credentials with full audit logging
  • SOC 2-ready configuration

Contact contact@dagen.ai to discuss self-hosted deployment options.

Need help with something specific?

Use the Magical Guide in the application for interactive, context-aware assistance. For technical questions or enterprise deployments, our team replies within one business day.