Frequently Asked Questions


Answers to the most common questions from data engineers, architects, and enterprise teams evaluating Dagen.

General / Getting Started

What is Dagen?

Dagen is an AI workspace built for data engineers. It uses intelligent agents to automate the entire data pipeline lifecycle — from ingestion and transformation to deployment, monitoring, and insights — so engineers can go from intent to production in minutes rather than weeks. Dagen v2.0 is built on a multi-agent architecture where five specialized agents work together, each handling a distinct stage of the data engineering workflow.

Who is Dagen built for?

Dagen is designed for everyone on a data team: data engineers who build and maintain pipelines, data scientists who need clean and reliable data, data analysts who want to explore and query data without filing tickets, and database administrators managing schema health and performance. Engineering managers and heads of data also use Dagen to enforce standards, track team productivity, and preserve institutional knowledge.

How is Dagen different from traditional data engineering tools?

Traditional tools require engineers to write boilerplate code, configure connectors manually, and stitch together separate platforms for ingestion, transformation, orchestration, and monitoring. Dagen replaces that friction with AI agents that understand your stack and execute those steps autonomously — while still producing transparent, versioned, production-grade code you fully own.

What are agentic data pipelines?

Agentic data pipelines are AI-driven workflows in which autonomous agents — rather than static, hand-coded orchestration scripts — plan, execute, monitor, and adapt each stage of the data engineering lifecycle. Where a traditional pipeline follows a fixed DAG of tasks defined in advance, an agentic pipeline reasons about the goal, selects the appropriate tools, writes the transformation code, handles errors, and iterates on quality — all without requiring an engineer to author every step.
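The contrast between a fixed DAG and an agentic loop can be sketched in a few lines of Python. This is a toy illustration of the concept, not Dagen's actual architecture or API; every name in it is hypothetical.

```python
# Toy contrast: a fixed DAG runs pre-authored steps; an agentic pipeline
# plans the next step from observed state and stops when the goal is met.
# All names here are illustrative, not Dagen's actual API.

def fixed_dag(tasks, data):
    # Traditional orchestration: every step is authored in advance.
    for task in tasks:
        data = task(data)
    return data

def agentic_pipeline(goal, plan_step, execute, max_iterations=10):
    """Plan -> execute -> observe loop; the agent decides each next step."""
    state = {"goal": goal, "done": False, "history": []}
    for _ in range(max_iterations):
        step = plan_step(state)          # agent reasons about what to do next
        if step is None:                 # agent judges the goal is met
            state["done"] = True
            break
        outcome = execute(step)          # run the chosen tool or code
        state["history"].append((step, outcome))  # feed results back in
    return state

# Toy planner: work through "clean" then "load", stopping once both succeed.
def planner(state):
    completed = {s for s, ok in state["history"] if ok}
    for step in ("clean", "load"):
        if step not in completed:
            return step
    return None

result = agentic_pipeline("ship table", planner, execute=lambda s: True)
print(result["done"], [s for s, _ in result["history"]])  # True ['clean', 'load']
```

The key difference is that the step sequence is an output of the loop, not an input to it.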

Is Dagen a replacement for dbt, Airflow, or Airbyte?

No — Dagen is an intelligent automation layer that sits on top of the tools you already use. It integrates natively with dbt, dbt Cloud, Airflow, and Airbyte, and adds AI-powered automation to generate, test, and deploy within those frameworks. You keep the code, the version control, and the toolchain you trust.

How do I get started with Dagen?

Sign up at dagen.ai — no credit card required on the Launch plan. After signing in with email, GitHub SSO, or Google SSO, connect your first data source, optionally link a GitHub repository containing your dbt or Dataform project, then start building through the AI Chat interface or the dedicated feature panels. Most teams have their first pipeline running within minutes of signing up.

What results are teams seeing with Dagen?

More than 500 data engineers are using Dagen. Teams report an average pipeline setup time of 3 minutes, 99.9% pipeline reliability, 40% faster development cycles from prototype to production, and up to 5× improvement in throughput on common data engineering tasks.

Product & Features

What can Dagen's AI agents do?

Dagen includes several specialized agents that work together: the Conversation Agent interprets natural language requests, supports multi-turn conversations with per-session context, and coordinates the team; the Ingestion Agent connects sources, discovers schemas, and configures incremental loads; the dbt Agent writes production-grade transformation models with tests and documentation; the Orchestration Agent deploys and schedules pipelines via Airflow or Git-based CI/CD; and the Data Insights Agent queries your data, generates interactive visualizations, and exposes the underlying Python/Plotly code for full transparency.

What connections, data sources, databases and data warehouses does Dagen support?

Dagen supports a broad range of connections across three categories. Native source databases: PostgreSQL, MySQL, Oracle, Snowflake, BigQuery, Redshift, Databricks, Teradata, and Hive. SaaS and application sources: via the full Airbyte connector catalog, Dagen connects to hundreds of third-party sources, including Salesforce and other SaaS APIs and platforms. Destinations: BigQuery, Snowflake, Redshift, and S3, plus all Airbyte destinations. HashiCorp Vault is supported for secure credential management across all connection types.

Can Dagen write and deploy dbt models automatically?

Yes. Describe your transformation logic in plain English and Dagen's dbt Agent generates production-ready staging, intermediate, and mart models; adds data tests (not-null, unique, referential integrity); writes column-level documentation; and deploys them to your warehouse — optimized for Snowflake, BigQuery, or whichever target you use. All generated code is committed to your connected GitHub repository. A typical set of models is generated and deployed in under two minutes.
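The three test types named above have well-defined semantics. As a hedged illustration (this is not Dagen's generated code, which is dbt SQL/YAML), here is what each check enforces, written as plain Python over rows:

```python
# Plain-Python equivalents of the three data tests mentioned above.
# Illustrative only; Dagen emits these as dbt tests, not Python.

def not_null(rows, column):
    # Every row must have a non-null value in the column.
    return all(r.get(column) is not None for r in rows)

def unique(rows, column):
    # No two rows may share a value in the column.
    values = [r[column] for r in rows]
    return len(values) == len(set(values))

def referential_integrity(child_rows, fk, parent_rows, pk):
    # Every foreign key in the child table must exist in the parent table.
    parent_keys = {r[pk] for r in parent_rows}
    return all(r[fk] in parent_keys for r in child_rows)

customers = [{"id": 1}, {"id": 2}]
orders = [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2}]

assert not_null(orders, "customer_id")
assert unique(orders, "id")
assert referential_integrity(orders, "customer_id", customers, "id")
```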

How does Dagen handle data ingestion?

Dagen wraps Airbyte's connector ecosystem with an AI-powered setup and management layer. You select a source, choose a destination, select the tables and schemas to sync, and configure the schedule. Dagen monitors every run in real time, showing record counts, table-level progress, and error details. Pipeline configurations can be exported and imported as JSON, and you can choose which Kubernetes cluster runs each ingestion job.
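The JSON export/import mentioned above amounts to round-tripping a pipeline configuration. A minimal sketch, assuming illustrative field names (source, destination, tables, schedule, cluster) rather than Dagen's actual schema:

```python
# Round-trip a pipeline configuration as JSON for export and re-import.
# Field names are illustrative assumptions, not Dagen's documented schema.
import json

config = {
    "source": "postgres-prod",
    "destination": "bigquery-analytics",
    "tables": ["orders", "customers"],
    "schedule": "0 2 * * *",        # cron: nightly at 02:00
    "cluster": "gke-ingestion-us",  # which Kubernetes cluster runs the job
}

exported = json.dumps(config, indent=2)  # export for review, backup, or reuse
restored = json.loads(exported)          # import into another workspace
assert restored == config
```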

Does Dagen support real-time or streaming data?

Yes. Dagen supports Change Data Capture (CDC) streaming via its ingestion pipelines. Pipeline cards show live CDC stream status, and if issues arise the platform surfaces an AI-assisted CDC configuration panel to help troubleshoot and resolve them.

What is the Workflow Orchestrator?

The Workflow Orchestrator is a visual, drag-and-drop designer for building multi-step automated workflows. You can connect nodes, schedule runs, configure notifications (including Slack alerts), and monitor execution history — all from within Dagen. You can also describe a workflow to the AI assistant and have it built automatically. Workflows integrate natively with Airflow, dbt Cloud, and Git-based CI/CD, and maintain a complete run history and audit trail.

Can I build custom AI agents in Dagen?

Yes. The Agent Builder lets you create custom agents using plain English descriptions, clone from pre-built templates, or write Python tools directly. You can configure each agent's instructions, assign it custom tools, upload knowledge documents for it to reference, and fine-tune its behavior with skills, rules, and lessons learned — then test it in a live chat dialog before deploying. Teams have used this to build schema review agents, cost optimization agents, and data contract validators, all without writing agent framework code.

What is the Knowledge Base feature?

The Knowledge Base lets you upload documents — runbooks, data dictionaries, architecture decision records, onboarding guides, and business definitions — that your AI agents can reference during conversations. It also generates a visual knowledge graph of entities and relationships extracted from your uploaded files, giving agents richer business context for every query.

Does Dagen support Apache Spark?

Yes. Dagen has dedicated Spark Pipeline support for PySpark, Scala Spark, SQL, and Python workloads. You can run Spark jobs on Databricks (including serverless), Google Cloud Dataproc, and Kubernetes — selecting the main file, cluster, and resource configuration from within the platform.

Can I use Dagen to explore and query databases?

Yes. The Database Explorer provides a visual tree of all your connected databases, schemas, tables, and views. You can write and run SQL queries directly in the console, browse paginated table data, inspect column definitions and constraints, and export results — all without leaving Dagen.

What data visualization and analytics capabilities does Dagen offer?

The Data Insights feature lets you ask questions about your data in plain language. The Data Insights Agent queries your databases, then returns KPI cards, bar/line/pie charts, and data tables with AI-generated narrative summaries. It also performs anomaly detection and surfaces business intelligence recommendations. You can view the underlying Python/Plotly code for any chart and export results for external use.
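The anomaly detection mentioned above can be approximated with a simple statistical check. This z-score sketch is an illustration of the idea, not Dagen's actual detection method:

```python
# Flag values that sit far from the mean, in standard-deviation units.
# A sketch of outlier detection, not Dagen's actual algorithm.
from statistics import mean, stdev

def anomalies(values, threshold=2.0):
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

daily_revenue = [100, 102, 98, 101, 99, 103, 500]  # one obvious spike
print(anomalies(daily_revenue))  # [500]
```

Real detectors account for seasonality and trend; a fixed z-score threshold is only the simplest baseline.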

What is Agent Intelligence?

Agent Intelligence is Dagen's layer for governing and continuously improving agent behavior. It has five components: Instructions (how agents should behave, prioritized 0–100 and optionally scoped to specific agents), Skills (expert knowledge loaded on-demand when triggered by a phrase), Rules (hard constraints applied automatically to every agent action), Lessons Learned (patterns to avoid with correct alternatives — agents apply these automatically and track how many times each has been applied), and Templates (pre-built configurations you can apply with one click). All configurations can be exported and imported as JSON.
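A minimal sketch of how three of these components (Instructions, Skills, Rules) might be modeled, with priority-ordered, scoped instruction lookup. All field names and the `applicable` helper are assumptions for illustration, not Dagen's internals:

```python
# Hypothetical data model for Agent Intelligence components.
from dataclasses import dataclass, field

@dataclass
class Instruction:
    text: str
    priority: int = 50          # 0-100; higher priority wins on conflict
    scope: list = field(default_factory=list)  # empty = applies to all agents

@dataclass
class Skill:
    trigger_phrase: str         # loaded on demand when the phrase appears
    knowledge: str

@dataclass
class Rule:
    constraint: str             # hard constraint applied to every agent action

def applicable(instructions, agent):
    """Return the instructions that apply to an agent, highest priority first."""
    hits = [i for i in instructions if not i.scope or agent in i.scope]
    return sorted(hits, key=lambda i: i.priority, reverse=True)

instructions = [
    Instruction("Prefer incremental models", 80, scope=["dbt-agent"]),
    Instruction("Always add column documentation", 60),
]
print(applicable(instructions, "dbt-agent")[0].text)  # Prefer incremental models
```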

What is the Magical Guide?

The Magical Guide is Dagen's context-aware in-app navigation assistant, accessible via a floating button anywhere in the platform. It can navigate to any page, run guided feature walkthroughs, and help troubleshoot errors in real time — making it significantly faster for new team members to find and learn specific capabilities without leaving the platform.

Integrations

What tools and platforms does Dagen integrate with?

Dagen integrates across the full data stack. Transformation: dbt and Dataform. Orchestration: Airflow and dbt Cloud. Ingestion: Airbyte's full connector catalog. Version control: GitHub (OAuth and PAT). IDE: VS Code (local and browser-based). Messaging: Slack (bidirectional). Cloud compute: AWS (EKS, Lambda, Fargate), Azure (AKS, Synapse), GCP (GKE, Dataproc, Vertex AI, Cloud Run). Processing: Databricks, Google Dataproc, EMR, Kubernetes, Docker. Observability: Datadog, Splunk, Prometheus. Secret management: HashiCorp Vault. AI models: OpenAI, Anthropic, Google, DeepSeek, Snowflake Cortex, Ollama, vLLM.


Does Dagen integrate with GitHub?

Yes. Connect your GitHub repositories via OAuth or a Personal Access Token. Dagen automatically detects existing dbt or Dataform projects in connected repos and creates corresponding pipelines. Full Git operations — branching, committing, pushing, pull requests, and diffing — are available directly within the platform, so engineers never need to leave Dagen to manage version control.

Does Dagen have a Slack integration?

Yes — and it's bidirectional. Outbound: Dagen sends workflow completion and failure notifications to any Slack channel via webhook. Inbound: your team can @mention the Dagen bot in any channel to interact with the Super Agent, which maintains per-user conversation context across messages. The connection is secured via HMAC-SHA256 signing; requests older than five minutes are automatically rejected.
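The signing scheme described above, HMAC-SHA256 plus a five-minute freshness window, can be sketched as follows. The exact payload layout is an assumption modeled on common webhook signing conventions, not Dagen's documented wire format:

```python
# Verify a webhook signature: HMAC-SHA256 over timestamp + body, with
# requests older than five minutes rejected. Payload layout is assumed.
import hashlib, hmac, time

MAX_AGE_SECONDS = 5 * 60

def sign(secret, timestamp, body):
    msg = str(timestamp).encode() + b":" + body
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def verify(secret, timestamp, body, signature, now=None):
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > MAX_AGE_SECONDS:    # reject stale requests
        return False
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)  # constant-time compare

secret, body = b"shared-secret", b'{"text": "@dagen status"}'
ts = int(time.time())
sig = sign(secret, ts, body)
assert verify(secret, ts, body, sig)
assert not verify(secret, ts - 600, body, sign(secret, ts - 600, body))  # too old
```

The constant-time comparison matters: a naive `==` on signatures can leak timing information to an attacker.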

What commands can I use with the Dagen Slack bot?

@dagen <question> asks the Super Agent anything.
@dagen run <workflow> triggers a named workflow.
@dagen list workflows shows all active workflows.
@dagen status displays recent run results.
@dagen agent <name> <question> routes a query directly to a specific sub-agent.
@dagen help lists all available commands.

Results are posted in-thread, and the bot reacts with 👀 immediately to confirm receipt.

Can I bring my own AI models (BYO LLM) to Dagen?

Yes. Dagen supports OpenAI (GPT), Anthropic (Claude), Google (Gemini via API or Vertex AI), DeepSeek, Snowflake Cortex, and open-source models via Ollama or vLLM. Workspace admins configure available models and assign cost-optimized models to routine tasks while reserving higher-capability models for complex workflows. Enterprise customers can host proprietary or fine-tuned models (BYOM) within their own infrastructure — model weights never leave the customer's environment.

Does Dagen integrate with VS Code?

Yes. Dagen provides an integration token that lets you edit pipeline files locally in VS Code while staying synced with the platform in real time. You can also open pipelines in the browser-based VS Code editor built into Dagen, clone repositories to your local machine via the Git URL, or check out branches for local development — all without losing the connection to the platform's agent and monitoring capabilities.

Does Dagen integrate with Apache Airflow?

Yes. Dagen's Orchestration Agent integrates natively with Apache Airflow. Workflows built in Dagen can trigger Airflow DAGs, and Airflow-managed pipelines can be monitored from within the Dagen platform. Teams with existing Airflow infrastructure can adopt Dagen's AI agents without replacing or reconfiguring their current orchestration setup.

Does Dagen support Dataform?

Yes. Alongside dbt, Dagen supports Dataform pipelines with a full DAG view, file editor, and multiple run modes (full refresh, incremental, test-only, and custom). GitHub repositories containing Dataform projects are automatically detected when connected, and Dataform pipelines receive the same Git operations, monitoring, and AI-assisted troubleshooting as dbt pipelines.

Which cloud platforms and runtime environments does Dagen support?

Data ingestion jobs run on GCP (GKE), AWS (EKS), Azure (AKS), or on-premises Kubernetes. Spark and processing workloads run on Databricks, Google Cloud Dataproc, Amazon EMR, Azure Synapse, Snowflake, or Vertex AI. Python execution environments include Docker, Kubernetes, Cloud Run, Lambda, and Fargate. For secret management, Dagen integrates with HashiCorp Vault. Runtime selection is configurable per pipeline from within the platform.

Can Dagen export metrics to monitoring and observability tools?

Yes. Dagen supports one-click metric streaming to Datadog, Splunk, and Prometheus for enterprise deployments. Pipeline health, latency, data quality signals, and AI model usage data can flow directly into your existing observability stack alongside the rest of your infrastructure metrics — giving platform teams a unified view without custom exporters.

Pricing & Plans

Is Dagen free to use?

Yes. The Launch plan is free and includes up to 3 connected data sources, 5 automated pipeline deployments per month, agent chat with intent history, $25 in agent compute credits, and access to a shared dbt modeling workspace — no credit card required to get started.

What is included in the SaaS plan?

The SaaS plan is Dagen's recommended tier for growing teams. It includes unlimited data sources and destinations, continuous pipeline observability with alerts, $50 in agent compute credits, role-based access control (RBAC), SSO, and custom agent toolchains. Pricing starts at $299/month — see the Pricing page for full details.

What does the Enterprise plan include?

The Enterprise plan is designed for regulated industries and large data platforms that require full control. It includes dedicated VPC or on-premises deployment, private agent model hosting and fine-tuning, enterprise SLAs, white-glove onboarding, and advanced governance, audit, and compliance controls.

What counts as a "pipeline deployment"?

A deployment is a production pipeline run triggered by an agent. Ad-hoc development queries and dry runs do not count toward your monthly quota — only production runs do.

Can I switch plans at any time?

Yes. You can upgrade or downgrade your plan at any time. Upgrades take effect immediately. Downgrades take effect at the start of your next billing cycle.

Does Dagen offer usage-based pricing?

The SaaS plan includes generous agent execution and pipeline deployment quotas. If you consistently exceed them, Dagen will reach out with metered options tailored to your workload rather than hard-blocking your workflows.

Security & Enterprise

Can Dagen be deployed on-premises or in a private cloud?

Yes. Enterprise customers can deploy Dagen entirely within their own infrastructure — either inside a private VPC on AWS, Azure, or GCP using Terraform and Helm, or in a fully air-gapped on-premises environment. Outbound-only networking, customer-managed secrets, and HSM integration are supported for regulated environments.

What security and compliance certifications does Dagen hold?

Dagen is SOC 2 Type II certified and works with enterprise customers on HIPAA, GDPR, and industry-specific compliance assessments. Data Processing Agreements (DPAs) are available. The Enterprise plan includes comprehensive audit trails across every agent action, granular RBAC with workspace isolation by business unit, and compliance export tooling for regulated reporting.

Does Dagen support Single Sign-On (SSO)?

Yes. Dagen supports SAML 2.0 and OIDC for SSO — compatible with Okta, Azure Active Directory, and other major identity providers. SCIM provisioning is available for automated user lifecycle management, enabling automatic account creation, updates, and deprovisioning directly from your identity provider.

How does Dagen handle role-based access control (RBAC)?

RBAC is enforced from the first login. Admins manage team members, roles, and permissions through the Team Settings panel. Sensitive environments — production credentials, PII datasets, regulated schemas — are only accessible to engineers with the appropriate clearance. On the Enterprise plan, workspace isolation by business unit is available, and all access events are captured in the Job History audit trail.

How does Dagen protect database credentials and secrets?

Credentials are never stored in plain text. Dagen integrates with external secret managers — such as HashiCorp Vault — so engineers reference secret paths rather than entering credentials directly. For enterprise deployments, customer-managed encryption keys, private network peering, and full event streaming into your SIEM are available from day one.
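The "reference secret paths rather than credentials" pattern can be sketched as follows. The `vault://` reference syntax and the resolver are hypothetical illustrations of the pattern, not Dagen's actual syntax:

```python
# Connection configs hold opaque secret references; a resolver swaps them
# for real values at runtime. The vault:// scheme here is hypothetical.
def resolve_secret(reference, secret_store):
    """Resolve 'vault://path#key' references; pass plain values through."""
    if not reference.startswith("vault://"):
        return reference
    path, _, key = reference[len("vault://"):].partition("#")
    return secret_store[path][key]

# Stand-in for a Vault backend; in production this would be an API call.
store = {"database/prod": {"password": "s3cr3t"}}
config = {"host": "db.internal", "password": "vault://database/prod#password"}

resolved = {k: resolve_secret(v, store) for k, v in config.items()}
print(resolved["password"])  # s3cr3t
```

The benefit is that configs can be committed, shared, and exported without ever containing a credential.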

How long does an enterprise deployment take?

Cloud VPC deployments typically go live within two weeks. On-premises or hybrid rollouts generally take four to six weeks depending on the security review process. Dagen provides dedicated solutions architects, Terraform/Helm deployment scripts, and migration playbooks to keep timelines on track.

What ongoing support does Dagen provide to enterprise customers?

Enterprise customers receive a named success manager, quarterly architecture reviews, and direct Slack and ticket access to a senior engineering pod. Dagen also conducts architecture design sprints, threat modeling sessions, joint KPI reviews, and hands-on migration assistance from legacy ETL and orchestration tooling.

Does Dagen ensure zero data egress?

For enterprise self-hosted deployments, all data, telemetry, and model weights remain within your own infrastructure. No data leaves your environment — Dagen operates entirely as an on-premises control and compute layer. This is essential for organizations with data residency requirements or stringent regulatory constraints.

Technical

How does Dagen handle schema discovery?

When you connect a database, Dagen's Ingestion Agent automatically maps schemas, infers column relationships, identifies data types, and builds a contextual model of your data landscape. You can also trigger metadata extraction at any time from the Database Connections panel to refresh the discovered schema — useful when upstream systems evolve between pipeline runs.
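As an illustration of the kind of metadata schema discovery captures (table names, columns, declared types, nullability), here is a sketch using SQLite's catalog as a stand-in for a production warehouse's information_schema:

```python
# Discover tables and columns from a database catalog.
# SQLite is used as a self-contained stand-in for a real warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer_id INTEGER NOT NULL, amount REAL)")

discovered = {}
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
for (table,) in tables:
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
    discovered[table] = [(c[1], c[2], bool(c[3])) for c in cols]

print([(name, typ) for name, typ, _ in discovered["orders"]])
# [('id', 'INTEGER'), ('customer_id', 'INTEGER'), ('amount', 'REAL')]
```

A production warehouse exposes the same information through `information_schema.columns`; only the query changes.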

Does Dagen support Change Data Capture (CDC)?

Yes. Dagen's ingestion pipelines support CDC streaming for continuous data synchronization. The platform monitors stream health in real time and surfaces an AI-assisted CDC configuration panel when issues are detected, helping teams troubleshoot and reconfigure pipelines faster without manual log inspection.

How does Dagen manage version control for pipelines?

All pipeline code is stored in connected GitHub repositories. Dagen provides built-in Git operations — create branches, view diffs, commit changes, push to remote, open pull requests, and pull updates — directly from the platform UI. Every agent-generated model, transformation, and pipeline fix is committed to your repository, keeping the full history versioned, reviewable, and attributable.

How does Dagen handle failed pipeline runs?

When a pipeline fails, Dagen's monitoring dashboard surfaces the error immediately with probable root cause analysis. Clicking 'Fix with Agent' on the failed pipeline card opens an AI-assisted troubleshooting session: the agent reads the error logs, diagnoses whether the issue is a CDC configuration problem, credential expiry, warehouse permission gap, or upstream schema change, and walks through the resolution. All generated fixes are committed to your repository.
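Probable-root-cause analysis of the kind described above can be approximated by matching error logs against known failure signatures. The categories mirror the ones listed; the patterns themselves are illustrative, not Dagen's actual rules:

```python
# Classify a failed run's error log into a probable root cause.
# Signature patterns are illustrative examples, not Dagen's internals.
import re

SIGNATURES = [
    (r"replication slot|binlog|wal_level", "CDC configuration problem"),
    (r"password authentication failed|token expired", "credential expiry"),
    (r"permission denied|insufficient privileges", "warehouse permission gap"),
    (r"column .* does not exist|no such column", "upstream schema change"),
]

def probable_cause(log_text):
    for pattern, cause in SIGNATURES:
        if re.search(pattern, log_text, re.IGNORECASE):
            return cause
    return "unknown -- escalate to manual review"

print(probable_cause('ERROR: column "discount" does not exist'))
# upstream schema change
```

In practice an agent combines signature matching with log context and schema diffs, but the triage structure is the same.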

Can I monitor pipeline costs and AI model usage?

Yes. The Usage Analytics dashboard tracks AI model usage, token consumption (input and output), cost per model, average response times, and success rates across your workspace. You can filter by agent, time period, or workspace, set budget alert thresholds, and export usage data for internal chargeback or FinOps reporting.
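The cost-per-model breakdown described above reduces to aggregating token usage against a price table. A sketch with illustrative model names and prices (real rates vary by provider):

```python
# Aggregate token usage into cost per model for chargeback reporting.
# Model names and per-1K-token prices are illustrative assumptions.
from collections import defaultdict

PRICE_PER_1K = {"small-model": 0.002, "large-model": 0.03}

events = [
    {"model": "small-model", "tokens": 12_000},
    {"model": "large-model", "tokens": 3_000},
    {"model": "small-model", "tokens": 8_000},
]

costs = defaultdict(float)
for e in events:
    costs[e["model"]] += e["tokens"] / 1000 * PRICE_PER_1K[e["model"]]

for model, cost in sorted(costs.items()):
    print(f"{model}: ${cost:.3f}")
# large-model: $0.090
# small-model: $0.040
```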

How does Dagen handle import and export of agent configurations?

All agent and workflow configurations — instructions, skills, rules, lessons learned, and templates — can be exported as JSON from the Agent Intelligence panel and imported into any other workspace. This makes it straightforward to propagate best practices across teams, onboard new workspaces with approved instruction sets, or migrate configurations during reorganizations without losing accumulated institutional knowledge.

Can I run Dagen alongside my existing data stack without replacing it?

Yes. Dagen is an intelligent augmentation layer, not a replacement for your existing stack. It sits on top of dbt, Airflow, Airbyte, and your existing repositories — automating the repetitive parts while keeping your code, pipelines, and infrastructure exactly where they are. Teams typically see productivity improvements within their first pipeline run.

Still have questions?

Our team is happy to walk you through anything in more detail.