Notion Connector: load pages, databases, and tasks into your warehouse without maintaining pipelines

Managed connector to sync Notion users, pages, blocks, and comments to BigQuery, Redshift, PostgreSQL, Databricks, Supabase, Azure SQL Database, and Amazon S3. No pipeline maintenance required.

Jun 12, 2026

Every data team at a company that uses Notion as an operating system eventually reaches the same point: the tool holds living knowledge (roadmaps, OKRs, backlogs, wikis, project databases), and there is no practical way to join that data with the rest of the analytical stack. What starts as a manual CSV export every Monday eventually turns into a poorly documented Python script, and no one really knows what it extracts, from where, or how often.

Today Erathos launches the managed connector for Notion. Four endpoints available, seven supported destinations, zero pipeline code to write or maintain.

The problem with custom-built Notion pipelines

The Notion API authenticates with an Internal Integration Secret and is well documented. Any engineer can pull a list of pages or database records in less than an hour, and that simplicity is exactly what convinces teams to build a custom integration in a sprint.

The real cost appears later, and it is predictable.

Cursor pagination with non-trivial behavior. The Notion API uses cursor-based pagination (start_cursor / has_more), and each response returns at most 100 objects. For full extractions of large workspaces with hundreds of databases and thousands of pages, the pagination logic must be implemented correctly for each endpoint. A naive implementation reads only the first page and exits the loop as if the extraction were complete, without raising any error.
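As a rough illustration of the loop described above, here is a minimal Python sketch. It assumes a hypothetical `fetch_page` callable that takes a cursor (or None for the first request) and returns one decoded response with `results`, `has_more`, and `next_cursor`. The transport, endpoint, and error handling are left out on purpose, and this is not how the Erathos connector is implemented.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical contract: takes a cursor (None on the first call) and
# returns one decoded API response as a dict.
FetchPage = Callable[[Optional[str]], Dict[str, Any]]


def iterate_results(fetch_page: FetchPage) -> Iterator[Dict[str, Any]]:
    """Yield every object from a cursor-paginated endpoint."""
    cursor: Optional[str] = None
    while True:
        response = fetch_page(cursor)
        yield from response.get("results", [])
        # Stop only when the API reports that nothing is left to fetch.
        if not response.get("has_more"):
            break
        cursor = response.get("next_cursor")
        if cursor is None:
            # Fail loudly instead of silently ending the extraction early.
            raise RuntimeError("has_more is true but next_cursor is missing")


# Usage with an in-memory stand-in for the API (three pages of fake data).
_FAKE_PAGES = {
    None: {"results": [{"id": "a"}, {"id": "b"}], "has_more": True, "next_cursor": "c1"},
    "c1": {"results": [{"id": "c"}], "has_more": True, "next_cursor": "c2"},
    "c2": {"results": [{"id": "d"}], "has_more": False, "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> Dict[str, Any]:
    return _FAKE_PAGES[cursor]


all_ids = [obj["id"] for obj in iterate_results(fake_fetch)]
```

The naive version mentioned above is the same function without the `while` loop: it returns the first 100 objects and nothing signals that more were available.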

Recursive block traversal. Pages in Notion are composed of blocks, and blocks can contain other blocks. A page with rich content, such as tables, toggles, and nested lists, can require multiple recursive API calls just to materialize the content of a single record. This logic doesn't show up in the initial script. It appears when data starts arriving incomplete in the warehouse and someone has to debug it.
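The traversal can be sketched as a depth-first walk. The example below assumes a hypothetical `list_children` callable that returns all direct children of a block (already paginated) and relies on the `has_children` flag that Notion block objects carry. It is an illustration of the technique, not the connector's actual code.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# Hypothetical contract: returns every direct child of the given block or page.
ListChildren = Callable[[str], List[Dict[str, Any]]]


def walk_blocks(
    block_id: str, list_children: ListChildren, depth: int = 0
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, block) for every block nested under block_id, depth-first."""
    for block in list_children(block_id):
        yield depth, block
        # Only descend when the block reports nested content: one extra call each.
        if block.get("has_children"):
            yield from walk_blocks(block["id"], list_children, depth + 1)


# Usage with a fake page: a toggle containing a list item containing a paragraph.
_TREE = {
    "page": [
        {"id": "b1", "type": "toggle", "has_children": True},
        {"id": "b2", "type": "paragraph", "has_children": False},
    ],
    "b1": [{"id": "b3", "type": "bulleted_list_item", "has_children": True}],
    "b3": [{"id": "b4", "type": "paragraph", "has_children": False}],
}


def fake_children(block_id: str) -> List[Dict[str, Any]]:
    return _TREE.get(block_id, [])


flattened = [(depth, block["id"]) for depth, block in walk_blocks("page", fake_children)]
```

A script that stops at the first level would return only b1 and b2 here, which is the kind of incomplete data the paragraph above describes.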

Rate limit with distinct behavior by volume. The Notion API applies rate limiting per integration token. For large workspaces with high volumes of pages and databases, a historical backfill can quickly pile up 429s if retry with exponential backoff is not implemented. The pipeline that works in the development workspace breaks when migrated to the company's production workspace.
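A minimal sketch of retry with exponential backoff follows. It assumes a hypothetical `send` callable that returns the HTTP status, an optional Retry-After value in seconds, and the decoded body. The attempt count and delays are illustrative defaults, not values recommended by Notion or used by Erathos.

```python
import random
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical transport contract: (HTTP status, Retry-After seconds or None, body).
Reply = Tuple[int, Optional[float], Dict[str, Any]]


def call_with_backoff(
    send: Callable[[], Reply],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Retry send() on HTTP 429, waiting longer after each rate-limited reply."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status != 429:
            return body  # other errors are left for the caller to handle
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        # Prefer the server's hint; otherwise double the wait on each attempt.
        if retry_after is not None:
            delay = retry_after
        else:
            delay = min(max_delay, base_delay * 2 ** attempt)
        # Small random jitter so parallel workers do not retry in lockstep.
        sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")


# Usage with a stand-in that is rate limited twice, then succeeds.
_replies = iter([(429, None, {}), (429, None, {}), (200, None, {"results": []})])
waits = []
result = call_with_backoff(lambda: next(_replies), sleep=waits.append)
```

Injecting `sleep` keeps the function testable without real waiting; in production the default `time.sleep` applies.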

Implicit schema evolution in database properties. Databases in Notion have user-defined schemas. Someone can add a new property (a Select field, a Relation, a Rollup) at any time, without warning. Your dbt model that consumed the database yesterday starts to diverge today. A pipeline that doesn't handle schema evolution leaves the new column out of the extraction, and it simply doesn't exist in the warehouse until someone notices.
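One way to catch this drift is to compare the property names and types seen in the previous run with the ones seen now. The sketch below assumes each schema has already been reduced to a simple name-to-type mapping; the `diff_properties` helper and the sample properties are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class SchemaDiff(NamedTuple):
    added: Dict[str, str]
    removed: Dict[str, str]
    retyped: Dict[str, Tuple[str, str]]


def diff_properties(known: Dict[str, str], current: Dict[str, str]) -> SchemaDiff:
    """Compare two {property name: property type} snapshots of one database."""
    added = {name: t for name, t in current.items() if name not in known}
    removed = {name: t for name, t in known.items() if name not in current}
    retyped = {
        name: (known[name], current[name])
        for name in known.keys() & current.keys()
        if known[name] != current[name]
    }
    return SchemaDiff(added, removed, retyped)


# Usage: someone added a Relation and changed the type of Status overnight.
yesterday = {"Name": "title", "Status": "select", "Owner": "people"}
today = {"Name": "title", "Status": "status", "Owner": "people", "Epic": "relation"}
diff = diff_properties(yesterday, today)
```

A non-empty `added` is the signal to create the new column in the destination; `retyped` and `removed` usually deserve a human look before downstream models are touched.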

Non-existent observability. A 200 OK on the API call doesn't mean that all records were extracted. Without record count by endpoint per run, without comparison to the previous window, without volume drop alerts, you are flying blind. The pipeline that ran successfully may have pulled zero new pages because the integration was never connected to the right database in Notion.
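A volume check of this kind can be very small. The sketch below compares record counts per endpoint between two runs and flags large drops. It assumes the counts are comparable between runs (for example, full snapshots), and the 50% threshold and the `volume_alerts` helper are hypothetical choices for illustration.

```python
from typing import Dict, List


def volume_alerts(
    previous: Dict[str, int], current: Dict[str, int], max_drop: float = 0.5
) -> List[str]:
    """Return one message per endpoint whose record count fell by more than max_drop."""
    alerts = []
    for endpoint, before in sorted(previous.items()):
        now = current.get(endpoint, 0)  # a missing endpoint counts as zero records
        if before > 0 and now < before * (1 - max_drop):
            drop = 1 - now / before
            alerts.append(f"{endpoint}: {before} -> {now} records ({drop:.0%} drop)")
    return alerts


# Usage: the run "succeeded", but the pages endpoint returned nothing.
last_run = {"users": 40, "pages": 1200, "blocks": 9500, "comments": 310}
this_run = {"users": 40, "pages": 0, "blocks": 9400, "comments": 300}
alerts = volume_alerts(last_run, this_run)
```

Small fluctuations such as the blocks and comments counts above pass silently; only the collapse in pages is reported.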

The worst-case scenario isn't the pipeline that breaks and sends an alert. It's the pipeline that runs successfully and delivers incomplete data, while the OKRs model feeding the leadership review is already computed over bad data.

What you can do when Notion data reaches the warehouse

Before detailing the connector, it is worth spelling out what you actually gain when this data leaves the SaaS and lands in the warehouse, modeled next to the rest of your stack.

Track progress of OKRs and projects with raw data

Companies using Notion as their OKR system typically have an objectives database and another for key results, with properties for progress, owner, and timeframe. With these databases in the warehouse via the pages endpoint, you can calculate the evolution of each KR over time, something impossible to do in Notion's native dashboards, which don't offer history of property changes with record-level granularity.

Join product backlog with usage and revenue data

With task and epic records exported via pages, you can join the product backlog with usage tables (product events coming from Mixpanel, Amplitude, or the warehouse itself) and revenue data from the CRM. How much time is engineering allocating to features used by enterprise versus starter customers? What is the average lead time for a task marked as a "critical bug" from creation to closure? These questions only have answers when Notion data lives alongside your other models.

Audit changes in internal documentation and wikis

The blocks endpoint brings the structured content of each page, such as block types, textual content, and hierarchy. Combined with the pages endpoint (which loads creation and last-edited metadata), you can build an audit model: which compliance wiki pages were edited in the last 30 days, by whom, and with what frequency. This is useful for teams that need to provide evidence of document governance controls for SOC 2 or ISO 27001 audits.
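The core of such an audit model is a time-window filter over page metadata. The sketch below assumes page rows with hypothetical flattened fields `id`, `last_edited_time` (an ISO 8601 UTC string), and `last_edited_by` (a user ID); the actual column names in your destination may differ.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List


def parse_ts(value: str) -> datetime:
    # Older Python versions do not accept the "Z" suffix in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recently_edited(
    pages: Iterable[Dict[str, Any]], now: datetime, days: int = 30
) -> List[Dict[str, Any]]:
    """Keep the pages whose last edit falls inside the trailing window."""
    cutoff = now - timedelta(days=days)
    return [p for p in pages if parse_ts(p["last_edited_time"]) >= cutoff]


# Usage with three fake page rows and a fixed reference date.
pages = [
    {"id": "p1", "last_edited_time": "2026-06-01T10:00:00.000Z", "last_edited_by": "u1"},
    {"id": "p2", "last_edited_time": "2026-03-15T08:30:00.000Z", "last_edited_by": "u2"},
    {"id": "p3", "last_edited_time": "2026-06-10T17:45:00.000Z", "last_edited_by": "u1"},
]
now = datetime(2026, 6, 12, tzinfo=timezone.utc)
recent = recently_edited(pages, now)
edits_by_user = Counter(p["last_edited_by"] for p in recent)
```

In a warehouse the same logic would normally live in a SQL or dbt model; the Python version is only meant to make the filter and the grouping explicit.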

Analyze collaboration and engagement via comments and users

The comments endpoint brings annotations left on pages and blocks, with author, timestamp, and thread context. The users endpoint delivers the workspace member directory. Joining the two, you can identify which pages concentrate the most discussion, which members are the key contributors to technical documentation, and how the Notion usage pattern evolved after broader adoption across the company.

Combine operational data from Notion with the rest of the stack

The real gain appears when Notion data is next to Linear or Jira (to join backlog with issues), HubSpot or Salesforce (to join customer success projects with account data), and GitHub (to close the loop between planning and code delivery). None of these joins are possible when the data lives only in the SaaS.

What is available in the connector

The Notion connector delivers four endpoints ready to be materialized in your destination warehouse:

Endpoint	What it contains
`users`	Workspace members with ID, name, email, and account type
`pages`	Pages and database records with properties and metadata
`blocks`	Structured content per block (type, text, hierarchy)
`comments`	Comments with author, timestamp, and parent object ID

The supported destinations are BigQuery, Redshift, PostgreSQL, Databricks, Supabase, Azure SQL Server, and Amazon S3.

How to authenticate

Connector authentication requires a single field:

Token: the Internal Integration Secret of the Notion integration

To create the token, go to notion.so/profile/integrations, click on New integration, give it a name, select the associated workspace, and save. The Internal Integration Secret becomes visible on the configuration page of the created integration.

An important detail: for each page or database in Notion that you want to sync, you must connect the integration explicitly. Open the page in Notion, click on Connect to and select the integration. Pages and databases not connected to the integration are not accessible via API and do not appear in the extraction. No error, no warning.
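Because unconnected pages fail silently, it can help to check what a token can actually see before trusting an extraction. The sketch below builds a request to Notion's search endpoint using only the standard library. The `build_search_request` helper and the placeholder token are hypothetical, and the pinned `Notion-Version` value is an assumption: use whichever API version you have tested against.

```python
import json
import urllib.request
from typing import Optional

# Assumption: pin the API version you have validated; this is one published value.
NOTION_VERSION = "2022-06-28"


def build_search_request(
    token: str, start_cursor: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST /v1/search request listing what the integration can access."""
    payload = {"page_size": 100}
    if start_cursor:
        payload["start_cursor"] = start_cursor
    return urllib.request.Request(
        "https://api.notion.com/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder token for illustration only; never hard-code a real secret.
request = build_search_request("secret_placeholder")

# To send it for real:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(len(body["results"]), "objects visible to this integration")
```

If the response comes back with an empty `results` list, the token is valid but no page or database has been connected to the integration yet.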

The complete process is documented at docs.erathos.com/connectors/apis/notion.

Why outsource ingestion to Erathos

The premise of the connector is straightforward: maintaining ingestion shouldn't be your data team's job. Cursor pagination, recursive block traversal, rate limiting, retry with backoff, database schema evolution, failure alerts, volume drop alerts, backfill: all of this is the responsibility of whoever operates the ingestion platform.

With the connector configured, the platform delivers:

End-to-end visibility of each run. Extraction time per endpoint, record count per window, which windows were processed, where retries occurred. When an OKR progress metric changes on the dashboard and leadership asks why, you have the complete trail to find the root cause.

Out-of-the-box alerts. Run failures, volume drops per endpoint, and window delays are detected and routed through the alerting integrations the team already uses (email, Slack, or Discord). You don't write this code.

Reprocessing as a supported operation. When you need to reprocess a window, either because the dbt model logic changed or because you received a correction from the source, this is a platform operation, not an improvised DELETE + INSERT sequence in the warehouse.

Correct pagination, rate limit management, schema evolution, and backfill are the platform's responsibility. The data team focuses on the model, not the plumbing.

Get Started Now

Create your Erathos account and connect Notion to your warehouse in minutes. With the Internal Integration Secret and pages connected to the integration, the first data arrives at the destination with zero pipeline code to write, maintain, or monitor.

Operational data generated every day in Notion shouldn't remain disconnected from the rest of your analytical model. Or worse, stuck in a manual export or homemade pipeline that costs the team attention every month, indefinitely.

See the full connector documentation at docs.erathos.com/connectors/apis/notion.