The notebook manifesto
AI has created a new category of computing: autonomous workflows that reason, act, call tools, and iterate on behalf of their users. These systems need a runtime that is safe to explore, auditable at scale, and rich enough to capture full interaction traces. That runtime is the notebook - the unit of work and the unit of record.
A bank wants an agent that can evaluate loan applications end-to-end. A recruiting team wants one that can screen candidates across resumes, research, policy, and compliance. A data team wants to ask a question in natural language and get back not a chatbot response, but a transparent, reproducible artifact. These are real use cases, and every major platform is racing to serve them. But the hard problems remain unsolved: observability, guardrails, composability, identity, and a runtime that can support both agentic applications and the teams building them.
The notebook has the right properties: transparency, reproducibility, composability, and structured execution traces. Deepnote notebooks are purpose-built for data work, making them the right foundation for agentic computing.
We are building the platform organizations use to define, run, observe, and govern agentic workflows - and the platform teams use to build the models, products, and processes behind them. Together.
Below are the principles that define what the notebook must become.
The universal computational medium for the agentic era
Notebooks must retain their universal nature: analysis, machine learning, autonomous agent workflows, and eval authoring. The interface should be built from open building blocks, with a block for everything the work requires: an agentic block for thinking, a block for text, a block for code, a block for SQL, a block for charts, a block for inputs, a block for files, a block for agents, and new blocks for capabilities we have not imagined yet. That is what makes the notebook extensible. It can grow with the work, rather than forcing every workflow into the same narrow interface.
Organizations do not want their agentic infrastructure to be controlled by a single model provider. They want to build on their own data, with their own guardrails, serving their own users. This is why the notebook format, the block-type system, the conversion tooling, and the core runtime are open source - so that the foundation is owned by everyone who builds on it, not held hostage by any single vendor. The teams building the next generation of models need more than a sandbox - they need a system of record for every trajectory, every rollout, and every evaluation run. The teams building agentic products and internal workflows need the same foundation: a runtime they can trust, govern, and extend. The notebook is the foundation both groups build on, and an open foundation is the only kind worth betting on.
Built for exploration and autonomous reasoning
Notebooks exist to support decision-making with clear links between data and conclusions. Unlike engineering workflows optimized for time-to-production, notebooks are optimized for time-to-insight: free exploration, scenario analysis, frequent branching, and fast iteration. Agents perform the same kind of exploratory reasoning. An agent probing a dataset, testing hypotheses, calling APIs, and iterating toward an answer is doing what notebooks were designed to support - open-ended, multi-step reasoning rather than linear execution.
The notebook is the natural unit of autonomous reasoning - whether an agent is exploring, researching in the background, or executing a longer-running workflow. The agent acts, the environment responds, the trajectory is captured, and the whole sequence is inspectable and replayable.
The context layer for AI
Every organization deploying data agents learns the same lesson: the models are not the bottleneck, the context is. Ask an agent, “What was revenue growth last quarter?” It cannot answer without knowing how revenue is defined, which tables are canonical, what a fiscal quarter means for this company, and which data sources have been deprecated since the last team member left. Semantic layers were supposed to solve this, but they stayed narrow - hand-authored YAML tied to specific BI tools, rarely maintained, never complete.
The notebook is the right medium for the context layer because it can hold all three layers of context an agent needs, and keep them executable rather than declarative.
Semantics. Ontologies, metric definitions, business rules, and governance policies - written in plain language inside a notebook, alongside the code that computes them. You can verify that a revenue definition still produces the expected output against live data, rather than just trusting that the YAML is correct. When a definition is trusted, it becomes a module - a notebook packaged as a reusable, parameterized component that any other notebook (or agent) can import. Build the churn rate calculation once; every downstream analysis inherits it. Ownership and access policies are inherited from notebook permissions, so the same controls that govern who can see the data govern who (and what) can use the context.
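The module pattern above can be sketched in a few lines. This is a hypothetical illustration, not a real Deepnote API: the `NotebookModule` class, the `metrics/churn_rate` name, and the `churn_rate` function are all invented here to show how a trusted definition becomes a parameterized component that downstream work imports rather than reimplements.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical sketch: NotebookModule and churn_rate are illustrative names,
# not part of any real API.

@dataclass
class NotebookModule:
    """A notebook packaged as a reusable, parameterized component."""
    name: str
    params: dict[str, Any]          # default parameters, set by the owner
    compute: Callable[..., Any]     # the notebook's entry point

    def run(self, **overrides: Any) -> Any:
        # Downstream callers override parameters but inherit the definition.
        merged = {**self.params, **overrides}
        return self.compute(**merged)

def churn_rate(customers_start: int, customers_lost: int) -> float:
    # The business definition lives in exactly one place.
    return customers_lost / customers_start

churn = NotebookModule(
    name="metrics/churn_rate",
    params={"customers_start": 1000, "customers_lost": 50},
    compute=churn_rate,
)

print(churn.run())                    # default parameters -> 0.05
print(churn.run(customers_lost=120))  # downstream override -> 0.12
```

Because the module wraps executable code rather than declarative metadata, the definition can be verified against live data whenever it runs.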
Operational state. Entities live as notebooks - a customer, a pipeline, a model, a process. Activities are captured through version history and audit logs. Environment conditions - machine types, connector configurations, dependency versions - are bound to the notebook, not scattered across infrastructure config files. The context is not abstract metadata; it is the system's working state.
Traceability. Every execution produces a runnable snapshot - outputs, logs, metrics - that anyone can pick up, rerun with different parameters, and build on without starting from scratch. Agent traces surface where context is missing, where definitions are ambiguous, and where agents consistently fail. The corrections flow back in. The result is not a static knowledge base but a living, self-improving system: maintained by humans, consumed by agents, refined by both.
Built for collaboration between humans and agents
Working with data requires input from multiple stakeholders and domain experts. The set of collaborators now includes AI agents - reading context, executing code, calling tools, accessing data sources, and producing outputs that others review, extend, or challenge.
To be truly useful, these agents must run on the best model for the job, connect to the systems where the work actually lives, and operate continuously when needed - not just as one-off assistants, but as always-on collaborators.
Collaboration must happen inside the medium: commenting, discussing, reviewing, and merging changes from any participant, human or machine, with native versioning rather than external mechanisms like git.
This extends beyond the data team to include non-technical users and autonomous agents operating on their behalf.
Low barrier to entry, high ceiling
Notebooks should be progressive: easy to start with, with advanced capabilities discoverable on demand. Code should be optional. A business analyst describes a question in natural language and an agent produces a rigorous, code-backed answer. A developer defines multi-agent systems with custom tool integrations. An ML engineer wires up an eval harness that treats each notebook execution as a test case. The same platform serves all three.
A low barrier democratizes access to data and computation - users self-serve by delegating to agents within a governed environment. A high ceiling means no artificial limits - where spreadsheets break, notebooks scale: as powerful as any IDE, as capable as any agent framework, as rigorous as any evaluation harness.
Production and consumption in one place
Separate interfaces for insight producers and consumers create friction and information loss. Like spreadsheets, where the environment used to build a model is the same one used to explore it, notebooks unify production and consumption. The producers are no longer only human - agents generate analyses, surface anomalies, and assemble reports. They do so inside the same notebook where a business user reviews the output, adjusts parameters, and shares the result. The artifact a developer uses to design an agent workflow is the artifact a stakeholder sees when evaluating its conclusions. Notebooks can be used in production as standalone applications, invoked programmatically via APIs, or composed into multi-agent systems in which notebooks call one another.
Shareable, discoverable, and composable
Notebooks must be easy to search, organize, and discover, preventing duplicated work and surfacing insights already produced by others.
Composability is the structural requirement that makes agentic systems governable. An agent scoring customer health calls another agent that pulls data from Salesforce, which in turn calls an agent that checks strategic importance with marketing. Each agent is a notebook. Notebooks calling notebooks is the natural architecture - composable, auditable, governed by the same permission model. For this to work, the building blocks must be open: open block-type definitions, open format specifications, and open conversion tools so that notebooks move freely between environments and can be extended by anyone.
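The notebooks-calling-notebooks architecture can be sketched as a toy: each agent is a callable notebook, and every hop routes through one shared permission check. The `Notebook` and `Permissions` classes and the `agents/...` names are assumptions made up for this illustration, not a real API.

```python
# Hypothetical sketch of "notebooks calling notebooks". All names here
# (Notebook, Permissions, agents/*) are illustrative, not a real API.

class Permissions:
    def __init__(self, allowed: set[str]):
        self.allowed = allowed

    def check(self, notebook_name: str) -> None:
        if notebook_name not in self.allowed:
            raise PermissionError(f"caller may not invoke {notebook_name}")

class Notebook:
    registry: dict[str, "Notebook"] = {}

    def __init__(self, name, body):
        self.name, self.body = name, body
        Notebook.registry[name] = self

    def call(self, perms: Permissions, **inputs):
        perms.check(self.name)  # the same permission model governs every hop
        return self.body(perms, **inputs)

# Leaf agent: pulls (stubbed) account data.
Notebook("agents/salesforce_pull",
         lambda perms, account: {"account": account, "arr": 120_000})

# Orchestrating agent: scores customer health by calling the leaf notebook.
def score_health(perms, account):
    data = Notebook.registry["agents/salesforce_pull"].call(perms, account=account)
    return "healthy" if data["arr"] > 100_000 else "at-risk"

Notebook("agents/customer_health", score_health)

perms = Permissions({"agents/customer_health", "agents/salesforce_pull"})
result = Notebook.registry["agents/customer_health"].call(perms, account="Acme")
print(result)  # healthy
```

The point of the sketch is the shape, not the stub data: because each call site passes through the same `Permissions` object, revoking access to one notebook cuts off every chain that depends on it.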
This extends to agent swarms: collaborative workflows with different roles, tools, asynchronous handoffs, and partial visibility. Notebooks can represent multi-agent episodes in two patterns: one notebook per agent per episode (linked via shared state and cross-references), or one notebook per episode with multiple agent transcripts on a shared timeline. The narrative layer is the audit surface; the orchestration layer handles scheduling and wiring.
Built for scale
The platform must treat notebook execution as a first-class primitive: parameterized, headless, horizontally scalable, and elastic. Some agentic runs finish in seconds. Others go deep - branching, calling many tools, and consuming far more compute. The runtime should scale up when the work expands and scale down when it does not, without losing reproducibility, observability, or control. It should support scheduled and recurring runs, API- and policy-triggered execution, and large-scale orchestration for workflows, sweeps, and background automations. For enterprises, that means reliable production execution. For LLM training teams, it means millions of short-lived runs, each producing both an executed artifact and a structured trace.
The data plane should provide stable data snapshots, bounded and logged connectors, and, where possible, deterministic replay. Each run should carry its full provenance: data version, parameters, code, model version, connector policy, compute profile, and trigger context. At scale, evaluation becomes longitudinal: diffable notebooks and structured traces show exactly why outcomes changed across models, environments, policies, or infrastructure.
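A minimal sketch of what "each run carries its full provenance" could look like, using the fields listed above. The `RunProvenance` record and `provenance_diff` helper are hypothetical names invented for this illustration; diffing two records surfaces the candidate causes of an outcome change across runs.

```python
from dataclasses import dataclass, asdict

# Hypothetical sketch: field and class names are illustrative, chosen to
# mirror the provenance fields named in the text.

@dataclass(frozen=True)
class RunProvenance:
    data_version: str
    parameters: tuple        # (name, value) pairs, hashable for frozen use
    code_revision: str
    model_version: str
    connector_policy: str
    compute_profile: str
    trigger_context: str

def provenance_diff(a: RunProvenance, b: RunProvenance) -> dict:
    """Return the fields that differ: candidate causes of an outcome change."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}

run1 = RunProvenance("snap-2024-06-01", (("region", "EU"),), "abc123",
                     "model-2024-05", "egress:allowlist", "cpu-small",
                     "schedule:daily")
run2 = RunProvenance("snap-2024-06-01", (("region", "EU"),), "abc123",
                     "model-2024-06", "egress:allowlist", "cpu-small",
                     "schedule:daily")

# Same data, code, policy, and trigger: only the model changed.
print(provenance_diff(run1, run2))
```

With stable snapshots and deterministic replay, a diff like this is what turns longitudinal evaluation from guesswork into attribution.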
Connected to everything
A notebook that cannot reach the data is useless.
On the data side, the platform must connect natively to the warehouses and databases where organizations already live - Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, and the long tail of internal systems behind REST APIs and JDBC connectors. Connections must be managed centrally, with credentials scoped and auditable, not scattered across individual notebooks.
On the model side, the platform must be model-agnostic. Organizations want to choose the best model for the task - or swap models as capabilities shift - without rewriting workflows. The notebook should treat model access as a pluggable resource rather than a hardwired dependency. No single provider should be able to hold agentic infrastructure hostage by controlling the runtime.
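One way to read "model access as a pluggable resource" is a narrow interface that workflows depend on, with providers registered behind it. This is a sketch under assumptions: the `ChatModel` protocol, the provider stubs, and the `MODELS` registry are invented here, not an existing API.

```python
from typing import Protocol

# Hypothetical sketch of model-agnostic access. ChatModel, ProviderA/B,
# and MODELS are illustrative names; the providers are stubs.

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class ProviderA:
    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"

class ProviderB:
    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"

MODELS: dict[str, ChatModel] = {"a": ProviderA(), "b": ProviderB()}

def run_workflow(model_name: str, question: str) -> str:
    # The workflow never imports a provider SDK directly, so swapping
    # models is a configuration change, not a rewrite.
    model = MODELS[model_name]
    return model.complete(question)

print(run_workflow("a", "summarize Q3"))
print(run_workflow("b", "summarize Q3"))
```

Structural typing does the work here: any provider that satisfies the interface plugs in without touching workflow code.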
Sandboxed
Every agent needs its own computer. The same way a developer needs a laptop, an agent needs a sandboxed runtime where it can execute code, install dependencies, call APIs, manipulate files, and iterate when things fail - without any of that leaking into the host environment or affecting other workloads. Untrusted code is the default, because you never know beforehand what an LLM will generate. In a properly sandboxed notebook, if something goes wrong, you kill the sandbox and spin up a new one in milliseconds.
This applies at every layer. At the application layer, user identity must propagate through the entire execution chain - across sub-agents, external systems, and consequential decisions. At the training layer, RL rollouts and evals execute model-generated code at scale - thousands of parallel sandboxes per training step, workloads ranging from five-second snippets to five-hour research sessions. Strong isolation (hardened containers or microVMs), network egress policy, secrets isolation, and explicit action gating are requirements, not options.
The notebook is simultaneously the safety boundary and the execution trace. You can audit what ran because it ran inside a controlled perimeter, and you can trust the perimeter because the full record is inspectable. Our open-source format strengthens this: auditability should not depend on vendor tooling, and any third party should be able to parse, validate, and replay the record independently.
Secured by design
Observability without governance is insufficient. The platform must enforce who can do what and under what conditions.
Role-based access controls must propagate through the entire agent execution chain - not just at the notebook level, but through every sub-agent call, external system interaction, and consequential decision. Policy-as-code guardrails should constrain agent behavior before actions are taken, not just flag violations after the fact. Consequential decisions require explicit approval gates with clear escalation paths.
Enterprise identity providers - SSO, SAML, SCIM - must be first-class integrations so that every action in every trace maps to an authenticated principal. For regulated industries - financial services, healthcare, government - these are not features on a roadmap. They are preconditions for procurement. The notebook's audit surface must align with the compliance frameworks buyers already operate under: SOC 2 Type II, HIPAA, and the sector-specific requirements that govern how decisions about people and money get made.