Field note

Playwright CLI, Skills and Isolated Agentic Testing

Mar 02, 2026 · 33 min read

#AI #Testing #Playwright

The Playwright team has once again demonstrated that they are serious about staying at the forefront of AI. This time, they've released two closely related pieces: Playwright CLI and what we can reasonably call Playwright Skill. One of them was announced quite prominently; the other arrived more quietly — but both are strategically important.

In this post, I'll unpack what this release actually means.

To put it in context, I'll refer back to my earlier article on Agentic Testing, because this release has direct and rather interesting implications for agent-driven testing. In fact, we might consider it yet another — the nth, if you like — meaningful milestone in the evolution of testing in the age of AI, particularly from the tester's perspective.

What These Releases Are — and What They Are Not

Before going any further, it's worth being precise about definitions. Many adjacent topics have already been covered on this blog, so I'll link to relevant articles as we go. But first, let's clearly separate the concepts.

Playwright CLI

Playwright CLI is a command-line tool for controlling a browser directly from the terminal. What makes it fundamentally different from previous tooling is this: it is explicitly designed for AI agents.

This is not a tool built primarily for human developers who type commands by hand. The CLI is intentionally shaped for agent workflows: it exposes a concise, purpose-built command surface that agents can invoke efficiently.

That alone is a sign of the times. We are entering an era where companies are not just building tools for developers — they are building tools for autonomous systems that collaborate with developers. I expect this trend to intensify significantly.

Playwright Skill

The second part of the release is Playwright Skill.

As I explained in my recent post about skills, a skill is essentially a structured procedural playbook. In this case, Playwright Skill describes how an agent should use Playwright CLI. You can think of it as a formalised extension of playwright-cli --help.

The help output itself is already well structured and, from what I've seen, highly usable by agents. But the skill goes further. It organises workflows and advanced scenarios under the paradigm of progressive disclosure.

This means the agent does not load everything into context at once. Instead, it has access to layered procedural guides that it can consult only when needed.

These additional capabilities include:

Running custom Playwright code
Request mocking
Session management
Storage and state management
Test generation
Tracing
Video recording

These are not automatically injected into every interaction. They act as modular extensions — documentation and procedures the agent can load on demand.

That design decision is important. It keeps the agent's working context lean while still providing access to advanced workflows when necessary. This is precisely what progressive disclosure aims to achieve.
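As a rough illustration of progressive disclosure, the sketch below models a skill as an entry file plus reference guides that are read only when a task calls for them. The `SkillLoader` class, the in-memory `files` map, and the character count used as a proxy for context usage are all illustrative assumptions, not how any particular agent loads skills. The file names mirror the references directory listed later in this post, and the file contents are placeholders.

```javascript
// Hypothetical in-memory stand-in for a skill directory (contents are placeholders).
const files = new Map([
  ["SKILL.md", "Core commands and links to reference guides."],
  ["references/request-mocking.md", "Placeholder text for the request mocking guide."],
  ["references/tracing.md", "Placeholder text for the tracing guide."],
]);

class SkillLoader {
  constructor(files) {
    this.files = files;
    this.loaded = new Map(); // path -> content currently "in context"
  }

  // Entry point: only SKILL.md is loaded up front.
  activate() {
    return this.load("SKILL.md");
  }

  // Load a file on demand; repeated requests reuse the cached copy.
  load(path) {
    if (!this.files.has(path)) throw new Error(`Unknown skill file: ${path}`);
    if (!this.loaded.has(path)) this.loaded.set(path, this.files.get(path));
    return this.loaded.get(path);
  }

  // Total characters loaded so far, a crude proxy for context usage.
  contextSize() {
    let total = 0;
    for (const content of this.loaded.values()) total += content.length;
    return total;
  }
}

const loader = new SkillLoader(files);
loader.activate();
const sizeAfterActivate = loader.contextSize();
loader.load("references/tracing.md"); // pulled in only when tracing is needed
const sizeAfterTracing = loader.contextSize();
console.log(sizeAfterActivate, sizeAfterTracing);
```

The request mocking guide never enters the loaded set in this run, which is the point: guides the task does not need cost nothing.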

Important Clarification: This Is Not the Old CLI

It's also important to address a potential source of confusion.

Playwright has had a CLI for years — most notably npx playwright test. That CLI is a test runner designed for humans. It provides a convenient way to execute automated tests.

The new Playwright CLI is something else entirely.

npx playwright test remains a human-facing test runner.
playwright-cli is an agent-oriented browser control interface.

They serve different purposes and operate at different abstraction levels.

Keep this distinction in mind. Without it, it's easy to conflate "Playwright CLI" with "the Playwright test runner," when in reality we are talking about a fundamentally different design direction.

Playwright MCP vs Playwright CLI

For those closely following AI tooling, the release of Playwright CLI might initially seem puzzling. After all, we already have Playwright MCP. Why introduce yet another agent-oriented integration layer?

To answer that, we need to acknowledge something important: the initial hype around MCP as a universal protocol has largely faded. What we are seeing now is a phase of rationalisation — and, in some circles, a more critical examination of its trade-offs.

Several factors contribute to this shift.

MCP Tax

One of the most commonly discussed issues is what practitioners informally call the MCP tax.

When you connect an MCP server such as Playwright MCP, your client (Cursor, Claude Code, Codex, and others) must load the full definition of every tool that server provides into the model's context. In the case of Playwright MCP, this includes tools such as browser_navigate, browser_click, browser_type, browser_evaluate, browser_snapshot, and many more.

Each of these tools comes with a detailed schema: name, description, parameters, constraints, and expected behaviour. In practice, that means fairly large tool definitions occupy part of the model’s context window.

Even if your task only requires one or two simple interactions, the model typically carries the entire tool catalogue in its context window. Since LLM usage is billed in tokens, this translates directly into higher cost. You are effectively paying for structural overhead that may not meaningfully contribute to solving your specific task.

This is the “MCP tax” — a cost that emerges from protocol design rather than business logic.
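To get a feel for the scale of that overhead, here is a back-of-the-envelope sketch. It assumes roughly four characters per token (a common rule of thumb, not an exact tokenizer), ignores prompt caching, and uses shortened placeholder tool definitions. Real MCP schemas are considerably longer, so treat the numbers as a lower bound for illustration only.

```javascript
// Rough heuristic (assumption): about 4 characters per token for English/JSON text.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Placeholder tool definitions; names come from the article, the rest is made up.
const toolDefinitions = [
  { name: "browser_navigate", description: "Navigate to a URL", parameters: { url: "string" } },
  { name: "browser_click", description: "Click an element", parameters: { ref: "string" } },
  { name: "browser_type", description: "Type text into an element", parameters: { ref: "string", text: "string" } },
];

// Estimated tokens occupied by the tool catalogue in a single model call.
function catalogueOverhead(tools) {
  return tools.reduce((sum, tool) => sum + estimateTokens(JSON.stringify(tool)), 0);
}

// Estimated overhead across a session of `turns` model calls (no caching assumed).
function sessionOverhead(tools, turns) {
  return catalogueOverhead(tools) * turns;
}

const perCall = catalogueOverhead(toolDefinitions);
console.log(`~${perCall} tokens per call, ~${sessionOverhead(toolDefinitions, 20)} over 20 calls`);
```

The overhead scales with the number of tools and the number of turns, regardless of how many tools the task actually uses.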

Context Window Pressure and Performance Degradation

Cost is only one part of the equation. The second issue is performance.

MCP-based workflows tend to consume context aggressively for two main reasons. First, tool definitions themselves are verbose by design. Second, Playwright MCP frequently returns substantial accessibility trees or page state snapshots.

When interacting with a complex application — large DOM structures, dynamic components, multi-step UI flows — the context window fills quickly.

As the context window becomes saturated, model behaviour changes. The agent may become less precise, more prone to forgetting earlier instructions, or subtly drift away from the original objective. In longer sessions, particularly those involving more than a dozen meaningful interactions, this degradation becomes noticeable.

Model Optimisation and “Hermetic Environments”

There is also a subtler architectural factor at play.

Modern coding agents are increasingly optimised for their own environments. You can observe this with GPT-5.x Codex inside the Codex CLI or the Codex application. The model performs best within its own tightly integrated ecosystem. The same model may behave differently — sometimes less reliably — when embedded in Cursor or Copilot.

In other words, these systems are becoming hermetically optimised.

When you introduce an external MCP server with dozens of additional tools, schemas, and interaction patterns, you disrupt that carefully tuned environment. You increase the tool surface area, expand the cognitive load, and add abstraction layers the model must reason about.

This does not mean MCP is inherently flawed. It simply means that optimisation matters. Agents today are highly sensitive to their operating context, and expanding that context with protocol-heavy integrations can negatively affect reasoning quality, determinism, and overall reliability.

Why Playwright CLI Fits Better into Modern Agent Workflows

Playwright CLI approaches the problem differently.

It does not introduce a new protocol layer. It does not require injecting tool schemas into the model’s context. It does not rely on JSON-based negotiation between client and server.

Instead, it leverages something agents already understand and already use: the terminal.

All major coding agents can execute CLI commands natively. From the agent’s perspective, using Playwright CLI is simply invoking a command-line binary. There are no additional tool definitions to load, no schema inflation, and no repeated protocol payloads.

This makes Playwright CLI naturally aligned with existing agent capabilities. It integrates into workflows that are already optimised and battle-tested inside modern coding agents.

Playwright CLI Deep Dive

Playwright CLI is, at its core, a thin execution layer on top of the Playwright engine. It does not introduce a new automation model. It does not replace native Playwright. It simply exposes the same browser capabilities through a terminal-first interface.

Under the hood, Playwright CLI uses the same browser drivers, the same page and browserContext abstractions, and the same APIs that power @playwright/test. If you can do it in Playwright, you can almost certainly do it in Playwright CLI. Navigation. Interaction. Network interception. Storage management. Tracing. Video recording. Even executing arbitrary Playwright code. It is all there.

Playwright CLI runs as a Node-based binary (@playwright/cli). When you execute a command, it connects to a browser instance, performs the action, and writes structured output to stdout. At the same time, it stores richer artefacts — snapshots, traces, screenshots, session metadata — inside a local .playwright-cli/ directory.

That design choice is deliberate.

Instead of streaming full DOM trees or accessibility representations back into a model or client, Playwright CLI externalises state. The heavy data lives on disk. The terminal output remains compact. Each command prints a short, readable summary of the current page state and, crucially, stable element references such as e21 or e35.
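A small sketch of what "compact output, heavy data on disk" means for a consumer: the function below pulls the page URL, title, and snapshot file path out of a command's output. The sample text is copied from the Snapshots example in the skill file reproduced further down. The `parseCliOutput` helper and its regular expressions are assumptions based on that one sample, and real output may contain more sections.

```javascript
// Sample output copied from the skill's "Snapshots" example; real output may differ.
const sampleOutput = [
  "### Page",
  "- Page URL: https://example.com/",
  "- Page Title: Example Domain",
  "### Snapshot",
  "[Snapshot](.playwright-cli/page-2026-02-14T19-22-42-679Z.yml)",
].join("\n");

// Hypothetical helper: extract the compact summary an agent would keep in context.
function parseCliOutput(output) {
  const url = output.match(/^- Page URL: (.+)$/m);
  const title = output.match(/^- Page Title: (.+)$/m);
  const snapshot = output.match(/^\[Snapshot\]\((.+)\)$/m);
  return {
    url: url ? url[1] : null,
    title: title ? title[1] : null,
    snapshotPath: snapshot ? snapshot[1] : null, // file on disk, read only if needed
  };
}

console.log(parseCliOutput(sampleOutput));
```

Only the path to the snapshot travels through the terminal. The snapshot itself stays in the file until something chooses to read it.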

Technically, a session is just a running browser context managed by the CLI process. By default, it is in-memory and headless (unlike Playwright MCP, which is headed). You can make it persistent. You can isolate it with named sessions. You can run multiple sessions in parallel. The CLI handles lifecycle management, storage, and clean-up.

Because it is a CLI tool, it integrates naturally into existing developer environments. It can be scripted. It can be combined with other shell utilities. It can run inside CI containers. It does not require an additional protocol server. It does not require a JSON schema handshake. It behaves like any other Unix-style tool.
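Because every interaction is just a command with arguments, scripting around the CLI mostly comes down to building argument lists. The sketch below follows the `playwright-cli -s=<session> <command> [args]` usage line from the help output. The `buildArgs` helper itself is hypothetical, and nothing is executed here.

```javascript
// Hypothetical helper: build an argument list for `playwright-cli`,
// optionally scoped to a named session via the -s=<session> flag.
function buildArgs(command, args = [], session) {
  const argv = [];
  if (session) argv.push(`-s=${session}`);
  argv.push(command, ...args);
  return argv;
}

// Default session: playwright-cli click e15
const clickArgs = buildArgs("click", ["e15"]);

// Named session: playwright-cli -s=mysession open example.com --persistent
const openArgs = buildArgs("open", ["example.com", "--persistent"], "mysession");

console.log(clickArgs.join(" "));
console.log(openArgs.join(" "));
```

An array like this can be handed to a process-spawning API such as Node's `child_process.execFile`, which passes arguments without going through a shell and so avoids quoting problems.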

If you install it and run playwright-cli --help, you will see a broad command surface. Core interaction commands. Navigation. Keyboard and mouse simulation. Network routing. Storage control. Tracing. Video. Tab management. Everything you would expect from a modern browser automation toolkit — but flattened into a composable command set.

🔧 Complete Playwright CLI Command Reference

playwright-cli - run playwright mcp commands from terminal

Usage: playwright-cli <command> [args] [options]
Usage: playwright-cli -s=<session> <command> [args] [options]

Core:
  open [url]                  open the browser
  close                       close the browser
  goto <url>                  navigate to a url
  type <text>                 type text into editable element
  click <ref> [button]        perform click on a web page
  dblclick <ref> [button]     perform double click on a web page
  fill <ref> <text>           fill text into editable element
  drag <startRef> <endRef>    perform drag and drop between two elements
  hover <ref>                 hover over element on page
  select <ref> <val>          select an option in a dropdown
  upload <file>               upload one or multiple files
  check <ref>                 check a checkbox or radio button
  uncheck <ref>               uncheck a checkbox or radio button
  snapshot                    capture page snapshot to obtain element ref
  eval <func> [ref]           evaluate javascript expression on page or element
  dialog-accept [prompt]      accept a dialog
  dialog-dismiss              dismiss a dialog
  resize <w> <h>              resize the browser window
  delete-data                 delete session data

Navigation:
  go-back                     go back to the previous page
  go-forward                  go forward to the next page
  reload                      reload the current page

Keyboard:
  press <key>                 press a key on the keyboard, `a`, `arrowleft`
  keydown <key>               press a key down on the keyboard
  keyup <key>                 press a key up on the keyboard

Mouse:
  mousemove <x> <y>           move mouse to a given position
  mousedown [button]          press mouse down
  mouseup [button]            press mouse up
  mousewheel <dx> <dy>        scroll mouse wheel

Save as:
  screenshot [ref]            screenshot of the current page or element
  pdf                         save page as pdf

Tabs:
  tab-list                    list all tabs
  tab-new [url]               create a new tab
  tab-close [index]           close a browser tab
  tab-select <index>          select a browser tab

Storage:
  state-load <filename>       loads browser storage (authentication) state from a file
  state-save [filename]       saves the current storage (authentication) state to a file
  cookie-list                 list all cookies (optionally filtered by domain/path)
  cookie-get <name>           get a specific cookie by name
  cookie-set <name> <value>   set a cookie with optional flags
  cookie-delete <name>        delete a specific cookie
  cookie-clear                clear all cookies
  localstorage-list           list all localstorage key-value pairs
  localstorage-get <key>      get a localstorage item by key
  localstorage-set <key> <value> set a localstorage item
  localstorage-delete <key>   delete a localstorage item
  localstorage-clear          clear all localstorage
  sessionstorage-list         list all sessionstorage key-value pairs
  sessionstorage-get <key>    get a sessionstorage item by key
  sessionstorage-set <key> <value> set a sessionstorage item
  sessionstorage-delete <key> delete a sessionstorage item
  sessionstorage-clear        clear all sessionstorage

Network:
  route <pattern>             mock network requests matching a url pattern
  route-list                  list all active network routes
  unroute [pattern]           remove routes matching a pattern (or all routes)

DevTools:
  console [min-level]         list console messages
  run-code <code>             run playwright code snippet
  network                     list all network requests since loading the page
  tracing-start               start trace recording
  tracing-stop                stop trace recording
  video-start                 start video recording
  video-stop                  stop video recording
  show                        show browser devtools
  devtools-start              show browser devtools

Install:
  install                     initialize workspace
  install-browser             install browser

Browser sessions:
  list                        list browser sessions
  close-all                   close all browser sessions
  kill-all                    forcefully kill all browser sessions (for stale/zombie processes)

Global options:
  --help [command]            print help
  --version                   print version

Playwright Skill Deep Dive

If Playwright CLI is the execution engine, Playwright Skill is the operating manual — written specifically for agents.

When you run:

playwright-cli install --skills

the CLI scaffolds a .claude/skills/playwright-cli/ directory inside your workspace. This is not accidental. It is a deliberate integration with the Agent Skills standard and, more concretely, with Claude Code's skill-loading mechanism.

The structure looks like this:

.claude/skills/playwright-cli/
├── SKILL.md
└── references/
    ├── request-mocking.md
    ├── running-code.md
    ├── session-management.md
    ├── storage-state.md
    ├── test-generation.md
    ├── tracing.md
    └── video-recording.md

At the root sits SKILL.md. This file is the entry point. It is the agent’s primary procedural blueprint for using Playwright CLI correctly.

If you have read my earlier article on AI Testing Skills, you already know the core idea: a skill is not just documentation. It is structured, machine-consumable procedural memory. It defines when it should activate, what tools it may use, and how workflows should be executed.

The front matter in SKILL.md makes that explicit:

name: playwright-cli
description: >
  Automates browser interactions for web testing, form filling,
  screenshots, and data extraction. Use when the user needs to
  navigate websites, interact with web pages, fill forms, take
  screenshots, test web applications, or extract information
  from web pages.
allowed-tools: Bash(playwright-cli:*)

Two things are worth noticing here.

First, the description field defines the activation surface. This is what allows the agent to decide when the skill is relevant. It acts as a semantic trigger.

Second, the allowed-tools field explicitly restricts execution to playwright-cli commands via Bash. This is a subtle but important security boundary. The skill does not grant arbitrary shell execution. It narrows the execution scope to a specific command namespace.
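To make the idea of a narrowed command namespace concrete, here is a minimal sketch of prefix scoping for a rule shaped like `Bash(playwright-cli:*)`. It is an illustration only, not Claude Code's actual matching logic: the `parseRule` and `isAllowed` helpers are hypothetical, and the blanket rejection of shell metacharacters is a deliberately conservative simplification.

```javascript
// Hypothetical: extract the command prefix from a rule of the form "Bash(<prefix>:*)".
function parseRule(rule) {
  const match = rule.match(/^Bash\((.+):\*\)$/);
  if (!match) throw new Error(`Unsupported rule: ${rule}`);
  return match[1];
}

// Hypothetical: allow a command only if it invokes the permitted binary
// and contains nothing that could chain or substitute another command.
function isAllowed(rule, command) {
  const prefix = parseRule(rule);
  const trimmed = command.trim();
  // Conservative simplification: reject any shell metacharacter, even inside quotes.
  if (/[;&|`$<>\n]/.test(trimmed)) return false;
  return trimmed === prefix || trimmed.startsWith(prefix + " ");
}

const rule = "Bash(playwright-cli:*)";
console.log(isAllowed(rule, "playwright-cli click e15"));           // inside the namespace
console.log(isAllowed(rule, "rm -rf ./build"));                     // different binary
console.log(isAllowed(rule, "playwright-cli open && rm -rf ./x"));  // chained command
```

A real implementation would need proper shell parsing: this version would also reject legitimate quoted arguments such as an `eval` expression containing `=>`.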

📘 SKILL.md — Full Content

Quick Start

# open new browser
playwright-cli open
# navigate to a page
playwright-cli goto https://playwright.dev
# interact with the page using refs from the snapshot
playwright-cli click e15
playwright-cli type "page.click"
playwright-cli press Enter
# take a screenshot (rarely used, as snapshot is more common)
playwright-cli screenshot
# close the browser
playwright-cli close

Commands — Core

playwright-cli open
playwright-cli open https://example.com/
playwright-cli goto https://playwright.dev
playwright-cli type "search query"
playwright-cli click e3
playwright-cli dblclick e7
playwright-cli fill e5 "user@example.com"
playwright-cli drag e2 e8
playwright-cli hover e4
playwright-cli select e9 "option-value"
playwright-cli upload ./document.pdf
playwright-cli check e12
playwright-cli uncheck e12
playwright-cli snapshot
playwright-cli snapshot --filename=after-click.yaml
playwright-cli eval "document.title"
playwright-cli eval "el => el.textContent" e5
playwright-cli dialog-accept
playwright-cli dialog-accept "confirmation text"
playwright-cli dialog-dismiss
playwright-cli resize 1920 1080
playwright-cli close

Commands — Navigation

playwright-cli go-back
playwright-cli go-forward
playwright-cli reload

Commands — Keyboard

playwright-cli press Enter
playwright-cli press ArrowDown
playwright-cli keydown Shift
playwright-cli keyup Shift

Commands — Mouse

playwright-cli mousemove 150 300
playwright-cli mousedown
playwright-cli mousedown right
playwright-cli mouseup
playwright-cli mouseup right
playwright-cli mousewheel 0 100

Commands — Save As

playwright-cli screenshot
playwright-cli screenshot e5
playwright-cli screenshot --filename=page.png
playwright-cli pdf --filename=page.pdf

Commands — Tabs

playwright-cli tab-list
playwright-cli tab-new
playwright-cli tab-new https://example.com/page
playwright-cli tab-close
playwright-cli tab-close 2
playwright-cli tab-select 0

Commands — Storage

playwright-cli state-save
playwright-cli state-save auth.json
playwright-cli state-load auth.json

# Cookies
playwright-cli cookie-list
playwright-cli cookie-list --domain=example.com
playwright-cli cookie-get session_id
playwright-cli cookie-set session_id abc123
playwright-cli cookie-set session_id abc123 --domain=example.com --httpOnly --secure
playwright-cli cookie-delete session_id
playwright-cli cookie-clear

# LocalStorage
playwright-cli localstorage-list
playwright-cli localstorage-get theme
playwright-cli localstorage-set theme dark
playwright-cli localstorage-delete theme
playwright-cli localstorage-clear

# SessionStorage
playwright-cli sessionstorage-list
playwright-cli sessionstorage-get step
playwright-cli sessionstorage-set step 3
playwright-cli sessionstorage-delete step
playwright-cli sessionstorage-clear

Commands — Network

playwright-cli route "**/*.jpg" --status=404
playwright-cli route "https://api.example.com/**" --body='{"mock": true}'
playwright-cli route-list
playwright-cli unroute "**/*.jpg"
playwright-cli unroute

Commands — DevTools

playwright-cli console
playwright-cli console warning
playwright-cli network
playwright-cli run-code "async page => await page.context().grantPermissions(['geolocation'])"
playwright-cli tracing-start
playwright-cli tracing-stop
playwright-cli video-start
playwright-cli video-stop video.webm

Open Parameters

# Use specific browser when creating session
playwright-cli open --browser=chrome
playwright-cli open --browser=firefox
playwright-cli open --browser=webkit
playwright-cli open --browser=msedge
# Connect to browser via extension
playwright-cli open --extension

# Use persistent profile (by default profile is in-memory)
playwright-cli open --persistent
# Use persistent profile with custom directory
playwright-cli open --profile=/path/to/profile

# Start with config file
playwright-cli open --config=my-config.json

# Close the browser
playwright-cli close
# Delete user data for the default session
playwright-cli delete-data

Snapshots

After each command, playwright-cli provides a snapshot of the current browser state:

> playwright-cli goto https://example.com
### Page
- Page URL: https://example.com/
- Page Title: Example Domain
### Snapshot
[Snapshot](.playwright-cli/page-2026-02-14T19-22-42-679Z.yml)

You can also take a snapshot on demand using playwright-cli snapshot. If --filename is not provided, a new snapshot file is created with a timestamp. Default to automatic file naming; use --filename= when the artifact is part of the workflow result.

Browser Sessions

# create new browser session named "mysession" with persistent profile
playwright-cli -s=mysession open example.com --persistent
# same with manually specified profile directory
playwright-cli -s=mysession open example.com --profile=/path/to/profile
playwright-cli -s=mysession click e6
playwright-cli -s=mysession close
playwright-cli -s=mysession delete-data

playwright-cli list
playwright-cli close-all
playwright-cli kill-all

Local Installation

If running the globally available playwright-cli binary fails, use npx playwright-cli:

npx playwright-cli open https://example.com
npx playwright-cli click e1

Example: Form Submission

playwright-cli open https://example.com/form
playwright-cli snapshot

playwright-cli fill e1 "user@example.com"
playwright-cli fill e2 "password123"
playwright-cli click e3
playwright-cli snapshot
playwright-cli close

Example: Multi-Tab Workflow

playwright-cli open https://example.com
playwright-cli tab-new https://example.com/other
playwright-cli tab-list
playwright-cli tab-select 0
playwright-cli snapshot
playwright-cli close

Example: Debugging with DevTools

playwright-cli open https://example.com
playwright-cli click e4
playwright-cli fill e7 "test"
playwright-cli console
playwright-cli network
playwright-cli close

playwright-cli open https://example.com
playwright-cli tracing-start
playwright-cli click e4
playwright-cli fill e7 "test"
playwright-cli tracing-stop
playwright-cli close

Specific Tasks (Reference Guides)

Request mocking — references/request-mocking.md
Running Playwright code — references/running-code.md
Browser session management — references/session-management.md
Storage state (cookies, localStorage) — references/storage-state.md
Test generation — references/test-generation.md
Tracing — references/tracing.md
Video recording — references/video-recording.md

Each reference file dives deep into a single capability. Below is the full content of every reference, presented in the same order as SKILL.md links them.

Request Mocking

Intercept, mock, modify, and block network requests.

CLI Route Commands

# Mock with custom status
playwright-cli route "**/*.jpg" --status=404

# Mock with JSON body
playwright-cli route "**/api/users" --body='[{"id":1,"name":"Alice"}]' --content-type=application/json

# Mock with custom headers
playwright-cli route "**/api/data" --body='{"ok":true}' --header="X-Custom: value"

# Remove headers from requests
playwright-cli route "**/*" --remove-header=cookie,authorization

# List active routes
playwright-cli route-list

# Remove a route or all routes
playwright-cli unroute "**/*.jpg"
playwright-cli unroute

URL Patterns

**/api/users           - Exact path match
**/api/*/details       - Wildcard in path
**/*.{png,jpg,jpeg}    - Match file extensions
**/search?q=*          - Match query parameters
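An editorial aside, not part of the reference file: the sketch below translates the four patterns above into regular expressions to show which URLs they would match. It is a simplified approximation under stated assumptions, not Playwright's own matcher: `**` matches any characters including `/`, `*` matches any characters except `/`, `{a,b}` lists alternatives, and `?` is treated as a literal question mark. The `globToRegExp` helper is hypothetical.

```javascript
// Hypothetical, simplified glob translation (see assumptions in the text above).
function globToRegExp(glob) {
  let source = "";
  let inGroup = false;
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        source += ".*"; // ** crosses path separators
        i++;
      } else {
        source += "[^/]*"; // * stays within one path segment
      }
    } else if (ch === "{") {
      inGroup = true;
      source += "(?:";
    } else if (ch === "}" && inGroup) {
      inGroup = false;
      source += ")";
    } else if (ch === "," && inGroup) {
      source += "|";
    } else {
      // Everything else is literal; escape regex metacharacters.
      source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + source + "$");
}

const checks = [
  ["**/api/users", "https://example.com/api/users"],
  ["**/api/*/details", "https://example.com/api/42/details"],
  ["**/*.{png,jpg,jpeg}", "https://example.com/img/logo.png"],
  ["**/search?q=*", "https://example.com/search?q=cats"],
];
for (const [pattern, url] of checks) {
  console.log(pattern, url, globToRegExp(pattern).test(url));
}
```

Under these assumptions, `**/api/*/details` does not match a URL with two segments between `api` and `details`, because a single `*` stops at the next `/`.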

Advanced Mocking with run-code

Conditional response based on request:

playwright-cli run-code "async page => {
  await page.route('**/api/login', route => {
    const body = route.request().postDataJSON();
    if (body.username === 'admin') {
      route.fulfill({ body: JSON.stringify({ token: 'mock-token' }) });
    } else {
      route.fulfill({ status: 401, body: JSON.stringify({ error: 'Invalid' }) });
    }
  });
}"

Modify real response:

playwright-cli run-code "async page => {
  await page.route('**/api/user', async route => {
    const response = await route.fetch();
    const json = await response.json();
    json.isPremium = true;
    await route.fulfill({ response, json });
  });
}"

Simulate network failures:

playwright-cli run-code "async page => {
  await page.route('**/api/offline', route => route.abort('internetdisconnected'));
}"
# Options: connectionrefused, timedout, connectionreset, internetdisconnected

Delayed response:

playwright-cli run-code "async page => {
  await page.route('**/api/slow', async route => {
    await new Promise(r => setTimeout(r, 3000));
    route.fulfill({ body: JSON.stringify({ data: 'loaded' }) });
  });
}"

Running Custom Playwright Code

Use run-code to execute arbitrary Playwright code for advanced scenarios not covered by CLI commands.

Syntax

playwright-cli run-code "async page => {
  // Your Playwright code here
  // Access page.context() for browser context operations
}"

Geolocation

# Grant geolocation permission and set location
playwright-cli run-code "async page => {
  await page.context().grantPermissions(['geolocation']);
  await page.context().setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
}"

# Set location to London
playwright-cli run-code "async page => {
  await page.context().grantPermissions(['geolocation']);
  await page.context().setGeolocation({ latitude: 51.5074, longitude: -0.1278 });
}"

# Clear geolocation override
playwright-cli run-code "async page => {
  await page.context().clearPermissions();
}"

Permissions

# Grant multiple permissions
playwright-cli run-code "async page => {
  await page.context().grantPermissions([
    'geolocation', 'notifications', 'camera', 'microphone'
  ]);
}"

# Grant permissions for specific origin
playwright-cli run-code "async page => {
  await page.context().grantPermissions(['clipboard-read'], {
    origin: 'https://example.com'
  });
}"

Media Emulation

# Emulate dark color scheme
playwright-cli run-code "async page => {
  await page.emulateMedia({ colorScheme: 'dark' });
}"

# Emulate reduced motion
playwright-cli run-code "async page => {
  await page.emulateMedia({ reducedMotion: 'reduce' });
}"

# Emulate print media
playwright-cli run-code "async page => {
  await page.emulateMedia({ media: 'print' });
}"

Wait Strategies

# Wait for network idle
playwright-cli run-code "async page => {
  await page.waitForLoadState('networkidle');
}"

# Wait for specific element
playwright-cli run-code "async page => {
  await page.waitForSelector('.loading', { state: 'hidden' });
}"

# Wait for function to return true
playwright-cli run-code "async page => {
  await page.waitForFunction(() => window.appReady === true);
}"

# Wait with timeout
playwright-cli run-code "async page => {
  await page.waitForSelector('.result', { timeout: 10000 });
}"

Frames and Iframes

# Work with iframe
playwright-cli run-code "async page => {
  const frame = page.locator('iframe#my-iframe').contentFrame();
  await frame.locator('button').click();
}"

# Get all frames
playwright-cli run-code "async page => {
  const frames = page.frames();
  return frames.map(f => f.url());
}"

File Downloads

playwright-cli run-code "async page => {
  const [download] = await Promise.all([
    page.waitForEvent('download'),
    page.click('a.download-link')
  ]);
  await download.saveAs('./downloaded-file.pdf');
  return download.suggestedFilename();
}"

Clipboard

# Read clipboard (requires permission)
playwright-cli run-code "async page => {
  await page.context().grantPermissions(['clipboard-read']);
  return await page.evaluate(() => navigator.clipboard.readText());
}"

# Write to clipboard
playwright-cli run-code "async page => {
  await page.evaluate(text => navigator.clipboard.writeText(text), 'Hello clipboard!');
}"

Page Information

# Get page title
playwright-cli run-code "async page => { return await page.title(); }"

# Get current URL
playwright-cli run-code "async page => { return page.url(); }"

# Get page content
playwright-cli run-code "async page => { return await page.content(); }"

# Get viewport size
playwright-cli run-code "async page => { return page.viewportSize(); }"

JavaScript Execution

playwright-cli run-code "async page => {
  return await page.evaluate(() => {
    return {
      userAgent: navigator.userAgent,
      language: navigator.language,
      cookiesEnabled: navigator.cookieEnabled
    };
  });
}"

Error Handling

playwright-cli run-code "async page => {
  try {
    await page.click('.maybe-missing', { timeout: 1000 });
    return 'clicked';
  } catch (e) {
    return 'element not found';
  }
}"

Complex Workflows

# Login and save state
playwright-cli run-code "async page => {
  await page.goto('https://example.com/login');
  await page.fill('input[name=email]', 'user@example.com');
  await page.fill('input[name=password]', 'secret');
  await page.click('button[type=submit]');
  await page.waitForURL('**/dashboard');
  await page.context().storageState({ path: 'auth.json' });
  return 'Login successful';
}"

# Scrape data from multiple pages
playwright-cli run-code "async page => {
  const results = [];
  for (let i = 1; i <= 3; i++) {
    await page.goto('https://example.com/page/' + i);
    const items = await page.locator('.item').allTextContents();
    results.push(...items);
  }
  return results;
}"

Browser Session Management

Run multiple isolated browser sessions concurrently with state persistence.

Named Browser Sessions

Use the -s flag to isolate browser contexts:

# Browser 1: Authentication flow
playwright-cli -s=auth open https://app.example.com/login

# Browser 2: Public browsing (separate cookies, storage)
playwright-cli -s=public open https://example.com

# Commands are isolated by browser session
playwright-cli -s=auth fill e1 "user@example.com"
playwright-cli -s=public snapshot

Session Isolation Properties

Each browser session has independent: cookies, localStorage/sessionStorage, IndexedDB, cache, browsing history, and open tabs.

Session Commands

playwright-cli list                     # list all browser sessions
playwright-cli close                    # stop the default browser
playwright-cli -s=mysession close       # stop a named browser
playwright-cli close-all                # stop all browser sessions
playwright-cli kill-all                 # forcefully kill stale/zombie processes
playwright-cli delete-data              # delete default browser data
playwright-cli -s=mysession delete-data # delete named browser data

Environment Variable

export PLAYWRIGHT_CLI_SESSION="mysession"
playwright-cli open example.com  # uses "mysession" automatically

Pattern: Concurrent Scraping

#!/bin/bash
playwright-cli -s=site1 open https://site1.com &
playwright-cli -s=site2 open https://site2.com &
playwright-cli -s=site3 open https://site3.com &
wait

playwright-cli -s=site1 snapshot
playwright-cli -s=site2 snapshot
playwright-cli -s=site3 snapshot

playwright-cli close-all

Pattern: A/B Testing Sessions

playwright-cli -s=variant-a open "https://app.com?variant=a"
playwright-cli -s=variant-b open "https://app.com?variant=b"

playwright-cli -s=variant-a screenshot
playwright-cli -s=variant-b screenshot

Persistent Profile

# Use persistent profile (auto-generated location)
playwright-cli open https://example.com --persistent

# Use persistent profile with custom directory
playwright-cli open https://example.com --profile=/path/to/profile

Session Configuration

playwright-cli open https://example.com --config=.playwright/my-cli.json
playwright-cli open https://example.com --browser=firefox
playwright-cli open https://example.com --headed
playwright-cli open https://example.com --persistent

Best Practices

Name sessions semantically — use -s=github-auth, not -s=s1
Always clean up — close individual sessions or use close-all; use kill-all for zombie processes
Delete stale data — run delete-data to free disk space from old persistent sessions

Storage Management

Manage cookies, localStorage, sessionStorage, and browser storage state.

Save & Restore Storage State

# Save to auto-generated filename
playwright-cli state-save

# Save to specific filename
playwright-cli state-save my-auth-state.json

# Load storage state from file
playwright-cli state-load my-auth-state.json
playwright-cli open https://example.com

The saved file contains cookies (name, value, domain, path, expires, httpOnly, secure, sameSite) and origins with their localStorage entries.
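Because the saved state is plain JSON, it can be inspected before it is reused. The following is a minimal sketch, assuming the file follows Playwright's storageState shape (cookie expires given as Unix seconds, with -1 marking a session cookie). The names StoredCookie, StorageStateFile, expiredCookies and the sample data are hypothetical and only serve the illustration.

```typescript
// Shape assumed from Playwright's storageState format; that the CLI's saved
// file matches it exactly is an assumption, not something verified here.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

interface StorageStateFile {
  cookies: StoredCookie[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Returns the names of cookies that have already expired at `nowSeconds`.
// Session cookies (expires === -1) are never reported as expired.
function expiredCookies(state: StorageStateFile, nowSeconds: number): string[] {
  return state.cookies
    .filter((cookie) => cookie.expires !== -1 && cookie.expires <= nowSeconds)
    .map((cookie) => cookie.name);
}

// Hypothetical sample, standing in for the parsed contents of a saved state file.
const sampleState: StorageStateFile = {
  cookies: [
    { name: "session_id", value: "abc", domain: "example.com", path: "/", expires: 1000, httpOnly: true, secure: true, sameSite: "Lax" },
    { name: "theme", value: "dark", domain: "example.com", path: "/", expires: -1, httpOnly: false, secure: false, sameSite: "Lax" },
  ],
  origins: [{ origin: "https://example.com", localStorage: [{ name: "token", value: "t" }] }],
};

console.log(expiredCookies(sampleState, 2000));
```

A check like this is a cheap guard before state-load: if the session cookie has expired, restoring the file will not skip the login step.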

Cookies

playwright-cli cookie-list
playwright-cli cookie-list --domain=example.com
playwright-cli cookie-list --path=/api
playwright-cli cookie-get session_id
playwright-cli cookie-set session abc123
playwright-cli cookie-set session abc123 --domain=example.com --path=/ --httpOnly --secure --sameSite=Lax
playwright-cli cookie-set remember_me token123 --expires=1893456000  # Unix seconds (2030-01-01 UTC); a past timestamp yields an already-expired cookie
playwright-cli cookie-delete session_id
playwright-cli cookie-clear

Advanced — multiple cookies at once:

playwright-cli run-code "async page => {
  await page.context().addCookies([
    { name: 'session_id', value: 'sess_abc123', domain: 'example.com', path: '/', httpOnly: true },
    { name: 'preferences', value: JSON.stringify({ theme: 'dark' }), domain: 'example.com', path: '/' }
  ]);
}"

Local Storage

playwright-cli localstorage-list
playwright-cli localstorage-get token
playwright-cli localstorage-set theme dark
playwright-cli localstorage-set user_settings '{"theme":"dark","language":"en"}'
playwright-cli localstorage-delete token
playwright-cli localstorage-clear

Session Storage

playwright-cli sessionstorage-list
playwright-cli sessionstorage-get form_data
playwright-cli sessionstorage-set step 3
playwright-cli sessionstorage-delete step
playwright-cli sessionstorage-clear

IndexedDB (via run-code)

# List databases
playwright-cli run-code "async page => {
  return await page.evaluate(async () => {
    const databases = await indexedDB.databases();
    return databases;
  });
}"

# Delete database
playwright-cli run-code "async page => {
  await page.evaluate(() => { indexedDB.deleteDatabase('myDatabase'); });
}"

Pattern: Authentication State Reuse

# Step 1: Login and save state
playwright-cli open https://app.example.com/login
playwright-cli snapshot
playwright-cli fill e1 "user@example.com"
playwright-cli fill e2 "password123"
playwright-cli click e3
playwright-cli state-save auth.json

# Step 2: Later, restore state and skip login
playwright-cli state-load auth.json
playwright-cli open https://app.example.com/dashboard
# Already logged in!

Security Notes

Never commit storage state files containing auth tokens
Add your state files to .gitignore (for example auth.json and *auth-state.json, so that names like my-auth-state.json are covered)
Delete state files after automation completes
Use environment variables for sensitive data
Sessions run in in-memory mode by default, which is safer for sensitive operations

Test Generation

Generate Playwright test code automatically as you interact with the browser. Every action you perform with playwright-cli generates corresponding Playwright TypeScript code that appears in the output and can be copied directly into your test files.

Example Workflow

# Start a session
playwright-cli open https://example.com/login

# Take a snapshot to see elements
playwright-cli snapshot
# Output: e1 [textbox "Email"], e2 [textbox "Password"], e3 [button "Sign In"]

# Fill form fields — generates code automatically
playwright-cli fill e1 "user@example.com"
# Ran Playwright code:
# await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');

playwright-cli fill e2 "password123"
# Ran Playwright code:
# await page.getByRole('textbox', { name: 'Password' }).fill('password123');

playwright-cli click e3
# Ran Playwright code:
# await page.getByRole('button', { name: 'Sign In' }).click();

Building a Test File

Collect the generated code into a Playwright test:

import { test, expect } from '@playwright/test';

test('login flow', async ({ page }) => {
  await page.goto('https://example.com/login');
  await page.getByRole('textbox', { name: 'Email' }).fill('user@example.com');
  await page.getByRole('textbox', { name: 'Password' }).fill('password123');
  await page.getByRole('button', { name: 'Sign In' }).click();

  await expect(page).toHaveURL(/.*dashboard/);
});

Best Practices

Use semantic locators — the generated code uses role-based locators (getByRole) when possible, which are more resilient than CSS selectors
Explore before recording — take snapshots to understand page structure before recording actions
Add assertions manually — generated code captures actions but not assertions; add expect() calls in your test

Tracing

Capture detailed execution traces for debugging and analysis. Traces include DOM snapshots, screenshots, network activity, and console logs.

Basic Usage

playwright-cli tracing-start

playwright-cli open https://example.com
playwright-cli click e1
playwright-cli fill e2 "test"

playwright-cli tracing-stop

Trace Output Files

trace-{timestamp}.trace — action log with every action performed, DOM snapshots before/after, screenshots at each step, timing info, console messages, and source locations
trace-{timestamp}.network — complete network activity including all HTTP requests/responses, headers, bodies, timing (DNS, connect, TLS, TTFB, download), resource sizes, and errors
resources/ — cached resources (images, fonts, stylesheets, scripts) needed to reconstruct page state

What Traces Capture

Category	Details
Actions	Clicks, fills, hovers, keyboard input, navigations
DOM	Full DOM snapshot before/after each action
Screenshots	Visual state at each step
Network	All requests, responses, headers, bodies, timing
Console	All console.log, warn, error messages
Timing	Precise timing for each operation

Use Case: Debugging Failed Actions

playwright-cli tracing-start
playwright-cli open https://app.example.com
playwright-cli click e5    # this click fails — why?
playwright-cli tracing-stop
# Open trace to see DOM state when click was attempted

Use Case: Analyzing Performance

playwright-cli tracing-start
playwright-cli open https://slow-site.com
playwright-cli tracing-stop
# View network waterfall to identify slow resources

Use Case: Capturing Evidence

playwright-cli tracing-start
playwright-cli open https://app.example.com/checkout
playwright-cli fill e1 "4111111111111111"
playwright-cli fill e2 "12/25"
playwright-cli fill e3 "123"
playwright-cli click e4
playwright-cli tracing-stop

Trace vs Video vs Screenshot

Feature	Trace	Video	Screenshot
Format	.trace file	.webm video	.png/.jpeg image
DOM inspection	Yes	No	No
Network details	Yes	No	No
Step-by-step replay	Yes	Continuous	Single frame
File size	Medium	Large	Small
Best for	Debugging	Demos	Quick capture

Best Practices

Start tracing before the problem — trace the entire flow, not just the failing step
Clean up old traces — traces can consume significant disk space; remove traces older than 7 days

Limitations

Traces add overhead to automation
Large traces can consume significant disk space
Some dynamic content may not replay perfectly

Video Recording

Capture browser automation sessions as video for debugging, documentation, or verification. Produces WebM (VP8/VP9 codec).

Basic Recording

# Start recording
playwright-cli video-start

# Perform actions
playwright-cli open https://example.com
playwright-cli snapshot
playwright-cli click e1
playwright-cli fill e2 "test input"

# Stop and save
playwright-cli video-stop demo.webm

Best Practices

Use descriptive filenames that include context:

playwright-cli video-stop recordings/login-flow-2024-01-15.webm
playwright-cli video-stop recordings/checkout-test-run-42.webm

Tracing vs Video

Feature	Video	Tracing
Output	WebM file	Trace file (viewable in Trace Viewer)
Shows	Visual recording	DOM snapshots, network, console, actions
Use case	Demos, documentation	Debugging, analysis
Size	Larger	Smaller

Limitations

Recording adds slight overhead to automation
Large recordings can consume significant disk space

Each reference file focuses on one capability in depth. That separation is intentional. It keeps the main skill concise while allowing the agent to load advanced procedures only when necessary.

In architectural terms, the skill acts as a layered playbook. Level one is discovery (metadata). Level two is operational guidance. Level three is specialised procedures.
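To make the three levels tangible, here is a small sketch of lazy loading. It is a hypothetical model, not how Claude Code, Codex, or any other agent actually implements skills. The createLayeredSkill helper, the references/ directory name, and the in-memory file contents are all assumptions made for illustration.

```typescript
interface SkillSource {
  metadata: { name: string; description: string }; // level 1: always visible
  readFile: (relativePath: string) => string;      // levels 2 and 3: read lazily
}

function createLayeredSkill(source: SkillSource) {
  const loaded = new Map<string, string>();

  // Reads a file at most once and remembers that it is now "in context".
  const load = (path: string): string => {
    const cached = loaded.get(path);
    if (cached !== undefined) return cached;
    const text = source.readFile(path);
    loaded.set(path, text);
    return text;
  };

  return {
    // Level 1: discovery uses metadata only and loads no files.
    discover: (): string => `${source.metadata.name}: ${source.metadata.description}`,
    // Level 2: operational guidance is read when the skill is activated.
    activate: (): string => load("SKILL.md"),
    // Level 3: a specialised reference is read only when a task needs it.
    openReference: (file: string): string => load(`references/${file}`),
    loadedPaths: (): string[] => Array.from(loaded.keys()),
  };
}

// In-memory stand-in for a skill directory (hypothetical contents).
const skillFiles: Record<string, string> = {
  "SKILL.md": "Core interaction loop",
  "references/request-mocking.md": "Route patterns",
};

const layeredSkill = createLayeredSkill({
  metadata: { name: "playwright-cli", description: "Browser automation from the terminal" },
  readFile: (path) => skillFiles[path] ?? "",
});

layeredSkill.discover();
console.log(layeredSkill.loadedPaths()); // nothing loaded after discovery
layeredSkill.activate();
layeredSkill.openReference("request-mocking.md");
console.log(layeredSkill.loadedPaths()); // main file plus one reference
```

The point of the design is visible in loadedPaths: the context only ever contains what the current task has asked for.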

Portability Across Agents

If you are using Codex instead of Claude Code, the good news is that skills are portable. In my case, all it took was a single prompt along the lines of:

Translate this Claude-style skill into a Codex-compatible skill.

Codex generated a fully working version in one go.

Isolated Agentic Testing

Let’s return to agentic testing, because this is where the implications of Playwright CLI become genuinely interesting.

In my earlier post on Agentic Testing, I defined it simply as testing performed by AI agents — not traditional scripts and not humans manually navigating flows, but autonomous systems capable of reasoning, acting, verifying outcomes, and reporting results. At the same time, I pointed out several structural weaknesses in that approach as it existed then.

The first major issue was cost. In MCP-based workflows, the agent had to repeatedly process large accessibility trees returned after almost every interaction. This significantly inflated token usage. The more complex the UI, the more expensive the session became. Long exploratory runs could quickly become economically impractical, especially when every navigation step injected thousands of tokens into the model’s context.

The second issue was the lack of reliable isolation. Without robust request interception, an agent performing UI testing remained tightly coupled to backend availability. If the API was down, unstable, or simply outside the scope of what you wanted to validate, the entire agentic flow would collapse. This made autonomous testing fragile and overly dependent on external systems.

The third issue was traceability. If an agent claimed that it had tested a given flow, how could you verify that claim? How could a tester confirm that the agent actually interacted with the intended elements and validated the expected states, rather than merely producing a plausible narrative? The absence of structured, inspectable artefacts made this a legitimate concern.

With Playwright CLI and the Skill system, all three of these problems are substantially mitigated.

Cost Reduction Through Progressive Disclosure

Let’s start with cost.

The key improvement comes from progressive disclosure — the design principle behind the skill system and the CLI’s interaction model. Instead of injecting full tool schemas into the model’s context and returning complete accessibility trees after every action, Playwright CLI externalises state. Snapshots are written to disk. The agent receives a compact summary and stable element references. The heavy representation lives in local artefacts rather than inside the model’s context window.

This architectural decision changes the economics entirely. The agent no longer needs to ingest thousands of tokens of accessibility data after each step. It reads what it needs when it needs it. Context windows remain cleaner, sessions remain stable for longer, and token consumption drops significantly.
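The economics can be sketched with simple arithmetic. The model below assumes a naive agent loop with no caching or compaction, in which every model call re-reads the base prompt plus the output of all earlier steps. The function name totalInputTokens and every number used are illustrative assumptions, not measurements of Playwright MCP or Playwright CLI.

```typescript
// Total input tokens processed over a session, assuming call number `call`
// (0-based) sees the base prompt plus `call` earlier step outputs.
function totalInputTokens(steps: number, basePrompt: number, perStepOutput: number): number {
  let total = 0;
  for (let call = 0; call < steps; call++) {
    total += basePrompt + call * perStepOutput;
  }
  return total;
}

// Illustrative numbers only: a large per-step payload (a full accessibility
// tree) versus a compact summary with element references.
const sessionSteps = 30;
const withFullTrees = totalInputTokens(sessionSteps, 2000, 4000);
const withSummaries = totalInputTokens(sessionSteps, 2000, 200);

console.log({ withFullTrees, withSummaries });
```

Under this model the per-step payload is multiplied by a term that grows with the square of the number of steps, which is why shrinking it matters most in long sessions.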

In practical terms, this makes agentic testing economically viable for longer and more complex workflows. It stops being a novelty and becomes something that can realistically be used in sustained verification scenarios.

Traceability via Local Artefacts

Traceability improves for the same reason: state is externalised and persisted.

Playwright CLI stores snapshots as YAML files inside the .playwright-cli/ directory. Every meaningful interaction leaves a structured footprint. Commands generate timestamped snapshots, console output, and optionally trace files or video recordings. These artefacts are not buried in conversational history; they are written to disk.

From a testing theory perspective, this is powerful. A tester can parse the YAML snapshots and verify that specific elements were present at specific points in time. One can confirm that navigation occurred, that a button existed, or that a certain UI state was reached. It becomes possible to build automated verification layers on top of these artefacts, independent of whatever summary the agent provides.
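A verification layer of this kind can be very small. The sketch below scans snapshot text for lines of the form role "name" and reports which expected elements are absent. The exact snapshot line format, including the [ref=...] suffix in the sample, is an assumption. A real implementation would more likely use a YAML parser, and this regex ignores entries without a quoted name (such as paragraph: text). All names here are hypothetical.

```typescript
interface SnapshotElement {
  role: string;
  name: string;
}

// Extracts { role, name } pairs from lines such as: - link "Demo User" [ref=e4]
function parseSnapshotElements(snapshot: string): SnapshotElement[] {
  const elements: SnapshotElement[] = [];
  for (const line of snapshot.split("\n")) {
    const match = line.match(/^\s*-?\s*([a-z]+) "([^"]*)"/);
    if (match) {
      elements.push({ role: match[1], name: match[2] });
    }
  }
  return elements;
}

// Returns the expected elements that do not appear in the snapshot.
function missingElements(snapshot: string, expected: SnapshotElement[]): SnapshotElement[] {
  const found = parseSnapshotElements(snapshot);
  return expected.filter(
    (want) => !found.some((have) => have.role === want.role && have.name === want.name),
  );
}

// Hypothetical snapshot excerpt; in practice this would be read from a file
// in the .playwright-cli/ directory.
const sampleSnapshot = [
  '- link "Demo User" [ref=e4]',
  '- heading "Welcome, Demo!" [level=1] [ref=e7]',
  "- paragraph: demo@example.com",
].join("\n");

console.log(
  missingElements(sampleSnapshot, [
    { role: "link", name: "Demo User" },
    { role: "button", name: "Logout" },
  ]),
);
```

Run against the artefacts after an agent session, a check like this confirms or contradicts the agent's summary without trusting the summary itself.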

In other words, the CLI does not merely execute actions — it records them in a machine-readable, auditable format. The agent is no longer a black box that simply declares success. It produces evidence that can be inspected and validated.

Mocking

The most consequential improvement, however, concerns isolation.

Playwright has always supported powerful request interception via route. With the CLI, this capability becomes immediately accessible within an agent-driven workflow. The agent can block backend calls, mock API responses, simulate failures, inject delays, or modify responses on the fly.

An agent is no longer forced to rely on a live backend. It can intercept POST /login, fulfil it with a mock token, intercept GET /me, return a predefined user object, and proceed to verify the authenticated UI state. It can simulate edge cases without waiting for backend engineers to expose special test hooks. It can validate frontend behaviour under deterministic conditions even when the backend is unstable or intentionally unavailable.
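For readers who think in terms of the Playwright library rather than the CLI, the same idea can be sketched with page.route. To keep the example runnable without installing anything, it declares a small structural subset of the Route and Page interfaces locally (RouteLike and PageLike are my own names). The URL globs, the installAuthMocks helper, and the mock payloads are assumptions for illustration.

```typescript
// Local structural subset of Playwright's Route/Page API (route.request(),
// route.fulfill(), route.fallback(), page.route()), declared here so the
// sketch type-checks on its own.
interface RouteLike {
  request(): { method(): string };
  fulfill(response: { status?: number; contentType?: string; body?: string }): Promise<void>;
  fallback(): Promise<void>;
}

interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Registers mocks for the two calls an authenticated UI typically needs.
async function installAuthMocks(page: PageLike): Promise<void> {
  // Only POST /login is mocked; other methods fall through to later handlers.
  await page.route("**/login", async (route) => {
    if (route.request().method() !== "POST") {
      await route.fallback();
      return;
    }
    await route.fulfill({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify({ token: "mock-token-123" }),
    });
  });

  // GET /me returns a predefined user object.
  await page.route("**/me", async (route) => {
    await route.fulfill({
      status: 200,
      contentType: "application/json",
      body: JSON.stringify({ id: 1, username: "demo", email: "demo@example.com" }),
    });
  });
}
```

In a real test the function would be called with Playwright's page object before navigating to the login screen, so the frontend never reaches the network for these two endpoints.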

Isolated Agentic Testing

This leads to a concept worth naming explicitly: isolated agentic testing.

By this I mean a mode of testing in which the agent autonomously establishes its own isolation boundary. It mocks or blocks external dependencies and verifies frontend behaviour under controlled conditions. The isolation is not externally enforced by test infrastructure; it is created dynamically by the agent itself.

From a classical testing perspective, this is significant. Isolation is one of the strongest predictors of test stability. Tests that depend on live external systems are inherently brittle. Tests that control their dependencies are usually robust and reproducible.

With Playwright CLI, the agent can enforce that control. It can detect backend unavailability, introduce a route mock, and continue the flow. It can create deterministic scenarios without polluting shared environments. It can shape its environment instead of merely reacting to it.

From a testing philosophy standpoint, that is a meaningful step forward.

Example Use Case

Let’s make this concrete.

Below is a real example of how I used Codex with the Playwright CLI skill to test a frontend login flow while the backend was completely offline. The goal was simple: verify that the frontend behaves correctly when authentication succeeds — without relying on a running API.

The frontend was running locally at http://localhost:8081. The backend at http://localhost:4001 was intentionally not running.

Step 1 — The Prompt

Inside Codex, I gave a very direct instruction:

$playwright-cli test the login flow on http://localhost:8081.
If backend calls fail, mock them.
Verify that the user is redirected to the homepage and that authenticated UI elements are visible.

I did not specify how to mock. I did not provide route examples. I simply relied on the installed Playwright skill.

Codex immediately activated the skill. It read SKILL.md, understood the expected interaction loop, and began issuing CLI commands.

Step 2 — Detecting the Backend Failure

The agent opened the page and took a snapshot.

### Page
- Page URL: http://localhost:8081/login
- Page Title: Awesome Testing

It then filled the login form and submitted it. Immediately, the console output revealed:

[ERROR] Failed to load resource: net::ERR_CONNECTION_REFUSED @ http://localhost:4001/users/signin

The agent did not stop. It reasoned.

It detected that http://localhost:4001/users/signin was unreachable, inspected the network call pattern, and recognised that the frontend expects a token response from that endpoint followed by a user profile from http://localhost:4001/users/me.

At this point, the skill’s request-mocking.md reference became relevant. Codex loaded it and followed the documented route patterns.
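The detection step is easy to reproduce outside the agent. This minimal sketch pulls the unreachable endpoints out of console output, assuming error lines look like the one shown above. The refusedEndpoints helper is hypothetical and is not part of the CLI or the skill.

```typescript
// Collects the distinct URLs that failed with ERR_CONNECTION_REFUSED.
function refusedEndpoints(consoleOutput: string): string[] {
  const urls = new Set<string>();
  for (const line of consoleOutput.split("\n")) {
    const match = line.match(/net::ERR_CONNECTION_REFUSED @ (\S+)/);
    if (match) {
      urls.add(match[1]);
    }
  }
  return Array.from(urls);
}

// Sample input: the console line from this run, repeated once to show
// de-duplication, plus an unrelated log line.
const sampleConsole = [
  "[ERROR] Failed to load resource: net::ERR_CONNECTION_REFUSED @ http://localhost:4001/users/signin",
  "[LOG] rendering login form",
  "[ERROR] Failed to load resource: net::ERR_CONNECTION_REFUSED @ http://localhost:4001/users/signin",
].join("\n");

console.log(refusedEndpoints(sampleConsole));
```

Each URL in the result is a candidate for a route mock, which is essentially the decision the agent made on its own.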

Step 3 — Mocking the Backend

Codex issued:

playwright-cli route "http://localhost:4001/users/signin" --body='{"token":"mock-token-123","refreshToken":"mock-refresh-123"}'
playwright-cli route "http://localhost:4001/users/me" --body='{"id":1,"username":"demo","firstName":"Demo","lastName":"User","email":"demo@example.com","roles":["ADMIN"]}'

The CLI responded:

Route added for pattern: http://localhost:4001/users/signin
Route added for pattern: http://localhost:4001/users/me

The agent did not mock everything. It mocked only the endpoints required to unblock the login flow. That selective mocking demonstrates reasoning, not brute-force stubbing.

Step 4 — Re-Executing the Flow

After setting up routes, Codex retried the login flow.

This time, navigation succeeded:

### Page
- Page URL: http://localhost:8081/
- Page Title: Awesome Testing

The agent took another snapshot and inspected the UI state. It confirmed the presence of authenticated elements:

link "Demo User"
heading "Welcome, Demo!"
paragraph: demo@example.com

It then checked storage state to confirm token persistence:

token=mock-token-123
refreshToken=mock-refresh-123

At this point, the contract was validated.

The frontend behaved exactly as expected when receiving a valid token and user payload — even though the real backend was offline.

Conclusions

Somewhat quietly, Playwright CLI has handed testers another potentially powerful toy.

It did not arrive with the same level of noise that accompanied MCP a few months ago. It did not trigger dramatic announcements about “the future of everything.” And yet, strategically, it may prove just as important — perhaps even more so.

We are still in the middle of a rapid AI expansion cycle. New capabilities keep appearing. Integration models evolve. Agents become more capable. Tooling ecosystems compete for dominance. There is an ongoing race to capture mindshare, workflows, and ultimately market share.

From one perspective, this is fascinating. The pace of change is remarkable. From another, it is slightly unsettling. The ground beneath our profession is shifting in real time.

MCP servers seem to have lost some of their initial hype; Playwright CLI may represent the opposite dynamic. It feels less glamorous, less protocol-driven, less “architectural.” But precisely because of that, it may end up having a deeper and more lasting impact on everyday practice.

CLI-based automation aligns naturally with how modern coding agents already operate. Skills formalise procedural knowledge in a reusable way. Isolation, traceability, and token efficiency are no longer theoretical improvements — they are practical ones.

Will Playwright CLI fundamentally change our profession? Possibly. But it would be premature to declare anything definitive.

The sensible approach is to test it — calmly, pragmatically, in your own context. Try it on a real problem. Try it on something messy. See whether it helps, whether it simplifies, whether it integrates naturally into your workflow. As we know all too well, not every team faces problems that require this kind of tooling. Context matters.

My recommendation is simple: experiment. Install it. Use the Skill. Observe how the agent behaves when it has structured procedural guidance. It is increasingly clear that Skills are becoming first-class citizens in the agent ecosystem. In fact, at this moment, there is arguably more discussion around Skills than around MCP servers, something that would have seemed unthinkable just three months ago.

The ecosystem is evolving quickly. The only reliable way to keep up is to engage with it directly.

