Testing LLM-based Systems
Learn practical strategies to test and validate Large Language Model (LLM) systems. Discover how to ensure reliability, evaluate AI outputs, and maintain quality in real-world LLM-powered applications.
Explore older entries from the archive.
Discover Test-Driven AI Development (TDAID), a modern approach that merges Test-Driven Development (TDD) with AI-powered software engineering. Learn how to apply TDD principles to AI coding agents, build reliable feedback loops, and prevent regressions in non-deterministic systems. This guide explains why TDD is making a comeback in the AI era, how to structure agentic workflows around tests, and what practices help teams deliver high-quality, maintainable code with AI tools like Claude, Cursor, and Gemini.
A deep dive into Playwright Agents and the Model Context Protocol (MCP): how Microsoft’s latest AI-powered Playwright release automates test planning, script generation, and self-healing browser tests across Chrome, Firefox, and WebKit.
How DevTools MCP enables AI agents to record real performance traces (LCP/CLS/TBT), analyse them, and apply fixes, bringing Lighthouse-style audits into an iterative debugging session. Includes notes on INP (field) vs TBT (lab).
Learn how Mermaid diagrams improve technical documentation and AI workflows by expressing architecture as code that stays readable, maintainable, and current.