State of AI Testing

Trends, research, and implications for quality engineering

Overview

This living document tracks the state of AI testing and quality engineering: emerging tools, research, and practices. The Research & Literary Agent maintains it, curating discoveries from the portfolio's research pipeline and publishing a new section each month.

What you'll find here

Monthly digests of LLM and Gen AI research relevant to testing and QA: new models, tools, papers, and repos. Each section is generated automatically from the latest discovery feed, so the content stays current without manual publishing.

Monthly updates

Newest first. Sections are added by the Research & Literary Agent on the first of each month.

February 2026

Curated discoveries from the LLM & Gen AI research pipeline relevant to testing and quality engineering.

Engram

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Spatial-SSRL

Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"

ChatGPT_on_CTF

Explores whether ChatGPT and other LLMs (Microsoft New Bing, Google Bard) can help a user run commands in a test environment to solve CTF (capture-the-flag) problems, that is, whether the models can understand a challenge question and capture its flag.

NewAdversarialAttackPaper

A list of recent adversarial attack and defense papers (including those on large language models)

StillMe-Learning-AI-System-RAG-Foundation

A self-evolving RAG system that uses ChromaDB for continuous knowledge updates (six times daily), designed to overcome the knowledge-cutoff limitations of large language models.

llm-release-action

A GitHub Action that uses an LLM to analyze commits, suggest semantic version bumps, and generate changelogs for multiple audiences.
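The action's input and output formats aren't described here. As a minimal sketch, assuming the LLM's suggestion arrives as one of the strings "major", "minor", or "patch", applying it to a MAJOR.MINOR.PATCH version follows the Semantic Versioning 2.0.0 reset rules. `apply_bump` is a hypothetical helper for illustration, not part of llm-release-action.

```python
import re

# Core MAJOR.MINOR.PATCH only; pre-release and build metadata are out of scope.
SEMVER_CORE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def apply_bump(version: str, bump: str) -> str:
    """Return the next version for a suggested bump level (hypothetical helper).

    Per SemVer 2.0.0, incrementing a component resets every lower
    component to zero.
    """
    match = SEMVER_CORE.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {bump!r}")


print(apply_bump("1.4.2", "minor"))  # 1.5.0
```

Keeping the version arithmetic deterministic and leaving only the bump-level judgment to the model makes the suggestion easy to validate in a test.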

mcp-github-pr-issue-analyser

A Model Context Protocol (MCP) application for automated GitHub PR analysis and issue management. Enables LLMs to fetch PR details, analyse diffs, manage issues, and handle releases through a standardised interface

LLM-Evaluation

A real-time tool for comparing LLM latency and response quality, built on FastAPI with server-sent events (SSE) and tailored for Azure OpenAI. It includes a modular architecture, unified response and exception handling, environment-based configuration, documentation, and automation scripts ready for production release.

SSFT

Set Supervised Fine-Tuning (SSFT): Training Large Language Models To Reason In Parallel With Global Forking Tokens (ICLR 2026).

Comfy-Bridge-release

A desktop tool for inspecting and modifying ComfyUI workflows using LLMs.

ExFairS

This repository provides a comprehensive vLLM benchmarking framework for testing large language model performance and fairness across multiple scheduling strategies (ExFairS, VTC, FCFS, Queue-based) with built-in engine management, multi-experiment batch execution, and advanced plotting capabilities.

SEDAC-V7.0-Pre-release-Test-Version

SEDAC is a next-generation framework that dynamically allocates computation during LLM inference. By using entropy-based gating, it routes predictable tokens through shallow subnetworks and sends ambiguous or high-impact tokens to deeper, specialized paths.
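SEDAC's own code isn't shown here, so the following is only a minimal sketch of the general idea behind entropy-based gating. It assumes the router can see a next-token probability distribution (for example, from a cheap early-layer prediction head) before choosing a path. The names `shannon_entropy` and `route_token`, the 0.5-nat threshold, and the "shallow"/"deep" labels are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy, in nats, of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_token(probs: Sequence[float], threshold: float) -> str:
    """Hypothetical gate: low-entropy (predictable) tokens take the shallow
    path; high-entropy (ambiguous) tokens take the deep path."""
    return "shallow" if shannon_entropy(probs) < threshold else "deep"


# Toy distributions over a four-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]  # entropy is about 0.17 nats
uncertain = [0.25, 0.25, 0.25, 0.25]  # entropy is ln 4, about 1.39 nats
print(route_token(confident, threshold=0.5))  # shallow
print(route_token(uncertain, threshold=0.5))  # deep
```

Raising the threshold routes more tokens through the shallow path, saving compute at some risk to output quality.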

agentdeck-preview

AgentDeck is a research platform for studying AI behavior through game scenarios. Run controlled experiments with LLMs, collect comprehensive behavioral data, and replay matches for analysis. 🚧 Preview release - feedback welcome.

dltha_reasoning_v1

This dataset is the first release from DLTHA Labs, focused on enhancing the logical reasoning and step-by-step problem-solving capabilities of Large Language Models (LLMs).

mcp-nvidia

An MCP server that searches NVIDIA blogs and releases so that LLMs can better answer NVIDIA-specific queries.

GenAI-CAD-CFD-Studio

🚀 A universal AI-powered CAD and CFD platform aimed at democratizing 3D design and simulation. It turns natural language into parametric models using Build123d, Zoo.dev, Adam.new, and OpenFOAM, with use cases including solar PV, test chambers, and digital twins.

LLMfromscratch

Building a large language model from scratch.

Academic-Extraction-GenAI-Pipeline

🔍 Extract structured academic metadata from research abstracts using multiple Large Language Models and assess their performance effectively.

Source-Code-Security-Audit-Reviewer

An intelligent auditing tool powered by large language models (supporting GPT, …). It automatically detects security vulnerabilities, performance issues, …

dp-fusion-lib

🔒 Enable secure Large Language Model inference with differential privacy for sensitive data protection using DP-Fusion-Lib.

Source: discoveries-2026-01-26.md

Research & Literary Agent – State of AI Testing

March 2026

Kickoff

This is the first monthly section. Future sections, with curated discoveries from the llm-discovery pipeline, will be added automatically. Run the agent manually from Actions, or wait for the monthly schedule.