Trends, research, and implications for quality engineering
This living document tracks the state of AI testing and quality engineering: emerging tools, research, and practices. The Research & Literary Agent updates it monthly, curating discoveries from the portfolio's research pipeline into a new section.
Monthly digests of LLM and Gen AI research relevant to testing and QA: new models, tools, papers, and repos. Each section is generated automatically from the latest discovery feed, so the content stays current without manual publishing.
Sections are listed newest first. The Research & Literary Agent adds a new one on the first of each month.
Curated discoveries from the LLM & Gen AI research pipeline relevant to testing and quality engineering.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
Tests whether ChatGPT and other LLMs (Microsoft New Bing, Google Bard) can guide a user through a test environment and the commands needed to solve CTF problems, i.e. whether the models can understand a challenge question and capture its flags.
A list of recent adversarial attack and defense papers (including those on large language models)
Self-evolving RAG system built on ChromaDB that refreshes its knowledge base six times a day, designed to work around the training-data cutoff of large language models.
GitHub Action that uses an LLM to analyze commits, suggest semantic version bumps, and generate changelogs for multiple audiences.
A Model Context Protocol (MCP) application for automated GitHub PR analysis and issue management. It enables LLMs to fetch PR details, analyze diffs, manage issues, and handle releases through a standardized interface.
Real-time LLM latency and quality comparison tool built on FastAPI + SSE, tailored for Azure OpenAI. It offers a modular architecture, unified responses and exceptions, environment-based configuration, documentation, and automation scripts ready for production release.
Set Supervised Fine-Tuning (SSFT): Training Large Language Models To Reason In Parallel With Global Forking Tokens (ICLR 2026).
A desktop tool for inspecting and modifying ComfyUI workflows using LLMs.
This repository provides a comprehensive vLLM benchmarking framework for testing large language model performance and fairness across multiple scheduling strategies (ExFairS, VTC, FCFS, Queue-based) with built-in engine management, multi-experiment batch execution, and advanced plotting capabilities.
SEDAC is a next-generation framework that dynamically allocates computation during LLM inference. By using entropy-based gating, it routes predictable tokens through shallow subnetworks and sends ambiguous or high-impact tokens to deeper, specialized paths.
AgentDeck is a research platform for studying AI behavior through game scenarios. Run controlled experiments with LLMs, collect comprehensive behavioral data, and replay matches for analysis. 🚧 Preview release - feedback welcome.
This dataset is the first release from DLTHA Labs, focused on enhancing the logical reasoning and step-by-step problem-solving capabilities of Large Language Models (LLMs).
MCP server that searches NVIDIA blogs and releases to help LLMs answer NVIDIA-specific queries.
🚀 Universal AI-Powered CAD & CFD Platform | Democratizing 3D Design & Simulation | Natural Language → Parametric Models | Build123d + Zoo.dev + Adam.new + OpenFOAM | Solar PV, Test Chambers, Digital Twins & More
Creating a large language model
🔍 Extract structured academic metadata from research abstracts using multiple Large Language Models and assess their performance effectively.
Intelligent auditing tool powered by large language models, supporting GPT, . It automatically detects security vulnerabilities, performance issues
🔒 Enable secure Large Language Model inference with differential privacy for sensitive data protection using DP-Fusion-Lib.
Source: discoveries-2026-01-26.md
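The SEDAC entry describes entropy-based gating. Tokens whose predicted distribution has low entropy take a shallow path, and ambiguous (high-entropy) tokens are sent deeper. The sketch below illustrates only that routing rule. The function names, the two-path "shallow"/"deep" labels, and the threshold are all hypothetical and are not taken from SEDAC's code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_tokens(token_logits: Sequence[Sequence[float]], threshold: float) -> List[str]:
    """Hypothetical gate: 'shallow' for confident tokens, 'deep' for ambiguous ones."""
    routes = []
    for logits in token_logits:
        h = entropy(softmax(logits))
        routes.append("deep" if h > threshold else "shallow")
    return routes


if __name__ == "__main__":
    batch = [
        [8.0, 0.0, 0.0, 0.0],  # one dominant option: entropy close to 0
        [1.0, 1.0, 1.0, 1.0],  # uniform: entropy = ln(4), about 1.386 nats
    ]
    print(route_tokens(batch, threshold=0.5))  # ['shallow', 'deep']
```

A real system would make this decision per token inside the model and tune the threshold against an accuracy and latency budget. A fixed scalar cutoff is simply the easiest version to read.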
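The changelog GitHub Action entry suggests semantic version bumps from commit analysis. Whatever the LLM concludes, the final step is the standard SemVer rule: a major bump resets minor and patch to zero, and a minor bump resets patch. This minimal sketch assumes the model's suggestion arrives as one of the strings "major", "minor", or "patch". The function name and that interface are hypothetical, not the Action's actual API. The sketch also deliberately rejects pre-release and build-metadata suffixes.

```python
import re

# Plain MAJOR.MINOR.PATCH only; pre-release/build suffixes are out of scope here.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def apply_bump(version: str, bump: str) -> str:
    """Apply a suggested bump ('major', 'minor', or 'patch') to a version string."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if bump == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if bump == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown bump type: {bump!r}")


if __name__ == "__main__":
    print(apply_bump("1.4.2", "minor"))  # 1.5.0
```

Keeping this step deterministic means a tester only has to validate the LLM's classification, not its arithmetic.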
Research & Literary Agent – State of AI Testing
This is the first monthly section. Future sections will be added automatically with curated discoveries from the llm-discovery pipeline. Run the agent manually from Actions or wait for the monthly schedule.