Trends, research, and implications for quality engineering
This living document tracks the state of AI testing and quality engineering: emerging tools, research, and practices. It is updated monthly by the Research & Literary Agent, which curates discoveries from the portfolio's research pipeline and publishes a new section each month.
Monthly digests of LLM and Gen AI research relevant to testing and QA: new models, tools, papers, and repos. Each section is generated automatically from the latest discovery feed, so the content stays current without manual publishing.
Newest first. Sections are added by the Research & Literary Agent on the first of each month.
Curated discoveries from the LLM & Gen AI research pipeline relevant to testing and quality engineering.
SGLang is a high-performance serving framework for large language models and multimodal models.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A project to improve skills of large language models
The RAS-Commander library provides a Python API for automating HEC-RAS 6.x and accessing HDF data, built with and driven by large language models.
NewsPilot is an automated intelligence analysis system based on Large Language Models (LLM), designed to transform massive global news into personalized, actionable insights. It is not just a news aggregation tool, but a 24/7 intelligent intelligence assistant that understands your profession, holdings, and interests.
AI system powered by large language models.
A unified Python library for interaction with multiple Large Language Model (LLM) providers. Write once, run everywhere.
🔍 Automate penetration testing with an intelligent agent that organizes security assessments, leveraging local LLMs and Kali Linux for effective exploitation.
Recipe Language Model (RLM): We introduce a domain-specific RLM developed through seven AI layers and interconnected robotic boxes to drive the evolution of physical AI for defining a new paradigm - Materials Intelligence.
Experiments in building Small/Medium/Large Language Models using JAX
Auto-GPT is an experimental open-source framework that transforms large language models like GPT into autonomous agents capable of self-directed reasoning, recursive goal execution, and dynamic tool use. Unlike traditional chat interfaces
This repository powers the “How Hungry is AI?” dashboard and provides the code + data release accompanying our paper on inference-phase (operational) environmental footprints of LLMs.
Daily automated tracking of 19 AI agent frameworks. Pulse Score ranks momentum via star velocity, release freshness, commit activity, and community health.
Persistent, LLM-maintained 3-level documentation for codebases. AST-driven line numbers, doc-only or edit-enabled doc passes, project-config'd cascade and verification. (Haven't taught the agent how to read the doc, will be included in v1.1 release)
MCP server to search across NVIDIA blogs and releases to empower LLMs to better answer NVIDIA specific queries
This project uses Python and Structured Query Language (SQL) with existing store data and the Huff model to predict how likely a new store in the same area would be visited.
Real-Time Fake News Detection offers an API-first approach to content moderation and journalistic verification. Powered by advanced Natural Language Processing (NLP) and ensemble tree models, this platform analyzes structure, sentiment, and context to deliver actionable credibility metrics.
[arXiv: 2603.11331] Scaling laws for the attack success rate under prompt-injection-based jailbreak attacks
Benchmarking the true internal knowledge cutoff dates and factual decay of Large Language Models (LLMs) using notable death records.
Chat With Documents is a Streamlit application designed to facilitate interactive, context-aware conversations with large language models (LLMs) by leveraging Retrieval-Augmented Generation (RAG). Users can upload documents or provide URLs, and the app indexes the content using a vector store called Chroma to supply relevant context during chats.
Source: discoveries-2026-05-04.md
Research & Literary Agent – State of AI Testing
Curated discoveries from the LLM & Gen AI research pipeline relevant to testing and quality engineering.
SGLang is a high-performance serving framework for large language models and multimodal models.
Accessible large language models via k-bit quantization for PyTorch.
Inspect: A framework for large language model evaluations
JiuwenClaw is an intelligent AI Agent built on openJiuwen. It extends the powerful capabilities of large language models directly to your fingertips through various communication apps you use daily.
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
Official release of code for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks"
INFRA-COMPASS is a tool that leverages Large Language Models (LLMs) to create and maintain an inventory of state and local codes and ordinances applicable to energy infrastructure.
Code for "FORGETTING: A New Mechanism Towards Better Large Language Model Fine-Tuning" paper. Novel approach for LLM fine-tuning using token-level forgetting mechanism.
A Model Context Protocol (MCP) application for automated GitHub PR analysis and issue management. Enables LLMs to fetch PR details, analyse diffs, manage issues, and handle releases through a standardised interface
A structured end-to-end AI/ML engineering journey covering mathematics, machine learning, deep learning, large language models, MLOps, and production-grade projects. Built with a strong focus on fundamentals, implementation, and real-world systems.
🗣️ Build an interactive voice agent that leverages Speech-to-Text, a Large Language Model, and Text-to-Speech for real-time voice interactions.
synapz is a research prototype exploring how large language models can adapt teaching content to different cognitive styles. Built over a 48-hour sprint with a strict $50 API budget, this project implements a scientific framework to test whether adaptive teaching produces measurably better results than static approaches.
Continual pretraining experiments with large language models
AgentDeck is a research platform for studying AI behavior through game scenarios. Run controlled experiments with LLMs, collect comprehensive behavioral data, and replay matches for analysis. 🚧 Preview release - feedback welcome.
🔍 Automate penetration testing with an intelligent agent that organizes security assessments, leveraging local LLMs and Kali Linux for effective exploitation.
An end-to-end AI drug repurposing system based on knowledge graphs + large language models. Supported by CrewAI multi-agent collaboration, it quickly and efficiently identifies new indications for existing drugs.
🖥️ Execute Python commands with AIPy, unlocking the potential of large language models to solve complex problems seamlessly.
Intelligent auditing tool powered by large language models, supporting GPT, . It automatically detects security vulnerabilities, performance issues
Local-first workspace for using large language models in a practical, understandable way. It makes reasoning, memory, tools, and workflows visible and reusable, so you can save thinking once, keep control of your data, and move on.
Benchmarking the ability of large language models to detect semantic conflicts across domains, documents, and evolving knowledge bases.
Source: discoveries-2026-04-13.md
Curated discoveries from the LLM & Gen AI research pipeline relevant to testing and quality engineering.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
We want to see whether ChatGPT or other LLMs (Microsoft New_Bing or Google Bard) can help a user run commands in a test environment to solve CTF problems, that is, whether large language models can understand a challenge question and capture its flags.
A list of recent adversarial attack and defense papers (including those on large language models)
Self-Evolving RAG System with ChromaDB for continuous knowledge updates (6x daily), designed to overcome Large Language Model data cutoff limitations.
GitHub Action that uses LLM to analyze commits, suggest semantic version bumps, and generate multi-audience changelogs
A Model Context Protocol (MCP) application for automated GitHub PR analysis and issue management. Enables LLMs to fetch PR details, analyse diffs, manage issues, and handle releases through a standardised interface
Real-time LLM delay & quality comparison tool built on FastAPI + SSE, tailored for Azure OpenAI. Modular architecture, unified responses/exceptions, env-based config, docs, and automation scripts ready for production release
Set Supervised Fine-Tuning (SSFT): Training Large Language Models To Reason In Parallel With Global Forking Tokens (ICLR 2026).
A desktop tool for inspecting and modifying ComfyUI workflows using LLMs.
This repository provides a comprehensive vLLM benchmarking framework for testing large language model performance and fairness across multiple scheduling strategies (ExFairS, VTC, FCFS, Queue-based) with built-in engine management, multi-experiment batch execution, and advanced plotting capabilities.
SEDAC is a next-generation framework that dynamically allocates computation during LLM inference. By using entropy-based gating, it routes predictable tokens through shallow subnetworks and sends ambiguous or high-impact tokens to deeper, specialized paths.
AgentDeck is a research platform for studying AI behavior through game scenarios. Run controlled experiments with LLMs, collect comprehensive behavioral data, and replay matches for analysis. 🚧 Preview release - feedback welcome.
This dataset is the first release from DLTHA Labs, focused on enhancing the logical reasoning and step-by-step problem-solving capabilities of Large Language Models (LLMs).
MCP server to search across NVIDIA blogs and releases to empower LLMs to better answer NVIDIA specific queries
🚀 Universal AI-Powered CAD & CFD Platform | Democratizing 3D Design & Simulation | Natural Language → Parametric Models | Build123d + Zoo.dev + Adam.new + OpenFOAM | Solar PV, Test Chambers, Digital Twins & More
Creating a large language model
🔍 Extract structured academic metadata from research abstracts using multiple Large Language Models and assess their performance effectively.
Intelligent auditing tool powered by large language models, supporting GPT, . It automatically detects security vulnerabilities, performance issues
🔒 Enable secure Large Language Model inference with differential privacy for sensitive data protection using DP-Fusion-Lib.
Source: discoveries-2026-01-26.md
This is the first monthly section. Future sections will be added automatically with curated discoveries from the llm-discovery pipeline. Run the agent manually from Actions or wait for the monthly schedule.