The IT world is changing rapidly. Generative AI, AI agents, and LLMs are today's hot topics, but they are more than three fashionable buzzwords. Many software testers believe they can test AI-supported applications much like standard software, using the familiar procedures of their test centers and test factories. That is wrong! This article explains why.
Imagine this: You're in a submarine during World War II, tasked with firing a torpedo. There are six switches, each with two positions, A or B. You have to get the sequence just right, or... boom. The sub blows up, and everyone dies. Sounds like a high-stakes puzzle, right? But how many possible combinations are there? Ten? A hundred? Six binary switches alone give 2^6 = 64 position combinations, and once you factor in the order of flipping those switches, it's factorial chaos: 6! = 720 possible sequences on top of that, far more cases than anyone could ever try by hand. This problem from the past isn't just a trivia quiz; it's the origin story of modern testing theory. And today, as we dive into the world of Large Language Models (LLMs) and AI, those same principles are more critical than ever.
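To make the combinatorics concrete, here is a minimal sketch (Python, purely illustrative of the article's framing) that counts the position combinations and flip orderings:

```python
import math

switches = 6
positions_per_switch = 2

# Position combinations alone: each switch is independently A or B.
position_combinations = positions_per_switch ** switches      # 2^6 = 64

# Orderings in which the six switches can be flipped.
flip_orderings = math.factorial(switches)                     # 6! = 720

# Combining positions with flip order makes the test space explode.
combined_cases = position_combinations * flip_orderings       # 46,080

print(position_combinations, flip_orderings, combined_cases)
```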
Testing isn't new; it has evolved from mechanical nightmares like that torpedo system to today's digital dilemmas. Humans are notoriously bad at handling big numbers and at grasping how quickly combinations grow, which leads to poor coverage, skyrocketing costs, and projects where half the time and budget is sunk into testing. As one wise sage put it, "You can't test quality into a system." Quality starts at the design phase.
Enter white-box and black-box testing. White-box testing lets you peek inside the code or the logs for transparency, while black-box testing treats a system like a mysterious ERP that "just does stuff." LLMs are mostly black boxes, but with clever prompts you can lift the veil: ask the AI for its methodology and its reasoning, so you can check that it reaches the right answer for the right reasons.
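As a small illustration of this "lifting the veil" idea, here is a sketch of a prompt that asks the model to expose its methodology alongside its answer; the model client is a placeholder, not a specific vendor API:

```python
def build_transparent_prompt(question: str) -> str:
    """Ask the model for its method and reasoning, not just its answer."""
    return (
        "Answer the question below. Before giving the final answer, list "
        "(1) the method you are applying, (2) the assumptions you are making, "
        "and (3) the intermediate steps of your reasoning.\n\n"
        f"Question: {question}"
    )

# 'call_llm' is a hypothetical placeholder for whatever model client you use.
# response = call_llm(build_transparent_prompt("Which customers qualify for the promo?"))
```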
Techniques like controlled natural languages (a limited set of nouns and verbs for precision) and cause-effect modeling (using logical operators such as AND and XOR) paved the way. The real game-changer, though, is model-based testing: tools that use constraint graphs for easier collaboration and that introduce observability (like adding indicator lights to those switches) and testability from the ground up.
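The sketch below shows the cause-effect idea in miniature: the causes are boolean input conditions, the effect is defined with AND/XOR logic, and every combination is enumerated into a small decision table. The conditions and the rule are illustrative, not a specific tool's notation:

```python
from itertools import product

# Causes: boolean input conditions of the system under test.
causes = ["valid_account", "sufficient_balance", "manual_override"]

def effect(valid_account: bool, sufficient_balance: bool, manual_override: bool) -> bool:
    # Effect: payment is released if (account valid AND balance sufficient)
    # XOR a manual override - exactly one of the two paths may apply.
    return (valid_account and sufficient_balance) != manual_override

# Enumerate all cause combinations into a decision table of test cases.
for row, combo in enumerate(product([False, True], repeat=len(causes)), start=1):
    inputs = dict(zip(causes, combo))
    print(f"TC{row:02d}", inputs, "->", effect(*combo))
```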
Fast-forward to LLMs: they are guessing engines, pattern-matchers that thrive on data and prompts. But ambiguity reigns. Language is poetic but imprecise, a Humpty Dumpty world, so to speak, in which words mean whatever we want them to mean. Add misunderstood data, and you're on shaky ground.
One example of this is a test framework that uses synthetic data for a consulting firm. JSON files with controlled variations (high turnover, improved credit rating, and so on) were generated, and the model analyzed the company finances to select the best acquisition targets. Optimizing the prompt while ignoring changes in staff fluctuation produced dramatically different results, which underscores the importance of isolation: freeze the context, refine the prompt, and compare against a baseline, as sketched below.
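A minimal sketch of that isolation idea: generate JSON variants of one company record, change exactly one dimension at a time, and replay each frozen variant against the prompt so the verdict can be compared with a stored baseline. The field names, file names, and the model client are illustrative assumptions, not the actual framework:

```python
import json
from copy import deepcopy

# Frozen baseline record for one acquisition candidate (illustrative fields).
baseline_record = {
    "company": "Candidate GmbH",
    "turnover_meur": 42.0,
    "credit_rating": "BBB",
    "staff_fluctuation_pct": 7.5,
}

# Controlled variations: change exactly one dimension at a time.
variations = {
    "high_turnover": {"turnover_meur": 120.0},
    "improved_credit_rating": {"credit_rating": "A"},
    "high_fluctuation": {"staff_fluctuation_pct": 25.0},
}

for name, overrides in variations.items():
    variant = deepcopy(baseline_record)
    variant.update(overrides)
    # Persist each variant so the same frozen input can be replayed
    # against different prompt versions and compared with the baseline run.
    with open(f"variant_{name}.json", "w") as fh:
        json.dump(variant, fh, indent=2)
    # verdict = call_llm(prompt_template, variant)        # placeholder client
    # compare_with_baseline(name, verdict)                # hypothetical check
```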
Bad data? It changes the model's line of argument, even when the bad values are irrelevant to the question at hand. Zero values alone shifted the reasoning, proof that LLM-based applications need to be trained and tested to deal with inconsistencies. And with APIs in the mix, model-based testing helps ensure deterministic results even in edge cases.
The following workflow brings this home: starting from a textual bank promotion requirement (eligibility based on tenure, age, and credit), AI agents build the models, business analysts refine the user stories, and testers expand the boundaries to generate a full set of scenarios, automate the API tests, and bridge the teams toward a common understanding (see the sketch below).
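Here is a minimal sketch of the boundary-expansion step, assuming an illustrative eligibility rule of "tenure ≥ 5 years, age ≥ 18, credit score ≥ 700" and a hypothetical REST endpoint; the thresholds, field names, and URL are assumptions, not the actual requirement:

```python
import requests

# Illustrative eligibility thresholds derived from the textual requirement.
MIN_TENURE_YEARS, MIN_AGE, MIN_CREDIT_SCORE = 5, 18, 700

def expected_eligibility(tenure: int, age: int, credit: int) -> bool:
    return tenure >= MIN_TENURE_YEARS and age >= MIN_AGE and credit >= MIN_CREDIT_SCORE

# Boundary values just below and at each threshold.
boundary_cases = [
    (4, 30, 750), (5, 30, 750),    # tenure boundary
    (10, 17, 750), (10, 18, 750),  # age boundary
    (10, 30, 699), (10, 30, 700),  # credit score boundary
]

def test_promo_eligibility_api():
    for tenure, age, credit in boundary_cases:
        payload = {"tenureYears": tenure, "age": age, "creditScore": credit}
        # Hypothetical endpoint of the promo service under test.
        response = requests.post("https://bank.example/api/promo/eligibility", json=payload)
        assert response.status_code == 200
        assert response.json()["eligible"] == expected_eligibility(tenure, age, credit)
```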
Data quality, lineage, and privacy? Massive hurdles. Hidden PII leaks easily; tokens don't mask everything. Bad data (inconsistent structures, errors, or outdated codes) amplifies biases and hallucinations. Neural networks learn patterns, not facts; overfitting noisy sets leads to confident nonsense.
Solutions? Scan the data structures, discover relationships, and use AI for documentation. Generate synthetic data with machine learning (CTGAN, TVAE) that matches the characteristics of production: small, masked, and bias-testable. Train on synthetic, test on real (TSTR), as sketched below. NUCIDA's approach adds model enhancements for new patterns, such as time-series costs in food businesses.
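A minimal TSTR sketch with scikit-learn: train a classifier on a synthetic table (produced beforehand by a generator such as CTGAN or TVAE) and evaluate it on a held-out sample of real production data. The file names, columns, and target are illustrative assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Illustrative inputs: a synthetic table (e.g. generated with CTGAN/TVAE)
# and a masked holdout of real production data with the same columns.
synthetic = pd.read_csv("synthetic_customers.csv")
real = pd.read_csv("real_customers_holdout.csv")

features = ["tenure_years", "age", "credit_score"]   # assumed columns
target = "accepted_promo"                            # assumed label

# Train on Synthetic, Test on Real (TSTR).
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(synthetic[features], synthetic[target])

scores = model.predict_proba(real[features])[:, 1]
print("TSTR ROC-AUC on real data:", roc_auc_score(real[target], scores))
```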
Hallucinations stem from unseen data; biases from skewed sets. Rare events? Expand them synthetically. New features? Train for them preemptively. This three-pronged strategy (modeling, production-like synthetic data, functional tests) saves costs and boosts reliability.
Building top-notch software doesn’t have to be a struggle. At NUCIDA, we’ve cracked the code with our B/R/AI/N Testwork testing solution - pairing our QA expertise with your test management tool to deliver streamlined processes, slick automation, and results you can count on. On time. Hassle-free. Ready to ditch future headaches? Let NUCIDA show you how!
NUCIDA's QM / QA experts are certified consultants for Testiny, SmartBear, TestRail, and Xray software testing tools.
Why Choose NUCIDA?
For us, digitization does not just mean modernizing what already exists but, most importantly, reshaping the future. That is why we have made it our goal to provide our customers with sustainable support in digitizing the entire value chain. Our work has only one goal: your success!
Don’t let testing slow you down. Explore how consulting services can make your software quality soar - headache-free! Got questions? We’ve got answers. Let’s build something amazing together!
From torpedoes to LLM prompts, testing's core remains the same: rigor, structure, and understanding. Don't blow up your AI projects: embrace model-based and data-driven testing approaches. NUCIDA's tools bridge the gap, ensuring your systems are testable, verifiable, and trustworthy.
Intrigued? Dive deeper with our follow-up webinars or practitioner guides. Contact the NUCIDA team to explore how we can help your organization. In the AI era, quality isn't optional; it's explosive.
Want to know more? Watch our YouTube video, Key Principles of Machine Learning, to learn more about the latest developments.
Logos and pictures from pixabay.com and NUCIDA Group
Article written and published by Torsten Zimmermann