Testing AI Enabled Platforms


“Testing AI systems presents a completely new set of challenges. While traditional application testing is deterministic, with a finite number of scenarios that can be defined in advance, AI systems require a limitless approach to testing,” said Ankur Chaudhry, Founder, Next Generation Automation.

“There is a huge need to create new capabilities for evaluating data and learning models, choosing algorithms, and monitoring for bias and ethical and regulatory compliance.”

Experts in nearly every field are in a race to discover how to replicate brain functions – wholly or partially. In fact, by 2025, the value of the artificial intelligence (AI) market will surpass US $100 billion. Organizations invest in AI with the goal of amplifying human potential, improving efficiency and optimizing processes. However, it is important to be aware that AI, too, is prone to error owing to its complexity. Let us first understand what makes AI systems different from traditional software systems:

Software systems

a) Features – Software is deterministic, i.e., it is pre-programmed to provide a specific output based on a given set of inputs

b) Accuracy – Accurate software depends on the skill of the programmer and is deemed successful if it produces an output in accordance with its design

c) Programming – All software functions are designed based on if-then and for loops to convert input data to output data

d) Errors – When software encounters an error, remediation depends on human intelligence or a coded exit function

AI systems

a) Features – Artificial intelligence/machine learning (AI/ML) is non-deterministic, i.e., the algorithm can behave differently for different runs

b) Accuracy – Accuracy of AI algorithms depends on the training set and data inputs

c) Programming – Different input and output combinations are fed to the machine based on which it learns and defines the function

d) Errors – AI systems have self-healing capabilities whereby they resume operations after handling exceptions/errors

The above figure shows the sequential stages of AI algorithms. While each stage is necessary for successful AI programs, there are some typical failure points that exist within each stage. These must be carefully identified using the right testing technique as mentioned below:

Stage 1:

Learning Process from Data Sources

Points of Failures:

• Issues of correctness, completeness and appropriateness of source data quality and formatting

• Variety and velocity of dynamic data resulting in errors

• Heterogeneous data sources

How Testing Can be performed:

• Automated data quality checks

• Ability to handle heterogeneous data during comparison

• Data transformation testing

• Sampling and aggregate strategies
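
The automated data quality checks listed above could look something like the following minimal sketch. It is an illustration under stated assumptions, not a prescribed implementation: the field names (`customer_id`, `signup_date`, `age`), the ID pattern, and the date format are all hypothetical, chosen only to show completeness, correctness and format checks on a small sample.

```python
import re
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed date format, for illustration only

def is_valid_date(value):
    """Return True when value parses with the assumed date format."""
    try:
        datetime.strptime(value, DATE_FORMAT)
        return True
    except (TypeError, ValueError):
        return False

# Hypothetical rules: field name -> validator returning True for acceptable values
RULES = {
    "customer_id": lambda v: isinstance(v, str) and re.fullmatch(r"C\d{4}", v) is not None,
    "signup_date": is_valid_date,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

def check_record(record):
    """Return a list of (field, problem) tuples for one record."""
    problems = []
    for field, validator in RULES.items():
        if field not in record or record[field] in (None, ""):
            problems.append((field, "missing"))   # completeness check
        elif not validator(record[field]):
            problems.append((field, "invalid"))   # correctness / format check
    return problems

def quality_report(records):
    """Map record index -> problems, only for records that fail a check."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            report[index] = problems
    return report

sample = [
    {"customer_id": "C0001", "signup_date": "2023-01-15", "age": 34},
    {"customer_id": "C0002", "signup_date": "15/01/2023", "age": 29},
    {"customer_id": "", "signup_date": "2023-02-01", "age": 150},
]
report = quality_report(sample)
print(report)
```

Here the first record passes, the second fails the date format check, and the third fails both the completeness check on `customer_id` and the range check on `age`.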

Stage 2:

Input data conditioning – Big data stores and data lakes

Points of Failures:

• Incorrect data load rules and data duplicates

• Data nodes partition failure

• Truncated data and data drops

How Testing Can be performed:

• Data ingestion testing

• Knowledge of development model and codes

• Understanding data needed for testing

• Ability to subset and create test data sets
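
One way to approach data ingestion testing for the failure points above (duplicates, data drops and truncated data) is to reconcile the loaded rows against the source rows. The sketch below assumes small in-memory lists of dictionaries and a hypothetical `id` key; a real data lake would need the sampling and aggregate strategies mentioned in Stage 1 instead of a full comparison.

```python
from collections import Counter

def reconcile(source_rows, loaded_rows, key="id"):
    """Compare loaded rows against source rows and report ingestion problems."""
    source_by_key = {row[key]: row for row in source_rows}
    loaded_counts = Counter(row[key] for row in loaded_rows)

    # Keys loaded more than once, and source keys that never arrived
    duplicates = sorted(k for k, n in loaded_counts.items() if n > 1)
    dropped = sorted(k for k in source_by_key if k not in loaded_counts)

    # A loaded string that is a strict prefix of the source string looks truncated
    truncated = set()
    for row in loaded_rows:
        source = source_by_key.get(row[key])
        if source is None:
            continue
        for field, value in source.items():
            loaded_value = row.get(field)
            if (isinstance(value, str) and isinstance(loaded_value, str)
                    and loaded_value != value and value.startswith(loaded_value)):
                truncated.add((row[key], field))

    return {"duplicates": duplicates, "dropped": dropped, "truncated": sorted(truncated)}

source = [
    {"id": 1, "name": "Alexander"},
    {"id": 2, "name": "Beatrice"},
    {"id": 3, "name": "Carlos"},
]
loaded = [
    {"id": 1, "name": "Alexand"},    # truncated value
    {"id": 2, "name": "Beatrice"},
    {"id": 2, "name": "Beatrice"},   # duplicate row; id 3 was dropped
]
result = reconcile(source, loaded)
print(result)
```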

Stage 3:

ML and analytics – Cognitive learning/ algorithms

Points of Failures:

• Determining how data is split for training and testing

• Out-of-sample errors like new behavior in previously unseen data sets

• Failure to understand data relationships between entities and tables

How Testing Can be performed:

• Algorithm testing

• System testing

• Regression testing

Stage 4:

Visualization – Custom apps, connected devices, web, and bots

Points of Failures:

• Incorrectly coded rules in custom applications resulting in data issues

• Formatting and data reconciliation issues between reports and the back-end

• Communication failure in middleware systems/APIs resulting in disconnected data communication and visualization

How Testing Can be performed:

• API testing

• End-to-end functional testing and automation

• Testing of analytical models

• Reconciliation with development models

Stage 5:

Feedback – From sensors, devices, apps, and systems

Points of Failures:

• Incorrectly coded rules in custom applications resulting in data issues

• Propagation of false positives at the feedback stage resulting in incorrect predictions

How Testing Can be performed:

• Optical character recognition (OCR) testing

• Speech, image and natural language processing (NLP) testing

• RPA testing

• Chatbot testing frameworks

The right testing strategy for AI systems

Given that there are several failure points, the test strategy for any AI system must be carefully structured to mitigate the risk of failure. To begin with, organizations must understand the various stages in an AI framework, as shown in the figure above. With this understanding, they will be able to define a comprehensive test strategy with specific testing techniques across the entire framework.

Here are four key AI use cases that must be tested to ensure proper AI system functioning:

• Testing standalone cognitive features such as natural language processing (NLP), speech recognition, image recognition, and optical character recognition (OCR)

• Testing AI platforms such as IBM Watson, Infosys NIA, Azure Machine Learning Studio, Microsoft Oxford, and Google DeepMind

• Testing ML-based analytical models

• Testing AI-powered solutions such as virtual assistants and robotic process automation (RPA)

Use case 1: Testing standalone cognitive features

Natural language processing (NLP)

• Test for ‘precision’ of the keyword results, i.e., the fraction of relevant instances among the total instances retrieved by the NLP system

• Test for ‘recall’, i.e., the fraction of relevant instances retrieved over the total number of relevant instances available

• Test for true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs). Ensure that FPs and FNs are within the defined error/fallout range
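
The three checks above can be computed from a list of expected and predicted labels. The following is a minimal sketch, assuming boolean labels where `True` means "relevant"; the sample labels and the `MAX_FALLOUT` threshold are made up for illustration, and fallout is taken here in its usual sense of the false positive rate, FP / (FP + TN).

```python
def confusion_counts(expected, predicted):
    """Count TPs, TNs, FPs and FNs from paired boolean labels."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e and p)
    tn = sum(1 for e, p in pairs if not e and not p)
    fp = sum(1 for e, p in pairs if not e and p)
    fn = sum(1 for e, p in pairs if e and not p)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Precision, recall and fallout, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fallout = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "fallout": fallout}

MAX_FALLOUT = 0.3  # hypothetical acceptance threshold

expected = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, True, False, False, False, True]

counts = confusion_counts(expected, predicted)
scores = metrics(*counts)
within_range = scores["fallout"] <= MAX_FALLOUT
print(counts, scores, within_range)
```

With these sample labels there are 3 TPs, 3 TNs, 1 FP and 1 FN, giving a precision of 0.75, a recall of 0.75 and a fallout of 0.25.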

Speech recognition inputs

• Conduct basic testing of the speech recognition software to see if the system recognizes speech inputs

• Test for pattern recognition to determine if the system can identify when a unique phrase is repeated several times in a known accent and whether it can identify the same phrase when it is repeated in a different accent

• Test deep learning, the ability to differentiate between ‘New York’ and ‘Newark’

• Test how speech translates to response. For example, a query of “Find me a place I can drink coffee” should not generate a response with coffee shops and driving directions. Instead, it should point to a public place or park where one can enjoy his/her coffee

Image recognition

• Test the image recognition algorithm through basic forms and features

• Test supervised learning by distorting or blurring the image to determine the extent of recognition by the algorithm

• Test pattern recognition by replacing cartoons with the real image like showing a real dog instead of a cartoon dog

• Test deep learning using scenarios to see if the system can find a portion of an object in a larger image canvas and complete a specific action

Optical character recognition

• Test OCR and optical word recognition (OWR) basics by using character or word inputs for the system to recognize

• Test supervised learning to see if the system can recognize characters or words from printed, written or cursive scripts

• Test deep learning, i.e., whether the system can recognize characters or words from skewed, speckled or binarized (when color or grayscale is converted to black and white) documents

• Test constrained outputs by introducing a new word in a document that already has a defined lexicon with permitted words
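
The constrained-output test in the last bullet can be sketched as a simple lexicon check on the recognized text. This assumes the OCR engine has already produced a plain string; the lexicon and the sample text below are hypothetical.

```python
import re

def out_of_lexicon(ocr_text, lexicon):
    """Return recognized words that are not in the permitted lexicon, in first-seen order."""
    allowed = {word.lower() for word in lexicon}
    unexpected = []
    for word in re.findall(r"[A-Za-z]+", ocr_text.lower()):
        if word not in allowed and word not in unexpected:
            unexpected.append(word)
    return unexpected

# Hypothetical lexicon for an invoice document
lexicon = {"invoice", "total", "amount", "due"}

# "refund" is the new word introduced into the document for the test
recognized = "Invoice total amount due: 120 refund"
violations = out_of_lexicon(recognized, lexicon)
print(violations)
```

Whether an out-of-lexicon word should be rejected, flagged or mapped to the nearest permitted word depends on the system under test; the sketch only surfaces the violation.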

Use case 2: Testing AI platforms

Testing any platform that hosts an AI framework is complex. Typically, it follows many of the steps used during functional testing.

Data source and conditioning testing

• Verify the quality of data from various systems – data correctness, completeness and appropriateness along with format checks, data lineage checks and pattern analysis

• Verify transformation rules and logic applied on raw data to get the desired output format. The testing methodology/automation framework should function irrespective of the nature of data – tables, flat files or big data

• Verify that the output queries or programs provide the intended data output

• Test for positive and negative scenarios
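
As a rough illustration of verifying a transformation rule with positive and negative scenarios, the sketch below tests a made-up `transform` function. The rule itself (trim and title-case the name, parse a non-negative amount) is an assumption introduced only for the example.

```python
def transform(raw):
    """Hypothetical transformation rule: clean a raw row into the target format."""
    name = raw["name"].strip().title()
    amount = float(raw["amount"])  # raises ValueError on non-numeric text
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return {"name": name, "amount": amount}

# Positive scenarios: valid input paired with the expected output
POSITIVE_CASES = [
    ({"name": "  alice smith ", "amount": "10.50"}, {"name": "Alice Smith", "amount": 10.5}),
    ({"name": "BOB", "amount": "3"}, {"name": "Bob", "amount": 3.0}),
]

# Negative scenarios: invalid input paired with the error it should raise
NEGATIVE_CASES = [
    ({"name": "carol", "amount": "abc"}, ValueError),
    ({"name": "dave", "amount": "-1"}, ValueError),
    ({"amount": "5"}, KeyError),
]

def run_cases():
    """Run all scenarios and return a list of failures (empty when all pass)."""
    failures = []
    for raw, expected in POSITIVE_CASES:
        actual = transform(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    for raw, expected_error in NEGATIVE_CASES:
        try:
            transform(raw)
        except expected_error:
            continue  # the rule rejected bad input as intended
        except Exception as exc:
            failures.append((raw, expected_error.__name__, type(exc).__name__))
        else:
            failures.append((raw, expected_error.__name__, "no error"))
    return failures

failures = run_cases()
print(failures)
```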

Algorithm testing

• Split the input data into a learning (training) set and a test set for the algorithm

• If the algorithm uses ambiguous datasets, i.e., the output for a single input is not known, the software should be tested by feeding a set of inputs and checking if the output is related. Such relationships must be soundly established to ensure that algorithms do not have defects

• Check the cumulative accuracy of hits (TPs and TNs) over misses (FPs and FNs)
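
When the correct output for a single input is not known, the approach described above (feed related inputs and check that the outputs are related) is often called metamorphic testing. Below is a minimal sketch, assuming a toy `score` function as a stand-in for the algorithm under test; the three relations checked are examples chosen for this toy function, not a general rule set.

```python
def score(text, keyword):
    """Hypothetical algorithm under test: share of words equal to the keyword."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0

def check_relations(text, keyword):
    """Check relations between outputs without knowing the 'correct' score."""
    base = score(text, keyword)
    return {
        # Changing letter case must not change the score
        "case_invariance": score(text.upper(), keyword) == base,
        # Appending one more occurrence of the keyword must not lower the score
        "monotonic": score(text + " " + keyword, keyword) >= base,
        # Reordering the words must not change the score
        "order_invariance": score(" ".join(reversed(text.split())), keyword) == base,
    }

def accuracy(tp, tn, fp, fn):
    """Cumulative accuracy: hits (TPs and TNs) over all outcomes."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

relations = check_relations("the quick brown fox jumps over the lazy dog", "fox")
overall = accuracy(tp=40, tn=45, fp=5, fn=10)  # made-up counts for illustration
print(relations, overall)
```

A relation that evaluates to `False` points to a defect even though no single expected output was ever specified.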

API integration

• Verify input request and response from each application programming interface (API)

• Verify request response pairs

• Test communication between components – input and response returned as well as response format and correctness

• Conduct integration testing of API and algorithms and verify reconciliation/visualization of output
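
Request-response pair verification can be sketched without a network by calling a stand-in for the API. In the example below, `fake_sentiment_api`, its request fields and its response shape are all hypothetical; in a real test the same pairs would be sent to the actual endpoint.

```python
# Assumed response contract for successful calls: field name -> expected type
EXPECTED_FIELDS = {"label": str, "confidence": float}

def fake_sentiment_api(request):
    """Hypothetical stand-in for a real API endpoint."""
    text = request.get("text")
    if not isinstance(text, str) or not text.strip():
        return {"status": 400, "body": {"error": "text is required"}}
    label = "positive" if "good" in text.lower() else "negative"
    return {"status": 200, "body": {"label": label, "confidence": 0.9}}

def validate_response(response, expected_status, expected_label=None):
    """Return a list of problems with the response (empty when it is valid)."""
    problems = []
    if response["status"] != expected_status:
        problems.append(f"status {response['status']} != {expected_status}")
    if expected_status == 200:
        body = response["body"]
        for field, field_type in EXPECTED_FIELDS.items():
            if not isinstance(body.get(field), field_type):
                problems.append(f"{field} missing or not {field_type.__name__}")
        if expected_label is not None and body.get("label") != expected_label:
            problems.append(f"label {body.get('label')!r} != {expected_label!r}")
    return problems

# Request-response pairs: (request, expected status, expected label)
PAIRS = [
    ({"text": "The service was good"}, 200, "positive"),
    ({"text": "Terrible experience"}, 200, "negative"),
    ({"text": ""}, 400, None),
    ({}, 400, None),
]

failures = {}
for index, (request, status, label) in enumerate(PAIRS):
    problems = validate_response(fake_sentiment_api(request), status, label)
    if problems:
        failures[index] = problems
print(failures)
```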

System/regression testing

• Conduct end-to-end implementation testing for specific use cases, i.e., provide an input, verify data ingestion and quality, test the algorithms, verify communication through the API layer, and reconcile the final output on the data visualization platform with expected output

• Check for system security, i.e., static and dynamic security testing

• Conduct user interface and regression testing of the systems

Use case 3: Testing ML-based analytical models

Organizations build analytical models for three main purposes, as shown in the figure below.

The validation strategy used while testing the analytical model involves the following three steps:

• Split the historical data into ‘test’ and ‘train’ datasets

• Train and test the model based on generated datasets

• Report the accuracy of the model for the various generated scenarios
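
The three validation steps above can be sketched end to end with the standard library. The toy threshold "model", the synthetic dataset and the 75/25 split below are assumptions made for illustration; a real analytical model would be trained with its own development code.

```python
import random

def split_dataset(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split it into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_threshold(train_rows):
    """Toy model: midpoint between the mean feature value of each class."""
    positives = [x for x, y in train_rows if y == 1]
    negatives = [x for x, y in train_rows if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2

def accuracy(threshold, rows):
    """Share of rows whose predicted class matches the recorded class."""
    correct = sum(1 for x, y in rows if (1 if x > threshold else 0) == y)
    return correct / len(rows)

# Synthetic 'historical' data: one feature, class 1 when the feature is 50 or more
rows = [(x, 1 if x >= 50 else 0) for x in range(100)]

train_rows, test_rows = split_dataset(rows)
threshold = train_threshold(train_rows)
report = {
    "train_size": len(train_rows),
    "test_size": len(test_rows),
    "test_accuracy": accuracy(threshold, test_rows),
}
print(report)
```

Keeping the seed fixed makes the split, and therefore the reported accuracy, reproducible between test runs.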

While testing a model, it is critical to do the following to ensure success:

• Devise the right strategy to split and subset the historical dataset, using deep knowledge of the development model and code to understand how it works on data

• Model the end-to-end evaluation strategy to train and recreate the model in test environments with associated components

• Customize test automation to optimize testing throughput and predictability by leveraging customized solutions to split the dataset, evaluate the model and enable reporting

Use case 4: Testing of AI-powered solutions

Chatbot testing framework

• Test the chatbot framework using semantically equivalent sentences and create an automated library for this purpose

• Maintain configurations of basic and advanced semantically equivalent sentences with formal and informal tones and complex words

• Automate end-to-end scenario (requesting chatbot, getting a response and validating the response action with accepted output)

• Generate automated scripts in Python for execution
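
A small Python script along the lines described above might keep a library of semantically equivalent sentences per expected response and run each one against the chatbot. In this sketch `toy_bot`, the intent names and the sentences are all hypothetical stand-ins for the real chatbot and its accepted outputs.

```python
def toy_bot(message):
    """Hypothetical stand-in for the chatbot under test; returns an intent label."""
    text = message.lower()
    if any(phrase in text for phrase in ("balance", "how much money", "funds")):
        return "check_balance"
    if any(phrase in text for phrase in ("transfer", "send money", "wire")):
        return "transfer_funds"
    return "fallback"

# Library of semantically equivalent sentences, in formal and informal tones
EQUIVALENTS = {
    "check_balance": [
        "What is my account balance?",
        "Could you tell me how much money I have?",
        "hey, what funds do i have left",
    ],
    "transfer_funds": [
        "I would like to transfer money to my sister.",
        "pls send money to John",
        "Kindly remit payment to my landlord.",
    ],
}

def run_suite(bot, library):
    """Send every sentence to the bot and collect (sentence, expected, actual) mismatches."""
    failures = []
    for expected, sentences in library.items():
        for sentence in sentences:
            actual = bot(sentence)
            if actual != expected:
                failures.append((sentence, expected, actual))
    return failures

failures = run_suite(toy_bot, EQUIVALENTS)
print(failures)
```

In this run the toy bot handles five of the six sentences and falls back on "Kindly remit payment to my landlord.", which is the kind of gap a library of formal and complex-word variants is meant to expose.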

RPA testing framework

• Use open source automation or functional testing tools (Selenium, Sikuli, Robot Class, AutoIT) for multiple applications

• Use flexible test scripts with the ability to switch between machine language programming (where required as an input to the robot) and high-level language for functional automation

• Use a combination of pattern, text, voice, image, and optical character recognition testing techniques with functional automation for true end-to-end testing of applications


AI frameworks typically follow 5 stages –

Learning from various data sources,

Input data conditioning,

Machine learning and analytics,

Visualization, and

Feedback

Each stage has specific failure points that can be identified using several techniques.

Thus, when testing AI systems, QA departments must clearly define the test strategy by considering the various challenges and failure points across all stages.

Some of the important testing use cases to be considered are testing standalone cognitive features, AI platforms, ML-based analytical models, and AI-powered solutions. Such a comprehensive testing strategy will help organizations streamline their AI frameworks and minimize failures, thereby improving output quality and accuracy.

At Next Generation Automation, our team has extensive expertise in testing AI-driven platforms, which helps our clients find the right test partner for their AI-enabled applications, with the support of Next Generation Automation experts.

If you have specific AI testing needs that you would like to outsource, you can connect with us at:


Ankur Chaudhry

Founder Next Generation Automation

Building better QA for tomorrow

