The Automated Generation of Comprehensible XML Test Inputs
by S. Poulding and R. Feldt


Extensible Markup Language (XML) is often used to encode complex data structures that are the inputs to software, either in the form of configuration files that control the behaviour of the software, or the data on which the software operates. There are typically many domain-specific constraints on the hierarchy of elements in the XML, the attributes associated with each element, and the types of data that both elements and attributes contain. As a result, the automatic generation of valid XML inputs is beyond the capabilities of many test data generation techniques.

However, it is not sufficient simply to generate test sets consisting of valid XML inputs: the test cases must also exhibit other properties that facilitate testing. In the absence of an automated oracle, it should not be unnecessarily difficult for a human to predict the correct output of a test case, and thus a desirable property of a test case is comprehensibility.

In this paper we demonstrate the use of the GödelTest framework in generating XML test inputs, and show that the generation strategy can be optimised using Nested Monte-Carlo Search to produce test cases that are comprehensible to a human. Moreover, when the validity of the inputs is defined using an XML Schema definition, we show that it is possible to automate much of the generation process and thereby realise significant savings of time and effort.
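
To make the idea concrete, the following is a minimal illustrative sketch in Python, not the GödelTest framework and not Nested Monte-Carlo Search. It assumes a hypothetical toy schema (`SCHEMA`), generates random XML that satisfies that schema's occurrence constraints, and uses plain best-of-N random sampling with element count as a crude, assumed proxy for comprehensibility. All names here are invented for illustration.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": each element name maps to its allowed children
# as (child name, min occurrences, max occurrences). Not a real XSD.
SCHEMA = {
    "config": [("server", 1, 3)],
    "server": [("host", 1, 1), ("port", 1, 1), ("alias", 0, 4)],
    "host": [],
    "port": [],
    "alias": [],
}


def generate(name, rng):
    """Build a random element that satisfies the toy schema."""
    elem = ET.Element(name)
    children = SCHEMA[name]
    if not children:
        # Leaf elements carry a small random value as text.
        elem.text = str(rng.randint(1, 9999))
    for child, lo, hi in children:
        for _ in range(rng.randint(lo, hi)):
            elem.append(generate(child, rng))
    return elem


def is_valid(elem):
    """Recursively check the occurrence constraints of the toy schema."""
    allowed = {child: (lo, hi) for child, lo, hi in SCHEMA[elem.tag]}
    counts = {}
    for sub in elem:
        if sub.tag not in allowed:
            return False
        counts[sub.tag] = counts.get(sub.tag, 0) + 1
    for child, (lo, hi) in allowed.items():
        if not lo <= counts.get(child, 0) <= hi:
            return False
    return all(is_valid(sub) for sub in elem)


def size(elem):
    """Assumed proxy for comprehensibility: fewer elements is simpler."""
    return sum(1 for _ in elem.iter())


def simplest_of(n, seed=0):
    """Sample n valid inputs and keep the smallest (best-of-N, not NMCS)."""
    rng = random.Random(seed)
    candidates = [generate("config", rng) for _ in range(n)]
    return min(candidates, key=size)


best = simplest_of(50)
print(ET.tostring(best, encoding="unicode"))
```

Best-of-N sampling is used here only because it is the simplest way to show validity and a comprehensibility objective being combined; a guided search such as the one named in the abstract would steer the generator's choices rather than filter its outputs afterwards.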

The work that went into this paper was partly funded by The Knowledge Foundation (KKS) through the project 20130085 Testing of Critical System Characteristics (TOCSYC).


  author =    "Simon Poulding and Robert Feldt",
  title =     "The Automated Generation of Comprehensible XML Test Inputs",
  booktitle = "First North American Search Based Software Engineering Symposium (NASBASE)",
  year =      "2015",
  pages =     "",
  publisher = "IEEE",
  keywords =  "Search-based software testing, Automated testing, XML",