Test data management (TDM) is the process of planning, creating, and maintaining the data used in software testing. TDM is intended to help companies speed up their application testing and make it more efficient.
- A large portion of software development time is spent searching for appropriate data to test the software comprehensively. The data used for testing must match the production data as closely as possible. This ensures that the application later works seamlessly in production.
- The system's behavior at larger data sizes should also be part of the initial software tests. This way, a system can grow during its production use without the risk of breaking down under heavier load.
- Searching for the right data, transferring it, and storing it is a time-consuming and cost-intensive task. Apart from that, most production data has to be masked in some way to avoid privacy issues. Last but not least, changes to the data structures (which happen frequently during development) require the test data to be adapted, which is very likely another time-consuming manual task.
The Parallel Data Generation Framework (PDGF) enables customers to easily obtain all their testing data from one source. PDGF is a configurable framework that allows its users to specify their data model and the output format of their data.
This high-quality data has realistic values as well as configurable, realistic distributions. Nevertheless, the data is still purely synthetic, so there is no risk of exposing private data in the testing environment.
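To make the idea of "realistic values with configurable distributions" concrete, here is a minimal sketch in Python. It does not use PDGF's actual API; the field names, distributions, and weights are invented for illustration. The key point is that every value is drawn from a configurable distribution and a fixed seed, so the data looks plausible but contains no real person's information.

```python
import random

# Invented example values; real configurations would use larger dictionaries.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve"]

def synthetic_customer(rng):
    """Build one synthetic customer row with realistic-looking fields."""
    return {
        # Categorical values drawn with a skewed (non-uniform) distribution,
        # mimicking the fact that real names are not equally frequent.
        "name": rng.choices(FIRST_NAMES, weights=[5, 4, 3, 2, 1])[0],
        # Ages from a normal distribution, clamped to a plausible range.
        "age": max(18, min(99, round(rng.gauss(42, 12)))),
        # Account balances from a log-normal distribution (long tail).
        "balance": round(rng.lognormvariate(8, 1), 2),
    }

rng = random.Random(42)  # fixed seed -> reproducible synthetic data
rows = [synthetic_customer(rng) for _ in range(1000)]
```

Because the generator is seeded, rerunning it reproduces exactly the same data set, which makes test runs repeatable.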
Transferring and storing the generated data is not necessary. PDGF's innovative data generation approach guarantees a very high data generation speed, so it is also suited to generating large amounts of data. This allows for quick schema adaptations, which are quite common during the software development process. Instead of storing the hard-to-change and space-consuming data, only the small, easily changeable, and easily transferable schema configuration file has to be stored.
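The reason the data itself never needs to be stored can be sketched as follows. This is an assumption-laden illustration of the general technique of deterministic, coordinate-based generation, not PDGF's actual implementation: each cell's value is a pure function of a master seed and the cell's coordinates (table, row, column), so any slice of the data can be regenerated on demand, in any order, by any number of parallel workers.

```python
import hashlib
import random

MASTER_SEED = 12345  # only this and the schema config need to be stored

def cell_value(table, row, column):
    """Deterministically compute one cell from (seed, table, row, column).

    Because the value depends only on these coordinates, any worker can
    generate any part of any table independently and in parallel, and the
    full data set can be regenerated on demand instead of being stored.
    """
    key = f"{MASTER_SEED}/{table}/{row}/{column}".encode()
    # Derive a stable per-cell seed (hashlib is deterministic across runs).
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return random.Random(seed).randint(0, 999_999)

# The same coordinates always yield the same value, so row 42 can be
# reproduced at any time without generating rows 0..41 first.
value = cell_value("orders", 42, "amount")
```

The design choice here is to trade a tiny amount of CPU per cell for eliminating all storage and transfer of the generated data.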
Apart from that, PDGF is also able to write directly to other systems. It is not necessary to write the results into a file and process this file manually after the data generation. For example, PDGF can feed its data directly into messaging systems like Apache Kafka.
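The pattern behind such direct output can be sketched with a pluggable sink. This is a minimal illustration, not PDGF's actual interface: the generator streams each row straight into a sink callback instead of an intermediate file. A Kafka sink would forward each row to a producer (e.g. wrapping `KafkaProducer.send` from the kafka-python client); here a list sink collects the rows so the idea is easy to verify.

```python
from typing import Callable, Iterator

def generate_rows(n: int) -> Iterator[dict]:
    """Yield n synthetic rows one at a time (illustrative field names)."""
    for i in range(n):
        yield {"id": i, "value": f"row-{i}"}

def run(sink: Callable[[dict], None], n: int) -> None:
    """Push every generated row directly into the sink; no temp file."""
    for row in generate_rows(n):
        sink(row)

# A trivial in-memory sink; a Kafka sink would publish each row instead.
collected = []
run(collected.append, 3)
```

Because rows are streamed, the full data set never has to exist on disk or in memory at once.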
PDGF alone already speeds up the development process many times over, and the tooling bankmark has built around PDGF accelerates it even further by automating the initial schema configuration. If the production data can be accessed, bankmark's tools automatically scan the metadata of the database and, optionally, the values stored in the database tables. The end result is a ready-to-use schema configuration file for PDGF, which can be manually fine-tuned to model any specific requirements.
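The underlying idea of such a metadata scan can be sketched in a few lines. This illustrates the general approach only, not bankmark's actual tooling: read the database's catalog, collect table and column definitions, and emit a schema description that a generator configuration could be derived from. The example uses SQLite's catalog and an invented `customer` table.

```python
import sqlite3

def scan_schema(conn: sqlite3.Connection) -> dict:
    """Return {table: [(column, declared_type), ...]} from the DB metadata."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]
    return schema

# Invented example database standing in for the production system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
schema = scan_schema(conn)
```

A real tool would additionally sample column values to estimate distributions, then serialize the result as a generator configuration file.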