Parallel Data Generation Framework (PDGF) Quick Start

This tutorial gets PDGF, bankmark’s leading data generator, up and running in a few easy steps.

Get PDGF

To receive your personal copy of PDGF, please contact us.

Pre-packaged binary

Create a new folder named “PDGFEnvironment” and extract the provided zip file into it. Generated datasets can become very large, so make sure the partition holding this folder has sufficient free space.

mkdir PDGFEnvironment
cd PDGFEnvironment
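unzip "$PATH_TO_PDGF_PACKAGE.zip" # $PATH_TO_PDGF_PACKAGE is a placeholder for the path to the provided zip file, without the .zip extension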
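
Since the generated datasets can grow large, it may help to check the free space on the partition before generating data. The following is a minimal sketch that relies only on the standard POSIX df and awk tools; the helper name free_kb is hypothetical and not part of PDGF.

```shell
# Print the free space, in KiB, of the partition that holds the given directory.
# Assumes the file system name reported by df contains no spaces.
free_kb() {
  # -P forces the portable one-line-per-file-system format, -k reports 1024-byte blocks.
  # Column 4 of the second line is the "Available" value.
  df -Pk "${1:-.}" | awk 'NR == 2 { print $4 }'
}

echo "Free space in the current directory: $(free_kb .) KiB"
```

Run it from inside the PDGFEnvironment folder so that the reported partition is the one that will hold the generated data.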

Run the example

The PDGF package comes with example datasets and predefined data generators to get you started right away. As a first step, generate data with the provided demo schema.

Prerequisite: Java 7 or newer must be installed.
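
One way to verify the Java prerequisite from the command line is sketched below. It assumes that java -version prints a quoted version string such as "1.7.0_80" or "11.0.2", which is the usual behaviour of Oracle and OpenJDK builds; the helper names java_major and installed_java_version are hypothetical and not part of PDGF.

```shell
# Extract the major Java version from a version string.
# Old scheme: "1.7.0_80" means Java 7. New scheme: "11.0.2" means Java 11.
java_major() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 | cut -d- -f1 ;;
  esac
}

# Read the quoted version string that the installed java prints to stderr.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

if command -v java >/dev/null 2>&1; then
  version=$(installed_java_version)
  major=$(java_major "$version")
  case "$major" in
    ''|*[!0-9]*) echo "Could not parse the Java version: $version" ;;
    *)
      if [ "$major" -ge 7 ]; then
        echo "Java $version found, prerequisite met"
      else
        echo "Java $version is older than Java 7"
      fi
      ;;
  esac
else
  echo "java was not found on the PATH"
fi
```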

This command starts PDGF and generates data with the demo schema:

java -jar pdgf.jar -l demo-schema.xml -l default-csv-generation.xml -c -ns -sf 1 -s

Hint: The first time PDGF starts, you must accept its license terms. Press ENTER when PDGF prompts you to print out the license information. After reading the license terms, type an uppercase “YES” (without quotes) and press ENTER if you agree to them. Any input other than “YES” terminates the program immediately.

Check the results

The generated output data is stored as CSV files in the “output” folder inside the PDGFEnvironment directory. You can inspect the data by opening a file such as Customer.csv in a text editor (for example vi or Notepad). Customer.csv contains the customer data that PDGF generated from the demo schema, stored as plain text with a comma as the field separator.

ls -l output/ # list all files in output/
cat output/* # print all file contents on the console
vi output/Customer.csv # open the customer data in vi
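
For a quick overview of all generated files, the row and field counts can be summarised per file. The sketch below is an illustration only: the helper name csv_summary is hypothetical, and the field count splits the first line naively on commas, so it assumes that no field contains a quoted comma.

```shell
# Print the row count and the field count of the first line for every CSV file in a directory.
csv_summary() {
  for f in "$1"/*.csv; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern if no CSV file exists
    rows=$(wc -l < "$f" | tr -d ' ')
    fields=$(head -n 1 "$f" | awk -F',' '{ print NF }')
    printf '%s rows=%s fields=%s\n' "$(basename "$f")" "$rows" "$fields"
  done
}

csv_summary output   # summarise the files in output/
```

If the output folder does not exist yet, the function prints nothing.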

Next steps

Interested in learning more? Have a look at the detailed explanation and the guidance on modifying the demo dataset.