Parallel Data Generation Framework (PDGF) Quick Start
The aim of this tutorial is to get PDGF, bankmark’s leading data generator, up and running in a few easy steps.
To receive your personal copy of PDGF, please contact us.
Create a new folder “PDGFEnvironment” and extract the provided zip file into that folder. As datasets can become very large, make sure you have sufficient free space on your partition for them.
mkdir PDGFEnvironment
cd PDGFEnvironment
unzip $PATH_TO_PDGF_PACKAGE.zip
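One way to check the free space before generating data is sketched here. It assumes a Unix-like shell with the standard df utility, and it is run from inside the PDGFEnvironment folder:

```shell
# Show the free space on the partition that holds the current directory.
# -h prints sizes in human-readable units (for example G for gigabytes).
df -h .
```

The "Avail" column shows how much space is left for the generated files.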
Run the example
The PDGF package comes with example datasets and predefined data generators to get you started right away. As a first step, you might want to generate data with the provided demo schema.
Prerequisite: Java 7 or newer must be installed.
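The following sketch shows one way to verify the Java prerequisite from a POSIX shell. The helper name java_major is made up for this illustration and is not part of PDGF. The sketch also assumes that java -version prints a line containing version "…", as Oracle and OpenJDK builds do.

```shell
# java_major is a hypothetical helper (not part of PDGF): it extracts the
# major version from strings such as "1.7.0_80" (legacy scheme) or "17.0.2".
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;     # legacy scheme: 1.7.0_80 -> 7.0_80
  esac
  echo "${v%%[!0-9]*}"      # keep only the leading digits
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; pick the quoted version token.
  raw=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  major=$(java_major "$raw")
  if [ "${major:-0}" -ge 7 ]; then
    echo "Java $major found: OK"
  else
    echo "Java 7 or newer is required (found: ${raw:-unknown})"
  fi
else
  echo "No java executable found on the PATH"
fi
```

If the check reports an older version or no Java at all, install a current Java runtime before starting PDGF.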
This command starts PDGF and generates data with the demo schema:
java -jar pdgf.jar -l demo-schema.xml -l default-csv-generation.xml -c -ns -sf 1 -s
Hint: When PDGF is started for the first time, you must accept its license terms. Press ENTER when PDGF prompts you to print out the license information. After reading the license terms, type an uppercase “YES” (without quotes) and press ENTER if you agree to them. Any input other than “YES” terminates the program immediately.
Check the results
The generated output data is stored as CSV files in the “output” folder inside the PDGFEnvironment directory. You can inspect it by opening a file such as Customer.csv in a text editor (for example vi or Notepad). Customer.csv holds the generated customer data as plain text with a comma as the field separator.
ls -l output/            # list all files in output/
cat output/*             # print all file contents on the console
vi output/Customer.csv   # open the customer data in vi
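For a quick overview that avoids printing whole files, the sketch below reports the row count and the number of fields per generated CSV file. The function name csv_summary is made up for this illustration. The field count is a naive split on commas, so it assumes that no field contains a quoted comma.

```shell
# csv_summary is a hypothetical helper: it prints the number of rows and the
# number of comma-separated fields in the first row of a CSV file.
# Assumption: fields contain no quoted commas (naive split on ",").
csv_summary() {
  file="$1"
  rows=$(wc -l < "$file" | tr -d ' ')
  fields=$(head -n 1 "$file" | awk -F',' '{ print NF }')
  echo "$file: $rows rows, $fields fields"
}

# Summarize every CSV file in output/, if any exist.
for f in output/*.csv; do
  [ -e "$f" ] || continue   # the glob matched nothing
  csv_summary "$f"
done
```

Run it from the PDGFEnvironment directory after the data generation has finished.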
Interested in learning more? Please have a look at the detailed explanation as well as some guidance on modifying the demo dataset.