My inventions: Data Set Generator

About

Here's a synthetic multi-dimensional data set generator I've made. If you have any information or questions, please contact me.

Quite a few times I have run into a situation where I needed several different sets of test data to get an idea of how a certain algorithm performs under different conditions.

I think this is quite a common problem, because it is difficult to find and import natural data sets. It is even harder to find a good spectrum of data sets with different qualities for comparative analysis.

It is also very important that any synthetic data set have the same kinds of attributes that natural sets do.

My friend Ciphergoth was working on a program to do searches within metric spaces, and it seemed like he needed a good tool to see how his "new" algorithm fared against the other well-known ones.

The thought that it could be useful to someone else finally motivated me to write it, even though I had been kicking the idea around in my head for some time. This program is the result.

Background

The program is written in C++ and is designed to be integrated into the code of your project. It is not designed to run as a standalone program (though the example application runs just fine). Please make sure to read the licensing conditions. This software is licensed under the GPL.

A few things to note:

Download

datagen.zip (1,454 Kb).