This resource uses open-source code maintained on GitHub (see the quick-start-guide section) and is available for download from NGC.
Synthetic data generation has become pervasive with the exploding volume of data and the demand to deploy machine learning models that leverage such data. Interest in applying graph-based neural network models to graph datasets is growing, though many public datasets are much smaller in scale than those used in real-world applications. Synthetic graph generation is a common problem across domains and applications, including generating large graphs with properties similar to an original graph and anonymizing data that cannot be shared. The Synthetic Graph Generation tool enables users to generate arbitrary graphs based on provided real data.
The tool has the following architecture.
The module is composed of three parts: a structural generator, which fits the graph structure; a feature generator, which fits the feature distribution contained in the graph; and an aligner, which aligns the generated features with the generated graph structure.
The graph structural generator fits the graph structure and generates a corresponding graph containing the nodes and edges.
The feature generator fits the feature distribution contained in the graph and generates the corresponding features. Users can choose to generate features associated with nodes, edges, or both.
The aligner aligns the features produced by the feature generator with the graph structure produced by the graph structural generator.
By default, the synthetic graph generation tool generates a random graph with random features, according to the specification provided by the user.
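The three-part flow described above can be sketched in a few lines of plain Python. This is only an illustration of how a structural generator, a feature generator, and an aligner fit together when all three are random; the names `SyntheticGraph`, `generate_structure`, `generate_features`, `align`, and `generate_graph` are hypothetical and are not the tool's API.

```python
import random
from dataclasses import dataclass


@dataclass
class SyntheticGraph:
    edges: list          # list of (src, dst) pairs
    node_features: dict  # node id -> feature vector (list of floats)


def generate_structure(num_nodes, num_edges, rng):
    """Structural generator stand-in: sample random directed edges.

    Edges are drawn with replacement, so duplicates and self-loops can occur.
    """
    return [(rng.randrange(num_nodes), rng.randrange(num_nodes))
            for _ in range(num_edges)]


def generate_features(num_rows, num_features, rng):
    """Feature generator stand-in: sample uniform random feature rows."""
    return [[rng.random() for _ in range(num_features)]
            for _ in range(num_rows)]


def align(num_nodes, feature_rows, rng):
    """Aligner stand-in: assign one feature row to each node in random order."""
    order = list(range(len(feature_rows)))
    rng.shuffle(order)
    return {node: feature_rows[order[node]] for node in range(num_nodes)}


def generate_graph(num_nodes, num_edges, num_features, seed=0):
    """Run the three stages in sequence and bundle the result."""
    rng = random.Random(seed)
    edges = generate_structure(num_nodes, num_edges, rng)
    rows = generate_features(num_nodes, num_features, rng)
    return SyntheticGraph(edges, align(num_nodes, rows, rng))
```

Calling `generate_graph(10, 25, 3)` returns a graph with 25 edges over 10 nodes and a 3-value feature vector per node. Replacing any one stage with a fitted model leaves the other two unchanged, which is the point of keeping the stages separate.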
This tool supports the following features:
Feature | Synthetic Graph Generation |
---|---|
Non-partite graph generation | Yes |
Bipartite graph generation | Yes |
N-partite graph generation | No |
Undirected graph generation | Yes |
Directed graph generation | Yes |
Self-loops generation | Yes |
Edge features generation | Yes |
Node features generation | Yes |
Non-partite graph generation is a task to generate a graph that doesn't contain any explicit partites (disjoint and independent sets of nodes).
Bipartite graph generation is a task to generate a graph that consists of two partites.
N-partite graph generation is a task to generate a graph that consists of an arbitrary number of partites.
Undirected graph generation is a task to generate a graph made up of a set of vertices connected by unordered edges.
Directed graph generation is a task to generate a graph made up of a set of vertices connected by directed edges.
Self-loops generation is a task to generate edges that connect a vertex to itself.
Edge features generation is a task to generate features associated with an edge.
Node features generation is a task to generate features associated with a node.
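To make the definitions above concrete, the following sketch checks these properties on a graph stored as a list of `(src, dst)` pairs. The edge-list representation and the helper names `has_self_loops`, `to_undirected`, and `is_bipartite` are assumptions made for illustration, not part of the tool.

```python
def has_self_loops(edges):
    """True if any edge connects a vertex to itself."""
    return any(src == dst for src, dst in edges)


def to_undirected(edges):
    """Drop edge direction: (u, v) and (v, u) become the same unordered pair."""
    return {(min(src, dst), max(src, dst)) for src, dst in edges}


def is_bipartite(edges, left, right):
    """True if `left` and `right` are disjoint and every edge crosses between them."""
    left, right = set(left), set(right)
    if left & right:
        return False
    return all(
        (src in left and dst in right) or (src in right and dst in left)
        for src, dst in edges
    )
```

For example, the directed edges `[(0, 1), (1, 0)]` collapse to the single undirected edge `(0, 1)`, and a graph with an edge inside one of the two node sets fails the bipartite check.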
Structural graph generation
- RMAT
- Random (Erdős-Rényi)
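As a rough illustration of the two structural models listed above, the sketch below samples edges from an R-MAT model and from a directed Erdős-Rényi G(n, p) model. It is a minimal, unfitted version under stated assumptions: the function names and parameters are hypothetical, and the default quadrant probabilities are illustrative values rather than anything the tool prescribes.

```python
import random


def rmat_edges(scale, num_edges, a=0.57, b=0.19, c=0.19, seed=0):
    """Sample directed edges over 2**scale nodes with the R-MAT recursion.

    At each of `scale` levels the adjacency matrix is split into four
    quadrants chosen with probabilities a, b, c and d = 1 - a - b - c.
    Duplicate edges and self-loops are possible.
    """
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for _level in range(scale):
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:
                pass            # top-left quadrant: neither bit set
            elif r < a + b:
                dst |= 1        # top-right quadrant
            elif r < a + b + c:
                src |= 1        # bottom-left quadrant
            else:
                src |= 1        # bottom-right quadrant
                dst |= 1
        edges.append((src, dst))
    return edges


def erdos_renyi_edges(num_nodes, p, seed=0):
    """Directed G(n, p): keep each ordered pair (u, v), u != v, with probability p."""
    rng = random.Random(seed)
    return [(u, v)
            for u in range(num_nodes)
            for v in range(num_nodes)
            if u != v and rng.random() < p]
```

With skewed quadrant probabilities, R-MAT concentrates edges on a few low-numbered nodes, which gives the heavy-tailed degree distributions seen in many real graphs, while the Erdős-Rényi model treats every pair of nodes identically.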
Tabular features
- CTGAN (Conditional GAN)
- CTAB
- KDE
- Gaussian
- Random (Uniform)
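The fit-then-sample pattern shared by the tabular feature generators can be sketched with the simplest case, a Gaussian model. The class below is an assumption-laden illustration: `GaussianFeatureGenerator` is a hypothetical name, it handles numeric columns only, and it models each column independently, so correlations between columns are ignored.

```python
import random
import statistics


class GaussianFeatureGenerator:
    """Fit an independent normal distribution per numeric column, then sample."""

    def fit(self, rows):
        # Transpose rows into columns and record (mean, standard deviation).
        # statistics.stdev needs at least two rows.
        columns = list(zip(*rows))
        self.params = [(statistics.mean(col), statistics.stdev(col))
                       for col in columns]
        return self

    def sample(self, num_rows, seed=0):
        # Draw each feature value from its column's fitted normal distribution.
        rng = random.Random(seed)
        return [[rng.gauss(mu, sigma) for mu, sigma in self.params]
                for _ in range(num_rows)]
```

A typical use would be `GaussianFeatureGenerator().fit(real_rows).sample(1000)`, where `real_rows` is a list of equal-length numeric feature rows taken from the original graph.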
Aligner
- XGBoost
- Random
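An aligner decides which generated feature row goes to which generated node. The sketch below is a deliberately simplified stand-in, not the XGBoost aligner: it assumes, for illustration only, that one feature column should follow node degree, and it pairs the rows with the largest values in that column with the highest-degree nodes. The function name `degree_rank_align` and the `key_column` parameter are hypothetical.

```python
from collections import Counter


def degree_rank_align(edges, num_nodes, feature_rows, key_column=0):
    """Assign rows with the largest `key_column` value to the highest-degree nodes."""
    # Count each edge endpoint; a self-loop contributes 2 to its node's degree.
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # Rank nodes by degree and rows by the key column, both descending.
    nodes = sorted(range(num_nodes), key=lambda n: degree[n], reverse=True)
    rows = sorted(feature_rows, key=lambda r: r[key_column], reverse=True)
    # Pair the two rankings position by position.
    return dict(zip(nodes, rows))
```

A learned aligner would replace the fixed "degree versus one column" rule with a model trained on the relationship between structure and features in the original graph; the random aligner corresponds to shuffling the rows before pairing them with nodes.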