Welcome to GraphGen’s documentation!

GraphGen is a framework for synthetic data generation guided by knowledge graphs. Here is our paper.

It begins by constructing a fine-grained knowledge graph from the source text, then identifies knowledge gaps in LLMs using the expected calibration error metric, prioritizing the generation of QA pairs that target high-value, long-tail knowledge.

Furthermore, GraphGen incorporates multi-hop neighborhood sampling to capture complex relational information and employs style-controlled generation to diversify the resulting QA data.

Note

This project is under active development.

Contents