Introduction

Building machines with commonsense that can compose realistically plausible sentences is challenging. CommonGen is a constrained text generation task, with an associated benchmark dataset, that explicitly tests machines for generative commonsense reasoning. Given a set of common concepts, the task is to generate a coherent sentence describing an everyday scenario using these concepts.

CommonGen is challenging because it inherently requires 1) relational reasoning with background commonsense knowledge, and 2) compositional generalization to unseen concept combinations. Our dataset, constructed through a combination of crowdsourcing via Amazon Mechanical Turk (AMT) and existing caption corpora, consists of 30k concept-sets and 50k sentences in total.
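To make the task format concrete, here is a minimal sketch of one concept-set paired with a reference sentence, along with a crude concept-coverage check. The field names and the coverage function are illustrative assumptions, not the official data schema or evaluation metric:

```python
# Illustrative CommonGen-style example: a concept-set paired with a
# human-written reference sentence. Field names are hypothetical.
example = {
    "concepts": ["dog", "frisbee", "catch", "throw"],
    "reference": "A dog leaps to catch a frisbee that its owner throws.",
}

def concept_coverage(concepts, sentence):
    """Fraction of concepts whose exact surface form appears in the sentence.
    A crude check: real evaluation would lemmatize inflected forms
    (e.g. "throws" would then count as covering "throw")."""
    tokens = set(sentence.lower().replace(".", "").split())
    covered = [c for c in concepts if c in tokens]
    return len(covered) / len(concepts)

print(concept_coverage(example["concepts"], example["reference"]))  # 0.75
```

Note that the exact-match check misses "throw" (the sentence uses "throws"), which is precisely why coverage metrics for this task need morphological normalization.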

Links:   [Paper]   [Leaderboard]   [Data]   [Huggingface Viewer]   [Github]   [INK Lab]  

USC/ISI

Key Challenges



Why is this problem hard?
First, it requires a rich body of relational commonsense knowledge about the given concepts, and this knowledge is latent and compositional. For this real example from our dataset, we need to know a list of facts and find the best composition of them to write the sentence “A woman in a gym exercises by waving ropes tied to a wall.”

Second, the task demands compositional generalization, so that models work on unseen combinations of concepts. For example, at training time a model may see examples involving apple, bag, put, tree, pick, basket, and wash. At test time, however, it must handle a more challenging example: the model has never seen the concept pear during training, nor the pairwise combinations of the test concepts.
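The unseen-combination property described above can be checked mechanically. The following sketch uses toy concept-sets standing in for the real train/test splits; it counts how many pairwise concept combinations in the test split never co-occur in any training concept-set:

```python
from itertools import combinations

# Toy concept-sets standing in for the real train/test splits.
train_sets = [
    {"apple", "bag", "put"},
    {"apple", "tree", "pick"},
    {"apple", "basket", "wash"},
]
test_sets = [
    {"pear", "basket", "pick", "put"},
]

def seen_pairs(concept_sets):
    """All unordered concept pairs that co-occur in some concept-set."""
    pairs = set()
    for cs in concept_sets:
        pairs.update(frozenset(p) for p in combinations(sorted(cs), 2))
    return pairs

train_pairs = seen_pairs(train_sets)
test_pairs = seen_pairs(test_sets)
novel = [p for p in test_pairs if p not in train_pairs]
print(f"{len(novel)} of {len(test_pairs)} test pairs are unseen in training")
```

In this toy split every test pair is novel: pear never appears in training, and even basket/pick/put, which do appear, never co-occur with each other there.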

Misc.

Citation

@article{lin20comgen,
    author = {Bill Yuchen Lin and Wangchunshu Zhou and Ming Shen and Pei Zhou and Chandra Bhagavatula and Yejin Choi and Xiang Ren},
    title = {CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning},
    journal = {Findings of EMNLP},
    year = {2020},
}