Building machines with the commonsense needed to compose realistically plausible sentences is challenging.
CommonGen is a constrained text generation task, associated with a benchmark dataset, that explicitly tests machines for generative commonsense reasoning ability.
Given a set of common concepts, the task is to generate a coherent sentence describing an everyday scenario using these concepts.
CommonGen is challenging because it inherently requires 1) relational reasoning using background commonsense knowledge, and 2) compositional generalization ability to work on unseen concept combinations. Our dataset, constructed through a combination of crowd-sourcing from AMT and existing caption corpora, consists of 30k concept-sets and 50k sentences in total.
We use the AMT platform to collect such sentences for the top-ranked 2,500 concept-sets sampled from large visual caption corpora. Each of them is assigned to at least three different workers. Furthermore, we use the remaining concept-sets as the training examples, for which we use the associated captions as the target outputs. Note that we explicitly control the overlap between the training and dev/test examples by filtering out training concept-sets that have more than two overlapping concepts with any example in the dev/test set. The dev and test sets contain on average four sentences per example, which provides a more diverse test bed for further automatic and manual evaluation. We highlight the ratio of novel concept compositions (i.e., single concepts, concept pairs, and concept triples) in dev/test that never (co-)occur in training examples; this makes CommonGen challenging in terms of compositional generalization ability.
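The overlap filtering and the novel-composition statistics can be summarized with a minimal sketch, assuming concept-sets are represented as Python sets of concept strings; the function names, threshold variable, and toy data below are illustrative rather than taken from our released code:

```python
from itertools import combinations

def filter_training(train_sets, eval_sets, max_overlap=2):
    # Keep a training concept-set only if it shares at most `max_overlap`
    # concepts with every dev/test concept-set.
    return [cs for cs in train_sets
            if all(len(cs & ev) <= max_overlap for ev in eval_sets)]

def novelty_ratios(train_sets, eval_sets, sizes=(1, 2, 3)):
    # Fraction of concept singletons/pairs/triples in dev/test that
    # never (co-)occur within any training concept-set.
    ratios = {}
    for k in sizes:
        seen = {frozenset(c) for cs in train_sets
                for c in combinations(sorted(cs), k)}
        held = {frozenset(c) for cs in eval_sets
                for c in combinations(sorted(cs), k)}
        ratios[k] = len(held - seen) / len(held)
    return ratios

# Toy example:
train = [frozenset({"dog", "run", "park"}),
         frozenset({"cook", "kitchen", "meal"})]
dev = [frozenset({"dog", "frisbee", "catch", "throw"})]
print(filter_training(train, dev))  # both kept: overlap with dev is at most 1
print(novelty_ratios(train, dev))   # {1: 0.75, 2: 1.0, 3: 1.0}
```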