CommonGen Leaderboard (v1.1)

| Rank | Date | Model | Team | Links | BLEU-4 | CIDEr | SPICE |
|------|------|-------|------|-------|--------|-------|-------|
| 1 | Jun 09, 2021 | KFCNet | MSRA and Microsoft Ads | Email, Paper (EMNLP'21) | 43.619 | 18.845 | 33.911 |
| 2 | May 18, 2021 | KGR^4 | Anonymous (under review) | Email, Document (placeholder) | 42.818 | 18.423 | 33.564 |
| 3 | Mar 23, 2021 | KFC (v1) | MSRA and Microsoft Ads | Email, Paper (EMNLP'21) | 42.453 | 18.376 | 33.277 |
| 4 | Apr 25, 2021 | R^3-BART | Anonymous (under review) | Email, Document (placeholder) | 41.954 | 17.706 | 32.961 |
| 5 | Jul 1, 2021 | WittGEN + T5-large | Anonymous (under review) | | 38.233 | 18.036 | 31.682 |
| 6 | Jan 13, 2021 | RE-T5 (Retrieval-Enhanced T5) | Microsoft Cognitive Services Research Group | Email, Paper (ACL'21) | 40.863 | 17.663 | 31.079 |
| 7 | Aug 1, 2021 | VisCTG (BART-large) | CMU-LTI | Email, Paper (arXiv) | 36.939 | 17.199 | 29.973 |
| 8 | Aug 10, 2021 | SAPPHIRE (T5-large) | CMU-LTI | Email, Paper (INLG) | 37.119 | 16.901 | 29.751 |
| 9 | Aug 26, 2020 | KG-BART | University of Illinois at Chicago | Email, Paper | 33.867 | 16.927 | 29.634 |
| 10 | Oct 12, 2020 | EKI-BART | MSRA and Fudan University | Email, Paper (COLING 2020) | 35.945 | 16.999 | 29.583 |
| 11 | Jun 1, 2020 | T5-Large | Fine-tuned by USC-INK | T5 Paper | 31.962 | 15.128 | 28.855 |
| 12 | Jun 1, 2020 | BART | Fine-tuned by USC-INK | BART Paper | 31.827 | 13.976 | 27.995 |
| 13 | Jun 1, 2020 | UniLM | Fine-tuned by USC-INK | UniLM Paper | 30.616 | 14.889 | 27.429 |
| 14 | Jun 1, 2020 | BERT-Gen | Fine-tuned by USC-INK | Code | 23.468 | 12.606 | 24.822 |
| 15 | Jun 1, 2020 | GPT-2 | Fine-tuned by USC-INK | GPT-2 Paper | 26.833 | 12.187 | 23.567 |
| 16 | Jun 1, 2020 | T5-Base | Fine-tuned by USC-INK | T5 Paper | 18.546 | 9.399 | 19.871 |

Submit to this leaderboard: You can submit your predictions by emailing yuchen.lin@usc.edu with the subject "CommonGen submission (your model name)" and an attachment in the same format as this example prediction file.

We rank all methods by SPICE because it correlates best with our human evaluation (please see our paper for details).
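For reference, the sketch below shows one way such scores can be computed against the multiple human references per test example, assuming the standard coco-caption metric implementations (the pycocoevalcap package). The example sentences and dictionary layout are illustrative only; this is not the official evaluation script.

# Minimal scoring sketch (assumes: pip install pycocoevalcap; SPICE needs Java).
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

# Both inputs map an example id to a list of sentences:
# gts holds the human references, res holds exactly one model prediction.
gts = {
    0: ["a dog leaps to catch a thrown frisbee",
        "the dog catches the frisbee when the boy throws it"],
}
res = {
    0: ["a dog jumps to catch a frisbee"],
}

bleu, _ = Bleu(4).compute_score(gts, res)   # bleu[3] is corpus-level BLEU-4
cider, _ = Cider().compute_score(gts, res)
spice, _ = Spice().compute_score(gts, res)

print(f"BLEU-4: {bleu[3]:.3f}  CIDEr: {cider:.3f}  SPICE: {spice:.3f}")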

The above results are based on our latest human references (v1.1); the previous results on v1.0 can be found here.

The difference between v1.1 and v1.0 lies in the human references for the test examples: we added one more human reference for each test example (previously 4, now 5). Please find the details in Tables 1 and 3, Figure 4, and Sections 3.3 and 3.4 of the paper. Note that the train/dev data and the test inputs are unchanged.

Misc.

Citation

@inproceedings{lin-etal-2020-commongen,
    title = "{C}ommon{G}en: A Constrained Text Generation Challenge for Generative Commonsense Reasoning",
    author = "Lin, Bill Yuchen  and
      Zhou, Wangchunshu  and
      Shen, Ming  and
      Zhou, Pei  and
      Bhagavatula, Chandra  and
      Choi, Yejin  and
      Ren, Xiang",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.findings-emnlp.165",
    pages = "1823--1840", 
}