Introduction

RICA is a logically grounded inference challenge that tests whether models can make robust commonsense inferences despite textual perturbations.

RICA consists of sets of natural language statements in a "premise-conclusion" format that require reasoning over latent (implicit) commonsense relationships. We generate 257k commonsense statements capturing 43k axioms that cover different types of commonsense knowledge, such as physical, material, and social properties.

USC/ISI

Examples

Physical (30%)

A is smaller in size than B, so A is likely [MASK] to put into a box than B.

Material (30%)

A is made out of glass and B is made out of stone, so A is [MASK] transparent than B.

Social (30%)

A makes the varsity team while B does not, so A is [MASK] skilled than B.

Temporal (10%)

A was eating dinner now, so A was probably hungry [MASK] eating dinner.

Physical-Perturbed

B is smaller in size than A, so A is likely [MASK] to put into a box than B.

Material-Perturbed

A is made out of glass and B is made out of stone, so A is not [MASK] transparent than B.

Social-Perturbed

A makes the varsity team while B does not, so B is [MASK] inexperienced than A.

Temporal-Perturbed

A was eating dinner now, so A was probably not hungry [MASK] eating dinner.
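
The pairing between an example and its perturbed counterpart can be sketched in code. This is a minimal illustration, not the authors' implementation: the `Probe` class, the `flip` and `pair_is_robust` helpers, and the candidate fillers "more" and "less" are all assumptions introduced here, since this page does not list the answer choices. It shows how a negation perturbation keeps the premise unchanged but swaps the filler that makes the statement true.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

MASK = "[MASK]"


@dataclass(frozen=True)
class Probe:
    """One premise-conclusion statement with a single masked slot (hypothetical type)."""
    statement: str
    choices: Tuple[str, str]  # assumed candidate fillers, not taken from the page
    answer: str               # filler assumed to make the statement true

    def fill(self, token: str) -> str:
        """Return the statement with the mask replaced by `token`."""
        return self.statement.replace(MASK, token, 1)


def flip(probe: Probe, perturbed_statement: str) -> Probe:
    """Build a perturbed probe whose correct filler is the other choice."""
    first, second = probe.choices
    other = second if probe.answer == first else first
    return Probe(perturbed_statement, probe.choices, other)


def pair_is_robust(predict: Callable[[Probe], str],
                   original: Probe, perturbed: Probe) -> bool:
    """True only if the predictor answers both members of the pair correctly."""
    return (predict(original) == original.answer
            and predict(perturbed) == perturbed.answer)


# The Material example and its perturbed form from the list above.
original = Probe(
    "A is made out of glass and B is made out of stone, "
    "so A is [MASK] transparent than B.",
    choices=("more", "less"),
    answer="more",
)
perturbed = flip(
    original,
    "A is made out of glass and B is made out of stone, "
    "so A is not [MASK] transparent than B.",
)


def always_more(probe: Probe) -> str:
    """A toy predictor that ignores the text entirely."""
    return "more"


print(perturbed.fill(perturbed.answer))
print(pair_is_robust(always_more, original, perturbed))  # False
```

A predictor that ignores the negation gets the original statement right and the perturbed one wrong, so it fails the pair as a whole.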

Leaderboard

Submit to this leaderboard: email your predictions to peiz@usc.edu with the subject "RICA submission (your model name)", using the same format as the example prediction file.

RICA-Zero Shot

Rank  Model                             Average Accuracy
-     Human Performance                 91.7
1     RoBERTa-Large (Liu et al. 2019)   50.3
2     ERNIE (Zhang et al. 2019)         50.2
3     BART (Lewis et al. 2019)          50.2
4     GPT-2 (Radford et al. 2019)       50.1
5     BERT-Large (Devlin et al. 2018)   49.4

RICA-Finetuned

Rank  Model                             Average Accuracy
-     Human Performance                 91.7
1     RoBERTa-Large (Liu et al. 2019)   52.3
2     BART (Lewis et al. 2019)          50.2
3     ERNIE (Zhang et al. 2019)         50.1
4     GPT-2 (Radford et al. 2019)       50.1
5     BERT-Large (Devlin et al. 2018)   49.9
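
To put scores near 50 in context, the sketch below computes accuracy on a toy set of probes. It rests on assumptions not stated on this page: that each probe is a choice between two candidate fillers, that the correct answers are balanced between them, and that accuracy is the percentage of probes answered correctly. The `accuracy` function and the toy labels are hypothetical and are not the official scoring script.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of probes whose predicted filler matches the gold filler."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of probes")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Toy gold labels (made up for illustration), balanced between the two fillers.
gold = ["more", "less", "more", "less"]

# A predictor that always outputs the same filler, regardless of the statement.
always_more = ["more"] * len(gold)

print(accuracy(always_more, gold))  # 50.0
```

Under these assumptions, a predictor that always picks the same filler already scores 50, which is the level the model scores in the tables sit close to.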

Citation

@inproceedings{zhou2021rica,
	title={RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms},
	author={Zhou, Pei and Khanna, Rahul and Lee, Seyeon and Lin, Bill Yuchen and Ho, Daniel and Pujara, Jay and Ren, Xiang},
	booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
	pages={7560--7579},
	year={2021}
}