NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models

Everyday Objects (35.2%)

A bicycle has [MASK] tires.

Biology (13.5%)

Most ants have [MASK] legs.

Geometry (11.7%)

A cube has [MASK] faces.

Unit Converting (6.3%)

A week is [MASK] days.

Math (7.3%)

I will be [MASK] next year, as I am nine now.

Physics (5.7%)

Water will freeze at [MASK] degrees centigrade.

Geography (2.9%)

The world contains [MASK] continents.

Others (17.5%)

There are [MASK] princes in the United States.
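Each probe above is a sentence with a single [MASK] slot to be filled by a number word. As a rough illustration of the task format, the sketch below expands one probe into its candidate sentences. The candidate list ("no" plus "zero" through "ten") and the helper name `expand_probe` are assumptions made for this example and are not taken from this page.

```python
from typing import Dict, Sequence

# Assumed candidate vocabulary for the [MASK] slot (not stated on this page).
NUMBER_WORDS = [
    "no", "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine", "ten",
]


def expand_probe(
    probe: str,
    candidates: Sequence[str] = NUMBER_WORDS,
    mask_token: str = "[MASK]",
) -> Dict[str, str]:
    """Map each candidate word to the probe with its single mask filled in."""
    if probe.count(mask_token) != 1:
        raise ValueError("probe must contain exactly one mask token")
    return {word: probe.replace(mask_token, word) for word in candidates}


filled = expand_probe("A bicycle has [MASK] tires.")
print(filled["two"])
```

A model under test would then score or rank these candidate sentences; how that scoring is done depends on the model and is left out here.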

To submit your predictions and view the latest submissions, please visit eval.ai.

NumerSense-All (Core + Adversarial)

Rank	Model	Hit@1	Hit@2	Hit@3
	Human Performance	88.3 (closed-book) 93.7 (open-book)	N/A	N/A
1	T5-11B + GKP University of Washington - 2021-9	72.47	85.57	91.58
2	T5 1.1 Zero-Shot +digits ISI Waltham - 2021-04	66.18	82.80	89.64
3	T5-11B + IR MOWGLI/USC INK - Jun Yan - 2021-01-10	65.10	81.56	88.33
4	T5-11B Stanford - Yuhui Zhang - 2021-01-08	64.08	79.66	87.29
5	T5-11B (Closed-book QA) Team Cosmic - Yizhong Wang - 2021-01-11	56.91	72.01	80.51
6	RoBERTa + UnifiedQA (T5-3B) MICS ISI - Dong-Ho Lee - 2021-01-20	56.33	73.30	82.33
7	RoBERTa-Large (Fine-tuned)	47.58	66.34	76.74
8	BERT-Large (Fine-tuned)	43.68	66.41	72.87
9	RoBERTa-Large (Zero-shot)	35.89	58.07	74.09
10	BERT-Large (Zero-shot)	27.15	52.92	70.25
11	RoBERTa-base (Zero-shot)	26.80	50.57	66.72
12	BERT-base (Zero-shot)	25.30	48.70	64.84
13	GPT-2 (Zero-shot)	24.76	44.28	62.40

NumerSense-Core

Rank	Model	Hit@1	Hit@2	Hit@3
	Human Performance	89.7 (closed-book) 96.3 (open-book)	N/A	N/A
1	T5-11B + GKP University of Washington - 2021-9	79.24	89.93	94.17
2	T5 1.1 Zero-Shot +digits ISI Waltham - 2021-04	72.61	87.10	92.23
3	T5-11B + IR MOWGLI/USC INK - Jun Yan - 2021-01-10	70.41	84.81	90.99
4	T5-11B Stanford - Yuhui Zhang - 2021-01-08	70.23	83.57	90.11
5	T5-11B (Closed-book QA) Team Cosmic - Yizhong Wang - 2021-01-11	62.51	75.77	82.40
6	RoBERTa + UnifiedQA (T5-3B) MICS ISI - Dong-Ho Lee - 2021-01-20	60.87	76.33	84.54
7	RoBERTa-Large (Fine-tuned)	54.22	69.53	78.97
8	BERT-Large (Fine-tuned)	50.19	66.23	74.72
9	RoBERTa-Large (Zero-shot)	46.11	66.08	79.42
10	BERT-Large (Zero-shot)	37.54	62.10	76.86
11	RoBERTa-base (Zero-shot)	33.39	58.83	71.91
12	BERT-base (Zero-shot)	31.98	56.01	70.67
13	GPT-2 (Zero-shot)	30.04	51.06	67.58
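
The leaderboards report Hit@1, Hit@2, and Hit@3 without defining them. Assuming the usual meaning (the percentage of probes whose gold answer appears among a model's top-k ranked candidates), a minimal sketch of the computation could look like this. The function name `hit_at_k` and the toy rankings are invented for illustration; they are not real model outputs or the official evaluation script.

```python
from typing import Sequence


def hit_at_k(
    ranked_predictions: Sequence[Sequence[str]],
    gold_answers: Sequence[str],
    k: int,
) -> float:
    """Percentage of probes whose gold answer is within the top-k candidates."""
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("predictions and gold answers must align one-to-one")
    if not gold_answers:
        raise ValueError("at least one probe is required")
    hits = sum(
        gold in ranked[:k]
        for ranked, gold in zip(ranked_predictions, gold_answers)
    )
    return 100.0 * hits / len(gold_answers)


# Toy rankings (best candidate first) for four probes; values are made up.
predictions = [
    ["two", "four", "three"],  # A bicycle has [MASK] tires.
    ["four", "six", "eight"],  # Most ants have [MASK] legs.
    ["four", "eight", "six"],  # A cube has [MASK] faces.
    ["five", "ten", "two"],    # A week is [MASK] days.
]
gold = ["two", "six", "six", "seven"]

for k in (1, 2, 3):
    print(f"Hit@{k}: {hit_at_k(predictions, gold, k):.2f}")
```

On this toy data the gold answer is ranked first once, within the top two twice, and within the top three three times, so Hit@k can only stay the same or grow as k increases, as in the tables above.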


@inproceedings{lin2020numersense,
    title={Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models},
    author={Bill Yuchen Lin and Seyeon Lee and Rahul Khanna and Xiang Ren}, 
    booktitle={Proceedings of EMNLP},
    year={2020},
    note={to appear}
}
