More details on matching
Continuing the example on the homepage, we learn that X should be a noun phrase and Y should be a verb by observing that X is “funeral” and Y is “held”.
We also know that the answer should be a date.
The rules in the explanation are parsed into the following logical forms:
(1) @In(@And(@Is(X, @LessThan(@Left("when was"), 4)), @Is(Y, @Direct(@Right(X)))), Question)
(2) @Is("on", @Direct(@Left(Answer)))
(3) @Is(Y, @LessThan(@Left(Answer), 4))
(4) @Is(X, @LessThan(@Left(Y), 3))
(5) @And(@StartsWith(Question, "when"), @Is(Answer, @NER(DATE)))
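Concretely, each rule can be thought of as a small AST whose nodes are the @-operators. Below is a minimal sketch of that representation in Python; the nested-tuple encoding is our own illustration of what the parser might emit, not its actual data structure.

```python
# Rules (1)-(5) as nested ("Operator", arg, ...) tuples. An evaluator
# would dispatch on the head symbol of each tuple; this encoding is
# illustrative only, not the parser's real output format.
RULES = [
    ("In", ("And", ("Is", "X", ("LessThan", ("Left", "when was"), 4)),
                   ("Is", "Y", ("Direct", ("Right", "X")))), "Question"),  # (1)
    ("Is", "on", ("Direct", ("Left", "Answer"))),                          # (2)
    ("Is", "Y", ("LessThan", ("Left", "Answer"), 4)),                      # (3)
    ("Is", "X", ("LessThan", ("Left", "Y"), 3)),                          # (4)
    ("And", ("StartsWith", "Question", "when"),
            ("Is", "Answer", ("NER", "DATE"))),                           # (5)
]
```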
Case 1. Given a new question “When was independence declared?”, we first look for noun phrases and verbs. In this simple case, there is only one noun phrase (independence) and one verb (declared). Then we look at the context “Independence was declared on 24 September 1973.” There is only one date (24 September 1973). Now we have the variable assignment (X=independence, Y=declared, ANS=24 September 1973). We evaluate the five logical forms, and fortunately every one of them outputs True, so the answer “24 September 1973” is correct and will be used as a pseudo-label.
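To make this hard-matching step concrete, here is a self-contained sketch of Case 1 in Python. The token spans are hand-filled stand-ins for what an NP chunker, POS tagger, and NER tagger would produce, and the exact distance semantics we give @LessThan/@Left are our assumptions.

```python
question = "when was independence declared ?".split()
context  = "independence was declared on 24 september 1973 .".split()

# (start, end) token spans, hard-coded here; the real pipeline would
# obtain them from an NP chunker, a POS tagger, and a NER tagger.
X,  Y  = (2, 3), (3, 4)   # "independence", "declared" in the question
Xc, Yc = (0, 1), (2, 3)   # the same phrases located in the context
ANS    = (4, 7)           # "24 september 1973", tagged as a DATE entity

rules_hold = [
    0 <= X[0] - 2 < 4 and Y[0] == X[1],  # (1) X < 4 tokens after "when was"; Y directly after X
    context[ANS[0] - 1] == "on",         # (2) "on" directly left of the answer
    0 <= ANS[0] - Yc[1] < 4,             # (3) Y < 4 tokens left of the answer
    0 <= Yc[0] - Xc[1] < 3,              # (4) X < 3 tokens left of Y
    question[0] == "when",               # (5) question starts with "when"; the
                                         #     @NER(DATE) half is folded into ANS
]

if all(rules_hold):
    print(" ".join(context[ANS[0]:ANS[1]]))  # -> "24 september 1973" (pseudo-label)
```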
Case 2. Given a new question “When was independence declared in Republic of Guinea-Bissau?” and the same context, we have two noun phrases in the question, “independence” and “Republic of Guinea-Bissau”. Though it is obvious to a human that “independence” is the correct choice, the algorithm has to enumerate both choices; that is, we evaluate both (X=independence, Y=declared, ANS=24 September 1973) and (X=Republic of Guinea-Bissau, Y=declared, ANS=24 September 1973). The latter fails on rule (1), because Y is not directly after X. In this case, we still get the correct answer.
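The enumeration can be sketched in the same style; the candidate spans below are again hand-filled, and rule (1) is the only check shown because it is the one that discriminates between the two assignments.

```python
question = "when was independence declared in republic of guinea-bissau ?".split()

# (text, start, end) candidates, as a chunker/tagger might propose them
noun_phrases = [("independence", 2, 3), ("republic of guinea-bissau", 5, 8)]
verbs        = [("declared", 3, 4)]

def rule_1(x, y):
    # X starts < 4 tokens after "when was" (which ends at token 2),
    # and Y starts directly after X.
    return 0 <= x[1] - 2 < 4 and y[1] == x[2]

survivors = [(x, y) for x in noun_phrases for y in verbs if rule_1(x, y)]
print(survivors)
# Only (X=independence, Y=declared) survives; the Guinea-Bissau assignment
# fails because "declared" is not directly after it.
```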
Case 3. Given a new question “When was independence announced?” and the same context, we first attempt the assignment (X=independence, Y=announced, ANS=24 September 1973) and look for synonyms of Y=announced in the context with a FIND module. Our FIND module makes use of the pre-trained BERT-base model and returns the similar phrase declared in the context. In this case, the matched answer 24 September 1973 receives a lower confidence score, since the constraints are slightly violated. We regard this as a softened match.
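A FIND module along these lines can be sketched with the Hugging Face transformers library: embed the query word and every context token with bert-base-uncased, then return the most similar context token together with its cosine similarity as a soft confidence. The single-wordpiece query and the similarity-as-confidence scoring are our simplifications, not necessarily how the actual module is implemented.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    """Contextual vector for every wordpiece in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    return tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]), hidden

def find(query, context):
    """Return the context token most similar to the (single-wordpiece) query."""
    _, q_vecs = embed(query)
    c_tokens, c_vecs = embed(context)
    sims = torch.nn.functional.cosine_similarity(q_vecs[1], c_vecs, dim=-1)
    sims[0] = sims[-1] = -1.0  # mask out [CLS] and [SEP]
    best = int(sims.argmax())
    return c_tokens[best], float(sims[best])

token, score = find("announced", "Independence was declared on 24 September 1973.")
print(token, score)  # expected: "declared", with the similarity as a soft confidence
```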
We have four modules responsible for handling softened matching: Fill, Find, Compare, and Logic. When a variable has multiple potential assignments or the modules raise multiple candidates, we use beam search to keep the most promising combinations according to their confidence scores.
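The beam search itself can be as simple as the following sketch. The per-variable candidate lists and their confidence values are hypothetical stand-ins for what the Fill/Find/Compare/Logic modules would propose, and multiplying confidences is our assumed scoring rule.

```python
import heapq

def beam_search(steps, beam_size=3):
    """Keep the beam_size highest-scoring partial assignments.

    steps: one candidate list per variable (X, Y, ANS, ...); each
    candidate is a (value, confidence) pair proposed by a module.
    Confidences multiply, treating them as independent.
    """
    beam = [((), 1.0)]
    for candidates in steps:
        expanded = [
            (assignment + (value,), score * conf)
            for assignment, score in beam
            for value, conf in candidates
        ]
        beam = heapq.nlargest(beam_size, expanded, key=lambda item: item[1])
    return beam

# Hypothetical module proposals for Case 3: the FIND module proposes
# "declared" for Y=announced with a sub-1.0 confidence (softened match).
steps = [
    [("independence", 1.0), ("Republic of Guinea-Bissau", 1.0)],  # X
    [("declared", 0.83)],                                         # Y
    [("24 September 1973", 1.0)],                                 # ANS
]
print(beam_search(steps))
```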