Project Description

In LEAN-LIFE, we introduce an open-source, web-based Label Efficient AnnotatioN framework for sequence labeling and classification tasks. Our framework not only enables annotators to provide the labels needed for a task, but also enables LearnIng From Explanations of each labeling decision via an easy-to-use UI. LEAN-LIFE differentiates itself from other annotation frameworks in these ways:

  1. Improved model training: Our annotation recommendation models are trained with an improved process that leverages annotator-provided explanations to weakly label unlabeled instances. Incorporating these weak labels into model training bolsters model performance, reducing future annotation costs.

  2. Multiple supported tasks: We support both sequence labeling (named entity recognition) and sequence classification (relation extraction, sentiment analysis) tasks. Any task can incorporate our improved model training if the annotator so chooses.

  3. Explanation dataset creation: Our framework enables the building of a new type of dataset, one consisting of triples of text, labels, and labeling explanations. Our models have shown improvements on common NLP tasks using this type of dataset, and we hope the community will build upon our work and make use of these triples. We support two forms of explanations:
    1. Natural Language: guided written explanations of the labeling decision
    2. Triggers: groups of words in a sentence that aided the labeling decision
Natural Language Explanations
Our framework guides users to write parsable explanations that are used for weak labeling. In the Relation Extraction example above, the explanation "the phrase 'caused by' occurs between SUBJ and OBJ" helps weakly label the relationship between burst and pressure as "cause-effect" in an unlabeled sentence. Similarly, in the Sentiment Analysis example, because the word "fair" appears just before the word "price", we can weakly label the sentence "Delicious food with a fair price" as positive.
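To make the mechanism concrete, here is a minimal sketch of how a parsed explanation like "the phrase 'caused by' occurs between SUBJ and OBJ" can be compiled into a labeling function that abstains unless the constraint holds. The function names and span representation are illustrative assumptions, not LEAN-LIFE's actual API.

```python
# Hypothetical sketch: compiling a parsed natural-language explanation into a
# weak-labeling rule. Spans are half-open (start, end) token index pairs.

def between(tokens, subj_span, obj_span):
    """Return the tokens strictly between the SUBJ and OBJ spans."""
    start = min(subj_span[1], obj_span[1])
    end = max(subj_span[0], obj_span[0])
    return tokens[start:end]

def make_phrase_between_rule(phrase, label):
    """Compile "the phrase '<phrase>' occurs between SUBJ and OBJ" -> rule."""
    phrase_tokens = phrase.split()

    def rule(tokens, subj_span, obj_span):
        mid = between(tokens, subj_span, obj_span)
        # scan the in-between window for the phrase
        for i in range(len(mid) - len(phrase_tokens) + 1):
            if mid[i:i + len(phrase_tokens)] == phrase_tokens:
                return label
        return None  # abstain: the explanation's condition does not hold

    return rule

rule = make_phrase_between_rule("caused by", "cause-effect")
tokens = "The burst was caused by excessive pressure".split()
# SUBJ = "burst" spans tokens [1, 2); OBJ = "pressure" spans tokens [6, 7)
print(rule(tokens, (1, 2), (6, 7)))  # -> cause-effect
```

Because the rule abstains when the condition fails, many such compiled explanations can be applied to an unlabeled corpus and only fire where they match.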

Trigger Explanations
Our framework also allows annotators to select groups of words that aided their labeling decision; these groupings are called triggers. For example, in the Named Entity Recognition scenario above, because the restaurant entity Rumble Fish is surrounded by the phrases "had lunch at" and "where the food", we can soft match against unlabeled sentences to extract other mentions of restaurants, such as McDonalds.
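The soft-matching idea can be sketched as follows. This is an illustration under simplifying assumptions, not LEAN-LIFE's implementation: a real system would compare contextual embeddings, while here plain token overlap between a saved trigger and a candidate span's surrounding context stands in for semantic similarity.

```python
# Illustrative sketch: soft-matching saved triggers against the context
# around a candidate entity span to propose weak NER labels.

def context_overlap(trigger_tokens, window_tokens):
    """Jaccard overlap between a trigger and a candidate's context window."""
    a, b = set(trigger_tokens), set(window_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

def soft_match(sentence, entity_start, entity_end, triggers, threshold=0.4):
    """Weakly label a span if any trigger's context is similar enough."""
    tokens = sentence.split()
    left = tokens[max(0, entity_start - 3):entity_start]
    right = tokens[entity_end:entity_end + 3]
    context = left + right
    for trigger_tokens, label in triggers:
        if context_overlap(trigger_tokens, context) >= threshold:
            return label
    return None  # abstain

# trigger captured while annotating "Rumble Fish"
triggers = [("had lunch at".split() + "where the food".split(), "RESTAURANT")]
sent = "We had lunch at McDonalds where the food was cheap"
# candidate entity span covers "McDonalds" (token index 4)
print(soft_match(sent, 4, 5, triggers))  # -> RESTAURANT
```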

Overview

LEAN-LIFE consists of two main components:

  • Capturing labels and explanations: A user-friendly web UI that captures labels and explanations for labeling decisions.
  • Weak supervision framework: A framework that parses explanations to create weakly labeled data.

The framework uses weakly labeled data in conjunction with user-provided labels to train models for improved annotation recommendations. Our UI shows annotators unlabeled instances (which can be sampled using active learning) along with annotation recommendations, in an effort to reduce annotation costs.
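One common way to combine trusted labels with weak labels during training is to down-weight the weakly labeled instances in the loss. The sketch below illustrates that general idea only; the weight value and function names are assumptions, not LEAN-LIFE's exact training objective.

```python
# Minimal sketch of a weighted training loss over gold and weak labels.
# Weak labels contribute, but at a reduced weight, so noisy weak labels
# cannot dominate the annotator-provided gold labels.

import math

def nll(prob):
    """Negative log-likelihood of the correct class."""
    return -math.log(prob)

def batch_loss(gold_probs, weak_probs, weak_weight=0.3):
    """Average loss: gold instances at full weight, weak ones down-weighted."""
    total = sum(nll(p) for p in gold_probs)
    total += weak_weight * sum(nll(p) for p in weak_probs)
    n = len(gold_probs) + len(weak_probs)
    return total / n

# model probabilities assigned to the correct class for each instance
loss = batch_loss(gold_probs=[0.9, 0.8], weak_probs=[0.6, 0.7])
print(round(loss, 4))
```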

Figure: Overview of LEAN-LIFE.

UI

As mentioned, our UI supports multiple tasks, but it also supports multiple configurations of the annotation process. The project creator can choose among various forms of active sampling, backend models, and recommendation heuristics, as well as the type of explanation the system should capture from the user. The animation to the right shows how our UI captures both forms of explanations for the Named Entity Recognition task.

1. Users are presented with sentences to annotate
2. Annotation recommendations are also presented to users
3. The user starts the annotation process
4. The user may provide a guided natural language explanation, trigger explanations, or nothing
5. The user is shown their now-saved annotation

Animation

Figure: Capturing Natural Language or Trigger explanations for the NER task

Incorporating Explanations

Natural Language (Paper)

Figure: Leveraging the natural language form of explanation.

Trigger (Paper)

Figure: Leveraging the trigger form of explanation.

Experiments

Figure: (left) Relation Extraction, (right) Named Entity Recognition

When starting with little to no labeled data, it is more effective to ask annotators to provide both a label and an explanation than to request a label alone. To support this claim, we conducted experiments demonstrating the label efficiency of our explanation-leveraging model. We found that labeling one instance and providing an explanation takes roughly twice as long as providing just a label. Given this observation, we compare our improved training process against the traditional label-only training process while holding total annotation time constant between the two trials. Our model is not only more time- and label-efficient than the traditional label-only process, but also outright outperforms it.
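The time-constant comparison above reduces to simple budget arithmetic, sketched below. The per-label time and budget values are made-up illustrations; only the 2x explanation factor comes from the observation reported above.

```python
# Back-of-the-envelope view of the fixed annotation-time-budget comparison:
# label-only annotation yields twice as many instances, while labeling with
# explanations yields half as many instances, each usable for weak labeling.

def instances_under_budget(budget_minutes, minutes_per_label,
                           explanation_factor=2.0):
    """Instance counts for label-only vs. label+explanation annotation."""
    label_only = int(budget_minutes / minutes_per_label)
    with_explanations = int(budget_minutes /
                            (minutes_per_label * explanation_factor))
    return label_only, with_explanations

# hypothetical budget: 60 minutes, 0.5 minutes per plain label
label_only, with_expl = instances_under_budget(60, 0.5)
print(label_only, with_expl)  # -> 120 60
```

The experimental claim is that the 60 explained instances, once expanded through weak labeling, train a better model than the 120 plain labels.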


To cite us

@inproceedings{LEANLIFE2020,
    title={LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation},
    author={Lee, Dong-Ho and Khanna, Rahul and Lin, Bill Yuchen and Chen, Jamin and Lee, Seyeon and Ye, Qinyuan and Boschee, Elizabeth and Neves, Leonardo and Ren, Xiang},
    booktitle={Proc. of ACL (Demo)},
    year={2020},
    url={}
}