We introduce an open-source, web-based Label Efficient AnnotatioN framework for sequence labeling and classification tasks.
Our framework not only enables annotators to provide labels for a task, but also enables LearnIng From Explanations of labeling decisions through an easy-to-use UI.

LEAN-LIFE differentiates itself from other frameworks in these ways:

  1. Improved Model Training: By leveraging annotator-provided explanations to weakly label unlabeled instances, our framework can train models with fewer data points and achieve better performance, thereby reducing future annotation costs via better recommendations.

  2. Multiple supported tasks: We support both sequence labeling (named entity recognition) and sequence classification (relation extraction, sentiment analysis) tasks. All tasks can incorporate our improved model training if the annotator wishes.

  3. Explanation dataset creation: We enable the building of a new type of dataset, one that consists of triples of text, labels, and labeling explanations. We have shown improvements on common NLP tasks using these triples and hope the community will build upon our work by utilizing them (a sketch of such a triple follows this list).
    We support two forms of explanation capture:

    1. Natural Language: guided written explanations that aided the labeling decision.
    2. Triggers: groups of words in a sentence that aided the labeling decision.
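
As an illustration, a single entry in such an explanation dataset might look like the following minimal sketch. The field names and the example sentence are hypothetical, not the framework's exact export schema:

    # Hypothetical (text, label, explanation) triple for relation extraction;
    # field names are illustrative, not LEAN-LIFE's actual export format.
    example = {
        "text": "The explosion was caused by a gas leak.",
        "subj": "explosion",
        "obj": "leak",
        "label": "cause-effect",
        # Natural language explanation of the labeling decision:
        "explanation": "the phrase 'caused by' occurs between SUBJ and OBJ",
    }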

Overview

LEAN-LIFE consists of two main components:

  • Capturing Label and Explanation: A user-friendly web UI that captures labels, as well as explanations for labeling decisions.

  • Weak Supervision Framework: A framework that parses explanations to create weakly labeled data, then uses this weakly labeled data in conjunction with user-provided labels to train models for improved annotation recommendations (a minimal training sketch follows this list).
    Our UI shows annotators unlabeled instances (optionally sampled via active learning) along with annotation recommendations, in an effort to reduce annotation costs.
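
To make the second component concrete, here is a minimal sketch of how weakly labeled instances could be mixed with user-provided labels during training. This is not LEAN-LIFE's actual training code; it assumes a simple weighted loss (weak_weight is a made-up hyperparameter) that downweights explanation-derived labels:

    # Minimal sketch: joint training on gold-labeled and weakly labeled data,
    # with weak (explanation-derived) labels downweighted in the loss.
    import torch
    import torch.nn as nn

    model = nn.Linear(300, 2)            # stand-in for a real tagger/classifier
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def training_step(strong_batch, weak_batch, weak_weight=0.3):
        x_s, y_s = strong_batch          # annotator-provided labels
        x_w, y_w = weak_batch            # labels produced by matching explanations
        loss = loss_fn(model(x_s), y_s) + weak_weight * loss_fn(model(x_w), y_w)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Toy tensors purely for illustration.
    strong = (torch.randn(8, 300), torch.randint(0, 2, (8,)))
    weak = (torch.randn(32, 300), torch.randint(0, 2, (32,)))
    training_step(strong, weak)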


Natural Language Explanations

Our framework guides users to write parsable explanations that are used for weak labeling. In Relation Extraction, for example, the explanation "the phrase 'caused by' occurs between SUBJ and OBJ" aids in weakly labeling the relationship between "burst" and "pressure" as cause-effect in an unlabeled sentence.
Similarly, for Sentiment Analysis, because the word "fair" appears just before the word "price", we can weakly label the sentence "Delicious food with a fair price" as positive.
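
As a rough illustration, the Relation Extraction explanation above could be turned into a weak labeling rule like the sketch below. The framework's real pipeline parses explanations into more robust, softly matched rules; this hard-matching toy (with a hypothetical sentence) only conveys the idea:

    # Toy hard-matching rule derived from the explanation
    # "the phrase 'caused by' occurs between SUBJ and OBJ".
    def weak_label_cause_effect(sentence, subj, obj):
        s = sentence.lower()
        i, j = s.find(subj.lower()), s.find(obj.lower())
        if i == -1 or j == -1:
            return None                      # entities not found, abstain
        start, end = min(i, j), max(i, j)
        return "cause-effect" if "caused by" in s[start:end] else None

    print(weak_label_cause_effect(
        "The pipe burst was caused by excess water pressure.", "burst", "pressure"))
    # -> cause-effect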


Trigger Explanations

Our framework also allows annotators to select groups of words that aided in their labeling decision; these groupings are called triggers. For example, in a Named Entity Recognition scenario, because the restaurant entity "Rumble Fish" is surrounded by the phrases "had lunch at" and "where the food", we can soft-match against unlabeled sentences to extract other mentions of restaurants, like "McDonalds".
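
To give a rough sense of trigger-based soft matching, the toy sketch below scores captured triggers against an unlabeled sentence by token overlap. The actual approach performs softer, learned matching rather than surface overlap; the threshold and sentence here are made up:

    # Toy soft matching of captured triggers against an unlabeled sentence.
    def overlap(a, b):
        a, b = set(a.lower().split()), set(b.lower().split())
        return len(a & b) / len(a | b)

    triggers = ["had lunch at", "where the food"]   # captured for the RESTAURANT label
    unlabeled = "We had lunch at McDonalds and the fries were great."

    # If any trigger roughly matches the sentence, nearby spans can be
    # proposed to the annotator as candidate RESTAURANT mentions.
    if max(overlap(t, unlabeled) for t in triggers) > 0.2:   # illustrative threshold
        print("trigger match: suggest candidate restaurant spans")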


Incorporating Explanations

Natural Language

Figure: Leveraging the natural language form of explanation.

Trigger

Figure: Leveraging the trigger form of explanation.

Experiments

Figure: (left) Relation Extraction, (right) Named Entity Recognition.

    "When starting with little to no labeled data, it is more effective to ask annotators to provide a label and an explanation for the label, than to just request a label."

We found that labeling one instance and providing an explanation takes roughly twice as long as simply providing a label. Given this observation, we compare our improved training process against the traditional label-only training process while holding total annotation time constant between the two trials.
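
Concretely, holding annotation time constant means the two settings see different numbers of instances. The numbers below are purely illustrative, not our experimental budget:

    # Illustrative only: a fixed annotation budget buys fewer, but richer,
    # label+explanation instances than label-only instances.
    budget_minutes = 120                     # hypothetical total budget
    label_cost = 1.0                         # minutes per label-only instance (hypothetical)
    explained_cost = 2.0 * label_cost        # label + explanation takes ~2x as long

    print(int(budget_minutes / label_cost))      # 120 label-only instances
    print(int(budget_minutes / explained_cost))  #  60 labeled-and-explained instances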

We see that our improved training process is not only more time- and label-efficient than the traditional label-only training process, but also outperforms it outright.


          To cite us

           @inproceedings{lee-etal-2020-lean,
                    title = "{LEAN}-{LIFE}: A Label-Efficient Annotation Framework Towards Learning from Explanation",
                    author = "Lee, Dong-Ho  and
                      Khanna, Rahul  and
                      Lin, Bill Yuchen  and
                      Lee, Seyeon  and
                      Ye, Qinyuan  and
                      Boschee, Elizabeth  and
                      Neves, Leonardo  and
                      Ren, Xiang",
                    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
                    month = jul,
                    year = "2020",
                    address = "Online",
                    publisher = "Association for Computational Linguistics",
                    url = "https://www.aclweb.org/anthology/2020.acl-demos.42",
                    pages = "372--379"}