Meta Learning for Natural Language Processing - Task Construction in meta learning: Part 2

In part 2 of this tutorial series on meta learning for NLP, we discuss different useful techniques for task construction.

10 months ago   •   6 min read

By Adrien Payong


The term "task construction" is used to describe the process of creating or generating tasks for the purpose of training and assessing meta-learning models. For a meta-learning model to be successful in learning and generalizing to new tasks, it must first be trained on a collection of tasks that adequately reflect the underlying patterns and variations present in the target domain.

Cross-domain Transfer

The capacity of a learning algorithm to generalize its knowledge from one domain to another is known as cross-domain transfer.
Meta-learning, in which meta-parameters are learnt across domains to increase the learning algorithm's generalizability, is a key component of cross-domain transfer.

In meta-learning, the purpose of cross-domain transfer is to give the meta-learning model the ability to generalize successfully to new tasks or domains when only a small amount of training data is available. During the meta-training step, the model can learn more robust and transferable representations by training on several related domains. These representations can then be applied to new domains whose features are comparable to those of the original domains.

One setting for constructing tasks is based on domains (Qian and Yu, 2019; Yan et al., 2020; Li et al., 2020a; Park et al., 2021; Chen et al., 2020b; Huang et al., 2020a; Dai et al., 2020; Wang et al., 2021b; Dingliwal et al.). In this setting, all tasks, whether they belong to Ttrain or Ttest, come from the same NLP problem. Within a task, the support set Sn and the query set Qn come from the same domain, while different tasks hold examples from different domains. The model is trained on the support set of a domain and evaluated on the query set of that same domain, which can be regarded as domain adaptation.
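To make the setting concrete, here is a minimal sketch of building one task per domain. It is an illustration, not the procedure of any cited paper: the function name `build_domain_tasks`, the `(domain, text, label)` tuple layout, and the toy domains are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_domain_tasks(examples, k_support, k_query, seed=0):
    """Group (domain, text, label) examples into one task per domain.

    Each task holds a support set and a query set drawn from the same
    domain, and no example appears in both sets.
    """
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for domain, text, label in examples:
        by_domain[domain].append((text, label))

    tasks = {}
    for domain, items in by_domain.items():
        if len(items) < k_support + k_query:
            continue  # skip domains with too few examples
        picked = rng.sample(items, k_support + k_query)
        tasks[domain] = {
            "support": picked[:k_support],
            "query": picked[k_support:],
        }
    return tasks


# Toy data (hypothetical): three domains, six labeled sentences each.
examples = [
    (domain, f"{domain} sentence {i}", i % 2)
    for domain in ("restaurant", "hotel", "flight")
    for i in range(6)
]
tasks = build_domain_tasks(examples, k_support=2, k_query=3)

# Hold out one domain so that Ttest contains a domain unseen in Ttrain.
meta_train = {d: t for d, t in tasks.items() if d != "flight"}
meta_test = {"flight": tasks["flight"]}
```

Holding out a whole domain for `meta_test` mirrors the idea that the tasks in Ttest include domains not seen during cross-task training.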

If there are enough tasks in Ttrain, then cross-task training should be able to identify an appropriate value φ∗ for a broad variety of domains. As a result, cross-task training should also function well on the tasks in Ttest that include the domains that were not seen during cross-task training.

This suggests that meta-learning may be a useful tool for enhancing domain adaptability. If there are few instances in each task's support set, meta-learning must identify the meta-parameters φ∗ that enable learning from a limited support set and good generalization to the query set in the same domain. Thus, meta-learning is seen as a viable strategy for achieving few-shot learning.

A few examples of the cross-domain setting in NLP

  • Cross-Domain Knowledge Distillation for Text Classification: Knowledge distillation transfers the knowledge of one model to another. In cross-domain knowledge distillation, a teacher model imparts its knowledge to a student model that is trained on a different domain. This allows the student model to benefit from the teacher model's knowledge and generalize across domains.
  • Cross-Domain Text-to-SQL Semantic Parsing: Semantic parsing transforms questions written in plain language into structured queries, such as SQL. Cross-domain text-to-SQL semantic parsing refers to teaching a model to generalize across several databases, handling queries and schemas that it has not seen before. This requires the model to adapt to new databases and to understand the fundamental structure of the queries.
  • Multi-domain Multilingual Question Answering: This requires developing question-answering systems that adapt to diverse domains and languages. The objective is to build models that generalize across a variety of domains and languages, even if only a small amount of labeled data is available in the target domain.

Cross-problem Training

In the realm of NLP, tackling cross-problem settings presents a significant challenge. One of the main hurdles is that different NLP problems often require different meta-parameters in their learning algorithms. Consequently, finding unified meta-parameters during meta-training that can effectively generalize to meta-testing tasks becomes a daunting task. Additionally, meta-learning algorithms, such as MAML, heavily rely on a single network architecture for all tasks. However, this poses a problem when different problems require diverse network architectures, rendering the original MAML approach unsuitable for the cross-problem setting.

To overcome this issue, researchers have developed MAML variants, such as LEOPARD (Bansal et al., 2020a) and ProtoMAML (van der Heijden et al., 2021). These variants are specifically designed for classification tasks with varying class numbers, enabling greater adaptability to diverse problem settings.

Both approaches use the examples of a class to generate that class's head, so the only head-related parameters that need to be learned are those of the head parameter generation model. Because this generation model is shared across all classes, the network architecture becomes agnostic to the number of classes.

The head parameter generation model is a neural network architecture that can be utilized for any classification task, regardless of the number of classes. This means that you can apply the same model to different classification tasks without making any changes to the architecture.
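As a rough illustration of a class-number-agnostic head, the sketch below builds one linear head per class from that class's support examples, using the prototype rule from prototypical networks (weights `2 * prototype`, bias `-||prototype||^2`) that ProtoMAML-style methods use to initialize the output layer. Several things here are assumptions: the feature vectors stand in for the output of a shared encoder, the names `prototype_head` and `predict` are hypothetical, and the fine-tuning steps that would normally follow are left out. LEOPARD instead learns a generator network, which is not shown.

```python
def prototype_head(support):
    """Build a linear classification head from (feature_vector, label) pairs.

    Returns {label: (weights, bias)}. The same function serves any number
    of classes, because each class's head depends only on its own examples.
    """
    sums, counts = {}, {}
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1

    head = {}
    for label, total in sums.items():
        proto = [s / counts[label] for s in total]  # class prototype (mean)
        weights = [2.0 * p for p in proto]          # w_k = 2 * c_k
        bias = -sum(p * p for p in proto)           # b_k = -||c_k||^2
        head[label] = (weights, bias)
    return head


def predict(head, vec):
    """Return the label with the highest logit w_k . x + b_k."""
    scores = {
        label: sum(w * x for w, x in zip(weights, vec)) + bias
        for label, (weights, bias) in head.items()
    }
    return max(scores, key=scores.get)


# Toy 3-way task with 2-dimensional "encoder outputs" (hypothetical values).
support = [
    ([0.0, 0.0], "A"), ([0.0, 2.0], "A"),
    ([4.0, 0.0], "B"), ([4.0, 2.0], "B"),
    ([0.0, 8.0], "C"), ([2.0, 8.0], "C"),
]
head = prototype_head(support)
```

With this rule the logit for class k equals the negative squared distance to the prototype plus a term that is the same for every class, so `predict` picks the nearest prototype. Passing a support set with two classes or ten classes needs no change to the code.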

The survey this tutorial is based on (Meta Learning for Natural Language Processing: A Survey) also discusses the emergence of universal models that can handle a wide range of NLP problems. These models are designed to be more flexible and can be used for multiple tasks without requiring retraining or modification. According to the authors, the development of these universal models will bring significant advantages in the cross-problem setting of meta-learning.

Domain Generalization

Standard supervised learning assumes that the training and testing data follow the same distribution.
The term "domain shift" describes the drop in a model's performance when the statistics of the training data and the testing data differ substantially. Domain adaptation, as described above, adjusts the model using a small amount of data from the target domain. Domain generalization techniques, by contrast, address domain mismatch by developing models that perform well in unseen testing domains.

Meta-learning can be used to achieve domain generalization by learning an algorithm that trains on one domain and is evaluated on another. To do so, a collection of meta-training tasks is created in which the support and query sets are sampled from different domains. Through cross-task training, the algorithm seeks the meta-parameters φ∗ that perform well when the training examples (support set) and the testing examples (query set) come from different domains. This lets the algorithm generalize what it learns across domains.
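The difference from the cross-domain setting is only in how each task is assembled: the support and query sets now come from different domains. The sketch below shows one simple way to do that by pairing every ordered pair of domains. The function name `build_dg_tasks`, the dictionary layout, and the toy domains are assumptions for illustration.

```python
import random
from itertools import permutations


def build_dg_tasks(data_by_domain, k_support, k_query, seed=0):
    """Create one task per ordered pair of distinct domains.

    The support set is sampled from the first domain of the pair and the
    query set from the second, so training and evaluation inside a task
    never share a domain.
    """
    rng = random.Random(seed)
    tasks = []
    for src, tgt in permutations(sorted(data_by_domain), 2):
        tasks.append({
            "support_domain": src,
            "query_domain": tgt,
            "support": rng.sample(data_by_domain[src], k_support),
            "query": rng.sample(data_by_domain[tgt], k_query),
        })
    return tasks


# Toy data (hypothetical): three domains with five labeled examples each.
data_by_domain = {
    name: [(f"{name} example {i}", i % 2) for i in range(5)]
    for name in ("news", "reviews", "tweets")
}
dg_tasks = build_dg_tasks(data_by_domain, k_support=3, k_query=2)
```

Three domains give six tasks here. In practice one would also hold out some domains entirely for meta-testing, which this sketch does not do.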

Task Augmentation

In the field of machine learning, data augmentation is frequently used in situations where there is a scarcity of data. Likewise, in the realm of meta-learning, task augmentation is considered as a type of data augmentation. Task augmentation in meta-learning can be categorized into two main approaches. The first approach entails generating additional tasks without the need for human labeling, thereby enhancing the quantity and diversity of tasks used for meta-training. The second approach involves splitting the training data from a single dataset into homogeneous partitions, allowing meta-learning techniques to be applied and improving performance.

Inventing more tasks

  • Self-Supervised Learning: Bansal et al. (2020b) generate a large number of cloze tasks, which can be treated as multi-class classification tasks but are obtained without labeling effort, to augment the meta-training tasks.
  • Unsupervised Task Distribution: Bansal et al. (2021) further explore the influence of the unsupervised task distribution and create task distributions that are conducive to better meta-training efficiency. The self-supervised generated tasks improve performance on a wide range of meta-testing tasks that are classification problems (Bansal et al., 2020b), and the approach even performs comparably with supervised meta-learning methods on the FewRel 2.0 benchmark (Gao et al., 2019b) in the 5-shot evaluation (Bansal et al., 2021).
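To see why a cloze task needs no human labels, consider the simplified sketch below. It is only loosely inspired by the idea described above and is not the procedure of Bansal et al.: a few target words are chosen, each occurrence is replaced by a mask token, and the class label is simply which word was hidden. The mask string, the function name `build_cloze_task`, and the toy corpus are assumptions.

```python
import random

MASK = "[MASK]"  # hypothetical mask token


def build_cloze_task(sentences, target_words, k_per_class, seed=0):
    """Build an N-way classification task from unlabeled sentences.

    Each target word defines one class. For every class, k_per_class
    sentences containing the word are sampled and the word is masked;
    the label is the index of the hidden word.
    """
    rng = random.Random(seed)
    task = []
    for label, word in enumerate(target_words):
        containing = [s for s in sentences if word in s.split()]
        if len(containing) < k_per_class:
            raise ValueError(f"not enough sentences containing {word!r}")
        for sent in rng.sample(containing, k_per_class):
            masked = " ".join(
                MASK if tok == word else tok for tok in sent.split()
            )
            task.append((masked, label))
    rng.shuffle(task)
    return task


# Toy unlabeled corpus (hypothetical).
corpus = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "the dog barked at night",
    "my dog likes long walks",
    "the bird sang at dawn",
    "a bird built a nest",
]
cloze_task = build_cloze_task(corpus, ["cat", "dog", "bird"], k_per_class=2)
```

Choosing different sets of target words yields as many distinct tasks as needed, which is what makes this kind of construction useful for augmenting the meta-training tasks.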

Generating tasks from a monolithic corpus


Many tasks can be constructed with one monolithic corpus.

  • First, the training set of the corpus is split into a support partition, Ds, and a query partition, Dq. Two subsets of examples are sampled from Ds and Dq as the support set, S, and the query set, Q, respectively.
  • In each episode, the model parameters θ are updated with S, and then the losses are computed with the updated model and Q. The meta-parameters φ are then updated based on the losses.
  • The test set of the corpus is used to build Ttest for evaluation. Compared with constructing Ttrain from multiple relevant corpora, which are often not available, building Ttrain with one corpus makes the meta-learning methodology more applicable.
  • However, using only a single data stream makes the resulting models less generalizable across attributes such as domains and languages.
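The splitting and sampling described in the first bullet can be sketched in a few lines. The function names `split_partitions` and `sample_episode`, the 50/50 split, and the toy data are assumptions for illustration; the parameter updates from the second bullet depend on the chosen meta-learning algorithm and are not shown.

```python
import random


def split_partitions(train_set, support_fraction=0.5, seed=0):
    """Split one training set into a support partition Ds and a query partition Dq."""
    rng = random.Random(seed)
    shuffled = list(train_set)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * support_fraction)
    return shuffled[:cut], shuffled[cut:]


def sample_episode(d_s, d_q, k_support, k_query, rng):
    """Sample a support set S from Ds and a query set Q from Dq for one episode."""
    return rng.sample(d_s, k_support), rng.sample(d_q, k_query)


# Toy corpus (hypothetical): 20 labeled sentences with three classes.
train_set = [(f"sentence {i}", i % 3) for i in range(20)]
d_s, d_q = split_partitions(train_set)

rng = random.Random(0)
episodes = [sample_episode(d_s, d_q, 4, 4, rng) for _ in range(3)]
```

Because Ds and Dq are disjoint, the query set of an episode never contains an example the model was just updated on, even though every episode draws from the same corpus.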

Meta-learning is a potent machine learning technique that allows models to learn how to learn. Meta-learning models promote generalization and transfer by applying knowledge acquired on one task to another, which lets them adapt rapidly to new tasks with limited training data. Several methods and techniques aim to improve the effectiveness and efficiency of meta-learning models, including task construction, cross-domain transfer, and meta-optimization.

Reference

Meta Learning for Natural Language Processing: A Survey
Deep learning has been the mainstream technique in natural language processing (NLP) area. However, the techniques require many labeled data and are less generalizable across domains. Meta-learning is an arising field in machine learning studying approaches to learn better learning algorithms. Appro…
