Adapter-based Selective Knowledge Distillation for Federated Multi-domain Meeting Summarization

Paper · arXiv 2308.03275 · Published August 7, 2023

Xiachong Feng, Xiaocheng Feng, Xiyuan Du, Min-Yen Kan, Bing Qin

https://arxiv.org/abs/2308.03275


“To close the above gap, we take the first step to study the meeting summarization task by leveraging a federated learning framework, a widely-adopted approach for decentralized machine learning that enables model training across multiple distributed clients [13], [14]. Figure 1 depicts the entire learning framework, which aims to effectively train performant client-side summarization models by deriving global knowledge from other clients, without needing to access their private data. However, two critical challenges must be carefully addressed in order to learn high-performance summarization models under federated learning. First, current state-of-the-art meeting summarization models are based on pre-trained language models that maintain a very large number of parameters. Updating all model parameters represents an infeasible communication cost. Instead, limited-scale client–server communication is more realistic. This restricts the exchange of parameter updates between the server and its clients to a budget. Second, meetings distributed across multiple clients often belong to different domains. Figure 1 illustrates this scenario, in which there exist three meeting domains: academic, committee, and product. A single, central model would not serve the distinct needs of the different domains. This challenging non-identically and independently distributed (non-IID) data learning setting often causes the client model to deviate from its own domain as it learns global knowledge based on non-IID data.

To mitigate the above two challenges, we propose a unified method, dubbed Adapter-based Federated Selective Knowledge Distillation (ADAFEDSELECTKD). To address the first challenge, we draw support from parameter-efficient fine-tuning techniques and design an adapter-based summarization model to reduce communication costs. Specifically, we introduce a few lightweight trainable adapters [15] into pre-trained language models [16], [17] while keeping the pre-trained language models frozen. We meticulously design two types of adapters — a global adapter and a local adapter — tailored for the federated learning framework to facilitate information exchange between the server and clients. In particular, the global adapter is responsible for providing global knowledge, while local adapters are optimized towards the local meeting summarization task. To address the second challenge, we devise a federated selective knowledge distillation strategy that not only effectively derives global knowledge for the client summarization model, but also trains the model to favour its own local domain performance. Specifically, the client model adopts knowledge distillation [18] as the optimization algorithm to both learn from its local data and distill global knowledge from the global adapter. Moreover, we propose an entropy-based selective strategy, based on the assumption that the higher the entropy of the global knowledge, the more uncertain that knowledge is, which adaptively distills knowledge from the global adapter.”
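
To make the communication-budget constraint from the first quoted paragraph concrete, here is a minimal sketch (not the authors' released code) of a federated round in which only the lightweight adapter weights travel between server and clients, while the frozen pre-trained backbone never moves. The helper names (`train_locally`, `fedavg`, `client.local_adapter`) are hypothetical placeholders, and the FedAvg-style aggregation is a generic stand-in rather than the paper's exact scheme.

```python
import copy

import torch


def fedavg(adapter_states):
    """Parameter-wise average of client adapter state_dicts (FedAvg-style stand-in)."""
    avg = copy.deepcopy(adapter_states[0])
    for key in avg:
        avg[key] = torch.stack([state[key].float() for state in adapter_states]).mean(dim=0)
    return avg


def federated_round(server_adapter_state, clients):
    """One communication round exchanging adapter parameters only.

    The frozen pre-trained language model stays on each client, so per-round
    traffic is bounded by the (small) adapter size.
    """
    updated_states = []
    for client in clients:
        # Download: the client refreshes its global adapter from the server.
        client.global_adapter.load_state_dict(server_adapter_state)
        # Local training on the client's private meetings (hypothetical helper).
        client.train_locally()
        # Upload: only the adapter's state_dict is sent back.
        updated_states.append(copy.deepcopy(client.local_adapter.state_dict()))
    # Server aggregates the adapter updates into a new global adapter.
    return fedavg(updated_states)
```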
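
The second quoted paragraph's selective distillation can likewise be sketched as a client-side loss: cross-entropy on the client's own references plus a KL term that distills from the global adapter's token distributions, with high-entropy (uncertain) global predictions excluded. This is an illustrative sketch under assumed details; the threshold `tau`, the weight `alpha`, and the token-level gating are placeholders rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def selective_kd_loss(local_logits, global_logits, labels, tau=2.0, alpha=0.5):
    """Local summarization loss plus entropy-gated distillation from the global adapter.

    Tokens whose global-adapter distribution is too uncertain (entropy > tau)
    are dropped from the distillation term, following the assumption that
    higher-entropy global knowledge is less reliable.
    """
    vocab_size = local_logits.size(-1)

    # Standard token-level cross-entropy on the client's private references.
    ce = F.cross_entropy(local_logits.view(-1, vocab_size), labels.view(-1))

    # Per-token entropy of the global adapter's predictive distribution.
    global_probs = F.softmax(global_logits, dim=-1)
    entropy = -(global_probs * torch.log(global_probs + 1e-12)).sum(dim=-1)  # (batch, seq_len)

    # Keep only confident (low-entropy) teacher tokens for distillation.
    keep = (entropy <= tau).float()

    # Token-level KL divergence between the client model and the global adapter.
    kd_per_token = F.kl_div(
        F.log_softmax(local_logits, dim=-1), global_probs, reduction="none"
    ).sum(dim=-1)
    kd = (kd_per_token * keep).sum() / keep.sum().clamp(min=1.0)

    return ce + alpha * kd
```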