Paper: Global-Locally Self-Attentive

Author: 魏鹏飞 | Published 2019-11-07 16:28


1. Short name

The paper "Global-Locally Self-Attentive Dialogue State Tracker", GLAD for short, is by Victor Zhong et al. (Salesforce Research) and is a classic paper on dialogue state tracking.

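2. Abstract

Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems.

In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules.

Our model uses global modules to share parameters between estimators for different types (called slots) of dialogue states, and uses local modules to learn slot-specific features.

We show that this improves tracking of rare states and achieves state-of-the-art performance on the WOZ and DSTC2 state tracking tasks.

On WOZ, GLAD obtains 88.1% joint goal accuracy and 97.1% request accuracy, outperforming prior work by 3.7% and 5.5% respectively.
On DSTC2, GLAD obtains 74.5% joint goal accuracy and 97.5% request accuracy, outperforming prior work by 1.1% and 1.0% respectively.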

3. Core ideas

3.1 Dataset example

An example dialogue with annotated turn states, in which the user books a restaurant.

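Dialogue state tracking (DST) is an important component of dialogue systems.

In DST, a dialogue state tracker uses the current user utterance and the conversation history to estimate the state of the conversation. The dialogue system then uses this estimated state to plan its next action and respond to the user.

The state in DST typically consists of a set of requests and joint goals.

Take the restaurant reservation task as an example:

In each turn, the user either informs the system of a particular goal they want to achieve (e.g. inform(food=french)) or requests more information from the system (e.g. request(address)). The sets of goal and request slot-value pairs given during a turn (e.g. (food, french), (request, address)) are called the turn goal and the turn request.
The joint goal is the set of accumulated turn goals up to the current turn.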
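
The relationship between turn goals and the joint goal can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the function name `update_joint_goal`, the dict-based state, and the two-turn dialogue are hypothetical, and the sketch assumes that a later value for a slot replaces an earlier one.

```python
def update_joint_goal(joint_goal, turn_goal):
    """Merge one turn's goal slot-value pairs into the running joint goal.

    Assumption: a later value for the same slot replaces the earlier one.
    """
    merged = dict(joint_goal)
    merged.update(turn_goal)
    return merged


# Hypothetical two-turn dialogue with turn goals and turn requests.
turns = [
    {"goal": {"food": "french"}, "request": []},
    {"goal": {"area": "centre"}, "request": ["address"]},
]

joint_goal = {}
joint_goals = []  # joint goal after each turn
for turn in turns:
    joint_goal = update_joint_goal(joint_goal, turn["goal"])
    joint_goals.append(joint_goal)

print(joint_goals[-1])
```

Turn requests are not accumulated here: they describe only what the user asked for in the current turn.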

3.2 Global-Locally Self-Attentive Encoder

Global-locally self-attentive encoder.

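Consider the process of encoding a sequence with respect to a particular slot $s$. Let $n$ denote the number of words in the sequence, $d_{emb}$ the dimension of the embeddings, and $X\in R^{n\times d_{emb}}$ the word embeddings corresponding to the words in the sequence.

We produce a global encoding $H^g$ of $X$ using a global bidirectional $LSTM$.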
$H^g=biLSTM^g(X)\in R^{n \times d_{rnn}}\tag{3.2.1}$

where $d_{rnn}$ is the dimension of the $LSTM$ state.

Given the slot $s$, we similarly produce a local encoding $H^s$ of $X$ using a local bidirectional $LSTM$.
$H^s=biLSTM^s(X)\in R^{n \times d_{rnn}}\tag{3.2.2}$

The outputs of the two $LSTM$s are combined through a mixture function to yield the global-local encoding $H$ of $X$.
$H=\beta^sH^s+(1-\beta^s)H^g\in R^{n \times d_{rnn}}\tag{3.2.3}$

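Here, the scalar $\beta^s$ is a learned parameter between 0 and 1 that is specific to the slot $s$. Next, we compute a global-local self-attention context $c$ over $H$. Self-attention, or intra-attention, is an effective method for computing a summary context over variable-length sequences in natural language processing tasks.

In our case, we use the global self-attention module to compute an attention context useful for general-purpose state tracking, and the local self-attention module to compute a slot-specific attention context.

For each $i$th element $H_i$, we compute a scalar self-attention score $a^g_i$, which is subsequently normalized across all elements using a $Softmax$ function.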
$a_i^g=W^gH_i+b^g\in R \\ p^g=softmax(a^g)\in R^n \tag{3.2.4}$

The global self-attention context $c^g$ is then the sum of each element $H_i$, weighted by the corresponding normalized global self-attention score $p^g_i$.
$c^g=\sum_ip_i^gH_i\in R^{d_{rnn}}\tag{3.2.5}$
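
Equations (3.2.4) and (3.2.5) can be traced with a small pure-Python sketch. The helper names `softmax` and `self_attention` and all numeric values are illustrative assumptions (a toy sequence with $n=3$ and $d_{rnn}=2$), not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(H, W, b):
    """Scores a_i = W . H_i + b, weights p = softmax(a), context c = sum_i p_i H_i."""
    a = [sum(w * h for w, h in zip(W, H_i)) + b for H_i in H]
    p = softmax(a)
    d = len(H[0])
    c = [sum(p[i] * H[i][k] for i in range(len(H))) for k in range(d)]
    return p, c


# Toy values (assumptions): 3 words, 2-dimensional encodings.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_g = [0.5, -0.5]  # hypothetical global attention weights
b_g = 0.0
p_g, c_g = self_attention(H, W_g, b_g)
print(p_g, c_g)
```

With these weights the raw scores are 0.5, -0.5 and 0.0, so the first word receives the largest attention weight and the second the smallest.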

We similarly compute the local self-attention context $c^s$.
$a_i^s=W^sH_i+b^s\in R \\ p^s=softmax(a^s)\in R^n \\ c^s=\sum_ip_i^sH_i\in R^{d_{rnn}} \tag{3.2.6}$

The global-local self-attention context $c$ is the mixture:
$c=\beta^sc^s+(1-\beta^s)c^g\in R^{d_{rnn}}\tag{3.2.7}$

For ease of exposition, we define the multi-value encoder function $encode(X)$.
$encode: X \Rightarrow H, c\tag{3.2.8}$

This function maps the sequence $X$ to the encoding $H$ and the self-attention context $c$.
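
To make the data flow of (3.2.3) through (3.2.8) concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: unlike the paper's $encode(X)$, it takes the two biLSTM outputs $H^g$ and $H^s$ as given inputs rather than computing them from $X$, and the helper names `mix` and `attend` and all numeric values are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix(beta, local, global_):
    """Elementwise beta * local + (1 - beta) * global, as in (3.2.3) and (3.2.7)."""
    return [beta * l + (1.0 - beta) * g for l, g in zip(local, global_)]


def attend(H, W, b):
    """Self-attention context over the rows of H, as in (3.2.4)-(3.2.6)."""
    scores = [sum(w * h for w, h in zip(W, row)) + b for row in H]
    p = softmax(scores)
    return [sum(p_i * row[k] for p_i, row in zip(p, H)) for k in range(len(H[0]))]


def encode(H_g, H_s, beta, W_g, b_g, W_s, b_s):
    """Return the global-local encoding H and context c.

    Assumption: H_g and H_s stand in for the outputs of the global and
    slot-specific biLSTMs, which are not modelled in this sketch.
    """
    H = [mix(beta, row_s, row_g) for row_s, row_g in zip(H_s, H_g)]
    c_g = attend(H, W_g, b_g)  # global self-attention context
    c_s = attend(H, W_s, b_s)  # slot-specific self-attention context
    c = mix(beta, c_s, c_g)
    return H, c


# Toy values (assumptions): n = 2 words, d_rnn = 2.
H_g = [[1.0, 0.0], [0.0, 1.0]]
H_s = [[0.0, 2.0], [2.0, 0.0]]
H, c = encode(H_g, H_s, beta=0.5, W_g=[1.0, 0.0], b_g=0.0, W_s=[0.0, 1.0], b_s=0.0)
print(H, c)
```

Note that the same $\beta^s$ mixes both the encodings and the contexts; setting it to 1 in this sketch reduces the encoding to the purely slot-specific one.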

3.3 The Global-Locally Self-Attentive Dialogue State Tracker (Encoding module + Scoring module)

The Global-Locally Self-Attentive Dialogue State Tracker.

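Encoding module:
Having defined the global-locally self-attentive encoder, we now build representations for the user utterance, the previous system actions, and the slot-value pair under consideration.

Let $U$ denote the word embeddings of the user utterance, $A_j$ the word embeddings of the $j$th previous system action (e.g. request(price range)), and $V$ the word embeddings of the slot-value pair under consideration (e.g. food=french).
We have:
$H^{utt}, c^{utt}=encode(U) \\ H_j^{act}, c_j^{act}=encode(A_j) \\ H^{val}, c^{val}=encode(V) \tag{3.3.1}$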

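Scoring module:
Intuitively, we can determine whether the user has expressed the slot-value pair under consideration by examining two input sources.

The first source is the user utterance, in which the user directly states goals and requests. An example is the user saying "how about a French restaurant in the centre of town?" after the system asks "how may I help you?". To handle these cases, we determine whether the utterance specifies the slot-value pair. Namely, we attend over the user utterance $H^{utt}$, taking into account the slot-value pair under consideration $c^{val}$, and use the resulting attention context $q^{utt}$ to score the slot-value pair.
$a_i^{utt}=(H_i^{utt})^Tc^{val}\in R \\ p^{utt}=softmax(a^{utt})\in R^m \\ q^{utt}=\sum_ip_i^{utt}H_i^{utt}\in R^{d_{rnn}} \\ y^{utt}=Wq^{utt}+b\in R \tag{3.3.2}$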

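where $m$ is the number of words in the user utterance. The score $y^{utt}$ indicates the degree to which the value was expressed by the user utterance.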
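
The utterance score in (3.3.2) can be sketched as follows. This is a hedged illustration in pure Python: the function name `score_utterance`, the helper `dot`, and all vectors are made-up assumptions (a toy utterance with $m=3$ and $d_{rnn}=2$), not values or code from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score_utterance(H_utt, c_val, W, b):
    """Attend over the utterance with c_val as the query, then score the context."""
    a = [dot(H_i, c_val) for H_i in H_utt]  # a_i = H_i^T c_val
    p = softmax(a)                          # p in R^m
    q = [sum(p[i] * H_utt[i][k] for i in range(len(H_utt)))
         for k in range(len(c_val))]        # q = sum_i p_i H_i
    y = dot(W, q) + b                       # y_utt = W q + b
    return p, q, y


# Toy values (assumptions): 3 words, 2-dimensional encodings.
H_utt = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
c_val = [2.0, 0.0]  # hypothetical context of the slot-value pair
p_utt, q_utt, y_utt = score_utterance(H_utt, c_val, W=[1.0, 1.0], b=0.0)
print(p_utt, y_utt)
```

Because only the first word's encoding aligns with `c_val`, it gets the highest attention weight, and the two remaining words share the rest equally.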

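The second source is the previous system actions. This source is informative when the user utterance does not provide enough information on its own and instead refers to previous system actions. An example is the user saying "yes" after the system asks "would you like a restaurant in the centre of town?".

To handle these cases, we examine the previous actions after considering the user utterance. First, we attend over the previous action representations $C^{act}=[C_1^{act}\cdots C_l^{act}]$, taking into account the current user utterance $c^{utt}$. Here, $l$ is the number of previous system actions, and each $C_j^{act}$ is the context $c_j^{act}$ from (3.3.1). Then, we use the similarity between the attention context $q^{act}$ and the slot-value pair $c^{val}$ to score the slot-value pair.
$a_j^{act}=(C_j^{act})^Tc^{utt}\in R \\ p^{act}=softmax(a^{act})\in R^{l+1} \\ q^{act}=\sum_jp_j^{act}C_j^{act}\in R^{d_{rnn}} \\ y^{act}=(q^{act})^Tc^{val}\in R \tag{3.3.3}$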

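In addition to the real actions, we introduce a sentinel action to each turn, which allows the attention mechanism to ignore the previous system actions. This is why $p^{act}$ has $l+1$ entries. The score $y^{act}$ indicates the degree to which the value was expressed by the previous actions. The final score $y$ is then a weighted sum of the two scores $y^{utt}$ and $y^{act}$, normalized by the $Sigmoid$ function $\sigma$.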
$y=\sigma(y^{utt}+wy^{act})\in R\tag{3.3.4}$

Here, the weight $w$ is a learned parameter.
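
The action score (3.3.3) and the final score (3.3.4) can be sketched in the same style. Everything here is illustrative: the function names and numeric values are assumptions, and representing the sentinel action as a zero vector is a simplification chosen for this sketch, not something the text above specifies.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_actions(C_act, c_utt, c_val):
    """Attend over action contexts with c_utt as the query, then compare with c_val."""
    a = [dot(C_j, c_utt) for C_j in C_act]  # a_j = C_j^T c_utt
    p = softmax(a)                          # p in R^(l+1), sentinel included
    q = [sum(p[j] * C_act[j][k] for j in range(len(C_act)))
         for k in range(len(c_utt))]        # q = sum_j p_j C_j
    return p, q, dot(q, c_val)              # y_act = q^T c_val


def final_score(y_utt, y_act, w):
    """y = sigmoid(y_utt + w * y_act), as in (3.3.4)."""
    return sigmoid(y_utt + w * y_act)


# Toy values (assumptions): l = 2 real actions plus one sentinel action.
real_actions = [[1.0, 0.0], [0.0, 1.0]]
sentinel = [0.0, 0.0]  # assumption: zero vector standing in for the sentinel
C_act = real_actions + [sentinel]
p_act, q_act, y_act = score_actions(C_act, c_utt=[3.0, 0.0], c_val=[1.0, 0.0])
y = final_score(y_utt=0.2, y_act=y_act, w=0.5)
print(p_act, y_act, y)
```

When attention mass shifts to the sentinel, $q^{act}$ shrinks toward the sentinel's representation, so the previous actions contribute little to the final score.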

4. Experiments

Evaluation results 1

Evaluation results 2

5. Key papers

Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M. Rojas Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2017. A network-based end-to-end trainable task-oriented dialogue system. In EACL.
Nikola Mrkšić, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. 2017. Neural belief tracker: Data-driven dialogue state tracking. In ACL.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In EMNLP.
Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In ICLR.
Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep semantic role labeling: What works and what's next. In ACL.
Kenton Lee, Luheng He, Mike Lewis, and Luke S. Zettlemoyer. 2017. End-to-end neural coreference resolution. In EMNLP.
Caiming Xiong, Victor Zhong, and Richard Socher. 2018. DCN+: Mixed objective and deep residual coattention for question answering. In ICLR.
Julien Perez and Fei Liu. 2017. Dialog state tracking, a machine reading approach using memory network. In EACL.
Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2015. Multi-domain dialog state tracking using recurrent neural networks. In ACL.
Lukas Zilka and Filip Jurcicek. 2015. Incremental LSTM-based dialog state tracker. In Automatic Speech Recognition and Understanding Workshop (ASRU).
Steve Young, Milica Gašić, Blaise Thomson, and Jason D. Williams. 2013. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE 101(5).

6. Code

Source code for the paper: https://github.com/salesforce/glad

# Code analysis to be added later

References

Zhong, V., Xiong, C., & Socher, R. (2018). Global-Locally Self-Attentive Dialogue State Tracker. CoRR.
