
Causal NLP Toolbox - CF-VQA (Part 1)

Author: processor4d | Published on 2022-02-16 20:24

Paper Title

【CVPR-2021】【Nanyang Technological University, Singapore/Gaoling School of Artificial Intelligence/Damo Academy, Alibaba Group】Counterfactual VQA: A Cause-Effect Look at Language Bias

Key Points

The paper addresses the problem that existing VQA methods are easily affected by language bias and fail to effectively learn multimodal knowledge. Although some methods exclude the influence of the language bias directly at prediction time, doing so may discard useful context information. The authors instead use causal inference: they subtract the causal effect of the language bias from the total effect, thereby removing the bias.

Research Background

VQA is crucial to intelligent dialogue scenarios. Several recent studies [20, 3, 8, 27] find that VQA models may rely on spurious language correlations (i.e., biases) instead of truly learning multimodal reasoning. Some methods address this with counterfactual data augmentation [12, 1, 58, 19, 31], but they cannot make unbiased predictions from a biased model. Other methods [11, 14] learn the language bias directly from language-only data (the questions) and remove it at prediction time, but they cannot effectively exploit context information.

From the causal-inference point of view, the language (the question) may affect both the multimodal knowledge the model learns and, directly, the answer distribution. In principle, however, the question should influence the answer only through that knowledge: an answer should be chosen because of the knowledge needed to answer the question, not for any other reason. For example, given the question "What color is the banana in the image?", almost all bananas the model has seen are yellow, so the model blindly guesses "yellow" regardless of what the image actually shows.

Method Details

To address these problems, the authors propose CF-VQA, which is built on causal inference. Its core idea is to remove the natural direct effect (NDE) of the question bias from the total effect (TE) of VQA, obtain the total indirect effect (TIE), and select the answer $A$ that maximizes the TIE.
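
To make the idea concrete, below is a minimal sketch of this kind of inference-time debiasing, under simplifying assumptions that are not from the paper: the logits of a full multimodal branch (`logits_vqa`) and a question-only branch (`logits_q`) are combined by a plain additive fusion, and the blocked multimodal branch is approximated by zero rather than the learned constant CF-VQA uses.

```python
# Sketch only: TIE = TE - NDE at inference time, with a simplified additive fusion.
import torch

def tie_inference(logits_vqa: torch.Tensor, logits_q: torch.Tensor) -> torch.Tensor:
    """Pick answers by the total indirect effect instead of the total effect."""
    te = logits_vqa + logits_q                     # total effect: both branches active
    nde = torch.zeros_like(logits_vqa) + logits_q  # question-only (direct) effect, multimodal branch blocked
    tie = te - nde                                 # keep only the effect routed through multimodal knowledge
    return tie.argmax(dim=-1)                      # indices of the selected answers

# Hypothetical usage: a batch of 2 questions with 4 candidate answers.
logits_vqa = torch.randn(2, 4)
logits_q = torch.randn(2, 4)
print(tie_inference(logits_vqa, logits_q))
```

With a purely additive fusion the subtraction collapses to the multimodal logits alone, so the choice of fusion function matters in practice; the fusion strategies actually used by CF-VQA are covered in the next part.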

Causal Effect

Before introducing the method, let us first review some basics of causal inference. By convention, random variables are written in upper case and their observed values in lower case. In the causal graph below, which contains three random variables, the outcome of $Y$ under an arbitrary pair of values is $Y_{x,m} = Y(X = x, M = m)$; in the factual world, $M$ takes the value induced by $X = x$, so the factual outcome is $Y_{x, M_x} = Y(X = x, M = M(X = x))$.

(Figure: causal graphs)

In the real world, the variable $X$, on which both $Y$ and $M$ depend, cannot take two different values at the same time, because the underlying mechanism and the observed outcome are already fixed. In the counterfactual world, however, we can obtain the counterfactual outcome $Y_{x, M_{x^*}} = Y(X = x, M = M(X = x^*))$. This counterfactual corresponds to cutting off the influence along $X \rightarrow M$. Likewise, the right side of subfigure (b) above corresponds to cutting off the influence along $X \rightarrow Y$.
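
Collecting the notation defined above in one place (this is just a restatement of the definitions):

```latex
% Notation for the mediation graph X -> Y with mediator M (X -> M -> Y)
\begin{aligned}
Y_{x,m}       &= Y(X = x,\; M = m)          && \text{outcome under an arbitrary pair } (x, m)\\
M_{x}         &= M(X = x)                   && \text{mediator value induced by } X = x\\
Y_{x,M_{x}}   &= Y(X = x,\; M = M(X = x))   && \text{factual outcome}\\
Y_{x,M_{x^*}} &= Y(X = x,\; M = M(X = x^*)) && \text{counterfactual outcome, path } X \rightarrow M \text{ cut off}
\end{aligned}
```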

A causal effect reflects the difference in outcomes when one and the same individual receives different treatments, i.e., $TE = Y_{x, M_x} - Y_{x^*, M_{x^*}}$ (here TE stands for total effect, which is also the treatment effect).

When a mediator is present (if $M$ were viewed as the treatment, $X$ would form a fork, i.e., a confounder of $M$ and $Y$), TE can be decomposed into two parts, $TE = TIE + NDE$, where $NDE = Y_{x, M_{x^*}} - Y_{x^*, M_{x^*}}$ and $TIE = Y_{x, M_x} - Y_{x, M_{x^*}}$. The NDE is the pure direct effect of the treatment (the causal factor we care about) on the outcome, bypassing the mediator. The TIE is the part of the total effect that the treatment obtains indirectly through the mediator (note that the TIE is the indirect effect evaluated with $X$ already set to the target treatment $x$).

Of course, TE can also be decomposed as $TE = NIE + TDE$, where $NIE = Y_{x^*, M_x} - Y_{x^*, M_{x^*}}$ and $TDE = Y_{x, M_x} - Y_{x^*, M_x}$.
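
As a quick sanity check on the two decompositions, here is a toy numerical example; the structural equations `M(x)` and `Y(x, m)` below are made up purely for illustration.

```python
# Toy structural model (illustrative only): M(x) and Y(x, m) are arbitrary choices.
def M(x):
    return 2.0 * x + 1.0           # mediator as a function of the treatment

def Y(x, m):
    return 3.0 * x + 0.5 * m * x   # outcome depends on x directly and via m (with interaction)

x, x_star = 1.0, 0.0               # treatment value and reference value

TE  = Y(x, M(x))      - Y(x_star, M(x_star))   # total effect
NDE = Y(x, M(x_star)) - Y(x_star, M(x_star))   # natural direct effect
TIE = Y(x, M(x))      - Y(x, M(x_star))        # total indirect effect
NIE = Y(x_star, M(x)) - Y(x_star, M(x_star))   # natural indirect effect
TDE = Y(x, M(x))      - Y(x_star, M(x))        # total direct effect

assert abs(TE - (NDE + TIE)) < 1e-9            # TE = NDE + TIE
assert abs(TE - (NIE + TDE)) < 1e-9            # TE = NIE + TDE
print(TE, NDE, TIE, NIE, TDE)                  # 4.5 3.5 1.0 0.0 4.5
```

Because $Y$ contains an interaction between $x$ and $m$, NDE ≠ TDE and NIE ≠ TIE in this example, which is exactly the gap that makes the choice of decomposition matter.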

This part introduced the language bias present in VQA and the authors' idea of addressing it from a causal-inference perspective, and briefly reviewed causal effects such as TE, NDE, and TIE. The next part describes the implementation of CF-VQA.

Takeaways

Debiasing based on causal inference

The highlight of the paper is using the NDE and TIE to remove the influence of language bias. Debiasing along the same lines has also been applied in recommendation scenarios; interested readers can refer to 因果推断推荐系统工具箱 - CauSeR(一).

References

[1] Ehsan Abbasnejad, Damien Teney, Amin Parvaneh, Javen Shi, and Anton van den Hengel. Counterfactual vision and language learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10044–10054, 2020.

[3] Aishwarya Agrawal, Dhruv Batra, Devi Parikh, and Aniruddha Kembhavi. Don't just assume; look and answer: Overcoming priors for visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4971–4980, 2018.

[8] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision, pages 2425–2433, 2015.

[11] Remi Cadene, Corentin Dancette, Matthieu Cord, Devi Parikh, et al. RUBi: Reducing unimodal biases for visual question answering. Advances in Neural Information Processing Systems, 32:841–852, 2019.

[12] Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, and Yueting Zhuang. Counterfactual samples synthesizing for robust visual question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10800–10809, 2020.

[14] Christopher Clark, Mark Yatskar, and Luke Zettlemoyer. Don't take the easy way out: Ensemble based methods for avoiding known dataset biases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.

[19] Tejas Gokhale, Pratyay Banerjee, Chitta Baral, and Yezhou Yang. MUTANT: A training paradigm for out-of-distribution generalization in visual question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 878–892, 2020.

[20] Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Making the V in VQA matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6904–6913, 2017.

[21] Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, and Stefan Lee. Counterfactual visual explanations. In International Conference on Machine Learning, pages 2376–2384. PMLR, 2019.

[27] Kushal Kafle and Christopher Kanan. An analysis of visual question answering algorithms. In Proceedings of the IEEE International Conference on Computer Vision, pages 1965–1973, 2017.

[31] Zujie Liang, Weitao Jiang, Haifeng Hu, and Jiaying Zhu. Learning to contrast the counterfactual samples for robust visual question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3285–3292, 2020.

[58] Xi Zhu, Zhendong Mao, Chunxiao Liu, Peng Zhang, Bin Wang, and Yongdong Zhang. Overcoming language priors with self-supervised learning for visual question answering. arXiv preprint arXiv:2012.11528, 2020.
