[NLP] BERT Model Compression

Author: nlpming | Published 2021-11-22 15:59
  • BERT model compression techniques fall roughly into the following categories (see: http://mitchgordon.me/machine/learning/2019/11/18/all-the-ways-to-compress-BERT.html):
    (1) Pruning: remove individual weights, attention heads, or entire layers that contribute little to the model's output.
    (2) Weight factorization: the basic idea is to decompose a large weight matrix into the product of two or more low-rank matrices. As a compression technique this is mainly applied to fully connected and convolutional layers.
    (3) Knowledge distillation: the basic idea is to transfer knowledge from a large, pretrained teacher model into a (usually much smaller) student model; the student is typically trained on the teacher's outputs together with the ground-truth labels. Examples include DistilBERT, TinyBERT, and MobileBERT.
    (4) Weight sharing: reuse the same parameters in multiple places; for example, ALBERT shares parameters across Transformer layers.
    (5) Quantization: compress the model by reducing the numeric precision used to represent each weight. For example, a model trained with float32 parameters can be quantized to float16 or even int8. An example is Q-BERT.
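A minimal sketch of unstructured magnitude pruning from (1), assuming NumPy is available; the function name and the threshold-based masking are illustrative, not any particular paper's method:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).

    sparsity=0.5 prunes roughly half the entries; ties at the threshold
    are also pruned, so the realized sparsity can be slightly higher.
    """
    flat = np.abs(weights).flatten()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In practice pruned BERT weights are usually stored in a sparse format (or structured pruning removes whole heads/layers), since a dense matrix of zeros saves no memory by itself.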
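The low-rank idea in (2) can be sketched with a truncated SVD, again assuming NumPy; this is a generic illustration, not a specific paper's factorization:

```python
import numpy as np

def factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n).

    Storage drops from m*n parameters to rank*(m + n), a large saving
    when rank << min(m, n).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    B = Vt[:rank, :]
    return A, B
```

For a 768x768 BERT projection matrix, rank 64 would cut the parameter count from ~590K to ~98K, at the cost of an approximation error controlled by the discarded singular values.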
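The training signal described in (3) — teacher outputs plus ground-truth labels — is commonly combined as a weighted loss. A hedged NumPy sketch (the temperature `T`, weight `alpha`, and function names are illustrative; DistilBERT and TinyBERT each use their own variants):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * soft-target loss (match the teacher's softened distribution)
    + (1 - alpha) * hard-label cross-entropy on the ground truth."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T))
    # T**2 rescaling keeps gradient magnitudes comparable across temperatures
    soft_loss = -(p_teacher * log_p_student_T).sum(axis=-1).mean() * (T ** 2)
    log_p_student = np.log(softmax(student_logits))
    hard_loss = -log_p_student[np.arange(len(labels)), labels].mean()
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

The temperature softens both distributions so the student also learns from the teacher's relative probabilities over wrong classes ("dark knowledge").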
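The effect of cross-layer parameter sharing in (4) can be shown with a toy encoder loop, assuming NumPy; `np.tanh(x @ W)` is only a stand-in for a real Transformer block:

```python
import numpy as np

def shared_layer_encoder(x, W, n_layers=12):
    """ALBERT-style sharing: every layer reuses the same parameter matrix W,
    so the parameter count stays constant no matter how deep the stack is."""
    for _ in range(n_layers):
        x = np.tanh(x @ W)  # stand-in for one Transformer block
    return x
```

Compute cost still grows with depth; sharing only reduces the number of distinct parameters (ALBERT additionally factorizes the embedding matrix).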
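The float32-to-int8 step in (5) can be sketched as symmetric linear quantization, assuming NumPy; real pipelines (e.g. Q-BERT's mixed-precision scheme) are more elaborate:

```python
import numpy as np

def quantize_int8(W):
    """Map float weights to int8 plus one float scale: W ~= q * scale."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale
```

Each weight shrinks from 4 bytes to 1 byte (a 4x reduction), and the rounding error per weight is bounded by half the scale.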

References

- [Pre-trained model survey] Pre-trained Models for Natural Language Processing: A Survey
https://arxiv.org/pdf/2003.08271.pdf (video lecture by Prof. Xipeng Qiu: https://www.bilibili.com/video/BV16K4y1475Z/)
- [BERT compression survey] Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
https://arxiv.org/abs/2002.11985
- [ALBERT] ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
https://arxiv.org/abs/1909.11942
- [DistilBERT] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
https://arxiv.org/abs/1910.01108
- [TinyBERT] TinyBERT: Distilling BERT for Natural Language Understanding
https://arxiv.org/abs/1909.10351
- [MobileBERT] MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
https://arxiv.org/abs/2004.02984
- [Q-BERT] Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
https://arxiv.org/abs/1909.05840


Original link: https://www.haomeiwen.com/subject/ihwstrtx.html