Word segmentation with jieba - Python notes

Author: 自走炮 | Published: 2020-08-14 02:01

Default
Part-of-speech filtering
Custom dictionary

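import jieba

# Plain segmentation, no filtering
def chinese_cut1(text):
    # Precise mode (cut_all=False); tokens are joined with single spaces
    return ' '.join(jieba.cut(text, cut_all=False))

# `data` is not defined in these notes; it is presumably a pandas Series of strings
datacutted = data.apply(chinese_cut1)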

Part-of-speech filtering

import jieba.posseg

# Part-of-speech filtering
def chinese_cut2(text):
    result = jieba.posseg.cut(text)
    # Keep only words tagged exactly 'a' (adjective), 'n' (noun), or 'v' (verb)
    return ' '.join(x.word for x in result if x.flag in ('a', 'n', 'v'))

datacutted = data.apply(chinese_cut2)

Custom dictionary

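Dictionary format: a UTF-8 encoded file with one entry per line. Each entry has up to three space-separated fields in a fixed order: word (the word itself, required), freq (word frequency, optional), and word_type (part-of-speech tag, optional).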

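jieba.load_userdict('dict.txt') # load the custom dictionary

# Modify the dictionary at runtime
jieba.add_word('newword', freq=10, tag='nz') # add a custom word
jieba.del_word('word') # delete a custom word

# Batch-adjust word frequencies for every entry in the dictionary file
with open('dict.txt', 'r', encoding='utf8') as f:
    for line in f:
        if line.strip():
            # The word is the first field of each entry
            jieba.suggest_freq(line.split()[0], True)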

Article link: https://www.haomeiwen.com/subject/qlsgrktx.html
