Using BeautifulSoup in Python

Author: Uchen | Published: 2018-12-12 19:01

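Whenever I open Baidu Search, I tend to check the real-time hot searches. Over the past couple of days I decided to simply write a script that fetches the top hot searches, using BeautifulSoup to parse the page.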

Installing BeautifulSoup

pip install beautifulsoup4

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
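To confirm the install works before touching a live page, a minimal sketch like the following can help. It parses an inline HTML string with the same calls the script relies on (`find`, `find_all`, `get_text`). The markup, the `demo-list` id, the `item` class, and the `extract_links` helper are all invented for this illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = """
<ul id="demo-list">
  <li><a class="item" href="/a">First</a></li>
  <li><a class="item" href="/b">Second</a></li>
</ul>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a class="item"> inside #demo-list."""
    soup = BeautifulSoup(html, 'html.parser')
    # find() returns the first element matching the given id.
    container = soup.find(id='demo-list')
    # find_all() returns every matching descendant of that element.
    return [(a.get_text(), a['href'])
            for a in container.find_all('a', class_='item')]


if __name__ == '__main__':
    print(extract_links(SAMPLE_HTML))
```

If the import fails, the package is probably not installed in the interpreter that is running the script.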

The full script is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import random
import re
import prettytable as pt
from bs4 import BeautifulSoup

user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:49.0) Gecko/20100101 Firefox/49.0',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
]


def crawlit():
    # Send browser-like headers; a User-Agent is picked at random on each run.
    headers = {
        'Host': 'top.baidu.com',
        'Referer': 'http://top.baidu.com',
        'User-Agent': random.choice(user_agents)
    }
    url = 'http://top.baidu.com/'
    html = requests.get(url, headers=headers).content
    soup = BeautifulSoup(html, 'html.parser')
    # The hot-search entries sit inside the element with id="hot-list".
    content = soup.find(id='hot-list')
    # "num-*" is a regex, not a glob: it matches any class containing "num".
    rank_list = [n.get_text() for n in content.find_all('span', class_=re.compile("num-*"))]
    # Fetch the title links once and reuse them for both text and href.
    titles = content.find_all('a', class_='list-title')
    keyword_list = [a.get_text() for a in titles]
    keyword_href = [a['href'] for a in titles]
    # Skip empty icon spans and format each index with thousands separators.
    search_index = [format(int(i.get_text()), ',') for i in content.find_all('span', class_=re.compile('icon-*')) if i.get_text()]
    tb = pt.PrettyTable()
    tb.field_names = ['Rank', 'Keyword', 'Search index', 'Link']
    for i, item in enumerate(rank_list):
        tb.add_row([item, keyword_list[i], search_index[i], keyword_href[i]])
    print(tb)


if __name__ == '__main__':
    crawlit()
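
One detail worth seeing in isolation is the `class_=re.compile("num-*")` filter. BeautifulSoup searches the pattern within each class name, and `-*` means "zero or more hyphens", so the pattern matches any class that contains `num`. The sketch below compares it with an anchored pattern on a made-up snippet. The class names `num-top`, `num-normal`, and `number`, and the `texts_matching` helper, are assumptions for illustration, not taken from the real page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the class names are invented for this example.
HTML = """
<div id="hot-list">
  <span class="num-top">1</span>
  <span class="num-normal">2</span>
  <span class="number">x</span>
</div>
"""


def texts_matching(html, pattern):
    """Return the text of every <span> whose class matches the regex."""
    soup = BeautifulSoup(html, 'html.parser')
    spans = soup.find_all('span', class_=re.compile(pattern))
    return [s.get_text() for s in spans]


# The loose pattern also picks up the unrelated "number" class.
loose = texts_matching(HTML, 'num-*')
# Anchoring at the start and requiring the hyphen keeps only "num-..." classes.
strict = texts_matching(HTML, '^num-')

if __name__ == '__main__':
    print(loose)
    print(strict)
```

Whether the stricter pattern is safe for the real page depends on the class names it actually uses, so it is worth checking the markup before tightening the filter.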

GitHub: https://github.com/ugo5/phorcys/blob/master/Python/fetchtml/bd_hotspots.py
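
The script builds four parallel lists and formats each search index with `format(int(...), ',')`. If the page ever returns a non-numeric index or lists of different lengths, those steps can raise an exception. The following is one possible defensive sketch of both steps using only the standard library. The helpers `format_index` and `build_rows` are hypothetical names introduced here and are not part of the original script.

```python
def format_index(text):
    """Format a numeric string with thousands separators.

    Returns None when the text is empty or not an integer.
    """
    try:
        return format(int(text.strip()), ',')
    except ValueError:
        return None


def build_rows(ranks, keywords, indexes, links):
    """Combine four parallel lists into table rows.

    zip() stops at the shortest list, so a missing value drops the
    trailing rows instead of raising IndexError.
    """
    return [list(row) for row in zip(ranks, keywords, indexes, links)]


if __name__ == '__main__':
    print(format_index('1234567'))
    print(build_rows(['1', '2'], ['a', 'b'], ['10', '20'], ['/a', '/b']))
```

Each row returned by `build_rows` could then be passed to `tb.add_row(...)`. Note that truncating silently may hide a misaligned table, so logging the list lengths is a reasonable addition.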
