美文网首页
scrapy爬虫--小练习

scrapy爬虫--小练习

作者: 松爱家的小秦 | 来源:发表于2017-12-04 20:59 被阅读0次

scrapy startproject example

tree

├── example

│   ├── __init__.py

│   ├── __init__.pyc

│   ├── items.py

│   ├── middlewares.py

│   ├── pipelines.py

│   ├── settings.py

│   ├── settings.pyc

│   └── spiders

│      ├── book_spider.py

│      ├── book_spider.pyc

│      ├── __init__.py

│      └── __init__.pyc

cd example

cd spider

vim book_spider.py

#-*- coding: utf-8 -*-

import scrapy

class BooksSpider(scrapy.Spider):

#每个爬虫都有相应的标识符

name = "book"

#定义开始爬取的起始点 可以有多个

start_urls = ['http://books.toscrape.com/']

def parse(self, response):

for book in response.css('article.product_pod'):

name = book.xpath('./h3/a/@title').extract_first()

price = book.css('p.price_color::text').extract_first()

yield {

'name':name,

'price':price

}

next_url = response.css('ul.pager li.next a::attr(href)').extract_first()

if next_url:

next_url = response.urljoin(next_url)

yield scrapy.Request(next_url,callback=self.parse)

这个http://books.toscrape.com/可以用来练习爬虫

scrapy crawl book -o book.csv

相关文章

网友评论

      本文标题:scrapy爬虫--小练习

      本文链接:https://www.haomeiwen.com/subject/hidfixtx.html