Using the requests library

Author: 小董不太懂 | Published: 2019-09-14 13:32

This article follows the open course taught by Professor Song Tian (Beijing Institute of Technology) at https://www.icourse163.org/

Scraping a JD.com product page

I picked a product at random. URL: https://item.jd.com/25258810211.html

Scraping code:
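import requests  # import the requests library

def getHTMLText(url):
    try:
        kv = {'user-agent': 'Mozilla/5.0'}  # custom request header so the request looks like it comes from a browser
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding  # use the encoding guessed from the response body
        return r.text[:1000]  # only the first 1000 characters
    except requests.RequestException:
        return 'Request failed'

if __name__ == '__main__':  # note: two underscores on each side
    url = 'https://item.jd.com/25258810211.html'
    page = getHTMLText(url)
    print(page)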

Output:

D:\bilibili大学\venv\Scripts\python.exe D:/bilibili大学/简书代码/requests_1.py
<!DOCTYPE HTML>
<html lang="zh-CN">
<head>
    <!-- shouji -->
    <meta http-equiv="Content-Type" content="text/html; charset=gbk" />
    <title>愉悦加温震动棒女用自慰器柔软全自动伸缩摇摆成人情趣用品AV棒高潮振动棒按摩棒 【玫红色】自动充电双震智能加温+礼包【图片 价格 品牌 报价】-京东</title>
    <meta name="keywords" content="愉悦加温震动棒女用自慰器柔软全自动伸缩摇摆成人情趣用品AV棒高潮振动棒按摩棒 【玫红色】自动充电双震智能加温+礼包,愉悦,,京东,网上购物"/>
    <meta name="description" content="愉悦加温震动棒女用自慰器柔软全自动伸缩摇摆成人情趣用品AV棒高潮振动棒按摩棒 【玫红色】自动充电双震智能加温+礼包图片、价格、品牌样样齐全!【京东正品行货,全国配送,心动不如行动,立即购买享受更多优惠哦!】" />
    <meta name="format-detection" content="telephone=no">
    <meta http-equiv="mobile-agent" content="format=xhtml; url=//item.m.jd.com/product/25258810211.html">
    <meta http-equiv="mobile-agent" content="format=html5; url=//item.m.jd.com/product/25258810211.html">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <link rel="canonical" href="//item.jd.com/25258810211.html"/>
        <link rel="dns-prefetch" href="//misc.360buyimg.com"/>
    <link rel="dns-prefetch" href="//static.360buyimg.com"/>


Process finished with exit code 0
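
The meta tag in the output above declares charset=gbk, which is why the code sets r.encoding before reading r.text. As a rough sketch of the same concern using only the standard library, the snippet below reads the declared charset from raw bytes and decodes with it. The names declared_charset and sample are hypothetical, and this is not how apparent_encoding works internally (that attribute guesses the encoding from the byte content rather than reading the tag).

```python
import re

def declared_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in the first 2048 bytes, or `default` if none is found."""
    match = re.search(rb'charset=["\']?([A-Za-z0-9_-]+)', raw[:2048])
    return match.group(1).decode("ascii").lower() if match else default

# Hypothetical sample bytes, encoded as GBK like the JD page above.
sample = (
    '<meta http-equiv="Content-Type" content="text/html; charset=gbk" />'
    '<title>京东</title>'
).encode("gbk")

encoding = declared_charset(sample)
text = sample.decode(encoding)
print(encoding)
print(text)
```

If the declared charset and the guessed one disagree, it is usually worth checking the decoded text by eye before trusting either.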

While we're at it, let's scrape a lingerie listing from Taobao too, to round things out:

Product URL: https://detail.tmall.com/item.htm?spm=a220m.1000858.1000725.17.5271677d46wU8h&id=590071675099&skuId=4037392201738&areaId=320100&user_id=3376094485&cat_id=2&is_b=1&rn=cf5732f37e1c2ba232b31317c079ed5a

The code:
import requests

def getHTMLText(url):
    try:
        kv = {'user-agent': 'Mozilla'}  # browser-like request header
        r = requests.get(url, headers=kv, timeout=30)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding
        return r.text[1000:3000]  # characters 1000 to 2999 of the page
    except requests.RequestException:
        return 'Request failed'

if __name__ == '__main__':
    url = 'https://detail.tmall.com/item.htm?spm=a220m.1000858.1000725.17.5271677d46wU8h&id=590071675099&skuId=4037392201738&areaId=320100&user_id=3376094485&cat_id=2&is_b=1&rn=cf5732f37e1c2ba232b31317c079ed5a'
    page = getHTMLText(url)
    print(page)

Output:

while(r=reg.exec(params)){arr.push(r[1]+"="+r[2]);}
        d.referrer && (arr.push("ref="+encodeURIComponent(d.referrer)));
        w.onMdskip=function(c){l=l?c(l):c}
        w.setMdskip=function(v){l=l?l(v):v;}
        try{
            var head=d.head || d.getElementsByTagName("head")[0];
            var script=d.createElement("script");
            head.insertBefore(script,head.firstChild);
            script.src=url+'&'+arr.join("&");
        }
        catch(err){
            d.write('<script src="'+url+'&'+arr.join("&")+'" async="async"></'+'script>');
        }
    }catch(e){
        w.onMdskip=null;
        setTimeout(function(){throw err;},0);
    }
    }(window,document))
</script>

    <meta name="keywords" content="情趣内衣激情套装骚开档露乳透明三点式小胸诱惑性感睡衣吊带短裙"/>
<meta name="description" content="欢迎前来淘宝网实力旺铺,选购情趣内衣激情套装骚开档露乳透明三点式小胸诱惑性感睡衣吊带短裙,想了解详情情趣内衣激情套装骚开档露乳透明三点式小胸诱惑性感睡衣吊带短裙,请进入敢性成人用品专营店的敢性成人用品专营店实力旺铺,更多商品任你选购"/>
<script>(function(D,W){var sampling=1000,maxNum=5,_st = +new Date();if(Math.floor(Math.random()*sampling)>0){return;}var onError=W.onerror=function(err,file,line){if((--maxNum)<=0){return;};err='[t'+(new Date()-_st)+']'+err;if(W._jstErrCat){err='[c'+W._jstErrCat+']'+err;}var nick="",result;try{result=/_nk_=([^;]+)/.exec(D.cookie);if(result){nick=decodeURIComponent(result[1]);}}catch(e){}new Image().src="//gm.mmstat.com/jstracker.2?"+["type=9","id=jstracker","v=0.01","nick="+encodeURIComponent(nick),"islogin=0","msg="+encodeURIComponent(err||""),"file="+encodeURIComponent(file||""),"line="+encodeURIComponent(line||""),"scrolltop="+(D.documentElement.scrollTop||D.body.scrollTop||0),"screen="+screen.width+"x"+screen.height,"t="+new Date().valueOf()].join("&");};W._jstErrTimeout=setTimeout(function(){onError("App init timeout");},4096);})(document, window)</script>


                                

                                                                                                                                                                          

Process finished with exit code 0
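
Slicing the page text at fixed offsets returns whatever happens to sit there, which in the output above is mostly inline JavaScript. A minimal sketch of pulling out just the meta tags instead, using the standard library's html.parser, might look like this. MetaCollector and sample_html are hypothetical names for illustration, and real pages may need a more tolerant parser.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content

# Hypothetical stand-in for the text returned by the scraper.
sample_html = (
    '<head><meta name="keywords" content="a,b"/>'
    '<meta name="description" content="demo page"/></head>'
)

collector = MetaCollector()
collector.feed(sample_html)
print(collector.meta)
```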

Scraping an Amazon product page

URL: https://www.amazon.cn/dp/B016D7CA06/ref=sr_1_3?__mk_zh_CN=%E4%BA%9A%E9%A9%AC%E9%80%8A%E7%BD%91%E7%AB%99&keywords=%E6%9D%9C%E8%95%BE%E6%96%AF&qid=1554814082&s=gateway&sr=8-3

The code:
import requests

def getHTMLText(url):
    try:
        kv = {'user-agent': 'Mozilla'}  # browser-like request header
        r = requests.get(url, headers=kv, timeout=20)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        r.encoding = r.apparent_encoding
        return r.text[:1000]  # only the first 1000 characters
    except requests.RequestException:
        return 'Request failed'

if __name__ == '__main__':
    url = 'https://www.amazon.cn/dp/B016D7CA06/ref=sr_1_3?__mk_zh_CN=%E4%BA%9A%E9%A9%AC%E9%80%8A%E7%BD%91%E7%AB%99&keywords=%E6%9D%9C%E8%95%BE%E6%96%AF&qid=1554814082&s=gateway&sr=8-3'
    page = getHTMLText(url)
    print(page)

Output:

<!doctype html><html class="a-no-js" data-19ax5a9jf="dingo">
    <head>
<script type="text/javascript">var ue_t0=ue_t0||+new Date();</script>
<script type="text/javascript">
window.ue_ihb = (window.ue_ihb || window.ueinit || 0) + 1;
if (window.ue_ihb === 1) {
var ue_hob=+new Date();
var ue_id='RXTJ1GGCQGR0WRAMDB9M',
ue_csm = window,
ue_err_chan = 'jserr-rw',
ue = {};
(function(d){var e=d.ue=d.ue||{},f=Date.now||function(){return+new Date};e.d=function(b){return f()-(b?0:d.ue_t0)};e.stub=function(b,a){if(!b[a]){var c=[];b[a]=function(){c.push([c.slice.call(arguments),e.d(),d.ue_id])};b[a].replay=function(b){for(var a;a=c.shift();)b(a[0],a[1],a[2])};b[a].isStub=1}};e.exec=function(b,a){return function(){if(1==window.ueinit)try{return b.apply(this,arguments)}catch(c){ueLogError(c,{attribution:a||"undefined",logLevel:"WARN"})}}}})(ue_csm);

ue.stub(ue,"log");ue.stub(ue,"onunload");ue.stub(ue,"onflush");


Process finished with exit code 0
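
All three scrapers above pass a user-agent header through the headers argument. For comparison only, here is a sketch of the same idea with the standard library's urllib.request, which lets you build the request and inspect its headers without sending anything. The example.com URL is a placeholder, not one of the sites scraped in this article.

```python
import urllib.request

# Build (but do not send) a request carrying a browser-like User-Agent.
req = urllib.request.Request(
    "https://example.com/",  # placeholder URL for illustration
    headers={"user-agent": "Mozilla/5.0"},
)

# urllib stores header names capitalized as "User-agent".
print(req.get_method())
print(req.header_items())
```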

Submitting a search keyword to Baidu

The code:

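import requests

def getHTMLText(url):
    try:
        kv = {'wd': '亚洲情色'}  # wd is Baidu's search keyword parameter
        r = requests.get(url, params=kv, timeout=30)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        print(r.request.url)  # the URL actually requested, with the encoded query string
        r.encoding = r.apparent_encoding
        return len(r.text)  # length of the returned page, in characters
    except requests.RequestException:
        return 'Request failed'

if __name__ == '__main__':
    url = 'https://www.baidu.com/s'
    page = getHTMLText(url)
    print(page)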

Output:

https://www.baidu.com/s?wd=%E4%BA%9A%E6%B4%B2%E6%83%85%E8%89%B2
227
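
The printed URL shows the keyword percent-encoded as UTF-8 bytes. As a small sketch of what that encoding step produces, the standard library's urllib.parse can build and decode a comparable query string. The keyword below is a hypothetical one chosen for illustration, and requests may differ from urlencode in edge cases.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "https://www.baidu.com/s"
params = {"wd": "京东"}  # hypothetical keyword for illustration

# Encode the parameters the way they appear in the query string.
query = urlencode(params)
full_url = base_url + "?" + query
print(full_url)

# Decode the query string back into a dict of lists.
decoded = parse_qs(urlsplit(full_url).query)
print(decoded)
```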

Submitting a search keyword to 360 Search

The code:

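import requests

def getHTMLText(url):
    try:
        kv = {"q": "无码在线"}  # q is 360 Search's keyword parameter
        r = requests.get(url, params=kv, timeout=30)
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        print(r.request.url)  # the URL actually requested, with the encoded query string
        r.encoding = r.apparent_encoding
        return len(r.text)  # length of the returned page, in characters
    except requests.RequestException:
        return 'Request failed'

if __name__ == '__main__':
    url = 'https://www.so.com/s'
    page = getHTMLText(url)
    print(page)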

Output:

https://www.so.com/s?q=%E6%97%A0%E7%A0%81%E5%9C%A8%E7%BA%BF
133054

IP address lookup

The code:

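import requests

def getHTMLText(url):
    try:
        ipl = input('请输入要查询的ip地址:')  # prompt: "Enter the IP address to look up:"
        r = requests.get(url + ipl, timeout=30)  # the IP is appended to the query string
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
        print(r.request.url)
        r.encoding = r.apparent_encoding
        return r.text  # the full page
    except requests.RequestException:
        return 'Request failed'

if __name__ == '__main__':
    url = 'https://www.ip138.com/ips138.asp?ip='
    page = getHTMLText(url)
    print(page)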

Output:

请输入要查询的ip地址:223.65.140.179
https://www.ip138.com/ips138.asp?ip=223.65.140.179
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
<TITLE>IP地址查询--手机号码查询归属地 | 邮政编码查询 | 长途电话区号 | 身份证号码验证在线查询网</TITLE>
<META NAME="Generator" CONTENT="ip,IP查询,IP地址查询,ip138,网址之家">
<META NAME="Author" CONTENT="ip138,网址之家">
<META NAME="Keywords" CONTENT="ip,IP查询,IP地址查询,ip138,网址之家">
<META NAME="Description" CONTENT="ip,IP查询,IP地址查询,ip138,网址之家">
<SCRIPT LANGUAGE="JavaScript">
<!--
function checkIP()
{
    var ipArray,ip,j;
    ip = document.ipform.ip.value;

    if (ip.indexOf(" ")>=0){
        ip = ip.replace(/ /g,"");
        document.ipform.ip.value = ip;
    }
    if (ip.toLowerCase().indexOf("http://")==0){
        ip = ip.slice(7);
        document.ipform.ip.value = ip;
    }
    if (ip.toLowerCase().indexOf("https://")==0){
        ip = ip.slice(8);
        document.ipform.ip.value = ip;
    }
    if (ip.slice(ip.length-1)=="/"){
        ip = ip.slice(0,ip.length-1);
        document.ipform.ip.value = ip;
    }

    if(/[A-Za-z_-]/.test(ip)){
        if(!/^([\w-]+\.)+((ac)|(ad)|(ae)|(af)|(ag)|(ai)|(al)|(am)|(an)|(ao)|(aq)|(ar)|(as)|(asia)|(at)|(au)|(aw)|(az)|(ba)|(band)|(bb)|(bd)|(be)|(bf)|(bg)|(bh)|(bi)|(bid)|(biz)|(bj)|(bm)|(bn)|(bo)|(br)|(bs)|(bt)|(bv)|(bw)|(by)|(bz)|(ca)|(cc)|(cd)|(cf)|(cg)|(ch)|(ci)|(ck)|(cl)|(click)|(club)|(cm)|(cn)|(co)|(co\.in)|(co\.nz)|(co\.uk)|(com)|(com\.ag)|(com\.br)|(com\.bz)|(com\.cn)|(com\.co)|(com\.es)|(com\.hk)|(com\.mx)|(com\.tw)|(cr)|(cu)|(cv)|(cx)|(cy)|(cz)|(date)|(de)|(design)|(dj)|(dk)|(dm)|(do)|(dz)|(ec)|(ee)|(eg)|(engineer)|(er)|(es)|(et)|(eu)|(fi)|(firm\.in)|(fj)|(fk)|(fm)|(fo)|(fr)|(ga)|(gd)|(ge)|(gen\.in)|(gf)|(gg)|(gh)|(gi)|(gift)|(gl)|(gm)|(gn)|(gov\.cn)|(gp)|(gq)|(gr)|(gs)|(gt)|(gu)|(gw)|(gy)|(help)|(hk)|(hm)|(hn)|(hr)|(ht)|(hu)|(id)|(idv\.tw)|(ie)|(il)|(im)|(in)|(ind\.in)|(info)|(io)|(iq)|(ir)|(is)|(it)|(je)|(jm)|(jo)|(jobs)|(jp)|(ke)|(kg)|(kh)|(ki)|(km)|(kn)|(kr)|(kw)|(ky)|(kz)|(la)|(lawyer)|(lb)|(lc)|(li)|(link)|(live)|(lk)|(lr)|(ls)|(lt)|(ltd\.uk)|(lu)|(lv)|(ly)|(ma)|(market)|(mc)|(md)|(me)|(me\.uk)|(mg)|(mh)|(mk)|(ml)|(mm)|(mn)|(mo)|(mobi)|(mp)|(mq)|(mr)|(ms)|(mt)|(mu)|(mv)|(mw)|(mx)|(my)|(mz)|(na)|(name)|(nc)|(ne)|(net)|(net\.ag)|(net\.br)|(net\.bz)|(net\.cn)|(net\.co)|(net\.in)|(net\.nz)|(news)|(nf)|(ng)|(ni)|(nl)|(no)|(nom\.co)|(nom\.es)|(np)|(nr)|(nu)|(nz)|(om)|(online)|(org)|(org\.ag)|(org\.cn)|(org\.es)|(org\.in)|(org\.nz)|(org\.tw)|(org\.uk)|(pa)|(party)|(pe)|(pf)|(pg)|(ph)|(photo)|(pics)|(pk)|(pl)|(plc\.uk)|(pm)|(pn)|(pr)|(pro)|(press)|(ps)|(pt)|(pub)|(pw)|(py)|(qa)|(re)|(ren)|(ro)|(rocks)|(ru)|(rw)|(sa)|(sb)|(sc)|(science)|(sd)|(se)|(sg)|(sh)|(si)|(site)|(sk)|(sl)|(sm)|(sn)|(so)|(social)|(software)|(space)|(sr)|(st)|(studio)|(sv)|(sy)|(sz)|(tc)|(td)|(tech)|(tf)|(tg)|(th)|(tj)|(tk)|(tl)|(tm)|(tn)|(to)|(top)|(tr)|(trade)|(travel)|(tt)|(tv)|(tw)|(tz)|(ua)|(ug)|(uk)|(us)|(uy)|(uz)|(va)|(vc)|(ve)|(vg)|(vi)|(video)|(vn)|(vu)|(wang)|(website)|(wf)|(wiki)|(win)|(ws)|(xin)|(xyz)|(ye)|(yt)|(yu)|(za)|(zm)|(zw))$/i.test(ip)){
            alert("不是正确的域名");
            document.ipform.ip.focus();
            return false;
        }
    }
    else{
        ipArray = ip.split(".");
        j = ipArray.length
        if(j!=4)
        {
            alert("不是正确的IP");
            document.ipform.ip.focus();
            return false;
        }

        for(var i=0;i<4;i++)
        {
            if(ipArray[i].length==0 || ipArray[i]>255)
            {
                alert("不是正确的IP");
                document.ipform.ip.focus();
                return false;
            }
        }
    }
}
function WeatherSearch(){
    document.forms["weathersearch"].submit()
}
function Area2Zip(){
    document.forms["area2zip"].submit()
}
//-->
</SCRIPT>

<STYLE type="text/css">
p,td {font-size:16px}
A:link {
    COLOR: #1c5f82; TEXT-DECORATION: none
}
A:visited {
    COLOR: #1c5f82; TEXT-DECORATION: none
}
A:hover {
    COLOR: #cc5533; TEXT-DECORATION: underline
}

A.green:link {COLOR: #008000;}
A.green:visited {COLOR: #008000;}
A.green:hover {COLOR: #008000;}

BODY {
    SCROLLBAR-HIGHLIGHT-COLOR: #f7f7f7; SCROLLBAR-SHADOW-COLOR: #f7f7f7; SCROLLBAR-ARROW-COLOR: #EFF1F3; SCROLLBAR-TRACK-COLOR: #EFF1F3; SCROLLBAR-BASE-COLOR: #f7f7f7
}
.ul1{
    width:480px;
    text-align:left;
}
li{
    color:green;
    }
</STYLE>
</HEAD>
<BODY>
<div align="center"><center>
<table cellSpacing="0" cellPadding="0" width="760" align="center" border="0">
    <tr vAlign="bottom">
        <td align="left"><a href="http://www.ip138.com"><b>www.ip138.com 查询网</b></a></td>
        <td align="middle"></td>
        <td align="right"><strong>手机上网查询:wap.ip138.com</strong></td>
    </tr>
    <tr vAlign="top" align="left">
        <td colSpan="3"><hr width="100%" SIZE="1">
        </td>
    </tr>
</table>
</center></div>

<div align="center"><center>
<table height="22" cellSpacing="0" cellPadding="0" width="710" border="0">
    <tr align="middle">
        <td width="179"><b><font color="#008000">→</font></b><a href="http://www.ip138.com/ips1388.asp" target="_blank">ip地址所在地查询</a></td>
        <td width="177"><b><font color="#008000">→</font></b><a href="http://qq.ip138.com/train/" target="_blank">国内列车时刻表查询</a></td>
        <td width="177"><b><font color="#008000">→</font></b><a href="http://www.ip138.com/sj/" target="_blank">手机号码所在地区查询</a></td>
        <td width="177"><b><font color="#008000">→</font></b><a href="http://qq.ip138.com/weather/" target="_blank">天气预报-预报五天</a></td>
    </tr>
    <tr align="middle">
        <td><b><font color="#008000">→</font></b><a href="http://www.ip138.com/gb.htm" target="_blank">汉字简体繁体转换</a></td>
        <td><b><font color="#008000">→</font></b><a href="http://www.ip138.com/jb.htm" target="_blank">国内国际机票查询</a></td>
        <td><b><font color="#008000">→</font></b><a href="http://10.ip138.com/" target="_blank">品牌排行榜</a></td>
        <td><b><font color="#008000">→</font></b><a href="http://qq.ip138.com/wb/wb.asp" target="_blank">五笔编码拼音查询</a></td>
    </tr>
    <tr align="middle">
        <td><b><font color="#008000">→</font></b><a href="http://qq.ip138.com/tran.htm" target="_blank">在线翻译</a></td>
        <td><b><font color="#008000">→</font></b><a href="http://qq.ip138.com/hl.asp" target="_blank">货币汇率兑换</a></td>
        <td><b><font color="#008000">→</font></b><a href="http://qq.ip138.com/day/" target="_blank">阴阳转换万年历</a></td>
        <td><b><font color="#008000">→</font></b><a href="http://www.ip138.com/post/" target="_blank">邮编查询区号查询</a></td>
    </tr>
    <tr align="middle">
        <td><a href="http://qq.ip138.com/idsearch/" target="_blank">身份证号码查询验证</a></td>
        <td><a href="http://www.ip138.com/ems/" target="_blank">快递查询</a> <a href="http://www.ip138.com/ems/" target="_blank">EMS查询</a></td>
        <td><a href="http://www.ip138.com/carlist.htm" target="_blank">全国各地车牌查询表</a></td>
        <td><a href="http://www.ip138.com/weizhang.htm" target="_blank">车辆交通违章查询</a></td>
    </tr>
</table>
</center></div>
<form method="post" action="http://qq.ip138.com/weather/search.asp" name="weathersearch" target="_blank">
<input type="hidden" name="k" value="" />
</form>
<form method="post" action="http://www.ip138.com/post/search.asp" name="area2zip" target="_blank">
<input type="hidden" name="area" value="" />
<input type="hidden" name="action" value="area2zip" />
</form>
<table width="80%"  border="0" align="center" cellpadding="0" cellspacing="0">
    <tr>
        <td align="center"><h3>ip138.com IP查询(搜索IP地址的地理位置)</h3></td>
    </tr>
    <tr>
        <td align="center"><h1>您查询的IP:223.65.140.179</h1></td>
    </tr>
    <tr>
        <td align="center"><ul class="ul1"><li>本站数据:江苏省南京市  移动</li><li>参考数据1:江苏南京  移动</li><li>参考数据2:中国 移动</li><li>兼容IPv6地址:::DF41:8CB3</li><li>映射IPv6地址:::FFFF:DF41:8CB3</li></ul></td>
    </tr>
    <tr>
        <td align="center"><a href="/idc/" target="_blank"><b><font color="red">idc公司大全</font></b></a> | <a href="http://user.ip138.com/ip/" target="_blank"><b>ip查询接口</b></a></td>
    </tr>
    <tr>
        <td align="center">如果您发现查询结果不详细或不正确,请使用<a href="ip_add.asp?ip=223.65.140.179"><font color="#006600"><b>IP数据库自助添加</b></font></a>功能进行修正<br/><br/>
<p align="center">
<div align="center">
<!--
<script type="text/javascript">
var cpro_id = "u2962614";
</script>
<script type="text/javascript" src="http://cpro.baidustatic.com/cpro/ui/c.js"></script>
-->
<a href="https://www.yisu.com/hk/huodong.html" target="_blank" rel="nofollow"><img src="http://p.abcache.com/xc/image/yisu-468x60.gif"/></a>
</div>
</p>
<br/></td>
    </tr>
    <form method="get" action="ips1388.asp" name="ipform" onsubmit="return checkIP();">
    <tr>
        <td align="center">IP地址或者域名:<input type="text" name="ip" size="16"> <input type="submit" value="查询"><input type="hidden" name="action" value="2"></td>
    </tr><br>
<br>
    </form>
</table>
<p align="center">如发现小部分ip查询结果不正确请到官方网站<a
href="http://www.apnic.net" rel="nofollow" target="_blank">http://www.apnic.net</a>查询,以apnic为准</p>
<p align="center">
<div align="center">
<!--
<script type="text/javascript">
var cpro_id = "u2962622";
</script>
<script type="text/javascript" src="http://cpro.baidustatic.com/cpro/ui/c.js"></script>
-->
<a href="http://www.yisu.com/ddos/huodong" target="_blank" rel="nofollow"><img src="http://p.abcache.com/xc/image/yun728x90.gif" width="728" height="90"/></a>
</div>
</p>
<p align="center"></a>联系我们.请<a href="mail.htm" rel="nofollow" target="_blank">发email</a>.或给<a
href="http://qq.3533.com:8080/book.asp?siteid=7" rel="nofollow" target="_blank">我们留言</a>谢谢!</p>
<p align="center">沪ICP备10013467号-1号</p>
<div style="display:none"><script type="text/javascript" src="http://tajs.qq.com/stats?sId=36241650" charset="UTF-8"></script></div>
</body>
</html>

Process finished with exit code 0
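
The full page is long, while the lookup result itself sits in a handful of li elements. A rough sketch of two small helpers follows: one checks the typed address before it is appended to the URL, the other pulls the li items out of the returned HTML. Both is_valid_ipv4 and extract_items are hypothetical names, the sample is a shortened copy of the result row above, and a regular expression is only a quick approach that assumes simple, non-nested li elements.

```python
import ipaddress
import re

def is_valid_ipv4(text: str) -> bool:
    """Return True if `text` is a well-formed IPv4 address."""
    try:
        ipaddress.IPv4Address(text.strip())
        return True
    except ipaddress.AddressValueError:
        return False

def extract_items(html: str) -> list:
    """Return the text of every <li>...</li> element found in `html`."""
    return [item.strip() for item in re.findall(r"<li>(.*?)</li>", html, flags=re.S)]

# Shortened copy of the result row from the output above.
sample = (
    '<ul class="ul1"><li>本站数据:江苏省南京市  移动</li>'
    '<li>参考数据1:江苏南京  移动</li></ul>'
)

print(is_valid_ipv4("223.65.140.179"))
print(is_valid_ipv4("256.1.1.1"))
for line in extract_items(sample):
    print(line)
```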
