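Getting Started with the Python requests Library

requests is a practical HTTP client library for Python that comes up often when writing crawlers and when testing server responses. It is fair to say that Requests fully meets the needs of today's web.

Everything in this article comes from the official documentation: http://docs.python-requests.org/en/master/

The usual way to install it is $ pip install requests. For other installation methods, see the official documentation.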

Study notes on the requests module

Before you start

pip install requests

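Sending GET and POST requests and getting the response

response = requests.get(url, headers=headers)  # send a GET request and get the response for the URL; headers must be passed by keyword
response = requests.post(url, data=form_data)  # send a POST request; form_data is a dict holding the request body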
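As a minimal sketch of the two calls above, the snippet below builds a GET and a POST request with requests.Request(...).prepare() so they can be inspected without touching the network. The URL, the header value, and the form fields are made-up placeholders, not anything from the original notes.

```python
import requests

# Hypothetical endpoint used only for illustration; nothing is sent over the network.
URL = "http://example.com/api"
headers = {"User-Agent": "demo-client/0.1"}

# Build (but do not send) a GET request. headers and params are keyword arguments.
get_req = requests.Request("GET", URL, headers=headers, params={"page": 1}).prepare()
print(get_req.method, get_req.url)

# Build a POST request whose body is a form-encoded dict.
post_req = requests.Request("POST", URL, data={"user": "alice", "lang": "py"}).prepare()
print(post_req.headers["Content-Type"])
print(post_req.body)
```

Sending for real would then be requests.get(URL, headers=headers, params={"page": 1}) or requests.post(URL, data={...}), which return a response object.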

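Using the response

response.text
- This often comes out garbled; when it does, set response.encoding = "utf-8" before reading it
response.content.decode()
- Converts the raw bytes of the response into a str (decode() uses UTF-8 by default)
- bytes -> str
response.request.url # URL the request was sent to
response.request.headers # request headers
response.headers # response headers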

The right ways to get a page's content (any of the following three gives you the decoded string)

1. response.content.decode()
2. response.content.decode("gbk")
3. response.text
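One way to combine the first two options is to try the likely encodings in order. The helper below is a sketch under that assumption; decode_body and its default encoding list are hypothetical names chosen for illustration, not part of requests.

```python
def decode_body(content, encodings=("utf-8", "gbk")):
    """Decode raw response bytes, trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return content.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    # Nothing matched: fall back to the first encoding and mark bad bytes.
    return content.decode(encodings[0], errors="replace")


# Stand-ins for response.content from a UTF-8 page and from a GBK page.
utf8_bytes = "中文".encode("utf-8")
gbk_bytes = "中文".encode("gbk")
print(decode_body(utf8_bytes), decode_body(gbk_bytes))
```

With a real response the call would be decode_body(response.content).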

Sending a request with headers

To mimic a browser and receive the same content a browser would get:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Referer": "http://acc.hnczt.gov.cn/SpaceAction.do?method=list&ntype=3",
    "Cookie": "Hm_lvt_35cde00bcde87c267839e0309e482db1=1554703285,1554777297,1554884326",
}
response = requests.get(url, headers=headers)  # headers must be passed by keyword
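When several requests need the same browser-like headers, they can be set once on a Session. The sketch below prints the default User-Agent that requests would otherwise send, then prepares (without sending) a request to a placeholder URL to show the merged headers. The example.com URL is an assumption for illustration.

```python
import requests

# What a server sees when no User-Agent is set; the exact value depends on
# the installed requests version.
print(requests.utils.default_user_agent())

browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
}

session = requests.Session()
session.headers.update(browser_headers)  # applied to every request made through this session

# Prepare, but do not send, a request so the merged headers can be inspected.
prepared = session.prepare_request(requests.Request("GET", "http://example.com/"))
print(prepared.headers["User-Agent"])
```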

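Using the timeout parameter

Passing timeout (in seconds) to a request, for example requests.get(url, timeout=3), makes it raise an exception instead of waiting indefinitely when the server does not respond in time. The retrying package can then re-run the failed call a few times: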

pip install retrying

from retrying import retry

@retry(stop_max_attempt_number=3)
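# 3 means the function below is attempted at most three times: it stops at the first success, and if every attempt fails the exception is raised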
def func():
    print("this is func")
    raise ValueError("this is test error")
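To tie timeout and retries together, here is a sketch that uses a plain loop in place of the retrying decorator, so it needs nothing beyond requests. The function name get_with_retry and its get parameter (a hook for swapping in a fake during testing) are assumptions made for this example.

```python
import requests


def get_with_retry(url, attempts=3, timeout=3, get=requests.get):
    """GET url up to `attempts` times (at least 1), each bounded by `timeout` seconds."""
    last_error = None
    for _ in range(attempts):
        try:
            response = get(url, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx as failures too
            return response
        except requests.RequestException as exc:
            last_error = exc  # remember the failure and try again
    raise last_error


# Offline demonstration: a fake `get` that times out twice, then succeeds.
class FakeResponse:
    status_code = 200

    def raise_for_status(self):
        pass


seen_timeouts = []


def flaky_get(url, timeout):
    seen_timeouts.append(timeout)
    if len(seen_timeouts) < 3:
        raise requests.Timeout("simulated timeout")
    return FakeResponse()


result = get_with_retry("http://example.invalid/", get=flaky_get)
print(result.status_code, len(seen_timeouts))
```

In real use the call is simply get_with_retry(url); the fake is only there so the example runs without a network.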

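Handling cookie-related requests

Requesting a URL with the cookie attached directly
- 1. Put the cookie in headers
headers = { "User-Agent": "...", "Cookie": "cookie string" }
- 2. Pass a cookie dict to the cookies parameter
  - requests.get(url, cookies=cookie_dict)
Send a POST request first to obtain the cookie, then request the login-protected page with it
- 1. session = requests.session() # a session has the same methods as requests
- 2. session.post(url, data=data, headers=headers) # cookies the server sets are stored in the session
- 3. session.get(url) # sends the cookies saved in the session, so the request succeeds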
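The two approaches above can be sketched offline by preparing requests instead of sending them. The URL, the cookie names and values, and the helper cookie_string_to_dict are all hypothetical; the manually set session cookie stands in for one that a real login response would set.

```python
import requests

URL = "http://example.com/profile"  # placeholder, never contacted


def cookie_string_to_dict(cookie_string):
    """Turn 'a=1; b=2' (as copied from a browser) into {'a': '1', 'b': '2'}."""
    result = {}
    for item in cookie_string.split(";"):
        if "=" in item:
            name, value = item.strip().split("=", 1)
            result[name] = value
    return result


cookie_dict = cookie_string_to_dict("sid=abc123; theme=dark")

# Direct approach: pass the dict through the cookies parameter.
one_off = requests.Request("GET", URL, cookies=cookie_dict).prepare()
print(one_off.headers["Cookie"])

# Session approach: cookies stored on the session ride along on later requests.
session = requests.Session()
session.cookies.set("sid", "abc123")  # stands in for a cookie set by a login response
later = session.prepare_request(requests.Request("GET", URL))
print(later.headers["Cookie"])
```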

Reference documentation:
http://docs.python-requests.org/zh_CN/latest/user/quickstart.html

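What is Requests

Requests is an HTTP library written in Python on top of urllib3 and released under the Apache2 license.
If you read the previous article on the urllib library, you will have noticed that urllib is rather inconvenient to use. Requests is more convenient and saves a lot of work (once you have used requests, you will rarely want to go back to urllib). In short, requests is one of the simplest and easiest-to-use HTTP libraries for Python, and it is a good choice for crawlers.

requests is not bundled with a default Python installation; it has to be installed separately with pip.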

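requests features in detail

A demo of the overall functionality

import requests

response = requests.get("https://www.tianqiweiqi.com")
print(type(response))                    # <class 'requests.models.Response'>
print(response.status_code)              # HTTP status code, e.g. 200
print(type(response.text))               # <class 'str'>
print(response.text)                     # body decoded with the guessed encoding
print(response.cookies)                  # cookies set by the server
print(response.content)                  # raw body as bytes
print(response.content.decode("utf-8"))  # body decoded explicitly as UTF-8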

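As you can see, the response object is very convenient to use. There is one thing to watch out for:
on many sites, reading response.text directly produces garbled text, so use response.content instead.
response.content returns the body as raw bytes; calling decode() on it with the page's encoding (UTF-8 here) avoids the garbled output that response.text can produce.

After a request is sent, Requests makes an educated guess about the response's encoding based on the HTTP headers. When you access response.text, Requests uses that guessed encoding. You can find out which encoding Requests is using, and change it, through the response.encoding attribute. For example: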

response = requests.get("http://www.tianqiweiqi.com")
response.encoding = "utf-8"
print(response.text)

Either approach, response.content.decode("utf-8") or response.encoding = "utf-8", avoids the garbled-text problem, provided the page really is UTF-8 encoded.
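To see the encoding guess go wrong and both fixes at work without relying on an external site, the sketch below starts a throwaway local server that returns GBK-encoded text with no charset in its Content-Type header. The handler class, the sample text, and the use of a loopback server are assumptions made for this demonstration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

TEXT = "你好，世界"


class GBKHandler(BaseHTTPRequestHandler):
    """Serves GBK-encoded text without declaring a charset."""

    def do_GET(self):
        body = TEXT.encode("gbk")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")  # deliberately no charset
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), GBKHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for the loopback call
response = session.get(f"http://127.0.0.1:{server.server_port}/", timeout=5)
server.shutdown()
server.server_close()

guessed = response.encoding                 # what Requests inferred from the headers
garbled = response.text                     # decoded with that guess
via_bytes = response.content.decode("gbk")  # fix 1: decode the raw bytes yourself
response.encoding = "gbk"                   # fix 2: override the guess...
via_text = response.text                    # ...and read .text again

print(guessed, garbled == TEXT, via_bytes == TEXT, via_text == TEXT)
```

The same pattern applies to a UTF-8 page: replace "gbk" with "utf-8" in both fixes.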
