Python Crawlers: Using Proxies

Setting up a proxy
To use a proxy with the urllib library, the code is as follows:
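from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

proxy = "113.116.50.182:808"
# The scheme prefix was lost in the source; "http://" is assumed for a plain HTTP proxy.
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "http://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    # The host part of this URL is missing in the source; only the "/ip" path survives.
    response = opener.open("/ip")
    print(response.read().decode())
except URLError as e:
    print("Proxy IP is unusable")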
If the output looks like the following, the proxy has been set up successfully:
{
"origin": "113.116.50.182, 113.116.50.182"
}
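The check described above can also be done in code rather than by eye. The following is a minimal sketch, assuming the test endpoint returns a JSON body with an "origin" field as in the sample output; the helper name proxy_in_use is hypothetical and not part of urllib.

```python
import json


def proxy_in_use(body, proxy):
    """Return True if every address in the "origin" field equals the proxy's IP.

    body  -- the decoded response text, e.g. '{"origin": "1.2.3.4, 1.2.3.4"}'
    proxy -- "host:port" or "username:password@host:port"
    """
    # Drop optional credentials, then drop the port.
    proxy_ip = proxy.rsplit("@", 1)[-1].split(":", 1)[0]
    origins = [ip.strip() for ip in json.loads(body)["origin"].split(",")]
    return all(ip == proxy_ip for ip in origins)
```

If the function returns False, the request most likely left from your own address (or from a different upstream address) instead of the configured proxy.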
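For a proxy that requires authentication, only the proxy variable needs to change: put the proxy username and password in front of the proxy address, for example "username:password@113.116.50.182:808":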
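from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Credentials go in front of the proxy address, separated by "@".
proxy = "username:password@113.116.50.182:808"
# The scheme prefix was lost in the source; "http://" is assumed for a plain HTTP proxy.
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "http://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    # The host part of this URL is missing in the source; only the "/ip" path survives.
    response = opener.open("/ip")
    print(response.read().decode())
except URLError as e:
    print("Proxy IP is unusable")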
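One practical wrinkle with the "username:password@host:port" form is that a password containing characters such as "@" or ":" makes the proxy string ambiguous. The following is a minimal sketch, assuming percent-encoding the credentials is acceptable to the client library (to my knowledge both urllib and requests decode them again before sending); build_proxy_url is a hypothetical helper name.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials if any are given."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    # safe="" forces "@", ":" and "/" inside the credentials to be encoded too.
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"
```

The returned string can be used as the value for both the "http" and "https" keys of the proxy dictionary.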
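If you run into a SOCKS proxy server:
A server that speaks the SOCKS protocol is a SOCKS server, a general-purpose kind of proxy server. SOCKS is a low-level, circuit-level protocol developed by David Koblas in 1990, and it has since been maintained as an open standard in the Internet RFC series (SOCKS5 is specified in RFC 1928). SOCKS does not tie applications to a particular operating system platform. Unlike application-layer proxies such as HTTP proxies, a SOCKS proxy simply relays packets without caring which application protocol is in use (for example FTP, HTTP, or NNTP requests). Because it does no protocol-level processing, it is generally faster than application-layer proxies.
The code is set up as follows: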
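import socket
import socks  # provided by the PySocks package
from urllib import request
from urllib.error import URLError

# Route every newly created socket through the SOCKS5 proxy (process-wide patch).
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket
try:
    # The host part of this URL is missing in the source; only the "/ip" path survives.
    response = request.urlopen("/ip")
    print(response.read().decode())
except URLError as e:
    print("Proxy IP is unusable")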
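Replacing socket.socket affects every connection the process opens afterwards, not only the crawler's requests. The following is a minimal sketch, assuming you want the replacement to apply only inside a limited block of code; patched_socket is a hypothetical helper, not part of PySocks or the standard library.

```python
import socket
from contextlib import contextmanager


@contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket, restoring the original on exit."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        # Always restore, even if the body raised an exception.
        socket.socket = original
```

With PySocks, the intended use would be to call socks.set_default_proxy(...) first and then wrap the request in "with patched_socket(socks.socksocket):".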
Proxy settings in the requests library
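import requests

proxy = "113.116.50.182:808"
# The scheme prefix was lost in the source; "http://" is assumed for a plain HTTP proxy.
proxies = {
    "http": "http://" + proxy,
    "https": "http://" + proxy,
}
try:
    # The host part of this URL is missing in the source; only the "/ip" path survives.
    response = requests.get("/ip", proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)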
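This is much simpler than setting up a proxy in urllib. For a proxy that requires authentication, the same form works here as well: proxy = "username:password@113.116.50.182:808". It is not demonstrated again.
To use a SOCKS5 proxy with the requests library, set it up as follows: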
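import socket
import requests
import socks  # provided by the PySocks package

# Route every newly created socket through the SOCKS5 proxy (process-wide patch).
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket
try:
    # The host part of this URL is missing in the source; only the "/ip" path survives.
    response = requests.get("/ip")
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)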
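As an alternative to patching the socket module, requests can take a SOCKS proxy directly in its proxies dictionary when the optional SOCKS extra is installed (pip install "requests[socks]"). The following is a minimal sketch of building that dictionary; socks5_proxies is a hypothetical helper name, and the address is the one used in the example above.

```python
def socks5_proxies(host, port, remote_dns=False):
    """Build a requests-style proxies dict for a SOCKS5 proxy.

    With remote_dns=True the "socks5h" scheme is used, which asks the
    proxy to resolve hostnames instead of resolving them locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{host}:{port}"
    return {"http": url, "https": url}
```

The result would be passed as requests.get(url, proxies=socks5_proxies("113.116.50.182", 807)), which keeps the proxy scoped to that call instead of the whole process.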
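Setting a proxy in Selenium
Since the headless browser PhantomJS is no longer maintained, only Chrome, a browser with a visible window, is demonstrated here.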
from selenium import webdriver
proxy = "113.116.50.182:808"
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--proxy-server='+proxy)
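# The chromedriver path is truncated in the source; supply the full path to the driver executable.
driver = webdriver.Chrome(executable_path=r"C:\Users\Administrator\",options=chromeOptions)
# The host part of this URL is missing in the source; only the "/ip" path survives.
driver.get("/ip")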
print(driver.page_source)
The scraped result is as follows:
<html xmlns="/1999/xhtml"><head></head><body><pre >{
"origin": "113.116.50.182, 113.116.50.182"
}
</pre></body></html>
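Note: the chrome_options keyword argument of webdriver.Chrome has been superseded; pass the options object through options instead.
Using an authenticated proxy in Selenium is a little more involved. The following code can be reused by changing the values at the top: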
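The --proxy-server switch used above also accepts an explicit scheme, and Chrome has a companion --proxy-bypass-list switch for hosts that should skip the proxy. The following is a minimal sketch of assembling these arguments; chrome_proxy_arguments is a hypothetical helper name and the bypass hosts are illustrative assumptions.

```python
def chrome_proxy_arguments(proxy, scheme="http", bypass=()):
    """Return the Chrome command-line arguments for a proxy.

    proxy  -- "host:port"
    scheme -- proxy scheme, e.g. "http" or "socks5"
    bypass -- hosts that should be reached directly, without the proxy
    """
    args = [f"--proxy-server={scheme}://{proxy}"]
    if bypass:
        # Chrome expects the bypass entries separated by semicolons.
        args.append("--proxy-bypass-list=" + ";".join(bypass))
    return args
```

Each returned string would be passed to add_argument on the ChromeOptions object, in place of the hand-built '--proxy-server=' string.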
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile
ip = '113.116.50.182'
port = 808
username = 'xxxx'
password = 'xxxx'
manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}
"""
background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
        }
    }
}
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}
chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
)
""" % {'ip': ip, 'port': port, 'username': username, 'password': password}
plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_extension(plugin_file)
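# The chromedriver path is truncated in the source; supply the full path to the driver executable.
browser = webdriver.Chrome(executable_path=r"C:\Users\Administrator\",options=chrome_options)
# The host part of this URL is missing in the source; only the "/ip" path survives.
browser.get('/ip')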
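The extension code above interpolates the credentials into JavaScript with plain % formatting, so a quote character in the password would break the generated script. The following is a minimal sketch, assuming the same two-file Manifest V2 extension layout as above, that serializes the values with json.dumps instead; write_proxy_extension is a hypothetical helper name.

```python
import json
import os
import tempfile
import zipfile

MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
}

BACKGROUND_JS = """
var config = %(config)s;
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) { return {authCredentials: %(credentials)s}; },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def write_proxy_extension(path, ip, port, username, password):
    """Write the proxy-auth extension zip to path and return the path."""
    config = {
        "mode": "fixed_servers",
        "rules": {"singleProxy": {"scheme": "http", "host": ip, "port": int(port)}},
    }
    credentials = {"username": username, "password": password}
    # json.dumps produces valid JavaScript literals and escapes quotes.
    background_js = BACKGROUND_JS % {
        "config": json.dumps(config),
        "credentials": json.dumps(credentials),
    }
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=2))
        zp.writestr("background.js", background_js)
    return path
```

The returned path would be handed to chrome_options.add_extension(...) exactly as plugin_file is in the original code.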

Published: 2024-09-22 18:27:19
