Crossin的编程教室

Timeout errors when scraping a website; switching mirror sources did not help

Original poster, posted 2020-7-15 12:49:53
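My crawler gets timeout or 404 errors when scraping a website. A Baidu search suggested switching to a domestic mirror source, so I switched to the Tsinghua mirror and also set timeout to 100, but it still fails. The traceback is below: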
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1336, in getresponse
    response.begin()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 428, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='static1.scrape.cuiqingcai.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/zhouwen/python/spider/practice.py", line 53, in <module>
    main()
  File "/Users/zhouwen/python/spider/practice.py", line 48, in main
    raise e
  File "/Users/zhouwen/python/spider/practice.py", line 46, in main
    movie_details(detail_url)
  File "/Users/zhouwen/python/spider/practice.py", line 29, in movie_details
    doc = pq(url=html)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyquery/pyquery.py", line 224, in __init__
    html = url_opener(url, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyquery/openers.py", line 84, in url_opener
    return _requests(url, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyquery/openers.py", line 66, in _requests
    resp = meth(url=url, timeout=kwargs.get('timeout', DEFAULT_TIMEOUT), **kw)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='static1.scrape.cuiqingcai.com', port=443): Read timed out. (read timeout=60)
Reply #1 (administrator), posted 2020-7-15 21:55:05
Without your code, there is no way to look into this for you.

For the 404, first check that your URL is correct and not misspelled (open it in a browser and see).

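Also, a failed website request has nothing to do with domestic mirrors. Mirrors only matter when you download and install packages, not when your crawler requests a page.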
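
One detail in the traceback may be worth checking: it reports `read timeout=60`, and the pyquery frame reads `kwargs.get('timeout', DEFAULT_TIMEOUT)`, which suggests the 100-second setting never reached the `pq(url=...)` call. A possible workaround, sketched below under assumptions, is to fetch the page yourself with `requests`, check the status code, retry on failure, and only then hand the HTML text to pyquery. The helper names `fetch_with_retry` and `fetch_html`, the retry count, and the timeout values are all hypothetical, not taken from the original script.

```python
import time


def fetch_with_retry(fetch, url, retries=3, backoff=1.0):
    """Call fetch(url) up to `retries` times, pausing between failed attempts.

    `fetch` is any callable that returns the page text or raises on failure.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # in real code, catch only timeout/connection errors
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * attempt)  # wait a little longer each time
    raise last_error


def fetch_html(url):
    """Fetch a page with requests; raise on 404 and other HTTP errors."""
    import requests  # third-party; imported here so the helper above stays dependency-free

    # (connect timeout, read timeout) in seconds; the values are assumptions
    resp = requests.get(url, timeout=(5, 100))
    resp.raise_for_status()  # raises requests.exceptions.HTTPError for 4xx/5xx
    return resp.text
```

In the crawler this could be used as `doc = pq(fetch_with_retry(fetch_html, detail_url))`, so that pyquery only parses text and no longer makes the network request itself. If the server simply never answers, retries will not help, and the URL and the site's availability are still the first things to check.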
#==== Crossin的编程教室 ====#
WeChat ID: crossincode
Website: http://crossincode.com