Python escaping problem

Posted 2015-10-17 17:41:58

I'm writing a simple web crawler and have run into a problem. Could anyone help me sort it out? Thanks a lot.
Here is my code:
#coding=utf-8
import urllib
import re
import time

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    reg = r'''"url":"(.*?\.jpg)","width"'''
    imgre = re.compile(reg)
    imglist = re.findall(imgre,html)
    return imglist

html = getHtml("http://image.baidu.com/channel/star")
print getImg(html)
urllist=getImg(html)
x=0
for imgurl in urllist:
    urllib.urlretrieve(imgurl,'%s.jpg' % x)
    time.sleep(10)
    x+=1

Then it failed with this error:
Traceback (most recent call last):
  File "D:\BaiduYunDownload\Python\web crawler.py", line 24, in <module>
    urllib.urlretrieve(imgurl,'%s.jpg' % x)
  File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 326, in open_http
    if not host: raise IOError, ('http error', 'no host given')
IOError: [Errno http error] no host given

---------------------------------------------------Divider----------------------------------------------------------
From what I can see, the URLs come out of the page looking like this:
'http:\\/\\/img0.bdstatic.com\\/img\\/image\\/2016ss1.jpg'
The page source escapes the slashes with "\\", so before requesting a URL I apparently need to convert it back into:
'http://img0.bdstatic.com/img/image/2016ss1.jpg'
Is there a function or module that can do this?
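
One possible way to do the conversion described above, as a minimal sketch rather than a definitive answer: it assumes the only escaping in the matched text is a backslash in front of each slash, which is how JSON may write "/". The helper names unescape_url and unescape_url_json are made up for illustration, and the JSON variant additionally assumes the matched text contains no unescaped double quote. The import fallback is only there so the snippet runs on both Python 2 (as in the post) and Python 3.

```python
import json

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2, as used in the post


def unescape_url(raw):
    """Hypothetical helper: turn every backslash-slash pair into a plain '/'."""
    return raw.replace('\\/', '/')


def unescape_url_json(raw):
    """Hypothetical alternative: decode raw as the body of a JSON string.

    Assumes raw came from inside a JSON string literal and holds no
    unescaped double quote; otherwise json.loads raises ValueError.
    """
    return json.loads('"' + raw + '"')


# The value from the post, written as a Python literal (each \\ is one backslash).
escaped = 'http:\\/\\/img0.bdstatic.com\\/img\\/image\\/2016ss1.jpg'
fixed = unescape_url(escaped)

# The escaped form has no '//' right after 'http:', so no host is parsed from it,
# which matches the "no host given" error in the traceback.
print(repr(urlsplit(escaped).netloc))
print(urlsplit(fixed).netloc)
print(fixed)
```

In the crawler loop this would mean passing unescape_url(imgurl) to urllib.urlretrieve instead of imgurl. The plain replace is the narrower fix; the JSON variant also decodes other JSON escapes such as \uXXXX, which may or may not be wanted.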
