Crossin's Programming Classroom (Crossin的编程教室)

Title: Python escaping problem

Author: 陈小六 Time: 2015-10-17 17:41
I'm writing a simple crawler and have run into a problem. Could anyone help me solve it? Thanks a lot.
Here is my code:
#coding=utf-8
import urllib
import re
import time

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    reg = r'''"url":"(.*?\.jpg)","width"'''
    imgre = re.compile(reg)
    imglist = re.findall(imgre,html)
    return imglist

html = getHtml("http://image.baidu.com/channel/star")
print getImg(html)
urllist=getImg(html)
x=0
for imgurl in urllist:
    urllib.urlretrieve(imgurl,'%s.jpg' % x)
    time.sleep(10)
    x+=1

Then it raised an error:
Traceback (most recent call last):
  File "D:\BaiduYunDownload\Python\web crawler.py", line 24, in <module>
    urllib.urlretrieve(imgurl,'%s.jpg' % x)
  File "C:\Python27\lib\urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "C:\Python27\lib\urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 326, in open_http
    if not host: raise IOError, ('http error', 'no host given')
IOError: [Errno http error] no host given

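---------------------------------------------------Divider----------------------------------------------------------
From what I can see, the URL looks like this when it is extracted:
'http:\\/\\/img0.bdstatic.com\\/img\\/image\\/2016ss1.jpg'
The page source seems to escape each "/" with a backslash (displayed as "\\" above), so before requesting the URL I need to convert it back to:
'http://img0.bdstatic.com/img/image/2016ss1.jpg'
Is there a function or module that can do this?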
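
A minimal sketch of what is probably going on, assuming the extracted text is exactly the string shown above. Printing a list shows each element through repr(), which doubles every backslash, so the real string has only one backslash before each slash. The standard URL parser then finds no "//" after the scheme and returns an empty host, which is likely why urlretrieve reports "no host given". The variable names are illustrative and not from the original post.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2, as used in this thread

# Raw string: each "/" is preceded by ONE backslash, as in the page source.
raw_url = r'http:\/\/img0.bdstatic.com\/img\/image\/2016ss1.jpg'

# repr() doubles every backslash for display; the string itself has single ones.
shown = repr(raw_url)
backslashes = raw_url.count('\\')

# Without a literal "//" after the scheme, urlsplit parses no host.
host_before = urlsplit(raw_url).netloc
host_after = urlsplit(raw_url.replace('\\/', '/')).netloc

print(shown)
print(backslashes)
print(repr(host_before))
print(host_after)
```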

Author: 陈小六 Time: 2015-10-17 18:01
In the end I used a rather clumsy workaround, shown below. Is there a ready-made module or method for this?

elements=imgurl.split('\\')
url=''.join(elements)


Author: crossin先生 Time: 2015-10-17 18:03
imgurl.replace('\\/','/')
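
A short sketch of how the suggestion above might be applied. Note that replace returns a new string rather than changing imgurl in place, so the result has to be assigned. The second variant is an assumption on my part: if the captured text comes from a JSON string in the page (the "url":"...","width" pattern suggests so, but the thread does not confirm it), json.loads can undo "\/" along with any other JSON escapes. The names raw_url, clean_url and decoded_url are illustrative.

```python
import json

# One backslash before each slash, as extracted from the page source.
raw_url = r'http:\/\/img0.bdstatic.com\/img\/image\/2016ss1.jpg'

# str.replace returns a new string; strings are immutable, so keep the result.
clean_url = raw_url.replace('\\/', '/')

# Assumption: the captured text is the body of a JSON string literal.
# Wrapping it in double quotes lets json.loads decode "\/" and other escapes.
decoded_url = json.loads('"' + raw_url + '"')

print(clean_url)
print(decoded_url)
```

The replace call is enough when "\/" is the only escape present; the json.loads route is only worth considering if other JSON escapes may appear in the captured URLs.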

Author: 陈小六 Time: 2015-10-17 18:38

crossin先生 posted on 2015-10-17 18:03
imgurl.replace('\\/','/')

Thank you very much

Welcome to Crossin's Programming Classroom (https://bbs.crossincode.com/)