Crossin的编程教室

Title: The dreaded encoding problem again =-=

Author: 小可爱    Time: 2018-2-15 14:22
Last edited by 小可爱 on 2018-2-15 14:22

Here's the code:
#!/usr/bin/python
#coding: UTF-8

import urllib.request
import urllib.parse

url='http://www.baidu.com'
header={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'}

req=urllib.request.Request(url,headers=header)
html=urllib.request.urlopen(req).read()
html=html.decode('utf-8')
print(html)

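At this point everything still works (though viewing the page source directly in the browser seems to show less than what print outputs... why is that?)
Then I added these lines: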
with open('/html_bd.txt','a') as f:
    f.write(html)
and then it failed with this error:
Traceback (most recent call last):
  File "D:\Documents\notepad\webscrab.py", line 16, in <module>
    f.write(html)
UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 28678: illegal multibyte sequence
I searched online for a long time and still couldn't sort it out.
=-= How do I fix this?
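
A note on what the traceback is saying: in text mode, open() without an encoding argument falls back to the platform's preferred encoding, which on this Windows machine is evidently gbk, and the character '\xbb' (U+00BB) has no GBK mapping. The sketch below reproduces that without touching the network. The sample string is a made-up stand-in, not the actual Baidu page.

```python
import locale

# The encoding open() would pick in text mode when none is given
# (Python 3.6 behaviour; the printed value depends on the machine).
print(locale.getpreferredencoding(False))

# Hypothetical sample containing U+00BB, the character named in the traceback.
text = "more \xbb"

try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError as exc:
    gbk_ok = False
    print(exc)

# UTF-8 can represent every Unicode code point, so this always succeeds.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```

So the decode step was fine; the failure happens only when the string is converted back to bytes for the file.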




Author: crossin先生    Time: 2018-2-16 22:09
f.write(html, encoding='utf8')
Author: 小可爱    Time: 2018-2-17 12:59
That's not right, it still raises an error:
Traceback (most recent call last):
  File "D:\Documents\notepad\webscrab.py", line 17, in <module>
    f.write(html,encoding='utf-8')
TypeError: write() takes no keyword arguments
[Finished in 1.5s with exit code 1]
[shell_cmd: python -u "D:\Documents\notepad\webscrab.py"]
[dir: D:\Documents\notepad]
Changing it to this fixed it:
with open('/html_bd.txt','a',encoding='utf-8') as f:
    f.write(html)
Hehe!
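
Why the last version works, as far as I can tell: encoding is a parameter of open(), not of write(), so the file object converts str to bytes with whatever codec it was opened with. Here is a minimal round-trip sketch, assuming a temporary directory and a made-up sample string in place of the real page:

```python
import os
import tempfile

# Hypothetical stand-in for the decoded page; it contains U+00BB,
# the character that the gbk codec rejected in the thread.
html = "<title>\u767e\u5ea6 \xbb</title>"

# Write into a throwaway directory instead of the drive root.
path = os.path.join(tempfile.mkdtemp(), "html_bd.txt")

# The encoding is given to open(); write() only ever receives the str.
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Read back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == html)
```

The thread uses mode 'a', which appends on every run, so the file keeps growing; 'w' is used here only to keep the sketch repeatable.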






