Crossin的编程教室 (Crossin's Programming Classroom)

Title: Python encoding problem

Author: jinhang    Time: 2016-11-8 15:22
Title: Python encoding problem
I recently went through Crossin's course and ran into a problem while trying to build a web crawler.

The site used in the course, http://www.weather.com.cn/data/sk/101010100.html, is no longer updated. The data read from it looks like this:

{"weatherinfo":{"city":"鍖椾含","cityid":"101010100","temp":"18","WD":"涓滃崡椋�","WS":"1绾�","SD":"17%","WSE":"1","time":"17:05","isRadar":"1","Radar":"JC_RADAR_AZ9010_JB","njd":"鏆傛棤瀹炲喌","qy":"1011","rain":"0"}}
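The sample above looks like UTF-8 bytes displayed as if they were GBK, which a Chinese-locale browser may do when a page does not declare its charset (an assumption; the response headers are not shown). A minimal Python 3 sketch, using a hypothetical, trimmed body built in code, reproduces the effect and the fix:

```python
import json

# Hypothetical raw body: UTF-8 bytes, the way a server would send them.
raw = '{"weatherinfo":{"city":"北京","temp":"18"}}'.encode("utf-8")

# Wrong: reading UTF-8 bytes as GBK turns the city name into mojibake.
garbled = raw.decode("gbk", errors="replace")

# Right: decode with the encoding the bytes were actually written in.
text = raw.decode("utf-8")
info = json.loads(text)["weatherinfo"]

print(garbled)
print(info["city"], info["temp"])
```

In Python 2 the equivalent fix is calling `data.decode('utf-8')` before printing.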

Opening the link directly in a browser shows mojibake, but fetching the same content with code like this:

import urllib2

# Python 2: fetch the page; read() returns the raw body as a byte string (str)
web = urllib2.Request('http://www.weather.com.cn/data/sk/101010100.html')
data = urllib2.urlopen(web).read()
print type(data)
print data

reads it perfectly, with no mojibake. The printed result looks like this:
[Screenshot 8.jpg: printed output, not preserved]

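But the China Weather site (中国天气网) stopped updating long ago, and it has little data. So I kept digging through old posts, and in another of Crossin's threads I found this site: http://wthrcdn.etouch.cn/weather_mini?citykey=101010100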

Its data is much richer and looks like this:

{"desc":"OK","status":1000,"data":{"wendu":"11","ganmao":"鏄煎娓╁樊寰堝ぇ锛屾槗鍙戠敓鎰熷啋锛岃娉ㄦ剰閫傚綋澧炲噺琛f湇锛屽姞寮鸿嚜鎴戦槻鎶ら伩鍏嶆劅鍐掋€�","forecast":[{"fengxiang":"鏃犳寔缁鍚�","fengli":"寰绾�","high":"楂樻俯 11鈩�","type":"鏅�","low":"浣庢俯 -1鈩�","date":"8鏃ユ槦鏈熶簩"},{"fengxiang":"鏃犳寔缁鍚�","fengli":"寰绾�","high":"楂樻俯 10鈩�","type":"鏅�","low":"浣庢俯 1鈩�","date":"9鏃ユ槦鏈熶笁"},{"fengxiang":"鏃犳寔缁鍚�","fengli":"寰绾�","high":"楂樻俯 9鈩�","type":"闇�","low":"浣庢俯 0鈩�","date":"10鏃ユ槦鏈熷洓"},{"fengxiang":"鏃犳寔缁鍚�","fengli":"寰绾�","high":"楂樻俯 11鈩�","type":"澶氫簯","low":"浣庢俯 2鈩�","date":"11鏃ユ槦鏈熶簲"},{"fengxiang":"鏃犳寔缁鍚�","fengli":"寰绾�","high":"楂樻俯 11鈩�","type":"澶氫簯","low":"浣庢俯 1鈩�","date":"12鏃ユ槦鏈熷叚"}],"yesterday":{"fl":"4-5绾�","fx":"鍖楅","high":"楂樻俯 11鈩�","type":"鏅�","low":"浣庢俯 2鈩�","date":"7鏃ユ槦鏈熶竴"},"aqi":"39","city":"鍖椾含"}}
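Once the body is readable, the structure above is ordinary nested JSON: a status wrapper, then `data` holding current values and a `forecast` list. A Python 3 sketch, assuming the body has already been decompressed and decoded; `body` is a hypothetical, shortened stand-in whose field names come from the sample and whose readable values are filled in for illustration:

```python
import json

# Hypothetical, shortened body with the same shape as the sample above.
body = '''{"desc":"OK","status":1000,"data":{"wendu":"11","aqi":"39","city":"北京",
"forecast":[{"date":"8日星期二","high":"高温 11℃","low":"低温 -1℃"},
            {"date":"9日星期三","high":"高温 10℃","low":"低温 1℃"}]}}'''

resp = json.loads(body)
# The sample reports status 1000 together with desc "OK".
data = resp["data"] if resp.get("status") == 1000 else {}

# Collect (date, high, low) for each forecast day.
days = [(d["date"], d["high"], d["low"]) for d in data.get("forecast", [])]

print(data.get("city"), data.get("wendu"))
for date, high, low in days:
    print(date, high, low)
```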
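Fetching it with the same code (only the URL changed), the problem appeared: it can't be read at all, good grief!
The printed result looks like this: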
[Screenshot 7.jpg: printed output, not preserved]
What exactly is going wrong here?
I'd really appreciate an explanation!
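The screenshots did not survive, so the exact output is unknown. One hedged way to tell the two usual causes apart is to look at the raw bytes: a gzip stream always begins with `1f 8b`, while text that is merely in the wrong encoding does not. A Python 3 sketch; `describe_body` is a hypothetical helper name:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def describe_body(raw: bytes) -> str:
    """Rough guess at why a response body prints as garbage."""
    if raw[:2] == GZIP_MAGIC:
        return "gzip-compressed"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8 (try another encoding, e.g. gbk)"
    return "UTF-8 text"


sample = '{"city":"北京"}'.encode("utf-8")
print(describe_body(sample))                 # UTF-8 text
print(describe_body(gzip.compress(sample)))  # gzip-compressed
```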
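Then, stuck in this pit with no way out, I thought I'd try getting the weather from an XML file instead. The same post of Crossin's also has an .xml link: http://wthrcdn.etouch.cn/WeatherApi?citykey=101010100
The information there is damn rich, it even has a seven-day forecast! But...
it's still garbled!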
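If the XML endpoint has the same problem, the same two steps apply. A Python 3 sketch, assuming the body may be gzip-compressed and is well-formed XML; the tag names below are hypothetical, not the real WeatherApi schema. Passing bytes (not a decoded string) to `ElementTree.fromstring` lets the parser honor the document's own encoding declaration:

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical XML body; the real WeatherApi tag names are not known here.
xml_bytes = ('<?xml version="1.0" encoding="UTF-8"?>'
             '<resp><city>北京</city><wendu>11</wendu></resp>').encode("utf-8")


def parse_xml_body(raw: bytes) -> ET.Element:
    # Decompress first if the body is gzip, then let ElementTree decode the bytes.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return ET.fromstring(raw)


root = parse_xml_body(gzip.compress(xml_bytes))
print(root.findtext("city"), root.findtext("wendu"))
```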

I hope someone can answer these two questions for me...
Author: crossin先生    Time: 2016-11-8 15:28
https://zhuanlan.zhihu.com/p/21057822
See this article for reference.

Mojibake has two possible causes: the encoding, and compression.
The simple fix is to fetch the page with the requests package.
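A sketch of the `requests` approach in Python 3. `fetch_with_requests` is a hypothetical name, and it assumes the API returns UTF-8 JSON; decoding `resp.content` explicitly avoids relying on `requests`' guess of the text encoding:

```python
import json


def fetch_with_requests(url: str, timeout: float = 10.0) -> dict:
    """Fetch a JSON weather endpoint with requests (third-party: pip install requests)."""
    import requests  # imported here so the sketch can be defined without requests installed

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    # resp.content is the body after requests has undone any gzip/deflate
    # Content-Encoding; decode it explicitly, assuming the API sends UTF-8.
    return json.loads(resp.content.decode("utf-8"))

# Usage (needs network access; the endpoint may no longer be online):
# info = fetch_with_requests("http://wthrcdn.etouch.cn/weather_mini?citykey=101010100")
# print(info["data"]["city"])
```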
The manual fix is to decompress the data and decode it with the right encoding yourself.
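The manual route, sketched with only the Python 3 standard library: `urllib.request` neither requests nor undoes compression, so the helper checks for the gzip magic bytes, decompresses, and then decodes as UTF-8 (an assumption about the API's encoding). `decode_body` and `fetch_json` are hypothetical names:

```python
import gzip
import json
import urllib.request


def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Undo gzip compression (if any), then turn the bytes into text."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


def fetch_json(url: str) -> dict:
    # urllib does not decompress responses by itself, unlike requests.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(decode_body(resp.read()))


# Offline check with a hand-made compressed body:
body = gzip.compress('{"city":"北京"}'.encode("utf-8"))
print(json.loads(decode_body(body))["city"])  # 北京
```

On Python 2, which lacks `gzip.decompress`, `zlib.decompress(data, 16 + zlib.MAX_WBITS)` performs the same gzip decompression before `.decode('utf-8')`.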




Welcome to Crossin的编程教室 (https://bbs.crossincode.com/) Powered by Discuz! X2.5