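I ran into a problem while working through 《python实战第五课:拿来主义》 (Python in Practice, Lesson 5).
When I fetch the details of each movie from the Douban API, I can print them without trouble, but as soon as I try to store them in the database I get an error: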
ERROR:
cursor.execute(s)
OperationalError: near "oceano": syntax error
PROBLEM:
After many attempts I found the cause: original_title contains a single quote.
For example: id:1292001 title:海上钢琴师 original_title:La leggenda del pianista sull'oceano
Another example: id:1295124 title:辛德勒的名单 original_title:Schindler's List
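The failure can be reproduced in isolation. The sketch below assumes a throwaway in-memory table with only two columns (hypothetical, not the real Movies table) and builds the SQL text with % formatting, the same way the insert statement in the code further down does. It is written to run on Python 3.

```python
import sqlite3

# Hypothetical in-memory table, used only to reproduce the error.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("create table Movies(ID integer, origin text)")

origin = "La leggenda del pianista sull'oceano"

# The value is pasted straight into the SQL text, so the quote inside
# "sull'oceano" ends the string literal early.
sql = "insert into Movies(ID, origin) values(%d,'%s')" % (1292001, origin)
print(sql)

try:
    cursor.execute(sql)
    error_message = None
except sqlite3.OperationalError as e:
    error_message = str(e)

print(error_message)
conn.close()
```

Printing the generated statement shows that SQLite sees 'La leggenda del pianista sull' as a complete string, followed by the stray word oceano.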
Solution:
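I searched online quite a bit and read that each single quote should be turned into two single quotes.
So I used .replace: m_origin.replace("'","''")
But the result was the same. I printed m_origin afterwards to check and found that replace had no effect: m_origin was unchanged.
I then wondered whether the single quote is somehow special, so I tried replace on ordinary letters, and the result was no different.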
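The behaviour described above can be reproduced with a small test. Python strings are immutable, so str.replace returns a new string and leaves the original untouched. This is a minimal sketch, assuming the call is written as a bare statement, as it is in the code further down:

```python
m_origin = "Schindler's List"

# Called as a bare statement: the new string is created and then discarded.
m_origin.replace("'", "''")
unchanged = m_origin

# Keeping the return value is what actually yields the doubled quote.
escaped = m_origin.replace("'", "''")

print(unchanged)
print(escaped)
```

The same holds for ordinary letters, which would explain why swapping the quote for another character made no difference.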
FINAL:
So I would like to ask Mr.crossin: where did I go wrong, and is there a way to fix it?
The code is attached.
MY CODE: # my code may differ slightly from yours
import urllib2
import json
import sqlite3
m_ids=[1292052, 1295644, 1292720, 1291546, 1292063, 1291561, 1295124, 1292001, 2131459, 3541415, 1292722, 3793023, 1291549, 3011091, 1292213, 1291560, 1291841, 1300267, 1291828, 1849031, 1292000, 1292064, 6786002, 1291552, 1293839, 1293182, 1291583, 3319755, 1293350, 2129039, 3442220, 1299398, 1307914, 1292224, 1929463, 1291858, 1900841, 1851857, 5912992, 1309046, 1292365, 1298624, 1292215, 1306029, 1291571, 1291572, 1299131, 1308807, 1292223, 1292220, 1291548, 1292262, 1294639, 1292370, 1296736, 1780330, 1301753, 1889243, 1787291, 1294408, 1303021, 1291832, 2149806, 1292343, 3072124, 1293544, 1485260, 1291843, 21937445, 1292849, 1291818, 1292402, 1297630, 1297359, 1291545, 1292656, 1316510, 3742360, 1291875, 4917726, 1292208, 1418019, 1293318, 1296141, 1292679, 1291999, 2334904, 1297192, 4268598, 1305164, 1298070, 21937452, 1296339, 1652587, 1291585, 1292434, 1291990, 2353023, 3443389, 1297052, 1295865, 1292274, 1305487, 1296909, 5322596, 1418834, 1294371, 3287562, 1417598, 11525673, 1292401, 1293359, 4202302, 1291870, 2209573, 3792799, 6985810, 1907966, 1578507, 1304447, 1309163, 1292328, 1937946, 1293964, 1297447, 1295399, 1292528, 1291579, 3008247, 1295038, 1291822, 1293172, 1978709, 2043546, 1306861, 1300299, 1300992, 1418200, 1302827, 1294240, 1858711, 1297574, 1293460, 1760622, 1291578, 1295409, 1388216, 1300960, 21318488, 1865703, 2297265, 5989818, 21360417, 1303037, 3007773, 1397546, 1308857, 2213597, 1419936, 1292270, 1292281, 1905462, 1291879, 3011235, 10777687, 1304102, 5964718, 1307793, 1300374, 1292728, 2363506, 1302425, 1308767, 4798888, 1296753, 1291853, 10577869, 1291557, 1292659, 25814705, 1305690, 1306249, 2300586, 1294638, 1292217, 1307811, 1293181, 3011051, 1308817, 1292214, 6874403, 3395373, 1297478, 2053515, 25917973, 1291844, 1299361, 1292233, 11026735, 1959195, 1308575, 1307315, 4739952, 1291992, 3075287, 1292056, 1302467, 2365260, 3157605, 1292062, 1291568, 3217169, 6307447, 1401118, 1793929, 1292287, 1293764, 1299327, 1302476, 1308777, 
1303394, 1292218, 1867345, 1438652, 1293908, 1292329, 1301171, 6534248, 1309027, 1395091, 1305725, 4023638, 1298653, 24750126, 1428175, 6146955, 3073124, 1300117, 1316572, 1293929, 1293530, 1301617, 4286017, 5908478, 1304073, 1756073, 1301169, 25773932, 1862151, 1292348]
# To test the bug below, the ids are assigned directly here; fetching them again would take too long ToT
def get_detail():
    for id in m_ids:
        m_title=[]
        m_origin=[]
        m_url=[]
        m_rating=[]
        m_image=[]
        m_directors=[]
        m_casts=[]
        m_year=[]
        m_genres=[]
        m_country=[]
        m_summary=[]
        # fetch and parse the JSON record for this movie
        b=urllib2.urlopen('http://api.douban.com/v2/movie/subject/%d'%id)
        s=b.read()
        dict2=json.loads(s)
        m_title.append(dict2['title'])
        m_origin.append(dict2['original_title'])
        print m_origin
        m_url.append(dict2['alt'])
        m_rating.append(str(dict2['rating']['average']))
        m_image.append(dict2['images']['large'])
        m_directors.append(dict2['directors'][0]['name'])
        for i in range(0,len(dict2['casts'])-1):
            m_casts.append(dict2['casts'][i]['name'])
        m_year.append(dict2['year'])
        m_genres.append(dict2['genres'])
        m_country.append(dict2['countries'][0])
        m_summary.append(dict2['summary'])
        # flatten each list into a plain string
        m_title=','.join(m_title)
        m_origin=str(''.join(m_origin))
        #print m_origin
        m_origin.replace("'","''") #deal with the single quote in a str
        #print type(m_origin)
        print m_origin
        m_url=','.join(m_url)
        m_rating=','.join(m_rating)
        m_image=','.join(m_image)
        m_directors=','.join(m_directors)
        m_casts=','.join(m_casts)
        m_year=','.join(m_year)
        m_genres=','.join(m_genres[0])
        m_country=','.join(m_country)
        m_summary=','.join(m_summary)
        print 'Movie\'s ID:'+str(id),'\nName:'+ m_title,'\nOriginal Name:'+m_origin,'\nWebsite:'+m_url,'\nAverage Rating:'+m_rating,'\nPostsite:'+m_image,'\nDirectors:'+m_directors,'\nCasts:'+m_casts,'\nYear:'+m_year,'\nGenres:'+m_genres,'\nCountry and Area:'+m_country,'\nSummary:'+m_summary+'\n'
        # store the record in the database
        conn=sqlite3.connect('Movies_douban.db')
        cursor =conn.cursor()
        s="insert into Movies('ID','title','origin','url','rating','image','directors','casts','year','genres','country','summary')\
        values(%d,'%s','%s','%s','%s','%s','%s','%s','%s','%s','%s','%s')"%(id,m_title,m_origin,m_url,m_rating,m_image,m_directors,m_casts,m_year,m_genres,m_country,m_summary)
        cursor.execute(s)
        conn.commit()
        cursor.close()
get_detail()
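
One alternative to escaping quotes by hand is to let sqlite3 bind the values itself through ? placeholders. The sketch below is only an illustration under stated assumptions: insert_movie is a hypothetical helper, the table is a throwaway in-memory one reduced to three columns, and the code is written for Python 3 (the sqlite3 calls are the same in Python 2).

```python
import sqlite3

def insert_movie(conn, movie_id, title, origin):
    """Hypothetical helper: insert one row using ? placeholders."""
    # The values are passed separately from the SQL text, so quotes
    # inside them never become part of the statement itself.
    conn.execute(
        "insert into Movies(ID, title, origin) values(?, ?, ?)",
        (movie_id, title, origin),
    )
    conn.commit()

# Throwaway in-memory database with a reduced, assumed schema.
conn = sqlite3.connect(':memory:')
conn.execute("create table Movies(ID integer, title text, origin text)")

insert_movie(conn, 1295124, "辛德勒的名单", "Schindler's List")

rows = conn.execute("select ID, origin from Movies").fetchall()
print(rows)
conn.close()
```

With placeholders the original title is stored exactly as received, single quote included, and no replace step is needed.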