Why can I only scrape 4 items from a Zhihu user's "followed questions"?

Original poster

Posted on 2020-11-18 20:21:33

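Help needed, please!
I'm new to web scraping. Scraping Anjuke and Douban Movies with requests and BeautifulSoup works fine,
but when I scrape a Zhihu user's "followed questions" page I only get 4 items, even when I send my login cookie. Why is that?

Code attached:
The post exceeded the character limit, so I put the code in an attachment instead. Sorry for the trouble!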

test2.py.zip

995 Bytes, downloads: 1

The question is as stated in the title. Any help is much appreciated!

zzz

Reply #3

Posted on 2020-11-22 15:18:00

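crossin先生 wrote on 2020-11-19 12:54:
The rest of the data is fetched by separate data requests; it does not all arrive with the initial page.
Read up on AJAX,
and on the Network section of this article: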

Solved, thank you.

crossin先生

Reply #2

Posted on 2020-11-19 12:54:12

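The rest of the data is fetched by separate data requests; it does not all arrive with the initial page.
Read up on AJAX,
and on the Network section of this article: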
https://mp.weixin.qq.com/s/Vi2SO5Ep3ZBLH0T4a-YlRg
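
For anyone who lands here with the same problem, here is a minimal sketch of what "getting the data from the data requests" can look like in Python. It is not the original poster's fix. It assumes that the request you find in the browser's Network panel returns JSON shaped like {"data": [...], "paging": {"is_end": ..., "next": ...}}; those field names, and the helper names iter_items and make_requests_fetcher, are illustrative assumptions, so check them against the response you actually see and adjust.

```python
from typing import Any, Callable, Dict, Iterator


def iter_items(
    first_url: str,
    get_json: Callable[[str], Dict[str, Any]],
    max_pages: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield every item by following the pagination link page by page.

    ASSUMPTION: each response looks like
    {"data": [...], "paging": {"is_end": bool, "next": "<url>"}}.
    """
    url = first_url
    for _ in range(max_pages):  # hard cap so a bad response cannot loop forever
        payload = get_json(url)
        yield from payload.get("data", [])
        paging = payload.get("paging", {})
        next_url = paging.get("next")
        # Stop when the server reports the last page or gives no next link.
        if paging.get("is_end", True) or not next_url:
            break
        url = next_url


def make_requests_fetcher(headers: Dict[str, str]) -> Callable[[str], Dict[str, Any]]:
    """Build a get_json callable backed by requests (imported only when used)."""
    import requests

    def get_json(url: str) -> Dict[str, Any]:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        return response.json()

    return get_json
```

To use it, copy the data request's URL from the Network panel (XHR/Fetch filter) while scrolling the list, then loop over iter_items(that_url, make_requests_fetcher(headers)) with the same Cookie and User-Agent headers as before. Passing the fetcher in as a parameter keeps the paging logic easy to try out with canned responses before hitting the real site.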

#==== Crossin's Programming Classroom ====#
WeChat ID: crossincode
Website: http://crossincode.com

zzz

Reply #1

Posted on 2020-11-18 20:22:04

# The code is as follows
import requests
from bs4 import BeautifulSoup

# "Followed questions" page of the user to scrape
link = 'https://www.zhihu.com/people/you-wu-jun-77/following/questions'
print("Scraping followed Zhihu questions:")

headers = {
    'Cookie': 'xxx',  # the HTTP request header is named "Cookie", not "cookies"
    'User-Agent': 'xxx'
}
response = requests.get(link, headers=headers)

soup = BeautifulSoup(response.text, 'lxml')
print(soup)
# Only the items embedded in the initial HTML are found here
following_question_list = soup.find_all('div', class_='List-item')
print(following_question_list)
print('-------------')

for following_question in following_question_list:
    question = following_question.find('div', class_='QuestionItem-title').text.strip()
    # The status row holds three spans: date, answer count, follower count
    status = following_question.select('.ContentItem-status > span')
    data = status[0].text
    answer_num = status[1].text
    following_num = status[2].text

    print(question, data, answer_num, following_num)
