Problem when scraping images

Posted on 2019-4-11 01:23:32

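I have only just started learning web scraping. As a beginner, I would like to ask teacher Crossin and everyone here: why can't my script get the images?
Python 3.7, URL: https://tieba.baidu.com/f?kw=%E5%A5%B3%E7%A5%9E&ie=utf-8&pn=0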

import os
import requests
from bs4 import BeautifulSoup as bs

def url_open(url):
    # Fetch the page with a browser-like User-Agent and parse it
    headers = {}
    headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
    r = requests.get(url, headers=headers).text
    soup = bs(r, 'lxml')
    return soup

def img_download(img_url, file='./nv_shen'):
    # Save the image under its original file name in the current directory
    filename = img_url.split('/')[-1]
    img_content = requests.get(img_url)
    with open(filename, 'wb') as f:
        f.write(img_content.content)

def download_nvshen(s, file='./nv_shen'):
    os.makedirs(file, exist_ok=True)
    os.chdir(file)

    # Build the URLs of the first s list pages (50 threads per page)
    links = []
    for i in range(s):
        endw = str(50 * i)
        link = "https://tieba.baidu.com/f?kw=%E5%A5%B3%E7%A5%9E&ie=utf-8&pn=" + endw
        links.append(link)

    #print(links)

    for link in links:
        content = url_open(link)
        #print(content)
        img_classes = content.find_all('div', class_='media_box j_remove j_media_box')
        # The problem is here: img_classes is an empty list. Why?
        for img_class in img_classes:
            imgs = img_class.find_all('img', class_='j_retract')
            for img in imgs:
                img_url = img['src']
                img_download(img_url, file)

s = int(input('Enter the number of pages: '))
download_nvshen(s)
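
One possible explanation for the empty list, offered as a guess rather than a confirmed diagnosis: Tieba list pages are commonly reported to ship the thread list inside an HTML comment, so BeautifulSoup parses that whole region as a single comment node and find_all never sees the div elements in it. Separately, passing a multi-class string to class_ only matches when the class attribute is exactly that string, so a CSS selector on a single class is less brittle. The sketch below shows both points on made-up sample HTML. The markup, the example.com URL, the helper names, and the bpic / data-original attribute names are assumptions for illustration, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: mimics a page whose post list sits inside an HTML comment.
SAMPLE_HTML = """
<html><body>
<code id="thread_list" style="display:none;"><!--
<div class="media_box j_remove j_media_box">
  <img class="threadlist_pic j_retract" src="" bpic="https://example.com/a.jpg">
</div>
--></code>
</body></html>
"""

def strip_comment_markers(html):
    # Remove the comment delimiters so the wrapped markup is parsed as real tags.
    return html.replace("<!--", "").replace("-->", "")

def find_image_urls(html):
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # A CSS selector matches on one class, whatever other classes are present.
    for img in soup.select("div.media_box img.j_retract"):
        # Assumed attribute names: lazy-loaded images often leave src empty.
        url = img.get("bpic") or img.get("data-original") or img.get("src")
        if url:
            urls.append(url)
    return urls

before = find_image_urls(SAMPLE_HTML)
after = find_image_urls(strip_comment_markers(SAMPLE_HTML))
print(before)  # []
print(after)   # ['https://example.com/a.jpg']
```

To check whether this applies to the real page, print the text returned by requests.get and search it for media_box. If the string is present only between comment markers, applying strip_comment_markers to the text before handing it to BeautifulSoup in url_open would be the first thing to try.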

Tags: question, web scraping
