
Original poster: crossin先生

[Daily Pitfall 6] Searching File Contents


coolqing


#15

Posted on 2018-5-16 16:48:56

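I created a folder of my own containing a subfolder and some txt, Excel, and Word files.
After running the code, the txt files display their contents correctly, but the Excel and Word files do not.
I searched around and read that Excel and Word files have to be read with other modules. But when the basic course covered read, it didn't seem to mention how to read file types other than txt. Could you tell me whether Excel files and the like need separate handling?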

[Attached screenshots: 1526460289(1).jpg (17.28 KB), 1526460267(1).jpg (9.67 KB)]
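
One likely reason for the garbled output, assuming the files are .docx or .xlsx: those formats are ZIP archives holding XML, so a plain read() returns compressed bytes rather than text (older .doc and .xls files are binary as well). The Python 3 sketch below builds a small ZIP archive in memory as a stand-in (it is not a valid Word file), recognizes it by its leading signature, and reads the text back with the standard zipfile module. `looks_like_zip` and `ZIP_MAGIC` are hypothetical names used only for this illustration.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # bytes at the start of a non-empty ZIP archive

def looks_like_zip(data):
    """Return True if the bytes start like a ZIP archive, as .docx/.xlsx files do."""
    return data.startswith(ZIP_MAGIC)

# Stand-in for a .docx: a ZIP archive with one member (not a valid Word file)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("word/document.xml", "hello from the document " * 20)
raw = buffer.getvalue()

print(looks_like_zip(raw))            # the archive is recognized by its signature
print(looks_like_zip(b"plain text"))  # ordinary text is not

# The text only becomes searchable after the member is unpacked
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    text = archive.read("word/document.xml").decode("utf-8")
print("hello" in text)
```

A search script could use a check like this to skip such files, or to hand them to a library written for that format.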


wolfog


#14

Posted on 2017-8-31 11:57:51

import fnmatch
import os

def filterFile(filePath, contain):
    for grandFather, father, sons in os.walk(filePath):
        # Rebuild the list for every directory so that names from an
        # earlier directory are never joined with the current one
        sonList = []
        for son in sons:
            sonList.append(son)
        fnmatchs = fnmatch.filter(sonList, "*.txt")
        if len(fnmatchs) != 0:
            for fnmatchFile in fnmatchs:
                absultePath = grandFather + "/" + fnmatchFile
                absultePath = absultePath.replace("\\", "/")
                f = open(absultePath)
                text = f.read()
                f.close()
                if contain in text:
                    print absultePath

path = raw_input("Enter the absolute path to search: ")
filterFile(path, "aaa")

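Two problems troubled me while writing this:
1. At first sonList was defined outside the function, so it lived too long, and the paths I built pointed to files that did not exist in that directory.
2. The joined paths contained doubled backslashes, and opening files with them failed. Yet the demo I typed printed them with single slashes in PyCharm, and for a long time I couldn't tell what was wrong. In the end I found that PyCharm automatically converts them into valid paths.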
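
The first problem can be reproduced in a small sketch. It uses Python 3 syntax, unlike the Python 2 code in this post, and `joined_paths` with its `reset_each_dir` flag is a hypothetical helper introduced only for the illustration. When the list of names outlives one os.walk iteration, names from an earlier directory get joined with the current directory:

```python
import os
import tempfile

def joined_paths(root, reset_each_dir):
    """Join every name in the list with the directory currently being walked."""
    paths = []
    names = []  # created once, outside the loop
    for dirpath, dirnames, filenames in os.walk(root):
        if reset_each_dir:
            names = []  # fresh list for each directory
        names.extend(filenames)
        for name in names:
            paths.append(os.path.join(dirpath, name))
    return paths

# Temporary tree: a.txt at the top level and b.txt inside sub
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("aaa")
    stale = [p for p in joined_paths(root, False) if not os.path.exists(p)]
    fresh = [p for p in joined_paths(root, True) if not os.path.exists(p)]

print(stale)  # a.txt joined with the sub directory, a path that does not exist
print(fresh)  # []
```

Iterating over the `filenames` value that os.walk yields for each directory avoids the problem without any extra list.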


crossin先生


#13

Posted on 2017-4-11 23:29:28

xqqxjnt1988 posted on 2017-4-11 15:33
Thank you, crossin先生. After learning Python on your forum I found a job writing Python, which is why I'm only checking back now, ...

#==== Crossin的编程教室 ====#
WeChat ID: crossincode
Website: http://crossincode.com


xqqxjnt1988


#12

Posted on 2017-4-11 15:33:46

crossin先生 posted on 2016-2-16 22:35
To search a file's contents you don't need readlines; just read the whole file into a string and then use find.
Also, your code doesn't seem to detect ...

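Thank you, crossin先生. After learning Python on your forum I found a job writing Python, which is why I'm only checking back now. I came here specifically to thank you. Thanks!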


南斗


#11

Posted on 2017-3-27 22:53:45

#! /usr/bin/env python
#coding=utf-8
import os
rootdir = r'E:\Famine' # root directory to search
# os.walk yields three values: 1. the parent directory 2. the names of all folders in it (without path) 3. the names of all files in it
for father_path, foldernames, filenames in os.walk(rootdir):
    for filename in filenames: # iterate over the files
        if os.path.splitext(filename)[1] == '.txt': # check whether the extension is .txt
            file_path = os.path.join(father_path, filename) # build the full file path
            if os.path.exists(file_path): # check that the path exists
                folder = os.path.split(os.path.split(file_path)[0])[1] # name of the containing folder
                print 'The upper folder is:\n%s' % folder, '\n'
                print 'The file name is:\n%s' % filename, '\n'
                f = open(file_path) # open the file
                context = f.read() # read the file contents
                print 'The contents of this file are', '\n', context, '\n\n'
                f.close() # close the file


huiwenwu


#10

Posted on 2017-3-7 15:43:42

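Written by imitating one of the answers to the previous exercise. The user supplies the folder, the file type, and the text to search for.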

#!/usr/bin/python
# Search the files under path A whose file type (extension) is B and whose content matches C.
# The input should look like this: python search.py A B C

import os
import re
import sys

def searchmethod(path, filetype, text):
    matches = []
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            # Check the file type first so files of other types are never opened
            if os.path.splitext(filename)[1] != filetype:
                continue
            f = open(os.path.join(dirpath, filename))
            content = f.read()
            f.close()
            # Note: re.findall treats text as a regular expression
            if re.findall(text, content):
                matches.append(os.path.join(dirpath, filename))
    print matches

if __name__ == '__main__':
    if len(sys.argv) != 4:
        print 'Please input like this:./9.py your_dir file_type your_match_string'
    elif os.path.exists(sys.argv[1]):
        searchmethod(sys.argv[1], sys.argv[2], sys.argv[3])


morpheus2222


#9

Posted on 2017-1-22 09:03:09

import os

def find_certain_files(keyword):
    result = []
    for path, dirs, files in os.walk("folder"):
        for file in files:
            if file.endswith(".txt"):
                fileWithPath = os.path.join(path, file)
                a = open(fileWithPath)
                b = a.read()
                if keyword in b:
                    result.append(fileWithPath)
                a.close()
    print(result)

# Ask for the keyword at call time; as a default argument, input()
# would run once, at the moment the function is defined
find_certain_files(input("search:"))


crossin先生


#8

Posted on 2016-2-16 22:35:13

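xqqxjnt1988 posted on 2016-2-16 15:25
@crossin先生
Please take a look. As a beginner I worked up my courage and spent most of a day writing this. I'm hoping for guidance from someone more experienced; please point out any mistakes or anything improper, ...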

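To search a file's contents you don't need readlines; just read the whole file into a string and then use find.
Also, your code doesn't seem to detect subfolders, because you don't go on to search dir recursively.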
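
The read-then-find approach described above might look like the following Python 3 sketch. `contains_text` and `search_tree` are hypothetical names, the .txt filter and the UTF-8 decoding are assumptions, and os.walk is used for the traversal because it visits subfolders on its own.

```python
import os
import tempfile

def contains_text(path, keyword):
    """Read the whole file into one string and look for keyword with str.find."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        content = f.read()
    return content.find(keyword) != -1

def search_tree(root, keyword):
    """Return the .txt files under root, subfolders included, that contain keyword."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):  # descends into subfolders
        for name in filenames:
            if name.endswith(".txt"):
                path = os.path.join(dirpath, name)
                if contains_text(path, keyword):
                    found.append(path)
    return found

# Small demonstration on a temporary folder with one subfolder
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "top.txt"), "w", encoding="utf-8") as f:
        f.write("nothing here")
    with open(os.path.join(root, "sub", "deep.txt"), "w", encoding="utf-8") as f:
        f.write("the word xuqq is here")
    hits = [os.path.relpath(p, root) for p in search_tree(root, "xuqq")]

print(hits)  # only the file inside sub matches
```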


xqqxjnt1988


#7

Posted on 2016-2-16 15:25:26


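@crossin先生
Please take a look. As a beginner I worked up my courage and spent most of a day writing this. I'm hoping for guidance from someone more experienced; please point out any mistakes or anything improper. Many thanks.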
#!/usr/bin/python
# -*- coding: utf-8 -*-
#author=xuqq
# This program searches a folder, including its subfolders, for files that contain the given content and lists them
import os,re

def search_in_path(path_original,str_targit):
    targit_files = []
    if (os.path.exists(path_original) and os.path.isabs(path_original)):
        for root,dirs,files in os.walk(path_original):        # os.walk() is handy for traversal
            print root
            print dirs
            print files

            for file_obj in files:
                file_whole_name = os.path.join(root,file_obj)
                result = search_in_file(file_whole_name,str_targit)
                if result !="":
                    targit_files.append(result)
    else:
        print "The path which you have input is not valid!"
    print "Files that contain this content:\n"
    result_num = 0
    for tar in targit_files:
        print "%d\t%s" %(result_num,tar)
        result_num = result_num+1

def search_in_file(file_original,str_targit):
    if not os.path.isfile(file_original):
        return ""        # not a regular file: report no match
    fp = open(file_original,'r')
    match = False
    line = "file begin:"
    while line:
        line = fp.readline()
        #print "Line read:\n" ,line
        m = re.findall(str_targit,line)        # str_targit is treated as a regular expression
        #print "m = \t",m
        if m !=[]:
            match =True
            print "Found in the file"
            break
    print "%_%"*28
    print "match:\t",match
    fp.close()
    if match ==True:
        print "Found it!\t"+file_original
        print '*'*80
        return file_original
    else:
        print "Nothing found!"
        print '#'*80
        return ""

def main():
    search_in_path(r"C:\Python27", "xuqq")

if __name__ =='__main__':
    main()
