The original file looks like this:
[work@sageskr python]$ cat from.txt
A bad beginning makes a bad ending. 恶其始者必恶其终。
Hello Mr.张,welcome you to 南京.
#==========Program==========
#!/bin/env python
# -*- encoding:utf-8 -*-
import re

fo = open('/home/work/python/from.txt')
# Open the output file in write mode; the default mode is read-only.
ot = open('/home/work/python/ot.txt', 'w')
t_result = []
while True:
    line = fo.readline()
    if not line:
        fo.close()
        break
    else:
        # Replace every non-letter character with a space, then split into words.
        line = re.sub('[^A-Za-z]', ' ', line)
        list_line = line.split()
        for i in list_line:
            t_result.append(i)

def order_by_str(x):
    # Pair each word with its lowercase form so the sort ignores case.
    temp = []
    res = []
    for i in x:
        temp.append((i.lower(), i))
    temp.sort()
    for l in temp:
        res.append(l[1])
    return res

result = order_by_str(t_result)
for i in result:
    ot.write(i + "\n")
ot.close()
#==========End of program==========
The result looks like this:
[work@sageskr python]$ cat ot.txt
A
a
bad
bad
beginning
ending
Hello
makes
Mr
to
welcome
you
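For comparison, the same extraction and case-insensitive ordering can be sketched more compactly with re.findall and sorted. This is only a sketch: the function name extract_sorted_words and the inline sample string (a copy of from.txt) are illustrative assumptions, not part of the script above.

```python
import re

def extract_sorted_words(text):
    """Return the English words in text, ordered case-insensitively."""
    # Hypothetical helper: [A-Za-z]+ matches runs of ASCII letters, which has
    # the same effect as replacing everything else with spaces and splitting.
    words = re.findall(r'[A-Za-z]+', text)
    # Sort by (lowercase, original) so ties such as 'A' and 'a' come out in
    # the same order as the tuple sort in the script above.
    return sorted(words, key=lambda w: (w.lower(), w))

# Sample text copied from from.txt (an assumption made for this example).
sample = ("A bad beginning makes a bad ending. 恶其始者必恶其终。\n"
          "Hello Mr.张,welcome you to 南京.")

for word in extract_sorted_words(sample):
    print(word)
```

The key function plays the role of the (i.lower(), i) tuples, so no temporary list is needed.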
Some words in the result are duplicated. To get rid of the duplicates, I made an improvement:
=====start=====
#!/bin/env python
# -*- encoding:utf-8 -*-
import re

fo = open('/home/work/python/from.txt')
ot = open('/home/work/python/ot.txt', 'w')
t_result = []
while True:
    line = fo.readline()
    if not line:
        fo.close()
        break
    else:
        # Replace every non-letter character with a space, then split into words.
        line = re.sub('[^A-Za-z]', ' ', line)
        list_line = line.split()
        for i in list_line:
            t_result.append(i)

def order_by_str(x):
    temp = []
    res = []
    _temp = []
    for i in x:
        temp.append((i.lower(), i))
    # Keep only the lowercase forms, so the output is written in lowercase.
    for k in temp:
        _temp.append(k[0])
    # A set drops the duplicates; sorted() puts the words back in order.
    temp = set(_temp)
    temp = sorted(temp)
    for l in temp:
        res.append(l)
    return res

result = order_by_str(t_result)
for i in result:
    ot.write(i + "\n")
ot.close()
=====end=====
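One side effect of the improved version above is that it writes the lowercase forms, so Hello and Mr come out as hello and mr. If the original spelling should be kept, one possible approach is to remember the first spelling seen for each lowercase key. The sketch below assumes that "first occurrence wins" is the desired rule; the function name unique_sorted_words and the sample word list are hypothetical.

```python
def unique_sorted_words(words):
    """Deduplicate case-insensitively, keeping the first spelling seen."""
    first_seen = {}
    for w in words:
        # setdefault only stores w if this lowercase key is not present yet.
        first_seen.setdefault(w.lower(), w)
    # Sort by the lowercase key, then return the remembered spellings.
    return [first_seen[key] for key in sorted(first_seen)]

# Words in the order they appear in from.txt (assumed sample input).
words = ['A', 'bad', 'beginning', 'makes', 'a', 'bad', 'ending',
         'Hello', 'Mr', 'welcome', 'you', 'to']

for word in unique_sorted_words(words):
    print(word)
```

With this input, A is kept rather than a because it appears first in the file.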