
Learn Python 3 the Hard Way: Study Notes for Exercises 17-19

Published 2020-04-03 10:46

Exercise 17: More Files

How to fix file encoding errors

Running the program raised an error saying that decoding failed.

Original code:

from sys import argv
from os.path import exists

script, from_file, to_file = argv

print(f"Copying from {from_file} to {to_file}")

# We could do these two on one line, how?
# Open from_file and assign the file object to in_file
in_file = open(from_file)
# Read the contents of in_file and assign them to indata
indata = in_file.read()

# Print the number of characters in indata
print(f"The input file is {len(indata)} bytes long")

# Check whether the to_file file already exists
print(f"Does the output file exist? {exists(to_file)}")

print("Ready, hit RETURN to continue, hit CTRL-C to abort.")
input()

# Open to_file for writing and assign the file object to out_file
out_file = open(to_file, 'w')
# Write the contents of indata to out_file
out_file.write(indata)

print("Alright, all done.")

# Close out_file and in_file
out_file.close()
in_file.close()

PS D:\pythonp> # first make a sample file.
PS D:\pythonp> echo "This is a test file." > test17.txt
PS D:\pythonp> #then look at it.
PS D:\pythonp> cat test17.txt
This is a test file.
PS D:\pythonp> # now run our script on it.
PS D:\pythonp> python ex17.py test17.txt new_file17.txt
Copying from test17.txt to new_file17.txt
Traceback (most recent call last):
  File "ex17.py", line 10, in <module>
    indata = in_file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence

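The error is raised at the read() call.

A search suggested that this is probably a file encoding problem.

Following the article below did not fix the problem; the errors I hit are shown in the attempts that follow.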
UnicodeDecodeError: ‘gbk’ codec can’t decode byte 0xab in position 11126: illegal multibyte sequence

Attempt 1 (×)

Change line 9 to

# Set the encoding to gbk
in_file = open(from_file, encoding='gbk')

PS D:\pythonp> python ex17.py test17.txt new_file17.txt
Copying from test17.txt to new_file17.txt
Traceback (most recent call last):
  File "ex17.py", line 10, in <module>
    indata = in_file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence

The error is still raised at the same place.

Attempt 2 (×)

Change line 9 to

# Set the encoding to gb18030
in_file = open(from_file, encoding='gb18030')

PS D:\pythonp> python ex17.py test17.txt new_file.txt
Copying from test17.txt to new_file.txt
Traceback (most recent call last):
  File "ex17.py", line 10, in <module>
    indata = in_file.read()
UnicodeDecodeError: 'gb18030' codec can't decode byte 0xff in position 0: illegal multibyte sequence

The error is raised at the same place again.

Attempt 3 (×)

Change line 9 to

# Set the encoding to gb18030 and ignore decoding errors
in_file = open(from_file, encoding='gb18030', errors='ignore')

PS D:\pythonp> python ex17.py test17.txt new_file17.txt
Copying from test17.txt to new_file17.txt
The input file is 44 bytes long
Does the output file exist? False
Ready, hit RETURN to continue, hit CTRL-C to abort.

Traceback (most recent call last):
  File "ex17.py", line 19, in <module>
    out_file.write(indata)
UnicodeEncodeError: 'gbk' codec can't encode character '\u2e84' in position 0: illegal multibyte sequence

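The read now succeeds, but write() raises a new error, so I went back to searching to find the root cause.

PowerShell's file output encoding

It turns out that PowerShell's output redirection (>) writes files as UTF-16 LE by default (Microsoft calls this "Unicode"), whereas the file needs to be UTF-8. This applies to Windows PowerShell 5.1; PowerShell 6 and later default to UTF-8. It also explains the traceback: the byte 0xff at position 0 is the first byte of the UTF-16 LE byte order mark (FF FE).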
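The diagnosis above can be checked from Python. The following is a minimal sketch, not the author's code: it builds the bytes that a UTF-16 LE file with a byte order mark would contain (the sample text and the variable names are assumptions made for illustration), then shows that decoding them as GBK fails while decoding them as UTF-16 succeeds.

```python
import codecs

# Assumed sample text; PowerShell's echo also appends a line ending.
text = "This is a test file.\r\n"

# Bytes of a UTF-16 LE file that starts with a byte order mark (BOM).
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The first two bytes are the BOM; the first one is 0xff, as in the traceback.
print(raw[:2])

try:
    raw.decode("gbk")
    gbk_failed = False
except UnicodeDecodeError as err:
    gbk_failed = True
    print("gbk failed:", err)

# The "utf-16" codec reads the BOM, picks the byte order, and strips the BOM.
decoded = raw.decode("utf-16")
print(repr(decoded))
```

In the script, the equivalent change would be open(from_file, encoding='utf-16'), assuming the input file really is UTF-16 with a BOM.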

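I looked at the fixes described in these reference articles:
Windows PowerShell output file encoding problems
Changing PowerShell's default encoding
Changing PowerShell's default output encoding to UTF-8

I would rather not change the default output encoding, so I tried the following.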

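Attempt 1 (×)

PS D:\pythonp> chcp 65001
chcp : The term 'chcp' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ chcp 65001
+ ~~~~
    + CategoryInfo          : ObjectNotFound: (chcp:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

This fails: switching the current console's code page to UTF-8 with chcp 65001 does not work here.

Since I did not want to change the default output encoding, the only option left was to change how the file is produced, so I tried several ways of creating it.

Attempt 1: create the txt file with echo in PowerShell (the method that caused the original error)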

PS D:\pythonp> echo "TEST one">test17.1.txt
PS D:\pythonp> cat test17.1.txt
TEST one

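Its encoding is Unicode (UTF-16 LE).

Attempt 2: create the txt file with echo in PowerShell and append -encoding utf-8

PS D:\pythonp> echo "test two" > test17.2.txt -encoding utf-8
PS D:\pythonp> cat test17.2.txt
test two
-encoding
utf-8

The encoding is still Unicode, and the -encoding utf-8 suffix was written into the file as content. Running the script gives the same error as in Attempt 1: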

PS D:\pythonp> python ex17.py test17.2.txt new_file.txt
Copying from test17.2.txt to new_file.txt
Traceback (most recent call last):
  File "ex17.py", line 10, in <module>
    indata = in_file.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence
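Because the redirection does not interpret the -encoding suffix, one way to get a UTF-8 sample file without touching PowerShell's defaults is to create it from Python with an explicit encoding. This is only a sketch: the file name test17_utf8.txt is a hypothetical name chosen for the illustration, not one used in the original notes.

```python
from pathlib import Path

# Hypothetical file name, chosen for this illustration only.
sample = Path("test17_utf8.txt")

# write_text encodes the string with the given codec, here UTF-8 without a BOM.
sample.write_text("test two\n", encoding="utf-8")

# Pure ASCII text has the same bytes in UTF-8 and in GBK, so a script that
# opens the file with the GBK default can still read it.
data = sample.read_bytes()
print(data[:2])
```

The file no longer starts with the FF FE byte order mark, so the read() call in the copy script should not hit the decode error shown above.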

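Attempt 3 (√): create the txt file directly as a new text document [tedious if the file already exists]

Its encoding is ANSI; see the Encodings section below.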

PS D:\pythonp> python ex17.py test17.3.txt new_file.txt
Copying from test17.3.txt to new_file.txt
The input file is 10 bytes long
Does the output file exist? True
Ready, hit RETURN to continue, hit CTRL-C to abort.
teturn
Alright, all done.

Attempt 4 (√): a txt file created in VS [tedious if the file already exists]

Its encoding is ANSI.

PS D:\pythonp> python ex17.py test17.4.txt new_file.txt
Copying from test17.4.txt to new_file.txt
The input file is 9 bytes long
Does the output file exist? True
Ready, hit RETURN to continue, hit CTRL-C to abort.

Alright, all done.

Encodings


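Details such as how each encoding works and how they differ will be added once I understand them better.

Reference articles:
Introduction to encodings (ANSI, GBK, GB2312, UTF-8, GB18030, and Unicode)
Encodings (UTF-8 and ANSI) and encoding/decoding (encode, decode)
Overview of the ASCII, ANSI, and Unicode encodings

While searching, I happened to find that opening the files in binary mode avoids the errors caused by mismatched encodings.

The attempt below works (√) [Only the code changes, so it is simple and convenient. However, some sources say that opening certain documents in binary mode can alter their contents; this still needs to be confirmed in practice.]

Change lines 9 and 18 to, respectively,

# Open from_file in read-only binary mode and assign the file object to in_file
in_file = open(from_file, 'rb')
# Open to_file in write-only binary mode and assign the file object to out_file
out_file = open(to_file, 'wb')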

PS D:\pythonp>  python ex17.py test17.2.txt new_test17.1.txt
Copying from test17.2.txt to new_test17.1.txt
The input file is 38 bytes long
Does the output file exist? True
Ready, hit RETURN to continue, hit CTRL-C to abort.

Alright, all done.

PS D:\pythonp> cat test17.2.txt
There is a test.

PS D:\pythonp> cat new_test17.1.txt
There is a test.

Verified: the run completed without errors, and the copy matches the original.
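The binary-mode approach can be sketched as a small function. This is an illustration under assumptions, not the author's script: copy_binary and the temporary file names are hypothetical, and with blocks are used so the files are closed automatically. In binary mode read() returns bytes and len() counts bytes, which is consistent with the run above reporting 38 for a UTF-16 file holding 18 characters plus a 2-byte byte order mark.

```python
import os
import tempfile


def copy_binary(src, dst):
    """Copy src to dst byte for byte and return the number of bytes written."""
    with open(src, "rb") as in_file, open(dst, "wb") as out_file:
        data = in_file.read()        # bytes, not str: nothing is decoded
        return out_file.write(data)  # the same bytes are written back out


# Demonstration inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    dst = os.path.join(tmp, "dst.txt")
    payload = b"\xff\xfeT\x00e\x00s\x00t\x00"  # UTF-16 LE bytes for "Test"
    with open(src, "wb") as f:
        f.write(payload)
    written = copy_binary(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()

print(written, copied == payload)
```

Because no decoding or encoding takes place, the copy succeeds whatever the file's encoding is; the trade-off is that the script can no longer treat the contents as text.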

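Differences between opening a file in the default (text) mode and in binary mode
- Reading
  Reading a binary file in the default text mode may not return the file intact.
  On Windows, text-mode reads in the C runtime (and therefore in Python 2) wrongly treat the byte 0x1A as end of file (EOF). Python 3 does not stop at 0x1A, but text mode still decodes the bytes and translates line endings, so binary data can fail to decode or come back altered. Reading binary files in binary mode avoids both problems.
- Writing
  Take the string x = 'abc\ndef'. len(x) gives 7; \n is the newline character, which is the byte 0x0A. When the string is written in text mode ('w') on Windows, each 0x0A is automatically turned into the two bytes 0x0D 0x0A, so the file is actually 8 bytes long. Reading it back in text mode ('r') converts them to the original newline again. Writing in binary mode ('wb', which takes bytes rather than str) keeps the single byte unchanged, and reading in binary mode returns it unchanged as well. So if you write in text mode and read in binary mode, you have to account for the extra byte. 0x0D is also called the carriage return. On Linux nothing changes, because Linux uses only 0x0A for a newline.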
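The newline translation described above can be reproduced on any platform. In this sketch (an illustration, with a hypothetical file name), newline="\r\n" is passed explicitly to force the translation that Windows applies by default in text mode, so the result does not depend on where the code runs.

```python
import os
import tempfile

x = "abc\ndef"
print(len(x))  # 7 characters

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "newline_demo.txt")

    # newline="\r\n" makes text mode write each "\n" as "\r\n",
    # which is what Windows does by default.
    with open(path, "w", newline="\r\n") as f:
        f.write(x)

    # Binary read: the bytes exactly as they are stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Text read: "\r\n" is translated back to "\n".
    with open(path, "r") as f:
        text = f.read()

print(len(raw), raw)
print(len(text), text == x)
```

The binary read sees 8 bytes while the text read returns the original 7 characters, which is the extra byte the paragraph warns about.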

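Reference articles:
Learn Python 3 the Hard Way: Exercise 17
Fixing "'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence" when reading a txt file in Python

This script is really annoying. There's no need to ask before doing the copy, and it prints too much to the screen. Try removing some of its features to make it friendlier to use.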

from sys import argv

script, from_file, to_file = argv
# Open from_file and read its contents into indata; the with block closes the file
with open(from_file) as in_file:
    indata = in_file.read()

print(f"The input file is {len(indata)} bytes long")
# Open to_file for writing and write indata to it; the with block closes the file
# (indata is a str and write() returns an int, so neither has a close() method)
with open(to_file, 'w') as out_file:
    out_file.write(indata)

See how short you can make the script. I could make this one line long.

from sys import argv
script, from_file, to_file = argv
# Open to_file for writing, open from_file and read its contents, and write those contents to to_file
open(to_file, "w").write(open(from_file).read())

PS D:\pythonp> python ex17.2.py test17.4.txt test17.5.txt
PS D:\pythonp>

Nothing is printed, but I want to see the file's contents.

from sys import argv

script, from_file, to_file = argv

open(to_file, "w").write(open(from_file).read())
# Prints the file name held in to_file, not the file's contents
print(to_file)

PS D:\pythonp> python ex17.2.py test17.4.txt test17.5.txt
test17.5.txt

PS D:\pythonp> cat test17.5.txt
TEST four   ——new

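Printing the variable prints the file name it holds, not the file's contents.
In the shell, cat shows the full contents of a file.

from sys import argv

script, from_file, to_file = argv

open(to_file, "w").write(open(from_file).read())
# Open to_file, read its contents, and print them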
print(open(to_file).read())

PS D:\pythonp>  python ex17.2.py test17.4.txt test17.5.txt
TEST four   ——new

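To print the file's contents from inside the .py file itself, do it as shown above.

In "What You Should See" I used something called cat. It is an old command whose purpose is to concatenate files, but in practice it is mostly used to print a file's contents to the screen. Run man cat to learn more.
Find out why you need to write out_file.close() in the code.

close() closes the file and flushes any buffered data to it. Without close(), what you wrote may still be sitting in a buffer rather than in the file, and it may never be saved.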
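The effect of buffering can be observed by checking the file size before and after close(). This is a sketch with a hypothetical file name; the size seen before close() depends on the buffering of the Python build in use, so no particular value is claimed for it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffer_demo.txt")

    out_file = open(path, "w")
    out_file.write("hello")
    # The data may still be in a buffer, so the file on disk can be shorter.
    size_before_close = os.path.getsize(path)

    out_file.close()  # flushes the buffer and releases the file
    size_after_close = os.path.getsize(path)

    # A with block closes the file automatically, even if an error occurs.
    with open(path, "w") as out_file:
        out_file.write("hello again")
    size_after_with = os.path.getsize(path)

print(size_before_close, size_after_close, size_after_with)
```

After close() the file is guaranteed to hold all five bytes; using with is the usual way to make sure close() is never forgotten.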

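To do

Read up on Python's import statement, then start python and try it out. Try importing a few things and see if you can get it right. It is fine if you can't.

Exercise 18: Names, Variables, Code, Functions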

# this one is like your script with argv
# Define a function that has to unpack its arguments
# Not the simplest way
def print_two(*args):
    arg1, arg2 = args
    print(f"arg1: {arg1}, arg2: {arg2}")

# ok, that *args is actually pointless, we can just do this
# When defining a function in Python you can skip the unpacking step and use the names inside () directly as variable names
def print_two_again(arg1, arg2):
    print(f"arg1: {arg1}, arg2: {arg2}")

# this just takes one argument
# How a function takes one argument
def print_one(arg1):
    print(f"arg1: {arg1}")

# this one takes no arguments
# A function can take no arguments at all
def print_none():
    print("I got nothin'.")

print_two("Zed","Shaw")
print_two_again("Zed","Shaw")
print_one("First!")
print_none()

PS D:\pythonp> python .\ex18.py
arg1: Zed, arg2: Shaw
arg1: Zed, arg2: Shaw
arg1: First!
I got nothin'.
PS D:\pythonp>

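Creating a function

Use def to create a function; def stands for "define".
Right after def comes the function's name. Any name will do, but it is best if the name describes what the function does.
Then tell the function that it takes *args. This is much like argv in a script, and the parameters must go inside parentheses () to work.
End the line with a colon (:), then start indenting on the next line.
Every line below the colon that is indented four spaces belongs to the print_two function.
The first indented line unpacks the arguments, which works much like unpacking a script's arguments.

When defining a function in Python, you can skip the unpacking step and use the names inside () directly as variable names.
A function can take a single argument.
A function can take no arguments.

Running a function, calling a function, and using a function all mean the same thing.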
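To see what *args actually holds, and how the two styles differ when the wrong number of arguments is passed, here is a small sketch. It is a variation on the exercise code, not a copy of it: the functions return their strings instead of printing them, and show_args is a hypothetical helper added for the illustration.

```python
def print_two(*args):
    # args is a tuple holding every positional argument
    arg1, arg2 = args
    return f"arg1: {arg1}, arg2: {arg2}"


def print_two_again(arg1, arg2):
    # the names inside the parentheses are bound directly
    return f"arg1: {arg1}, arg2: {arg2}"


def show_args(*args):
    # hypothetical helper: hand back the packed tuple unchanged
    return args


print(show_args("Zed", "Shaw"))
print(print_two("Zed", "Shaw") == print_two_again("Zed", "Shaw"))

# With *args the call itself succeeds and the unpacking line fails.
try:
    print_two("only one")
except ValueError as err:
    print("unpacking failed:", err)

# With named parameters the call is rejected before the body runs.
try:
    print_two_again("only one")
except TypeError as err:
    print("call failed:", err)
```

Named parameters therefore report a wrong argument count at the call itself, which is one reason the exercise calls the *args version pointless here.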

Exercise 19: Functions and Variables

def cheese_and_crackers(cheese_count, boxes_of_crackers):
    print(f"You have {cheese_count} cheeses!")
    print(f"You have {boxes_of_crackers} boxes of crackers!")
    print("Man that's enough for a party!")
    print("Get a blanket!\n")

print("We can just give the function number directly:")
cheese_and_crackers(20, 30)


print("Oh! We can use the variables from our script:")
amount_of_cheese = 10
amount_of_crackers = 50

cheese_and_crackers(amount_of_cheese, amount_of_crackers)

print("We can even do math inside too:")
cheese_and_crackers(10 + 20, 5 + 6)

print("And we can combine the two, math and variables:")
cheese_and_crackers(amount_of_cheese + 100, amount_of_crackers + 1000)

PS D:\pythonp> python .\ex19.py
We can just give the function number directly:
You have 20 cheeses!
You have 30 boxes of crackers!
Man that's enough for a party!
Get a blanket!

Oh! We can use the variables from our script:
You have 10 cheeses!
You have 50 boxes of crackers!
Man that's enough for a party!
Get a blanket!

We can even do math inside too:
You have 30 cheeses!
You have 11 boxes of crackers!
Man that's enough for a party!
Get a blanket!

And we can combine the two, math and variables:
You have 110 cheeses!
You have 1050 boxes of crackers!
Man that's enough for a party!
Get a blanket!

Read through the script backward and add a comment above each line explaining what it does.

# Define the cheese_and_crackers function
# It takes two parameters: cheese_count and boxes_of_crackers
def cheese_and_crackers(cheese_count, boxes_of_crackers):
    # Print a formatted string reporting how many cheeses you have
    print(f"You have {cheese_count} cheeses!")
    # Print a formatted string reporting how many boxes of crackers you have
    print(f"You have {boxes_of_crackers} boxes of crackers!")
    # Print a string saying that is enough for a party
    print("Man that's enough for a party!")
    # Print a string saying to get a blanket, followed by an extra blank line
    print("Get a blanket!\n")

# Print a string saying we can give the function numbers directly
print("We can just give the function number directly:")
# Call cheese_and_crackers, passing the values directly
cheese_and_crackers(20, 30)

# Print a string saying we can use variables from our script
print("Oh! We can use the variables from our script:")
# Define the variables
amount_of_cheese = 10
amount_of_crackers = 50

# Call cheese_and_crackers, passing the variables
cheese_and_crackers(amount_of_cheese, amount_of_crackers)

# Print a string saying we can even do math inside the call
print("We can even do math inside too:")
# Call cheese_and_crackers, passing the results of arithmetic
cheese_and_crackers(10 + 20, 5 + 6)

# Print a string saying we can combine math and variables
print("And we can combine the two, math and variables:")
# Call cheese_and_crackers, passing variables combined with arithmetic
cheese_and_crackers(amount_of_cheese + 100, amount_of_crackers + 1000)

Start at the last line and read each line backward, reading out all the important characters.

# def: defines a function
def cheese_and_crackers(cheese_count, boxes_of_crackers):
    # f: formats the string
    print(f"You have {cheese_count} cheeses!")
    # f: formats the string
    print(f"You have {boxes_of_crackers} boxes of crackers!")
    print("Man that's enough for a party!")
    # \n: newline character
    print("Get a blanket!\n")

print("We can just give the function number directly:")
# 20/30: values passed directly to the function
cheese_and_crackers(20, 30)

print("Oh! We can use the variables from our script:")
# Define variables and assign values
amount_of_cheese = 10
amount_of_crackers = 50

cheese_and_crackers(amount_of_cheese, amount_of_crackers)

print("We can even do math inside too:")
# 10/20/5/6: literal numbers used in the arithmetic
cheese_and_crackers(10 + 20, 5 + 6)

print("And we can combine the two, math and variables:")
# 100/1000: literal numbers used in the arithmetic
cheese_and_crackers(amount_of_cheese + 100, amount_of_crackers + 1000)

Write at least one more function of your own design, and run it 10 different ways.
The runs are split across two files.

# File 1: ex19.3.py
# Define the potatoes_and_tomatoes function
def potatoes_and_tomatoes(potatoes_count, tomatoes_count):
    print(f"You have {potatoes_count} potatoes and {tomatoes_count} tomatoes.")
    print("Let's cook!")

# No variables / no user input
# Method 1: pass values directly
potatoes_and_tomatoes(15, 20)

# Method 2: pass the results of arithmetic
potatoes_and_tomatoes(15 + 60, 25 * 3)

# With variables / no user input
# Method 3: pass variables directly
amount_of_potatoes = 70
amount_of_tomatoes = 45
potatoes_and_tomatoes(amount_of_potatoes, amount_of_tomatoes)

# Method 4: pass variables combined with arithmetic
potatoes_and_tomatoes(amount_of_potatoes - 25, amount_of_tomatoes / 3)

# With variables / user input through input()
# Method 5: store the numbers the user enters in variables, then pass them to the function
amount_of_potatoes1 = int(input("input the number of your potatoes:"))
amount_of_tomatoes1 = int(input("input the number of your tomatoes:"))
potatoes_and_tomatoes(amount_of_potatoes1, amount_of_tomatoes1)

# Method 6: apply arithmetic to the numbers the user entered, then pass them to the function
potatoes_and_tomatoes(amount_of_potatoes1 - 25, amount_of_tomatoes1 + 5)

# No variables / user input through input()
# Method 7: pass the numbers the user enters straight to the function, without creating variables
potatoes_and_tomatoes(int(input("input the number of your potatoes:")), int(input("input the number of your tomatoes:")))

# Method 8: without new variables, apply arithmetic to the user's input and pass it to the function
potatoes_and_tomatoes(int(input("input the number of your potatoes:")) + 15, int(input("input the number of your tomatoes:")) - 10)

# Get user input through argv
# Method 9: pass the data directly
from sys import argv
script, potatoes, tomatoes = argv

potatoes_and_tomatoes(int(potatoes), int(tomatoes))

# File 2: ex19.3.2.py
# Define the potatoes_and_tomatoes function
def potatoes_and_tomatoes(potatoes_count, tomatoes_count):
    print(f"You have {potatoes_count} potatoes and {tomatoes_count} tomatoes.")
    print("Let's cook!")

# Method 10: Method 9 plus arithmetic
from sys import argv
script, potatoes, tomatoes = argv

potatoes_and_tomatoes(int(potatoes) + 10, int(tomatoes) - 10)

The results of all ten methods:

PS D:\pythonp> python ex19.3.py 90 95
You have 15 potatoes and 20 tomatoes.
Let's cook!
You have 75 potatoes and 75 tomatoes.
Let's cook!
You have 70 potatoes and 45 tomatoes.
Let's cook!
You have 45 potatoes and 15.0 tomatoes.
Let's cook!
input the number of your potatoes:50
input the number of your tomatoes:55
You have 50 potatoes and 55 tomatoes.
Let's cook!
You have 25 potatoes and 60 tomatoes.
Let's cook!
input the number of your potatoes:70
input the number of your tomatoes:75
You have 70 potatoes and 75 tomatoes.
Let's cook!
input the number of your potatoes:80
input the number of your tomatoes:85
You have 95 potatoes and 75 tomatoes.
Let's cook!
You have 90 potatoes and 95 tomatoes.
Let's cook!
PS D:\pythonp> python ex19.3.2.py 100 110
You have 110 potatoes and 100 tomatoes.
Let's cook!
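One detail in the output above: Method 4 prints 15.0 rather than 15 because / always produces a float in Python 3. If a whole number is wanted, floor division (//) is one option; the sketch below reuses the value 45 from the exercise and is only an illustration.

```python
amount_of_tomatoes = 45

true_division = amount_of_tomatoes / 3    # always a float in Python 3
floor_division = amount_of_tomatoes // 3  # stays an int when both operands are ints

print(true_division, floor_division)
print(f"You have {floor_division} tomatoes.")
```

Floor division rounds down, so it only gives the same value as / when the division is exact, as it is here.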