Posted on 2023-05-15 23:08
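I'm new to Python and want to explore what it can do and learn more about what I can use it for. I wrote this code to compare CSVs. You give it two CSV files: CSV1 has an id column and a column containing the values you want to add to the other CSV (CSV2).
Note: this script does exactly what I need and seems to work fine, and I hope it is useful to others too. My real question is what I can do to improve its performance or make the code cleaner.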
# Made by Varqas
# CSV1 = CSV containing values that can be matched in CSV2 and a column that will be added
# CSV2 = CSV containing values that can be matched and a column that will be filled in at the end of the CSV (the last column values should be empty)
with open('csv1.csv', encoding="utf8") as check_file:
    # Get the column used to compare values and store it in a list
    columnToCompare = list([row.split(',')[0].strip() for row in check_file])
with open('csv1.csv', encoding="utf8") as check_file:
    # Get the column whose values will be added to each row and store it in a list
    columnToAdd = list([row.split(',')[2].strip() for row in check_file])
with open('csv2.csv', 'r', encoding="utf8") as in_file, open('out.csv', 'w', encoding="utf8") as out_file:
    i = 0
    # For each row in CSV2
    for line in in_file:
        # Write headers
        if i == 0:
            out_file.write(line)
        else:
            # Get the column in CSV2 containing the value that will be compared against CSV1
            value = line.split(',')[1].strip()
            # Check whether the CSV2 value appears in the CSV1 comparison column
            if value in columnToCompare:
                # Check for duplicates in the list
                numberOfOccurences = list(columnToCompare).count(value)
                concatRow = ""
                if numberOfOccurences > 1:
                    # Concatenate the values of all occurrences
                    for x in range(numberOfOccurences):
                        index = list(columnToCompare).index(value)
                        concatRow = concatRow + columnToAdd[index]
                        if x != numberOfOccurences - 1:
                            concatRow = concatRow + " + "
                        # Blank out the value so list.index doesn't find the same row again
                        columnToCompare[index] = ""
                else:
                    # Single match: take its value directly
                    index = list(columnToCompare).index(value)
                    concatRow = columnToAdd[index]
                # Append to the last column of CSV2
                out_file.write(line.strip() + concatRow + "\n")
            else:
                # Still write a value to the last column of CSV2 if not found in CSV1
                out_file.write(line.strip() + "not found" + "\n")
        i = i + 1
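I know it can be improved, maybe by using a library to shorten it... let me know what you think!
I tried using pd.merge, but I don't really understand how to add the concatenation and the values with it.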
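On the performance question, one option that stays within the standard library is to replace the repeated `list.count` / `list.index` scans with a dictionary built once from CSV1. The following is only a sketch under assumptions taken from the script above: the key is column 0 of CSV1 and column 1 of CSV2, the value to add is column 2 of CSV1, and the last column of CSV2 is empty. The function names (`build_lookup`, `annotate_rows`, `merge_files`) are made up for illustration. Unlike the original, it does not blank out matched entries, so a key that appears in several CSV2 rows gets the same joined value each time.

```python
import csv
from collections import defaultdict


def build_lookup(csv1_rows, key_col=0, value_col=2):
    """Map each CSV1 key to the list of values found for it, in file order.

    Like the original script, this does not skip the CSV1 header row.
    """
    lookup = defaultdict(list)
    for row in csv1_rows:
        lookup[row[key_col].strip()].append(row[value_col].strip())
    return lookup


def annotate_rows(csv2_rows, lookup, key_col=1, missing="not found"):
    """Yield CSV2 rows with the (assumed empty) last column filled in."""
    rows = iter(csv2_rows)
    header = next(rows, None)
    if header is None:
        return
    yield header
    for row in rows:
        # .get() does not create missing keys on a defaultdict
        values = lookup.get(row[key_col].strip())
        filled = " + ".join(values) if values else missing
        yield row[:-1] + [filled]


def merge_files(csv1_path, csv2_path, out_path):
    """Read both files with the csv module and write the annotated rows."""
    with open(csv1_path, newline="", encoding="utf8") as f1:
        lookup = build_lookup(csv.reader(f1))
    with open(csv2_path, newline="", encoding="utf8") as f2, \
            open(out_path, "w", newline="", encoding="utf8") as out:
        csv.writer(out).writerows(annotate_rows(csv.reader(f2), lookup))
```

Calling `merge_files('csv1.csv', 'csv2.csv', 'out.csv')` would mirror the file names used in the question. Each CSV2 row then costs one dictionary lookup instead of several passes over the CSV1 list, and the `csv` module also copes with quoted fields that contain commas, which `line.split(',')` does not.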
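You can use the pandas library to read the two CSV files into DataFrames, merge the column you need into the second CSV, and write out a new CSV that contains the merged column.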
import pandas as pd
# read first CSV
df1 = pd.read_csv('first.csv')
# read second CSV
df2 = pd.read_csv('second.csv')
# merge on the id column, bringing over the column with the values
# that you want to add to the other CSV (CSV2);
# for this example that column is named 'data'.
merged_df = pd.merge(df2, df1[['id', 'data']], on='id', how='left')
# save the new dataframe to CSV.
merged_df.to_csv('merged.csv', index=False)
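The merge above leaves two parts of the question open: joining duplicate ids with " + " and writing "not found" for ids that have no match. A possible way to cover both, offered as a sketch rather than a definitive solution, is to collapse the duplicates with `groupby` before merging and then fill the gaps with `fillna`. The column names `id` and `data` follow the example above, `add_joined_column` is a hypothetical helper name, and the code assumes `df2` does not already have a column called `data`.

```python
import pandas as pd


def add_joined_column(df1, df2, key="id", value="data", missing="not found"):
    """Left-merge df1's value column into df2, joining duplicate keys with ' + '."""
    # Collapse duplicate keys in df1 into a single "a + b" string per key.
    joined = (
        df1.groupby(key)[value]
        .agg(lambda s: " + ".join(s.dropna().astype(str)))
        .reset_index()
    )
    # A left merge keeps every row of df2 in its original order.
    merged = df2.merge(joined, on=key, how="left")
    # Keys absent from df1 come back as NaN; replace them with the marker.
    merged[value] = merged[value].fillna(missing)
    return merged
```

With the DataFrames read as in the answer, `add_joined_column(df1, df2).to_csv('merged.csv', index=False)` would write the result. Because the duplicates are collapsed first, the merge cannot multiply rows of `df2`, which is what a plain left merge does when an id occurs more than once in `df1`.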
Author: 黑洞官方问答小能手
Link: https://www.pythonheidong.com/blog/article/1975999/964e531554fb1ac6ca53/
Source: python黑洞网