程序员最近都爱上了这个网站  程序员们快来瞅瞅吧!  it98k网:it98k.com

本站消息

站长简介/公众号

  出租广告位,需要合作请联系站长

+关注
已关注

分类  

暂无分类

标签  

暂无标签

日期归档  

暂无数据

比较 CSV 列并添加到第二个 CSV 上的特定行的更有效方法(Python)[重复]

发布于2023-05-15 23:08     阅读(1147)     评论(0)     点赞(4)     收藏(3)


所以我是 python 的新手,我想发现它的潜力并获得更多关于我可以用它做什么的知识。我做了这个代码来比较 CSV,基本上它是做什么的,你提供它 2 个 CSV,CSV1 有一些 id 列和一个包含你想添加到其他 CSV (CSV2) 的值的列

注意:这个脚本完全符合我的要求,而且似乎工作正常,希望它对某些人也有用,我的问题真的是我可以做些什么来提高它的性能甚至使代码更清晰

# Made by Varqas
# CSV1 = CSV containing values that can be matched in CSV2 and a column that will be added
# CSV2 = CSV containing values that can be matched and column that will be concatenated at the end of the CSV (The last column values should be empty)

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to Compare values and add it to a list
    columnToCompare = list([row.split(',')[0].strip() for row in check_file])

with open('csv1.csv', encoding="utf8") as check_file:
    # Get Column that will be used to add to a row values and add it to a list
    columnToAdd = list([row.split(',')[2].strip() for row in check_file])

with open('csv2.csv', 'r', encoding="utf8") as in_file, open('out.csv', 'w', encoding="utf8") as out_file:
    i = 0
    # For each Row in CSV2
    for line in in_file:
        # Write Headers
        if i == 0:
            out_file.write(line)
        else:
            # GET Column on CSV2 containing value that will be compared on CVS1
            value = line.split(',')[1].strip()
            # Check if first Column value on CSV2  either variable is in 
            if value in columnToCompare:
                # Check for duplicates in the list 
                numberOfOccurences = list(columnToCompare).count(value)
                concatRow = ""
                if numberOfOccurences > 1:
                    # Concatenate all values of occurences
                    for x in range(numberOfOccurences):
                        index = list(columnToCompare).index(value)
                        concatRow = concatRow + columnToAdd[index]
                        if x != numberOfOccurences - 1:
                            concatRow = concatRow + " + "
                        # Remove value so list.index doesn't found same row
                        columnToCompare[index] = ""
                else:
                    # Add other row that doesn't match
                    index = list(columnToCompare).index(value)
                    concatRow = columnToAdd[index]

                # Concat to last column of CSV2
                out_file.write(line.strip() + concatRow + "\n")
            else:
                # Still concat value in CSV2 to last column if not found in csv1 
                out_file.write(line.strip() + "not found" + "\n")
        i = i + 1

我知道它可以改进,也许可以使用一些库来缩小……让我知道你的想法!

我尝试使用 pd merge,但我不太了解如何在其中添加连接和值。


解决方案


您可以使用 Pandas 库将两个 CSV 文件读取到数据框中,并将两列合并到第二个 CSV 中,并输出包含合并列的新 CSV。

import pandas as pd

# read first CSV
df1 = pd.read_csv('first.csv')

# read second CSV
df2 = pd.read_csv('second.csv')

# merge the id column and a "column with values 
# that you want to add to other CSV (CSV2)"
# for the example the second column is named 'data'.
merged_df = pd.merge(df2, df1[['id', 'data']], on='id', how='left')

# save new dataframe to csv.
merged_df.to_csv('merged.csv', index=False)


所属网站分类: 技术文章 > 问答

作者:黑洞官方问答小能手

链接:https://www.pythonheidong.com/blog/article/1975999/964e531554fb1ac6ca53/

来源:python黑洞网

任何形式的转载都请注明出处,如有侵权 一经发现 必将追究其法律责任

4 0
收藏该文
已收藏

评论内容:(最多支持255个字符)