
Python: Use regex to extract a column of a file

Published 2021-10-15 00:12



I am currently extracting columns in a file by using awk in os.system():

import os
import numpy as np
os.system("awk '{print $%i}' < infile > outfile" % some_column)
data = np.loadtxt('outfile')

Is there an equivalent way to accomplish this using regex?

Thanks.

Edit: To clarify, I am looking for the most efficient way to extract specific columns from large files.


Solution


Depending on what your data delimiters are, regex is probably overkill for this. If the delimiters are simple (whitespace or a specific character or string), you can separate the columns with the built-in str.split method.

Here is an example program to explain how this might work:

column = 0  # First column (Python counts from 0, so this is awk's $1)
with open("data.txt") as file:
    data = file.readlines()
columns = [line.split()[column] for line in data]

To break this down:

column = 0
# Read a file named "data.txt" into a list of lines
with open("data.txt") as file:
    data = file.readlines()
# This is where we will store the column values as we extract them
columns = []
# Iterate over each line in the file
for line in data:
    # Strip the whitespace (including the trailing newline character) from the
    # start and end of the string
    line = line.strip()
    # Split the line on the default delimiter (any run of whitespace
    # characters)
    fields = line.split()
    # Extract the value at the desired index and store it in our list
    columns.append(fields[column])
# columns now holds a list of strings extracted from that column
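
Since the question mentions large files, here is a minimal sketch of the same idea that reads one line at a time instead of calling readlines(), followed by a regex-based equivalent for comparison. The function names iter_column and iter_column_regex are hypothetical, and skipping lines that are blank or too short is an assumption of this sketch, not something the answer above specifies.

```python
import re
from typing import Iterator

# Matches one whitespace-delimited field (a run of non-whitespace characters).
FIELD = re.compile(r"\S+")


def iter_column(path: str, column: int) -> Iterator[str]:
    """Yield the zero-based `column` field of each line, using str.split."""
    with open(path) as f:
        # Iterating over the file object reads it lazily, one line at a time,
        # so the whole file never has to fit in memory.
        for line in f:
            fields = line.split()
            # Assumption: lines with too few fields (including blank lines)
            # are skipped rather than raising IndexError.
            if len(fields) > column:
                yield fields[column]


def iter_column_regex(path: str, column: int) -> Iterator[str]:
    """Same behavior as iter_column, but the fields are found with a regex."""
    with open(path) as f:
        for line in f:
            fields = FIELD.findall(line)
            if len(fields) > column:
                yield fields[column]
```

For example, [float(v) for v in iter_column("data.txt", 0)] would collect the first column as numbers. If the goal is a NumPy array anyway, np.loadtxt also accepts a usecols argument (zero-based), which selects columns while loading and avoids the intermediate file.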






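Site category: Technical articles > Q&A

Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/1059590/8097654ff97ca9b9e5f4/

Source: python黑洞网

Any form of reproduction must credit the source. Infringement, once discovered, will be pursued under the law.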
