Only a single row is stored when writing scraped data to a CSV file in Python [duplicate]

Posted 2022-11-01 07:28


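The CSV file only ever contains one row of data. If I loop over a range while writing the CSV, the same row is simply written again and again until the range is exhausted.

I have not been able to fix this bug, and I have already spent two days on it.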


for page in range(0,10):
    url = "https://cryptonews.net/?page={page}".format(page =page)
    # print(url)
 
# open the file in the write mode
    # f = open('file.csv', 'w',newline='' )
    header = ['Title', 'Tag', 'UTC','Web_Address']


    # write a row to the csv file
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    lists = soup.find_all("main")

    for lis in lists:
        title = lis.find('a', class_="title").text
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_="datetime").text
        address = lis.find('div', class_="middle-xs").text
        img = lis.find('span', class_="src")

        data =([title, tag, datetime,address,img])
 

counter = range(100)

with open('crypto.csv', 'a', newline='') as crypto:
    FileWriter = csv.writer(crypto)
    FileWriter.writerow(header)

    for x in counter:
        
         FileWriter.writerow(data)# writer.writerows(data)




 

Solution


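You are not storing the data: as already pointed out, `data` is overwritten on every pass through `lists`, so only the last row is left when you write the file. Second, I would use pandas here to build a DataFrame and then write that to the file.

Also, you collect 5 items per row but have only 4 column names.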

import pandas as pd
import requests
from bs4 import BeautifulSoup


data = []  # accumulate one row per matched element across all pages
for page in range(0, 10):
    print(page)
    url = "https://cryptonews.net/?page={page}".format(page=page)

    # use a separate name so the loop variable `page` is not shadowed
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    lists = soup.find_all("main")

    for lis in lists:
        title = lis.find('a', class_="title").text
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_="datetime").text
        address = lis.find('div', class_="middle-xs").text
        img = lis.find('span', class_="src")

        # append instead of reassigning, so earlier rows are kept
        data.append([title, tag, datetime, address, img])

# five column names to match the five collected items
header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']
df = pd.DataFrame(data, columns=header)
df.to_csv('crypto.csv', index=False)
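
If you would rather stay with the `csv` module from the question, the same idea (collect every row in a list, then write the header once and all rows together) can be sketched as below. This is only a minimal sketch: `write_rows`, `demo_path` and the placeholder rows are hypothetical names and values made up for illustration, and the scraping step is left out so the example runs without network access.

```python
import csv
import os
import tempfile

HEADER = ["Title", "Tag", "UTC", "Web_Address", "Image"]


def write_rows(path, header, rows):
    """Write the header once, then every collected row, to a CSV file."""
    # csv.writer does not check row widths, so catch mismatches up front
    for i, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError(
                f"row {i} has {len(row)} fields, expected {len(header)}"
            )
    # 'w' starts a fresh file, so re-running does not add a second header
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Placeholder rows standing in for scraped values (not real data)
rows = []
for n in range(3):
    # append inside the loop so each iteration adds a row
    rows.append([f"title {n}", "tag", "1 h ago", "example.com", f"img{n}.png"])

demo_path = os.path.join(tempfile.mkdtemp(), "crypto_demo.csv")
write_rows(demo_path, HEADER, rows)
```

In the question's code the file was opened in `'a'` mode and `writerow(data)` was called in a loop over `range(100)`, which repeats whatever single row `data` last held. Writing the accumulated list with `writerows` avoids both problems.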

Also, I am not sure what you want as output (you did not say). Is this closer to it?

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

data = []  # accumulate one row per news item across all pages
for page in range(0, 10):
    print(page)
    url = "https://cryptonews.net/?page={page}".format(page=page)

    # use a separate name so the loop variable `page` is not shadowed
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    # one <div> per news item: class attribute starting with "row news-item"
    lists = soup.find_all("div", {'class': re.compile('^row news-item.*')})

    for lis in lists:
        # title, domain and image URL are read from the div's data-* attributes
        title = lis['data-title']
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_=re.compile("^datetime")).text.strip()
        address = lis['data-domain']
        img = lis['data-image']

        data.append([title, tag, datetime, address, img])

header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']
df = pd.DataFrame(data, columns=header)
df.to_csv('crypto.csv', index=False)

Output:

print(df)
                                                 Title  ...                                              Image
0    ETH Breaches $1,500 Level As Ethereum Adds Ove...  ...  https://cnews24.ru/uploads/e29/e29a5677e448f6e...
1    India Seeing Spike in Drug Smuggling Using Cry...  ...  https://cnews24.ru/uploads/65b/65b50302f65e12c...
2    Optimism (OP) Price Prediction: 87% Rally Is J...  ...  https://cnews24.ru/uploads/5e1/5e1189bbb2c1e2b...
3        Mysterious Whale Adds 3.94 Trillion Shiba Inu  ...  https://cnews24.ru/uploads/54a/54af6726248c29a...
4    Are the big fundraising efforts of blockchain ...  ...  https://cnews24.ru/uploads/5af/5afb066d81be4a6...
..                                                 ...  ...                                                ...
195  Terra Classic (LUNC) Chief Community Officer S...  ...  https://cnews24.ru/uploads/a53/a53fd4206ab5f95...
196  Reddit NFT Collection: How to Sell Your Avatar...  ...  https://cnews24.ru/uploads/ab6/ab6718f707c3428...
197  In Topsy Turvy Market Logic, Positive U.S. GDP...  ...  https://cnews24.ru/uploads/264/264ab9327f4774a...
198  XRP Wallets Spikes Above 4.34M, Gaining 29,883...  ...  https://cnews24.ru/uploads/2e5/2e56d092b7c253b...
199                     Are crypto trading bots legit?  ...  https://cnews24.ru/uploads/ccb/ccb73d9d9b79280...

[200 rows x 5 columns]
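
One thing to watch for in loops like the ones above: BeautifulSoup's `find()` returns `None` when nothing matches, so calling `.text` on the result raises `AttributeError` for any item that lacks the element. A small guard can be sketched as follows. `text_or_default` is a hypothetical helper, not part of BeautifulSoup, and the `SimpleNamespace` object is only a stand-in for a found tag so the example runs without the library or the site.

```python
from types import SimpleNamespace


def text_or_default(node, default=""):
    """Return the stripped text of a found element, or a default if missing."""
    if node is None:
        return default
    return node.text.strip()


# Stand-in for the result of lis.find(...): anything with a .text attribute
found = SimpleNamespace(text="  2 h ago \n")
missing = None  # what find() returns when no element matches

print(text_or_default(found))
print(text_or_default(missing, default="n/a"))
```

In the scraping loop this would be used as, for example, `tag = text_or_default(lis.find('span', class_="etc-mark"))`, so one incomplete item does not abort the whole run.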


Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/1837344/5a599c7fbf5752cf3687/

Source: python黑洞网
