
Python: Scraping Bilibili Danmaku and Building a Word Cloud

Published 2020-11-26 21:33     Views (957)     Comments (0)     Likes (26)     Favorites (1)


The example code is as follows:

import requests
import jieba
import wordcloud
import numpy as np
from pyquery import PyQuery as pq
from PIL import Image

def get_danmu(url):
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36',
        'cookie': "finger=158939783; _uuid=1D942BB2-EBEB-C3C3-9026-4AB6B15365CC10680infoc; buvid3=86E0A4D9-734F-4537-BD1A-E8CA03A594E9143077infoc; sid=8s0h00e6; CURRENT_FNVAL=80; blackside_state=1; rpdid=|(k|)Jumu~uY0J'uY|JRl|Jku; CURRENT_QUALITY=64; LIVE_BUVID=AUTO5116038516749932; DedeUserID=409871996; DedeUserID__ckMd5=f6bdb8d459021b42; SESSDATA=6c46771e%2C1620224742%2Cbfc85*b1; bili_jct=71291544e6e18dbe5f65e18e71f6a34c; bp_video_offset_409871996=457347019835935713; bp_t_offset_409871996=457353028495207860; PVID=1; bfe_id=61a513175dc1ae8854a560f6b82b37af"
    }
    resp = requests.get(url=url, headers=headers)
    resp.raise_for_status()
    doc = pq(resp.content)
    all_d = doc("d")
    danmu = list()
    for i in all_d.items():
        danmu.append(i.text())
    print(danmu)
    return danmu

def save_data(danmu):
    with open('弹幕.csv', "a", encoding="utf-8-sig") as f:
        for i in danmu:
            f.write(i + "\n")

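def draw_word_Cloud():
    # Read the saved danmaku; utf-8-sig matches the encoding used when writing
    # and strips the BOM, and the with-block closes the file afterwards
    with open('弹幕.csv', encoding='utf-8-sig') as f:
        content = f.read()
    # Segment with jieba, then join the tokens with spaces so that WordCloud
    # can split them into separate words
    text_list = jieba.lcut(content)
    text_str = ' '.join(text_list)

    # Mask image that defines the shape of the cloud
    background = np.array(Image.open('background.jpeg'))

    # Configure the word cloud
    wd = wordcloud.WordCloud(
        # width = 800,  # image width
        # height = 400,  # image height
        background_color = 'white',  # background color of the image
        font_path = 'msyh.ttc',  # Microsoft YaHei font, needed to render Chinese
        scale = 15,  # scaling factor for the output image
        mask = background  # with a mask set, width and height need not be set; without one the image is a rectangle
    )
    # Generate the cloud from the text
    wd.generate(text_str)
    # Write the image to disk
    wd.to_file('1.jpeg')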
if __name__ == '__main__':
    result = []
    for i in range(10,31):
        url = "https://api.bilibili.com/x/v2/dm/history?type=1&oid=140610898&date=2020-10-%s"%(i)
        print(url)
        result.extend(get_danmu(url))
    save_data(danmu=result)
    draw_word_Cloud()

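The main block builds each date by formatting an integer into the URL, which works for days 10 to 30 but would yield an unpadded date such as `2020-10-9` for earlier days. As a minimal sketch, assuming the API expects zero-padded `YYYY-MM-DD` dates, the URLs can be generated with `datetime` instead. The endpoint and `oid` are copied from the script; the helper name `build_history_urls` is hypothetical.

```python
from datetime import date, timedelta

# Endpoint copied from the script above; the helper name is hypothetical.
HISTORY_API = "https://api.bilibili.com/x/v2/dm/history"


def build_history_urls(oid, start, end):
    """Return one history URL per day from start to end, inclusive."""
    urls = []
    day = start
    while day <= end:
        # date.isoformat() always yields a zero-padded YYYY-MM-DD string
        urls.append(f"{HISTORY_API}?type=1&oid={oid}&date={day.isoformat()}")
        day += timedelta(days=1)
    return urls


# Same range as the original loop: 10 to 30 October 2020
urls = build_history_urls(140610898, date(2020, 10, 10), date(2020, 10, 30))
print(len(urls), urls[0])
```

Stepping with `timedelta` also handles ranges that cross a month boundary, which the integer loop cannot do.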

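`get_danmu` collects the text of every `d` element in the response. The same extraction can be sketched with only the standard library, which may be handy for checking the parsing step offline. The sample XML below is made up for illustration: the `<i>` root and the `p` attribute values are assumptions about the response shape, and `extract_danmu` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like a danmaku XML response (an assumption, not real data)
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<i>
  <d p="1.0,1,25,16777215,0,0,0,0">first comment</d>
  <d p="2.5,1,25,16777215,0,0,0,0">弹幕测试</d>
  <d p="3.0,1,25,16777215,0,0,0,0"></d>
</i>"""


def extract_danmu(xml_bytes):
    """Return the non-empty text of every <d> element in the document."""
    root = ET.fromstring(xml_bytes)
    # iter("d") walks all <d> elements at any depth; empty ones are skipped
    return [d.text.strip() for d in root.iter("d") if d.text and d.text.strip()]


# Bytes input mirrors resp.content in the script
print(extract_danmu(SAMPLE_XML.encode("utf-8")))
```

Skipping empty elements keeps blank lines out of the saved file, so they do not reach the segmentation step.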
Original article: https://blog.csdn.net/weixin_44243623/article/details/110060931


Author: 我是天上的仙女

Link: https://www.pythonheidong.com/blog/article/629423/df4ac5aaaa6c22fe8ec1/

Source: python黑洞网

Please credit the source when reposting in any form. Infringement, once discovered, will be pursued legally.
