Python Crawler Practice: Scraping Video Data from Krcom (酷燃网)

Published 2020-10-25 09:11


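Preface

The text and images in this article come from the internet and are for learning and discussion only, with no commercial purpose. Copyright belongs to the original author; if there is any problem, please contact us promptly so we can address it.

Project goal

Scrape video data from Krcom (酷燃网).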

https://krcom.cn/

 

Environment

Python 3.6

PyCharm

Crawler code

import os
import re

import requests

# Output directory from the original post; adjust it for your machine.
SAVE_DIR = 'C:\\Users\\Administrator\\Desktop\\酷燃网\\'


def safe_name(title):
    # Replace characters that Windows does not allow in file names.
    return re.sub(r'[\\/:*?"<>|\r\n]', '_', title).strip()


def download_video(title, url):
    # Save the video stream as <title>.mp4.
    filename_video = SAVE_DIR + safe_name(title) + '.mp4'
    response_video = requests.get(url=url)
    with open(filename_video, mode='wb') as f:
        f.write(response_video.content)


def download_mp3(title, url):
    # Save the audio stream as <title>.mp3.
    filename_mp3 = SAVE_DIR + safe_name(title) + '.mp3'
    response_mp3 = requests.get(url=url)
    with open(filename_mp3, mode='wb') as f:
        f.write(response_mp3.content)


# Create the output directory if it does not exist yet.
os.makedirs(SAVE_DIR, exist_ok=True)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}

for page in range(0, 101, 20):
    # The original hard-coded cursor=0, so `page` was never used and every
    # iteration fetched the same list. Assumption: `cursor` is the paging offset.
    url = 'https://krcom.cn/aj/hot/loadingmore?ajwvr=6&cursor={};2020102014&YmdH=&__rnd=1603176486876'.format(page)
    response = requests.get(url=url, headers=headers)
    # The body carries backslash escapes; decode them into readable text.
    html_data = response.text.encode('utf-8').decode('unicode_escape')
    urls = re.findall('vid=(.*?)"', html_data, re.S)
    titles = re.findall('<h3 class="V_autocut_2l">(.*?)<', html_data, re.S)
    for vid, title in zip(urls, titles):
        page_url = 'https://krcom.cn/aj/dash/media?media_ids={}&protocols=dash&watermarks=krcom'.format(vid)
        response_2 = requests.get(url=page_url, headers=headers)
        html_json = response_2.json()
        details = html_json['data']['list'][0]['details']
        # As in the original: index 1 is used as the video stream and the
        # last entry as the audio stream.
        video_url = details[1]['play_info']['url']
        mp3_url = details[-1]['play_info']['url']
        download_video(title, video_url)
        download_mp3(title, mp3_url)
        print(title)
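
The parsing step in the loop above can be checked offline, without sending any request to the site. The sketch below isolates it in a function, `extract_videos` (a hypothetical name, not from the original post), and runs it on a made-up response fragment. The sample markup is an assumption shaped only to match the two regular expressions used above; it is not a captured response from krcom.cn.

```python
import re


def extract_videos(raw_text):
    """Return (vid, title) pairs from an escaped list response."""
    # Same decoding step as the crawler: turn backslash escapes into text.
    html_data = raw_text.encode('utf-8').decode('unicode_escape')
    vids = re.findall(r'vid=(.*?)"', html_data, re.S)
    titles = re.findall(r'<h3 class="V_autocut_2l">(.*?)<', html_data, re.S)
    # zip stops at the shorter list, so unmatched entries are dropped.
    return list(zip(vids, titles))


# Made-up sample for illustration only, not a real krcom.cn response.
sample = (
    r'<a href=\"/play?vid=1034:111\"><h3 class=\"V_autocut_2l\">\u9177\u71c3</h3></a>'
    r'<a href=\"/play?vid=1034:222\"><h3 class=\"V_autocut_2l\">demo</h3></a>'
)

pairs = extract_videos(sample)
print(pairs)
```

Keeping the parsing in its own function makes it easier to notice when the page markup changes: if the regular expressions stop matching, the function returns an empty list instead of failing partway through a download.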

 

Original link: https://www.cnblogs.com/hhh188764/p/13864523.html


Author: 病毒快消失

Link: https://www.pythonheidong.com/blog/article/608660/74f3d4d963bb2e9be16a/

Source: python黑洞网

Please credit the source for any form of reposting. Infringement, once discovered, will be pursued under the law.
