Published 2019-08-07 16:23
Target: https://hr.tencent.com/position.php?&start=0#a
Goal: crawl all of the job posting information.
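The `start` query parameter in the target URL is a row offset. The spider below steps it by 10 and stops once it reaches 100. Assuming 10 postings per page (which the step size suggests), that means 11 listing pages. Here is a small stdlib-only sketch of the URLs that get requested. `page_urls` is a hypothetical helper and is not part of the project:

```python
BASE_URL = 'https://hr.tencent.com/position.php?&start='


def page_urls(limit=100, step=10):
    """Return the listing URLs the spider requests, mirroring its
    `if self.offset < 100: self.offset += 10` chain (hypothetical helper)."""
    offset = 0
    urls = [BASE_URL + str(offset)]  # the single entry in start_urls
    while offset < limit:
        offset += step               # each parse() call advances by one page
        urls.append(BASE_URL + str(offset))
    return urls


urls = page_urls()
print(len(urls))
print(urls[-1])
```

The last page fetched is `start=100`, because the check `offset < 100` runs before the increment.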
```python
# items.py
import scrapy


class TecentjobItem(scrapy.Item):
    # One field per column of a job posting row
    positionname = scrapy.Field()
    positionlink = scrapy.Field()
    positionType = scrapy.Field()
    peopleNum = scrapy.Field()
    workLocation = scrapy.Field()
    publishTime = scrapy.Field()
```
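`scrapy.Item` behaves like a dict that only accepts the fields declared with `scrapy.Field()`. Assigning an undeclared key raises `KeyError`, which catches typos such as `positionName` vs `positionname`. The stdlib-only class below is a hypothetical stand-in that imitates just that check, so the behavior can be tried without Scrapy installed. It is not Scrapy's implementation.

```python
class StrictItem(dict):
    """Hypothetical stand-in for scrapy.Item: item[key] = value is only
    allowed for names listed in `fields`."""

    fields = ()

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


class JobItem(StrictItem):
    fields = ('positionname', 'positionlink', 'positionType',
              'peopleNum', 'workLocation', 'publishTime')


item = JobItem()
item['positionname'] = 'example title'   # declared field: accepted
try:
    item['positionName'] = 'typo'        # wrong capitalization: rejected
except KeyError as exc:
    print('rejected:', exc)
```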
```python
# -*- coding: utf-8 -*-
# Spider module (e.g. tecentJob/spiders/tencent.py)
import scrapy
from tecentJob.items import TecentjobItem


class TencentSpider(scrapy.Spider):
    name = 'tencent'
    allowed_domains = ['tencent.com']
    url = 'https://hr.tencent.com/position.php?&start='
    offset = 0
    start_urls = [url + str(offset)]

    def parse(self, response):
        # Each job posting is a table row with class "even" or "odd"
        for each in response.xpath("//tr[@class='even'] | //tr[@class='odd']"):
            # Create a fresh item for this row
            item = TecentjobItem()
            # .get(default='') returns '' for an empty cell instead of
            # raising IndexError the way extract()[0] does
            item['positionname'] = each.xpath("./td[1]/a/text()").get(default='')
            item['positionlink'] = each.xpath("./td[1]/a/@href").get(default='')
            item['positionType'] = each.xpath("./td[2]/text()").get(default='')
            item['peopleNum'] = each.xpath("./td[3]/text()").get(default='')
            item['workLocation'] = each.xpath("./td[4]/text()").get(default='')
            item['publishTime'] = each.xpath("./td[5]/text()").get(default='')
            yield item

        if self.offset < 100:
            self.offset += 10
            # Build the next page URL and hand the request back to the scheduler
            # (enqueue, dequeue, pass to the downloader); parse() handles the response
            yield scrapy.Request(self.url + str(self.offset), callback=self.parse)
```
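To check the row-extraction logic offline, the sketch below runs the same selection against a small, hypothetical HTML table using only the standard library. `xml.etree.ElementTree` supports a limited XPath subset. It handles attribute predicates and 1-based positions such as `td[1]`, but not `text()` or the `|` union. The sketch therefore filters rows by class while walking the tree, which keeps document order as `|` does, and reads `.text` in place of `text()`. The sample markup is invented to match the structure the spider's XPath assumes. It must be well-formed XML, which real pages often are not, so the actual site still needs Scrapy's lxml-based selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like the rows the spider targets
SAMPLE = """
<table>
  <tr class="h"><td>Name</td><td>Type</td><td>Count</td><td>City</td><td>Date</td></tr>
  <tr class="even"><td><a href="detail.php?id=1">Job A</a></td>
    <td>Tech</td><td>2</td><td>Shenzhen</td><td>2019-08-07</td></tr>
  <tr class="odd"><td><a href="detail.php?id=2">Job B</a></td>
    <td></td><td>1</td><td>Beijing</td><td>2019-08-06</td></tr>
</table>
"""


def cell_text(row, path):
    """Stripped text of the first match, or '' when the element is missing or
    empty (a safe fallback where extract()[0] would raise IndexError)."""
    el = row.find(path)
    return (el.text or '').strip() if el is not None else ''


def parse_rows(html):
    root = ET.fromstring(html.strip())
    # No '|' union in ElementTree: keep rows whose class is even/odd, in document order
    rows = [tr for tr in root.iter('tr') if tr.get('class') in ('even', 'odd')]
    for row in rows:
        link = row.find("./td[1]/a")
        yield {
            'positionname': cell_text(row, "./td[1]/a"),
            'positionlink': link.get('href', '') if link is not None else '',
            'positionType': cell_text(row, "./td[2]"),
            'peopleNum': cell_text(row, "./td[3]"),
            'workLocation': cell_text(row, "./td[4]"),
            'publishTime': cell_text(row, "./td[5]"),
        }


for job in parse_rows(SAMPLE):
    print(job)
```

The header row (class `h`) is skipped. The second row's empty type cell comes back as `''` and does not raise an error.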
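```python
# pipelines.py
import json


class TecentjobPipeline(object):
    def __init__(self):
        # Open the output file once, in binary mode; items are appended one per line
        self.filename = open("tencent.json", 'wb')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
        text = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.filename.write(text.encode('utf-8'))
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: close (and flush) the file
        self.filename.close()
```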
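The pipeline writes one JSON object per line (the JSON Lines layout). Setting `ensure_ascii=False` keeps Chinese characters as UTF-8 text instead of `\uXXXX` escapes. Below is a stdlib-only sketch of writing and reading back such a file. The temporary path and the sample records are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical records with non-ASCII values
records = [
    {'positionname': '后台开发', 'workLocation': '深圳'},
    {'positionname': 'QA', 'workLocation': '北京'},
]

print(json.dumps(records[0]))                      # non-ASCII escaped as \uXXXX
print(json.dumps(records[0], ensure_ascii=False))  # Chinese kept as-is

# Write exactly as the pipeline does: binary file, one UTF-8 encoded line per item
path = os.path.join(tempfile.mkdtemp(), 'tencent.json')
with open(path, 'wb') as f:
    for rec in records:
        f.write((json.dumps(rec, ensure_ascii=False) + '\n').encode('utf-8'))

# Read back: one json.loads per non-empty line
with open(path, encoding='utf-8') as f:
    loaded = [json.loads(line) for line in f if line.strip()]
```

As a design alternative, Scrapy's built-in feed exports (`scrapy crawl tencent -o tencent.jl`, with `FEED_EXPORT_ENCODING = 'utf-8'` in settings) can produce an equivalent file without a custom pipeline.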
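```python
# settings.py
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}

# Enable the pipeline; without this entry no items are written to tencent.json
ITEM_PIPELINES = {
    'tecentJob.pipelines.TecentjobPipeline': 300,
}
```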
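Scrapy's `DefaultHeadersMiddleware` applies `DEFAULT_REQUEST_HEADERS` with a set-default rule. If a header is already set on an individual `scrapy.Request(..., headers=...)`, the middleware keeps it and fills in only the missing ones. The stdlib-only function below is a hypothetical model of that merge, not Scrapy's code. It can help you reason about which `User-Agent` a request ends up with. It compares header names case-insensitively, as HTTP does.

```python
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}


def apply_default_headers(request_headers, defaults):
    """Hypothetical model of a setdefault-style merge: per-request headers win,
    defaults only fill in names the request did not set."""
    merged = dict(request_headers)
    present = {name.lower() for name in merged}  # HTTP header names are case-insensitive
    for name, value in defaults.items():
        if name.lower() not in present:
            merged[name] = value
    return merged


per_request = {'accept-language': 'zh-CN'}
headers = apply_default_headers(per_request, DEFAULT_REQUEST_HEADERS)
print(headers)
```

The request keeps its own `accept-language: zh-CN`. `User-Agent` and `Accept` come from the defaults.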
Author: 085iitirtu
Link: https://www.pythonheidong.com/blog/article/11510/29b3136da1261401375f/
Source: python黑洞网