
scopus keywords and citations crawling

Published 2020-10-30 21:37 · Views: 779 · Comments: 0 · Likes: 25 · Favorites: 0



I'm trying to crawl article data from the Scopus API. I have an API key and can receive fields from the Standard view.

Here is example:

First, initialization (API endpoint, search query, and headers):

import json
import requests

api_key = "YOUR_API_KEY"  # your Scopus API key

api_resource = "https://api.elsevier.com/content/search/scopus?"
search_param = 'query=title-abs-key(big data)'  # for example

# headers
headers = dict()
headers['X-ELS-APIKey'] = api_key
headers['X-ELS-ResourceVersion'] = 'XOCS'
headers['Accept'] = 'application/json'

Now I can receive an article's JSON (for example, the first article from the first page):

# request the first page of search results
page_request = requests.get(api_resource + search_param, headers=headers)
# decode the response to JSON
page = json.loads(page_request.content.decode("utf-8"))
# list of articles on this page
articles_list = page['search-results']['entry']

article = articles_list[0]
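As an aside, later pages can be fetched with the `start` and `count` query parameters of the Scopus Search API. A minimal URL-building sketch (the parameter names come from the public Search API; the helper name is mine):

```python
def search_page_url(base, query, start=0, count=25):
    """Build a Scopus Search URL for one page of results."""
    return f"{base}query={query}&start={start}&count={count}"

base = "https://api.elsevier.com/content/search/scopus?"
url = search_page_url(base, "title-abs-key(big data)", start=25)
# each page would then be fetched with requests.get(url, headers=headers)
```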

I can easily get some of the main fields from the Standard view:

title = article['dc:title']
cit_count = article['citedby-count']
authors = article['dc:creator']
date = article['prism:coverDate']
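Not every entry carries every field (`dc:creator`, for instance, can be absent), so `dict.get` with a default avoids a `KeyError`. A small sketch on a sample entry (the values are illustrative, not real API output):

```python
# a sample entry shaped like a Scopus search result (illustrative values)
article = {'dc:title': 'Big data survey', 'citedby-count': '12'}

title = article.get('dc:title', '')
cit_count = int(article.get('citedby-count', 0))
authors = article.get('dc:creator', 'unknown')   # missing in this sample
date = article.get('prism:coverDate', '')
```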

However, I also need the keywords and citations of this article. I solved the keywords problem with an additional request to the article's URL:

article_url = article['prism:url']
# something like this:
# 'http://api.elsevier.com/content/abstract/scopus_id/84909993848'

with field=authkeywords

article_request = requests.get(article_url + "?field=authkeywords", headers=headers)
article_keywords = json.loads(article_request.content.decode("utf-8"))
keywords = [keyword['$'] for keyword in article_keywords['abstracts-retrieval-response']['authkeywords']['author-keyword']]

This method works, but keywords are sometimes missing. Also, a Scopus API key is limited to 10,000 requests per week, so this approach is not optimal.
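One thing worth guarding against when keywords are missing: the `authkeywords` element is then absent from the response, and (as with other Elsevier responses) a single keyword may come back as a dict rather than a list. A defensive parser, assuming only those two response shapes (verify against your own output):

```python
def extract_keywords(abstract_response):
    """Pull author keywords out of an abstracts-retrieval-response dict,
    returning [] when the article has none."""
    container = abstract_response.get('abstracts-retrieval-response', {})
    authkw = container.get('authkeywords')
    if not authkw:
        return []
    kw = authkw['author-keyword']
    if isinstance(kw, dict):  # a single keyword is not wrapped in a list
        kw = [kw]
    return [k['$'] for k in kw]

# sample responses (illustrative)
with_kw = {'abstracts-retrieval-response':
           {'authkeywords': {'author-keyword': [{'$': 'big data'}, {'$': 'IoT'}]}}}
no_kw = {'abstracts-retrieval-response': {}}
```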

Can I make it easier?

Next, the question about citations. To find the articles citing this one, I send one more request, using the article['eid'] field:

citations_response = requests.get(api_resource + 'query=refeid(' + str(article['eid']) + ')', headers=headers)
citations_result = json.loads(citations_response.content.decode("utf-8"))
citations = citations_result['search-results']['entry']  # list of citations
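Concatenating the query by hand leaves the parentheses and EID unencoded; letting `urlencode` (or the `params=` argument of `requests.get`) build the query string is safer. A sketch with a hypothetical EID:

```python
from urllib.parse import urlencode

def refeid_query(eid):
    """Build the URL-encoded query string for articles citing the given EID."""
    return urlencode({'query': f'refeid({eid})'})

qs = refeid_query('2-s2.0-84909993848')  # hypothetical EID
# the request would then be:
# requests.get("https://api.elsevier.com/content/search/scopus?" + qs, headers=headers)
```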

So, can I get citations without an additional request?


Solution


You can get references in a single query only with the COMPLETE view (subscribers only).
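If your key is entitled to the COMPLETE view, a sketch of what that looks like: add view=COMPLETE to the search request, and each entry then carries an authkeywords field as a pipe-separated string (an assumption about the response shape — check it against your own output), so no per-article request is needed:

```python
def complete_view_url(base, query):
    """Search URL requesting the COMPLETE view (subscriber entitlement required)."""
    return f"{base}query={query}&view=COMPLETE"

def split_authkeywords(entry):
    """COMPLETE-view entries carry keywords as 'kw1 | kw2 | ...'."""
    raw = entry.get('authkeywords', '')
    return [k.strip() for k in raw.split('|')] if raw else []

# illustrative entry
entry = {'authkeywords': 'big data | machine learning'}
```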





Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/609625/4032c562aa5af29d8111/

Source: python黑洞网

Please credit the source in any reproduction; infringement will be pursued legally.
