Batch-downloading SCI papers with Python: still scraping Tieba for images? Try batch-downloading SCI papers by title or DOI with Sci-Hub, a research download tool

Last night I was downloading SCI papers, 295 of them in total. Wouldn't downloading them by hand be exhausting?

So I wondered whether there was a way to batch-download SCI papers.

Export a txt file of the papers' titles, DOIs, and other fields from Web of Science, then filter out the DOIs and titles and save them to a new file.

Loop over the DOI/title pairs, downloading each paper and saving it with the title as the file name (a sketch implementing this loop appears after the usage examples below).

The program is based on the following repository:

https://github.com/zaytoun/scihub.py

Setup
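The tool is used from a local clone of the repository (as far as I know it is not published as a pip package), so clone it first:

git clone https://github.com/zaytoun/scihub.py.git
cd scihub.py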

pip install -r requirements.txt

Usage

You can interact with scihub.py from the command line:

usage: scihub.py [-h] [-d (DOI|PMID|URL)] [-f path] [-s query] [-sd query]
                 [-l N] [-o path] [-v]

SciHub - To remove all barriers in the way of science.

optional arguments:
  -h, --help            show this help message and exit
  -d (DOI|PMID|URL), --download (DOI|PMID|URL)
                        tries to find and download the paper
  -f path, --file path  pass file with list of identifiers and download each
  -s query, --search query
                        search Google Scholar
  -sd query, --search_download query
                        search Google Scholar and download if possible
  -l N, --limit N       the number of search results to limit to
  -o path, --output path
                        directory to store papers
  -v, --verbose         increase output verbosity
  -p, --proxy           set proxy
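For example, to download every identifier listed in a file (one DOI per line; dois.txt is just a placeholder name) into a papers directory:

python scihub.py -f dois.txt -o papers -v

or to fetch a single paper by DOI (10.1000/xyz123 is a placeholder, not a real paper):

python scihub.py -d 10.1000/xyz123 -o papers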

You can also import scihub. The following examples demonstrate all the features.

fetch

from scihub import SciHub

sh = SciHub()

# fetch specific article (don't download to disk)
# this will return a dictionary in the form 
# {'pdf': PDF_DATA,
#  'url': SOURCE_URL,
#  'name': UNIQUE_GENERATED_NAME
# }
result = sh.fetch('http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1648853')
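Since fetch returns the raw PDF bytes rather than writing a file, you persist them yourself. A minimal continuation using the dictionary shape documented in the comments above (it assumes the fetch succeeded, and the generated name may not carry a .pdf extension):

# save the fetched bytes under the generated name
if result and 'pdf' in result:
    with open(result['name'], 'wb') as f:
        f.write(result['pdf'])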

download

from scihub import SciHub

sh = SciHub()

# exactly the same thing as fetch except downloads the articles to disk
# if no path given, a unique name will be used as the file name
result = sh.download('http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1648853', path='paper.pdf')

search

from scihub import SciHub

sh = SciHub()

# retrieve 5 articles on Google Scholar related to 'bittorrent'
results = sh.search('bittorrent', 5)

# download the papers; will use sci-hub.io if it must
for paper in results['papers']:
    sh.download(paper['url'])
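Putting this together with the Web of Science export described at the start, here is a minimal sketch of the batch loop. It assumes a tab-separated file wos.txt with one "DOI<TAB>title" pair per line; the file name, its layout, and the safe_filename helper are my own assumptions, not part of scihub.py:

import re

from scihub import SciHub

sh = SciHub()

def safe_filename(title):
    # replace characters that are illegal in file names on most systems
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()

with open('wos.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip('\n').split('\t')
        if len(parts) < 2:
            continue  # skip malformed lines
        doi, title = parts[0], parts[1]
        # download by DOI and save under the paper's title
        sh.download(doi, path=safe_filename(title) + '.pdf')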

But Sci-Hub has a captcha problem. How can the captcha problem be solved?

The mirror http://sci-hub.tw/ serves captchas, which cause the scraping to fail. Solving captcha recognition will be the key!!
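One stopgap, short of actually recognizing the captcha, is to detect the failure and retry after a pause. A rough sketch, assuming fetch returns a dict containing an 'err' key when a download fails (an assumption about scihub.py's error reporting):

import time

from scihub import SciHub

sh = SciHub()

def fetch_with_retry(identifier, attempts=3, wait=60):
    # retry a few times with a pause in between, in case the captcha
    # is triggered by rate limiting; returns None if every attempt fails
    for _ in range(attempts):
        result = sh.fetch(identifier)
        if result and 'err' not in result:  # assumed error convention
            return result
        time.sleep(wait)
    return None

result = fetch_with_retry('10.1000/xyz123')  # placeholder DOI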

I'll give it another try when I have time!


Reposted from blog.csdn.net/qq_26004387/article/details/83927986