MATLAB | How to use MATLAB to download all the figures from the top journal "Nature" (with all images from the past 3 years attached)

Last time I showed how to use MATLAB to download all the figures from the journal "Cell". Readers immediately asked whether "Nature", "Science", "PNAS" and so on would follow. In this issue, I cover downloading all the figures from "Nature". Other journals will follow gradually, but not all at once (after all, I can't keep posting about a single topic).

Downloading "Nature" figures is extremely slow without a proxy or VPN connection routed through the United States, so it is recommended to go straight to the end of the article and get the image archives. Each year's archive is about 2 GB, and more than 10,000 images have been collected for the past three years.

The scraping code is included here as well. It is very simple to use. For example, to download the figures from 2022, run getNaturePNGWhileTure(2022) in the Command Window:

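function getNaturePNGWhileTure(YEAR)
% Download the figures of all Nature research articles published in YEAR.
% Images are saved to .\image_Year_<YEAR>\ and progress is checkpointed there.
if nargin < 1
    YEAR = 2023;
end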

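% Start position: listing page (p), article on that page (i), figure (j)
pbegin = 1; ibegin = 1; jbegin = 1; 
forderName=['Year_',num2str(YEAR)];
% If an earlier run left a checkpoint, resume from the saved position
if exist(['.\image_',forderName,'\pijbreak.mat'],'file')
    load(['.\image_',forderName,'\pijbreak.mat']);
end
if ~exist(['.\image_',forderName],'dir')
    mkdir(['.\image_',forderName]);
end
disp([pbegin,ibegin,jbegin])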

url_full = 'https://www.nature.com/nature/research-articles?searchType=journalSearch&sort=PubDate&year=<Y/>&page=<P/>';
url_year = strrep(url_full,'<Y/>',num2str(YEAR));

options=weboptions('Timeout',inf);
html_year  = webread(strrep(url_year,'<P/>','1'),options);fprintf('1->')
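% Find the number of listing pages: take the text between the last
% 'u-visually-hidden' marker and the "next" button, then keep only digits
A_page_num = strfind(html_year,'u-visually-hidden'); 
Z_page_num = strfind(html_year,'data-page="next"');
page_num   = html_year(A_page_num(find(A_page_num<Z_page_num,1,'last')):Z_page_num);
page_num   = page_num(32:36);
page_num   = str2double(page_num(abs(page_num)<=57&abs(page_num)>=48)); % ASCII 48-57 are '0'-'9'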

for p = pbegin:page_num
    url_page  = strrep(url_year,'<P/>',num2str(p));
    html_page = webread(url_page,options);fprintf('2\n')
    A_html_artical = strfind(html_page,'itemprop="name headline"');
    Z_html_artical = strfind(html_page,'data-track-action="view article"');

    for i = ibegin:length(Z_html_artical)
        html_artical = html_page(A_html_artical(find(A_html_artical<Z_html_artical(i),1,'last')):Z_html_artical(i));
        A_artical    = strfind(html_artical,'<a href=');
        Z_artical    = strfind(html_artical,'class="c-card__link u-link-inherit"');
        html_artical = html_artical(A_artical(1)+10:Z_artical);
        html_artical = html_artical(1:find(html_artical=='"',1)-1); % cut at the first closing quote

        for j = jbegin:50   % probe at most 50 figure pages per article
            % Save the current position so an interrupted run can resume here
            pbegin = p; ibegin = i ; jbegin = j;
            save(['.\image_',forderName,'\pijbreak.mat'],'pbegin','ibegin','jbegin')
            html_png=webread(['https://www.nature.com/',html_artical,'/figures/',num2str(j)],options);
            A_png        = strfind(html_png,'aria-describedby');
            Z_png        = strfind(html_png,'alt="Fig.');
            
            if isempty(Z_png)
                break;   % no figure on this page: the article has no more figures
            else
                % Extract the image link that sits between the two markers
                url_png  = html_png(A_png(1):Z_png(find(Z_png>A_png(1),1)));
                url_png  = ['https:',url_png(strfind(url_png,'src="')+5:end-3)];
                url_png  = strrep(url_png,'lw685','full');   % request the full-size image
                name_png = ['.\image_',forderName,'\',html_artical(10:end),' P',num2str(j)];
                websave(name_png,url_png,options);
                disp(['Downloading Year-',num2str(YEAR),...
                    ' Page-',num2str(p),' Artical-',num2str(i),...
                    ' Pic-',num2str(j),':',html_artical])
            end
        end
        jbegin = 1;
    end
    ibegin = 1;
end
end

Many articles that are not Open Access can only be read after logging in with university or institutional credentials. To let MATLAB get around this limitation, the code guesses the display page of each figure in an article's gallery and fetches the link for each figure separately. As a result, downloading is not very fast: each figure may take two or three seconds. Again, it is recommended to go straight to the network disk links at the end of the article and download the ready-made archives.
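The page-guessing idea can be shown in isolation. The following is only a minimal sketch: it builds the per-figure page addresses using the same `/figures/<n>` pattern as the function above, without making any network request. The helper name `figurePageUrl` is hypothetical and does not appear in the original code.

```matlab
% Sketch: build the per-figure page URLs that the downloader probes.
% figurePageUrl is a hypothetical helper name; the URL pattern is the one
% used in the function above. No request is sent here.
figurePageUrl = @(articlePath, j) ...
    ['https://www.nature.com/', articlePath, '/figures/', num2str(j)];

% Relative article path, in the form parsed from the listing page
articlePath = 'articles/s41586-023-06124-2';

% Candidate pages for figures 1 to 3
urls = arrayfun(@(j) figurePageUrl(articlePath, j), 1:3, ...
    'UniformOutput', false);
disp(urls{1})
```

In the real loop, each candidate page is requested in turn, and probing stops at the first page that contains no `alt="Fig.` marker.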

The code supports resuming an interrupted download: if the program is stopped partway through, you can run it again later and it will continue from where it left off.
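The resume mechanism boils down to saving three counters before every download and loading them again at startup. Below is a minimal sketch of that pattern. It uses a demo folder under `tempdir` (an assumption, so that nothing is written next to your real images) and loads the checkpoint into a struct instead of directly into the workspace.

```matlab
% Sketch of the resume logic. The file name pijbreak.mat and the variable
% names follow the function above; the demo folder is an assumption.
folder = fullfile(tempdir, 'image_Year_demo');
if ~exist(folder, 'dir')
    mkdir(folder);
end
checkpoint = fullfile(folder, 'pijbreak.mat');

% Defaults for a fresh run
pbegin = 1; ibegin = 1; jbegin = 1;

% Pretend an earlier run stopped at page 3, article 7, figure 2
saved.pbegin = 3; saved.ibegin = 7; saved.jbegin = 2;
save(checkpoint, '-struct', 'saved');

% On restart: replace the defaults with the saved position, if present
if exist(checkpoint, 'file')
    s = load(checkpoint);
    pbegin = s.pbegin; ibegin = s.ibegin; jbegin = s.jbegin;
end
disp([pbegin, ibegin, jbegin])   % the loops would restart from 3, 7, 2

% Clean up the demo files
delete(checkpoint);
rmdir(folder);
```

After the inner loops finish, the original function resets `jbegin` and `ibegin` to 1, so only the first resumed article and page start in the middle.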

Also, if you come across a figure and want to read the article it came from, the file name of every image downloaded by this code records its source. For example, suppose I am very interested in the figure below from a backgammon paper:

The file name shows that the article number is s41586-023-06124-2, so you can enter the article link directly in the browser:

  • https://www.nature.com/articles/s41586-023-06124-2
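The same lookup can be scripted. This is a small sketch that assumes the file name has the form `<article number> P<figure number>` as produced by the function above; the `.png` extension in the example name is an assumption.

```matlab
% Sketch: recover the article link from a downloaded image's file name.
fileName = 's41586-023-06124-2 P3.png';   % example name; extension assumed

% Token 1: article number (no spaces); token 2: figure number after ' P'
tok = regexp(fileName, '^(\S+) P(\d+)', 'tokens', 'once');
articleId = tok{1};
figureNo  = str2double(tok{2});

articleUrl = ['https://www.nature.com/articles/', articleId];
disp(articleUrl)
```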

The original paper can be found easily:


Sample images

2023


2022


2021


Readers who have followed me for a long time may recognize this figure:

That's right, this is the original figure behind the second article in my figure-reproduction series. My reproduction looks like this:

Aren't they exactly the same? If you want to take a look, you can search for it on my homepage:

The figure-reproduction series will continue. If you come across good-looking, interesting figures among the images provided in this article, you can send me the figure number. If I have time, I will consider writing it up!


Getting the images

Baidu Netdisk

Baidu Netdisk links for the images from the past three years are provided below: 2,232 images from 2023, 4,509 from 2022, and 3,727 from 2021, more than 10,000 in total:

2023 (1.34 GB, 2,232 images)

Link:
https://pan.baidu.com/s/1pE-34BXmX7HPYAHiQ5SZ-w?pwd=slan
Extraction code: slan

2022 (2.78 GB, 4,509 images)

Link:
https://pan.baidu.com/s/1fIbM92eK5Q3rgS3x8h6QjA?pwd=slan
Extraction code: slan

2021 (1.85 GB, 3,727 images)

Link:
https://pan.baidu.com/s/1bQCj7vBgTCrNx4WprmCjbg?pwd=slan
Extraction code: slan

Gitee repository

If the network disk links stop working, you can get the latest links from the Gitee repository:

https://gitee.com/slandarer/nature-figures


Origin blog.csdn.net/slandarer/article/details/131276399