Download ERA5 data in batches (Python+IDM)

This article describes how to batch download ERA5 data through Python scripts and Internet Download Manager (IDM) software.

1. Introduction to ERA5 data

  • ERA5 is the fifth-generation ECMWF atmospheric reanalysis of the global climate. The first part of the dataset is publicly available, covering 1979 to within about 3 months of the present. ERA5 provides hourly estimates of atmospheric, land, and oceanic climate variables on a grid of roughly 30 km, with the atmosphere resolved on 137 levels.
  • ERA5 data is located on the Climate Data Store (CDS) website at the following URL: https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset&text=ERA5
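  • To get a feel for how much data an hourly global field amounts to, the arithmetic can be sketched as below. The grid size (0.25 degrees, 1440 x 721 points) and the 2 bytes per stored value are assumptions made for illustration, not figures from this article, so treat the result as a rough order of magnitude only.

```python
# Rough size estimate for one month of one hourly single-level variable.
# Assumptions (not from the article): a 0.25-degree regular grid
# (1440 x 721 points) and 2 bytes per stored value.


def values_per_month(n_lon=1440, n_lat=721, days=31, hours_per_day=24):
    """Number of grid values in one month of hourly global fields."""
    return n_lon * n_lat * days * hours_per_day


def size_gib(n_values, bytes_per_value=2):
    """Approximate uncompressed size in GiB."""
    return n_values * bytes_per_value / 1024 ** 3


n = values_per_month()
print(n, "values, roughly", round(size_gib(n), 2), "GiB")
```

  • Under these assumptions a single month of a single variable is already on the order of a gigabyte, which is why download speed matters for multi-decade batches.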

2. Preparations

  • (1) Register an account on the CDS website.

  • (2) Get the API key
  • After the registration is complete, log in and click the user in the upper right corner to view user information:

  • Make a note of the UID and the API key; you will need them later.

  • (3) Configure and install CDS API

  • Create a file named ".cdsapirc" in "C:\Users\username" (on Windows, create ".cdsapirc.txt" first and then remove the ".txt" extension), and put the following content in it:

url: https://cds.climate.copernicus.eu/api/v2
key: UID:API Key
  • Open a command prompt (cmd) and run the following command to install the third-party cdsapi library:
pip install cdsapi
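  • As an alternative to creating the file by hand, the ".cdsapirc" file can also be written from Python. The following is a minimal sketch, assuming the two-line format shown above; the function names "render_cdsapirc" and "write_cdsapirc" are hypothetical helpers, not part of cdsapi.

```python
from pathlib import Path


def render_cdsapirc(uid, api_key, url="https://cds.climate.copernicus.eu/api/v2"):
    """Return the two-line config text in the form shown in this article."""
    return f"url: {url}\nkey: {uid}:{api_key}\n"


def write_cdsapirc(uid, api_key, folder=None):
    """Write .cdsapirc into `folder` (default: the user's home directory)."""
    target = Path(folder) if folder is not None else Path.home()
    path = target / ".cdsapirc"
    path.write_text(render_cdsapirc(uid, api_key), encoding="utf-8")
    return path


# Example call (placeholder credentials, replace with your own):
# write_cdsapirc("12345", "your-api-key")
```

  • Writing the file this way avoids the Windows Explorer problem with file names that start with a dot.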
  • (4) Install and configure IDM software
  • IDM stands for "Internet Download Manager", a download manager that speeds up transfers by opening several connections per file.
  • Installation package link: https://pan.baidu.com/s/1iojjYOg_Y2NdMcmJahz_pw , extraction code: dimq, version v6.36 Build 7, resources from Carrot Week.
  • After downloading the installation package, open the folder, double-click "idman636build7.exe" to start installing IDM, keep clicking forward, and install to the default location.
  • Copy the crack patch to the IDM installation directory (the default location is "C:\Program Files (x86)\Internet Download Manager"), double-click to run the crack patch, click "Crack IDM", and click "Finish" to close the patch after cracking.
  • Note: Do not update IDM, otherwise it may make the software unusable.
  • To configure IDM, open the software, find "Options" in the main interface, and open it.

  • In the options window, open the "Connection" page, set "Connection Type/Speed" to "Higher Speed Connection: LAN/Wi-Fi/Mobile Network 4G/Others", set "Default Maximum Number of Connections" to 16, and click "OK" to complete the configuration.

  • At this point, the preparatory work for downloading is complete.
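  • Before starting a long batch, it can help to confirm that the preparations are really in place. The sketch below checks for the ".cdsapirc" file and the IDM executable. The function name "missing_prerequisites" is hypothetical, and the IDM path is the default install location mentioned above, so adjust it if IDM was installed elsewhere.

```python
from pathlib import Path

# Default IDM location on Windows (the default install directory named above).
IDM_EXE = r"C:\Program Files (x86)\Internet Download Manager\IDMan.exe"


def missing_prerequisites(home=None, idm_exe=IDM_EXE):
    """Return a list of human-readable problems; an empty list means ready."""
    home = Path(home) if home is not None else Path.home()
    problems = []
    config = home / ".cdsapirc"
    if not config.is_file():
        problems.append(f"missing config file: {config}")
    else:
        text = config.read_text(encoding="utf-8")
        # Collect the part before the first colon of each line, e.g. "url", "key".
        keys = {line.split(":", 1)[0].strip()
                for line in text.splitlines() if ":" in line}
        for required in ("url", "key"):
            if required not in keys:
                problems.append(f"{config} has no '{required}' entry")
    if not Path(idm_exe).is_file():
        problems.append(f"IDM executable not found: {idm_exe}")
    return problems


# Example: print whatever is still missing on this machine.
for problem in missing_prerequisites():
    print(problem)
```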

3. Batch download

  • Download individual data:
  • Select the dataset you need. Taking the "ERA5 hourly data on single levels from 1979 to present" dataset as an example, open its page, select "Download data", and choose the variables, dates, and times you need.

  • After making the selection, scroll to the bottom of the page and click the "Show API request" option. The following code appears:
import cdsapi  # import the cdsapi library

c = cdsapi.Client()  # create the client

# Download the data
c.retrieve(
    'reanalysis-era5-single-levels',  # dataset name
    {
        'product_type': 'reanalysis',  # product type
        'format': 'netcdf',  # data format
        'variable': '2m_temperature',  # variable name
        'year': '1979',  # year
        'month': '01',  # month
        'day': [  # days
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'time': [  # hours
            '00:00', '01:00', '02:00',
            '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00',
            '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00',
            '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00',
            '21:00', '22:00', '23:00',
        ],
    },
    'download.nc')  # output file name
  • Copy the code above into Python and run it to download the global 2 m temperature reanalysis data for January 1979.
  • However, this approach transfers the file through Python, which is slow, and it handles only one request per run.
  • Download data in batches:
  • For example, the following code downloads the global 2 m temperature reanalysis data for every month from 1979 to 2020 from the "ERA5 hourly data on single levels from 1979 to present" dataset and saves each month as an nc file.
import cdsapi
import calendar

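c = cdsapi.Client()  # create the client

# Request dictionary
dic = {
    'product_type': 'reanalysis',  # product type
    'format': 'netcdf',  # data format
    'variable': '2m_temperature',  # variable name
    'year': '',  # year, filled in inside the loop
    'month': '',  # month, filled in inside the loop
    'day': [],  # days, filled in inside the loop
    'time': [  # hours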
        '00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
        '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
        '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
        '18:00', '19:00', '20:00', '21:00', '22:00', '23:00'
    ]
}

# Loop over every month from 1979 to 2020
for y in range(1979, 2021):  # iterate over years
    for m in range(1, 13):  # iterate over months
        day_num = calendar.monthrange(y, m)[1]  # number of days in this month
        # Update year, month and days in the request dictionary
        dic['year'] = str(y)
        dic['month'] = str(m).zfill(2)
        dic['day'] = [str(d).zfill(2) for d in range(1, day_num + 1)]
        # Output file path; the parentheses let the expression span two lines
        filename = ('E:\\Data\\ERA5\\1979-2020\\2m_temperature\\'
                    + str(y) + str(m).zfill(2) + '.nc')
        c.retrieve('reanalysis-era5-single-levels', dic, filename)  # download the data
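  • The loop above fills in one shared dictionary on every pass. A variation is to build a fresh request for each month, which makes the day-count logic easy to check on its own without contacting the server. This is only a sketch; "monthly_requests" is a hypothetical helper name, and the template keys follow the request shown above.

```python
import calendar
import copy


def monthly_requests(template, first_year, last_year):
    """Yield (year, month, request) for every month in the inclusive range."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # monthrange returns (weekday of first day, number of days in month)
            n_days = calendar.monthrange(year, month)[1]
            request = copy.deepcopy(template)  # leave the template untouched
            request["year"] = str(year)
            request["month"] = f"{month:02d}"
            request["day"] = [f"{d:02d}" for d in range(1, n_days + 1)]
            yield year, month, request


# Example: count the days requested for each month of 1980 (a leap year).
template = {"variable": "2m_temperature", "year": "", "month": "", "day": []}
for year, month, request in monthly_requests(template, 1980, 1980):
    print(year, month, len(request["day"]))
```

  • Each yielded request can then be passed to c.retrieve in place of the shared dictionary.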
  • Although the code above downloads in batches, the transfer still goes through Python and is slow. To speed it up, hand the transfer to IDM. IDM needs the download address of each file, which can be obtained as follows:
r = c.retrieve('reanalysis-era5-single-levels', dic)  # submit the request without a target file
url = r.location  # download address of the result
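  • Since the address is read from an attribute of the returned object, a small wrapper can fail with a clear message if the attribute is missing, and keep a record of each address. This is a sketch under assumptions: "location" is the attribute used in this article, and whether it is present may depend on the installed cdsapi version; "download_url" and "append_to_manifest" are hypothetical helper names.

```python
def download_url(result):
    """Return the download address of a finished request, or raise clearly."""
    url = getattr(result, "location", None)  # attribute name as used in this article
    if not url:
        raise RuntimeError("request result has no download location")
    return url


def append_to_manifest(manifest_path, url, file_name):
    """Append one 'file name <TAB> url' line to a plain-text record."""
    with open(manifest_path, "a", encoding="utf-8") as fh:
        fh.write(f"{file_name}\t{url}\n")
```

  • Keeping such a record makes it easier to see which files were requested if a later download fails.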
  • Then pass the download address to IDM for a faster download:
from subprocess import call

def idmDownloader(task_url, folder_path, file_name):
    """
    Add one download task to IDM.
    :param task_url: URL of the download task
    :param folder_path: destination folder
    :param file_name: file name
    :return:
    """
    # Path to the IDM executable
    idm_engine = "C:\\Program Files (x86)\\Internet Download Manager\\IDMan.exe"
    # Add the task to the queue
    call([idm_engine, '/d', task_url, '/p', folder_path, '/f', file_name, '/a'])
    # Start the queue
    call([idm_engine, '/s'])
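  • The function above starts the queue after every added task. A variation is to add all tasks first and start the queue once at the end. The sketch below uses only the IDM switches that appear in this article (/d, /p, /f, /a, /s); "build_idm_command" and "queue_downloads" are hypothetical names, and the "runner" parameter exists only so the logic can be tried without IDM installed.

```python
import subprocess


def build_idm_command(idm_exe, task_url, folder_path, file_name):
    """Return the argument list that adds one download to the IDM queue."""
    if not task_url.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP(S) URL: {task_url!r}")
    return [idm_exe, "/d", task_url, "/p", folder_path, "/f", file_name, "/a"]


def queue_downloads(tasks, idm_exe, runner=subprocess.call):
    """Add every (url, folder, file_name) task, then start the queue once."""
    count = 0
    for url, folder, name in tasks:
        runner(build_idm_command(idm_exe, url, folder, name))
        count += 1
    if count:
        runner([idm_exe, "/s"])  # start the queue only if something was added
    return count


# Dry run: print the commands instead of executing them.
queue_downloads(
    [("https://example.org/197901.nc", "E:\\Data", "197901.nc")],
    "IDMan.exe",
    runner=print,
)
```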
  • Batch download complete code:
import cdsapi
import calendar
from subprocess import call


def idmDownloader(task_url, folder_path, file_name):
    """
    Add one download task to IDM.
    :param task_url: URL of the download task
    :param folder_path: destination folder
    :param file_name: file name
    :return:
    """
    # Path to the IDM executable
    idm_engine = "C:\\Program Files (x86)\\Internet Download Manager\\IDMan.exe"
    # Add the task to the queue
    call([idm_engine, '/d', task_url, '/p', folder_path, '/f', file_name, '/a'])
    # Start the queue
    call([idm_engine, '/s'])


if __name__ == '__main__':
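    c = cdsapi.Client()  # create the client

    # Request dictionary
    dic = {
        'product_type': 'reanalysis',  # product type
        'format': 'netcdf',  # data format
        'variable': '2m_temperature',  # variable name
        'year': '',  # year, filled in inside the loop
        'month': '',  # month, filled in inside the loop
        'day': [],  # days, filled in inside the loop
        'time': [  # hours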
            '00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00', '21:00', '22:00', '23:00'
        ]
    }

    # Loop over every month from 1979 to 2020
    for y in range(1979, 2021):  # iterate over years
        for m in range(1, 13):  # iterate over months
            day_num = calendar.monthrange(y, m)[1]  # number of days in this month
            # Update year, month and days in the request dictionary
            dic['year'] = str(y)
            dic['month'] = str(m).zfill(2)
            dic['day'] = [str(d).zfill(2) for d in range(1, day_num + 1)]

            r = c.retrieve('reanalysis-era5-single-levels', dic)  # submit the request without a target file
            url = r.location  # download address of the result
            path = 'E:\\Data\\ERA5\\1979-2020\\2m_temperature'  # destination folder
            filename = str(y) + str(m).zfill(2) + '.nc'  # file name
            idmDownloader(url, path, filename)  # hand the download to IDM
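  • A batch covering 42 years is likely to be interrupted at some point. One way to resume is to list only the months whose file is not yet in the destination folder and request just those. The sketch below assumes the "YYYYMM.nc" naming used in the code above; "pending_months" is a hypothetical helper, and treating an empty file as missing is a design choice rather than something this article prescribes.

```python
from pathlib import Path


def pending_months(folder, first_year, last_year):
    """Return (year, month, file_name) for months with no non-empty file yet."""
    folder = Path(folder)
    pending = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            name = f"{year}{month:02d}.nc"
            target = folder / name
            # A zero-byte file is treated as a failed download.
            if not (target.is_file() and target.stat().st_size > 0):
                pending.append((year, month, name))
    return pending


# Example: how many months are still missing in the current directory.
print(len(pending_months(".", 1979, 2020)), "months still to download")
```

  • Note that a partially downloaded file would still count as present here; checking the expected size would need information this sketch does not have.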

4. Finally

  • This article is intended for study and reference only. If you find any mistakes or omissions, corrections are welcome!

Origin blog.csdn.net/qq_39373443/article/details/118086241