GFS forecast data download

Fast downloading of GRIB files: the http transfer part

Translated from https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html

Introduction

News (1/2019): nomads.ncep.noaa.gov changed its URLs from http: to https:. The fast-download technique works with both http and https URLs. The change is usually very simple: edit your scripts to use https instead of http. As long as the URLs are changed from http: to https:, scripts modified to use grib_filter will keep working too. If you are using an older version, you may need a newer cURL. By the way, I updated the documentation on this page promptly, but the changes were subtle, so I decided to write this note in red.
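For example, updating an existing download script is usually a one-line substitution. A minimal sketch using GNU sed (the script name is illustrative):

  sed -i 's|http://nomads.ncep.noaa.gov|https://nomads.ncep.noaa.gov|g' my_download_script.sh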

 

NOMADS: the NOAA Operational Model Archive and Distribution System

 

 

If you are lucky, it is very simple 

Some data sets can already be downloaded with pre-written scripts. See Part 2.

Details

The http protocol allows "random access" reads; this means we need an index file and an http program that supports random access. For the index file, we can adapt the wgrib inventory. For the random-access http program, we can use cURL. Both are freely available, widely used, run on many platforms, and can easily be scripted / automated / put in a cron job.

The basic format of a fast download is

get_inv.pl INV_URL | grep (options) FIELDS | get_grib.pl GRIB_URL OUTPUT

INV_URL is the URL of the wgrib inventory,
   e.g. https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv

grep (options) FIELDS selects the fields to download (compatible with wgrib)
   e.g. grep -F ":HGT:500 mb:" selects ":HGT:500 mb:"
   e.g. grep -E ":(HGT|TMP):500 mb:" selects ":HGT:500 mb:" and ":TMP:500 mb:"

GRIB_URL is the URL of the grib file,
   e.g. https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12

OUTPUT is the name of the file into which the downloaded grib fields are written

"Get_inv.pl INV_URL" downloaded from the Internet wgrib list and add a range variable field.

"Grep FIELDS" using the grep command to select the variable field you want from the list. Use "grep FIELDS" similar to using a variable field wgrib extraction process.

"Get_grib.pl GRIB_URL OUTPUT" using the screening to choose whether to download from GRIB_URL variables in the field. The selected variables stored in the field in OUTPUT.

 

Example

 

get_inv.pl https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv | \ 
grep ":HGT:500 mb:" | \ 
get_grib.pl https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12 out.grb

 

The above example could be written on one line, without the backslashes. (The trailing backslash is a unix convention that indicates the command continues on the next line.) The example downloads the 12-hour (f12) forecast of the 500 mb height field from the 00Z (t00z) GFS cycle on the NCEP NOMAD server.

 

 

get_inv.pl https://nomad2.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv | \ 
egrep "(:HGT:500 mb:|:TMP:1000 mb:)" | \ 
get_grib.pl https://nomad2.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12 out.grb

 

This example is similar to the previous one, except that it downloads the 500 mb height and the 1000 mb temperature.

Warning: metacharacters

In the beginning, you could filter the inventory with strings like

  egrep ":(UGRD|VGRD|TMP|HGT):(1000|500|200) mb:"
  egrep "(:UGRD:200 mb:|:TMP:2 m above ground:)"

First egrep was deprecated and "grep -E" became the replacement. No big deal. Then someone decided to put egrep metacharacters into the official level names. Imagine trying to do

grep -E "(: UGRD: 200 mb: |: HGT: PV = 2e-06 (Km ^ 2 / kg / s) surface :)" 

You can see the problem: the level name of the HGT field contains "(" and ")". To remove the special meaning of "(" and ")", they should be quoted as \( and \). The caret "^" also has a special meaning and should be quoted as well. The modified command line is
  grep -E "(: UGRD: 200 mb: |: HGT: PV = 2e-06 \ (Km \ ^ 2 / kg / s \) surface :)" 
You should quote all regular-expression metacharacters with a backslash, including
\ ^ $ . | ? * + ( ) [ ] { }
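Another way to sidestep the quoting problem, in the spirit of the grep -F example earlier, is fixed-string matching, where every pattern is taken literally. A sketch (the URL is taken from the examples above and may be out of date):

URL="https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12"
get_inv.pl $URL.inv | \
grep -F -e ":UGRD:200 mb:" -e ":HGT:PV=2e-06 (Km^2/kg/s) surface:" | \
get_grib.pl $URL out.grb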

Sample script

The following is an example that downloads one year of R2 (Reanalysis-2) data.
#!/bin/sh
# simple script to download 4x daily V winds at 10mb
# from the R2 archive

set -x
date=197901
enddate=197912
while [ $date -le $enddate ]
do
     url="https://nomad3.ncep.noaa.gov/pub/reanalysis-2/6hr/pgb/pgb.$date"
     get_inv.pl "${url}.inv" | grep ":VGRD:" | grep ":10 mb" | \
     get_grib.pl "${url}" pgb.$date
     date=$(($date + 1))
     if [ $(($date % 100)) -eq 13 ] ; then
         date=$(($date - 12 + 100));
     fi
done

 

Requirements

  1. perl
  2. grep
  3. cURL
  4. grib files and their wgrib inventory on an http server
  5. get_inv.pl
  6. get_grib.pl

Configuration (UNIX / LINUX)

You need to modify the first two lines of get_inv.pl and get_grib.pl. The first line should point to your perl interpreter. The second line needs to point to the location of curl, if it is not in your path.
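A quick way to find the values for those two lines (a sketch; the paths will differ from system to system):

which perl       # put this path on the first (#!) line of each script
which curl       # point the curl setting on the second line here if curl is not in $PATH
chmod +x get_inv.pl get_grib.pl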

 

HTTPS servers

To access https servers, you need to update get_inv.pl and get_grib.pl to the current version (4/2017). Some sites have certificates that are invalid / dodgy / self-signed, and curl will not download from those sites unless you put it into an insecure mode. (There must be some government policy at work, because many of the sites with certificate problems are NOAA sites.) If you are willing to take the risk of downloading from these sites, you can run curl in insecure mode.

 

In get_inv.pl
  change this line: open (In, "$curl -f -s $file |");
         to read:   open (In, "$curl -k -f -s $file |");

In get_grib.pl
  change this line: $err = system("$curl -f -v -s -r \"$range\" $url -o $file.tmp");
         to read:   $err = system("$curl -k -f -v -s -r \"$range\" $url -o $file.tmp");

Usage: Windows
There are some reports that the perl scripts do not work on Windows machines. Alexander Ryan solved this problem.
Hi Wesley,

thought this might be of some use to your win32 users.

I had the following problem when running the get_grib.pl file as per your instructions.

run this
grep ":UGRD:" < my_inv | get_grib.pl $URL ugrd.grb
and I would get the error No download! No matching grib fields. on further 
investigation I found that it was just skipping the while STDIN part of the 
code. a few google searches later and I found that for some strange reason in 
the pipe I needed to specify the path or command for perl even though the file 
associations for .pl are set up. (don't fiqure)

this works for me

grep ":UGRD:" < my_inv | PERL get_grib.pl $URL ugrd.grb

Regards and thanks for the fine service
Alexander Ryan

Another message from Alexander

Hi Wesley,
Further to my last email here are some details regarding the enviorment I run this all on for your referance. 

My computer is P4 1.7GHz with 1Gb Ram running Windows 2000 service pack 4
Perl version :V5.6.1 provided by  https://www.activestate.com
cUrl Version: 7.15.4 from  https://curl.haxx.se/
grep & egrep: win32 versions of grep and egrep, I found both at https://unxutils.sourceforge.net who provide some useful ports of common GNU utilities to native Win32. (no cygwin required) 

so far this is working fine

Regards Alexander

 

Obviously,
    get_inv.pl INV_URL | grep FIELDS | perl get_grib.pl URL OUTPUT 
 
should work. Windows users may prefer to use the cygwin system, because it includes bash, X servers, compilers, and the usual unix tools.

Tips

If you want to download multiple fields, such as precipitation and the 2-meter temperature, you can enter:
URL="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.2006070312/gfs.t12z.pgrb2f00"
get_inv.pl $URL.idx | egrep ':(PRATE|TMP:2 m above gnd):' | get_grib.pl $URL out

 

The above code puts the precipitation and the 2-meter temperature into one file. Of course, egrep understands regular expressions, which is a very powerful feature.

If you are making multiple downloads from the same file, you can save time by keeping a local copy of the inventory. For example,

URL="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.2006070312/gfs.t12z.pgrb2f00"
get_inv.pl $URL.idx > my_inv
grep ":UGRD:" < my_inv | get_grib.pl $URL ugrd.grb
grep ":VGRD:" < my_inv | get_grib.pl $URL vgrd.grb
grep ":TMP:" < my_inv | get_grib.pl $URL tmp.grb

The above code saves two extra inventory downloads.

 

Notes for data providers

The grib data needs to be accessible on an http server. This usually requires only a small change to the httpd configuration.

Users need a wgrib inventory (grib-1) or a wgrib2 inventory (grib-2). It is convenient if the inventories are in the same directory as the data files and follow the '.inv' suffix convention. You can create an inventory with:

GRIB-1:  wgrib -s grib_file > grib_file.inv

GRIB-2:  wgrib2 -s grib_file > grib_file.inv
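For a whole directory of grib files, a small shell sketch such as the following can build the inventories (the file globs are illustrative; adjust them to your naming convention):

# grib-1 files
for f in *.grb;  do wgrib  -s "$f" > "$f.inv"; done
# grib-2 files
for f in *.grb2; do wgrib2 -s "$f" > "$f.inv"; done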

 

GRIB-2

Since the summer of 2006, Grib-2 has been supported.

 

Notes

In theory, curl allows random access reads from an FTP server, but in practice we found it very slow (each random read is its own FTP session). We therefore want the data to be provided over the faster http protocol.

Regional subsetting

As the grids become finer, the demand for regional subsetting grows. With grib2 it is possible to extract a regional subset, but doing it on the client side would require some tricky coding. I am now glad to have the g2subset software running on the NOMADS servers. Even with the jpeg2000 decompression overhead, that server software is still faster than the old grib1 software (ftp2u / ftp4u).

 
Created: 1/21/2005
Last modified date: 6/2017
Comments: [email protected]

 

Fast downloading of GRIB files, Part 2

Translated from https://www.cpc.ncep.noaa.gov/products/wesley/get_gfs.html

News

January 2, 2019: nomads.ncep.noaa.gov changed its URLs from http:// to https://. A version of get_gfs.pl with the new URLs was released on December 31, 2014. If you encounter problems, you may need to obtain an updated version of cURL.

Wrappers @ NCDC

Although the procedure detailed in Part 1 is straightforward, it could be easier. I do not like finding and typing in URLs. Writing loops takes time. Less experienced users would prefer something simpler. Dan Swank wrote a nice interface, get-httpsubset.pl, for downloading the North American Regional Reanalysis (NARR), and it works very well. In May 2006, 95% of NCDC-NOMADS downloads were done using cURL.

Wrappers @ NCEP (NOMADS): get_gfs.pl

At NCEP, we want people to (1) use partial-http transfers instead of ftp2u to get forecast fields, and (2) move from the NOMADS server to the more reliable NCO server. So get_gfs.pl was born. I wanted the script to be easy to use, easy to reconfigure, and easy to install and use under Windows.

Requirements

  1. get_gfs.pl.

  2. perl

  3. cURL

Configuration

  1. Download a cURL executable and place it in a directory on your $PATH.
  2. The first line of get_gfs.pl should point to the location of your local perl interpreter.
  3. Non-Windows users can set the $windows flag in get_gfs.pl to "thankfully no" to improve efficiency.

Usage is simple:

get_gfs.pl data DATE HR0 HR1 DHR VARS LEVS DIRECTORY 


Note: Some Windows setups will need to enter:
      perl get_gfs.pl data DATE HR0 HR1 DHR VARS LEVS DIRECTORY

DATE = start time of the forecast, YYYYMMDDHH. Note: HH should be 00, 06, 12, or 18

HR0 = first forecast hour wanted

HR1 = last forecast hour wanted

DHR = increment between forecast hours (6, 12, or 24 hours between forecasts)

VARS = list of variables, or "all"
    e.g. HGT:TMP:OZONE
    e.g. all

LEVS = list of levels, with spaces replaced by underscores, or "all"
    e.g. 500_mb:200_mb:surface
    e.g. all

DIRECTORY = directory in which to place the output

example: perl get_gfs.pl data 2006101800 0 12 6 UGRD:VGRD 200_mb .

example: perl get_gfs.pl data 2006101800 0 12 6 UGRD:VGRD 200_mb:500_mb:1000_mb .

example: perl get_gfs.pl data 2006101800 0 12 12 all surface .

Regular-expression metacharacters:  . ( ) ^ * [ ] $ +

The get_gfs.pl script uses perl regular expressions (regex) for string matching. Consequently, regular-expression metacharacters should be quoted when they appear as part of the search string. For example, try to find the following level

       "entire atmosphere (considered as a single_layer)"

       "entire_atmosphere_(considered_as_a_single_layer)"

Because the parentheses are metacharacters, this does not work as typed. The following techniques will work.

Reference "(" and ")" characters

 get_gfs.pl data 2012053000 0 6 3 TCDC "entire atmosphere \(considered as a single layer\)" .
 get_gfs.pl data 2012053000 0 6 3 TCDC entire_atmosphere_\\\(considered_as_a_single_layer\\\) .

Using a period (which matches any character) to match the "(" and ")" characters

 get_gfs.pl data 2012053000 0 6 3 TCDC "entire atmosphere .considered as a single layer." .
 get_gfs.pl data 2012053000 0 6 3 TCDC entire_atmosphere_.considered_as_a_single_layer. .

How get_gfs.pl works 

get_gfs.pl is based on the get_inv.pl and get_grib.pl scripts. The advantage of get_gfs.pl is that it builds the URLs and loops over the forecast hours for you.

Pseudocode for: get_gfs.pl data DATE HR0 HR1 DHR VARS LEVS DIRECTORY

 

# convert LEVS and VARS into REGEX
  if (VARS == "all") {
    VARS=".";
  }
  else {
    VARS = substitute(VARS,':','|')
    VARS = substitute(VARS,'_',' ')
    VARS = ":(VARS):";
  }

  if (LEVS == "all") {
    LEVS=".";
  }
    LEVS = substitute(LEVS,':','|')
    LEVS = substitute(LEVS,'_',' ')
    LEVS = ":(LEVS)";
  }

# loop over all forecast hours

  for fhour = HR0, HR1, DHR
     URL= URL_name(DATE,fhour)
     URLinv= URL_name(DATE,fhour).idx

     inventory_array[] = get_inv(URLinv);
     for i = inventory_array[0] .. inventory_array[last]
        if (regex_match(LEVS,inventory_array[i]) and regex_match(VARS,inventory_array[i])) {
           add_to_curl_fetch_request(inventory_array[i]);
        }
     endfor
     curl_request(URL,curl_fetch_request,DIRECTORY);
  endfor
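For readers who prefer the Part 1 tools, here is a minimal shell sketch of the same loop. The URL pattern follows the Tips example in Part 1 and is only illustrative; check the server for the current directory layout and file names.

#!/bin/sh
# rough shell equivalent of the get_gfs.pl loop (illustrative URL pattern)
date=2006101800                 # DATE (YYYYMMDDHH)
hh=${date#????????}             # forecast cycle hour, e.g. 00
hr0=0; hr1=12; dhr=6            # HR0, HR1, DHR
vars=":(UGRD|VGRD):"            # VARS already converted to a regex
levs=":(200 mb)"                # LEVS already converted to a regex

fhour=$hr0
while [ $fhour -le $hr1 ]
do
    fhr=$(printf "%02d" $fhour)
    url="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.${date}/gfs.t${hh}z.pgrb2f${fhr}"
    get_inv.pl "${url}.idx" | grep -E "$vars" | grep -E "$levs" | \
    get_grib.pl "${url}" gfs.t${hh}z.pgrb2f${fhr}
    fhour=$(($fhour + $dhr))
done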

Advanced Users

One user asked whether he could mix variables and levels, for example TMP at 500 mb and HGT at 250 and 700 mb. Of course, you could run get_gfs.pl twice, but that is not efficient.

This is possible because get_gfs.pl uses regular expressions, and regular expressions are very powerful. What you need to remember is that, for the VARS/LEVS parameters, get_gfs.pl converts colons to vertical bars and underscores to spaces, respectively.

Unix/Linux:

       get_gfs.pl data 2006111500 0 12 12 all 'TMP.500 mb|HGT.(200 mb|700 mb)'  data_dir

Windows:

       get_gfs.pl data 2006111500 0 12 12 all "TMP.500 mb|HGT.(200 mb|700 mb)"  C:\unix\

Other GRIB data sets

One goal of get_gfs.pl is to provide a simple script for downloading grib data using the partial-http download protocol. The code was written so that it is easy to adapt to other grib + inv data sets.

Wrappers @ NCEP (NCO): get_data.sh

NCO (NCEP Central Operations) also has an interface, get_data.sh.

 

Created: 10/2006,

Updated: May 2012

Comments: [email protected]
