Fast Downloading of GRIB Files, HTTP Transfer Part
Translated from https://www.cpc.ncep.noaa.gov/products/wesley/fast_downloading_grib.html
Introduction
News: 1/2019: nomads.ncep.noaa.gov changed its URLs from http: to https:. The fast-download technique works with both http and https URLs, and the change is usually very simple: edit the scripts to use https: instead of http:. Once the URLs are changed from http: to https:, the grib_filter scripts work again. If you are using an older version, you may need a newer cURL. By the way, I updated the documents on this page at the time, but the changes are so subtle that I decided to add this note in red text.
NOMADS: the NOAA Operational Model Archive and Distribution System
If you are lucky, it is very simple
Some data sets can be downloaded with already-written scripts. See Part 2.
Details
The http protocol allows "random access" reads; however, this requires an index file and an http program that supports random access. For the index file, we can use a modified wgrib inventory. For the random-access http program, we can use cURL. Both are freely available, widely used, run on many platforms, and can easily be scripted/automated/put in a cronjob.
The basic format of a fast download is:
get_inv.pl INV_URL | grep (options) FIELDS | get_grib.pl GRIB_URL OUTPUT
INV_URL is the URL of the wgrib inventory
   e.g., https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv
grep (options) FIELDS selects the fields to download (wgrib-compatible)
   e.g., grep -F ":HGT:500 mb:" selects ":HGT:500 mb:"
   e.g., grep -E ":(HGT|TMP):500 mb:" selects ":HGT:500 mb:" and ":TMP:500 mb:"
GRIB_URL is the URL of the grib file
   e.g., https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12
OUTPUT is the name of the downloaded grib file
"get_inv.pl INV_URL" downloads the wgrib inventory from the net and adds a byte-range field.
"grep FIELDS" uses the grep command to select the desired fields from the inventory. Using "grep FIELDS" is similar to using wgrib to extract fields.
"get_grib.pl GRIB_URL OUTPUT" uses the filtered inventory to select the fields to download from GRIB_URL. The selected fields are saved in OUTPUT.
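To see what each stage does without touching the network, here is a small offline sketch. The inventory lines below are made up for illustration, in the wgrib "-s" style that get_inv.pl augments; real inventories come from the .inv URL:

```shell
# Made-up inventory in the wgrib "-s" style; the second field is the
# record's starting byte offset that get_inv.pl turns into ranges.
cat > sample.inv <<'EOF'
1:0:d=06070312:UGRD:200 mb:12hr fcst:
2:50000:d=06070312:HGT:500 mb:12hr fcst:
3:120000:d=06070312:TMP:1000 mb:12hr fcst:
EOF

# The middle stage of the pipeline: grep selects the wanted fields.
grep ":HGT:500 mb:" sample.inv > selected.inv
cat selected.inv

# In the real pipeline, get_grib.pl would convert each selected record's
# byte offsets into HTTP Range requests against GRIB_URL.
```

The same grep patterns shown throughout this page drop into the middle stage unchanged.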
Examples
get_inv.pl https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv | \
  grep ":HGT:500 mb:" | \
  get_grib.pl https://nomad3.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12 out.grb
The above example could be written on one line without the backslashes. (The backslash is a unix convention indicating that the line is continued on the next line.) The example downloads the 500 mb height field of the 00Z (t00z) GFS 12-hour forecast (f12) from an NCEP NOMADS server.
get_inv.pl https://nomad2.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12.inv | \
  egrep "(:HGT:500 mb:|:TMP:1000 mb:)" | \
  get_grib.pl https://nomad2.ncep.noaa.gov/pub/gfs/rotating/gblav.t00z.pgrbf12 out.grb
This example is similar to the previous one, except that it downloads both the 500 mb height and the 1000 mb temperature.
Warning: Metacharacters
In the beginning, you could filter the inventory with strings like
egrep ":(UGRD|VGRD|TMP|HGT):(1000|500|200) mb:"
egrep "(:UGRD:200 mb:|:TMP:2 m above ground:)"
First, egrep was deprecated in favor of "grep -E". No big deal. Then someone decided to put metacharacters into the official level information. Imagine trying to do
grep -E "(:UGRD:200 mb:|:HGT:PV=2e-06 (Km^2/kg/s) surface:)"
You can see the problem: the HGT level field contains "(" and ")". To strip "(" and ")" of their special meaning, they must be quoted as \( and \). The caret "^" also has a special meaning and must be quoted too. The corrected command line is
grep -E "(:UGRD:200 mb:|:HGT:PV=2e-06 \(Km\^2/kg/s\) surface:)"
You should quote all regular expression metacharacters with a backslash, including
\,^,$,.,|,?,*,+,(,),[,],{,}
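The escaping rules above can be checked offline against a sample inventory line. This is a sketch: the record number and byte offset are made up, and only the level text is taken from the example in the text:

```shell
# Sample inventory line containing the problematic level text
# (record number and byte offset are made up for this demo).
echo '293:123456:d=06070312:HGT:PV=2e-06 (Km^2/kg/s) surface:12hr fcst:' > inv.txt

# Unescaped, "(" ")" act as a group and "^" as an anchor under grep -E,
# so the obvious pattern fails to match the line:
grep -E ':HGT:PV=2e-06 (Km^2/kg/s) surface:' inv.txt

# Backslash-escaping the metacharacters makes them literal, so this matches:
grep -E ':HGT:PV=2e-06 \(Km\^2/kg/s\) surface:' inv.txt

# Alternatively, grep -F treats the whole pattern as a fixed string:
grep -F ':HGT:PV=2e-06 (Km^2/kg/s) surface:' inv.txt
```

grep -F is the simplest choice when the field text contains metacharacters and you do not need alternation.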
Sample script
The following is an example that downloads a year of R2 data.
#!/bin/sh
# simple script to download 4x daily V winds at 10mb
# from the R2 archive
set -x
date=197901
enddate=197912
while [ $date -le $enddate ]
do
   url="https://nomad3.ncep.noaa.gov/pub/reanalysis-2/6hr/pgb/pgb.$date"
   get_inv.pl "${url}.inv" | grep ":VGRD:" | grep ":10 mb" | \
      get_grib.pl "${url}" pgb.$date
   date=$(($date + 1))
   if [ $(($date % 100)) -eq 13 ] ; then date=$(($date - 12 + 100)); fi
done
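The YYYYMM arithmetic in the loop above can be checked on its own; this standalone snippet shows month 13 rolling over into January of the next year:

```shell
# Standalone check of the script's month counter: increment YYYYMM and
# roll month 13 over to month 1 of the following year.
date=197911
enddate=198002
months=""
while [ $date -le $enddate ]
do
   months="$months $date"
   date=$(($date + 1))
   if [ $(($date % 100)) -eq 13 ] ; then date=$(($date - 12 + 100)); fi
done
echo "$months"   # 197911 197912 198001 198002
```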
Requirements
- perl
- grep
- cURL
- grib files and their wgrib inventories on an http server
- get_inv.pl
- get_grib.pl
Configuration (UNIX / LINUX)
You need to modify the first two lines of get_inv.pl and get_grib.pl. The first line should point to your perl interpreter. The second line needs to point to the location of curl, if it is not in your path.
HTTPS servers
To access https servers, you need to update get_inv.pl and get_grib.pl to the current versions (4/2017). Some sites have certificates that are invalid/dodgy/self-signed, and curl will not download from these sites unless you tell it to run in insecure mode. (There must be some government policy at work, because many NOAA sites have certificate issues.) If you are willing to take the risk, you can run curl in insecure mode when downloading from these sites.
In get_inv.pl, change the line
   open (In, "$curl -f -s $file |");
to read
   open (In, "$curl -k -f -s $file |");
In get_grib.pl, change the line
   $err = system("$curl -f -v -s -r \"$range\" $url -o $file.tmp");
to read
   $err = system("$curl -k -f -v -s -r \"$range\" $url -o $file.tmp");
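If your copies of the scripts contain exactly the lines quoted above, the edit can be applied with sed. This is a sketch, shown on a stand-in file so the substitution itself can be verified offline; back up your scripts and check the actual line text in your versions first:

```shell
# Stand-in file holding the get_inv.pl line quoted in the text.
printf 'open (In, "$curl -f -s $file |");\n' > line.txt

# Insert curl's -k (insecure) flag; on the real script this would be
# sed -i 's/$curl -f -s/$curl -k -f -s/' get_inv.pl
sed 's/$curl -f -s/$curl -k -f -s/' line.txt > patched.txt
cat patched.txt
```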
Usage: Windows
There are some reports that the perl scripts do not work on Windows machines. Alexander Ryan solved this problem.
Hi Wesley,
thought this might be of some use to your win32 users.
I had the following problem when running the get_grib.pl file as per your instructions.
run this
grep ":UGRD:" < my_inv | get_grib.pl $URL ugrd.grb
and I would get the error "No download! No matching grib fields." On further
investigation I found that it was just skipping the while STDIN part of the
code. A few google searches later and I found that for some strange reason in
the pipe I needed to specify the path or command for perl even though the file
associations for .pl are set up. (don't figure)
this works for me
grep ":UGRD:" < my_inv | PERL get_grib.pl $URL ugrd.grb
Regards and thanks for the fine service
Alexander Ryan
Another message from Alexander
Hi Wesley,
Further to my last email, here are some details regarding the environment I run this all on, for your reference.
My computer is P4 1.7GHz with 1Gb Ram running Windows 2000 service pack 4
Perl version :V5.6.1 provided by https://www.activestate.com
cUrl Version: 7.15.4 from https://curl.haxx.se/
grep & egrep: win32 versions of grep and egrep, I found both at https://unxutils.sourceforge.net who provide some useful ports of common GNU utilities to native Win32. (no cygwin required)
so far this is working fine
Regards Alexander
Obviously,
get_inv.pl INV_URL | grep FIELDS | perl get_grib.pl URL OUTPUT
should work. Windows users may prefer to use the cygwin system, because it includes bash, X servers, compilers, and the usual unix tools.
Tips
If you want to download multiple fields, such as precipitation and 2-meter temperature, you can enter:
URL="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.2006070312/gfs.t12z.pgrb2f00"
get_inv.pl $URL.idx | egrep ':(PRATE|TMP:2 m above gnd):' | get_grib.pl $URL out
The above puts the precipitation and 2-meter temperature into the file "out". Of course, egrep understands regular expressions, which is a very powerful feature.
If you are making multiple downloads from the same file, you can save time by keeping a local copy of the inventory. For example,
URL="https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.2006070312/gfs.t12z.pgrb2f00"
get_inv.pl $URL.idx > my_inv
grep ":UGRD:" < my_inv | get_grib.pl $URL ugrd.grb
grep ":VGRD:" < my_inv | get_grib.pl $URL vgrd.grb
grep ":TMP:" < my_inv | get_grib.pl $URL tmp.grb
The above code saves two downloads of the inventory.
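The saved-inventory pattern can be exercised offline; these made-up lines stand in for the my_inv file that get_inv.pl would fetch once:

```shell
# Made-up stand-in for the my_inv file fetched once by get_inv.pl.
cat > my_inv <<'EOF'
1:0:d=06070312:UGRD:200 mb:anl:
2:40000:d=06070312:VGRD:200 mb:anl:
3:80000:d=06070312:TMP:500 mb:anl:
EOF

# Each grep reuses the same local inventory; only the grib bytes would
# need to come from the network (via get_grib.pl).
grep ":UGRD:" < my_inv > ugrd.sel
grep ":VGRD:" < my_inv > vgrd.sel
grep ":TMP:"  < my_inv > tmp.sel
wc -l ugrd.sel vgrd.sel tmp.sel
```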
Notes for data providers
You need to make the grib data available on an http server. This is usually a minor change to the httpd configuration.
Users need a wgrib inventory (grib-1) or a wgrib2 inventory (grib-2). It is convenient if the inventory is in the same directory as the data file and follows the '.inv' suffix convention. You can create an inventory by:
GRIB-1: wgrib -s grib_file > grib_file.inv
GRIB-2: wgrib2 -s grib_file > grib_file.inv
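The second colon-separated field of a "-s" inventory is the record's starting byte offset, and consecutive offsets define each record's byte range. A sketch with made-up offsets of how the pairing works:

```shell
# Made-up "-s" style inventory; field 2 is the record's start offset.
cat > demo.inv <<'EOF'
1:0:d=06070312:HGT:500 mb:anl:
2:50000:d=06070312:TMP:500 mb:anl:
3:120000:d=06070312:UGRD:200 mb:anl:
EOF

# Pair consecutive start offsets into byte ranges, as get_inv.pl does.
# (The last record's range runs to end-of-file and is requested open-ended.)
awk -F: 'NR > 1 { printf "record %s: range=%d-%d\n", prev_rec, prev_off, $2 - 1 }
         { prev_rec = $1; prev_off = $2 }' demo.inv > ranges.txt
cat ranges.txt
```

Each printed range maps directly onto an HTTP Range request (curl's -r option).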
GRIB-2
Grib-2 has been supported since the summer of 2006.
Notes
In theory, curl allows random access to FTP servers, but in practice we found it very slow (each random access is its own FTP session). Since the faster http protocol can be used to serve the data, FTP access is not supported.
Regional subsetting
As grids become finer, the demand for regional subsetting grows. With grib2, subsetting a region is possible, but it would require some tricky coding on the client side. For now, I am happy with the g2subset software running on the NOMADS servers. Even with the jpeg2000 decompression overhead, that server software is faster than the grib1 software (ftp2u/ftp4u).
Created: 1/21/2005
Last modified: 6/2017
Comments: [email protected]
Fast Downloading of GRIB Files, Part 2
Translated from https://www.cpc.ncep.noaa.gov/products/wesley/get_gfs.html
News
January 2, 2019: nomads.ncep.noaa.gov URLs changed from http:// to https://. December 31, 2014: released a version of get_gfs.pl with the new URLs. If you encounter problems, you may need an updated version of cURL.
Wrappers @ NCDC
Although the procedure detailed in Part 1 is straightforward, it could be easier. People do not like finding and typing URLs, and writing loops takes time. Inexperienced users prefer a wrapper. Dan Swank wrote a good interface, get-httpsubset.pl, for downloading the North American Regional Reanalysis (NARR), and it works very well. As of May 2006, 95% of NCDC-NOMADS downloads were done using cURL.
Wrappers @ NCEP (NOMADS): get_gfs.pl
At NCEP, we wanted people (1) to get their forecast fields by partial-http transfers instead of ftp2u, and (2) to move from the nomads servers to the more reliable NCO servers. So get_gfs.pl was born. I wanted the script to be easy to use, easy to reconfigure, and easy to install and use under Windows.
Requirements
- perl
- cURL
Configuration
- cURL: download an executable and place it in a directory in your $PATH.
- The first line of get_gfs.pl should point to the location of the local perl interpreter.
- Non-Windows users can set the $windows flag in get_gfs.pl to "thankfully no" to improve efficiency.
Usage is simple:
get_gfs.pl data DATE HR0 HR1 DHR VARS LEVS DIRECTORY
Note: some Windows setups will need to enter:
perl get_gfs.pl data DATE HR0 HR1 DHR VARS LEVS DIRECTORY
DATE = YYYYMMDDHH of the start of the forecast. Note: HH should be 00, 06, 12 or 18
HR0 = first forecast hour wanted
HR1 = last forecast hour wanted
DHR = forecast hour increment (6, 12, or 24 hours between forecasts)
VARS = list of variables or "all"
   e.g., HGT:TMP:OZONE
   e.g., all
LEVS = list of levels, with spaces replaced by underscores, or "all"
   e.g., 500_mb:200_mb:surface
   e.g., all
DIRECTORY = directory in which to put the output
example: perl get_gfs.pl data 2006101800 0 12 6 UGRD:VGRD 200_mb .
example: perl get_gfs.pl data 2006101800 0 12 6 UGRD:VGRD 200_mb:500_mb:1000_mb .
example: perl get_gfs.pl data 2006101800 0 12 12 all surface .
Regular expression metacharacters: . ( ) ^ * [ ] $ +
The get_gfs.pl script uses perl regular expressions (regex) for string matching. Consequently, regex metacharacters should be quoted when they are part of the search string. For example, suppose you are trying to find the level
"entire atmosphere (considered as a single_layer)"
"entire_atmosphere_(considered_as_a_single_layer)"
Since parentheses are metacharacters, this does not work. The following techniques will work.
Quoting the "(" and ")" characters:
get_gfs.pl data 2012053000 0 6 3 TCDC "entire atmosphere \(considered as a single layer\)" .
get_gfs.pl data 2012053000 0 6 3 TCDC entire_atmosphere_\\\(considered_as_a_single_layer\\\) .
Using a period (which matches any character) to match the "(" and ")" characters:
get_gfs.pl data 2012053000 0 6 3 TCDC "entire atmosphere .considered as a single layer." .
get_gfs.pl data 2012053000 0 6 3 TCDC entire_atmosphere_.considered_as_a_single_layer. .
How get_gfs.pl works
get_gfs.pl is based on the get_inv.pl and get_grib.pl scripts. Its advantage is that it builds the URLs and loops over the forecast times for you.
In meta-language, get_gfs.pl data DATE HR0 HR1 DHR VARS LEVS DIRECTORY works like this:
# convert LEVS and VARS into REGEX
if (VARS == "all") {
   VARS = "."
} else {
   VARS = substitute(VARS, ':', '|')
   VARS = substitute(VARS, '_', ' ')
   VARS = ":(VARS):"
}
if (LEVS == "all") {
   LEVS = "."
} else {
   LEVS = substitute(LEVS, ':', '|')
   LEVS = substitute(LEVS, '_', ' ')
   LEVS = ":(LEVS)"
}
# loop over all forecast hours
for fhour = HR0, HR1, DHR
   URL = URL_name(DATE, fhour)
   URLinv = URL_name(DATE, fhour) + ".idx"
   inventory_array[] = get_inv(URLinv)
   for i = inventory_array[0] .. inventory_array[last]
      if (regex_match(LEVS, inventory_array[i]) and regex_match(VARS, inventory_array[i])) {
         add_to_curl_fetch_request(inventory_array[i])
      }
   endfor
   curl_request(URL, curl_fetch_request, DIRECTORY)
endfor
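The list-to-regex conversion at the top of the pseudocode can be sketched in shell. This is my own rendering for illustration, not the actual get_gfs.pl code (which does this in perl):

```shell
# Sketch of the VARS/LEVS-to-regex conversion described in the pseudocode
# (illustrative only; get_gfs.pl itself does this in perl).
VARS="HGT:TMP:OZONE"
LEVS="500_mb:200_mb:surface"

if [ "$VARS" = "all" ] ; then
   VARS_RE="."
else
   # colons become alternation bars, underscores become spaces
   VARS_RE=":($(echo "$VARS" | sed -e 's/:/|/g' -e 's/_/ /g')):"
fi

if [ "$LEVS" = "all" ] ; then
   LEVS_RE="."
else
   LEVS_RE=":($(echo "$LEVS" | sed -e 's/:/|/g' -e 's/_/ /g'))"
fi

echo "$VARS_RE"
echo "$LEVS_RE"

# Both regexes must match for an inventory line to be requested:
echo '5:0:d=06101800:HGT:500 mb:anl:' | grep -E "$VARS_RE" | grep -E "$LEVS_RE"
```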
Advanced Users
One user asked whether he could mix variables and levels; for example, TMP at 500 mb together with HGT at 250 and 700 mb. Of course, you could run get_gfs.pl twice, but that is not efficient.
Mixing is possible because get_gfs.pl uses regular expressions, and regular expressions are very powerful. What you need to remember is that get_gfs.pl converts colons to vertical bars and underscores to spaces in the VARS/LEVS parameters.
Unix/Linux: get_gfs.pl data 2006111500 0 12 12 all 'TMP.500 mb|HGT.(200 mb|700 mb)' data_dir
Windows: get_gfs.pl data 2006111500 0 12 12 all "TMP.500 mb|HGT.(200 mb|700 mb)" C:\unix\
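The mixed pattern can be checked offline against a made-up inventory: it keeps TMP at 500 mb and HGT at 200/700 mb, and drops everything else:

```shell
# Made-up inventory to test the mixed variable/level regex offline.
cat > mix.inv <<'EOF'
1:0:d=06111500:TMP:500 mb:anl:
2:100:d=06111500:HGT:200 mb:anl:
3:200:d=06111500:HGT:500 mb:anl:
4:300:d=06111500:TMP:700 mb:anl:
5:400:d=06111500:HGT:700 mb:anl:
EOF

# The "." stands where get_gfs.pl would have put a ":" after the variable.
grep -E 'TMP.500 mb|HGT.(200 mb|700 mb)' mix.inv > mix.sel
cat mix.sel   # keeps records 1, 2 and 5
```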
Other GRIB data sets
One goal of get_gfs.pl was to provide a simple script for downloading data via the partial-http download protocol. The code was written to be easy to adapt to other grib + inv data sets.
Wrappers @ NCEP (NCO): get_data.sh
NCO (NCEP Central Operations) also has an interface, get_data.sh.
Created: 10/2006,
Updated: May 2012
Comments: [email protected]