java.time.format.DateTimeFormatter.RFC_1123_DATE_TIME fails to parse time zone names

Jules :

I'm trying to parse timestamps from a data source that is defined as using RFC1123-compatible date time specifications. My code is:

value = Instant.from (DateTimeFormatter.RFC_1123_DATE_TIME.parse (textValue));

This works fine for some data, but I get exceptions for strings that contain zone names, even ones that are defined in RFC2822 (which is indirectly referenced from RFC1123 because it obsoletes RFC822). Examples:

java.time.format.DateTimeParseException: Text 'Sun, 20 Aug 2017 00:30:00 UT' could not be parsed at index 26
java.time.format.DateTimeParseException: Text 'Mon, 21 Aug 2017 15:00:00 EST' could not be parsed at index 26

How do I persuade DateTimeFormatter to accept this type of date?

user7605325 :

As noted in @shmosel's comment, the javadoc says that RFC_1123_DATE_TIME "does not handle North American or military zone names, only 'GMT' and offset amounts".
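The limitation is easy to confirm. Here is a minimal sketch (the class and method names are only illustrative) that tries the built-in formatter against a GMT suffix, a numeric offset, and a zone name:

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class Rfc1123BuiltInCheck {
    // Returns the parsed Instant, or null if the built-in formatter rejects the text.
    static Instant tryParse(String text) {
        try {
            return Instant.from(DateTimeFormatter.RFC_1123_DATE_TIME.parse(text));
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("Sun, 20 Aug 2017 00:30:00 GMT"));   // accepted
        System.out.println(tryParse("Mon, 21 Aug 2017 15:00:00 -0500")); // accepted
        System.out.println(tryParse("Mon, 21 Aug 2017 15:00:00 EST"));   // rejected, prints null
    }
}
```

Only the last input fails, which matches the javadoc.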

To make it recognize short time zone names like UT and EST, you have to build a custom formatter with a structure similar to RFC_1123_DATE_TIME, adding the short zone ID at the end.

This format uses English names for the month and day of week, so one alternative is to use an English locale. The JDK source code, however, uses custom maps with fixed values so that the result doesn't depend on the locale (the comment there says locale data can be changed by application code). So we first recreate these maps:

// custom map for days of week
Map<Long, String> dow = new HashMap<>();
dow.put(1L, "Mon");
dow.put(2L, "Tue");
dow.put(3L, "Wed");
dow.put(4L, "Thu");
dow.put(5L, "Fri");
dow.put(6L, "Sat");
dow.put(7L, "Sun");
// custom map for months
Map<Long, String> moy = new HashMap<>();
moy.put(1L, "Jan");
moy.put(2L, "Feb");
moy.put(3L, "Mar");
moy.put(4L, "Apr");
moy.put(5L, "May");
moy.put(6L, "Jun");
moy.put(7L, "Jul");
moy.put(8L, "Aug");
moy.put(9L, "Sep");
moy.put(10L, "Oct");
moy.put(11L, "Nov");
moy.put(12L, "Dec");
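To see why the fixed maps matter, here is a small sketch (class and method names are made up for the example) that formats a Monday with a map-backed field under a non-English locale. The map text is used as-is, whereas a pattern like EEE would follow the locale:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class FixedTextMapDemo {
    // Builds a day-of-week formatter backed by a fixed map, bound to the given locale.
    static DateTimeFormatter fixedDayOfWeek(Locale locale) {
        Map<Long, String> dow = new HashMap<>();
        dow.put(1L, "Mon");
        dow.put(2L, "Tue");
        dow.put(3L, "Wed");
        dow.put(4L, "Thu");
        dow.put(5L, "Fri");
        dow.put(6L, "Sat");
        dow.put(7L, "Sun");
        return new DateTimeFormatterBuilder()
            .appendText(ChronoField.DAY_OF_WEEK, dow)
            .toFormatter(locale);
    }

    public static void main(String[] args) {
        LocalDate monday = LocalDate.of(2017, 8, 21);
        // The map wins regardless of locale: prints "Mon"
        System.out.println(fixedDayOfWeek(Locale.FRANCE).format(monday));
        // The pattern letter follows the locale, so the text is locale-dependent
        System.out.println(DateTimeFormatter.ofPattern("EEE", Locale.FRANCE).format(monday));
    }
}
```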

Then I recreate the same structure as RFC_1123_DATE_TIME, but add the zone ID at the end:

// requires: import static java.time.temporal.ChronoField.*;
// create with same format as RFC_1123_DATE_TIME
DateTimeFormatter fmt = new DateTimeFormatterBuilder()
    .parseCaseInsensitive()
    .parseLenient()
    .optionalStart()
    .appendText(DAY_OF_WEEK, dow)
    .appendLiteral(", ")
    .optionalEnd()
    .appendValue(DAY_OF_MONTH, 1, 2, SignStyle.NOT_NEGATIVE)
    .appendLiteral(' ')
    .appendText(MONTH_OF_YEAR, moy)
    .appendLiteral(' ')
    .appendValue(YEAR, 4)  // 2 digit year not handled
    .appendLiteral(' ')
    .appendValue(HOUR_OF_DAY, 2)
    .appendLiteral(':')
    .appendValue(MINUTE_OF_HOUR, 2)
    .optionalStart()
    .appendLiteral(':')
    .appendValue(SECOND_OF_MINUTE, 2)
    .optionalEnd()
    .appendLiteral(' ')
    // difference from RFC_1123_DATE_TIME: optional offset OR zone ID
    .optionalStart()
    .appendZoneText(TextStyle.SHORT)
    .optionalEnd()
    .optionalStart()
    .appendOffset("+HHMM", "GMT")
    .optionalEnd()
    // use the same resolver style and chronology
    .toFormatter().withResolverStyle(ResolverStyle.SMART).withChronology(IsoChronology.INSTANCE);

The difference here is the .appendZoneText(TextStyle.SHORT) call, placed in its own optional section because the input can have either an offset/GMT or a short zone ID.

You'll also notice that in the source code it uses:

.toFormatter(ResolverStyle.SMART, IsoChronology.INSTANCE);

But this overloaded version of toFormatter is not public, so I had to adapt it by calling withResolverStyle and withChronology on the result to set the same values.
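A quick sketch of that adaptation in isolation (the uuuu-MM-dd pattern and the class name are just placeholders for the example). Formatters are immutable, so each with method returns a new instance carrying the adjusted setting:

```java
import java.time.chrono.IsoChronology;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.ResolverStyle;

class WithMethodsDemo {
    // Uses the public no-arg toFormatter() and then applies the settings via with methods.
    static DateTimeFormatter build() {
        return new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd")
            .toFormatter()
            .withResolverStyle(ResolverStyle.SMART)
            .withChronology(IsoChronology.INSTANCE);
    }

    public static void main(String[] args) {
        DateTimeFormatter f = build();
        System.out.println(f.getResolverStyle());
        System.out.println(f.getChronology());
    }
}
```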

With this formatter, I can parse the inputs:

System.out.println(Instant.from(fmt.parse("Mon, 21 Aug 2017 15:00:00 EST")));
System.out.println(Instant.from(fmt.parse("Sun, 20 Aug 2017 00:30:00 UT")));

The output is:

2017-08-21T19:00:00Z
2017-08-20T00:30:00Z


PS: short names like EST are ambiguous and not standard. The ideal is to always use IANA time zone names (always in the format Region/City, like America/New_York or Europe/London).

EST is ambiguous because more than one time zone uses it. Some short names are not recognized, while others are mapped to arbitrary defaults for backward-compatibility reasons. EST, for example, is mapped to America/New_York, and if I parse it to a ZonedDateTime:

System.out.println(ZonedDateTime.from(fmt.parse("Mon, 21 Aug 2017 15:00:00 EST")));

The output is:

2017-08-21T15:00-04:00[America/New_York]

Maybe this doesn't apply to your case as you're parsing everything to an Instant, but if you want a ZonedDateTime, these defaults can be changed by defining a set of preferred zones:

// set of preferred zones
Set<ZoneId> preferredZones = new HashSet<>();
// add my arbitrary choices
preferredZones.add(ZoneId.of("America/Indianapolis"));

America/Indianapolis is another timezone that uses EST as a short name, so I can set it as preferred instead of the default America/New_York. I just need to set it in the formatter. Instead of this:

.appendZoneText(TextStyle.SHORT)

I call this:

.appendZoneText(TextStyle.SHORT, preferredZones)

And now my preferred arbitrary zones will be used. This same code:

System.out.println(ZonedDateTime.from(fmt.parse("Mon, 21 Aug 2017 15:00:00 EST")));

Now prints:

2017-08-21T15:00-04:00[America/Indianapolis]

Also note that the ZonedDateTime values above have an offset of -04:00. That's because in August these zones are in Daylight Saving Time (DST), so the corresponding short name is actually EDT. If you format the date using the same formatter above:

System.out.println(ZonedDateTime.now(ZoneId.of("America/New_York")).format(fmt));

The output will be:

Wed, 23 Aug 2017 08:43:52 EDT-0400

Note that the formatter uses all the optional sections to print a date (so it prints both the zone ID EDT and the offset -0400). If you want to print just one of them, you'll have to create another formatter (or just use RFC_1123_DATE_TIME).
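If the data source really follows RFC 2822, another option is to sidestep the ambiguity altogether. This is not part of the formatter above, just a sketch under the assumption that the zone is always the last space-separated token: RFC 2822 (section 4.3) assigns fixed offsets to UT and the North American names, so the name can be rewritten to a numeric offset and handed to the built-in RFC_1123_DATE_TIME. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class Rfc2822ZoneNormalizer {
    // Fixed offsets that RFC 2822 section 4.3 assigns to the obsolete zone names.
    // GMT is left out because RFC_1123_DATE_TIME already accepts it.
    private static final Map<String, String> OBS_ZONES = new HashMap<>();
    static {
        OBS_ZONES.put("UT", "+0000");
        OBS_ZONES.put("EDT", "-0400");
        OBS_ZONES.put("EST", "-0500");
        OBS_ZONES.put("CDT", "-0500");
        OBS_ZONES.put("CST", "-0600");
        OBS_ZONES.put("MDT", "-0600");
        OBS_ZONES.put("MST", "-0700");
        OBS_ZONES.put("PDT", "-0700");
        OBS_ZONES.put("PST", "-0800");
    }

    // Replaces a trailing zone name with its numeric offset; other text is returned trimmed but unchanged.
    static String normalize(String text) {
        String trimmed = text.trim();
        int lastSpace = trimmed.lastIndexOf(' ');
        if (lastSpace < 0) {
            return trimmed;
        }
        String zone = trimmed.substring(lastSpace + 1).toUpperCase(Locale.ROOT);
        String offset = OBS_ZONES.get(zone);
        return offset == null ? trimmed : trimmed.substring(0, lastSpace + 1) + offset;
    }

    static Instant parse(String text) {
        return Instant.from(DateTimeFormatter.RFC_1123_DATE_TIME.parse(normalize(text)));
    }

    public static void main(String[] args) {
        System.out.println(parse("Mon, 21 Aug 2017 15:00:00 EST"));
        System.out.println(parse("Sun, 20 Aug 2017 00:30:00 UT"));
    }
}
```

Note that this treats EST as a fixed -05:00, as the RFC defines it, so the first input resolves to 20:00Z rather than the 19:00Z produced by the America/New_York mapping shown earlier. Which of the two is right depends on what the data source actually means by EST.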


Instead of appendZoneText and appendOffset, you could also use:

.appendPattern("[z][x]")

Note the optional sections (delimited by []). This will parse a zone ID (z) or an offset (x). See the DateTimeFormatter javadoc for more details about the pattern letters.

The only difference is that using this pattern you can't use the set of preferred zones.

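When formatting, this pattern also prints both fields. Note that a single x prints only the hour when the offset has no minutes, so the output will look like EDT-04; use xx in the pattern to get EDT-0400.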
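How wide the printed offset is depends on the number of x letters in the pattern. A small sketch (the class and helper names are only for illustration) that formats a fixed offset with x and xx:

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class OffsetPatternDemo {
    // Formats only the offset part of a fixed date-time using the given pattern.
    static String formatOffset(String pattern, ZoneOffset offset) {
        OffsetDateTime dt = OffsetDateTime.of(2017, 8, 23, 8, 43, 52, 0, offset);
        return DateTimeFormatter.ofPattern(pattern).format(dt);
    }

    public static void main(String[] args) {
        System.out.println(formatOffset("x", ZoneOffset.ofHours(-4)));              // -04
        System.out.println(formatOffset("xx", ZoneOffset.ofHours(-4)));             // -0400
        System.out.println(formatOffset("x", ZoneOffset.ofHoursMinutes(5, 30)));    // +0530
    }
}
```

So if the printed form should match the +HHMM style of RFC_1123_DATE_TIME, xx is the closer fit.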
