Skip or ignore YAML parsing errors with Jackson

David DeMar :

In a Java application, I'm trying to parse a large YAML file (over 3000 lines) that is downloaded from another system (a PHP app). I have limited control over the YAML file itself. Changes to it are made manually, and the YAML parser in the other system seems to be a lot more forgiving about how the YAML is formatted.

The problem I'm running into is that when I try to parse the file with Jackson, I get an exception because a handful of lines have an invalid character at the end. This causes the entire parse attempt to fail.

Is there a way to configure or set up Jackson to simply skip over lines or YAML blocks if they are malformed or have invalid tokens?

Example YAML

example.good_yaml:
  description: "Example of good YAML"
example.bad_yaml:
  description: "Example of bad YAML")

Parsing Code

ObjectMapper mapper = new YAMLMapper();
mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
Map<String, Object> result = mapper.readValue(sourceYaml, new TypeReference<Map<String, Object>>() {});

Error

com.fasterxml.jackson.dataformat.yaml.snakeyaml.error.MarkedYAMLException: while parsing a block mapping
 in 'reader', line 4, column 3:
      description: "Example of bad YAML")
      ^
expected <block end>, but found '<scalar>'
 in 'reader', line 4, column 37:
      description: "Example of bad YAML")
                                        ^

 at [Source: (File); line: 4, column: 37]

flyx :

That would require SnakeYAML, which is used by Jackson for parsing, to support this. The options for loading don't include a setting for this, nor do I know of any API for it, so I am pretty sure that it doesn't have any such functionality.

Keep in mind that recovering from syntax errors is a rather complex endeavor (even though it seems simple for your specific use case), and I don't know of any YAML implementation that does it (since most of them are rewrites of PyYAML/libyaml).

Chances are that it's easier to sanitize your file with a well-placed sed command, assuming there are only a few recurring syntax errors that are easy to find with a regex.
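If you would rather do the cleanup inside the Java application than with sed, the same idea can be sketched with java.util.regex. This is only a rough sketch under one assumption: the only recurring defect is stray characters directly after the closing double quote of a quoted value at the end of a line, as in the example from the question. The class name YamlSanitizer and the method sanitize are made up for illustration; they are not part of Jackson or SnakeYAML.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Jackson or SnakeYAML API.
final class YamlSanitizer {

    // Matches a line of the form `key: "quoted value"` that is followed
    // directly by stray non-space, non-comment characters, e.g. `"text")`.
    // Group 1 captures everything up to and including the closing quote.
    private static final Pattern TRAILING_JUNK = Pattern.compile(
            "^([ \\t]*[^\\s#:][^:\\r\\n]*:[ \\t]*\"(?:[^\"\\\\\\r\\n]|\\\\.)*\")[^\\s#]+[ \\t]*$",
            Pattern.MULTILINE);

    private YamlSanitizer() {
    }

    // Returns the input with the stray trailing characters removed.
    // Lines that do not match the pattern are left untouched.
    static String sanitize(String yaml) {
        return TRAILING_JUNK.matcher(yaml).replaceAll("$1");
    }
}
```

The sanitized string can then be handed to the parsing code from the question, for example mapper.readValue(YamlSanitizer.sanitize(text), new TypeReference<Map<String, Object>>() {}), where text is the file content read into a String. Any other kind of syntax error will still make the parse fail, so it is worth checking the remaining exceptions and extending the pattern as needed.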
