Setting the values of several Dataset columns based on the value of one column

Aditya Singh:

I have a Dataset<Row> in Java. I need to read the value of one column, which holds a JSON string, parse it, and set the values of a few other columns based on the parsed JSON.

My dataset looks like this:

| json                    | name | age  |
=========================================
| "{'a':'john', 'b': 23}" | null | null |
-----------------------------------------
| "{'a':'joe', 'b': 25}"  | null | null |
-----------------------------------------
| "{'a':'zack'}"          | null | null |
-----------------------------------------

I need the result to look like this:

| json                    | name   | age  |
===========================================
| "{'a':'john', 'b': 23}" | 'john' | 23   |
-------------------------------------------
| "{'a':'joe', 'b': 25}"  | 'joe'  | 25   |
-------------------------------------------
| "{'a':'zack'}"          | 'zack' | null |
-------------------------------------------

I am unable to figure out a way to do it. Please help with the code.

Pavel Filatov:

Spark has a built-in function for this, get_json_object. Assuming you have a data frame named df, you can solve your problem like this:

Dataset<Row> result = df.selectExpr("json", "get_json_object(json, '$.a') as name", "cast(get_json_object(json, '$.b') as int) as age");
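Since the question uses the Java API, the same extraction can also be written with withColumn and functions.get_json_object, which overwrites the existing (null) name and age columns in place instead of rebuilding the column list. This is a minimal sketch, assuming Spark 3.x on the classpath and JSON that already uses double quotes; the class and method names (JsonColumnsExample, sample, fillFromJson) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.get_json_object;
import static org.apache.spark.sql.functions.lit;

public class JsonColumnsExample {

    // Hypothetical helper: rebuilds the question's input, i.e. a json column
    // plus empty (null) name and age columns.
    static Dataset<Row> sample(SparkSession spark) {
        List<String> json = Arrays.asList(
                "{\"a\":\"john\", \"b\": 23}",
                "{\"a\":\"joe\", \"b\": 25}",
                "{\"a\":\"zack\"}");
        return spark.createDataset(json, Encoders.STRING()).toDF("json")
                .withColumn("name", lit(null).cast("string"))
                .withColumn("age", lit(null).cast("int"));
    }

    // withColumn replaces a column that already has the same name, so the
    // null name/age values are overwritten with fields taken from the JSON.
    static Dataset<Row> fillFromJson(Dataset<Row> df) {
        return df
                .withColumn("name", get_json_object(col("json"), "$.a"))
                // get_json_object returns a string; cast age back to an integer.
                // A missing "b" field yields null, as in the 'zack' row.
                .withColumn("age", get_json_object(col("json"), "$.b").cast("int"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-columns")
                .master("local[*]") // local mode, only for trying the example
                .getOrCreate();
        fillFromJson(sample(spark)).show(false);
        spark.stop();
    }
}
```

Calling withColumn once per target column keeps the json, name, age column order of the original Dataset.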

But first and foremost, make sure the values in your json column use double quotes instead of single quotes.
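If the data cannot be fixed where it is produced, one crude way to swap the quotes inside Spark is functions.regexp_replace. This is a hedged sketch, not a general fix: it assumes no value ever contains an apostrophe (a name like O'Brien would be turned into invalid JSON). The class and method names (NormalizeQuotesExample, normalizeQuotes, sample) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.regexp_replace;

public class NormalizeQuotesExample {

    // Naive: replaces EVERY single quote with a double quote, including
    // apostrophes inside values, so only use it when values cannot contain one.
    static Dataset<Row> normalizeQuotes(Dataset<Row> df) {
        return df.withColumn("json", regexp_replace(col("json"), "'", "\""));
    }

    // Hypothetical helper: single-quoted input as shown in the question.
    static Dataset<Row> sample(SparkSession spark) {
        List<String> json = Arrays.asList("{'a':'john', 'b': 23}", "{'a':'zack'}");
        return spark.createDataset(json, Encoders.STRING()).toDF("json");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("normalize-quotes")
                .master("local[*]") // local mode, only for trying the example
                .getOrCreate();
        normalizeQuotes(sample(spark)).show(false);
        spark.stop();
    }
}
```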

Note: Spark's documentation has a full list of its built-in SQL functions. I use it heavily; consider bookmarking it and referring to it from time to time.
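Another function from that list that fits this problem is from_json, which parses each JSON string once into a struct against an explicit schema, rather than doing a separate path lookup for every extracted column. A sketch under stated assumptions: Spark 3.x, double-quoted JSON, and a schema (a as string, b as int) inferred from the sample rows; FromJsonExample, parse, and sample are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.from_json;

public class FromJsonExample {

    // Assumed schema based on the sample data; fields absent from a row become null.
    static final StructType SCHEMA = new StructType()
            .add("a", DataTypes.StringType)
            .add("b", DataTypes.IntegerType);

    static Dataset<Row> parse(Dataset<Row> df) {
        return df
                .withColumn("parsed", from_json(col("json"), SCHEMA)) // struct column
                .select(col("json"),
                        col("parsed.a").as("name"),
                        col("parsed.b").as("age")); // already an int, no cast needed
    }

    // Hypothetical helper: the question's rows with double-quoted JSON.
    static Dataset<Row> sample(SparkSession spark) {
        List<String> json = Arrays.asList(
                "{\"a\":\"john\", \"b\": 23}",
                "{\"a\":\"joe\", \"b\": 25}",
                "{\"a\":\"zack\"}");
        return spark.createDataset(json, Encoders.STRING()).toDF("json");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("from-json")
                .master("local[*]") // local mode, only for trying the example
                .getOrCreate();
        parse(sample(spark)).show(false);
        spark.stop();
    }
}
```

The trade-off is that you must know the schema up front; get_json_object needs only a path per field.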
