Convert pandas DataFrame to 2-layer nested JSON using groupby

weovibewvoibweoivwoiv :

Assume that I have a pandas dataframe called df similar to:

source      tables
src1        table1       
src1        table2          
src1        table3       
src2        table1        
src2        table2 

I'm currently able to output a JSON file that iterates through the various sources, creating an object for each, with the code below:

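import json

all_data = []

# unique() visits each source once; iterating df['source'] directly would repeat src1 and src2
for src in df['source'].unique():
    source_data = {
        src: {
        }
    }
    all_data.append(source_data)

with open('data.json', 'w') as f:
    json.dump(all_data, f, indent=2)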

This yields the following output:

[
  {
    "src1": {}
  },
  {
    "src2": {}
  }
]

Essentially, what I want to do is also iterate through that list of sources and add the table objects corresponding to each source. My desired output would look like this:

[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]

Any assistance on how I can modify my code to also iterate through the tables column and append that to the respective source values would be greatly appreciated. Thanks in advance.

cs95 :

Is this what you're looking for?

data = [
    {k: v} 
    for k, v in df.groupby('source')['tables'].agg(
        lambda x: {v: {} for v in x}).items()
]

with open('data.json', 'w') as f:
    json.dump(data, f, indent=2)  

There are two layers to the answer here. First, groupby('source') with an inner dict comprehension maps each source to a dict of its tables. Then the outer list comprehension wraps each source in its own single-key dict, which gives the list format you asked for. The resulting data.json contains:

[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]

Example using .apply with arbitrary data

df['tables2'] = 'abc'

def func(g): 
    return {x: y for x, y in zip(g['tables'], g['tables2'])}

data = [{k: v} for k, v in df.groupby('source').apply(func).items()]
data
# [{'src1': {'table1': 'abc', 'table2': 'abc', 'table3': 'abc'}},
#  {'src2': {'table1': 'abc', 'table2': 'abc'}}]

Note that this will not work with pandas 1.0 (probably because of a bug).
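
If .agg or .apply gives you trouble on your pandas version, one possible workaround is to iterate over the groups directly, since iterating a groupby yields (key, group) pairs. This is only a sketch and not part of the original answer: the DataFrame below is rebuilt from the sample in the question, and the names data_with_values and the constant 'abc' column are just for illustration.

```python
import json

import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real df)
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table2", "table3", "table1", "table2"],
})

# Iterating a SeriesGroupBy yields (source, Series of that source's tables)
data = [
    {source: {table: {} for table in tables}}
    for source, tables in df.groupby("source")["tables"]
]

# Same idea with arbitrary values: pair each table with a value column
df["tables2"] = "abc"
data_with_values = [
    {source: dict(zip(group["tables"], group["tables2"]))}
    for source, group in df.groupby("source")
]

print(json.dumps(data, indent=2))
```

Because this never calls .agg or .apply, it does not depend on how pandas handles dict return values from those methods. groupby sorts the keys by default, so the sources come out in sorted order.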

