Assume that I have a pandas DataFrame called df, similar to:
source tables
src1 table1
src1 table2
src1 table3
src2 table1
src2 table2
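For anyone who wants to reproduce this, a frame like the one above could be built roughly as follows. This is only a minimal sketch: the column names source and tables come from the table above, and the construction itself is just one way to do it.

```python
import pandas as pd

# Sample frame mirroring the table above: one row per (source, table) pair.
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table2", "table3", "table1", "table2"],
})

print(df)
```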
I'm currently able to output a JSON file that iterates through the various sources, creating an object for each, with the code below:
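import json

all_data = []
# Visit each distinct source once; looping over df['source'] directly
# would repeat a source for every row it appears in.
for src in df['source'].unique():
    source_data = {
        src: {
        }
    }
    all_data.append(source_data)

with open('data.json', 'w') as f:
    json.dump(all_data, f, indent=2)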
This yields the following output:
[
  {
    "src1": {}
  },
  {
    "src2": {}
  }
]
Essentially, what I also want to do is iterate through that list of sources and add the table objects that belong to each source. My desired output would look like this:
[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]
Any assistance on how I can modify my code to also iterate through the tables column and append that to the respective source values would be greatly appreciated. Thanks in advance.
Is this what you're looking for?
import json

data = [
    {k: v}
    for k, v in df.groupby('source')['tables'].agg(
        lambda x: {v: {} for v in x}).items()
]

with open('data.json', 'w') as f:
    json.dump(data, f, indent=2)
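There are two layers to the answer here. First, group the tables by source with groupby, using an inner dict comprehension to turn each group's tables into a {table: {}} mapping. Then wrap the result in a list comprehension so that each source becomes its own single-key object, which is the format you asked for. The resulting data.json: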
[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]
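To make the two layers easier to see, here is the same approach split into separate steps. It is a sketch rather than a different method: the sample frame is rebuilt from the question, and the intermediate name grouped is just illustrative.

```python
import json

import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table2", "table3", "table1", "table2"],
})

# Layer 1: one dict per source, mapping each table name to an empty object.
grouped = df.groupby("source")["tables"].agg(lambda x: {v: {} for v in x})

# Layer 2: wrap every (source, tables) pair in its own single-key dict.
data = [{k: v} for k, v in grouped.items()]

print(json.dumps(data, indent=2))
```

Printing grouped on its own is a quick way to check the first layer before worrying about the final list shape.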
Example using .apply with arbitrary data:
df['tables2'] = 'abc'

def func(g):
    return {x: y for x, y in zip(g['tables'], g['tables2'])}

data = [{k: v} for k, v in df.groupby('source').apply(func).items()]
data
# [{'src1': {'table1': 'abc', 'table2': 'abc', 'table3': 'abc'}},
#  {'src2': {'table1': 'abc', 'table2': 'abc'}}]
Note that this does not work with pandas 1.0, probably because of a bug.
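If returning dicts from .agg or .apply is a problem on your pandas version, one possible workaround is to iterate over the groups directly. I'd expect this to be less sensitive to version differences, since it only relies on iterating a groupby, but treat that as an assumption rather than something tested on every release. The sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question, plus the arbitrary extra column.
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table2", "table3", "table1", "table2"],
})
df["tables2"] = "abc"

# Iterating a groupby yields (group key, sub-frame) pairs,
# sorted by key by default.
data = [
    {source: dict(zip(group["tables"], group["tables2"]))}
    for source, group in df.groupby("source")
]

print(data)
```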