Flat is better than nested: how to deal with deeply nested dictionaries?

Raphael :

In my work as an engineer (not a software engineer), I sometimes have to program in Python. Following the tutorials, I learned about dictionaries, how they are used to store data, that they are fast, etc.

I have nested dictionaries that go as deep as 7 levels, like so:

mydict['level1']['level2']['level3']['level4']['level5']['level6']['level7']

I have these little monsters because I parsed a document with a library that returns something like:

components['comp1']['comp_name']
components['comp1']['comp_dimension']
components['comp1']['subcomponents']['subcomp1']
components['comp1']['subcomponents']['subcomp1']['comp_name1']
etc.

I have other dicts that are deeply nested too.

The problem I'm facing is that I need to iterate down to the last level to get and filter results, since I don't know the keys beforehand. So I iterate like the following pseudo-code:

Example 1: the pretty one

filter_subcomps = ["comp1", "comp10"]
filter_value = 10.0

for component in components:
    comp_name = components[component]['comp_name']
    comp_dimension = components[component]['comp_dimension']
    if components[component].get('subcomponents', False):
        subcomp_keys = filter_subcomps if filter_subcomps else components[component]['subcomponents'].keys()
        for subcomponent in subcomp_keys:
            etc etc
            if value > X:
                return value

Example 2: a little bit uglier (going down 3 levels, still 4 to go...):

# I changed the variable names and cut some things to give a shorter example.
# So there are probably some errors that you should ignore.
# The goal is to show the ugliness of my code :)

    def get_peak_xy_position(self):
        peak_final_lst = list()
        x_coord_list = list()
        y_coord_list = list()

        filter_comp = self.params['options']['filter_comp']
        filter_comp_parent = self.params['options']['comp_parent']
        filter_peak = self.params['options']['filter_peak']

        for xy_coord in self.xy:
            peak_xy_list = [0]
            x_coord = float(xy_coord.split('_')[0])
            y_coord = float(xy_coord.split('_')[1])

            if self.components.get(xy_coord, False):
                comp_keys = filter_comp if filter_comp else self.components[xy_coord].keys()
                for comp in comp_keys:
                    if self.components[xy_coord].get(comp, False):
                        if self.components[xy_coord][comp].get('comp_parents', False):
                            comp_parent_keys = filter_comp_parent if filter_comp_parent else self.components[xy_coord][comp]['comp_parents'].keys()
                            for parent in comp_parent_keys:
                                if self.components[xy_coord][comp]['comp_parents'][parent].get('comp_signal', False):
                                    peak_signal = self.components[xy_coord][comp]['comp_parents'][parent]['comp_signal']['peak']
                                    final_peak = peak_signal
                                    if filter_peak:
                                        final_peak = peak_signal if filter_peak <= peak_signal else 0
                                    peak_xy_list.append(final_peak)
            peak_final_lst.append(max(peak_xy_list))
            x_coord_list.append(x_coord)
            y_coord_list.append(y_coord)

        return x_coord_list, y_coord_list, peak_final_lst

These are fairly simple examples; sometimes the code goes to more than 10 levels of indentation, which looks terrible and forces me to scroll the page horizontally. Besides that, it's hard even for me to read the code after several days or weeks.

I read some tutorials from people who convert tabulated data to nested dicts and vice versa, and even from people using XPath to access dict keys.

Anyway, following the Zen of Python, I'm certainly not respecting "flat is better than nested", since my code goes to astronomical levels of indentation.

I was thinking of converting all the nested dicts to SQLite and querying them with SQL instead of these ugly for loops and if conditions. So, what should I do? How do I deal with nested dicts in Python and at the same time keep the code as flat as possible? I'm a little bit lost here.

PS: my question is unrelated to the question “Flat is better than nested” - for data as well as code?, since I already have deeply nested dicts. I want to know how to deal with these dicts (query them, filter values, etc.) while keeping the code flat.

Eugene Mayevski 'Callback :

There are several approaches to the problem:

  1. Flatten the structure. If you expect that processing a flat table will be faster, it might make sense to flatten your structure into a plain list of class or struct instances, and then process that list.
  2. If the structure is uniform across levels (even if the names are different on each level), you can use simple recursion. You would have a function that checks whether the item on a given level has children, and either calls itself with each of those children if they are present, or calls the data processing function for the final-level entries. If subcomponent names differ, you can have an array of such names that says "on level 1 the items are called 'comp*', on level 2 'subcomp*'", and so on.
  3. If levels require completely different handling, introduce separate functions for each level.
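
As a rough illustration of approaches 1 and 2 combined, here is a minimal sketch that uses recursion to flatten a nested dict into a plain list of (key path, value) rows, and then filters that flat list with no nested loops. The helper names walk_leaves and max_peak and the sample data are hypothetical; the data is only shaped like the self.components dict from the question, which may differ in detail from your real structure.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[str, ...]


def walk_leaves(node: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Recursively yield (key_path, value) for every non-dict leaf."""
    if isinstance(node, dict):
        for key, child in node.items():
            # Descend one level, remembering how we got here.
            yield from walk_leaves(child, path + (key,))
    else:
        yield path, node


# Hypothetical sample data, assumed to resemble the question's structure.
components = {
    "1.0_2.0": {
        "comp1": {"comp_parents": {"p1": {"comp_signal": {"peak": 12.5}},
                                   "p2": {"comp_signal": {"peak": 3.0}}}},
        "comp2": {"comp_parents": {"p1": {"comp_signal": {"peak": 7.0}}}},
    },
    "3.0_4.0": {
        "comp1": {"comp_parents": {"p1": {"comp_signal": {"peak": 9.0}}}},
    },
}

# Flatten once; every row is (tuple of keys, leaf value).
rows = list(walk_leaves(components))


def max_peak(rows, xy_coord, filter_peak=None):
    """Largest peak under one xy key, or 0 if none passes the filter."""
    peaks = [
        value
        for path, value in rows
        if path[0] == xy_coord
        and path[-2:] == ("comp_signal", "peak")
        and (filter_peak is None or value >= filter_peak)
    ]
    # Start from 0, mirroring the question's initial peak_xy_list = [0].
    return max([0, *peaks])


print(max_peak(rows, "1.0_2.0"))
print(max_peak(rows, "3.0_4.0", filter_peak=10.0))
```

Once the data is in flat rows like this, each filter becomes a single condition on the key path instead of another level of indentation. The same rows could also be loaded into a table if you still want to query them with SQL.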

