Python - iterating through list and dictionary to get a nested list output

desmonwu2001 :

I have a dictionary mydict that contains filenames as keys and the text of each file as values.

From the text of each file, I want to extract the lines that start with any of the words stored in the list mywords.

I have tried the following.

import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    for word in mywords:
        extracted = re.findall('^ ' + word + '.*', v, flags=re.IGNORECASE | re.MULTILINE)
        mylist.append(extracted[:1])

This gives me:

[[' Foo extract this. '],
 [' Bar extract this'],
 [],
 [' Bar extract this too.']]
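For anyone puzzling over the pattern itself, here is a small sketch of what `'^ ' + word + '.*'` does with these flags. `re.MULTILINE` makes `^` match at the start of every line, the literal space is needed because each line in the sample text begins with one, and `.` does not match a newline, so each match ends at the end of its line. The helper name `lines_starting_with` is hypothetical, and `re.escape` is my addition (the original concatenates `word` directly, which is fine for plain words like Foo and Bar but breaks for words containing regex metacharacters).

```python
import re

text = 'some text. \n Foo extract this. \n Bar extract this'

def lines_starting_with(word, text):
    # Hypothetical helper: returns every line that starts with " " + word.
    # re.escape() keeps words such as "C++" from being read as regex syntax.
    pattern = '^ ' + re.escape(word) + '.*'
    # MULTILINE: ^ matches after each '\n'; IGNORECASE: 'foo' also matches 'Foo'.
    # '.' does not match '\n', so each match stops at the end of its line.
    return re.findall(pattern, text, flags=re.IGNORECASE | re.MULTILINE)

print(lines_starting_with('foo', text))  # [' Foo extract this. ']
print(lines_starting_with('Bar', text))  # [' Bar extract this']
print(lines_starting_with('Baz', text))  # []
```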

However, I want the output to contain one nested list per file, instead of a separate top-level list for every word searched in every file.

Desired output:

[[[' Foo extract this. '], [' Bar extract this']],
 [[], [' Bar extract this too.']]]
Marcel :

Build a separate sublist for each file inside the outer loop, and append it to mylist once all words have been searched in that file. Here's a possible solution:

import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    sublist = []  # collects the results for the current file only
    for word in mywords:
        extracted = re.findall('^ ' + word + '.*', v, flags=re.IGNORECASE | re.MULTILINE)
        sublist.append(extracted[:1])
    mylist.append(sublist)  # one sublist per file

This outputs: [[[' Foo extract this. '], [' Bar extract this']], [[], [' Bar extract this too.']]]
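The same nesting can also be written as a nested list comprehension. This is a sketch meant to be equivalent to the loop above: since the loop never uses the key `k`, it iterates over `mydict.values()` directly, and it relies on dicts preserving insertion order (guaranteed from Python 3.7), so the sublists come out in File1, File2 order.

```python
import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']

# Outer comprehension: one sublist per file text.
# Inner comprehension: one entry per word, holding at most the first match.
mylist = [
    [re.findall('^ ' + word + '.*', v, flags=re.IGNORECASE | re.MULTILINE)[:1]
     for word in mywords]
    for v in mydict.values()
]

print(mylist)
```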


If you want the strings without the surrounding single-element lists, append only the first match, and only when there is one:

import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    sublist = []
    for word in mywords:
        extracted = re.findall('^ ' + word + '.*', v, flags=re.IGNORECASE | re.MULTILINE)
        if extracted:  # Checks if there is at least one element in the list
            sublist.append(extracted[0])
    mylist.append(sublist)

This outputs: [[' Foo extract this. ', ' Bar extract this'], [' Bar extract this too.']]
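One trade-off with this version: because words with no match are skipped, the position of a string no longer tells you which word it matched (the second sublist holds only the Bar line, with no slot for Foo). If you need that pairing, one option is a dictionary keyed by filename and then by word, with None standing for "no match". The `results` name and the None placeholder are my own choices for this sketch; `re.search` returns the first match or None, and `group(0)` gives the whole matched line, which is what `extracted[0]` gave above.

```python
import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too.'}
mywords = ['Foo', 'Bar']

# Map each filename to {word: first matching line or None},
# so a missing match stays visible instead of disappearing.
results = {}
for filename, text in mydict.items():
    per_word = {}
    for word in mywords:
        m = re.search('^ ' + re.escape(word) + '.*', text,
                      flags=re.IGNORECASE | re.MULTILINE)
        per_word[word] = m.group(0) if m else None
    results[filename] = per_word

print(results)
```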


If you want to get several results from each file, you can do as follows (note that I added a second match for Bar to the second file):

import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too. \n Bar extract this one as well'}
mywords = ['Foo', 'Bar']
mylist = []
for k, v in mydict.items():
    sublist = []
    for word in mywords:
        extracted = re.findall('^ ' + word + '.*', v, flags=re.IGNORECASE | re.MULTILINE)
        if extracted:
            sublist += extracted  # add every match, not just the first
    mylist.append(sublist)

This outputs: [[' Foo extract this. ', ' Bar extract this'], [' Bar extract this too. ', ' Bar extract this one as well']]
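If you don't need the matches grouped by word, an alternative sketch is to combine all words into a single pattern with alternation, e.g. `'^ (?:Foo|Bar).*'`, so each file is scanned once. Be aware of one behavioral difference: matches come back in the order they appear in the file rather than grouped word by word, so a Bar line that precedes a Foo line is listed first. For the sample data above the output is the same. The `re.escape` call is my addition to protect words containing regex metacharacters.

```python
import re

mydict = {'File1': 'some text. \n Foo extract this. \n Bar extract this',
          'File2': 'more text. \n Bar extract this too. \n Bar extract this one as well'}
mywords = ['Foo', 'Bar']

# One non-capturing alternation group, so findall returns whole lines.
pattern = re.compile('^ (?:' + '|'.join(map(re.escape, mywords)) + ').*',
                     flags=re.IGNORECASE | re.MULTILINE)

# Matches are listed in text order within each file, not grouped by word.
mylist = [pattern.findall(text) for text in mydict.values()]
print(mylist)
```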

