Python data structures and algorithms, question 18: Grouping records by a field

question:

You have a sequence of dictionaries or instances, and you want to iterate over the data in groups based on the value of a particular field, such as a date.

solution:

The itertools.groupby() function is very useful for such data grouping operations. For demonstration purposes, assume you already have the following list of dictionaries.

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/04/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'},
]

Now suppose you want to iterate over chunks of data grouped by date. In order to do this, you first need to sort by the specified field (in this case, date) and then call the itertools.groupby() function.

from operator import itemgetter
from itertools import groupby

# Sort by the desired field first
rows.sort(key=itemgetter('date'))

# Iterate in groups
for date, items in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in items:
        print(' ', i)

output:

07/01/2012
  {'address': '5412 N CLARK', 'date': '07/01/2012'}
  {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
  {'address': '5800 E 58TH', 'date': '07/02/2012'}
  {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
  {'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
  {'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
  {'address': '5148 N CLARK', 'date': '07/04/2012'}
  {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}

The groupby() function scans a sequence and finds runs of consecutive elements that have the same value (or the same value as returned by the given key function). On each iteration, it returns that value together with an iterator that produces all of the items in the group sharing it.
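Because each group is an iterator rather than a list, it is usually copied if it needs to be kept after the loop moves on. The following is a minimal sketch of that idea; the `sample` list and the `grouped` and `counts` names are made up for this illustration and are not part of the original recipe.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, already sorted by 'date'.
sample = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
]

# Each group is a lazy iterator that shares the underlying data with
# the groupby object, so copy it into a list if it must be reused later.
grouped = {
    date: list(items)
    for date, items in groupby(sample, key=itemgetter('date'))
}

# The lists can now be measured or traversed as often as needed.
counts = {date: len(items) for date, items in grouped.items()}
print(counts)  # {'07/01/2012': 2, '07/02/2012': 1}
```

Copying every group into a list gives up the memory savings of lazy iteration, so this is mainly worthwhile when the groups have to be revisited.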

An important preliminary step is to sort the data by the field of interest. Because groupby() only examines consecutive items, records with the same value that are not adjacent end up in separate groups, so grouping without sorting first will not give the desired result.
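The effect of skipping the sort can be seen in a small sketch. The `unsorted_rows` list and the other names here are hypothetical and exist only for this illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical data in which the two 07/01 records are not adjacent.
unsorted_rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5800 E 58TH', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
]

by_date = itemgetter('date')

# Without sorting, the same date shows up as two separate groups.
keys_unsorted = [date for date, _ in groupby(unsorted_rows, key=by_date)]

# Sorting first makes equal dates adjacent, so each date appears once.
keys_sorted = [
    date for date, _ in groupby(sorted(unsorted_rows, key=by_date), key=by_date)
]

print(keys_unsorted)  # ['07/01/2012', '07/02/2012', '07/01/2012']
print(keys_sorted)    # ['07/01/2012', '07/02/2012']
```

Note that these dates are plain strings and are compared as text. That orders them correctly here only because they all share the same month and year.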

If you just want to group the data into a large data structure based on the date field and allow random access, then it is better to use defaultdict() to build a multi-value dictionary.

from collections import defaultdict

rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)

You can easily access the corresponding records for each specified date.

>>> for r in rows_by_date['07/01/2012']:
...     print(r)
...
{'address': '5412 N CLARK', 'date': '07/01/2012'}
{'address': '4801 N BROADWAY', 'date': '07/01/2012'}
>>>

With this approach the records do not need to be sorted first. So if memory usage is not a concern, it may run faster than sorting the data and then iterating over it with groupby().
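The problem statement also mentions sequences of instances. Both techniques carry over if the key is read from an attribute instead of a dictionary entry, for example with operator.attrgetter(). The sketch below assumes a hypothetical `Record` class invented for this illustration.

```python
from collections import defaultdict
from itertools import groupby
from operator import attrgetter
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical record type, used only for this illustration.
    address: str
    date: str


records = [
    Record('5412 N CLARK', '07/01/2012'),
    Record('5800 E 58TH', '07/02/2012'),
    Record('4801 N BROADWAY', '07/01/2012'),
]

# groupby() variant: sort by the attribute, then group on it.
by_date = attrgetter('date')
grouped = {
    date: [r.address for r in items]
    for date, items in groupby(sorted(records, key=by_date), key=by_date)
}

# defaultdict variant: a single pass with no sorting.
records_by_date = defaultdict(list)
for rec in records:
    records_by_date[rec.date].append(rec.address)

# Both variants produce the same date-to-addresses mapping.
print(grouped == dict(records_by_date))  # True
```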

Origin blog.csdn.net/m0_68635815/article/details/135442342