Django Annotated Query to Count Only Latest from Reverse Relationship

gtalarico :

Problem Overview

Given the models

class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    candidate = models.ForeignKey("Candidate", related_name="status_changes", on_delete=models.CASCADE)
    status = models.ForeignKey("Status", related_name="status_changes", on_delete=models.CASCADE)
    created_at = models.DateTimeField(auto_now_add=True, blank=True)

And SQL Tables:

candidates
+----+--------------+
| id | name         |
+----+--------------+
|  1 | Beth         |
|  2 | Mark         |
|  3 | Mike         |
|  4 | Ryan         |
+----+--------------+

status
+----+--------------+
| id | name         |
+----+--------------+
|  1 | Review       |
|  2 | Accepted     |
|  3 | Rejected     |
+----+--------------+

status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
|  1 | 1            | 1         | 03-01-2019 |
|  2 | 1            | 2         | 05-01-2019 |
|  4 | 2            | 1         | 01-01-2019 |
|  5 | 3            | 1         | 01-01-2019 |
|  6 | 4            | 3         | 01-01-2019 |
+----+--------------+-----------+------------+

I want to get the total number of candidates with a given status, where only each candidate's latest status_change is counted.

In other words, StatusChange tracks the history of a candidate's status, but only the most recent entry determines the candidate's current status when counting.
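
To make the intended rule concrete, here is a minimal plain-Python sketch over the sample rows above, with no Django involved. The tuple layout and the `count_current_statuses` helper are illustrative assumptions, and the dates are read as day-month-year (the ordering comes out the same under the other reading).

```python
from collections import Counter
from datetime import date

# Sample data copied from the tables above.
STATUSES = {1: "Review", 2: "Accepted", 3: "Rejected"}

# (id, candidate_id, status_id, created_at)
STATUS_CHANGES = [
    (1, 1, 1, date(2019, 1, 3)),
    (2, 1, 2, date(2019, 1, 5)),
    (4, 2, 1, date(2019, 1, 1)),
    (5, 3, 1, date(2019, 1, 1)),
    (6, 4, 3, date(2019, 1, 1)),
]


def count_current_statuses(changes):
    """Count candidates per status, using only each candidate's latest change."""
    latest = {}  # candidate_id -> (created_at, status_id)
    for _id, candidate_id, status_id, created_at in changes:
        current = latest.get(candidate_id)
        if current is None or created_at > current[0]:
            latest[candidate_id] = (created_at, status_id)
    return Counter(STATUSES[status_id] for _, status_id in latest.values())


counts = count_current_statuses(STATUS_CHANGES)
print(counts.most_common())
```

Candidate 1 has two rows, but only the later one (Accepted) contributes, which is the behaviour the query needs to reproduce in the database.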

SQL Solution

In SQL, this can be done with GROUP BY and COUNT (untested):

SELECT
    status.id AS status_id,
    status.name AS status_name,
    COUNT(*) AS status_count
FROM
    (
    -- latest change per candidate, not per status
    SELECT
        candidate_id,
        MAX(created_at) AS latest_created_at
    FROM
        status_change
    GROUP BY candidate_id
    )
AS latest_change
INNER JOIN
    status_change
    ON (status_change.candidate_id = latest_change.candidate_id
        AND status_change.created_at = latest_change.latest_created_at)
INNER JOIN
    status
    ON (status.id = status_change.status_id)
GROUP BY status.id, status.name
ORDER BY status_count DESC;

last_status_count
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      | # <= Does not include instance from candidate 1
| 2         | Accepted    | 1      | # because status 2 is latest
| 3         | Rejected    | 1      |
+-----------+-------------+--------+
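
One way to check the expected counts is to run a latest-per-candidate query against an in-memory SQLite database. The sketch below is a variant that groups by `candidate_id` to find each candidate's most recent change and then joins back to `status`. The table definitions, the ISO date strings (day-month-year reading of the sample dates), and the extra `status.id` tie-break in `ORDER BY` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE status_change (
    id INTEGER PRIMARY KEY,
    candidate_id INTEGER,
    status_id INTEGER REFERENCES status(id),
    created_at TEXT  -- ISO dates, so string order matches date order
);
INSERT INTO status VALUES (1, 'Review'), (2, 'Accepted'), (3, 'Rejected');
INSERT INTO status_change VALUES
    (1, 1, 1, '2019-01-03'),
    (2, 1, 2, '2019-01-05'),
    (4, 2, 1, '2019-01-01'),
    (5, 3, 1, '2019-01-01'),
    (6, 4, 3, '2019-01-01');
""")

QUERY = """
SELECT status.id, status.name, COUNT(*) AS status_count
FROM (
    -- most recent change timestamp for each candidate
    SELECT candidate_id, MAX(created_at) AS latest_created_at
    FROM status_change
    GROUP BY candidate_id
) AS latest
INNER JOIN status_change AS sc
    ON sc.candidate_id = latest.candidate_id
   AND sc.created_at = latest.latest_created_at
INNER JOIN status ON status.id = sc.status_id
GROUP BY status.id, status.name
ORDER BY status_count DESC, status.id;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Note that a candidate with two changes sharing the same latest timestamp would be counted once per tied row; the sample data has no such ties.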

Attempted Django Solution

I need a view that returns each status and its corresponding count, e.g. [{ status_name: "Review", count: 2 }, ...]

I am not sure how to build this queryset without pulling all records and aggregating in Python.

I figured I need annotate() and possibly Subquery but I haven't been able to stitch it all together.

The closest I got is this, which counts the number of status changes for each status but also counts non-latest changes.

    queryset = Status.objects.all().annotate(case_count=Count("status_changes"))

I have found lots of SO questions on aggregating, but I couldn't find a clear answer on aggregating and annotating only the "latest" related row.

Thanks in advance.

Willem Van Onsem :

We can perform a query where we first filter down to the latest StatusChange per Candidate and then count the statuses:

from django.db.models import Count, F, Max

qs = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)

For the given sample data, this gives us:

>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]
