Problem Overview
Given the models
class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    candidate = models.ForeignKey("Candidate", related_name="status_changes")
    status = models.ForeignKey("Status", related_name="status_changes")
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
And SQL Tables:
candidates
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Beth         |
| 2  | Mark         |
| 3  | Mike         |
| 4  | Ryan         |
+----+--------------+
status
+----+--------------+
| id | name         |
+----+--------------+
| 1  | Review       |
| 2  | Accepted     |
| 3  | Rejected     |
+----+--------------+
status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
| 1  | 1            | 1         | 03-01-2019 |
| 2  | 1            | 2         | 05-01-2019 |
| 4  | 2            | 1         | 01-01-2019 |
| 5  | 3            | 1         | 01-01-2019 |
| 6  | 4            | 3         | 01-01-2019 |
+----+--------------+-----------+------------+
I want to get the total number of candidates with a given status, but only the latest status_change of each candidate should be counted.
In other words, StatusChange is used to track history of status, but only the latest is considered when counting current status of candidates.
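To pin down the intended semantics, here is a minimal pure-Python sketch over the sample rows. It is only a reference for what the result should be, not the database-side solution I am after. The tuple layout, the function name count_current_statuses, and the ISO date strings (reading 03-01-2019 as day-month-year) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical in-memory copy of the sample tables.
statuses = {1: "Review", 2: "Accepted", 3: "Rejected"}

# (id, candidate_id, status_id, created_at); dates rewritten as ISO strings
# so that plain string comparison matches chronological order (assumption).
status_changes = [
    (1, 1, 1, "2019-01-03"),
    (2, 1, 2, "2019-01-05"),
    (4, 2, 1, "2019-01-01"),
    (5, 3, 1, "2019-01-01"),
    (6, 4, 3, "2019-01-01"),
]


def count_current_statuses(changes, status_names):
    """Count candidates per status, using only each candidate's latest change."""
    latest = {}  # candidate_id -> (created_at, status_id)
    for _change_id, candidate_id, status_id, created_at in changes:
        if candidate_id not in latest or created_at > latest[candidate_id][0]:
            latest[candidate_id] = (created_at, status_id)
    counts = Counter(status_id for _created_at, status_id in latest.values())
    return {status_names[status_id]: n for status_id, n in counts.items()}


current_counts = count_current_statuses(status_changes, statuses)
```

Candidate 1 has two changes, but only the later one (Accepted) contributes to the totals.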
SQL Solution
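Using SQL, I was able to achieve it using GROUP BY and COUNT: find the latest change per candidate, join back to status_change to get its status, then count per status. (SQL untested)
SELECT
    status.id AS status_id
    , status.name AS status_name
    , COUNT(*) AS status_count
FROM
    (
        SELECT
            candidate_id,
            MAX(created_at) AS latest_status_change
        FROM
            status_change
        GROUP BY candidate_id
    )
    AS last_change
INNER JOIN
    status_change
    ON (status_change.candidate_id = last_change.candidate_id
        AND status_change.created_at = last_change.latest_status_change)
INNER JOIN
    status
    ON (status.id = status_change.status_id)
GROUP BY status.id, status.name
ORDER BY status_count DESC;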
last_status_count
+-----------+-------------+--------+
| status_id | status_name | count  |
+-----------+-------------+--------+
| 1         | Review      | 2      | # <= Does not include instance from candidate 1
| 2         | Accepted    | 1      | #    because status 2 is latest
| 3         | Rejected    | 1      |
+-----------+-------------+--------+
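One way to check the expected table against the sample data is to load it into an in-memory SQLite database from Python. The sketch below is an assumption-laden harness rather than the production schema: the column types, the ISO date strings, and the alias last_change are illustrative. Its inner query groups by candidate_id to find each candidate's latest change, then joins back to status_change and status before counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE status_change (
    id INTEGER PRIMARY KEY,
    candidate_id INTEGER,
    status_id INTEGER,
    created_at TEXT  -- ISO dates so MAX() follows chronological order
);
INSERT INTO status VALUES (1, 'Review'), (2, 'Accepted'), (3, 'Rejected');
INSERT INTO status_change VALUES
    (1, 1, 1, '2019-01-03'),
    (2, 1, 2, '2019-01-05'),
    (4, 2, 1, '2019-01-01'),
    (5, 3, 1, '2019-01-01'),
    (6, 4, 3, '2019-01-01');
""")

query = """
SELECT status.id, status.name, COUNT(*) AS status_count
FROM (
    -- latest change timestamp per candidate
    SELECT candidate_id, MAX(created_at) AS latest_status_change
    FROM status_change
    GROUP BY candidate_id
) AS last_change
INNER JOIN status_change
    ON status_change.candidate_id = last_change.candidate_id
   AND status_change.created_at = last_change.latest_status_change
INNER JOIN status ON status.id = status_change.status_id
GROUP BY status.id, status.name
ORDER BY status_count DESC, status.id;
"""

rows = conn.execute(query).fetchall()
for status_id, status_name, status_count in rows:
    print(status_id, status_name, status_count)
```

Note that if a candidate had two changes with the same latest created_at, both rows would be counted; a tie-breaker on id would be needed in that case.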
Attempted Django Solution
I need a view to return each status and its corresponding count, e.g. [{ status_name: "Review", count: 2 }, ...]
I am not sure how to build this queryset without pulling all records and aggregating in Python.
I figured I need annotate() and possibly Subquery, but I haven't been able to stitch it all together.
The closest I got is this, which counts the number of status changes for each status but also counts non-latest changes.
queryset = Status.objects.all().annotate(case_count=Count("status_changes"))
I have found lots of SO questions on aggregating, but I couldn't find a clear answer on aggregating and annotating the "latest".
Thanks in advance.
We can perform a query where we first filter the last StatusChanges per Candidate and then count the statuses:
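from django.db.models import Count, F, Max

qs = Status.objects.filter(
    # keep only StatusChanges that are the latest one for their candidate
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    # count the remaining (latest) changes per status
    nlast=Count('status_changes')
)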
For the given sample data, this gives us:
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]