I am trying to find the average at each index position across the data set, not the overall mean (please see the photo).
'410' represents a category, let's say an engine model (Honda in this case), and I would like to average the first number in each column (1-4), then the second, and so on until the very last data point:
(1 + 8218 + 352 + 111) / 4 = 8682 / 4 = 2170.5 would be the average for the 1st index. I would like to find a way to do this for each of the following indexes, up to the very last data point.
Thanks in advance!
Setting aside, for now, the discussion about comma-separated lists being an anti-pattern (we really do need to have that discussion), let's address just the question that was asked...
I would make use of the SUBSTRING_INDEX function to extract the elements from the comma-separated list, and then convert those to numeric.
We need to extract the individual elements if we want to have them available for an aggregate function to operate on.
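To see the building blocks in isolation, here is a minimal sketch that runs against string literals only (the sample list and the column aliases are made up for illustration; no real table is involved). It shows how nesting two SUBSTRING_INDEX calls picks out the nth element, and what happens when we ask for an element past the end of the list, with and without padding the list with extra commas.

```sql
-- Illustrative literals only; aliases are hypothetical.
SELECT
    -- everything before the 2nd comma
    SUBSTRING_INDEX('352,8216,144,104', ',', 2)                            AS first_two     -- '352,8216'
    -- then everything after the last comma of that: the 2nd element
  , SUBSTRING_INDEX(SUBSTRING_INDEX('352,8216,144,104', ',', 2), ',', -1)  AS elem2         -- '8216'
    -- adding 0 converts the string to a number
  , SUBSTRING_INDEX(SUBSTRING_INDEX('352,8216,144,104', ',', 2), ',', -1) + 0 AS elem2_num  -- 8216
    -- without padding, asking for a 5th element of a 4-element list
    -- quietly hands back the last element instead
  , SUBSTRING_INDEX(SUBSTRING_INDEX('352,8216,144,104', ',', 5), ',', -1)  AS elem5_nopad   -- '104'
    -- with padding, the 5th element is an empty string, which NULLIF turns into NULL
  , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(
        CONCAT('352,8216,144,104', REPEAT(',', 255)), ',', 5), ',', -1), '') + 0 AS elem5_padded  -- NULL
;
```

The NULLIF matters because an empty string plus 0 evaluates to 0, and a 0 would be counted by AVG, whereas a NULL is skipped.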
Consider:
SELECT t.rn
, NULLIF( SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 1 ) ,'')+0 AS n1
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 2 ),',',-1),'')+0 AS n2
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 3 ),',',-1),'')+0 AS n3
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 4 ),',',-1),'')+0 AS n4
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 5 ),',',-1),'')+0 AS n5
FROM (
SELECT 1 AS rn, '1,211,234,255,222,202,217,205' AS foo
UNION ALL SELECT 2,'8218,121,129,127,352,382,339'
UNION ALL SELECT 3,'352,8216,144,104'
UNION ALL SELECT 4,'111,100,109'
) t
ORDER BY t.rn
returns:
rn n1 n2 n3 n4 n5
-- ------ ------ ------ ------ ------
1 1 211 234 255 222
2 8218 121 129 127 352
3 352 8216 144 104 (NULL)
4 111 100 109 (NULL) (NULL)
We could use the query that returns this resultset as an inline view in an outer query, to perform aggregation.
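As a sketch of that wrapping, assuming the same demo rows as above and a made-up alias v for the inline view, the outer query could look something like this:

```sql
-- v is a hypothetical alias for the inline view; the demo data is the same as above.
SELECT AVG(v.n1) AS avg_n1
     , AVG(v.n2) AS avg_n2
     , AVG(v.n3) AS avg_n3
     , AVG(v.n4) AS avg_n4
     , AVG(v.n5) AS avg_n5
  FROM (
         SELECT NULLIF(                SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 1 )        ,'')+0 AS n1
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 2 ),',',-1),'')+0 AS n2
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 3 ),',',-1),'')+0 AS n3
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 4 ),',',-1),'')+0 AS n4
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 5 ),',',-1),'')+0 AS n5
           FROM (
                  SELECT 1 AS rn, '1,211,234,255,222,202,217,205' AS foo
                  UNION ALL SELECT 2,'8218,121,129,127,352,382,339'
                  UNION ALL SELECT 3,'352,8216,144,104'
                  UNION ALL SELECT 4,'111,100,109'
                ) t
       ) v
;
```

Working it by hand from the demo rows, the first position comes to (1 + 8218 + 352 + 111) / 4 = 2170.5, which matches the figure in the question. The fifth comes to (222 + 352) / 2 = 287, because AVG skips the NULL values from the two shorter lists.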
If there are objections about how messy the syntax to achieve this is, those objections are mostly overruled by the anti-pattern of storing values in comma separated lists.
Note that the approach above applies to a static number of entries; it's easy enough to see how we would extract the 6th, 7th, nth value from the comma-separated lists. Note that when the list contains fewer delimiters than the count we ask for, the SUBSTRING_INDEX function returns the whole string. Concatenating a bunch of commas onto the list is meant to pad it out, so that a missing position comes back as an empty string instead. (The demonstration SQL above pads each list with 255 commas. The sample lists in the demo data are intended to demonstrate the behavior when the comma-separated list doesn't contain enough values, returning a NULL.)
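If writing out one expression per position gets unwieldy, one possible variation (not part of the approach above, and assuming MySQL 8.0 or later, where recursive CTEs are available) is to generate the positions as rows and let GROUP BY do the work. The names idx, i and pos, and the upper bound of 8, are hypothetical choices for this sketch; 8 is simply the length of the longest list in the demo data.

```sql
-- Assumes MySQL 8.0+ (recursive CTE). idx, i, pos and the bound 8 are hypothetical.
WITH RECURSIVE idx (i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM idx WHERE i < 8   -- 8 = longest list in the demo data
)
SELECT idx.i AS pos
     , AVG( NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(
              CONCAT( t.foo ,REPEAT(',',255)),',', idx.i ),',',-1),'')+0 ) AS avg_val
  FROM (
         SELECT 1 AS rn, '1,211,234,255,222,202,217,205' AS foo
         UNION ALL SELECT 2,'8218,121,129,127,352,382,339'
         UNION ALL SELECT 3,'352,8216,144,104'
         UNION ALL SELECT 4,'111,100,109'
       ) t
 CROSS
  JOIN idx
 GROUP
    BY idx.i
 ORDER
    BY idx.i
;
```

This returns one row per position rather than one column per position. The same padding and NULLIF trick applies, so lists that are too short simply drop out of the average for the later positions.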
FOLLOWUP
Q: to better understand your answer: how can we add the SUBSTRING_INDEX and CONCAT to
SELECT SAVEKEY, AVG(DATA) FROM( SELECT SAVEKEY, DATA FROM camarillo_1_specrec UNION ALL SELECT SAVEKEY, DATA FROM camarillo_2_specrec ) AS subquery GROUP BY SAVEKEY
A: The answer above uses an inline view t to represent the rowsource illustrated in the question (confusingly presented as an image rather than text, which is bad form). Based on the followup question, we presume that it's a table named camarillo_1_specrec. The original question says that '410' represents a category; the followup question shows a column named savekey. The question being asked is confusingly obfuscated.
Let's assume we have a rowsource, let's call it q, that returns a set like this:
savekey val
------- -----
410 211
410 121
410 8216
410 100
Then we could do:
SELECT q.savekey
, AVG(q.val) AS avg_val
FROM q
GROUP
BY q.savekey
To return: ( 211 + 121 + 8216 + 100 ) / 4
savekey avg_val
------- -------
410 2162
We can use an inline view to return the resultset q. Based on the followup question, ...
SELECT q.savekey
, AVG(q.val1) AS avg_val1
, AVG(q.val2) AS avg_val2
, AVG(q.val3) AS avg_val3
, AVG(q.val4) AS avg_val4
, AVG(q.val5) AS avg_val5
FROM (
SELECT t1.savekey
, NULLIF( SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 1 ) ,'')+0 AS val1
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 2 ),',',-1),'')+0 AS val2
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 3 ),',',-1),'')+0 AS val3
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 4 ),',',-1),'')+0 AS val4
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 5 ),',',-1),'')+0 AS val5
FROM camarillo_1_specrec t1
UNION ALL
SELECT t2.savekey
, NULLIF( SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 1 ) ,'')+0 AS val1
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 2 ),',',-1),'')+0 AS val2
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 3 ),',',-1),'')+0 AS val3
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 4 ),',',-1),'')+0 AS val4
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 5 ),',',-1),'')+0 AS val5
FROM camarillo_2_specrec t2
) q
GROUP
BY q.savekey
Note that the inline view query q should be tested separately, to make sure we get it working right, before we wrap it in an outer query. That's what gets us the individual values (extracted from the really-bad-idea anti-pattern comma-separated list) lined up in separate columns.
The original answer gave a possible query pattern to address the big rock problem, unwinding the comma separated list rigmarole into separate columns that we can work with.