Finding the Average of each index using SQL

dhahn :

Currently, I am trying to find the average at each index position within the data set, not the overall mean (please view photo).

'410' represents a category, let's say an engine model (Honda in this case), and I would like to average the first number in each column (1-4), and so on until the very last data point:

(1 + 8218 + 352 + 111) / 4 = 8682 / 4 = 2170.5 would represent the average of the 1st index. I would like to figure out a way to do this for each subsequent index, up to the very last data point.

Thanks in advance!

spencer7593 :

Setting aside for now a discussion about comma-separated lists being an anti-pattern (we really do need to have that discussion), let's address just the question that was asked.

I would make use of the SUBSTRING_INDEX function to extract the elements from the comma-separated list, and then convert those to numeric.

We need to extract the individual elements if we want to have them available for an aggregate function to operate on.
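As a minimal sketch of how the extraction works (the list literal '1,211,234' is made up for illustration), nesting two SUBSTRING_INDEX calls isolates a single element: the inner call keeps everything before the Nth comma, and the outer call with a count of -1 keeps whatever follows the last remaining comma. Adding 0 then makes MySQL convert the string to a number.

```sql
-- Illustration only: the list literal is made-up sample data.
SELECT SUBSTRING_INDEX('1,211,234', ',', 2)                               AS first_two   -- '1,211'
     , SUBSTRING_INDEX(SUBSTRING_INDEX('1,211,234', ',', 2), ',', -1)     AS second_str  -- '211'
     , SUBSTRING_INDEX(SUBSTRING_INDEX('1,211,234', ',', 2), ',', -1) + 0 AS second_num  -- 211
;
```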

Consider:

SELECT t.rn
     , NULLIF(                SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 1 )        ,'')+0  AS n1
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 2 ),',',-1),'')+0  AS n2
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 3 ),',',-1),'')+0  AS n3
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 4 ),',',-1),'')+0  AS n4
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 5 ),',',-1),'')+0  AS n5
  FROM ( 
         SELECT 1 AS rn, '1,211,234,255,222,202,217,205' AS foo 
         UNION ALL SELECT 2,'8218,121,129,127,352,382,339'
         UNION ALL SELECT 3,'352,8216,144,104'
         UNION ALL SELECT 4,'111,100,109'
       ) t
 ORDER BY t.rn

returns:

  rn      n1      n2      n3      n4      n5  
  --  ------  ------  ------  ------  ------
   1       1     211     234     255     222
   2    8218     121     129     127     352
   3     352    8216     144     104  (NULL)
   4     111     100     109  (NULL)  (NULL)

We could use the query that returns this resultset as an inline view in an outer query, to perform aggregation.
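A sketch of what that outer query might look like, reusing the demo data above (the alias v is arbitrary). AVG ignores NULL values, so the shorter lists simply contribute nothing to the later positions.

```sql
-- Sketch: the demo query wrapped as inline view v, with AVG applied per position.
SELECT AVG(v.n1) AS avg_n1
     , AVG(v.n2) AS avg_n2
     , AVG(v.n3) AS avg_n3
     , AVG(v.n4) AS avg_n4
     , AVG(v.n5) AS avg_n5
  FROM (
         SELECT t.rn
              , NULLIF(                SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 1 )        ,'')+0  AS n1
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 2 ),',',-1),'')+0  AS n2
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 3 ),',',-1),'')+0  AS n3
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 4 ),',',-1),'')+0  AS n4
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', 5 ),',',-1),'')+0  AS n5
           FROM (
                  SELECT 1 AS rn, '1,211,234,255,222,202,217,205' AS foo
                  UNION ALL SELECT 2,'8218,121,129,127,352,382,339'
                  UNION ALL SELECT 3,'352,8216,144,104'
                  UNION ALL SELECT 4,'111,100,109'
                ) t
       ) v
;
```

With the four demo rows, this should work out to 2170.5, 2162, 154, 162 and 287 for positions 1 through 5. The first value matches the 2170.5 computed by hand in the question; the fourth and fifth are averages over three and two values respectively, because the NULLs are skipped.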

If there are objections about how messy the syntax to achieve this is, those objections are mostly overruled by the anti-pattern of storing values in comma separated lists.


Note that the approach above applies to a static number of entries; it's easy enough to see how we would extract the 6th, 7th, nth value from the comma-separated lists. Note also that when the SUBSTRING_INDEX function doesn't find enough occurrences of the delimiter, it returns the whole string. Concatenating a run of commas onto the list pads it out, so that a missing position yields an empty string, which NULLIF then converts to NULL. (The demonstration SQL above pads each list with 255 commas. The sample lists in the demo data are intended to demonstrate the behavior when the comma-separated list doesn't contain enough values, returning a NULL.)
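To make the padding point concrete, here is a small sketch that asks for the 5th element of a three-element list (the shortest list from the demo data), with and without the padding. The column aliases are just labels for the illustration.

```sql
-- Asking for the 5th element of a 3-element list.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('111,100,109', ',', 5), ',', -1)
         AS unpadded   -- '109': the inner call returns the whole string, so we get the last element
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT('111,100,109', REPEAT(',',255)), ',', 5), ',', -1), '') + 0
         AS padded     -- NULL: the 5th slot is an empty string, which NULLIF turns into NULL
;
```

Without the padding, the last value in a short list would be silently counted again at every later position, which would skew the averages.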


FOLLOWUP

Q: To better understand your answer: how can we add the SUBSTRING_INDEX and CONCAT to this query?

SELECT SAVEKEY, AVG(DATA)
  FROM (
         SELECT SAVEKEY, DATA FROM camarillo_1_specrec
         UNION ALL
         SELECT SAVEKEY, DATA FROM camarillo_2_specrec
       ) AS subquery
 GROUP BY SAVEKEY

A: The answer above uses an inline view t to represent the rowsource illustrated in the question (confusingly presented as an image rather than text, which is bad form). Based on the followup question, we presume that it's a table named camarillo_1_specrec. The original question says that '410' represents a category; the followup question shows a column named savekey. The question being asked is confusingly obfuscated.

Let's assume we have a rowsource, call it q, that returns this set:

savekey    val
-------  -----
    410    211
    410    121
    410   8216
    410    100 

Then we could do:

SELECT q.savekey
     , AVG(q.val) AS avg_val
  FROM q
 GROUP
    BY q.savekey

To return: ( 211 + 121 + 8216 + 100 ) / 4

 savekey  avg_val
 -------  -------
     410     2162     
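To try this aggregate without the real tables, q can be stood in for by literal rows. This is only a sketch: the four rows are the second elements from the demo data, hard-coded here for illustration.

```sql
-- q is a stand-in built from literals; in practice it would come from the real tables.
SELECT q.savekey
     , AVG(q.val) AS avg_val
  FROM (
         SELECT 410 AS savekey, 211 AS val
         UNION ALL SELECT 410, 121
         UNION ALL SELECT 410, 8216
         UNION ALL SELECT 410, 100
       ) q
 GROUP
    BY q.savekey
;
```

This should return a single row for savekey 410 with an average of 2162 (MySQL may display it as 2162.0000, since AVG over integer values returns a decimal).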

We can use an inline view to return the resultset q. Based on the followup question, ...

SELECT q.savekey
     , AVG(q.val1) AS avg_val1
     , AVG(q.val2) AS avg_val2
     , AVG(q.val3) AS avg_val3
     , AVG(q.val4) AS avg_val4
     , AVG(q.val5) AS avg_val5
  FROM (
         SELECT t1.savekey
              , NULLIF(                SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 1 )        ,'')+0  AS val1
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 2 ),',',-1),'')+0  AS val2
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 3 ),',',-1),'')+0  AS val3
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 4 ),',',-1),'')+0  AS val4
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 5 ),',',-1),'')+0  AS val5
           FROM camarillo_1_specrec t1
          UNION ALL
         SELECT t2.savekey
              , NULLIF(                SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 1 )        ,'')+0  AS val1
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 2 ),',',-1),'')+0  AS val2
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 3 ),',',-1),'')+0  AS val3
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 4 ),',',-1),'')+0  AS val4
              , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t2.data ,REPEAT(',',255)),',', 5 ),',',-1),'')+0  AS val5
           FROM camarillo_2_specrec t2
       ) q
 GROUP
    BY q.savekey

Note that the inline view query q should be tested separately; we should make sure we get that working right before we wrap it in an outer query. That's what gets us the individual values (extracted from the really-bad-idea anti-pattern comma-separated list) lined up in separate columns.
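One way to do that spot check, as a sketch: run one branch of the inline view on its own and return the raw list next to the extracted values, so they can be compared by eye. The table and column names (camarillo_1_specrec, savekey, data) are taken from the followup question; the LIMIT of 10 is an arbitrary sample size.

```sql
-- Spot check of one branch of q: raw list alongside the extracted values.
SELECT t1.savekey
     , t1.data
     , NULLIF(                SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 1 )        ,'')+0  AS val1
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 2 ),',',-1),'')+0  AS val2
     , NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t1.data ,REPEAT(',',255)),',', 3 ),',',-1),'')+0  AS val3
  FROM camarillo_1_specrec t1
 LIMIT 10
;
```

If val1 through val3 line up with the first three entries of data in every row, the same pattern can be trusted for the remaining positions and for camarillo_2_specrec.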

The original answer gave a possible query pattern to address the big rock problem, unwinding the comma separated list rigmarole into separate columns that we can work with.
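As a possible variation (not part of the original answer), the position can be driven by a small derived table of integers instead of one hand-written expression per position. The sketch below reuses the demo data; the inline view n and its column pos are names assumed for illustration, and the list of positions still has to be at least as long as the longest list.

```sql
-- Sketch: one output row per position, using a derived table of positions 1..8.
SELECT n.pos
     , AVG( NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT( t.foo ,REPEAT(',',255)),',', n.pos ),',',-1),'')+0 ) AS avg_val
  FROM (
         SELECT 1 AS rn, '1,211,234,255,222,202,217,205' AS foo
         UNION ALL SELECT 2,'8218,121,129,127,352,382,339'
         UNION ALL SELECT 3,'352,8216,144,104'
         UNION ALL SELECT 4,'111,100,109'
       ) t
 CROSS
  JOIN (
         SELECT 1 AS pos UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
         UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
       ) n
 GROUP
    BY n.pos
 ORDER
    BY n.pos
;
```

For the demo rows this should give 2170.5 at position 1 and 205 at position 8, where only the first list still has a value. The trade-off is that the result comes back as rows rather than columns, and every list is re-parsed once per position.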
