Hive: commonly used data analysis indicators (web access data, user activity, and paid behavior)


      Here are a few commonly used data analysis indicators. I will gradually add more indicators that I use at work, or even create myself. The SQL will be filled in over time as well.

1. Web access data indicators

1. UV (Unique Visitor) unique visitors

      UV is the number of unique visitors in a day, that is, the number of distinct uids after deduplication within one day.
      Each independent client terminal (computer, mobile phone, pad, etc.) that visits counts as one unique visitor, with the MAC address nominally serving as the unique ID. In theory (and only in theory), a terminal that visits repeatedly within 24 hours is counted once. (PS: UV is not the same as Visits, which usually uses half an hour as the deduplication period.)

2. PV (Page View) page views

      PV is the total number of page visits in a day. The number of times web pages are viewed can also be called simply the number of visits or views. Some statistical tools count one PV every time the user refreshes (one reason many websites report a high PV). Because PV is usually the largest figure among indicators such as UV, IP, RU, and WAU, it is currently the most commonly quoted figure in the statistics Internet companies publish.
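
      As a rough illustration of the UV and PV definitions above, here is a minimal Python sketch. The `Hit` record and its `day`, `uid`, and `url` fields are hypothetical stand-ins for whatever the access log actually contains.

```python
from collections import namedtuple

# Hypothetical log record: one row per page request.
Hit = namedtuple("Hit", ["day", "uid", "url"])

def pv_and_uv(hits, day):
    """Return (PV, UV) for one day: total requests and distinct uids."""
    todays = [h for h in hits if h.day == day]
    pv = len(todays)                   # every request counts, refreshes included
    uv = len({h.uid for h in todays})  # each uid counts once per day
    return pv, uv

hits = [
    Hit("2020-02-17", "u1", "/home"),
    Hit("2020-02-17", "u1", "/home"),  # a refresh: raises PV, not UV
    Hit("2020-02-17", "u2", "/cart"),
    Hit("2020-02-18", "u1", "/home"),
]
print(pv_and_uv(hits, "2020-02-17"))  # (3, 2)
```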

3. IP (Internet Protocol) independent IP

      Independent IP: the number of distinct IP addresses that visit within a day.
      New IP: an IP that has never appeared before. That is, the number of IPs, deduplicated within a day, that have never appeared in historical data.

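-- New IPs on 2020-02-17: distinct cip values seen that day
-- that never appear on any earlier day.
select count(distinct dataclear.cip)
from dataclear
where dataclear.reportTime = '2020-02-17'
and dataclear.cip not in
(
	select dc2.cip
	from dataclear as dc2
	where dc2.reportTime < '2020-02-17'
	-- a NULL in the subquery would make NOT IN match nothing
	and dc2.cip is not null
);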

4. BR (Bounce Rate) bounce rate

      The percentage of visitors who arrive at the landing page and leave directly, without clicking through to any other page or interacting in any other way, out of all visitors to the landing page. This indicator can measure the quality of a web page or a website.

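-- The fields and tables will be explained in a few days.
select
	round(br_taba.a / br_tabb.b, 4) as br
from (
	-- a: sessions (ssid) with exactly one record that day, i.e. bounces
	select count(*) as a
	from (
		select ssid
		from dataclear
		where reportTime = '2020-02-17'
		group by ssid
		having count(ssid) = 1
	) as br_tab
) as br_taba,
(
	-- b: all distinct sessions that day
	select count(distinct ssid) as b
	from dataclear
	where reportTime = '2020-02-17'
) as br_tabb;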

2. Statistical indicators of user activity data

1. RU (Registered Users) registered users

      The number of users who have completed registration. Strictly, this should count only registered users who have been activated through valid verification; a looser, inflated figure counts everyone who filled in and submitted the registration form.

2. AU (Active Users) active users

      Users who have logged in or used a product within a certain period of time.

3. DAU (Daily Active Users) daily active users

      The number of users who log in or use a product in a single day (a user who logs in repeatedly is counted once). Paid gaming websites typically use DAU.

4. MAU (Monthly Active Users) monthly active users

      Extending the DAU statistical period to one month gives MAU.
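
      A small Python sketch of the DAU and MAU definitions above. It assumes a hypothetical list of `(day, uid)` login records with days written as `YYYY-MM-DD` strings.

```python
def dau(logins, day):
    """Distinct users active on one day; repeat logins count once."""
    return len({uid for d, uid in logins if d == day})

def mau(logins, month):
    """Distinct users active in a month given as 'YYYY-MM'."""
    return len({uid for d, uid in logins if d.startswith(month)})

logins = [
    ("2020-02-17", "u1"), ("2020-02-17", "u1"),  # u1 logs in twice
    ("2020-02-17", "u2"),
    ("2020-02-18", "u1"),
    ("2020-02-20", "u3"),
    ("2020-03-01", "u4"),
]
print(dau(logins, "2020-02-17"))  # 2
print(mau(logins, "2020-02"))     # 3
```

      Note that MAU is deduplicated over the whole month, so it is not the sum of the daily DAU values (which would be 4 for February in this sample).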

5. DNU (Daily New Users) daily new users

      That is, the number of users newly registered and logged in that day.

6. ACU (Average Concurrent Users) average number of simultaneous online users

      Usually calculated as the sum of the simultaneous online user counts taken each hour over 24 hours, divided by 24.

7. PCU (Peak Concurrent Users) peak simultaneous online users

      The highest number of users online at the same time within 24 hours. To report a higher figure, you can use the largest simultaneous online count recorded within any one hour; to be stricter, you can count the instantaneous peak of simultaneous online users at a particular second.
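
      The ACU and PCU calculations can be sketched as follows, assuming one concurrent-user sample per hour. The sample values are made up for illustration.

```python
def acu(hourly_counts):
    """Average concurrent users: mean of the hourly samples."""
    return sum(hourly_counts) / len(hourly_counts)

def pcu(hourly_counts):
    """Peak concurrent users: the largest hourly sample."""
    return max(hourly_counts)

# 24 hypothetical hourly samples for one day
samples = [100] * 12 + [200] * 11 + [500]
print(acu(samples))  # 162.5
print(pcu(samples))  # 500
```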

8. TS (Time Spending) average online time per user

      The total online duration of all online users in the period divided by the number of online users in the period.
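
      As a tiny worked example of this formula, here is a Python sketch. The mapping from uid to total online seconds is a hypothetical input that would normally be aggregated from session logs.

```python
def average_online_time(seconds_by_user):
    """Total online seconds divided by the number of online users."""
    if not seconds_by_user:
        return 0.0
    return sum(seconds_by_user.values()) / len(seconds_by_user)

seconds_by_user = {"u1": 600, "u2": 1800, "u3": 1200}
print(average_online_time(seconds_by_user))  # 1200.0
```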

9. URR (User Retention Rate) user retention rate

      Among new users, the proportion who are still active after a certain period. Calculating retention by daily intervals is relatively strict; depending on how often a product is used, weekly intervals are often more reasonable, because few products require users to log in every day.
      Next-day retention: next-day retention for January 1, 2020 = number of users who visited on January 1, 2020 and visited again on January 2, 2020 / number of users who visited on January 1, 2020.
      7-day retention: 7-day retention for January 1, 2020 = number of users who visited on January 1, 2020 and visited again on January 8, 2020 / number of users who visited on January 1, 2020.
      Note: 7-day retention means visiting today and visiting again on the 7th day after today, not visiting on any day within the 7 days after today.

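-- cuid is the visitor id.
-- The one-row uv subquery is attached with a cross join (Cartesian product);
-- a left join would also work.
select
	count(t1.cuid) as ci,                 -- day-1 visitors who came back on day 2
	count(t1.cuid) / max(t11.uv) as ciL   -- next-day retention rate
from
(
	select
		cuid
	from tb_cuid_1d
	where event_day = "20190101"
	group by cuid
) t1
join
(
	select
		cuid
	from tb_cuid_1d
	where event_day = "20190102"
	group by cuid
) t2 on t1.cuid = t2.cuid
cross join
(
	select
		count(cuid) as uv
	from
	(
		select
			cuid
		from tb_cuid_1d
		where event_day = "20190101"
		group by cuid
	) t10
) t11
;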
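
      The difference between "visited again on day N" and "visited on any day within N days" is easy to get wrong, so here is a minimal Python sketch of the strict definition. The `visits` set of `(date, uid)` pairs is hypothetical sample data.

```python
from datetime import date, timedelta

def retention(visits, start, n):
    """Share of start-day visitors who visit again exactly n days later."""
    cohort = {uid for d, uid in visits if d == start}
    if not cohort:
        return 0.0
    target = start + timedelta(days=n)
    returned = {uid for d, uid in visits if d == target} & cohort
    return len(returned) / len(cohort)

d0 = date(2020, 1, 1)
visits = {
    (d0, "u1"), (d0, "u2"), (d0, "u3"), (d0, "u4"),
    (date(2020, 1, 2), "u1"), (date(2020, 1, 2), "u2"),
    (date(2020, 1, 5), "u3"),  # back within 7 days, but not on day 7
    (date(2020, 1, 8), "u1"),
}
print(retention(visits, d0, 1))  # 0.5
print(retention(visits, d0, 7))  # 0.25
```

      Here u3 returns on January 5 but is not counted in 7-day retention, because only a visit on January 8 qualifies.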

10. UCR (User Churn Rate) user churn rate

      The opposite of user retention rate: the proportion of new users who show no activity, such as logging in or using the product, after a certain period.
      User churn rate = (1 - user retention rate) * 100%

3. Statistical indicators of user payment behavior data

1. PU (Paying User) paying users

      Users who have made a payment. This indicator plays down the statistical period, so it is not commonly used in data statistics.

2. CR (Conversion Rate) paid conversion rate

      The number of new users who have made a payment divided by the total number of new users. The formula is similar to the payment conversion rate used in e-commerce.
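
      A short Python sketch of this ratio, assuming two hypothetical collections of user ids: the new users in the period and everyone who paid.

```python
def paid_conversion_rate(new_users, paying_users):
    """New users who paid, divided by all new users."""
    new_users = set(new_users)
    if not new_users:
        return 0.0
    return len(new_users & set(paying_users)) / len(new_users)

new_users = ["u1", "u2", "u3", "u4"]
paying_users = ["u2", "u9"]  # u9 paid but is not a new user
print(paid_conversion_rate(new_users, paying_users))  # 0.25
```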

3. ARPU (Average Revenue Per User) average revenue per user

      A measure of the revenue level of a paid product or business over a period of time. It is used mostly by telecom operators and online game companies, and less by retail e-commerce.

4. ARPPU (Average Revenue Per Paying User) average revenue per paying user

      ARPPU = Total revenue in a certain period / Total PU number in this period.
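
      To contrast ARPU with ARPPU, here is a minimal Python sketch. The article gives no explicit ARPU formula, so dividing total revenue by the number of active users in the period is an assumption here. The `payments` mapping and the user count are made-up sample data.

```python
def arpu(total_revenue, active_user_count):
    """Assumed formula: revenue in the period / active users in the period."""
    return total_revenue / active_user_count

def arppu(total_revenue, paying_user_count):
    """Revenue in the period / paying users (PU) in the period."""
    return total_revenue / paying_user_count

payments = {"u1": 30.0, "u2": 70.0}  # uid -> amount paid in the period
active_user_count = 50               # hypothetical active users in the period
revenue = sum(payments.values())

print(arpu(revenue, active_user_count))  # 2.0
print(arppu(revenue, len(payments)))     # 50.0
```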

5. APA (Active Payment Account) active paying users

      The number of paying users who remain active during the statistical period (active PUs). Users are generally identified by their registration ID. Silent paying users (silent PUs), who have paid in the past but showed no activity during the statistical period, must be excluded.

6. PUR (Paying User Rate) user payment rate

      The number of active paying users (APA) in the statistical period divided by the total number of active users (AU) in the same period, that is, PUR = APA / AU.
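
      The APA and PUR definitions can be sketched with set operations. Both id collections below are hypothetical: `active` holds users active in the period, and `payers` holds everyone who has ever paid.

```python
def apa(active, payers):
    """Active paying users: payers who were also active in the period."""
    return set(active) & set(payers)

def pur(active, payers):
    """PUR = APA / AU."""
    active = set(active)
    if not active:
        return 0.0
    return len(apa(active, payers)) / len(active)

active = {"u1", "u2", "u3", "u4"}
payers = {"u1", "u5"}  # u5 is a silent PU: paid before, inactive this period
print(apa(active, payers))  # {'u1'}
print(pur(active, payers))  # 0.25
```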

7. LTV (Life Time Value) lifetime value

      The total economic value a user contributes over the entire life cycle, from first login to last login. Because a user's full life cycle is usually hard to measure, in practice "LTV_N" is used more often: the total value contributed by new users within N days of their first login. This indicator is more flexible and practical.
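
      One possible reading of LTV_N, sketched in Python. The window is assumed to cover the login day plus the following N - 1 days. The `first_login` and `payments` structures are hypothetical.

```python
from datetime import date

def ltv_n(first_login, payments, n):
    """Total paid by cohort users within n days of their first login.

    first_login: uid -> date of first login (the new-user cohort)
    payments:    list of (uid, date, amount)
    Assumption: day 0 is the login day, so the window is days 0..n-1.
    """
    total = 0.0
    for uid, paid_on, amount in payments:
        start = first_login.get(uid)
        if start is None:
            continue  # not part of the new-user cohort
        if 0 <= (paid_on - start).days < n:
            total += amount
    return total

first_login = {"u1": date(2020, 1, 1), "u2": date(2020, 1, 3)}
payments = [
    ("u1", date(2020, 1, 1), 6.0),
    ("u1", date(2020, 1, 7), 30.0),
    ("u1", date(2020, 1, 8), 98.0),  # day 7, outside a 7-day window
    ("u2", date(2020, 1, 4), 12.0),
    ("u3", date(2020, 1, 4), 5.0),   # not a new user in this cohort
]
print(ltv_n(first_login, payments, 7))  # 48.0
```

      Dividing the total by the cohort size gives a per-user figure, if that form is more useful.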

4. Summary

      The above are only some of the more commonly used statistical terms for operational data, and by no means a complete list. Game and app operations use many more detailed indicators.
      The indicators that matter vary with the product's form and life cycle stage, and new statistical indicators keep being invented. As long as an indicator helps you analyze the products and business you operate, you can create it yourself. This is not a privilege reserved for certain authorities.

5. Indicators I don't know how to classify

      I'm still inexperienced, and there are a few indicators I don't know how to classify...

1. VV (two Vs, not a W)

      VV is the number of independent sessions in a day, that is, the number of distinct ssid values (ssid is the field that identifies the session) within a day.

2. New customers

      The number of new customers, that is, customers who have never appeared before. Pick a field as the customer's unique identifier and filter on it, in the same way as for new IPs. In other words: the number of distinct user_id values within a day (assuming user_id is the unique identifier) that have never appeared in historical data.
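
      The "seen today but never before" logic can be sketched in Python as a set difference. The `(day, user_id)` records are hypothetical, with days written as ISO strings so that string comparison matches date order.

```python
def new_customers(records, day):
    """user_ids seen on `day` that never appear on any earlier day."""
    today = {uid for d, uid in records if d == day}
    before = {uid for d, uid in records if d < day}
    return today - before

records = [
    ("2020-02-15", "u1"),
    ("2020-02-16", "u2"),
    ("2020-02-17", "u1"),  # returning customer
    ("2020-02-17", "u3"),
    ("2020-02-17", "u3"),  # duplicates are collapsed by the set
]
print(new_customers(records, "2020-02-17"))       # {'u3'}
print(len(new_customers(records, "2020-02-17")))  # 1
```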

3. Average visit duration

      The average duration of all sessions in a day (assuming the table has a field, stime, that records the access time). One session's duration = the maximum access time in the session - the minimum access time in the session. (This is still a bit unclear to me; I will study it later.)

select avg(atTab.usetime) as avgtime
from
(	select max(stime) - min(stime) as usetime
	from dataclear
	-- reportTime is the access time
	where reportTime='2020-02-17'
	group by ssid
) as atTab;

Origin blog.csdn.net/weixin_42845682/article/details/105128496