Data Analysis: A Summary

Background

  With the growth of the Internet and the rising popularity of artificial intelligence, we generate enormous amounts of data, and a great deal of useful, core information is hidden in it. For example, by collecting shopping data from platforms such as Taobao, Jingdong, and Tmall, you can get a rough picture of what users prefer to buy, build a knowledge graph from it, and then recommend products to users through a recommendation algorithm, which promotes consumption. Data analysis is therefore becoming more and more important, and it is also one of the building blocks of recommendation.

1. The concept of data analysis

  Data analysis means applying appropriate statistical methods to large amounts of collected data in order to summarize, understand, and digest them, so that the data can be put to the fullest possible use. It is a process of studying and summarizing data in detail to extract useful information and form conclusions. The mathematical foundations of data analysis were established in the early 20th century, but it was not until the advent of computers that the calculations became practical and data analysis spread widely. Data analysis is the product of combining mathematics with computer science.
  The purpose of data analysis is to concentrate and refine the information hidden in large amounts of seemingly chaotic data, so as to uncover the underlying patterns of the subject under study. In practice, data analysis helps people make judgments so that they can take appropriate action. It is the process of collecting data in an organized, purposeful way, analyzing it, and turning it into information, and it is a supporting process of the quality management system. Throughout the product life cycle, from market research through after-sales service to final disposal, data analysis should be applied where appropriate to improve effectiveness. For example, before starting a new design, a designer conducts extensive design research and analyzes the resulting data to determine the design direction. Data analysis therefore plays an extremely important role in industrial design.
  In statistics, data analysis is sometimes divided into descriptive statistical analysis, exploratory data analysis, and confirmatory data analysis. Exploratory data analysis focuses on discovering new features in the data, while confirmatory data analysis focuses on verifying or falsifying existing hypotheses. In this article, data analysis mainly covers exploratory data analysis, qualitative data analysis, offline data analysis, and online data analysis.
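  To make the difference between descriptive and exploratory analysis concrete, here is a minimal sketch in Python 3 using only the standard library. The order records and the function names `describe` and `mean_by_group` are made up for illustration and do not come from any real dataset or library.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical order records: (product category, order amount).
orders = [
    ("books", 40.0), ("books", 60.0), ("books", 50.0),
    ("phones", 2900.0), ("phones", 3100.0),
]

def describe(values):
    """Descriptive statistics: summarize one column of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }

def mean_by_group(records):
    """Exploratory step: compare group means to look for structure."""
    groups = defaultdict(list)
    for category, amount in records:
        groups[category].append(amount)
    return {category: mean(amounts) for category, amounts in groups.items()}

overall = describe([amount for _, amount in orders])
by_category = mean_by_group(orders)
print(overall)
print(by_category)
```

  The overall mean sits far above the median, which hints that the data mixes very different groups. Splitting by category is the exploratory step that exposes this, and a confirmatory analysis would then test a specific hypothesis about those groups.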

2. Types of data analysis

  The types of data analysis were mentioned above; here is a brief introduction to each:

  1. Exploratory data analysis: a way of analyzing data in order to form hypotheses worth testing. It complements traditional statistical hypothesis testing. The approach was named by the well-known American statistician John Tukey.
  2. Qualitative data analysis: also called qualitative research or qualitative research data analysis. It refers to the analysis of non-numerical data (or materials) such as text, photos, and observations.
  3. Offline data analysis: used for more complex and time-consuming analysis and processing. It is generally built on a cloud computing platform, for example the open-source HDFS file system and the MapReduce computing framework. A Hadoop cluster can consist of hundreds or even thousands of servers storing several petabytes or even tens of petabytes of data. Thousands of offline analysis jobs run every day; each job processes anywhere from hundreds of megabytes to hundreds of terabytes or more, and runs for minutes, hours, days, or even longer.
  4. Online data analysis: also called online analytical processing (OLAP), it handles users' online requests and has strict response-time requirements (usually no more than a few seconds). Unlike offline data analysis, it processes user requests in real time and lets users change the constraints of the analysis at any time. It handles much less data than offline analysis, but as technology has developed, current online analysis systems can process tens of millions or even hundreds of millions of records in real time. Traditional online analysis systems are built on a data warehouse with a relational database at its core, while online big data analysis systems are built on NoSQL systems on cloud computing platforms. Without online analysis and processing of big data, it would be impossible to store and index the huge number of pages on the Internet, and there would be no efficient search engines today, nor the microblogs, blogs, and social networks that are built on big data processing.
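  The MapReduce idea behind offline batch jobs can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps only, not Hadoop code; the log format and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log lines in the form "user_id,category,amount".
log_lines = [
    "u1,books,40",
    "u2,phones,2900",
    "u1,books,60",
    "u3,phones,3100",
]

def map_phase(line):
    """Map: emit one (key, value) pair per input record."""
    _user, category, amount = line.split(",")
    yield category, float(amount)

def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

pairs = (pair for line in log_lines for pair in map_phase(line))
totals = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(totals)
```

  In a real cluster the map and reduce calls run on many machines and the shuffle moves data across the network, which is why such jobs suit offline analysis rather than requests that need an answer within seconds.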

3. Analysis method

  The main methods of data analysis are the tabulation method and the graphing method.
  The tabulation method presents data in a table according to some rule, and it is the most commonly used way of recording and processing data. A table should make the correspondences clear and be simple, so that relationships between the quantities are easy to spot. The header row should state the name, symbol, order of magnitude, and unit of each quantity. Where needed, calculated columns and statistical columns can be added alongside the raw data.
  The graphing method shows most strikingly how quantities change in relation to one another. Some results an experiment calls for can be read directly from a graph, and some complex functional relationships can be represented graphically after a suitable transformation. There are two main ways to produce charts and graphs: drawing them by hand and generating them automatically with a program. Program-generated charts rely on software such as SPSS, Excel, or MATLAB: you enter the survey data, the software performs the calculation, and the results are displayed as charts or graphs. Charts reflect research results directly, which saves designers a great deal of time, helps them analyze and predict what the market needs, and paves the way for further design work. The same forms of analysis are used in product sales statistics, where they present the latest sales figures intuitively and support timely analysis and forecasting of future sales. Data analysis methods are therefore widely used in industrial design and are extremely important there.
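  Both methods can also be generated by a short program. The sketch below, in standard-library Python 3, builds a table with a header, a calculated share column, and a total row, then draws a text bar chart from the same data. The sales figures and the function names `tabulate` and `bar_chart` are hypothetical, and a plotting package would normally replace the text chart.

```python
# Hypothetical monthly sales figures (units sold).
sales = {"Jan": 120, "Feb": 90, "Mar": 150}

def tabulate(data):
    """Tabulation: header with name and unit, plus a calculated share column."""
    total = sum(data.values())
    lines = [f"{'Month':<6}{'Units sold':>12}{'Share (%)':>12}"]
    for month, units in data.items():
        share = 100 * units / total
        lines.append(f"{month:<6}{units:>12}{share:>12.1f}")
    lines.append(f"{'Total':<6}{total:>12}{100.0:>12.1f}")
    return "\n".join(lines)

def bar_chart(data, scale=10):
    """Graphing: one text bar per row, one '#' per `scale` units."""
    return "\n".join(
        f"{month:<4}{'#' * (units // scale)} {units}"
        for month, units in data.items()
    )

table = tabulate(sales)
chart = bar_chart(sales)
print(table)
print(chart)
```

  The table keeps the exact numbers and the calculated column, while the chart makes the dip in February and the rise in March visible at a glance.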

4. Why study data analysis?

  We have covered the concepts, types, and common methods of data analysis. So why should we learn it?
  In our daily work, we keep running into problems such as the following:

  • 1. Was this month's KPI met? The business data has not improved, and the analysis has had no visible effect.
  • 2. DBM staff who write SQL pull data for the business departments every day, doing the same boring work day after day.
  • 3. The work is unsystematic and fragmented.
  • 4. Every project report to management is scattered and lacks focus.

  The main components of data analysis are data thinking, business knowledge, Excel, data visualization, SQL, statistics, and Python (Python 3 is used here). What really determines the ceiling of a data analyst is thinking ability and business understanding; the tools are only a means of applying them.

5. The structural levels of data analysis

1. Collection of underlying data/product-side collection

  Data collection, commonly called event tracking (literally "buried points"), gathers user data on the web, in the product, and on client devices, along with third-party external data. Note that the raw data here is generated by user behavior.

2. Data for the business/what kind of data the product needs

  Transform the collected data into business indicators that can be understood, quantified, and observed. Data on its own is a pile of isolated numbers with no meaning; only when it is connected to day-to-day business does it deliver real value. This is the step from raw data to processed data.
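  As a small illustration of turning raw behavior data into business indicators, the sketch below computes an active-user count and two conversion rates from tracked events. The event names, the user IDs, and the functions `users_with` and `conversion_rate` are assumptions made for the example; real tracking data would carry timestamps and many more fields.

```python
# Hypothetical tracked events: (user_id, event_name).
events = [
    ("u1", "view"), ("u1", "add_to_cart"), ("u1", "purchase"),
    ("u2", "view"), ("u2", "add_to_cart"),
    ("u3", "view"),
    ("u4", "view"), ("u4", "purchase"),
]

def users_with(event_name, records):
    """Distinct users who triggered the given event at least once."""
    return {user for user, name in records if name == event_name}

def conversion_rate(records, start="view", goal="purchase"):
    """Share of users who performed `start` and also performed `goal`."""
    starters = users_with(start, records)
    if not starters:
        return 0.0
    return len(starters & users_with(goal, records)) / len(starters)

indicators = {
    "active_users": len({user for user, _ in events}),
    "view_to_purchase": conversion_rate(events),
    "view_to_cart": conversion_rate(events, goal="add_to_cart"),
}
print(indicators)
```

  Each indicator is a number the business can watch over time, which is what separates processed data from the isolated raw records it was computed from.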

3. Data decision and execution/how to make the product better

  Once insights have been gained from the data, they need to be turned into strategies. This stage still includes analysis, and execution covers not only formulating strategies but also optimizing and improving them continuously. This is the step from visualized data and information to data-driven decisions.

4. Data models/products begin to operate automatically and systematically

  Once you understand the patterns in the data, this level turns strategies into data applications and products. For example: what kinds of users like what, which products will be purchased, and which kinds of campaigns work better. The answers to these questions need to be built into a mind map or a system.
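  One very simple way to turn the question of which product will be purchased into an automatic rule is to count which products are bought together. The sketch below is only an illustration of that idea, not the recommendation algorithm of any particular platform; the baskets and the functions `co_purchase_counts` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets, one set of products per order.
baskets = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"book", "bookmark"},
]

def co_purchase_counts(orders):
    """Count how often each ordered pair of products shares a basket."""
    counts = Counter()
    for basket in orders:
        counts.update(permutations(sorted(basket), 2))
    return counts

def recommend(product, counts, top_n=2):
    """Products most often bought together with `product`."""
    related = Counter(
        {other: n for (item, other), n in counts.items() if item == product}
    )
    return [other for other, _ in related.most_common(top_n)]

counts = co_purchase_counts(baskets)
print(recommend("phone", counts))
print(recommend("book", counts))
```

  Once a rule like this runs on fresh order data every day, the insight no longer depends on an analyst repeating the work by hand.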

5. Data strategy/guide the future

  This is the last level and the most important. Once a company has accumulated a large volume of data and many data applications, a company-level data system has taken shape. At that point the goal is not just data analysis: the data should be turned into realized value. This is the step from data tools to data systems and strategy.
  The general framework diagram is as follows:
  [Figure: general framework diagram of the data analysis levels]

  Tools for data analysis. Each tool on its own is limited, so we should combine them. For example, MySQL combined with Python, or Excel combined with SQL, makes a good data analysis toolkit. The specific combinations are as follows:
  [Figure: combinations of data analysis tools]
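  As one illustration of combining a SQL database with Python, here is a minimal sketch. It uses the standard-library `sqlite3` module with an in-memory database as a stand-in for a MySQL server, so that it runs without any setup; the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("u1", "books", 40.0),
        ("u1", "books", 60.0),
        ("u2", "phones", 2900.0),
        ("u3", "phones", 3100.0),
    ],
)

# SQL does the grouping and aggregation close to the data.
rows = conn.execute(
    "SELECT category, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY category ORDER BY category"
).fetchall()
conn.close()

# Python takes over for follow-up calculations on the small result set.
summary = {
    category: {"orders": n, "revenue": total, "avg_order": total / n}
    for category, n, total in rows
}
print(summary)
```

  The division of labor is the point: the database reduces many rows to a few, and Python handles whatever calculation or presentation comes next.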

  For a data analyst, the most important thing is generally business analysis ability. Business understanding is the core competitive strength, and it is a results-oriented ability. Next comes data analysis ability itself: data acumen, statistical knowledge, and so on, which take a long time to develop. Last comes the use of tools. The tools are very useful, but at work they only assist us in reaching conclusions. Data analysts therefore tend to develop toward either business or data analysis ability, and when learning data analysis we should focus on cultivating analytical thinking and sensitivity to data rather than on the tools themselves.

Summary

  This article has introduced data analysis, which has become quite popular in recent years. The field calls for proficiency in statistics, MySQL, Excel, Python, and machine learning. The article covered the concepts, types, and methods of data analysis, why you should learn it, and its structural levels. The following articles will briefly introduce data analysis thinking, the business side of data analysis, Excel, data visualization, SQL databases, statistics, and Python, so that you can see what there is to learn from a macro perspective, starting with data analysis thinking in the next article. I hope these articles give you an overall understanding of data analysis. Life goes on and so does the effort: if we work hard, study hard, and keep improving every day, we will surely learn something. Keep going!

Origin blog.csdn.net/Oliverfly1/article/details/108816045