Classic big data project case: Didi data analysis (cloud server, from zero configuration to project implementation, part 1)

This time, our project uses an Alibaba Cloud server and the following technologies and frameworks for data analysis:

  1. HDFS
  2. Hive
  3. Spark SQL
  4. Zeppelin

Of course, once the data has been cleaned, we can also visualize it with:

1. Tableau

2. Python + ECharts + a web front end

3. Tencent Cloud or Alibaba Cloud BI reports

4. Excel pivot tables and pivot charts

First, configure the environment on the cloud server

1. Hadoop configuration

Refer to the following blog to set up a Hadoop pseudo-distributed environment on CentOS 7.2 on an Alibaba Cloud server:

Building a Hadoop pseudo-distributed environment on CentOS 7.2 on an Alibaba Cloud server: https://blog.csdn.net/feng_zhiyu/article/details/81018869

If you run into problems there, this zero-foundation tutorial is an alternative:

Hadoop pseudo-distributed cluster installation and configuration practice (mb634aa19ba764f's technical blog, 51CTO). This article teaches a zero-foundation beginner how to build a Hadoop pseudo-distributed cluster through practical demonstration. It first introduces the concept and working principle of Hadoop and the structure and function of a Hadoop cluster, then walks through installing and configuring a pseudo-distributed cluster, including installing the operating system, configuring Java environment variables, and initializing the Hadoop file system. https://blog.51cto.com/u_15831056/6237232

Be sure to pay attention to the Java environment configuration!!!

Otherwise Hadoop may fail to find Java.
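A common fix is to hard-code JAVA_HOME in Hadoop's environment file rather than relying on the login shell; the JDK path below is only an example, so substitute your actual installation directory:

```shell
# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, set JAVA_HOME explicitly.
# Find your JDK path first if unsure, e.g.:
#   dirname $(dirname $(readlink -f $(which java)))
export JAVA_HOME=/usr/local/jdk1.8.0_212   # example path, adjust to yours
```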

2. MySQL database configuration

Refer to the following blog:

Detailed steps to install MySQL on CentOS 7: https://blog.csdn.net/Bb15070047748/article/details/106245223

The key commands from that guide:

yum -y install wget
wget https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
rpm -ivh mysql57-community-release-el7-8.noarch.rpm
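After adding the repository package, MySQL 5.7 itself can be installed and started roughly as below; the log path is the CentOS 7 default and may differ on your system:

```shell
# Install the MySQL 5.7 server from the repository added above
yum -y install mysql-community-server

# Start the service and enable it on boot
systemctl start mysqld
systemctl enable mysqld

# MySQL 5.7 generates a temporary root password on first start;
# retrieve it from the log, then log in and change it
grep 'temporary password' /var/log/mysqld.log
mysql -u root -p
```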

 

Use Navicat to connect to the MySQL database on the Alibaba Cloud server (CSDN article 127234913). Note that you typically need to open port 3306 in the server's security group and grant the MySQL user permission to connect from remote hosts.

3. Hive installation and configuration

Install Hive 3.1.2 + MySQL 5.7 on CentOS 7 (Zheng Xiangxiang's blog, CSDN): https://blog.csdn.net/qq_51490070/article/details/123718952

That article covers: unzipping the Hive installation package, configuring environment variables, resolving log jar conflicts, initializing the metastore, starting HDFS, YARN, and the history server, starting Hive and checking its startup log, and installing MySQL and copying the JDBC driver.

Because everyone's file layout is different, be clear about where your own files are configured!!!

Download the Hive package into a local directory:

wget https://mirrors.aliyun.com/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Unzip the tar package into the /usr/local/ directory and rename it to hive:

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /usr/local/
mv /usr/local/apache-hive-3.1.2-bin /usr/local/hive

Configure environment variables

Add the following to ~/.bashrc or /etc/bashrc:

export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin

Make changes take effect immediately:

source ~/.bashrc

At this point, the installation and configuration of Hive is complete.

Alternatively, you can add the same environment variables to the /etc/profile file:

export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
  • source /etc/profile
    

     Then initialize the metastore database.
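The initialization step is typically done with Hive's bundled schematool, once the MySQL metastore database and hive-site.xml (described next) are in place; a minimal sketch:

```shell
# Initialize the Hive metastore schema in MySQL.
# Run this once, after hive-site.xml points at your MySQL metastore
# database; $HIVE_HOME/bin must be on PATH (configured earlier).
schematool -dbType mysql -initSchema
```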

  • Configure the Hive metastore: Hive stores its metadata in a relational database. You can use the following commands to create a MySQL database and grant the hive user access to it:

  • mysql -u root -p
    create database metastore;
    grant all privileges on metastore.* to 'hive'@'localhost' identified by 'your_password';
    

    Then, in the Hive configuration file hive-site.xml, set properties such as javax.jdo.option.ConnectionURL, javax.jdo.option.ConnectionUserName, and javax.jdo.option.ConnectionPassword to the MySQL connection information.
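A minimal hive-site.xml sketch, assuming the metastore database, user, and password created above, MySQL on the same host, and the Connector/J 5.x driver (for Connector/J 8.x the driver class is com.mysql.cj.jdbc.Driver):

```xml
<configuration>
  <!-- JDBC connection string for the metastore database created above -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?useSSL=false</value>
  </property>
  <!-- JDBC driver class (Connector/J 5.x) -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>your_password</value>
  </property>
</configuration>
```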

  • Start Hive: Hive can be started with the following command:

hive

If all went well, you should see Hive's command-line interface and be able to execute HiveQL statements.
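Once the CLI is up, a quick smoke test confirms that the metastore connection works; the table name below is just an example:

```sql
-- run inside the hive CLI
show databases;
create table if not exists smoke_test (id int, name string);
show tables;
drop table smoke_test;
```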

4. Connecting Hive to the database

 

 


Origin blog.csdn.net/m0_62338174/article/details/130660664