Hadoop Ecosystem -- Hive (1): Getting Started with Hive

What is Hive?

Hive is a data warehouse built on top of Hadoop for processing structured data, and it is suited to OLAP workloads. It stores the table structure (schema) in a relational database, while the data to be processed lives in HDFS.

Writing MapReduce (MR) programs by hand is heavy work. To simplify it, Hive provides a framework that translates an SQL-like query into an MR program and submits it as a job that queries data on HDFS. But which table on HDFS does the query run against? That is the other feature Hive provides: it maps a file on HDFS to a Hive table, and at query time the job processes that file on HDFS.

So how does Hive record the mapping between its tables and the files on HDFS? Hive records the mapping between tables and HDFS files in MySQL; that is, the metadata is stored in MySQL.
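As a sketch of what this metadata looks like: in the standard Hive metastore schema, table definitions live in the `TBLS` table and storage descriptors (including the HDFS location) in `SDS`. A query like the following, run directly against the MySQL metastore database (not through the Hive CLI), lists each Hive table and the HDFS path it maps to:

```sql
-- Run against the MySQL metastore database itself, not through Hive.
-- TBLS holds table definitions; SDS holds storage info, including LOCATION.
SELECT t.TBL_NAME, s.LOCATION
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID;
```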

 

In short: Hive translates SQL-like queries into MR programs; MySQL stores the mapping between the tables being queried and their files; the data lives on HDFS, and the programs run against it there.

The working mechanism when a table is created in Hive: MySQL records the table definition, a directory is created in HDFS, and the data placed in that directory is what gets processed at query time.
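This mechanism can be sketched with a minimal example (the table name and file path below are hypothetical). Creating the table records the schema in the MySQL metastore and creates a directory in HDFS (by default under `/user/hive/warehouse/<table_name>`); loading a file simply moves it into that directory:

```sql
-- Creating the table: schema goes to the metastore,
-- a directory is created in the HDFS warehouse.
CREATE TABLE page_views (
    user_id BIGINT,
    url     STRING,
    view_ts STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- Loading data: the file is moved into the table's HDFS directory.
LOAD DATA INPATH '/data/page_views.csv' INTO TABLE page_views;
```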

Databases vs. data warehouses:

Databases and data warehouses are, in essence, the same thing: both organize and manage data according to a model. They just serve different scenarios. A database serves business transaction processing (OLTP) and is an operational database, while a data warehouse focuses on data analysis and processing (OLAP) and is an analytical database; as a result, the data models of the two differ greatly.

Transactional databases generally pursue speed, integrity, and consistency, so their models follow the normal forms to minimize data redundancy and guarantee referential integrity. Data warehouses instead emphasize the efficiency of data analysis, the speed of complex queries, and correlation analysis across the data, so they prefer multidimensional models that improve analytical efficiency.

Discussing the difference between databases and data warehouses is really a discussion of OLTP vs. OLAP:

Data processing can be roughly divided into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the main application of traditional relational databases, handling basic, routine transactions such as bank transfers. OLAP is the main application of data warehouse systems; it supports sophisticated analysis operations, focuses on decision support, and provides intuitive query results.

OLTP: online transaction processing. It keeps day-to-day business operations running, and is also called a transaction-oriented system: the database operates online daily, usually touching a small number of records with modifications tied to specific business actions. Its main concerns are response time to user operations, data security and integrity, and the number of concurrent users supported. Traditional database management systems are the primary tool here, used mainly for operational processing.

OLAP: online analytical processing. It is responsible for discovering the business value hidden in accumulated data, generally analyzing historical data around particular subjects to support management decisions.


Hive installation and deployment

Before deploying, consider what needs to be deployed and where.
What to deploy? Hive, HDFS (to store the data), and MySQL (to store the metadata). The machine running Hive needs Hadoop installed, and must be able to reach MySQL (remote or local).
Where to deploy? As far as HDFS is concerned, Hive is just a client, so it can be installed anywhere that can communicate with HDFS; it does not have to be on a node of the Hadoop cluster.
 Here are the steps:
0. Install Hadoop and MySQL
Hadoop: set up the Hadoop cluster, and install Hadoop on the machine that will run Hive as a client. MySQL: install it on any machine that the Hive machine can reach.
1. Download and unpack Hive
2. Add hive-site.xml under $HIVE_HOME/conf
Configure the MySQL connection that stores the metadata, including the database address, user name, and password. Contents:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.1.8:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>admin</value>
    </property>
</configuration>
3. Copy the dependency JAR into lib
Copy mysql-connector-java-5.1.37.jar into $HIVE_HOME/lib. Note: download the JAR matching your MySQL version from the Maven repository.
4. Configure HADOOP_HOME and HIVE_HOME
Edit the configuration file and add the environment variables:
vim /etc/profile   
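The lines to add look like the following (the install paths below are assumptions; adjust them to your own layout):

```shell
# Assumed install locations -- change to match where you unpacked each one.
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
```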
Make the configuration take effect:
source /etc/profile
5. Some convenient settings
In the user's home directory on Linux, create a file named .hiverc and add the following:
[root@centos7 ~]# vim .hiverc
set hive.cli.print.header=true;
set hive.cli.print.current.db=true;
The two settings above make query results display column headers and make the prompt show the database currently in use.

 


Origin: www.cnblogs.com/Jing-Wang/p/10990886.html