What is Hive?
Hive is a data warehouse built on Hadoop for processing structured data, and it is suited to OLAP workloads. It stores the data structure (the schema) in a database, while the data itself is stored and processed in HDFS.
Writing MapReduce (MR) programs by hand is heavy work. To simplify it, Hive provides a framework that translates SQL-like queries into MR programs and submits them as jobs that run against data on HDFS. But which table on HDFS is being queried? That is the other feature Hive provides: it maps a file on HDFS to a Hive table, so that at query time the job processes that file on HDFS.
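To make the translation idea concrete, here is a minimal Python sketch of how a query such as SELECT dept, COUNT(*) FROM emp GROUP BY dept could break down into map, shuffle and reduce phases. The emp table, its columns and the sample rows are hypothetical, and this is only an in-memory imitation of the idea, not the code Hive actually generates.

```python
from collections import defaultdict

# Hypothetical contents of a file on HDFS: one comma-separated record per line.
lines = [
    "1,alice,sales",
    "2,bob,engineering",
    "3,carol,sales",
    "4,dave,engineering",
    "5,erin,sales",
]


def map_phase(line):
    """Emit a (dept, 1) pair for each input record."""
    _id, _name, dept = line.split(",")
    yield dept, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


mapped = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)
```

The GROUP BY column becomes the map output key and the aggregate becomes the reduce function, which is roughly the correspondence that lets a SQL-like query be expressed as an MR job.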
How is the relationship between Hive tables and HDFS files recorded? When Hive creates a table mapping, it records the relationship between the Hive table and the HDFS files in MySQL (the metadata is stored in MySQL).
In short: Hive translates SQL-like queries into MR programs; MySQL stores the mapping between the tables being queried and their files; and the programs run against the files on HDFS.
How creating a table works inside Hive: the table definition is recorded in MySQL and a directory is created in HDFS; data files stored in that directory become the table's data for processing.
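The mechanism can be imitated in a few lines of Python. This is a sketch under loud assumptions: SQLite stands in for the MySQL metastore, a local temporary directory stands in for HDFS, and the tbls table, create_table and scan_table are hypothetical names invented for illustration, not Hive's real schema or API.

```python
import os
import sqlite3
import tempfile

# Stand-ins (assumptions): SQLite plays the MySQL metastore,
# and a local temp directory plays HDFS.
metastore = sqlite3.connect(":memory:")
metastore.execute(
    "CREATE TABLE tbls (name TEXT PRIMARY KEY, columns TEXT, location TEXT)"
)
warehouse = tempfile.mkdtemp()


def create_table(name, columns):
    """Record the table definition and create the table's data directory."""
    location = os.path.join(warehouse, name)
    os.makedirs(location)
    metastore.execute(
        "INSERT INTO tbls VALUES (?, ?, ?)", (name, ",".join(columns), location)
    )
    return location


def scan_table(name):
    """Look up the mapping, then read every file under the table's directory."""
    columns, location = metastore.execute(
        "SELECT columns, location FROM tbls WHERE name = ?", (name,)
    ).fetchone()
    cols = columns.split(",")
    rows = []
    for filename in sorted(os.listdir(location)):
        with open(os.path.join(location, filename)) as f:
            for line in f:
                rows.append(dict(zip(cols, line.rstrip("\n").split(","))))
    return rows


location = create_table("emp", ["id", "name", "dept"])

# "Loading" data is just placing a file in the table's directory.
with open(os.path.join(location, "part-0.csv"), "w") as f:
    f.write("1,alice,sales\n2,bob,engineering\n")

rows = scan_table("emp")
print(rows)
```

The point of the sketch is that the schema lives in one place and the data in another: the column names are applied to the files only when the table is read.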
Databases and data warehouses:
Databases and data warehouses are in essence the same thing: both organize and manage data according to a model. They differ in usage scenario. A database mostly serves business transaction processing (OLTP) and is operational, whereas a data warehouse is mostly concerned with data analysis (OLAP) and is analytical. As a result, their data models differ considerably.
Transactional databases generally pursue speed, integrity and data consistency. Their data models follow the normal forms, minimizing data redundancy and guaranteeing referential integrity. Data warehouses emphasize the efficiency of data analysis, the speed of complex queries and the analysis of correlations in the data, so they tend to use a multidimensional model to make analysis more efficient.
Discussing the difference between databases and data warehouses is really discussing the difference between OLTP and OLAP:
Data processing can be roughly divided into two categories: online transaction processing (OLTP) and online analytical processing (OLAP). OLTP is the main application of traditional relational databases and covers basic, routine transactions such as banking transactions. OLAP is the main application of data warehouse systems; it supports complex analytical operations, focuses on decision support, and presents query results in an intuitive form.
OLTP: online transaction processing. It keeps basic business services running normally. Also known as transaction-oriented processing, it operates on the database online during day-to-day business, usually querying or modifying a small number of records for a specific business operation. Its main concerns are response time, data security, integrity and the number of concurrent users supported. Traditional database management systems, as the primary means of managing data, are used mainly for this kind of operational processing.
OLAP: online analytical processing. It uncovers the value of the data a business accumulates over time. It generally analyzes historical data on particular subjects to support management decisions.
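The contrast between the two access patterns can be shown on a toy table. The sketch below uses Python's built-in sqlite3 module purely for illustration; the accounts table and its rows are made up, and a real OLAP workload would of course run on far larger data in a warehouse rather than in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "north", 100), (2, "north", 250), (3, "south", 400)],
)

# OLTP-style: a short transaction that touches a couple of rows by key.
with conn:
    conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")

# OLAP-style: an aggregate query that scans the whole table.
totals = dict(
    conn.execute("SELECT region, SUM(balance) FROM accounts GROUP BY region")
)
print(totals)
```

The transfer cares about atomicity and touches two rows; the summary cares about scan throughput and touches all of them.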
Hive installation and deployment
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.1.8:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>admin</value>
  </property>
</configuration>
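Before starting Hive, it can help to confirm that the metastore settings parse and that all four connection properties are present. The following is a small Python sketch using the standard library's xml.etree.ElementTree; it embeds a copy of the configuration above as a string, and load_properties and REQUIRED are hypothetical helper names, not part of Hive.

```python
import xml.etree.ElementTree as ET

# A copy of the configuration above, embedded so the sketch is self-contained.
HIVE_SITE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.1.8:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>admin</value>
  </property>
</configuration>"""

# The four properties that describe the metastore connection.
REQUIRED = [
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
]


def load_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


props = load_properties(HIVE_SITE)
missing = [name for name in REQUIRED if name not in props]
print("missing properties:", missing)
print("metastore URL:", props["javax.jdo.option.ConnectionURL"])
```

In practice the string would be replaced by the contents of the real configuration file; the path to that file depends on the installation and is not assumed here.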