Big data introductory video course: an explanation of Hive

  Learning big data technology seems to be a goal for many people these days. After all, the job prospects and salaries are plain to see. What we need to know is that learning big data technology starts with Hadoop, and Hive, the data warehouse tool in the Hadoop ecosystem, is a top priority. So today, in addition to sharing an introductory big data video course, I have also put together some technical notes on Hive.

  Hadoop is the core technology of the big data era, but working directly with Hadoop and MapReduce requires a lot of specialized programming knowledge. Facebook therefore built the Hive framework on top of them. After all, far more people in the world know SQL than know Java. Hive can be seen as a breakthrough in making Hadoop-related technology accessible, and people who are new to Hadoop development can start with Hive.
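  To make this concrete, take a HiveQL query such as `SELECT word, COUNT(*) FROM words GROUP BY word`. The table `words` and its column `word` are hypothetical. With the classic MapReduce execution engine, Hive turns a query like this into a map, shuffle and reduce job that a Java developer would otherwise have to write by hand. The Python sketch below simulates those three phases in a single process. It illustrates the MapReduce model only; it is not the code Hive actually generates, and the function names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(rows: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every input row."""
    for word in rows:
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (real Hadoop also sorts the keys)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: add up the 1s for each key, i.e. COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[str]) -> Dict[str, int]:
    """Single-process equivalent of: SELECT word, COUNT(*) FROM words GROUP BY word."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(group_by_count(["hive", "hadoop", "hive", "hbase"]))
    # {'hive': 2, 'hadoop': 1, 'hbase': 1}
```

  With Hive, the one-line SQL statement replaces all three functions. That is exactly the barrier Hive removes for people who know SQL but not Java.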

  In a relational database, a table's schema is enforced when data is loaded. The schema defines the structure, such as the columns and their types, that the stored data must follow. If the data being loaded does not conform to the schema, the relational database refuses to load it. This is called "schema on write": the data is checked and validated against the schema at load time. Hive behaves differently. When loading data, Hive neither checks the data nor modifies the loaded data files. The data format is only checked when a query runs, which is called "schema on read". In practice, a schema-on-write system can index columns and compress data as it loads, so loading is slow, but once the data is in place, queries are fast. However, when the data is unstructured or its schema is unknown in advance, a relational database is much more cumbersome to work with, and this is where Hive shows its strengths.
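  The difference between the two modes can be sketched with two toy tables. Both use an assumed two-column schema (name STRING, age INT) with comma-separated fields. The write-time table validates each line on load and rejects bad ones. The read-time table stores raw lines untouched and applies the schema only when queried. A value that cannot be read as its column type, or a missing trailing field, comes back as None. This loosely mirrors how Hive returns NULL for such values with its default text format. The class and function names are hypothetical, and this is an illustration rather than Hive's implementation.

```python
from typing import List, Optional

# Assumed schema for illustration: (name STRING, age INT), comma-separated.
SCHEMA = [("name", str), ("age", int)]


def parse_field(raw: str, typ: type) -> Optional[object]:
    """Convert one raw field to its declared type, or None if it does not fit."""
    try:
        return typ(raw)
    except ValueError:
        return None


class WriteTimeTable:
    """Schema on write: every line is validated at load time; bad lines are rejected."""

    def __init__(self) -> None:
        self.rows: List[tuple] = []

    def load(self, line: str) -> None:
        fields = line.split(",")
        if len(fields) != len(SCHEMA):
            raise ValueError(f"rejected at load time: {line!r}")
        parsed = tuple(parse_field(f, t) for f, (_, t) in zip(fields, SCHEMA))
        if None in parsed:
            raise ValueError(f"rejected at load time: {line!r}")
        self.rows.append(parsed)

    def query(self) -> List[tuple]:
        return list(self.rows)  # already validated, no work at read time


class ReadTimeTable:
    """Schema on read: raw lines are kept as-is; the schema is applied per query."""

    def __init__(self) -> None:
        self.raw_lines: List[str] = []

    def load(self, line: str) -> None:
        self.raw_lines.append(line)  # no checking, data kept unchanged

    def query(self) -> List[tuple]:
        result = []
        for line in self.raw_lines:
            fields: List[Optional[str]] = list(line.split(","))
            # Pad missing trailing fields with None; zip() drops any extra fields.
            fields += [None] * (len(SCHEMA) - len(fields))
            result.append(tuple(
                None if f is None else parse_field(f, t)
                for f, (_, t) in zip(fields, SCHEMA)
            ))
        return result


if __name__ == "__main__":
    reader = ReadTimeTable()
    for line in ["alice,30", "bob,unknown"]:
        reader.load(line)  # both lines are accepted
    print(reader.query())  # [('alice', 30), ('bob', None)]

    writer = WriteTimeTable()
    writer.load("alice,30")
    try:
        writer.load("bob,unknown")
    except ValueError as err:
        print(err)  # the bad line never makes it into the table
```

  The trade-off shows up directly. Loading into ReadTimeTable costs nothing, but every query pays the parsing cost. WriteTimeTable pays once at load time and then serves already-clean rows.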

  An important feature of relational databases is the ability to update or delete one or more specific rows. Hive was not designed for operations on individual rows. Its data operations support only overwriting existing data and appending new data. Row-level updates, transactions and indexes are typical relational database features that Hive was not originally built to support. The reason is that Hive is designed to process massive datasets. Scanning all of the data is the normal access pattern, and operating on a few specific rows is very inefficient. To "update" data, Hive runs a query that transforms the original table and stores the result in a new table (or overwrites the old one). This is very different from an update in a traditional database. Later Hive releases, starting with 0.14, added limited ACID support, including UPDATE and DELETE on transactional tables, but the scan-and-rewrite design described here remains Hive's core model.
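  The rewrite-based update described above can be sketched in a few lines. In HiveQL it would look roughly like `INSERT OVERWRITE TABLE users SELECT id, CASE WHEN id = 2 THEN 'active' ELSE status END FROM users`, where the table `users` and its columns are hypothetical. The Python sketch below emulates the same idea. Every row is read, matching rows are changed in the copy, and a complete new table is returned while the input is left untouched. The helper names are made up for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def update_by_overwrite(
    table: List[Row],
    predicate: Callable[[Row], bool],
    changes: Dict[str, object],
) -> List[Row]:
    """Emulate UPDATE ... SET ... WHERE ... by scanning all rows into a new table."""
    new_table: List[Row] = []
    for row in table:  # full scan: every row is read, not only the matching ones
        if predicate(row):
            new_table.append({**row, **changes})  # changed copy of a matching row
        else:
            new_table.append(dict(row))  # unchanged copy
    return new_table


def delete_by_overwrite(table: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Emulate DELETE ... WHERE ... by writing out only the rows that do not match."""
    return [dict(row) for row in table if not predicate(row)]


if __name__ == "__main__":
    users = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
    updated = update_by_overwrite(users, lambda r: r["id"] == 2, {"status": "active"})
    print(updated)  # [{'id': 1, 'status': 'new'}, {'id': 2, 'status': 'active'}]
    print(users)    # the original list is unchanged
```

  Changing one row still costs a pass over the whole table. That is why this approach suits batch processing of massive data rather than frequent small edits.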

  Hive can also contribute to faster, closer-to-real-time queries on Hadoop by integrating with HBase. HBase supports fast lookups but does not support SQL-like statements. Hive can therefore serve as an SQL-parsing layer on top of HBase, so that HBase tables can be queried and manipulated with SQL-like statements.
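  In Hive, this integration is declared with the HBase storage handler and its `hbase.columns.mapping` property. That property lists one entry per Hive column, with `:key` standing for the HBase row key and `family:qualifier` for a cell, for example `:key,info:name,info:age`. The Python sketch below is a toy simulation of that mapping, not Hive's code. It assumes an HBase table is a dict from row key to `{"family:qualifier": value}`, and it uses plain strings where real HBase stores versioned bytes. It exposes rows as SQL-style records in two ways: a full scan in row-key order, and a single lookup by row key. All function names are hypothetical.

```python
from typing import Dict, List, Optional

# Toy stand-in for an HBase table (assumption): row key -> {"family:qualifier": value}.
HBaseTable = Dict[str, Dict[str, str]]
Row = Dict[str, Optional[str]]


def parse_mapping(mapping: str, columns: List[str]) -> List[str]:
    """Split a mapping such as ':key,info:name' into one entry per SQL column."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(columns):
        raise ValueError("the mapping needs exactly one entry per column")
    return entries


def to_row(row_key: str, cells: Dict[str, str], columns: List[str], entries: List[str]) -> Row:
    """Turn one HBase row into a SQL-style row; missing cells become None (NULL)."""
    return {
        col: row_key if entry == ":key" else cells.get(entry)
        for col, entry in zip(columns, entries)
    }


def scan(table: HBaseTable, columns: List[str], mapping: str) -> List[Row]:
    """Like SELECT * without a key filter: visit every row in row-key order."""
    entries = parse_mapping(mapping, columns)
    return [to_row(key, table[key], columns, entries) for key in sorted(table)]


def get(table: HBaseTable, columns: List[str], mapping: str, row_key: str) -> Optional[Row]:
    """Like SELECT * ... WHERE key = row_key: one direct lookup, no scan."""
    entries = parse_mapping(mapping, columns)
    if row_key not in table:
        return None
    return to_row(row_key, table[row_key], columns, entries)


if __name__ == "__main__":
    users: HBaseTable = {
        "u2": {"info:name": "bob"},
        "u1": {"info:name": "alice", "info:age": "30"},
    }
    cols = ["key", "name", "age"]
    mapping = ":key,info:name,info:age"
    print(scan(users, cols, mapping))
    print(get(users, cols, mapping, "u1"))
```

  The sketch separates the two access paths on purpose. HBase is fast at the keyed lookup, while the SQL layer provides the familiar column names and NULL handling on top.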

  That's all for today's introductory big data video course and notes on Hive. For more content, please follow me!
