Midou is giving away books to readers: a 21-day event

Follow Midou Java

After holding back for so long, it is time for an event.

  To thank everyone for their continued support, and to celebrate the followers Midou has gained, we are giving away copies of "Hive Data Warehouse Enterprise Applications", a book written by a friend. I really recommend it to all of you.

Rules of the book giveaway:

Stage 1 (November 2 to November 8): one winner will be drawn from the readers who click "Looking" on this article and share it to their WeChat Moments. Remember to join the author's group afterwards. The physical book will be mailed to your home with free postage.

Stage 2 (November 2 to November 15): among the readers who share this article to their WeChat Moments, the one whose post gets the most likes wins one copy. Remember to join the author's group afterwards. The physical book will be mailed to your home with free postage. [If more than 5 readers reach 50 or more likes, two winners will be drawn.]

Stage 3 (November 8 to November 22): two winners will be drawn. Remember to join the author's group afterwards. The physical book will be mailed to your home with free postage. The rules for this stage will be announced after Stage 1 ends.


  The mobile Internet, e-commerce, and social networks have greatly expanded the reach and applications of the Internet. We live in an era of explosive data growth, and the sheer volume of data brings new challenges and opportunities for our ability to manage it. This is where the idea of big data comes from: big data refers to data that exceeds the processing capacity of traditional database systems. The data is too large, moves too fast, or has a structure that does not fit the original database systems. To extract value from it, we have to process it with other techniques. Big data analytics is often linked with cloud computing, because real-time analysis of large data sets needs a framework like MapReduce to distribute the work across computers in different racks or even different data centers. It is thanks to Hadoop's MapReduce (MR) framework that people can process TB-scale data.

(cartoon images)

  As the cartoons above show, only professional developers can use the MapReduce framework directly, so people who work in SQL had no way to use it to process data. Hive was invented to fill this gap. Apache Hive is a top-level Apache project whose underlying compute engine is MR (an offline computing framework) or Tez (a DAG computing framework built on Hadoop YARN). With simple SQL-like statements, a user can have Hive generate and run MR programs that perform complex data processing. Hive lets more and more people work with big data and makes this complex work far less difficult. Hive is a data warehouse tool built on top of Hadoop. It queries data with HiveQL, a SQL-like language, and all Hive data is stored in a Hadoop-compatible file system (e.g., Amazon S3, HDFS). Hive does not modify data while loading it; it only moves the data into the directory that Hive has configured in HDFS. Hive's design features are as follows:

  1. It supports creating indexes to optimize data queries.
  2. It supports different storage types, e.g., plain text files and HBase files.
  3. It stores metadata in a relational database, which significantly reduces the time spent on semantic checks during query execution.
  4. It can directly use data stored in the Hadoop file system.
  5. It has many built-in functions for manipulating times, strings, and other data, along with data mining tools, and it lets users write user-defined functions (UDFs) for operations the built-in functions cannot perform.
  6. It uses SQL-like queries, and each query is converted into MapReduce jobs that run on the Hadoop cluster.
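
  To make feature 6 more concrete, here is a minimal Python sketch of how a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` could be broken into map, shuffle, and reduce steps. The table, the columns, the sample rows, and all function names are invented for illustration. The code only mimics the idea inside a single process and should not be read as how Hive's planner is actually implemented.

```python
from collections import defaultdict

# Hypothetical input rows; in Hive these would be records in files on HDFS.
ROWS = [
    {"name": "ann", "dept": "sales"},
    {"name": "bob", "dept": "ops"},
    {"name": "cat", "dept": "sales"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key, which gives COUNT(*) per group."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    """Chain the three phases for the hypothetical GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(ROWS))
```

  On a real cluster the map and reduce steps would run in parallel on many machines, which is the part that Hive and Hadoop take care of for the SQL user.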

  Finally, let me introduce a book the editor recently wrote, "Hive Data Warehouse Enterprise Applications".

  Before introducing the book, let me first share how I came to write it. By a stroke of luck, Teacher Xiaoniu and the electronics publishing house approached me. After I agreed to write the book, I lost count of the nights and weekends I spent writing after working overtime. While writing, I came to appreciate deeply how limited my own ideas and ways of expressing them were. Over more than a year, the electronic and print drafts went through countless rounds of proofreading, and the manuscript was returned many times over problems with content and wording. I once wanted to give up, but I believed that persistence would bring the result I wanted. After nearly two years, the book finally came to completion, and I got to see that result.

  Having said that, let's look at the content of the book. Its main goal is to help more people learn Hive by combining practice with principles. First, the practical part: it is divided into 24 chapters that go in order from the most basic HQL syntax to HQL optimization and finally to real cases, and all of the code and the actual run results are included throughout.

  The principles part is the final section, an analysis of the Hive source code. Readers can follow the editor through it to better understand how Hive works internally. More importantly, we can learn why this component is so well regarded.

  The book's contents are roughly as follows:

------------------------------------------------- Hive Introduction --------------------------------------

  • Chapter 1 Hive basics: Hadoop and Hive overview
  • Chapter 2 Hive configuration

------------------------------------------------- Hive for Beginners --------------------------------------

  • Chapter 3 Basic Hive operations
  • Chapter 4 HiveQL: data definition (databases and tables)
  • Chapter 5 HiveQL: data manipulation
  • Chapter 6 HiveQL: queries (select, where, group by, join, order by and sort by, distribute by, cluster by, Hive type conversion, sampling queries, UNION ALL)
  • Chapter 7 HiveQL: views
  • Chapter 8 HiveQL: indexes
  • Chapter 9 Schema design

------------------------------------------------- Hive Advanced -------------------------------------

  • Chapter 10 Tuning
  • Chapter 11 Other file formats and compression methods
  • Chapter 12 Development
  • Chapter 13 Functions (discovering and describing functions, calling functions, aggregate functions, table-generating functions)
  • Chapter 14 Streaming (introduction and coding, using the distributed cache)
  • Chapter 15 Customizing Hive file and record formats (SequenceFile, RCFile, CSV and TSV SerDe)

------------------------------------------------- Hive Enhancement --------------------------------------

  • Chapter 16 HCatalog (introduction, command line, architecture)
  • Chapter 17 Hive and Oozie integration (Oozie overview, the various Oozie actions, using the Oozie Coordinator)
  • Chapter 18 Hive and Amazon Web Services (AWS)
  • Chapter 19 Storage handlers and NoSQL (storage handler background, HiveStorageHandler, Cassandra, DynamoDB)

------------------------------------------------- Hive in Practice --------------------------------------

  • Chapter 20 Hive in practice: big data analysis
  • Chapter 21 Hive development on advertising log data
  • Chapter 22 Hive development on e-commerce data
  • Chapter 23 Hive data analysis and scheduled tasks
  • Chapter 24 Hive project development: TV viewership statistics

----------------------------------------------- Hive Source Code Analysis ---------------------------------

  • Chapter 25 Hive source code analysis (SemanticAnalyzer, MapRedTask, ExecDriver, source code analysis diagram)

Innovations: The source code part starts with setting up an environment for reading the code, then follows the editor's line of thought through each function call, so you browse the Hive source step by step. The practical part is built on several sets of real enterprise data and real scenarios, so you can follow the editor to see what enterprise Hive work looks like.

Intended readers: those who want to study the source code, those who want to learn the enterprise development process, and those who are interested in big data and want to learn it.

Finally, "Hive Data Warehouse Enterprise Applications" is a book by the editor, and interested readers are welcome to follow it.

My research focuses on big data, machine learning, and cloud computing. If you are interested, scan the QR code to get in touch, and we will form a technical exchange group. After the book is published, copies will be mailed one by one to the lucky winners. Below are the author's personal WeChat and the technical exchange group. Everyone is welcome to join.

Technical exchange group and personal WeChat (QR code images)

@END

You are welcome to follow Midou Java, a platform devoted to sharing and discussing Java learning.


Origin www.cnblogs.com/midoujava/p/11785621.html