Wu Yuxiong - Natural-Born HADOOP Experiment Study Notes: HBase Microblogging Case

Purpose

Become familiar with HBase table design methods

Become familiar with the HBase Java API

Understand HBase's logical data view by manipulating data through the API

Learn the server-side MVC design approach

Principle

  Last time we did a preliminary design for the student elective case. Its functions were not complete, but the implementation was well structured and mostly called existing HBase APIs. This time we will implement slightly more complex business logic, similar to a Sina Weibo project. Weibo is in fact an extremely large system: its in-memory Redis database alone runs on clusters of several thousand machines, and its daily traffic is among the highest of any website in the country. Such a cluster architecture is very complicated; here we only implement the most basic functions: posting weibos, browsing the weibo feed, following users, and so on.

1. Table Design
  We designed tables last time as well. In big-data applications, trading space for time - storing extra copies of data to speed up access - is a common practice. First we need a user table: its rowkey is the user id, one column family holds user information, and the other two column families hold fans and followees. Similarly there is a weibo (microblog) table whose rowkey is the weibo id; its first column family holds the weibo information, including content, timestamp, comments, and so on. At the same time, to speed up access we add an extra inbox table whose rowkey is the user id; column families can be added according to business needs, for example the weibos a user has sent, the weibos a user has received, and the comments a user has received. This table exists purely to speed up access - the data is also stored elsewhere - and we can use HBase's timestamped versions to keep multiple messages. For example, for the messages a user receives we can set the maximum number of versions to 1000, so each user keeps their most recent one thousand messages, all stored as versions within a single cell.

2. The business logic
  With the tables designed, we implement the logic on top of them (note that after the tables are created, HBase columns and column families can be added dynamically, but the rowkey is hard to change, so pay special attention to rowkey design). The main business logic to implement is: follow a user, post a weibo, browse the weibo feed, and view the fan list.
  First, create the three tables. The user table has three column families: user info, fans, and followees. The weibo table has one column family: weibo info. The inbox table has two column families: inbox and outbox. Then create a few users to simulate user registration (a sketch of the table creation follows this list).
  First, following. When following a user, a record needs to be added to the corresponding fans and followees column families (in a real system, the followed user's weibos would also need to be put into the follower's inbox).
  Second, posting a weibo. Posting needs to add a record to the sender's outbox and to each fan's inbox.
  Third, browsing the feed. Browsing just needs to read the inbox and display the corresponding weibos.
  Fourth, viewing the fan list. Similarly, this just needs to read the corresponding column family and display it.
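
  To make the design concrete, here is a minimal sketch of how the three tables could be created through the HBase admin API. It uses the HBase 1.x-style HTableDescriptor/HColumnDescriptor classes (HBase 2.x offers TableDescriptorBuilder as a replacement), and the table and column-family names (weibo_user, weibo, weibo_mail, info, fans, follow, inbox, outbox) are placeholders chosen for illustration - they may differ from the names used in the hellohadoop code. The important detail is raising the maximum number of versions on the inbox and outbox column families to 1000.

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class CreateWeiboTables {

        public static void main(String[] args) throws IOException {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {

                // user table: rowkey = user id; column families for info, fans, followees
                HTableDescriptor user = new HTableDescriptor(TableName.valueOf("weibo_user"));
                user.addFamily(new HColumnDescriptor("info"));
                user.addFamily(new HColumnDescriptor("fans"));
                user.addFamily(new HColumnDescriptor("follow"));
                admin.createTable(user);

                // weibo table: rowkey = weibo id; one column family for the weibo info
                HTableDescriptor weibo = new HTableDescriptor(TableName.valueOf("weibo"));
                weibo.addFamily(new HColumnDescriptor("info"));
                admin.createTable(weibo);

                // mail table: rowkey = user id; inbox and outbox keep up to 1000 versions per cell
                HTableDescriptor mail = new HTableDescriptor(TableName.valueOf("weibo_mail"));
                HColumnDescriptor inbox = new HColumnDescriptor("inbox");
                inbox.setMaxVersions(1000);
                HColumnDescriptor outbox = new HColumnDescriptor("outbox");
                outbox.setMaxVersions(1000);
                mail.addFamily(inbox);
                mail.addFamily(outbox);
                admin.createTable(mail);
            }
        }
    }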

3. Implementing the DAO layer
   (The code for this experiment is in the weibo package under hongya in hellohadoop and can serve as a reference.)
  The DAO (data access object) layer is similar to the one in the previous project, but because the new project's requirements are more complex, several methods need to be added or modified. The previous code had no method for reading all the timestamped versions of a cell; since our inbox relies on this feature, it has to be added. In addition, our table-creation code defaults to 5 versions, while the inbox and outbox column families need 1000 versions, so the creation method has to be adjusted as well. The method for reading all versions uses getColumnCells and is implemented as follows:

    public List<String> searchAllVersion(String key, String cf, String column) throws IOException {

        List<String> cells = new ArrayList<>();

        // conn (Connection) and table (TableName) are fields of the DAO class

        Table ta = conn.getTable(table);

        Get get = new Get(Bytes.toBytes(key));

        get.addColumn(Bytes.toBytes(cf), Bytes.toBytes(column));

        // read every stored version, not just the newest (in HBase 2.x, readVersions can be used instead)

        get.setMaxVersions();

        Result result = ta.get(get);

        List<Cell> columnCells = result.getColumnCells(Bytes.toBytes(cf), Bytes.toBytes(column));

        for (Cell cell : columnCells){

            // CellUtil.cloneValue copies only this cell's value out of the shared backing array

            cells.add(Bytes.toString(CellUtil.cloneValue(cell)));

        }

        ta.close();

        return cells;

    }
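
  The service layer below also calls DAO helpers such as insert, search, and scanFamily, which are not reproduced in these notes. As a rough reference, an insert helper might look like the following minimal sketch; it assumes the same conn (Connection) and table (TableName) fields as searchAllVersion above and is not copied from the project code.

    // Hypothetical sketch, not the project's actual implementation:
    // write one value into row `key`, column family `cf`, qualifier `column`.
    public void insert(String key, String cf, String column, String value) throws IOException {
        try (Table ta = conn.getTable(table)) {
            Put put = new Put(Bytes.toBytes(key));
            put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(column), Bytes.toBytes(value));
            ta.put(put);
        }
    }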

4. Implementing the service layer
  Since there are more business requirements, the service layer is more complex; the code is written to the same standard as in real production. Each service method only needs to combine the DAO calls according to its specific logic.
  1) User registration:

public void register(String id,String name) throws IOException {

        userDao.insert(id,COLUMN_FAMILY_INFO,COLUMN_NAME,name);

}

  2) Following a user: add a record to your own followees column family and a corresponding record to the followed user's fans column family:

public void follow(String id,String friend) throws IOException {

        userDao.insert(id,COLUMN_FAMILY_FOLLOW,friend,friend);

        userDao.insert(friend,COLUMN_FAMILY_FANS,id,id);

    }

  3) Posting a weibo: a record needs to be added to the sender's outbox and to each fan's inbox, and the weibo itself needs to be added to the weibo table:

public void send(String id,String message) throws IOException {

        // use the user id plus the current timestamp as the weibo id

        String weiboId = id + System.currentTimeMillis();

        // write the weibo into the weibo table

        weiboDao.insert(weiboId,WeiboService.COLUMN_FAMILY_INFO,WeiboService.COLUMN_USER,id);

        weiboDao.insert(weiboId,WeiboService.COLUMN_FAMILY_INFO,WeiboService.COLUMN_CONTENT,message);

        // put the weibo id into the sender's outbox

        mailDao.insert(id,MailService.COLUMN_FAMILY_OUTBOX,MailService.COLUMN_ID,weiboId);

        // put the weibo id into every fan's inbox

        List<String> fans = userDao.scanFamily(id, COLUMN_FAMILY_FANS);

        for (String fan: fans){

            mailDao.insert(fan,MailService.COLUMN_FAMILY_INBOX,MailService.COLUMN_ID,weiboId);

        }

    }

  4) Browsing the feed: we only need to read the weibo ids from the inbox table and then look up the corresponding content in the weibo table:

public List<String> scan(String id) throws IOException {

        List<String> ids = mailDao.searchAllVersion(id, MailService.COLUMN_FAMILY_INBOX, MailService.COLUMN_ID);

        List<String> messages = new ArrayList<>();

        for (String weiboId : ids){

            messages.add(weiboDao.search(weiboId,WeiboService.COLUMN_FAMILY_INFO,WeiboService.COLUMN_CONTENT));

        }

        return messages;

    }
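
  Putting the pieces together, a small driver like the following could exercise the whole flow end to end. WeiboApp is a hypothetical facade assumed to expose the register, follow, send, and scan methods shown above (in the project they may be split across separate user, weibo, and mail services), and the user ids are placeholders for illustration.

    import java.io.IOException;
    import java.util.List;

    public class WeiboQuickTest {

        public static void main(String[] args) throws IOException {
            // hypothetical facade over the services shown above
            WeiboApp app = new WeiboApp();

            app.register("1001", "alice");      // simulate user registration
            app.register("1002", "bob");

            app.follow("1002", "1001");         // bob follows alice

            app.send("1001", "hello hbase");    // alice posts a weibo

            // bob browses his feed: the new weibo should appear in his inbox
            List<String> feed = app.scan("1002");
            for (String message : feed) {
                System.out.println(message);
            }
        }
    }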

Lab environment

1. OS
  Server: Linux (CentOS)
  Operator machine: Windows 7
  Server default username: root, password: 123456
  Operator machine default username: hongya, password: 123456
2. Experimental tools
  1. Xshell
Xshell is a powerful secure terminal emulator for the Microsoft Windows platform that supports the SSH1, SSH2, and TELNET protocols. It connects securely to remote hosts over the Internet, and its design and features help users work comfortably in complex network environments. From a Windows desktop it can be used to access remote servers running different systems, giving better control over the remote terminal. In this experiment we use Xshell 5, whose main features are:
  1. Effective protection of information security. Xshell supports a variety of security features, such as the SSH1/SSH2 protocols, password and DSA/RSA public-key user authentication, and encryption of all traffic with various algorithms. These built-in security features matter because traditional protocols such as Telnet and Rlogin make it easy for anyone with knowledge of the network to intercept the user's traffic; Xshell helps protect data from attackers.
  2. A good end-user experience. End users frequently need to run multiple terminal sessions at the same time, compare output from different hosts, or send the same set of commands to several hosts. Xshell addresses these needs, and user-friendly features such as a tabbed environment, split windows, synchronized input, and session management save users time for other work.
  3. A replacement for the insecure Telnet client. Xshell supports VT100, VT220, VT320, Xterm, Linux, SCOANSI, and ANSI terminal emulation and offers a variety of appearance options to replace the traditional Telnet client.
  4. Multiple languages on a single screen. Xshell was the first terminal software to use UTF-8 in this way: multiple languages can be displayed on one screen without switching encodings. As more companies need UTF-8 databases and applications, demand for UTF-8-capable terminal emulators keeps growing, and Xshell helps users work in multilingual environments.
  5. Secure connections for any X11 and TCP/IP application. Through its SSH tunneling mechanism, Xshell supports port forwarding, so all TCP/IP applications can share a secure connection without modifying any program.


  2. Hive
Hive is a data warehouse tool built on Hadoop. It can map structured data files to database tables and provides full SQL query capability by converting SQL statements into MapReduce jobs. Its advantage is a low learning cost: simple MapReduce statistics can be produced quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it very suitable for statistical analysis of a data warehouse. (The name "hive" also refers to a Windows registry file, which is unrelated to this tool.)
  Hive is a data warehouse infrastructure built on top of Hadoop. It provides a set of tools for extract-transform-load (ETL) and a mechanism for storing, querying, and analyzing large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called HQL that lets users familiar with SQL query the data, while also allowing developers familiar with MapReduce to plug in custom mappers and reducers to handle complex analysis that the built-in ones cannot accomplish. Hive has no fixed data format: it works well on top of Thrift, controls the delimiters, and also allows users to specify the data format.


  3. Hadoop
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to be deployed on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with very large data sets. HDFS relaxes some POSIX requirements and allows streaming access to data in the file system. The core of the Hadoop framework is HDFS plus MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides computation over that data.

  4. HBase
HBase is a distributed, column-oriented open-source database. The technology comes from the Google paper by Fay Chang, "Bigtable: A Distributed Storage System for Structured Data." Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a relational database, it is suited to storing unstructured data, and it uses a column-based rather than row-based data model.
  HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system; with HBase, large-scale storage clusters can be built on inexpensive PC servers.


  5. IntelliJ IDEA
IDEA, full name IntelliJ IDEA, is an integrated development environment for the Java language. IntelliJ is widely regarded as one of the best Java development tools in the industry, especially for its intelligent code assistance, code auto-completion, refactoring, J2EE support, Ant, JUnit, CVS integration, code inspection, and innovative GUI designer. IDEA is a product of JetBrains, a company headquartered in Prague, the capital of the Czech Republic, whose developers are mainly Eastern European programmers known for their rigor.

Step 1: Introduction to the experiment environment

  An HBase cluster (either pseudo-distributed or fully distributed) has already been installed for the experiment, and the experiment is carried out on this cluster.
  First open IDEA, create a new project in the usual way, and add the jar dependencies. The code for this experiment is already under the com.hongya course package of hellohadoop, and we will use it as the example for this lesson. Then open Xshell, connect to the cluster, and, following the earlier lessons, start zookeeper, hadoop, and hbase in turn.
  1.1 Edit the local hosts file on the operator machine (under C:\Windows\System32\drivers\etc). An example entry is sketched below.
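
  A hosts entry simply maps a cluster node's IP address to its hostname so that the HBase client on the operator machine can resolve the ZooKeeper and region server addresses. The address and hostname below are placeholder examples only; use the values of your own cluster.

    # C:\Windows\System32\drivers\etc\hosts  (example only; replace with your cluster's values)
    192.168.1.100   master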
Origin: www.cnblogs.com/tszr/p/12169215.html