HBase applications

 

 

The impact of too many column families

Each column family has its own MemStore. With too many families, each MemStore gets less memory, which causes frequent flushes of small files and therefore excessive compaction, hurting performance.
 

How many column families are appropriate?

Recommended: 1 to 3.
Principles for deciding how to split columns into families:
1, is the data format similar?
2, is the access pattern similar?
Example 1: the rows share the same RowKey, but we need to store both large text data and picture data.
We certainly want the text data compressed on disk.
We do not want the picture data compressed, because compressing that kind of binary data saves no space.
So we can store these two kinds of data in two separate column families:
create 'table',{NAME => 't', COMPRESSION => 'SNAPPY'},
{NAME => 'p'}
 

How many column families are appropriate?

Example 2: an HBase table needs to store each user's profile (name, age, etc.) and the user's daily website-visit records.
The profile rarely changes and is small in volume.
The visit records change every day and are large in volume.
If both kinds of information live in the same column family, the growing daily visit data keeps triggering MemStore flushes, and flushes in turn trigger compactions. Because compaction works at the column-family level, every compaction rewrites the user profiles together with the daily visit records into new files.


In fact, the profiles are small and rarely change, so there is no need for compaction to rewrite every user's profile to disk over and over; doing so wastes resources.
So the user profile and the daily visit records should be stored in two separate column families, as the sketch below shows.
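
A minimal shell sketch of this split (hypothetical table and column family names):
create 'user', {NAME => 'i'}, {NAME => 'v'}
Here 'i' holds the small, rarely-changing profile and 'v' holds the high-volume visit records, so compactions of 'v' no longer rewrite the profiles.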
 
 

Table Schema Design

1, keep each region between 10 and 50 GB
2, keep each table to 50 to 100 regions
3, keep each table to 1 to 3 column families
4, keep column family names as short as possible, because the family name is stored with every value in the data files
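
For example, a minimal shell sketch following these guidelines (hypothetical table name; one short-named column family, pre-split into a controlled number of regions):
create 'events', 'd', {NUMREGIONS => 50, SPLITALGO => 'HexStringSplit'}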
 
 

RowKey design (1)

 
Length principles:
A rowkey of 10 to 100 bytes is generally recommended, and the shorter the better.
1, Data is persisted in HFiles as KeyValues. If the rowkey is too long, say 100 bytes, then 10 million rows spend 100 bytes * 10 million = 1 billion bytes, nearly 1 GB, on rowkeys alone, which greatly reduces the storage efficiency of the HFiles.

2, The MemStore caches part of the data in memory. If the rowkey is too long, the effective utilization of memory drops, the system can cache fewer rows, and lookups become slower. So keep the rowkey as short as possible.

3, Current 64-bit operating systems align memory on 8-byte boundaries. Making the rowkey length an integer multiple of 8 bytes takes best advantage of this.
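
As an illustration of the length principles, a minimal sketch (hypothetical helper; Bytes is the HBase utility class used throughout this post): encoding a numeric ID as a long gives a fixed rowkey of exactly 8 bytes, short and a multiple of 8.

import org.apache.hadoop.hbase.util.Bytes;

public class FixedLengthKey {
    // A long is exactly 8 bytes: short, fixed-length, and a multiple of 8
    public static byte[] getRowKey(long userId) {
        return Bytes.toBytes(userId);
    }

    public static void main(String[] args) {
        System.out.println(getRowKey(10001L).length); // prints 8
    }
}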
 

RowKey design (2)

Characteristic: rowkeys are stored in lexicographic order.
Rowkeys that share a prefix are therefore stored near each other, in the same Region.
For example, suppose our rowkeys are website domain names, as follows:
www.apache.org
mail.apache.org
jira.apache.org
 
 
 
The reversed domain name makes a better rowkey, because all the apache.org subdomains then sort next to each other:
org.apache.www
org.apache.mail
org.apache.jira
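
A minimal sketch (hypothetical helper class) of producing the reversed-domain rowkey:

public class DomainReverser {
    // "www.apache.org" -> "org.apache.www"
    public static String reverse(String domain) {
        String[] parts = domain.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) {
                sb.append(".");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("www.apache.org")); // org.apache.www
    }
}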
 
 

RowKey design (3)

Because rowkeys are stored in lexicographic order, a badly designed rowkey can lead to:
Hotspotting: a large number of requests concentrating on a single Region
Three ways to solve hotspotting:
1, Salting (prepending a rotating or random prefix, like sprinkling salt on the keys)
create 'test_salt', 'f',SPLITS => ['b','c','d']
 
 
The original rowkeys:
boo0001
boo0002
boo0003
boo0004
boo0005
boo0003
 
 
The salted rowkeys:
a-boo0001
b-boo0002
c-boo0003
d-boo0004
a-boo0005
d-boo0003

Note that the same original key can receive a different salt on each write (boo0003 becomes c-boo0003 once and d-boo0003 the next time), so a read must try every possible prefix, as the code below does.
 
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class KeySalter {
    // Counter used to hand out salt prefixes in round-robin order on write
    private AtomicInteger index = new AtomicInteger(0);

    private String[] prefixes = {"a", "b", "c", "d"};

    // On write: pick the next prefix in round-robin order,
    // e.g. boo0001 -> a-boo0001, boo0002 -> b-boo0002, ...
    public String getRowKey(String originalKey) {
        StringBuilder sb = new StringBuilder(prefixes[index.getAndIncrement() % prefixes.length]);
        sb.append("-").append(originalKey);
        return sb.toString();
    }

    // On read: we cannot know which salt a key was written with,
    // so generate every possible salted key and fetch them all
    public List<String> getAllRowKeys(String originalKey) {
        List<String> allKeys = new ArrayList<>();
        for (String prefix : prefixes) {
            StringBuilder sb = new StringBuilder(prefix);
            sb.append("-").append(originalKey);
            allKeys.add(sb.toString());
        }
        //a-boo0001
        //b-boo0001
        //c-boo0001
        //d-boo0001
        return allKeys;
    }
}

  

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SaltingTest {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_salt"))) {

            KeySalter keySalter = new KeySalter();

            List<String> rowkeys = Arrays.asList("boo0001", "boo0002", "boo0003", "boo0004");
            List<Put> puts = new ArrayList<>();
            for (String key : rowkeys) {
                Put put = new Put(Bytes.toBytes(keySalter.getRowKey(key)));
                put.addColumn(Bytes.toBytes("f"), null, Bytes.toBytes("value" + key));
                puts.add(put);
            }
            table.put(puts);
        }
    }

}

  

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SaltingGetter {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_salt"))) {
            KeySalter keySalter = new KeySalter();
            List<String> allKeys = keySalter.getAllRowKeys("boo0001");    // generate every possible salted rowkey for boo0001
            List<Get> gets = new ArrayList<>();

            for (String key : allKeys) {
                Get get = new Get(Bytes.toBytes(key));
                gets.add(get);
            }

            Result[] results = table.get(gets);

            for (Result result : results) {
                if (result != null) {
                    //do something
                }
            }
        }
    }

}
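
Salting also changes how you scan: a logical key range must be scanned once per salt prefix and the results merged on the client. A minimal sketch under the same assumptions as above (table test_salt, four prefixes, two-character salt):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class SaltingScanner {
    private static final String[] PREFIXES = {"a", "b", "c", "d"};

    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_salt"))) {
            // Scan the logical range [boo0001, boo0004) once per salt prefix
            for (String prefix : PREFIXES) {
                Scan scan = new Scan();
                scan.setStartRow(Bytes.toBytes(prefix + "-boo0001"));
                scan.setStopRow(Bytes.toBytes(prefix + "-boo0004"));
                try (ResultScanner rs = table.getScanner(scan)) {
                    for (Result r : rs) {
                        // Strip the two-character salt (e.g. "a-") to recover the original key
                        System.out.println(Bytes.toString(r.getRow()).substring(2));
                    }
                }
            }
        }
    }
}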

RowKey design (3)

2, Hashing
create 'test_hash', 'f', { NUMREGIONS => 4, SPLITALGO => 'HexStringSplit' }
The original rowkeys:
boo0001
boo0002
boo0003
boo0004
 
The MD5-hashed rowkeys (hashing is deterministic, so a read can recompute the key and issue a single Get; the trade-off is that range scans over the original keys are no longer meaningful):
4b5cdf065e1ada3dbc8fb7a65f6850c4
b31e7da79decd47f0372a59dd6418ba4
d88bf133cf242e30e1b1ae69335d5812
f6f6457b333c93ed1e260dc5e22d8afa
 
import org.apache.hadoop.hbase.util.MD5Hash;


public class KeyHasher {

    public static String getRowKey(String originalKey) {
        return MD5Hash.getMD5AsHex(originalKey.getBytes());
    }

}

  

package com.twq.hbase.rowkey.hash;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HashingTest {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_hash"))) {

            List<String> rowkeys = Arrays.asList("boo0001", "boo0002", "boo0003", "boo0004");
            List<Put> puts = new ArrayList<>();
            for (String key : rowkeys) {
                Put put = new Put(Bytes.toBytes(KeyHasher.getRowKey(key)));
                put.addColumn(Bytes.toBytes("f"), null, Bytes.toBytes("value" + key));
                puts.add(put);
            }
            table.put(puts);
        }
    }

}

  

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HashingGetter {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("test_hash"))) {

            Get get = new Get(Bytes.toBytes(KeyHasher.getRowKey("boo0001")));

            Result results = table.get(get);

            // process result...
            for (Cell cell : results.listCells()) {
                System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + "===> " +
                        Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
                        Bytes.toString(CellUtil.cloneQualifier(cell)) + "{" +
                        Bytes.toString(CellUtil.cloneValue(cell)) + "}");
            }

        }
    }

}

  

RowKey design (3)

3, Reversing the rowkey
create 'test_reverse', 'f',SPLITS => ['0','1','2','3','4','5','6','7','8','9']
 
Timestamp rowkeys:
1524536830360
1524536830362
1524536830376
 
 
The reversed rowkeys (the fast-changing low-order digits now come first, spreading consecutive timestamps across regions):
0630386354251
2630386354251
6730386354251
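
The post gives helper classes for salting and hashing; by analogy, a minimal sketch (hypothetical class name) for reversing:

public class KeyReverser {
    // "1524536830360" -> "0630386354251"
    public static String getRowKey(String timestamp) {
        return new StringBuilder(timestamp).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(getRowKey("1524536830360")); // 0630386354251
    }
}

The remaining code is a complete query example: the sound table below uses rowkey = userID + date + id. DataFilter scans one user's rows within a date range and filters them by column value, and DataPrepare shows how the table is populated.
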
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

public class DataFilter {
    public static void main(String[] args) throws IOException {
        Configuration config = HBaseConfiguration.create();
        //Add any necessary configuration files (hbase-site.xml, core-site.xml)
        config.addResource(new Path("src/main/resources/hbase-site.xml"));
        config.addResource(new Path("src/main/resources/core-site.xml"));

        try(Connection connection = ConnectionFactory.createConnection(config)) {
            Table table = connection.getTable(TableName.valueOf("sound"));

            Scan scan = new Scan();

            // rowkey = userID (6 digits) + date: scan user 000001's rows from 20120901 to 20121001
            scan.setStartRow(Bytes.toBytes("00000120120901"));
            scan.setStopRow(Bytes.toBytes("00000120121001"));

            SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(Bytes.toBytes("f"), Bytes.toBytes("n"),
                    CompareFilter.CompareOp.EQUAL, new SubstringComparator("中国好声音"));

            SingleColumnValueFilter categoryFilter = new SingleColumnValueFilter(Bytes.toBytes("f"), Bytes.toBytes("c"),
                    CompareFilter.CompareOp.EQUAL, new SubstringComparator("综艺"));

            FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
            filterList.addFilter(nameFilter);
            filterList.addFilter(categoryFilter);

            scan.setFilter(filterList);

            ResultScanner rs = table.getScanner(scan);
            try {
                for (Result r = rs.next(); r != null; r = rs.next()) {
                    // process result...
                    for (Cell cell : r.listCells()) {
                        System.out.println(Bytes.toString(CellUtil.cloneRow(cell)) + "===> " +
                                Bytes.toString(CellUtil.cloneFamily(cell)) + ":" +
                                Bytes.toString(CellUtil.cloneQualifier(cell)) + "{" +
                                Bytes.toString(CellUtil.cloneValue(cell)) + "}");
                    }
                }
            } finally {
                rs.close();  // always close the ResultScanner!
            }
        }
    }
}

  

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/**
 * HBase shell to create the target table (the code writes to family "f"):
 * create 'sound', 'f'
 */
public class DataPrepare {
    public static void main(String[] args) throws IOException {
        InputStream ins = DataPrepare.class.getClassLoader().getResourceAsStream("sound.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(ins));

        List<SoundInfo> soundInfos = new ArrayList<>();
        String line = null;
        while ((line = br.readLine()) != null) {
            SoundInfo soundInfo = new SoundInfo();
            String[] arr = line.split("\\|");
            // rowkey = userID (6 digits) + date + record id (6 digits), matching the scan range in DataFilter
            String rowkey = format(arr[4], 6) + arr[1] + format(arr[0], 6);
            soundInfo.setRowkey(rowkey);
            soundInfo.setName(arr[2]);
            soundInfo.setCategory(arr[3]);
            soundInfos.add(soundInfo);
        }

        Configuration config = HBaseConfiguration.create();
        //Add any necessary configuration files (hbase-site.xml, core-site.xml)
        config.addResource(new Path("src/main/resources/hbase-site.xml"));
        config.addResource(new Path("src/main/resources/core-site.xml"));

        try (Connection connection = ConnectionFactory.createConnection(config)) {
            Table table = connection.getTable(TableName.valueOf("sound"));
            List<Put> puts = new ArrayList<>();
            for (SoundInfo soundInfo : soundInfos) {
                Put put = new Put(Bytes.toBytes(soundInfo.getRowkey()));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("n"), Bytes.toBytes(soundInfo.getName()));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("c"), Bytes.toBytes(soundInfo.getCategory()));
                puts.add(put);
            }
            table.put(puts);
        }
    }


    public static String format(String str, int num) {
        return String.format("%0" + num + "d", Integer.parseInt(str));
    }
}
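
DataPrepare references a SoundInfo bean that the post does not show; a minimal sketch with just the fields the example uses:

public class SoundInfo {
    private String rowkey;   // userID + date + id
    private String name;     // stored in f:n
    private String category; // stored in f:c

    public String getRowkey() { return rowkey; }
    public void setRowkey(String rowkey) { this.rowkey = rowkey; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getCategory() { return category; }
    public void setCategory(String category) { this.category = category; }
}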

  

After creating the Scan object, we call setStartRow("00000120120901") and setStopRow("00000120121001"). The scan then only reads the rows of userID = 000001 within the specified time range, which serves queries that filter by user and by time. Because these rows are stored contiguously, the scan performs well.
Two SingleColumnValueFilters (org.apache.hadoop.hbase.filter.SingleColumnValueFilter) are then combined in a FilterList with MUST_PASS_ALL: one matches the name column (f:n) against a substring, the other matches the category column (f:c), so a row is returned only when both conditions hold.
(Note: SingleColumnValueFilter hurts query performance; on massive data sets it consumes a lot of resources and takes a long time.)
If paging is needed, a PageFilter can also be added to limit the number of records returned, as sketched below.
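
A minimal sketch of that (hypothetical helper; note that PageFilter is applied independently on each region server, so the client can still receive up to pageSize * number-of-regions rows in total):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.PageFilter;

public class PagingExample {
    // Cap each region server's scanner at 20 rows on top of existing filters
    public static Scan buildPagedScan(FilterList filterList) {
        filterList.addFilter(new PageFilter(20));
        Scan scan = new Scan();
        scan.setFilter(filterList);
        return scan;
    }
}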
