Using StreamSets to synchronize data from MySQL to HBase in real time

How to download StreamSets (I recommend downloading the full version):
https://blog.csdn.net/yao09605/article/details/104098797
With StreamSets the whole synchronization process is codeless.
Below I walk through the whole process, including the pitfalls I hit and how I got around them.

After downloading, do the usual: extract the archive, put it in the /srv directory, change the ownership, and edit .bashrc to add STREAMSETS_HOME.
I'll keep this part short:

$ tar -xvf streamsets-datacollector-all-3.13.0.tgz
$ sudo cp -r streamsets-datacollector-3.13.0 /srv/
$ sudo chown -R hadoop:hadoop /srv/streamsets-datacollector-3.13.0
$ sudo ln -s /srv/streamsets-datacollector-3.13.0 /srv/streamsets
$ vim ~/.bashrc
# add to .bashrc
export STREAMSETS_HOME=/srv/streamsets
export PATH=$PATH:$STREAMSETS_HOME/bin

$ source ~/.bashrc
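To confirm the new variables actually took effect in the current shell, here is a small Python sketch. It assumes the /srv/streamsets layout above, and the helper name streamsets_env_ok is hypothetical. It checks that STREAMSETS_HOME is set and that its bin directory is on PATH:

```python
import os


def streamsets_env_ok(env=None):
    """Return True if STREAMSETS_HOME is set and $STREAMSETS_HOME/bin is on PATH."""
    env = os.environ if env is None else env
    home = env.get("STREAMSETS_HOME")
    if not home:
        return False
    bin_dir = os.path.normpath(os.path.join(home, "bin"))
    # Normalize PATH entries so a trailing slash doesn't cause a false negative
    entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    return bin_dir in entries


if __name__ == "__main__":
    print("environment OK" if streamsets_env_ok() else "STREAMSETS_HOME/PATH not set up")
```

Passing a dict instead of os.environ makes it easy to check a planned configuration before editing .bashrc.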

Start StreamSets:

$ streamsets dc

At this point you may encounter an error:

Java 1.8 detected; adding $SDC_JAVA8_OPTS of "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djdk.nio.maxCachedBufferSize=262144" to $SDC_JAVA_OPTS
Configuration of maximum open file limit is too low: 1024 (expected at least 32768). Please consult https://goo.gl/6dmjXd

This is because the system's default limit on the number of files a process can have open at the same time, 1024, is too small, so we need to raise it.
I tried several methods found online and none of them seemed to work.
The way I did it is to switch to root:

$ su
root@yaochenli-VirtualBox:/etc# ulimit -n 63553
root@yaochenli-VirtualBox:/etc# su hadoop

After switching back to the original user, `streamsets dc` starts successfully. Note that a limit set with ulimit only applies to that shell session and the processes started from it.
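Because ulimit only affects the current shell and its children, it can help to confirm what limit the launching process actually sees. Here is a minimal Python sketch using the standard resource module. The 32768 threshold is taken from the error message above, and the helper name nofile_limit_ok is hypothetical:

```python
import resource

# Minimum soft limit StreamSets asked for in the startup error above
REQUIRED_NOFILE = 32768


def nofile_limit_ok(soft, required=REQUIRED_NOFILE):
    """Return True if a soft RLIMIT_NOFILE value meets the StreamSets minimum."""
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= required


if __name__ == "__main__":
    # Soft/hard open-file limits of this process (inherited from the shell)
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard} ok={nofile_limit_ok(soft)}")
```

If you run it from the same shell in which you will start `streamsets dc`, it reports the limit the service will inherit.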

In the browser, go to http://localhost:18630/
This is the StreamSets management page.
The default username and password are both admin.
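If the page doesn't load, a quick way to tell whether anything is listening on port 18630 is a plain HTTP request. Here is a minimal sketch using only the Python standard library. The helper name sdc_ui_reachable is hypothetical, and it treats any non-5xx answer, including a login redirect or 401, as "the UI is up":

```python
import urllib.error
import urllib.request

# Default Data Collector UI address used in this post
SDC_URL = "http://localhost:18630/"


def sdc_ui_reachable(url=SDC_URL, timeout=3.0):
    """Return True if the server at url answers with a non-5xx HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just not with 2xx/3xx (e.g. 401 before login)
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, ...
        return False


if __name__ == "__main__":
    print("StreamSets UI reachable:", sdc_ui_reachable())
```

HTTPError is caught before URLError because it is a subclass of URLError.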

Once logged in, you can create a new pipeline. I already have one built here.
To demonstrate, I'll create a new one.
For Select Origin, choose your source system.
We chose the MySQL binlog.
Next there are two options: add an intermediate processing stage, or add the destination system. The intermediate stage can do some filtering, and the destination is where you want the data to go.
Let's look at the example I already built:
The key part is configuring its three components.
First, MySQL.
Nothing special on this page; just fill it in.
Here, note that the Server ID must not be the same as server_id in MySQL's my.cnf file, otherwise the pipeline will fail when it runs.
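To avoid the Server ID clash, you can read server_id out of my.cnf before picking the pipeline's value. Here is a minimal Python sketch using configparser. It assumes the setting lives in the [mysqld] section of the text you pass in, and it does not follow MySQL's !includedir directives. It simply drops them, because configparser can't parse them. The helper names are hypothetical:

```python
import configparser


def mysql_server_id(my_cnf_text):
    """Return server_id from the [mysqld] section of my.cnf text, or None."""
    # configparser can't parse MySQL's !include/!includedir lines, so skip them
    lines = [ln for ln in my_cnf_text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True,          # my.cnf allows bare flags like skip-name-resolve
        strict=False,                 # tolerate repeated keys
        interpolation=None,           # '%' has no special meaning in my.cnf
        inline_comment_prefixes=("#",),
    )
    parser.read_string("\n".join(lines))
    # MySQL accepts both spellings of the option name
    for key in ("server_id", "server-id"):
        if parser.has_option("mysqld", key):
            return parser.getint("mysqld", key)
    return None


def server_ids_conflict(pipeline_server_id, my_cnf_text):
    """True if the Server ID planned for the pipeline equals MySQL's own server_id."""
    return mysql_server_id(my_cnf_text) == pipeline_server_id


if __name__ == "__main__":
    with open("/etc/mysql/my.cnf") as f:
        print("MySQL server_id:", mysql_server_id(f.read()))
```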
Note that the user here must be root. The MySQL root user has no initial password; if you don't know how to set a password for root, see this post: https://blog.csdn.net/yao09605/article/details/104101433
Here set the tables to monitor, in the format [DATABASE].[TABLE]. Set the remaining parameters as needed.
We don't do any processing in the intermediate stage, so its configuration isn't shown here.
Next is the HBase configuration.
Note that the CDH/HBase client version chosen here must match the version of your HBase service, otherwise you get an error. This is why I recommend downloading the full version: it lets you easily try several options instead of downloading again and again.
Note that the source field must be written in the form /Data/XXX, and the target column is written in the format [column family]:[column name]. If there is no column name, still put the colon after the column family.
For the other settings, just follow the hints in the tooltips.
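Here is a small Python sketch of that mapping rule. It assumes the STOCK columns from the MySQL table below, each used as its own HBase column family with no column name, matching the `create 'stock', ...` command below. It also assumes a binlog record that carries the row under a Data field, as the /Data/XXX paths suggest. The helper names and the sample record are hypothetical illustrations, not StreamSets APIs:

```python
# Columns of the MySQL STOCK table
STOCK_COLUMNS = ["time", "price", "change", "volume", "amount", "type"]


def field_mappings(columns, qualifier=""):
    """Map each source path /Data/<col> to an HBase column '<family>:<qualifier>'.

    With no column name the colon is still required, e.g. 'price:'.
    """
    return {f"/Data/{col}": f"{col}:{qualifier}" for col in columns}


def to_hbase_cells(record, mappings):
    """Resolve each '/Data/<col>' path against a record dict; return {column: value}."""
    cells = {}
    for path, column in mappings.items():
        node = record
        # Walk the path segments, e.g. '/Data/price' -> record['Data']['price']
        for part in path.strip("/").split("/"):
            node = node.get(part) if isinstance(node, dict) else None
        if node is not None:
            cells[column] = node
    return cells


if __name__ == "__main__":
    mappings = field_mappings(STOCK_COLUMNS)
    sample = {"Data": {"price": 5.1, "volume": 100}}  # hypothetical record
    print(to_hbase_cells(sample, mappings))  # {'price:': 5.1, 'volume:': 100}
```

Fields missing from a record are simply skipped instead of being written as empty cells.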
With the configuration done, we need to create the tables on both systems. In MySQL:

CREATE TABLE `STOCK` (
  `time` text,
  `price` double DEFAULT NULL,
  `change` double DEFAULT NULL,
  `volume` bigint(20) DEFAULT NULL,
  `amount` bigint(20) DEFAULT NULL,
  `type` text
);

First, start Hadoop and HBase:

$ start-dfs.sh
$ start-yarn.sh
$ start-hbase.sh
$ jps

If you see the following services running, everything is fine:

10400 Jps
9600 HRegionServer
3812 SecondaryNameNode
3557 DataNode
4007 ResourceManager
4168 NodeManager
4618 BootstrapMain
9468 HMaster
9405 HQuorumPeer
3390 NameNode
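This check is easy to script. Here is a minimal Python sketch that parses `jps` output and reports which Hadoop/HBase daemons from the list above are missing. BootstrapMain is left out because it is not a Hadoop or HBase daemon, and the helper names are hypothetical:

```python
import subprocess

# Hadoop and HBase daemons expected in the jps output above
REQUIRED_DAEMONS = {
    "NameNode", "DataNode", "SecondaryNameNode",
    "ResourceManager", "NodeManager",
    "HMaster", "HRegionServer", "HQuorumPeer",
}


def running_daemons(jps_output):
    """Parse jps output ('<pid> <name>' per line) into a set of process names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, required=REQUIRED_DAEMONS):
    """Return the required daemon names that do not appear in the output, sorted."""
    return sorted(required - running_daemons(jps_output))


if __name__ == "__main__":
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    print("missing:", missing_daemons(out) or "none")
```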

Create the table on the HBase side as well:

# may not be exactly right, for reference only
create 'stock','time','amount','change','price','type','volume'

Once the tables exist on both sides, you can test the pipeline:
click the small eye icon (Preview) in the upper right corner.
Besides the requirement mentioned above that the user must be root, another problem you may hit is that binlog is not enabled. To check it,
enter the MySQL shell:

mysql> show variables like 'log_bin%';
+---------------------------------+-----------------------------+
| Variable_name                   | Value                       |
+---------------------------------+-----------------------------+
| log_bin                         | ON                          |
| log_bin_basename                | /var/log/mysql/binlog       |
| log_bin_index                   | /var/log/mysql/binlog.index |
| log_bin_trust_function_creators | OFF                         |
| log_bin_use_v1_row_events       | OFF                         |
+---------------------------------+-----------------------------+
5 rows in set (0.01 sec)

Check whether your log_bin is ON.
If it isn't, enable it as follows:

$ sudo vim /etc/mysql/my.cnf
my.cnf

!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/
#bin log
[mysqld]
binlog_format=ROW
# must differ from the Server ID set in the StreamSets MySQL origin
server_id=100
log-bin=/var/log/mysql/binlog

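After adding the configuration above, restart the MySQL service (or your computer).
Check again and it should be on.
The binlog files are in the /var/log/mysql directory.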

root@yaochenli-VirtualBox:/var/log/mysql# ls
binlog         binlog.000002  binlog.index  error.log.1.gz  error.log.3.gz  error.log.5.gz  error.log.7.gz
binlog.000001  binlog.000003  error.log     error.log.2.gz  error.log.4.gz  error.log.6.gz  product-bin.index

If it still reports that there is no binlog, insert something into MySQL to generate a file.
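To double-check after the restart without opening the MySQL shell, here is a minimal Python sketch that compares log_bin and binlog_format with the values the my.cnf above sets. The live query uses pymysql, which the insert program below also uses. The helper names and connection parameters are placeholders, so fill in your own:

```python
# The values the my.cnf above configures
EXPECTED = {"log_bin": "ON", "binlog_format": "ROW"}


def binlog_problems(variables):
    """Given {name: value} from SHOW VARIABLES, describe settings that don't match."""
    problems = []
    for name, want in EXPECTED.items():
        got = variables.get(name)
        if got is None or str(got).upper() != want:
            problems.append(f"{name}={got!r}, expected {want}")
    return problems


def fetch_binlog_variables(host, user, password, port=3306):
    """Read log_bin and binlog_format from a running MySQL server."""
    import pymysql  # imported here so binlog_problems works without pymysql

    conn = pymysql.connect(host=host, user=user, password=password, port=port)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format')"
            )
            return dict(cur.fetchall())  # rows are (Variable_name, Value) tuples
    finally:
        conn.close()


if __name__ == "__main__":
    found = fetch_binlog_variables("localhost", "root", "YOUR_PASSWORD")
    print(binlog_problems(found) or "binlog settings look fine")
```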

There shouldn't be any other pitfalls, so click Run to start the pipeline. Then we write a small program to insert data into MySQL:

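# Stock tick-by-tick data
import tushare as ts
import pymysql  # driver behind the mysql+pymysql URL below
from sqlalchemy import create_engine

df = ts.get_tick_data('600848', date='2018-12-12', src='tt')
# Replace PASSWORD and HOST with your own MySQL credentials
engine = create_engine('mysql+pymysql://hive:PASSWORD@HOST:3306/STOCK?charset=utf8mb4')
# Inserting Chinese text into MySQL on Ubuntu gave me trouble, so I gave up on Chinese
# and encode the trade type as a number: 买盘 (buy) = 0, 卖盘 (sell) = 1, 中性盘 (neutral) = 2
df = df.replace({'买盘': 0, '卖盘': 1, '中性盘': 2})
# Passing the engine (not a raw connection) lets pandas commit the insert itself
df.to_sql(name='STOCK', con=engine, if_exists='append', index=False)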

After running it, you can see data on the dashboard.
Then check in HBase:
hbase(main):001:0> scan 'stock'
If you can see the data, it works.



Origin blog.csdn.net/yao09605/article/details/104115334