Downloading StreamSets (I recommend downloading the full version):
https://blog.csdn.net/yao09605/article/details/104098797
With StreamSets, the entire synchronization process is codeless.
Below I walk through the whole process, including the pitfalls I ran into and how I got around them.
After downloading, do the usual: extract the archive, copy it into the /srv directory, change the ownership, and edit .bashrc to add STREAMSETS_HOME.
Here is a brief version:
$ tar -xvf streamsets-datacollector-all-3.13.0.tgz
$ sudo cp -r streamsets-datacollector-3.13.0 /srv/
$ sudo chown -R hadoop:hadoop /srv/streamsets-datacollector-3.13.0
$ sudo ln -s /srv/streamsets-datacollector-3.13.0 /srv/streamsets
$ vim ~/.bashrc
# Add to .bashrc
export STREAMSETS_HOME=/srv/streamsets
export PATH=$PATH:$STREAMSETS_HOME/bin
$ source ~/.bashrc
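To confirm the new variables took effect in the current shell, a short Python sketch can inspect the environment. It only assumes the variable name STREAMSETS_HOME and the bin subdirectory from the .bashrc lines above; check_streamsets_env is a hypothetical helper, not part of StreamSets.

```python
import os


def check_streamsets_env(env):
    """Return a list of problems found in an environment mapping such as os.environ."""
    problems = []
    home = env.get("STREAMSETS_HOME")
    if not home:
        problems.append("STREAMSETS_HOME is not set")
        return problems
    bin_dir = os.path.join(home, "bin")
    # Normalize entries so "/srv/streamsets/bin/" and "/srv/streamsets/bin" compare equal
    path_entries = {os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p}
    if os.path.normpath(bin_dir) not in path_entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_streamsets_env(os.environ)
    print("environment looks fine" if not issues else "\n".join(issues))
```

An empty list means both STREAMSETS_HOME and its bin directory are visible to the shell that will run streamsets dc.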
Start StreamSets:
$ streamsets dc
At this point you may encounter an error:
Java 1.8 detected; adding $SDC_JAVA8_OPTS of "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djdk.nio.maxCachedBufferSize=262144" to $SDC_JAVA_OPTS
Configuration of maximum open file limit is too low: 1024 (expected at least 32768). Please consult https://goo.gl/6dmjXd
This happens because the system's default limit on the number of files a process can have open at the same time, 1024, is too low, so it has to be raised.
I tried several methods found online, and none of them seemed to work.
What I did was switch to root:
$ su
root@yaochenli-VirtualBox:/etc# ulimit -n 63553
root@yaochenli-VirtualBox:/etc# su hadoop
After switching back to the original user, the service starts successfully with streamsets dc.
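To confirm the new limit before launching the Data Collector, a minimal Python check with the standard resource module (Unix only) compares the soft open-file limit with the 32768 minimum from the error message above. open_file_limit_ok is a hypothetical helper name. Run it from the same shell you will start streamsets dc from, since the limit belongs to that shell and the processes it launches.

```python
import resource

# Minimum reported by the startup check in the error message above
REQUIRED_NOFILE = 32768


def open_file_limit_ok(soft_limit, required=REQUIRED_NOFILE):
    """True if the soft open-file limit is unlimited or at least `required`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard} sufficient={open_file_limit_ok(soft)}")
```

The ulimit change made in the root shell applies only to that session and the processes started from it, so a new terminal will show the old value again.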
In the browser, open http://localhost:18630/
This is the StreamSets management page.
The default username and password are both admin.
After logging in you can create a new pipeline. I already have one set up, but here I create a new one to demonstrate.
Select the origin, which is your source system; here we chose the MySQL binlog.
Then there are two options: add a processor in the middle, or select the destination system. A processor can do some filtering, and the destination is where you want the data to end up.
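Conceptually the pipeline is a chain: the origin produces records, optional processors filter or transform them, and the destination writes them out. The toy Python sketch below only illustrates that flow with generators; it is not how StreamSets works internally, and putting each row's columns under a Data key is an assumption chosen to echo the /Data/... field paths used in the HBase mapping later.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, dict]


def origin(rows: Iterable[dict]) -> Iterator[Record]:
    """Stand-in origin: wrap each source row in a record with a 'Data' map."""
    for row in rows:
        yield {"Data": dict(row)}


def filter_processor(records: Iterable[Record], keep: Callable[[dict], bool]) -> Iterator[Record]:
    """Optional middle stage: pass on only the records whose data satisfies `keep`."""
    for rec in records:
        if keep(rec["Data"]):
            yield rec


def list_destination(records: Iterable[Record], sink: List[dict]) -> None:
    """Stand-in destination: append each record's data to a list instead of writing to HBase."""
    for rec in records:
        sink.append(rec["Data"])


if __name__ == "__main__":
    rows = [{"price": 10.0, "volume": 5}, {"price": 10.1, "volume": 0}]
    out: List[dict] = []
    # origin -> processor (drop zero-volume ticks) -> destination
    list_destination(filter_processor(origin(rows), lambda d: d["volume"] > 0), out)
    print(out)
```

Leaving out the processor, as in list_destination(origin(rows), out), corresponds to the pipeline in this post, which has no middle stage.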
Here is the example I built:
The key part is configuring its three components.
First, MySQL.
Nothing special on this page; just fill it in.
Note that the Server ID here must not be the same as server_id in MySQL's my.cnf file, otherwise the pipeline fails when it runs.
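One way to avoid the clash is to read server_id from the [mysqld] section of my.cnf and choose any other number for the StreamSets Server ID. The Python sketch below uses a small hand-rolled parser, because configparser rejects the !includedir lines that come before the first section header. read_mysqld_option and pick_server_id are hypothetical helpers, the fallback ID 999 is arbitrary, and files pulled in by !includedir are not read.

```python
def read_mysqld_option(cnf_text, name):
    """Return the value of `name` in [mysqld], or None.

    MySQL treats '-' and '_' in option names as the same, and a later
    setting overrides an earlier one, so both rules are applied here.
    """
    wanted = name.replace("-", "_")
    section, value = None, None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # blank line, comment, or !include/!includedir directive
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
        elif section == "mysqld" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip().replace("-", "_") == wanted:
                value = val.strip()
    return value


def pick_server_id(mysql_server_id, candidate=999):
    """Return a StreamSets Server ID different from MySQL's server_id (an int)."""
    return candidate if candidate != mysql_server_id else candidate + 1


if __name__ == "__main__":
    sample = "!includedir /etc/mysql/conf.d/\n[mysqld]\nserver_id=100\n"
    mysql_id = int(read_mysqld_option(sample, "server_id"))
    print(pick_server_id(mysql_id))
```

On a real machine, pass the contents of /etc/mysql/my.cnf (and of any included file that sets server_id) as cnf_text.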
The user here must be root. The MySQL root user has no initial password; if you don't know how to set a password for root, see this post: https://blog.csdn.net/yao09605/article/details/104101433
Here you set the tables to monitor, in the format [DATABASE].[TABLE]; set the remaining parameters as needed.
We don't do any processing in the middle, so there is no processor configuration to show.
Here is the HBase configuration:
Note that the CDH version selected here (which determines the HBase client version) must match your HBase service; a mismatch produces an error. This is why I recommend downloading the full version: you can easily try several options instead of downloading again.
Also note that source fields must be written in the form /Data/XXX, and target columns use the format [column family]:[column name]. If there is no column name, the column family must still be followed by a colon.
For the other settings, just follow the tooltips.
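Applied to the STOCK table created below, those two rules give one mapping per column. The sketch assumes each MySQL column becomes its own column family with no column name, matching the HBase create command further down. hbase_field_mappings is a hypothetical helper, and the pairs still have to be entered in the HBase destination's configuration by hand.

```python
STOCK_COLUMNS = ["time", "price", "change", "volume", "amount", "type"]


def hbase_field_mappings(columns, family=None):
    """Map each MySQL column to (source field path, HBase column).

    With family=None every column is its own column family, written as 'name:'
    (the trailing colon is needed even without a column name). Otherwise all
    columns go under a single family as 'family:name'.
    """
    mappings = []
    for col in columns:
        target = f"{col}:" if family is None else f"{family}:{col}"
        mappings.append((f"/Data/{col}", target))
    return mappings


if __name__ == "__main__":
    for source, target in hbase_field_mappings(STOCK_COLUMNS):
        print(f"{source:<14} -> {target}")
```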
With the configuration done, we need to create the table in both systems. On the MySQL side:
CREATE TABLE `STOCK` (
`time` text,
`price` double DEFAULT NULL,
`change` double DEFAULT NULL,
`volume` bigint(20) DEFAULT NULL,
`amount` bigint(20) DEFAULT NULL,
`type` text
);
First start Hadoop and HBase:
$ start-dfs.sh
$ start-yarn.sh
$ start-hbase.sh
$ jps
If you see the following services running, everything is fine:
10400 Jps
9600 HRegionServer
3812 SecondaryNameNode
3557 DataNode
4007 ResourceManager
4168 NodeManager
4618 BootstrapMain
9468 HMaster
9405 HQuorumPeer
3390 NameNode
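This check can also be scripted. The Python sketch below parses jps output (one "PID ClassName" pair per line) and lists which expected processes are missing. The expected names are copied from the listing above minus Jps itself; missing_daemons and run_jps are hypothetical helpers.

```python
import subprocess

# Process names from the jps listing above (Jps itself excluded)
EXPECTED = {
    "NameNode", "DataNode", "SecondaryNameNode",  # HDFS
    "ResourceManager", "NodeManager",             # YARN
    "HMaster", "HRegionServer", "HQuorumPeer",    # HBase
    "BootstrapMain",                              # also present in the listing above
}


def missing_daemons(jps_output, expected=EXPECTED):
    """Return the expected process names absent from `jps` output, sorted."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # "<pid> <main class>"
            running.add(parts[1])
    return sorted(set(expected) - running)


def run_jps():
    """Capture the output of `jps` (needs the JDK's jps on PATH)."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

On the Hadoop machine, print(missing_daemons(run_jps())) should print an empty list once everything is up.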
Create the table on the HBase side as well:
# This may not be exactly right; for reference only
create 'stock','time','amount','change','price','type','volume'
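Because every MySQL column is used as its own column family here, the shell command can be generated from the MySQL column list so the two tables stay in step. A minimal sketch; hbase_create_statement is a hypothetical helper, and it keeps the MySQL column order, which differs from the command above but does not matter to HBase.

```python
STOCK_COLUMNS = ["time", "price", "change", "volume", "amount", "type"]


def hbase_create_statement(table, families):
    """Build an HBase shell `create` command with one column family per name."""
    if not families:
        raise ValueError("an HBase table needs at least one column family")
    return "create " + ",".join(f"'{name}'" for name in [table, *families])


if __name__ == "__main__":
    print(hbase_create_statement("stock", STOCK_COLUMNS))
```

Run as a script, it prints create 'stock','time','price','change','volume','amount','type', which can be pasted into the HBase shell.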
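Once the tables exist on both sides, you can test the pipeline.
Click the small eye icon in the upper right corner to preview it.
Besides the requirement mentioned earlier that the user must be root, another possible problem is that the binlog is not enabled. To check,
open the MySQL shell: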
mysql> show variables like 'log_bin%';
+---------------------------------+-----------------------------+
| Variable_name | Value |
+---------------------------------+-----------------------------+
| log_bin | ON |
| log_bin_basename | /var/log/mysql/binlog |
| log_bin_index | /var/log/mysql/binlog.index |
| log_bin_trust_function_creators | OFF |
| log_bin_use_v1_row_events | OFF |
+---------------------------------+-----------------------------+
5 rows in set (0.01 sec)
Check whether your log_bin is ON.
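The same check can be done from Python. This sketch assumes PyMySQL (the loader script at the end of this post already imports it) and a connection you open yourself; binlog_settings, binlog_ready and fetch_binlog_settings are hypothetical helpers. It also reads binlog_format, since the configuration below sets it to ROW.

```python
def binlog_settings(rows):
    """Turn (Variable_name, Value) rows from SHOW VARIABLES into a dict."""
    return {name: value for name, value in rows}


def binlog_ready(settings):
    """True if the binary log is enabled and uses row-based format."""
    return settings.get("log_bin") == "ON" and settings.get("binlog_format") == "ROW"


def fetch_binlog_settings(conn):
    """Query a live server through an open PyMySQL connection (default tuple cursor)."""
    with conn.cursor() as cur:
        cur.execute("SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format')")
        return binlog_settings(cur.fetchall())
```

For example, after conn = pymysql.connect(host='localhost', user='root', password=...), print(binlog_ready(fetch_binlog_settings(conn))) prints True when the binlog is usable.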
If it is not, enable it as follows:
$ sudo vim /etc/mysql/my.cnf
my.cnf
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mysql.conf.d/
# enable the binary log
[mysqld]
binlog_format=ROW
server_id=100
log-bin=/var/log/mysql/binlog
After adding the configuration above, restart the MySQL service (or your computer).
Check again, and log_bin should now be ON.
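If log_bin is still OFF, it can help to confirm that the file really contains these settings. The sketch below collects the [mysqld] options (MySQL treats - and _ in option names alike, and a later line overrides an earlier one) and reports which of the three settings above are not set in that file. mysqld_options and binlog_config_problems are hypothetical helpers; settings coming from included directories or from server defaults are not considered.

```python
def mysqld_options(cnf_text):
    """Collect [mysqld] options; '-' and '_' are normalized and later lines win."""
    options, section = {}, None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # blank line, comment, or !include/!includedir directive
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
        elif section == "mysqld":
            key, sep, value = line.partition("=")
            # A bare option such as "log-bin" is stored with an empty value
            options[key.strip().replace("-", "_")] = value.strip() if sep else ""
    return options


def binlog_config_problems(cnf_text):
    """List which binlog-related settings are not set in this file."""
    opts = mysqld_options(cnf_text)
    problems = []
    if "log_bin" not in opts:
        problems.append("log-bin not set in this file")
    if opts.get("binlog_format", "").upper() != "ROW":
        problems.append("binlog_format not set to ROW in this file")
    if not opts.get("server_id"):
        problems.append("server_id not set in this file")
    return problems
```

Pass it the text of /etc/mysql/my.cnf; an empty list means all three settings are present.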
The binlog files are in the /var/log/mysql directory:
root@yaochenli-VirtualBox:/var/log/mysql# ls
binlog binlog.000002 binlog.index error.log.1.gz error.log.3.gz error.log.5.gz error.log.7.gz
binlog.000001 binlog.000003 error.log error.log.2.gz error.log.4.gz error.log.6.gz product-bin.index
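The numbered files are the binary logs themselves, and binlog.index lists them. To see which one is newest, for example to confirm new writes are landing in it, a small Python sketch can pick the highest-numbered file. The base name binlog comes from the log-bin path above; latest_binlog is a hypothetical helper.

```python
import os
import re


def latest_binlog(filenames, base="binlog"):
    """Return the highest-numbered '<base>.NNNNNN' name, or None if there is none."""
    pattern = re.compile(rf"^{re.escape(base)}\.(\d+)$")
    numbered = []
    for name in filenames:
        m = pattern.match(name)
        if m:
            numbered.append((int(m.group(1)), name))
    return max(numbered)[1] if numbered else None


if __name__ == "__main__":
    try:
        print(latest_binlog(os.listdir("/var/log/mysql")))
    except OSError as exc:  # directory missing, or not readable by this user
        print(f"cannot list /var/log/mysql: {exc}")
```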
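If it still reports that there is no binlog, insert something into MySQL to generate a binlog file.
There shouldn't be any other pitfalls. Click Run to start the pipeline. Then write a small program to insert data into MySQL: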
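# Stock tick data
import tushare as ts
import pymysql  # MySQL driver used by the mysql+pymysql SQLAlchemy URL
from sqlalchemy import create_engine

df = ts.get_tick_data('600848', date='2018-12-12', src='tt')
# Replace PASSWORD and HOST with your own MySQL password and host
engine = create_engine('mysql+pymysql://hive:PASSWORD@HOST:3306/STOCK?charset=utf8mb4')
# Entering Chinese into MySQL under Ubuntu caused problems, so in the end I gave up on Chinese
# and encoded the trade type as a number: 买盘 (buy) = 0, 卖盘 (sell) = 1, 中性盘 (neutral) = 2
df = df.replace({'买盘': 0, '卖盘': 1, '中性盘': 2})
# Passing the engine lets pandas open a connection and commit the insert itself
df.to_sql(name='STOCK', con=engine, if_exists='append', index=False)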
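If you'd rather not depend on tushare for test data, a sketch like the following produces fake tick rows with the same six columns as the STOCK table and inserts them with PyMySQL's executemany. The values are random placeholders, not market data; fake_ticks and insert_ticks are hypothetical helpers, the 100-shares-per-lot factor in amount is an assumption, and the connection is one you open yourself.

```python
import random
from datetime import datetime, timedelta

COLUMNS = ["time", "price", "change", "volume", "amount", "type"]


def fake_ticks(n, start_price=10.0, seed=None):
    """Generate n random rows for STOCK; type uses the script's codes 0=buy, 1=sell, 2=neutral."""
    rng = random.Random(seed)
    t = datetime(2018, 12, 12, 9, 30)
    price = start_price
    rows = []
    for _ in range(n):
        change = rng.choice([-0.01, 0.0, 0.01])
        price = round(price + change, 2)
        volume = rng.randint(1, 500)
        amount = int(price * volume * 100)  # assumption: 100 shares per lot
        rows.append((t.strftime("%H:%M:%S"), price, change, volume, amount, rng.randint(0, 2)))
        t += timedelta(seconds=3)
    return rows


def insert_ticks(conn, rows):
    """Insert rows into STOCK through an open PyMySQL connection, then commit."""
    # Backticks around names because CHANGE is a reserved word in MySQL
    cols = ", ".join(f"`{c}`" for c in COLUMNS)
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    with conn.cursor() as cur:
        cur.executemany(f"INSERT INTO STOCK ({cols}) VALUES ({placeholders})", rows)
    conn.commit()
```

For example, insert_ticks(pymysql.connect(host=..., user='hive', password=..., database='STOCK'), fake_ticks(100)) writes 100 rows, which should then show up in the binlog and flow through the pipeline.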
After running it, we can see data on the dashboard.
Then check in HBase:
hbase(main):001:0> scan 'stock'
If you can see the data, it worked.