Big Data Course H2 - Implementation of the TELECOM Telecom Traffic Project


▲ Purpose of this chapter

⚪ Understand data collection in the TELECOM project;

⚪ Understand data cleaning in the TELECOM project;

⚪ Understand data export in the TELECOM project;

⚪ Understand data visualization in the TELECOM project;

⚪ Learn about other aspects of the TELECOM project.

1. Data collection

1. In a real production environment, telecom traffic logs are not generated on a single server; every server produces its own logs. We therefore first build Flume's fan-in flow model to consolidate the logs, and then transfer the collected data to HDFS for storage, as sketched below.
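A rough picture of the fan-in topology (the host names hadoop02 and hadoop03 for the second and third servers are assumptions; only hadoop01 appears in the configurations below):

hadoop02 (spooldir source -> memory channel -> avro sink) \
                                                           -> hadoop01 (avro source -> memory channel -> HDFS sink)
hadoop03 (spooldir source -> memory channel -> avro sink) /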

2. Steps:

a. Create a directory for storing logs on the second and third servers (here, the second and third servers act as the machines on which the logs are generated).

cd /home

mkdir telecomlog

b. Enter the directory and upload or download the log file into it (in a real deployment, the logs would be generated there in real time).

cd telecomlog/

# Download the sample log from the cloud host

wget http://bj-yzjd.ufile.cn-north-02.ucloud.cn/103_20150615143630_00_00_000_2.csv
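Before wiring up Flume, it can be worth a quick sanity check on the downloaded file. A minimal sketch (the chapter does not describe the CSV's field layout, so this only confirms the file arrived intact):

# Peek at the first few records and count the lines
head -n 5 103_20150615143630_00_00_000_2.csv
wc -l 103_20150615143630_00_00_000_2.csv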

c. Configure Flume on the second and third servers to collect the logs and forward them to the first server, which performs the fan-in.

cd /home/software/apache-flume-1.9.0-bin/data

# Edit the configuration file

vim telecomlog.conf

# Add the following to the file

a1.sources = s1

a1.channels = c1

a1.sinks = k1

# The logs are placed into a fixed directory,

# so monitor that directory for changes:

# whenever a new file appears in it,

# collect the contents of that new file

a1.sources.s1.type = spooldir

# Specify the directory to listen to

a1.sources.s1.spoolDir = /home/telecomlog

# Configure Channels

a1.channels.c1.type = memory

a1.channels.c1.capacity = 10000

a1.channels.c1.transactionCapacity = 1000

# Send the collected data to the first server (hadoop01)

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = hadoop01

a1.sinks.k1.port = 8090

# Bind the source and the sink to the channel

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1
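A design note on the channel choice above: the memory channel is fast, but any events buffered in it are lost if the agent process dies. If durability matters more than throughput, Flume's file channel is a drop-in alternative; a sketch (the checkpoint and data directories are made-up paths, not part of this chapter's setup):

# Durable alternative to the memory channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/flume/checkpoint
a1.channels.c1.dataDirs = /home/flume/data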

d. On the first server, write the aggregated data to HDFS.

cd /home/software/apache-flume-1.9.0-bin/data/

# Edit the configuration file

vim telecomlog.conf

# Add the following to the file

a1.sources = s1

a1.channels = c1

a1.sinks = k1

# Receive the data sent by the second and third servers

a1.sources.s1.type = avro

a1.sources.s1.bind = 0.0.0.0

a1.sources.s1.port = 8090

# Add a timestamp header to each event; the %Y-%m-%d escape in the HDFS path below requires it

a1.sources.s1.interceptors = i1

a1.sources.s1.interceptors.i1.type = timestamp

# Configure Channels

a1.channels.c1.type = memory

a1.channels.c1.capacity = 10000

a1.channels.c1.transactionCapacity = 1000

# Configure the sink

# Write the data to HDFS, ideally partitioned by day

a1.sinks.k1.type = hdfs

# Specify the storage path of the data on HDFS

a1.sinks.k1.hdfs.path = hdfs://hadoop01:9000/telecomlog/reporttime=%Y-%m-%d

# Specify the storage type of the file on HDFS (DataStream = plain text, not SequenceFile)

a1.sinks.k1.hdfs.fileType = DataStream

# Roll to a new file every 3600 seconds

a1.sinks.k1.hdfs.rollInterval = 3600

# Disable size-based and event-count-based rolling

a1.sinks.k1.hdfs.rollSize = 0

a1.sinks.k1.hdfs.rollCount = 0

# Bind the source and the sink to the channel

a1.sources.s1.channels = c1

a1.sinks.k1.channel = c1

e. Start HDFS.

start-dfs.sh
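Optionally, verify that the HDFS daemons actually came up; jps lists the running Java processes (which processes appear on which machine depends on the cluster layout):

jps
# Expect NameNode and DataNode (and SecondaryNameNode on whichever node runs it)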

f. Start Flume on the first server.

../bin/flume-ng agent -n a1 -c ../conf -f telecomlog.conf -Dflume.root.logger=INFO,console
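Before starting the upstream agents, it can help to confirm that the avro source is listening on port 8090. One way, assuming the ss utility is available (netstat -lnt works similarly on older systems):

ss -lnt | grep 8090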

g. Start Flume on the second and third servers.

../bin/flume-ng agent -n a1 -c ../conf -f telecomlog.conf -Dflume.root.logger=INFO,console
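With all three agents running, an end-to-end check is straightforward: the spooldir source renames each fully ingested file (Flume's default suffix is .COMPLETED), and the HDFS sink should have created a dated partition directory. A minimal sketch:

# On the second/third server: ingested files are marked as done
ls /home/telecomlog
# e.g. 103_20150615143630_00_00_000_2.csv.COMPLETED

# On the first server: data lands under the dated path written by the sink
hdfs dfs -ls /telecomlog/reporttime=$(date +%Y-%m-%d)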

2. Data cleaning

1. Flume has now collected the data onto HDFS, so the next step is to create a table in Hive to manage the raw data.

# Start YARN

start-yarn.sh
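Hive will execute its cleaning queries as MapReduce jobs on YARN, so it is worth confirming the YARN daemons are up as well:

jps
# ResourceManager and NodeManager should now appear alongside the HDFS processes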

# Enter the lib directory of the HBase installation

cd /home/software/hbase-2.4.2/lib

# Enter the subdirectory

cd client-facing-thirdparty/

# Rename the logging jars that would otherwise conflict with Hive's logging libraries (a .bak file is not picked up by the classpath)

mv commons-logging-1.2.jar commons-logging-1.2.bak

mv log4j-1.2.17.jar log4j-1.2.17.bak

mv slf4j-log4j12-1.7.30.jar slf4j-log4j12-1.7.30.bak
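A quick way to confirm the three jars are now inert (classpath wildcards only match files ending in .jar):

ls /home/software/hbase-2.4.2/lib/client-facing-thirdparty/*.bak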

# Start the Hive service processes in the background

hive --service metastore &

hive --service hiveserver2 &
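To confirm that HiveServer2 is accepting connections, one option is Beeline, which ships with Hive (10000 is HiveServer2's default port; connecting without credentials assumes no authentication has been configured):

beeline -u jdbc:hive2://localhost:10000 -e 'show databases;'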

# Enter the Hive client

hive

# Create the database

create database telecom;

# Use this database

use telecom;
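With the database in place, the raw data still needs a table. The sketch below shows one way to manage it: an external table whose partition column matches the reporttime=%Y-%m-%d directories written by the Flume HDFS sink. The table name and the single placeholder column are assumptions, not the course's actual schema; the real log is a multi-field CSV:

-- Hypothetical external table over the Flume output directory
create external table telecomlog_raw (
    raw_line string    -- placeholder: the real log has many CSV fields
)
partitioned by (reporttime string)
location '/telecomlog';

-- Register an already-written day as a partition, for example:
alter table telecomlog_raw add partition (reporttime='2015-06-15');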
