Enterprise: Use the MySQL connector to synchronize MySQL data to Elasticsearch

The Elastic MySQL connector is a connector for MySQL data sources: it synchronizes data from MySQL into Elasticsearch. In today's article, I will describe step by step how to set it up.

In the demonstration below, I will use Elastic Stack 8.8.2.


Availability and prerequisites

This connector is available as a native connector in Elastic versions 8.5.0 and later. To use this connector as a native connector, meet all native connector requirements.

This connector is also available as a connector client for the Python connectors framework. To use this connector as a connector client, meet all connector client requirements.

There are no prerequisites for this connector other than the shared requirements linked above.

Usage

To use this connector as a native connector, use a connector workflow. See native connectors.

To use this connector as a connector client, see Connector clients and frameworks.

In the following demonstration, I will use the connector client.

Install

Elasticsearch

We can refer to my previous article "How to install Elasticsearch on Linux, MacOS and Windows" to install Elasticsearch. In particular, we need to follow the Elastic Stack 8.x installation guide.

In the Elasticsearch terminal output, find the elastic user's password and Kibana's enrollment token. These are printed when Elasticsearch is first started.

We note down this password and will use it in the configuration below. The first start also generates the corresponding certificate files:

$ pwd
/Users/liuxg/elastic/elasticsearch-8.8.2/config/certs
$ ls
http.p12      http_ca.crt   transport.p12

In order to facilitate the following configuration, we copy the http_ca.crt certificate to the following directory:

mkdir -p ~/connectors-python-config
cp http_ca.crt ~/connectors-python-config
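
As a quick sanity check (a sketch, assuming Elasticsearch is listening on the default https://localhost:9200), we can confirm that the copied certificate is accepted:

curl --cacert ~/connectors-python-config/http_ca.crt -u elastic https://localhost:9200

When prompted for the elastic user's password, a successful call returns the cluster's JSON banner.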

Save the password, enrollment token, and certificate path; you'll need them in later steps. If you are not familiar with these operations, please refer to my previous article "Elastic Stack 8.0 Installation - Securing your Elastic Stack is now easier than ever".
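
If you missed the initial terminal output, both values can be regenerated from the Elasticsearch installation directory. A minimal sketch using the tools that ship with Elasticsearch 8.x:

# Reset and print a new password for the elastic user
bin/elasticsearch-reset-password -u elastic

# Generate a fresh enrollment token for Kibana
bin/elasticsearch-create-enrollment-token -s kibana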

Install Kibana

We install Kibana next. We can refer to my previous article "How to install Kibana in the Elastic stack on Linux, MacOS and Windows" for the installation. Specifically, we need to install Kibana 8.8.2 to match our Elasticsearch version. If you are not sure how, please read my previous article "Elastic Stack 8.0 Installation - Securing your Elastic Stack is now easier than ever". Before starting Kibana, we modify Kibana's configuration by adding the following line to config/kibana.yml:

config/kibana.yml

enterpriseSearch.host: http://localhost:3002

Then, we use the following command to start Kibana:

bin/kibana

We open the address printed above in the browser and enter the corresponding enrollment token to start Kibana.

Install Java

You need Java installed: Java 8 or Java 11. Refer to Elastic's support matrix for the exact Java versions supported.

Install Enterprise Search

We find the version we need at Download Elastic Enterprise Search | Elastic and follow the instructions on that page. If you want to install a previous version, please refer to https://www.elastic.co/downloads/past-releases#app-search .

After we have downloaded the installation package of Enterprise Search, we can use the following command to decompress it:

$ pwd
/Users/liuxg/elastic
$ ls
elasticsearch-8.8.2                       kibana-8.8.2
elasticsearch-8.8.2-darwin-aarch64.tar.gz kibana-8.8.2-darwin-aarch64.tar.gz
enterprise-search-8.8.2.tar.gz
$ tar xzf enterprise-search-8.8.2.tar.gz 
$ cd enterprise-search-8.8.2
$ ls
LICENSE    NOTICE.txt README.md  bin        config     lib        metricbeat

As shown above, it contains a directory called config. Before we start Enterprise Search, we must do some configuration: we modify the config/enterprise-search.yml file and add the following to it:

config/enterprise-search.yml

allow_es_settings_modification: true
secret_management.encryption_keys: ['q3t6w9z$C&F)J@McQfTjWnZr4u7x!A%D']
elasticsearch.username: elastic
elasticsearch.password: "JUYrx8L3WOeG6zysQY2D"
elasticsearch.host: https://127.0.0.1:9200
elasticsearch.ssl.enabled: true
elasticsearch.ssl.certificate_authority: /Users/liuxg/elastic/elasticsearch-8.8.2/config/certs/http_ca.crt
kibana.external_url: http://localhost:5601

Above, note that elasticsearch.password is the password generated during the Elasticsearch installation, and elasticsearch.ssl.certificate_authority must point to the certificate generated under your own Elasticsearch installation path. In the configuration above, we have not yet configured secret_session_key; we can run with this configuration first and let the system generate it for us. Also note that the password must be quoted: I found that a password containing the * character otherwise produces errors. We start Enterprise Search with the following command:

bin/enterprise-search

During the startup process, we can see the generated username and password information:

      username: enterprise_search
      password: r9kcpyb5x2g9dken

We note down this username and password. During startup, we can also see a generated secret_session_key. We copy it and add it to the configuration file:

allow_es_settings_modification: true
secret_management.encryption_keys: ['q3t6w9z$C&F)J@McQfTjWnZr4u7x!A%D'] 
elasticsearch.username: elastic
elasticsearch.password: "JUYrx8L3WOeG6zysQY2D"
elasticsearch.host: https://127.0.0.1:9200
elasticsearch.ssl.enabled: true
elasticsearch.ssl.certificate_authority: /Users/liuxg/elastic/elasticsearch-8.8.2/config/certs/http_ca.crt
kibana.external_url: http://localhost:5601

secret_session_key: 3a6d8ab8993a9818728eabd6513fd1c448263be6f5497c8d286bc8be05b87edffd95073582e3277f1e8fb8f753a3ab07a5749ce4394a16f69bdc4acb3d2826ae
feature_flag.elasticsearch_search_api: true

To be able to use Elasticsearch search in App Search, we must also set feature_flag.elasticsearch_search_api: true. We then restart Enterprise Search:

./bin/enterprise-search 

After this startup, we will no longer see the generated configuration output. With that, our Enterprise Search is configured.

MySQL

For this tutorial, you need a source MySQL instance for the connector to read from. A free version of MySQL is available from the MySQL Community Server section of the MySQL Community Downloads site. We can log in to MySQL with the following command:

mysql -u root -p

Above, we log in with the root password; in my case it is 1234. After logging in, we run the following commands:

CREATE DATABASE sample_db;
USE sample_db;

CREATE TABLE person (
    person_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    age INT
);

CREATE TABLE address (
    address_id INT AUTO_INCREMENT PRIMARY KEY,
    address VARCHAR(255)
);

INSERT INTO person (name, age) VALUES ('Alice', 30);
INSERT INTO person (name, age) VALUES ('Bob', 25);
INSERT INTO person (name, age) VALUES ('Carol', 35);

INSERT INTO address (address) VALUES ('123 Elm St');
INSERT INTO address (address) VALUES ('456 Oak St');
INSERT INTO address (address) VALUES ('789 Pine St');

Above, we created the database sample_db and two tables, person and address.
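
Optionally, we can double-check the setup directly in the MySQL client:

SHOW TABLES;
SELECT * FROM person;
SELECT * FROM address;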

 

Synchronize data to Elasticsearch

Step 1: Download the sample configuration file

Download a sample configuration file. You can download it manually or run the following command:

curl https://raw.githubusercontent.com/elastic/connectors-python/main/config.yml --output ~/connectors-python-config/config.yml

We can view the file:

$ pwd
/Users/liuxg/connectors-python-config
$ ls
config.yml  http_ca.crt

Remember to update the --output parameter value if your directory name is different, or if you want to use a different configuration file name.

Step 2: Update the configuration file for the self-managed connector

Update the configuration file with the following settings to match your environment:

  • elasticsearch.host
  • elasticsearch.password
  • connector_id
  • service_type

Use mysql as the service_type value. Don't forget to uncomment mysql in the sources section of the yaml file.

If you're running the connector service against a Dockerized version of Elasticsearch and Kibana, your configuration file will look like this:

elasticsearch:
  host: http://host.docker.internal:9200
  username: elastic
  password: <YOUR_PASSWORD>

connector_id: <CONNECTOR_ID_FROM_KIBANA>
service_type: mysql

sources:
  # UNCOMMENT "mysql" below to enable the MySQL connector

  #mongodb: connectors.sources.mongo:MongoDataSource
  #s3: connectors.sources.s3:S3DataSource
  #dir: connectors.sources.directory:DirectoryDataSource
  #mysql: connectors.sources.mysql:MySqlDataSource
  #network_drive: connectors.sources.network_drive:NASDataSource
  #google_cloud_storage: connectors.sources.google_cloud_storage:GoogleCloudStorageDataSource
  #azure_blob_storage: connectors.sources.azure_blob_storage:AzureBlobStorageDataSource
  #postgresql: connectors.sources.postgresql:PostgreSQLDataSource
  #oracle: connectors.sources.oracle:OracleDataSource
  #mssql: connectors.sources.mssql:MSSQLDataSource

Note that the config file you downloaded may contain more entries, so you will need to manually copy/change the settings that apply to you. Usually, you only need to update elasticsearch.host, elasticsearch.password, connector_id and service_type to run the connector service.

Let's get the connector_id and API key from the Kibana interface. My resulting configuration file looks like this:

 

~/connectors-python-config/config.yml

elasticsearch:
  host: https://192.168.0.3:9200
  api_key: "OUkyM1E0a0JrWktfLVd2OTRPZkE6TmkxbUNuN3dROGlrT2cwWlNVaEZKQQ=="
  ca_certs: "/usr/share/certs/http_ca.crt"
  ssl: true
  bulk:
    queue_max_size: 1024
    queue_max_mem_size: 25
    display_every: 100
    chunk_size: 1000
    max_concurrency: 5
    chunk_max_mem_size: 5
    concurrent_downloads: 10
  request_timeout: 120
  max_wait_duration: 120
  initial_backoff_duration: 1
  backoff_multiplier: 2
  log_level: info

service:
  idling: 30
  heartbeat: 300
  max_errors: 20
  max_errors_span: 600
  max_concurrent_content_syncs: 1
  max_concurrent_access_control_syncs: 1
  job_cleanup_interval: 300
  log_level: INFO

connector_id: '8423Q4kBkZK_-Wv9z-en'
service_type: 'mysql'

sources:
  # mongodb: connectors.sources.mongo:MongoDataSource
  # s3: connectors.sources.s3:S3DataSource
  # dir: connectors.sources.directory:DirectoryDataSource
  mysql: connectors.sources.mysql:MySqlDataSource
  # network_drive: connectors.sources.network_drive:NASDataSource
  # google_cloud_storage: connectors.sources.google_cloud_storage:GoogleCloudStorageDataSource
  # google_drive: connectors.sources.google_drive:GoogleDriveDataSource
  # azure_blob_storage: connectors.sources.azure_blob_storage:AzureBlobStorageDataSource
  # postgresql: connectors.sources.postgresql:PostgreSQLDataSource
  # oracle: connectors.sources.oracle:OracleDataSource
  # sharepoint_server: connectors.sources.sharepoint_server:SharepointServerDataSource
  # mssql: connectors.sources.mssql:MSSQLDataSource
  # jira: connectors.sources.jira:JiraDataSource
  # confluence: connectors.sources.confluence:ConfluenceDataSource
  # dropbox: connectors.sources.dropbox:DropboxDataSource
  # servicenow: connectors.sources.servicenow:ServiceNowDataSource
  # sharepoint_online: connectors.sources.sharepoint_online:SharepointOnlineDataSource
  # github: connectors.sources.github:GitHubDataSource

Above, note that:

  • host is the access address of Elasticsearch

  • api_key is the API key used to access Elasticsearch. It is not needed if you use a username and password combination; one way to create such a key is shown in the sketch after this list

  • ca_certs is the CA certificate used to access Elasticsearch. This is needed for self-managed Elasticsearch clusters

  • service_type must be mysql

  • connector_id is generated in the Kibana configuration above and is used to identify the connector
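
If you prefer an API key over a username/password pair, one way to create one is from Kibana's Dev Tools (a sketch; the key name connectors-client is arbitrary):

POST /_security/api_key
{
  "name": "connectors-client"
}

The encoded field of the response is the value to paste into api_key above.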

Step 3: Run the Docker image

docker run \
-v ~/connectors-python-config:/config \
--volume="$PWD/http_ca.crt:/usr/share/certs/http_ca.crt:ro" \
--network "elastic" \
--tty \
--rm \
docker.elastic.co/enterprise-search/elastic-connectors:8.8.2.0-SNAPSHOT \
/app/bin/elastic-ingest \
-c /config/config.yml
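
Note that the command above attaches the container to a Docker network named elastic. If that network does not exist on your machine yet, create it first:

docker network create elastic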

After running the above command, we return to the Kibana interface again.

Next, let's configure MySQL. Since our connector client runs in a Docker container and our MySQL instance listens only on localhost:3306 on the host, the code inside the container has no way to reach the host's localhost address. For this, I refer to the previous article "Kibana: Create a webhook alert - Elastic Stack 8.2" and run the following command:

bore local 3306 --to bore.pub

In this way, MySQL can be accessed at the public address bore.pub:3332; the port is assigned by bore and printed when the tunnel starts.
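
As a quick check (a sketch; 3332 is the port bore assigned in my case, yours will differ), we can confirm that MySQL answers through the tunnel before touching the connector configuration:

mysql -h bore.pub -P 3332 -u root -p

We then use bore.pub as the host and the assigned port in the connector's MySQL configuration in Kibana.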

We schedule the synchronization to occur at 00:00 UTC every day; of course, we can also choose a different frequency. Click Save.
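
If you are curious what was saved, the connectors framework keeps its state, including the schedule, in the hidden .elastic-connectors index. A debugging query from Dev Tools (this index is an internal detail, not a stable API):

GET .elastic-connectors/_search?filter_path=hits.hits._source.scheduling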

We then click Sync to trigger a first synchronization.

To verify that new documents sync correctly, we add a new row to MySQL:
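
For example (a hypothetical row, standing in for the screenshot that is not reproduced here):

INSERT INTO person (name, age) VALUES ('Dave', 28);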

We then manually sync again in Kibana to pick up the new row.

 

During testing, I found that the latest connector release includes sync rules, but the version I am running does not show them. The feature should be available in the latest release; there appears to be a bug in the way the snapshot build works.

We can limit the synchronization to just the data we need with sync rules, for example:

[
  {
    "tables": [
      "person"
    ],
    "query": "SELECT * FROM sample_db.person LIMIT 1;"
  },
  {
    "tables": [
      "address"
    ],
    "query": "SELECT * FROM sample_db.address LIMIT 1;"
  }
]

 

In this way, the sync will only pull one row each from the person and address tables.

 

Similarly, we can define a WHERE query as follows:

[
  {
    "tables": ["person"],
    "query": "SELECT * FROM sample_db.person WHERE sample_db.person.age > 25;"
  }
]

It will then only sync person rows whose age is greater than 25.

View synced documents in Kibana

We can find the index with the following command:

GET _cat/indices

We can view its documents with the following command:

GET search-mysql/_search
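
Assuming the MySQL column names came through as document fields (name and age for person rows), a more targeted query might look like this:

GET search-mysql/_search
{
  "query": {
    "match": {
      "name": "Alice"
    }
  }
}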

Install MySQL using Docker

Above, we installed MySQL on the local machine. For actual testing, it can be more convenient to run MySQL in Docker:

docker run --name mysql_container -p 3306:3306 -e MYSQL_ROOT_PASSWORD=changeme -e MYSQL_USER=elastic -e MYSQL_PASSWORD=changeme -d mysql:latest

Grant user permissions:

docker exec -it mysql_container mysql -u root -p
GRANT ALL PRIVILEGES ON sample_db.* TO 'elastic'@'%';
FLUSH PRIVILEGES;
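
We can confirm that the grant took effect:

SHOW GRANTS FOR 'elastic'@'%';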

Create database and tables:

CREATE DATABASE sample_db;
USE sample_db;

CREATE TABLE person (
    person_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    age INT
);

CREATE TABLE address (
    address_id INT AUTO_INCREMENT PRIMARY KEY,
    address VARCHAR(255)
);

INSERT INTO person (name, age) VALUES ('Alice', 30);
INSERT INTO person (name, age) VALUES ('Bob', 25);
INSERT INTO person (name, age) VALUES ('Carol', 35);

INSERT INTO address (address) VALUES ('123 Elm St');
INSERT INTO address (address) VALUES ('456 Oak St');
INSERT INTO address (address) VALUES ('789 Pine St');

When configuring the connector in Kibana for this Dockerized MySQL, we can refer to the same connection settings as above.

Summary

In this article, we described in detail how to use the MySQL connector to synchronize MySQL data into an Elasticsearch index. It is very convenient to use. If you are familiar with Logstash, please refer to the "Database Data Synchronization" chapter in my previous article "Elastic: A Developer's Guide". We can also use an ingest pipeline to clean the data; that is not shown here.


Origin: blog.csdn.net/UbuntuTouch/article/details/131658774