The Elastic MySQL connector is a connector for MySQL data sources. It helps us synchronize data from MySQL to Elasticsearch. In today's article, I will describe in detail, step by step, how to achieve this.
In the demonstration below, I will use Elastic Stack 8.8.2.
Availability and prerequisites
This connector is available as a native connector in Elastic version 8.5.0 and later. To use this connector as a native connector, meet all Native Connector requirements.
This connector is also available as a connector client for the Python connector framework. To use this connector as a connector client, meet all connector client requirements.
There are no prerequisites for this connector other than the shared requirements linked above.
Usage
To use this connector as a native connector, use a connector workflow. See native connectors.
To use this connector as a connector client, see Connector clients and frameworks.
In the following demonstration, I will use the connector client.
Install
Elasticsearch
We can refer to my previous article " How to install Elasticsearch on Linux, MacOS and Windows " to install Elasticsearch. In particular, we need to follow the Elastic Stack 8.x installation guide for installation.
In the Elasticsearch terminal output, find the elastic user's password and Kibana's registration token. These are printed when Elasticsearch is first started.
We note down this password and use it in the configuration below. The first startup also generates the corresponding certificate files:
$ pwd
/Users/liuxg/elastic/elasticsearch-8.8.2/config/certs
$ ls
http.p12 http_ca.crt transport.p12
In order to facilitate the following configuration, we copy the http_ca.crt certificate to the following directory:
mkdir -p ~/connectors-python-config
cp http_ca.crt ~/connectors-python-config
Save the password, registration token, and certificate pathname. You'll need them in later steps. If you are not familiar with these operations, please refer to my previous article " Elastic Stack 8.0 Installation - Securing your Elastic Stack is now easier than ever ".
Install Kibana
We install Kibana next. We can refer to my previous article " How to install Kibana in the Elastic stack on Linux, MacOS and Windows " for our installation. Specifically, we need to install Kibana version 8.8.2. If you are not sure how to install it, please read my previous article " Elastic Stack 8.0 Installation - Securing your Elastic Stack is now easier than ever ". Before starting Kibana, we modify Kibana's configuration by adding the following line to config/kibana.yml:
config/kibana.yml
enterpriseSearch.host: http://localhost:3002
Then, we use the following command to start Kibana:
bin/kibana
We open the address output above in the browser and enter the corresponding enrollment token to complete the Kibana setup.
Java installation
You need Java installed: Java 8 or Java 11. We can refer to the Enterprise Search documentation to find the required Java version.
Enterprise search installation
We find the version we need to download at the address Download Elastic Enterprise Search | Elastic, and follow the corresponding instructions on the page. If you want to install a previous version, please refer to https://www.elastic.co/downloads/past-releases#app-search .
After we have downloaded the installation package of Enterprise Search, we can use the following command to decompress it:
$ pwd
/Users/liuxg/elastic
$ ls
elasticsearch-8.8.2 kibana-8.8.2
elasticsearch-8.8.2-darwin-aarch64.tar.gz kibana-8.8.2-darwin-aarch64.tar.gz
enterprise-search-8.8.2.tar.gz
$ tar xzf enterprise-search-8.8.2.tar.gz
$ cd enterprise-search-8.8.2
$ ls
LICENSE NOTICE.txt README.md bin config lib metricbeat
As shown above, it contains a directory called config. Before we start Enterprise Search, we must do some corresponding configuration. We need to modify the config/enterprise-search.yml file. Add the following to this file:
config/enterprise-search.yml
allow_es_settings_modification: true
secret_management.encryption_keys: ['q3t6w9z$C&F)J@McQfTjWnZr4u7x!A%D']
elasticsearch.username: elastic
elasticsearch.password: "JUYrx8L3WOeG6zysQY2D"
elasticsearch.host: https://127.0.0.1:9200
elasticsearch.ssl.enabled: true
elasticsearch.ssl.certificate_authority: /Users/liuxg/elastic/elasticsearch-8.8.2/config/certs/http_ca.crt
kibana.external_url: http://localhost:5601
Above, note that elasticsearch.password is the password we generated during the Elasticsearch installation, and elasticsearch.ssl.certificate_authority must point to the certificate generated in your own Elasticsearch installation path. In the configuration above, we have not yet set secret_session_key; we can run with this configuration first and let the system generate it for us. When configuring the password, we need to add quotation marks: I found that a * character in an unquoted password produces an error. We start Enterprise Search with the following command:
bin/enterprise-search
During the startup process, we can see the generated username and password information:
username: enterprise_search
password: r9kcpyb5x2g9dken
We note down this username and password. During startup, we can also see a generated secret_session_key:
We also copy it and add it to the configuration file:
allow_es_settings_modification: true
secret_management.encryption_keys: ['q3t6w9z$C&F)J@McQfTjWnZr4u7x!A%D']
elasticsearch.username: elastic
elasticsearch.password: "JUYrx8L3WOeG6zysQY2D"
elasticsearch.host: https://127.0.0.1:9200
elasticsearch.ssl.enabled: true
elasticsearch.ssl.certificate_authority: /Users/liuxg/elastic/elasticsearch-8.8.2/config/certs/http_ca.crt
kibana.external_url: http://localhost:5601
secret_session_key: 3a6d8ab8993a9818728eabd6513fd1c448263be6f5497c8d286bc8be05b87edffd95073582e3277f1e8fb8f753a3ab07a5749ce4394a16f69bdc4acb3d2826ae
feature_flag.elasticsearch_search_api: true
To enable us to use Elasticsearch search in App Search, we must set feature_flag.elasticsearch_search_api: true. We then restart Enterprise Search:
./bin/enterprise-search
On this startup, we no longer see any generated configuration output. Our Enterprise Search is now configured.
MySQL
For this tutorial, you need a source MySQL instance for the connector to read from. A free version of MySQL is available from the MySQL Community Server section of the MySQL Community Downloads site. We can log in to MySQL with the following command:
mysql -u root -p
Above, we log in with root's password. In my case, the password is 1234. After we log in, we run the following commands:
CREATE DATABASE sample_db;
USE sample_db;
CREATE TABLE person (
person_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255),
age INT
);
CREATE TABLE address (
address_id INT AUTO_INCREMENT PRIMARY KEY,
address VARCHAR(255)
);
INSERT INTO person (name, age) VALUES ('Alice', 30);
INSERT INTO person (name, age) VALUES ('Bob', 25);
INSERT INTO person (name, age) VALUES ('Carol', 35);
INSERT INTO address (address) VALUES ('123 Elm St');
INSERT INTO address (address) VALUES ('456 Oak St');
INSERT INTO address (address) VALUES ('789 Pine St');
Above, we created the database sample_db and two tables, person and address.
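Conceptually, each row the connector reads becomes a JSON document in the search index. The sketch below illustrates this mapping for the person table we just created; the _id scheme and document shape here are assumptions for illustration, not necessarily what the connector actually produces:

```python
# Sketch: how a MySQL row might map to an Elasticsearch document.
# Field names mirror sample_db.person above; the _id scheme is an
# illustrative assumption, not the connector's real implementation.

def row_to_doc(table, pk_column, row):
    """Turn one table row (a dict) into an indexable document."""
    doc = dict(row)  # copy so the original row is untouched
    doc["_id"] = f"{table}_{row[pk_column]}"
    return doc

person_row = {"person_id": 1, "name": "Alice", "age": 30}
doc = row_to_doc("person", "person_id", person_row)
print(doc["_id"])  # person_1
```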
Synchronize data to Elasticsearch
Step 1: Download the sample configuration file
Download a sample configuration file. You can download it manually or run the following command:
curl https://raw.githubusercontent.com/elastic/connectors-python/main/config.yml --output ~/connectors-python-config/config.yml
We can view the file:
$ pwd
/Users/liuxg/connectors-python-config
$ ls
config.yml http_ca.crt
Remember to update the --output parameter value if your directory name is different, or if you want to use a different configuration file name.
Step 2: Update the configuration file of the self-managed connector
Update the configuration file with the following settings to match your environment:
elasticsearch.host
elasticsearch.password
connector_id
service_type
Use mysql as the service_type value. Don't forget to uncomment mysql in the source section of the yaml file.
If you're running the connector service against a Dockerized version of Elasticsearch and Kibana, your configuration file will look like this:
elasticsearch:
host: http://host.docker.internal:9200
username: elastic
password: <YOUR_PASSWORD>
connector_id: <CONNECTOR_ID_FROM_KIBANA>
service_type: mysql
sources:
# UNCOMMENT "mysql" below to enable the MySQL connector
#mongodb: connectors.sources.mongo:MongoDataSource
#s3: connectors.sources.s3:S3DataSource
#dir: connectors.sources.directory:DirectoryDataSource
#mysql: connectors.sources.mysql:MySqlDataSource
#network_drive: connectors.sources.network_drive:NASDataSource
#google_cloud_storage: connectors.sources.google_cloud_storage:GoogleCloudStorageDataSource
#azure_blob_storage: connectors.sources.azure_blob_storage:AzureBlobStorageDataSource
#postgresql: connectors.sources.postgresql:PostgreSQLDataSource
#oracle: connectors.sources.oracle:OracleDataSource
#mssql: connectors.sources.mssql:MSSQLDataSource
Note that the config file you downloaded may contain more entries, so you will need to manually copy/change the settings that apply to you. Usually, you only need to update elasticsearch.host, elasticsearch.password, connector_id and service_type to run the connector service.
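As a quick sanity check before starting the service, we can verify that these settings are present in the parsed configuration. The dictionary below is a stand-in for the parsed config.yml; the key names follow the sample file above, and the values are placeholders:

```python
# Stand-in for the parsed config.yml; all values are placeholders.
config = {
    "elasticsearch": {
        "host": "https://192.168.0.3:9200",
        "password": "<YOUR_PASSWORD>",
    },
    "connector_id": "<CONNECTOR_ID_FROM_KIBANA>",
    "service_type": "mysql",
}

# The settings the connector service usually needs, per the text above.
missing = [k for k in ("connector_id", "service_type") if not config.get(k)]
missing += [k for k in ("host", "password") if not config["elasticsearch"].get(k)]
print("missing settings:", missing)  # missing settings: []
```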
Let's get these configurations from the Kibana interface:
~/connectors-python-config/config.yml
elasticsearch:
host: https://192.168.0.3:9200
api_key: "OUkyM1E0a0JrWktfLVd2OTRPZkE6TmkxbUNuN3dROGlrT2cwWlNVaEZKQQ=="
ca_certs: "/usr/share/certs/http_ca.crt"
ssl: true
bulk:
queue_max_size: 1024
queue_max_mem_size: 25
display_every: 100
chunk_size: 1000
max_concurrency: 5
chunk_max_mem_size: 5
concurrent_downloads: 10
request_timeout: 120
max_wait_duration: 120
initial_backoff_duration: 1
backoff_multiplier: 2
log_level: info
service:
idling: 30
heartbeat: 300
max_errors: 20
max_errors_span: 600
max_concurrent_content_syncs: 1
max_concurrent_access_control_syncs: 1
job_cleanup_interval: 300
log_level: INFO
connector_id: '8423Q4kBkZK_-Wv9z-en'
service_type: 'mysql'
sources:
# mongodb: connectors.sources.mongo:MongoDataSource
# s3: connectors.sources.s3:S3DataSource
# dir: connectors.sources.directory:DirectoryDataSource
mysql: connectors.sources.mysql:MySqlDataSource
# network_drive: connectors.sources.network_drive:NASDataSource
# google_cloud_storage: connectors.sources.google_cloud_storage:GoogleCloudStorageDataSource
# google_drive: connectors.sources.google_drive:GoogleDriveDataSource
# azure_blob_storage: connectors.sources.azure_blob_storage:AzureBlobStorageDataSource
# postgresql: connectors.sources.postgresql:PostgreSQLDataSource
# oracle: connectors.sources.oracle:OracleDataSource
# sharepoint_server: connectors.sources.sharepoint_server:SharepointServerDataSource
# mssql: connectors.sources.mssql:MSSQLDataSource
# jira: connectors.sources.jira:JiraDataSource
# confluence: connectors.sources.confluence:ConfluenceDataSource
# dropbox: connectors.sources.dropbox:DropboxDataSource
# servicenow: connectors.sources.servicenow:ServiceNowDataSource
# sharepoint_online: connectors.sources.sharepoint_online:SharepointOnlineDataSource
# github: connectors.sources.github:GitHubDataSource
Above, note that:
- host is the access address of Elasticsearch
- api_key is the API key used to access Elasticsearch; it is not needed if you use a username and password combination
- ca_certs is the certificate used to access Elasticsearch; this is for self-managed Elasticsearch clusters
- service_type must be mysql
- connector_id is generated in the configuration above and is used to identify the connector
Step 3: Run the Docker image
docker run \
-v ~/connectors-python-config:/config \
--volume="$PWD/http_ca.crt:/usr/share/certs/http_ca.crt:ro" \
--network "elastic" \
--tty \
--rm \
docker.elastic.co/enterprise-search/elastic-connectors:8.8.2.0-SNAPSHOT \
/app/bin/elastic-ingest \
-c /config/config.yml
After running the above command, we return to the Kibana interface again:
Next, let's configure MySQL. Since our connector client runs in a Docker container and MySQL listens on localhost:3306 on the host, the code in the container has no way to reach the host's localhost address. To work around this, I refer to the previous article " Kibana: Create a webhook alert - Elastic Stack 8.2 " and run the following command:
bore local 3306 --to bore.pub
In this way, MySQL can be accessed by a public network address bore.pub:3332. We then use this address for configuration:
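Before wiring the tunneled address into the connector configuration, we can quickly check that it is reachable. This small helper is my own sketch, not part of the connector; the bore.pub:3332 address comes from this particular run, and yours will differ:

```python
# Sketch: check that a host:port (e.g. the bore tunnel to MySQL)
# accepts TCP connections before using it in the connector config.
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (address from this run; replace with your own tunnel):
# port_open("bore.pub", 3332)
```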
We schedule the synchronization to occur at 00:00 UTC every day. Of course, we can also trigger a sync manually at any time. Click Save:
We click on Sync above:
To verify that it can correctly sync new documents, we add a new document to MySQL:
We manually sync again in Kibana:
During testing, I found that the latest connector release has sync rules, but the version I am running does not show them. The feature should be available in the latest release; there appears to be a bug in how the snapshot build works.
Through sync rules, we can synchronize only the data we need. For example:
[
{
"tables": [
"person"
],
"query": "SELECT * FROM sample_db.person LIMIT 1;"
},
{
"tables": [
"address"
],
"query": "SELECT * FROM sample_db.address LIMIT 1;"
}
]
In this way, when synchronizing, it will only synchronize one row each from the person and address tables.
Similarly, we can define the WHERE query as follows:
[
{
"tables": ["person"],
"query": "SELECT * FROM sample_db.person WHERE sample_db.person.age > 25;"
}
]
It will only sync person rows whose age is greater than 25.
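The effect of this WHERE rule on the sample rows we inserted earlier can be sketched locally; note that this only illustrates which rows the rule keeps, while the real filtering happens inside MySQL:

```python
# Sketch: which of our sample person rows survive the
# WHERE sample_db.person.age > 25 sync rule above.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 35},
]

synced = [p for p in people if p["age"] > 25]
print([p["name"] for p in synced])  # ['Alice', 'Carol']
```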
View synced documents in Kibana
We can find the index by the following method:
GET _cat/indices
We can view its documents with the following command:
GET search-mysql/_search
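Outside of Kibana, the same search can be issued over the Elasticsearch REST API. The sketch below only builds the request path and body for the search-mysql index named above; the host, credentials, and actually sending the request are left to your environment:

```python
# Sketch: build a simple match-query request for the synced index.
# The index name search-mysql comes from the connector setup above.
import json

def build_search_request(index, field, value):
    """Return the URL path and JSON body for a match query."""
    path = f"/{index}/_search"
    body = {"query": {"match": {field: value}}}
    return path, json.dumps(body)

path, body = build_search_request("search-mysql", "name", "Alice")
print(path)  # /search-mysql/_search
```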
Install MySQL using Docker
Above, we installed MySQL on the local machine. In actual testing, it is more convenient to install MySQL with Docker:
docker run --name mysql_container -p 3306:3306 -e MYSQL_ROOT_PASSWORD=changeme -e MYSQL_USER=elastic -e MYSQL_PASSWORD=changeme -d mysql:latest
Grant user permissions:
docker exec -it mysql_container mysql -u root -p
GRANT ALL PRIVILEGES ON sample_db.* TO 'elastic'@'%';
FLUSH PRIVILEGES;
Create database and tables:
CREATE DATABASE sample_db;
USE sample_db;
CREATE TABLE person (
person_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255),
age INT
);
CREATE TABLE address (
address_id INT AUTO_INCREMENT PRIMARY KEY,
address VARCHAR(255)
);
INSERT INTO person (name, age) VALUES ('Alice', 30);
INSERT INTO person (name, age) VALUES ('Bob', 25);
INSERT INTO person (name, age) VALUES ('Carol', 35);
INSERT INTO address (address) VALUES ('123 Elm St');
INSERT INTO address (address) VALUES ('456 Oak St');
INSERT INTO address (address) VALUES ('789 Pine St');
When configuring the connector, we can refer to the following content:
Summary
In this article, we described in detail how to use the MySQL connector to synchronize MySQL data into an Elasticsearch index. It is very convenient to use. If you are familiar with Logstash, please refer to the " Database Data Synchronization " chapter in my previous article " Elastic: A Developer's Guide ". We can also use an ingest pipeline to clean the data, but that will not be shown here.