Java: importing PostgreSQL data into Elasticsearch (auto-create ES indices, bulk data migration)

Please write a Java program that implements the following:
1. Connect to a specified PostgreSQL database and list all the tables in it;
2. Query the column information of each table in turn, and from those tables and columns create the corresponding index and mapping in Elasticsearch;
3. Import the data from each PostgreSQL table into the corresponding Elasticsearch index;
4. Pay attention to the correspondence between PostgreSQL column types and Elasticsearch field types; for example, bytea in PostgreSQL corresponds to the binary type in Elasticsearch;
5. Deliver it as a single Java program in a Maven project, with a complete pom.xml;
6. Only migrate the tables of the specified PostgreSQL database, excluding indexes and other object types;
7. The Elasticsearch version is 7.6.2, so please use the correct API for that version.


The following is a Java program that connects to a specified PostgreSQL database and performs the functions described above. Save it as a Maven project: create a file named pom.xml in the project root and copy the following into it.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>pg-to-es</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.2.23</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.6.2</version>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.6.2</version>
        </dependency>
    </dependencies>
</project>

Note the versions used in this pom.xml: PostgreSQL JDBC driver 42.2.23 and Elasticsearch 7.6.2. The Elasticsearch client version must match the 7.6.2 server required above; the JDBC driver version can be adjusted as needed.

Next, save the following Java code in a file named PGtoES.java and place it in the src/main/java/com/example directory of the Maven project.

package com.example;

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.GetIndexRequest;

import java.io.IOException;
import java.sql.*;
import java.util.HashMap;
import java.util.Map;

public class PGtoES {

    private static final String PG_HOST = "your_postgresql_host";
    private static final int PG_PORT = 5432;
    private static final String PG_DATABASE = "your_postgresql_database";
    private static final String PG_USER = "your_postgresql_user";
    private static final String PG_PASSWORD = "your_postgresql_password";

    private static final String ES_HOST = "your_elasticsearch_host";
    private static final int ES_PORT = 9200;
    private static final String ES_INDEX_PREFIX = "your_es_index_prefix";

    public static void main(String[] args) {
        try {
            Connection pgConnection = createPGConnection();
            DatabaseMetaData pgMetaData = pgConnection.getMetaData();
            // Only plain tables; indexes, views, sequences, etc. are excluded.
            ResultSet pgTables = pgMetaData.getTables(null, null, "%", new String[]{"TABLE"});

            RestHighLevelClient esClient = createESClient();

            while (pgTables.next()) {
                String tableName = pgTables.getString("TABLE_NAME");
                System.out.println("Table: " + tableName);

                ResultSet pgColumns = pgMetaData.getColumns(null, null, tableName, null);

                createESIndex(esClient, tableName, pgColumns);
                importDataFromPGToES(pgConnection, esClient, tableName);
            }

            pgConnection.close();
            esClient.close();
        } catch (SQLException | ClassNotFoundException | IOException e) {
            e.printStackTrace();
        }
    }

    private static Connection createPGConnection() throws ClassNotFoundException, SQLException {
        Class.forName("org.postgresql.Driver");
        String pgUrl = "jdbc:postgresql://" + PG_HOST + ":" + PG_PORT + "/" + PG_DATABASE;
        return DriverManager.getConnection(pgUrl, PG_USER, PG_PASSWORD);
    }

    private static RestHighLevelClient createESClient() {
        return new RestHighLevelClient(
                RestClient.builder(new org.apache.http.HttpHost(ES_HOST, ES_PORT, "http")));
    }

    private static void createESIndex(RestHighLevelClient esClient, String tableName, ResultSet pgColumns) throws SQLException, IOException {
        String indexName = ES_INDEX_PREFIX + tableName;

        // Recreate the index from scratch if it already exists.
        if (esClient.indices().exists(new GetIndexRequest(indexName), RequestOptions.DEFAULT)) {
            esClient.indices().delete(new DeleteIndexRequest(indexName), RequestOptions.DEFAULT);
        }

        // Build the mapping as { "properties": { "<column>": { "type": "<es type>" } } }.
        Map<String, Object> properties = new HashMap<>();
        while (pgColumns.next()) {
            String columnName = pgColumns.getString("COLUMN_NAME");
            String columnType = pgColumns.getString("TYPE_NAME");

            String esType;
            // The JDBC driver reports PostgreSQL's internal type names
            // (e.g. "int4", "varchar"), so both spellings are handled.
            switch (columnType) {
                case "bytea":
                    esType = "binary";
                    break;
                case "bool":
                case "boolean":
                    esType = "boolean";
                    break;
                case "int2":
                case "int4":
                case "int8":
                case "smallint":
                case "integer":
                case "bigint":
                case "numeric":
                    esType = "long";
                    break;
                case "float4":
                case "float8":
                case "real":
                case "double precision":
                    esType = "double";
                    break;
                case "text":
                case "varchar":
                case "character varying":
                    esType = "text";
                    break;
                case "date":
                case "timestamp":
                case "timestamptz":
                    esType = "date";
                    break;
                default:
                    esType = "keyword";
                    break;
            }

            Map<String, Object> field = new HashMap<>();
            field.put("type", esType);
            properties.put(columnName, field);
        }

        Map<String, Object> mapping = new HashMap<>();
        mapping.put("properties", properties);

        CreateIndexRequest request = new CreateIndexRequest(indexName);
        request.mapping(mapping);
        esClient.indices().create(request, RequestOptions.DEFAULT);
    }

    private static void importDataFromPGToES(Connection pgConnection, RestHighLevelClient esClient, String tableName) throws SQLException, IOException {
        String selectQuery = "SELECT * FROM " + tableName;
        PreparedStatement pgStatement = pgConnection.prepareStatement(selectQuery);
        ResultSet pgData = pgStatement.executeQuery();

        ResultSetMetaData metaData = pgData.getMetaData();
        int columnCount = metaData.getColumnCount();

        BulkRequest bulkRequest = new BulkRequest();

        while (pgData.next()) {
            // Collect the whole row into one source map: calling
            // IndexRequest.source(...) once per column would overwrite
            // the document source each time.
            Map<String, Object> source = new HashMap<>();
            for (int i = 1; i <= columnCount; i++) {
                String columnName = metaData.getColumnName(i);
                Object columnValue = pgData.getObject(i);
                if (columnValue instanceof byte[]) {
                    // Elasticsearch binary fields expect Base64-encoded strings.
                    columnValue = java.util.Base64.getEncoder().encodeToString((byte[]) columnValue);
                }
                source.put(columnName, columnValue);
            }
            bulkRequest.add(new IndexRequest(ES_INDEX_PREFIX + tableName).source(source));
        }

        // An empty bulk request is rejected by the client, so skip empty tables.
        if (bulkRequest.numberOfActions() > 0) {
            esClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        }

        pgData.close();
        pgStatement.close();
    }
}
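Since the JDBC driver reports PostgreSQL's internal type names rather than the SQL-standard ones, the type correspondence from requirement 4 is worth checking in isolation. Here is a minimal standalone sketch of that lookup (the class name PgTypeMapping is hypothetical, not part of the project above):

```java
import java.util.HashMap;
import java.util.Map;

public class PgTypeMapping {
    // Both the internal JDBC spellings (int4, varchar, ...) and the
    // SQL-standard spellings (integer, character varying, ...) are covered.
    static final Map<String, String> PG_TO_ES = new HashMap<>();
    static {
        PG_TO_ES.put("bytea", "binary");
        PG_TO_ES.put("bool", "boolean");
        PG_TO_ES.put("boolean", "boolean");
        PG_TO_ES.put("int2", "long");
        PG_TO_ES.put("int4", "long");
        PG_TO_ES.put("int8", "long");
        PG_TO_ES.put("smallint", "long");
        PG_TO_ES.put("integer", "long");
        PG_TO_ES.put("bigint", "long");
        PG_TO_ES.put("numeric", "long");
        PG_TO_ES.put("float4", "double");
        PG_TO_ES.put("float8", "double");
        PG_TO_ES.put("real", "double");
        PG_TO_ES.put("double precision", "double");
        PG_TO_ES.put("varchar", "text");
        PG_TO_ES.put("text", "text");
        PG_TO_ES.put("character varying", "text");
        PG_TO_ES.put("date", "date");
        PG_TO_ES.put("timestamp", "date");
        PG_TO_ES.put("timestamptz", "date");
    }

    // Anything unrecognized falls back to keyword, as the program does.
    static String esType(String pgTypeName) {
        return PG_TO_ES.getOrDefault(pgTypeName, "keyword");
    }

    public static void main(String[] args) {
        System.out.println(esType("bytea")); // binary
        System.out.println(esType("uuid"));  // keyword
    }
}
```

A table-driven lookup like this is easy to extend when a database uses types (uuid, json, arrays) that the switch statement does not yet cover.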

Please replace the following variables in the code with your own database and Elasticsearch information:

  • PG_HOST: Hostname or IP address of the PostgreSQL database
  • PG_PORT: Port number of the PostgreSQL database
  • PG_DATABASE: the name of the PostgreSQL database
  • PG_USER: Username of the PostgreSQL database
  • PG_PASSWORD: Password for the PostgreSQL database
  • ES_HOST: Hostname or IP address of Elasticsearch
  • ES_PORT: Port number of Elasticsearch
  • ES_INDEX_PREFIX: Prefix for the Elasticsearch index (used to add an identifier before the index name)
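Instead of editing the constants in the source, you could read them from environment variables. A small sketch of that approach (the helper and the variable names PG_HOST / PG_PORT are a suggested convention, not part of the program above):

```java
public class EnvConfig {
    // Read a setting from the environment, falling back to a default
    // when the variable is unset or empty.
    static String env(String name, String fallback) {
        String value = System.getenv(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        String pgHost = env("PG_HOST", "localhost");
        int pgPort = Integer.parseInt(env("PG_PORT", "5432"));
        System.out.println("jdbc:postgresql://" + pgHost + ":" + pgPort + "/");
    }
}
```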

In the program, we use the PostgreSQL JDBC driver (postgresql) and Elasticsearch's Java high-level REST client (elasticsearch-rest-high-level-client) to connect to the database and to Elasticsearch.

To run this program, make sure your Maven environment is properly set up and navigate to the root of your Maven project on the command line. Then run the following command:

mvn clean install

This will compile and package your program. Note that the packaged jar does not bundle its dependencies, so the simplest way to run it is through Maven:

mvn exec:java -Dexec.mainClass=com.example.PGtoES

The program will connect to the specified PostgreSQL database and perform the following actions:

  1. List all table information in the database.
  2. For each table, query the field information, and create the corresponding index and its mapping in Elasticsearch.
  3. Import the data in the PostgreSQL table into the corresponding Elasticsearch index one by one.

Note that this procedure assumes that both the PostgreSQL and Elasticsearch servers are properly configured and running, and reachable via the hostname (or IP address) and port number provided.
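One caveat: the program builds a single BulkRequest per table, which holds every row in memory at once. For large tables it is safer to flush every N rows. The batching logic can be sketched independently of the Elasticsearch client (the class name, the generic partition helper, and the batch size of 1000 are illustrative choices, not part of the program above):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkBatching {
    // Split items into batches of at most batchSize; in the real program
    // each batch would become one BulkRequest sent via esClient.bulk(...).
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) rows.add(i);
        List<List<Integer>> batches = partition(rows, 1000);
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 500
    }
}
```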

Origin blog.csdn.net/a772304419/article/details/132355169