Spring Boot Integration with Elasticsearch

Elasticsearch is a full-text search engine designed for handling large data sets. An obvious use case is storing and searching application logs. Together with Logstash and Kibana, it is part of the powerful Elastic Stack, which I have described in some of my previous articles.

Application log storage is not the only use case for Elasticsearch. It is often used as a secondary database alongside an application's primary relational database. This approach is especially useful if you have to perform full-text searches, or if you only store large historical data sets that the application no longer modifies. Of course, it has both advantages and disadvantages. When you use two different data sources that contain the same data, you first have to think about synchronization. There are several options: depending on the relational database vendor, you can use its binary or transaction log, which contains the history of SQL updates. This approach requires some middleware that reads the log and pushes the data into Elasticsearch. You can also move the whole responsibility to the database side (triggers) or to the Elasticsearch side (the JDBC plugin).

Regardless of how you import data into Elasticsearch, you have to consider another issue: the data structure. Data in a relational database may be spread across several tables. If you want to take advantage of Elasticsearch, you should store it as a single type. That forces you to keep redundant data, which leads to higher disk space usage. Of course, this effect is acceptable if the Elasticsearch queries are faster than the equivalent relational database queries.

Well, let's move on after this long introduction. Spring Boot provides an easy way to interact with Elasticsearch through Spring Data repositories.

1. Enable Elasticsearch support

Following Spring Boot conventions, we do not have to provide any beans in the application context to enable Elasticsearch support. We only need to add the following dependency to pom.xml:

 
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

By default, the application tries to connect to Elasticsearch on localhost. If we use another target URL, we need to override it in the configuration. Here is a fragment of our application.yml file that overrides the default cluster name and address with the address of the Elasticsearch instance started as a Docker container:

spring:
  data:
    elasticsearch:
      cluster-name: docker-cluster
      cluster-nodes: 192.168.1.100:9300
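If you prefer the properties format over YAML, the same overrides can be written as follows (assuming the standard Spring Boot property names shown above):

```properties
spring.data.elasticsearch.cluster-name=docker-cluster
spring.data.elasticsearch.cluster-nodes=192.168.1.100:9300
```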

The application can monitor the Elasticsearch connection through the Spring Boot Actuator health endpoint. First, you need to add the following Maven dependency:

 
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Health checks are enabled by default, and the Elasticsearch check is auto-configured. However, this check is performed through the Elasticsearch REST API client. In that case, we need to override the spring.elasticsearch.rest.uris property, which is responsible for setting the address used by the REST client:

spring:
  elasticsearch:
    rest:
      uris: http://192.168.1.100:9200

2. Run Elasticsearch

For our tests, we need a single-node Elasticsearch instance running in development mode. As usual, we will use a Docker container. Here is the command that starts the Docker container and exposes ports 9200 and 9300:

$ docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.6.2

3. Create Spring Data repositories

To enable Elasticsearch repositories, we only need to annotate the main or configuration class with @EnableElasticsearchRepositories:

 
@SpringBootApplication
@EnableElasticsearchRepositories
public class SampleApplication { ... }

The next step is to create a repository interface that extends CrudRepository. It provides some basic operations, such as save or findById. If you need some additional find methods, you should define new methods in the interface following the Spring Data naming convention.

public interface EmployeeRepository extends CrudRepository<Employee, Long> {

    List<Employee> findByOrganizationName(String name);

    List<Employee> findByName(String name);

}
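Spring Data builds the query from the method name itself. As a toy illustration of that convention (this is not Spring's actual parser), stripping the findBy prefix and splitting the remainder on camel-case boundaries recovers the property path that a method like findByOrganizationName traverses:

```java
import java.util.Arrays;
import java.util.List;

public class DerivedQueryNameDemo {

    // Toy parser: strips the "findBy" prefix and splits the remainder
    // on camel-case boundaries to recover the property path.
    static List<String> propertyPath(String methodName) {
        String criteria = methodName.replaceFirst("^findBy", "");
        return Arrays.asList(criteria.split("(?<=[a-z])(?=[A-Z])"));
    }

    public static void main(String[] args) {
        // "findByOrganizationName" resolves to the nested path organization.name
        System.out.println(propertyPath("findByOrganizationName")); // [Organization, Name]
        // "findByName" resolves to the top-level property name
        System.out.println(propertyPath("findByName")); // [Name]
    }
}
```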

4. Build documents

Our entity structure is flattened: a single Employee object contains the related objects (Organization, Department). You can compare this approach to creating a view over a group of related tables in an RDBMS. In Spring Data Elasticsearch nomenclature, a single object is stored as a document. Therefore, you need to annotate the object with @Document. You should also set the name of the target index, the type, and the Elasticsearch document id. Additional mappings can be configured with the @Field annotation.

@Document(indexName = "sample", type = "employee")
public class Employee {

    @Id
    private Long id;
    @Field(type = FieldType.Object)
    private Organization organization;
    @Field(type = FieldType.Object)
    private Department department;
    private String name;
    private int age;
    private String position;

    // Getters and Setters ...

}
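To make the flattening concrete, here is a minimal sketch in plain Java (independent of Spring Data; the row and document types are illustrative) of how relational-style rows that reference each other by id can be joined in memory into a single denormalized document at index time:

```java
import java.util.Map;

public class DenormalizationDemo {

    // Relational-style rows: the employee row only holds a foreign key.
    record OrganizationRow(long id, String name) {}
    record EmployeeRow(long id, String name, long organizationId) {}

    // Flattened document: the organization name is embedded, not referenced.
    record EmployeeDocument(long id, String name, String organizationName) {}

    static EmployeeDocument flatten(EmployeeRow row, Map<Long, OrganizationRow> organizations) {
        // Resolve the foreign key once at index time; the document then
        // carries a redundant copy of the organization name.
        OrganizationRow org = organizations.get(row.organizationId());
        return new EmployeeDocument(row.id(), row.name(), org.name());
    }

    public static void main(String[] args) {
        Map<Long, OrganizationRow> orgs = Map.of(1L, new OrganizationRow(1L, "TestO"));
        EmployeeDocument doc = flatten(new EmployeeRow(1L, "John Smith", 1L), orgs);
        System.out.println(doc);
    }
}
```

This redundancy is exactly the disk-space trade-off mentioned in the introduction: the organization name is copied into every employee document so queries need no join.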

5. Initialize data

As mentioned in the introduction, the main reason you may decide to use Elasticsearch is the need to handle large data sets. Therefore, it is desirable to fill our test Elasticsearch node with a large number of documents. If you want to insert many documents in one step, you should use the Bulk API. The Bulk API makes it possible to perform many index/delete operations in a single API call, which can greatly increase indexing speed. Bulk operations can be performed with the Spring Data ElasticsearchTemplate bean, which is also auto-configured by Spring Boot. The template provides a bulkIndex method that takes a list of index queries as an input parameter. Here is a bean that inserts sample test data on application startup:

public class SampleDataSet {

    private static final Logger LOGGER = LoggerFactory.getLogger(SampleDataSet.class);
    private static final String INDEX_NAME = "sample";
    private static final String INDEX_TYPE = "employee";

    @Autowired
    EmployeeRepository repository;
    @Autowired
    ElasticsearchTemplate template;

    @PostConstruct
    public void init() {
        for (int i = 0; i < 10000; i++) {
            bulk(i);
        }
    }

    public void bulk(int ii) {
        try {
            if (!template.indexExists(INDEX_NAME)) {
                template.createIndex(INDEX_NAME);
            }
            ObjectMapper mapper = new ObjectMapper();
            List<IndexQuery> queries = new ArrayList<>();
            List<Employee> employees = employees();
            for (Employee employee : employees) {
                IndexQuery indexQuery = new IndexQuery();
                indexQuery.setId(employee.getId().toString());
                indexQuery.setSource(mapper.writeValueAsString(employee));
                indexQuery.setIndexName(INDEX_NAME);
                indexQuery.setType(INDEX_TYPE);
                queries.add(indexQuery);
            }
            if (queries.size() > 0) {
                template.bulkIndex(queries);
            }
            template.refresh(INDEX_NAME);
            LOGGER.info("BulkIndex completed: {}", ii);
        } catch (Exception e) {
            LOGGER.error("Error bulk index", e);
        }
    }

    // sample data set implementation ...

}
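The speed-up from bulkIndex comes from grouping many single index operations into one request. That chunking idea can be sketched independently of Elasticsearch (the method names here are illustrative, not part of any API):

```java
import java.util.ArrayList;
import java.util.List;

public class BulkChunkDemo {

    // Split a list of documents into batches of at most batchSize,
    // so each batch can be sent in a single bulk request.
    static <T> List<List<T>> chunks(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add(i);
        // 10 documents in batches of 4 -> 3 bulk calls instead of 10 single ones
        System.out.println(chunks(ids, 4)); // [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    }
}
```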

If you do not need to insert data on startup, you can disable the process by setting the initial-import.enabled property to false. Here is the SampleDataSet bean declaration:

@Bean
@ConditionalOnProperty("initial-import.enabled")
public SampleDataSet dataSet() {
    return new SampleDataSet();
}

6. View data and run queries

Suppose you have started the sample application with the bulk-indexing bean enabled, and had enough patience to wait several hours until all the data was inserted into your Elasticsearch node; it now contains 100M documents of the employee type. It is worth displaying some information about the cluster. You can do this with Elasticsearch queries, or you can download one of the available GUI tools, such as ElasticHQ. Conveniently, ElasticHQ is also available as a Docker container. You have to execute the following command to start the ElasticHQ container:

 
$ docker run -d --name elastichq -p 5000:5000 elastichq/elasticsearch-hq

After starting, the ElasticHQ GUI is available via a web browser on port 5000. Its web console provides basic information about the cluster and its indices, and allows executing queries. You only need to enter the Elasticsearch node address and you will be redirected to the main dashboard with statistics. Here is the ElasticHQ main dashboard.

As you can see, we have a single index named sample divided into 5 shards. That is the default provided by Spring Data; it can be overridden with the shards field of the @Document annotation. After clicking on the index, we can navigate to the index management panel, where we can perform certain operations on the index, such as clearing the cache or refreshing the index. You can also view statistics for all the shards.

For the current test, I have around 25M documents of the Employee type (about 3GB of space), so we can run some test queries. I have exposed two endpoints for searching: by employee name, GET /employees/{name}, and by organization name, GET /employees/organization/{organizationName}. The results are not overwhelming; I think a relational database could achieve similar results with the same amount of data.

7. Testing

Well, we have finished development and performed some manual tests against a large data set. Now it is time to create some integration tests that run at build time. We can use a JUnit library that automatically starts databases in Docker containers during tests: Testcontainers. For more information about this library, see its site https://www.testcontainers.org or my previous article about testing Spring Boot integration with Vault and Postgres using the Testcontainers framework. Fortunately, Testcontainers supports Elasticsearch. To enable it in the test scope, you first need to add the following dependency to pom.xml:

<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>1.11.1</version>
    <scope>test</scope>
</dependency>

The next step is to define the Elasticsearch container as a @ClassRule or @Rule bean. Depending on which annotation is used, it is started automatically before the test class or before each test. The published port number is generated automatically, so you need to set its value in the spring.data.elasticsearch.cluster-nodes property. Here is the full implementation of our JUnit integration test:

@RunWith(SpringRunner.class)
@SpringBootTest
@FixMethodOrder(MethodSorters.NAME_ASCENDING)
public class EmployeeRepositoryTest {

    @ClassRule
    public static ElasticsearchContainer container = new ElasticsearchContainer();
    @Autowired
    EmployeeRepository repository;

    @BeforeClass
    public static void before() {
        System.setProperty("spring.data.elasticsearch.cluster-nodes", container.getContainerIpAddress() + ":" + container.getMappedPort(9300));
    }

    @Test
    public void testAdd() {
        Employee employee = new Employee();
        employee.setId(1L);
        employee.setName("John Smith");
        employee.setAge(33);
        employee.setPosition("Developer");
        employee.setDepartment(new Department(1L, "TestD"));
        employee.setOrganization(new Organization(1L, "TestO", "Test Street No. 1"));
        employee = repository.save(employee);
        Assert.assertNotNull(employee);
    }

    @Test
    public void testFindAll() {
        Iterable<Employee> employees = repository.findAll();
        Assert.assertTrue(employees.iterator().hasNext());
    }

    @Test
    public void testFindByOrganization() {
        List<Employee> employees = repository.findByOrganizationName("TestO");
        Assert.assertTrue(employees.size() > 0);
    }

    @Test
    public void testFindByName() {
        List<Employee> employees = repository.findByName("John Smith");
        Assert.assertTrue(employees.size() > 0);
    }

}

Summary

In this article you have learned how to:

  • Run Elasticsearch locally as a Docker container

  • Integrate a Spring Boot application with Elasticsearch

  • Use Spring Data repositories to store data and perform simple queries

  • Use the Spring Data ElasticsearchTemplate to perform bulk operations on an index

  • Monitor a cluster with ElasticHQ

  • Build automatic integration tests for Elasticsearch with Testcontainers

The sample application source code is, as usual, available on GitHub in the sample-spring-elasticsearch repository.

 

 

This article originally appeared on the WeChat public account java微技术.


Origin blog.csdn.net/qq_29556507/article/details/92402665