Spring Boot + Spring Batch: implementing batch jobs, a hands-on step-by-step tutorial! (with real scenarios)


Source: blog.csdn.net/qq_35387940/article/details/108193473

Foreword

I won't say much about concepts. Briefly: Spring Batch is a fairly robust batch processing framework that is easy to use.

Why is it easy to use? Because it is built on Spring: integration is simple, it is easy to understand, and the processing flow is clear.

Why is it relatively robust? Because it provides the things we have to think about when processing data in large batches: log tracing, transaction granularity control, controllable execution, failure handling, a retry mechanism, data reading and writing, and so on.

Main text

So, back to the article: what will it give you? (Explained through examples, of course.)

In terms of business scenarios, we will implement the following two:

  1. Read data from a CSV file, apply business processing, and store it

  2. Read data from the database, apply business processing, and store it

These are the data cleaning, data filtering, data migration, and backup tasks we run into all the time. For large data volumes, implementing batch processing by hand means too many things to consider and too little peace of mind, so the Spring Batch framework is a good choice.

Before we get into the example tutorial, let's step back and ask: when we integrate the Spring Batch framework with Spring Boot, what do we actually have to code?

A simple diagram helps to understand:

[Diagram: Spring Batch components. A JobLauncher runs a Job; a Job contains one or more Steps; each Step wires together an ItemReader, an ItemProcessor, and an ItemWriter.]

When you see this diagram, does it remind you a little of a scheduled-task framework? It does look somewhat similar, but to be clear: this is a batch processing framework, not a scheduling framework. That said, as mentioned earlier, it offers controllable execution; and since execution is controllable, you can obviously extend it yourself and combine it with a scheduling framework to get the behavior you have in mind.

OK, back to the topic. The diagram should make it simple and clear what we need to implement in this example, so I won't describe each component at length here.

So without further ado, let's start our tutorial by example.

First, prepare a database and create a simple table in it for reading, writing, and storing the sample data.

The bloginfo table


The corresponding table-creation SQL statement:

CREATE TABLE `bloginfo`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `blogAuthor` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'Blog author identifier',
  `blogUrl` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'Blog URL',
  `blogTitle` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'Blog title',
  `blogItem` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'Blog category',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 89031 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

Core dependencies in the pom file:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

<!--  spring batch -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>

<!-- hibernate validator -->
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-validator</artifactId>
    <version>6.0.7.Final</version>
</dependency>
<!--  mybatis -->
<dependency>
    <groupId>org.mybatis.spring.boot</groupId>
    <artifactId>mybatis-spring-boot-starter</artifactId>
    <version>2.0.0</version>
</dependency>
<!--  mysql -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <scope>runtime</scope>
</dependency>

<!-- Druid data source driver; 1.1.10+ resolves the Spring Boot 1.x-to-2.x version issue -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid-spring-boot-starter</artifactId>
    <version>1.1.18</version>
</dependency>

yml file:

spring:
  batch:
    job:
# set to false - the job runs only when triggered via jobLauncher.run
      enabled: false
    initialize-schema: always
#    table-prefix: my-batch

  datasource:
    druid:
      username: root
      password: root
      url: jdbc:mysql://localhost:3306/hellodemo?useSSL=false&useUnicode=true&characterEncoding=UTF-8&serverTimezone=GMT%2B8&zeroDateTimeBehavior=convertToNull
      driver-class-name: com.mysql.cj.jdbc.Driver
      initialSize: 5
      minIdle: 5
      maxActive: 20
      maxWait: 60000
      timeBetweenEvictionRunsMillis: 60000
      minEvictableIdleTimeMillis: 300000
      validationQuery: SELECT 1 FROM DUAL
      testWhileIdle: true
      testOnBorrow: false
      testOnReturn: false
      poolPreparedStatements: true
      maxPoolPreparedStatementPerConnectionSize: 20
      useGlobalDataSourceStat: true
      connectionProperties: druid.stat.mergeSql=true;druid.stat.slowSqlMillis=5000
server:
  port: 8665

ps: here we use the Druid database connection pool, though there is a small pitfall with it, discussed later in the article. Also note the two batch settings above: enabled: false keeps jobs from running automatically at startup (we trigger them ourselves via jobLauncher.run), and initialize-schema: always lets Spring Batch create its metadata tables automatically.

In this example the processed data is ultimately written to the database (of course, you could also output it to a file, etc.).

That's why we created the table earlier and pulled MyBatis into the pom. So, before integrating the core Spring Batch code, let's quickly go through the database-related pieces.

POJO layer

BlogInfo.java :

/**
 * @Author : JCccc
 * @Description :
 **/
public class BlogInfo {

    private Integer id;
    private String blogAuthor;
    private String blogUrl;
    private String blogTitle;
    private String blogItem;

    @Override
    public String toString() {
        return "BlogInfo{" +
                "id=" + id +
                ", blogAuthor='" + blogAuthor + '\'' +
                ", blogUrl='" + blogUrl + '\'' +
                ", blogTitle='" + blogTitle + '\'' +
                ", blogItem='" + blogItem + '\'' +
                '}';
    }

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getBlogAuthor() {
        return blogAuthor;
    }

    public void setBlogAuthor(String blogAuthor) {
        this.blogAuthor = blogAuthor;
    }

    public String getBlogUrl() {
        return blogUrl;
    }

    public void setBlogUrl(String blogUrl) {
        this.blogUrl = blogUrl;
    }

    public String getBlogTitle() {
        return blogTitle;
    }

    public void setBlogTitle(String blogTitle) {
        this.blogTitle = blogTitle;
    }

    public String getBlogItem() {
        return blogItem;
    }

    public void setBlogItem(String blogItem) {
        this.blogItem = blogItem;
    }
}
Mapper layer

BlogMapper.java :

ps: you can see that I used annotations in this example to save effort, and I also skipped the service layer and its impl, again to save effort. The focus of this article isn't there, so please don't pick up these bad habits.

import com.example.batchdemo.pojo.BlogInfo;
import org.apache.ibatis.annotations.*;
import java.util.List;
import java.util.Map;

/**
 * @Author : JCccc
 * @Description :
 **/
@Mapper
public interface BlogMapper {
    @Insert("INSERT INTO bloginfo ( blogAuthor, blogUrl, blogTitle, blogItem )   VALUES ( #{blogAuthor}, #{blogUrl},#{blogTitle},#{blogItem}) ")
    @Options(useGeneratedKeys = true, keyProperty = "id")
    int insert(BlogInfo bloginfo);

    // note: blogAuthor is stored as varchar; MySQL casts it for the numeric comparison with #{authorId}
    @Select("select blogAuthor, blogUrl, blogTitle, blogItem from bloginfo where blogAuthor < #{authorId}")
    List<BlogInfo> queryInfoById(Map<String, Integer> map);

}

Next, the highlight: we start coding the various components involved in the diagram above.

First create a configuration class, MyBatchConfig.java:

As the name suggests, basically all the configuration components involved in our Spring Batch integration will be written here.

First, recall the diagram above, which contains:

JobRepository: registers/stores Jobs
JobLauncher: executes Jobs
Job: a batch task, containing one or more Steps
Step: contains ItemReader, ItemProcessor and ItemWriter
ItemReader: data reader
ItemProcessor: data processor
ItemWriter: data writer

First, add two annotations on the MyBatchConfig class:

@Configuration tells Spring that this is a custom configuration class whose beans should be loaded into the Spring container.

@EnableBatchProcessing enables batch processing support.

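To make that concrete, here is a minimal sketch of the class declaration (using the names from this article; the component beans are filled in below):

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.context.annotation.Configuration;

@Configuration          // a custom configuration class; its beans are loaded into the Spring container
@EnableBatchProcessing  // enables Spring Batch support
public class MyBatchConfig {
    // JobRepository, JobLauncher, Job, Step, ItemReader, ItemProcessor, ItemWriter
    // beans are defined in the sections that follow
}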

Then we start writing the various components in the MyBatchConfig class.

JobRepository

Written in the MyBatchConfig class

/**
 * JobRepository definition: the registry for Jobs; it also talks to the database (transaction management, etc.)
 * @param dataSource
 * @param transactionManager
 * @return
 * @throws Exception
 */
@Bean
public JobRepository myJobRepository(DataSource dataSource, PlatformTransactionManager transactionManager) throws Exception{
    JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
    jobRepositoryFactoryBean.setDatabaseType("mysql");
    jobRepositoryFactoryBean.setTransactionManager(transactionManager);
    jobRepositoryFactoryBean.setDataSource(dataSource);
    return jobRepositoryFactoryBean.getObject();
}
JobLauncher

Written in the MyBatchConfig class

/**
 * jobLauncher definition: the launcher for Jobs, bound to the jobRepository
 * @param dataSource
 * @param transactionManager
 * @return
 * @throws Exception
 */
@Bean
public SimpleJobLauncher myJobLauncher(DataSource dataSource, PlatformTransactionManager transactionManager) throws Exception{
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    // set the jobRepository
    jobLauncher.setJobRepository(myJobRepository(dataSource, transactionManager));
    return jobLauncher;
}
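A side note: SimpleJobLauncher runs jobs synchronously by default, so the HTTP call that triggers a job later in this article blocks until the job finishes. If you ever want fire-and-forget behavior, the launcher also accepts a task executor. A sketch of that optional tweak (not used in this article), placed before the return statement above:

// optional: run jobs asynchronously instead of blocking the caller
// (requires: import org.springframework.core.task.SimpleAsyncTaskExecutor;)
jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());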
Job

Written in the MyBatchConfig class

/**
 * Define the job
 * @param jobs
 * @param myStep
 * @return
 */
@Bean
public Job myJob(JobBuilderFactory jobs, Step myStep){
    return jobs.get("myJob")
            .incrementer(new RunIdIncrementer())
            .flow(myStep)
            .end()
            .listener(myJobListener())
            .build();
}

For the Job's execution, we can configure a listener.

JobListener

Written in the MyBatchConfig class

/**
 * Register the job listener
 * @return
 */
@Bean
public MyJobListener myJobListener(){
    return new MyJobListener();
}

This is our own custom listener, created separately as MyJobListener.java:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

/**
 * @Author : JCccc
 * @Description : Listens to Job execution; implements JobExecutionListener, and is bound to the Job bean in the batch config class
 **/

public class MyJobListener implements JobExecutionListener {

    private Logger logger = LoggerFactory.getLogger(MyJobListener.class);

    @Override
    public void beforeJob(JobExecution jobExecution) {
        logger.info("job started, id={}", jobExecution.getJobId());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        logger.info("job finished, id={}", jobExecution.getJobId());
    }
}
Step (ItemReader, ItemProcessor, ItemWriter)

A Step contains the implementation of three small components: a data reader, a data processor, and a data writer.

We'll break them down and write them one by one.

As mentioned earlier, this article implements two scenarios: reading a large amount of data from a CSV file for processing, and reading a large amount of data from a database table for processing.

Read data from CSV file
ItemReader

Written in the MyBatchConfig class

/**
 * ItemReader definition: read the file data and map it onto the entity class
 * @return
 */
@Bean
public ItemReader<BlogInfo> reader(){
    // use FlatFileItemReader to read the CSV file, one line per record
    FlatFileItemReader<BlogInfo> reader = new FlatFileItemReader<>();
    // set the path where the file lives
    reader.setResource(new ClassPathResource("static/bloginfo.csv"));
    // map the CSV columns onto the entity
    reader.setLineMapper(new DefaultLineMapper<BlogInfo>() {
        {
            setLineTokenizer(new DelimitedLineTokenizer() {
                {
                    setNames(new String[]{"blogAuthor","blogUrl","blogTitle","blogItem"});
                }
            });
            setFieldSetMapper(new BeanWrapperFieldSetMapper<BlogInfo>() {
                {
                    setTargetType(BlogInfo.class);
                }
            });
        }
    });
    return reader;
}

Simple code analysis: FlatFileItemReader reads the CSV file one line per record; DelimitedLineTokenizer splits each line into the four named columns (blogAuthor, blogUrl, blogTitle, blogItem); BeanWrapperFieldSetMapper then maps those fields onto the BlogInfo entity.

For the data reader ItemReader, we also give it a read listener; create MyReadListener.java:

import com.example.batchdemo.pojo.BlogInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.ItemReadListener;

import static java.lang.String.format;

/**
 * @Author : JCccc
 * @Description :
 **/

public class MyReadListener implements ItemReadListener<BlogInfo> {

    private Logger logger = LoggerFactory.getLogger(MyReadListener.class);

    @Override
    public void beforeRead() {
    }

    @Override
    public void afterRead(BlogInfo item) {
    }

    @Override
    public void onReadError(Exception ex) {
        try {
            logger.info(format("%s%n", ex.getMessage()));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
ItemProcessor

Written in the MyBatchConfig class

/**
 * Register the ItemProcessor: process the data + validate the data
 * @return
 */
@Bean
public ItemProcessor<BlogInfo, BlogInfo> processor(){
    MyItemProcessor myItemProcessor = new MyItemProcessor();
    // set the validator
    myItemProcessor.setValidator(myBeanValidator());
    return myItemProcessor;
}

The data processor is our own custom class; it holds the business logic for processing the data, and we also set a data validator on it. Here we use the JSR-303 Validator as the validation mechanism.

Validator

Written in the MyBatchConfig class

/**
 * Register the validator
 * @return
 */
@Bean
public MyBeanValidator myBeanValidator(){
    return new MyBeanValidator<BlogInfo>();
}

Create MyItemProcessor.java:

ps: my processing logic is simply to take each record's blogItem field and, if it equals "springboot", replace the blogTitle value.

In other words, it simulates a basic data-processing scenario.

import com.example.batchdemo.pojo.BlogInfo;
import org.springframework.batch.item.validator.ValidatingItemProcessor;
import org.springframework.batch.item.validator.ValidationException;

/**
 * @Author : JCccc
 * @Description :
 **/
public class MyItemProcessor extends ValidatingItemProcessor<BlogInfo> {
    @Override
    public BlogInfo process(BlogInfo item) throws ValidationException {
        /**
         * super.process(item) must be called, otherwise the custom validator is never invoked
         */
        super.process(item);
        /**
         * simple processing of the data
         */
        if (item.getBlogItem().equals("springboot")) {
            item.setBlogTitle("springboot 系列还请看看我Jc");  // roughly: "for the springboot series, check out my posts"
        } else {
            item.setBlogTitle("未知系列");  // "unknown series"
        }
        return item;
    }
}

Create MyBeanValidator.java:

import org.springframework.batch.item.validator.ValidationException;
import org.springframework.batch.item.validator.Validator;
import org.springframework.beans.factory.InitializingBean;
import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.ValidatorFactory;
import java.util.Set;

/**
 * @Author : JCccc
 * @Description :
 **/
public class MyBeanValidator<T> implements Validator<T>, InitializingBean {

    private javax.validation.Validator validator;

    @Override
    public void validate(T value) throws ValidationException {
        /**
         * validate the data with the Validator's validate method
         */
        Set<ConstraintViolation<T>> constraintViolations =
                validator.validate(value);
        if (constraintViolations.size() > 0) {
            StringBuilder message = new StringBuilder();
            for (ConstraintViolation<T> constraintViolation : constraintViolations) {
                message.append(constraintViolation.getMessage() + "\n");
            }
            throw new ValidationException(message.toString());
        }
    }

    /**
     * We use the JSR-303 Validator to validate our data; it is initialized here
     * @throws Exception
     */
    @Override
    public void afterPropertiesSet() throws Exception {
        ValidatorFactory validatorFactory =
                Validation.buildDefaultValidatorFactory();
        validator = validatorFactory.usingContext().getValidator();
    }

}

ps: this article doesn't actually exercise the validator. If you want to use it, add constraint annotations such as @NotNull, @Max, or @Email to the entity class. Personally I prefer to handle everything directly in the processor and keep all the data-processing code in one place.
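For illustration only, if you did want the validator to kick in, the entity fields could carry JSR-303 constraint annotations; a hypothetical sketch (the article's actual BlogInfo has none of these):

import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

public class BlogInfo {

    @NotNull(message = "blogAuthor must not be null")
    private String blogAuthor;

    @Size(max = 255, message = "blogUrl is too long")
    private String blogUrl;

    // ...remaining fields, getters and setters as before
}

Any violation message is collected by MyBeanValidator and thrown as a ValidationException, which then counts toward the step's retry/skip limits.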

ItemWriter

Written in the MyBatchConfig class

/**
 * ItemWriter definition: point it at the datasource, set the batch-insert SQL, and write to the database
 * @param dataSource
 * @return
 */
@Bean
public ItemWriter<BlogInfo> writer(DataSource dataSource){
    // use JdbcBatchItemWriter to write the data to the database
    JdbcBatchItemWriter<BlogInfo> writer = new JdbcBatchItemWriter<>();
    // set the parameterized SQL statement
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<BlogInfo>());
    String sql = "insert into bloginfo "+" (blogAuthor,blogUrl,blogTitle,blogItem) "
            +" values(:blogAuthor,:blogUrl,:blogTitle,:blogItem)";
    writer.setSql(sql);
    writer.setDataSource(dataSource);
    return writer;
}

Simple code analysis: JdbcBatchItemWriter takes each chunk of processed items and executes the parameterized insert statement against the configured DataSource, with BeanPropertyItemSqlParameterSourceProvider binding the named parameters (:blogAuthor and so on) from the BlogInfo properties.

Likewise, for the data writer ItemWriter, we arrange a write listener for it; create MyWriteListener.java:

import com.example.batchdemo.pojo.BlogInfo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.ItemWriteListener;
import java.util.List;
import static java.lang.String.format;

/**
 * @Author : JCccc
 * @Description :
 **/
public class MyWriteListener implements ItemWriteListener<BlogInfo> {
    private Logger logger = LoggerFactory.getLogger(MyWriteListener.class);

    @Override
    public void beforeWrite(List<? extends BlogInfo> items) {
    }

    @Override
    public void afterWrite(List<? extends BlogInfo> items) {
    }

    @Override
    public void onWriteError(Exception exception, List<? extends BlogInfo> items) {
        try {
            logger.info(format("%s%n", exception.getMessage()));
            for (BlogInfo message : items) {
                logger.info(format("Failed writing BlogInfo : %s", message.toString()));
            }

        } catch (Exception e) {
            e.printStackTrace();
        }

    }
}

ItemReader, ItemProcessor, ItemWriter: with all three components implemented, the next step is to bind them to our Step.

Written in the MyBatchConfig class

/**
 * Step definition:
 * includes
 * ItemReader  reading
 * ItemProcessor  processing
 * ItemWriter  writing
 * @param stepBuilderFactory
 * @param reader
 * @param writer
 * @param processor
 * @return
 */

@Bean
public Step myStep(StepBuilderFactory stepBuilderFactory, ItemReader<BlogInfo> reader,
                 ItemWriter<BlogInfo> writer, ItemProcessor<BlogInfo, BlogInfo> processor){
    return stepBuilderFactory
            .get("myStep")
            .<BlogInfo, BlogInfo>chunk(65000) // chunk mechanism: read an item, process an item, and once the chunk size is reached hand the whole batch to the writer in one write
            .reader(reader).faultTolerant().retryLimit(3).retry(Exception.class).skip(Exception.class).skipLimit(2)
            .listener(new MyReadListener())
            .processor(processor)
            .writer(writer).faultTolerant().skip(Exception.class).skipLimit(2)
            .listener(new MyWriteListener())
            .build();
}

A little explanation of this Step.

As mentioned earlier, the Spring Batch framework provides mechanisms for transaction control, restart, retry, skipping, and so on.

These capabilities are all configured right here, in the Step definition.

First, look at the first setting in the code: chunk(65000). The chunk mechanism works like this: read one item at a time, process one item at a time, and once a set number of items has accumulated, hand them to the writer in a single write operation.

That is the whole Step pipeline: read the data, process it, and finally write it out.

The 65000 we pass in tells it to keep reading and processing until 65,000 items have accumulated, then perform one batched write. With roughly 80,000 records in our CSV, the writer ends up being invoked twice: once with 65,000 items and once with the remaining ~15,000.

The value to pass depends on the specific business: it could be 500 at a time, 1,000 at a time, 20 at a time, or 50 at a time.

A small diagram helps to understand:

[Diagram: the chunk mechanism. Read an item, process it, repeat until the chunk size is reached, then hand the whole chunk to the writer in one write.]

When processing large amounts of data, whether reading or writing, some known or unknown factor will sooner or later cause an individual record to fail.

If we configure nothing and one record fails, do we treat the whole run as failed? Obviously that's too harsh, so Spring Batch provides the retry and skip settings (and restart as well), which handle individual data failures in a humane way.

retryLimit(3).retry(Exception.class)

That's right, this configures retries: how many times to retry when an exception occurs. We set 3, meaning that when an operation on a record fails, it is retried up to 3 times; if it still fails after that, it counts as a failed record. If skip is also configured (recommended), that failed record is then left to the skip handling.

skip(Exception.class).skipLimit(2)

Skip means tolerating failed records: with skipLimit(2), up to 2 failed records are tolerated; once failures exceed that limit, the Step is interrupted.

For failed records, our listeners log the exception information so they can be remedied manually later.

Now let's actually launch this batch job. We trigger it through an HTTP interface, so create a new controller, TestController.java:

/**
 * @Author : JCccc
 * @Description :
 **/
@RestController
public class TestController {
    @Autowired
    SimpleJobLauncher jobLauncher;

    @Autowired
    Job myJob;

    @GetMapping("testJob")
    public  void testJob() throws JobParametersInvalidException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException {
     //    parameters are bound via JobParameters: addLong, addString, and similar methods
        JobParameters jobParameters = new JobParametersBuilder().toJobParameters();
        jobLauncher.run(myJob, jobParameters);

    }
}
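Once the application is up (port 8665 per the yml), the job can be kicked off with a plain GET request, for example:

GET http://localhost:8665/testJob

And since spring.batch.job.enabled is set to false in the yml, this endpoint is the only thing that launches the job; nothing runs automatically at startup.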

By the way, I prepared a CSV file, bloginfo.csv, containing about 80,000 records for batch testing.

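The file itself isn't reproduced here, but given the column order configured in the reader (blogAuthor, blogUrl, blogTitle, blogItem), each comma-delimited line would look something like this (values are hypothetical):

10001,https://blog.csdn.net/example/article/1,original title one,springboot
10002,https://blog.csdn.net/example/article/2,original title two,java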

The path of this file must be consistent with the path read in our data reader.


Our database currently looks like this:

[Screenshot: the bloginfo table before the job runs]

Next, we start the project and look at the database again: Spring Batch has generated its metadata tables for tracking and recording jobs (BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, BATCH_STEP_EXECUTION, and so on):

[Screenshot: Spring Batch metadata tables in the database]

Let's call the testJob interface, then look at the database: all the data has been processed and inserted:

[Screenshot: the bloginfo table populated with processed records]

At this point we have completed the integration of Spring Batch with Spring Boot and implemented the first business scenario: reading data from a CSV file, processing it, and storing it.

Read data from the database

ps: a heads-up again that there is a pitfall when using Druid; I'll get to it shortly.

The next scenario to implement: read data from a database table, process it, and output it to a new table.

Based on the integration above, we already have:

JobRepository: registers/stores Jobs
JobLauncher: executes Jobs
Job: a batch task, containing one or more Steps
Step: contains ItemReader, ItemProcessor and ItemWriter
ItemReader: data reader
ItemProcessor: data processor
ItemWriter: data writer
Job listener
Reader listener
Writer listener
Processor data validator

So, to write a new Job for the new scenario, do we need to rewrite all of it?

Obviously not, although writing a completely new set is of course also possible.

For the new scenario in this article (reading from a database table instead of a CSV file), we re-create the following:

  1. Data reader: previously FlatFileItemReader; now we use MyBatisCursorItemReader

  2. Data processor: for the new scenario, to keep the business extensible, it's best to create a new one

  3. Data writer: likewise, for the new scenario it's best to create a new one

  4. Step binding: new scenario, expanded business, so a new Step is best

  5. Job: naturally, we write a new one

The rest (JobRepository, JobLauncher, and the various listeners) we can simply keep using; we won't rebuild them for now.

Create a new MyItemProcessorNew.java:

import com.example.batchdemo.pojo.BlogInfo;
import org.springframework.batch.item.validator.ValidatingItemProcessor;
import org.springframework.batch.item.validator.ValidationException;

/**
 * @Author : JCccc
 * @Description :
 **/
public class MyItemProcessorNew extends ValidatingItemProcessor<BlogInfo> {
    @Override
    public BlogInfo process(BlogInfo item) throws ValidationException {
        /**
         * super.process(item) must be called, otherwise the custom validator is never invoked
         */
        super.process(item);
        /**
         * simple processing of the data
         */
        Integer authorId = Integer.valueOf(item.getBlogAuthor());
        if (authorId < 20000) {
            item.setBlogTitle("这是都是小于20000的数据");  // "all data below 20000"
        } else if (authorId > 20000 && authorId < 30000) {
            item.setBlogTitle("这是都是小于30000但是大于20000的数据");  // "data between 20000 and 30000"
        } else {
            item.setBlogTitle("旧书不厌百回读");  // a placeholder title (a Chinese idiom)
        }
        return item;
    }
}

The other redefined components are written in the MyBatchConfig class:

/**
 * Define the new job
 * @param jobs
 * @param stepNew
 * @return
 */
@Bean
public Job myJobNew(JobBuilderFactory jobs, Step stepNew){
    return jobs.get("myJobNew")
            .incrementer(new RunIdIncrementer())
            .flow(stepNew)
            .end()
            .listener(myJobListener())
            .build();

}

@Bean
public Step stepNew(StepBuilderFactory stepBuilderFactory, MyBatisCursorItemReader<BlogInfo> itemReaderNew,
                    ItemWriter<BlogInfo> writerNew, ItemProcessor<BlogInfo, BlogInfo> processorNew){
    return stepBuilderFactory
            .get("stepNew")
            .<BlogInfo, BlogInfo>chunk(65000) // chunk mechanism: read an item, process an item, and once the chunk size is reached hand the whole batch to the writer in one write
            .reader(itemReaderNew).faultTolerant().retryLimit(3).retry(Exception.class).skip(Exception.class).skipLimit(10)
            .listener(new MyReadListener())
            .processor(processorNew)
            .writer(writerNew).faultTolerant().skip(Exception.class).skipLimit(2)
            .listener(new MyWriteListener())
            .build();

}

@Bean
public ItemProcessor<BlogInfo, BlogInfo> processorNew(){
    MyItemProcessorNew csvItemProcessor = new MyItemProcessorNew();
    // set the validator
    csvItemProcessor.setValidator(myBeanValidator());
    return csvItemProcessor;
}

@Autowired
private SqlSessionFactory sqlSessionFactory;

@Bean
@StepScope
// Spring Batch provides a special bean scope (StepScope, a custom Spring bean scope) that ties a bean to a batch step: the bean is instantiated only when the step starts, which lets you bind step-specific configuration and parameters (such as jobParameters) at that point.
public MyBatisCursorItemReader<BlogInfo> itemReaderNew(@Value("#{jobParameters[authorId]}") String authorId) {

        System.out.println("start querying the database");

        MyBatisCursorItemReader<BlogInfo> reader = new MyBatisCursorItemReader<>();

        reader.setQueryId("com.example.batchdemo.mapper.BlogMapper.queryInfoById");

        reader.setSqlSessionFactory(sqlSessionFactory);
         Map<String , Object> map = new HashMap<>();

          map.put("authorId" , Integer.valueOf(authorId));
         reader.setParameterValues(map);
        return reader;
}

/**
 * ItemWriter definition: point it at the datasource, set the batch-insert SQL, and write to the database
 * @param dataSource
 * @return
 */
@Bean
public ItemWriter<BlogInfo> writerNew(DataSource dataSource){
    // use JdbcBatchItemWriter to write the data to the database
    JdbcBatchItemWriter<BlogInfo> writer = new JdbcBatchItemWriter<>();
    // set the parameterized SQL statement
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<BlogInfo>());
    String sql = "insert into bloginfonew "+" (blogAuthor,blogUrl,blogTitle,blogItem) "
            +" values(:blogAuthor,:blogUrl,:blogTitle,:blogItem)";
    writer.setSql(sql);
    writer.setDataSource(dataSource);
    return writer;
}

Points to note about the code:

Data reader: MyBatisCursorItemReader is step-scoped, takes the authorId job parameter, and streams the results of the named MyBatis query through a cursor.

The corresponding mapper method is the queryInfoById query defined earlier in BlogMapper.

Data processor: MyItemProcessorNew, with the new title-rewriting logic shown above.

Data writer: it inserts into a different table, bloginfonew, set up specially for this test.

Of course, to test this scenario, we also created that new bloginfonew table in the database.
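Assuming bloginfonew mirrors the structure of bloginfo (which is what the insert statement in writerNew implies), its creation SQL would be along these lines (a sketch):

CREATE TABLE `bloginfonew`  (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  `blogAuthor` varchar(255) NULL DEFAULT NULL COMMENT 'Blog author identifier',
  `blogUrl` varchar(255) NULL DEFAULT NULL COMMENT 'Blog URL',
  `blogTitle` varchar(255) NULL DEFAULT NULL COMMENT 'Blog title',
  `blogItem` varchar(255) NULL DEFAULT NULL COMMENT 'Blog category',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci;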

Next, we write a new interface to execute the new job:

@Autowired
SimpleJobLauncher jobLauncher;

@Autowired
Job myJobNew;

@GetMapping("testJobNew")
public  void testJobNew(@RequestParam("authorId") String authorId) throws JobParametersInvalidException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException {

    JobParameters jobParametersNew = new JobParametersBuilder().addLong("timeNew", System.currentTimeMillis())
            .addString("authorId",authorId)
            .toJobParameters();
    jobLauncher.run(myJobNew,jobParametersNew);

}
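For example (the authorId value here is just an illustration):

GET http://localhost:8665/testJobNew?authorId=30000

Note that addLong("timeNew", System.currentTimeMillis()) gives every run unique JobParameters, so each call creates a fresh JobInstance instead of being rejected as an already-complete job.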

OK, let's call this interface:


Take a look at the console:

[Screenshot: console output, the job fails with a Druid-related error while reading]

That's right, it fails, and the failure is related to Druid: a database feature is not supported. The error is reported while the data is being read.

My preliminary conclusion is that the Druid connection pool does not support the way MyBatisCursorItemReader reads data.

Then, we just need to:

Comment out the druid connection pool jar dependency


Replace the connection pool configuration in the yml (that is, drop the druid block so the datasource properties apply to the default pool)


In fact, we don't configure any other connection pool: Spring Boot 2.x already ships with HikariCP as the default.

In Spring Boot 2.x, HikariCP is the officially recommended database connection pool.

Unless you need Druid's monitoring dashboard, SQL analysis, and so on, HikariCP is the clear first choice.

In the official words:

We prefer HikariCP for its performance and concurrency. If HikariCP is available, we always choose it.

So we simply don't configure a connection pool at all and use the default HikariCP.

Of course, if you do want to configure it explicitly, that's also possible:

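A minimal sketch of what that might look like in application.yml with the default HikariCP pool (the property values here are illustrative):

spring:
  datasource:
    username: root
    password: root
    url: jdbc:mysql://localhost:3306/hellodemo?useSSL=false&useUnicode=true&characterEncoding=UTF-8&serverTimezone=GMT%2B8
    driver-class-name: com.mysql.cj.jdbc.Driver
    hikari:
      minimum-idle: 5
      maximum-pool-size: 20
      connection-timeout: 60000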

So after we get rid of the druid connection pool, let's call the new interface again:


As you can see, the job that reads data from the database and batch-processes it succeeds:

[Screenshot: console output of the successful job run]

The data inserted into the new table has been processed by the logic we wrote:

[Screenshot: rows in bloginfonew with the rewritten blogTitle values]

And that's it: Spring Boot integrated with the Spring Batch batch processing framework.

