Spring Batch Guide

Introduction to Spring Batch

Spring Batch is currently one of the few outstanding batch processing frameworks for the Java language.

Spring Batch is a lightweight and complete batch processing framework designed to help enterprises build robust and efficient batch processing applications.

Spring Batch is a Spring sub-project written in Java and built on the Spring Framework, which makes it easy for developers and enterprises already using Spring to adopt it and make use of enterprise services.

Spring Batch provides a large number of reusable components, covering logging and tracing, transaction management, job processing statistics, job restart, skip, repeat and resource management.

Spring Batch supports simple, complex and high-volume batch jobs, and it provides optimization and partitioning techniques for high-performance batch processing.

A typical batch application looks like this:

  1. Read large numbers of records from a database, file or queue.
  2. Process the data in some way.
  3. Write back the data in a modified form.

In Spring Batch, a Job is the framework within which Steps run; the actual business work is done by the Steps.

Step

[Figure: brief structure of a Step]

A Step usually consists of three parts: reading data (Reader), processing data (Processor) and writing data (Writer). Not every Step does its own data processing, however; for example, a Step that calls a stored procedure delegates the work to the database. Spring Batch therefore provides two kinds of Step: 1) the chunk-oriented step (ChunkStep) and 2) the task-oriented TaskletStep.

In most cases, the chunk-oriented step is used.

ChunkStep

A Step processes data record by record (row by row), but committing a transaction after every single record would put heavy pressure on I/O. Spring Batch therefore processes data in chunks. Once a chunk size is configured, the Step reads items with the Reader and hands them to the Processor; the processed items are accumulated, and when the configured number of items has been collected, the Writer is called once and the data is committed to the database in a single transaction.
If any error occurs while a chunk is being processed, the transaction is rolled back and none of that chunk's data is written.

[Figure: chunk-oriented processing]

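@Bean
public Step testStep(PlatformTransactionManager transactionManager) {
	// stepBuilderFactory is an injected StepBuilderFactory; testReader(), testProcessor()
	// and testWriter() return the step's reader, processor and writer beans
	return stepBuilderFactory.get("testStep")
				.transactionManager(transactionManager) // transactions are managed by this manager
				.<String, String>chunk(10)              // commit interval: 10 items per transaction
				.reader(testReader())
				.processor(testProcessor())
				.writer(testWriter())
				.build();
}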

transactionManager: the PlatformTransactionManager that manages the Step's transactions. Once it is configured, Spring Batch begins, commits and rolls back transactions automatically, so developers do not need to manage them explicitly.
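To make the read/process/write cycle concrete, here is a plain-Java model of one chunk-oriented step. It is not Spring Batch code: ChunkLoopModel and run are hypothetical names, and each writer call stands in for one transaction commit. Within a chunk it follows the order Spring Batch uses: read items up to the chunk size, process each one (a null result filters the item out, as with an ItemProcessor), then write the processed items in a single call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of a chunk-oriented step; NOT the Spring Batch API.
// Each call to the writer stands for one transaction commit.
final class ChunkLoopModel {

    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int commits = 0;
        while (reader.hasNext()) {
            // 1. Read: collect up to chunkSize items (the commit interval)
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize && reader.hasNext()) {
                inputs.add(reader.next());
            }
            // 2. Process: transform each item; null means "filter this item out"
            List<O> outputs = new ArrayList<>();
            for (I item : inputs) {
                O out = processor.apply(item);
                if (out != null) {
                    outputs.add(out);
                }
            }
            // 3. Write: one call per chunk, then "commit"
            writer.accept(outputs);
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        int commits = run(List.of("a", "b", "c", "d", "e").iterator(),
                String::toUpperCase, written::add, 2);
        System.out.println(commits + " commits: " + written); // 3 commits: [[A, B], [C, D], [E]]
    }
}
```

With five items and a chunk size of 2, the writer is called three times, and the last chunk is simply shorter.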

commit interval

A Step uses the PlatformTransactionManager to manage transactions, and the number of items per transaction (the commit interval) is the value passed to the chunk method. If the interval is too small, the per-commit overhead wastes resources. If it is too large, each transaction stays open longer and holds more resources, and a single failure rolls back a large amount of data. The author suggests a value of 10 to 20 in general.

stepBuilderFactory.get("testStep")
	.<String, String>chunk(10) // commit interval: one transaction per 10 items
	.reader(testReader())
	.processor(testProcessor())
	.writer(testWriter())
	.build();
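The trade-off can be put into numbers. The plain-Java sketch below uses a hypothetical helper, CommitIntervalMath.transactions, to count how many chunk transactions a run needs for a given commit interval. Since a failure rolls back at most one chunk, the interval is also the upper bound on the items rolled back per failure. The 1,000,000-item input in main is an assumed example size.

```java
// Back-of-the-envelope numbers for a commit interval; plain Java, illustrative only.
final class CommitIntervalMath {

    // Number of chunk transactions needed to process totalItems (ceiling division).
    static long transactions(long totalItems, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        return (totalItems + commitInterval - 1) / commitInterval;
    }

    public static void main(String[] args) {
        long totalItems = 1_000_000; // assumed input size
        for (int interval : new int[] {1, 10, 20, 1000}) {
            // a single failure rolls back at most one chunk, i.e. up to `interval` items
            System.out.printf("interval=%d -> %d transactions, up to %d items rolled back per failure%n",
                    interval, transactions(totalItems, interval), interval);
        }
    }
}
```

An interval of 1 means a million commits, while an interval of 20 means 50,000 commits and at most 20 items redone after a failure.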

step restart times

Some Steps perform one-time preparatory work, so they should not run again when the Job is restarted. You can limit how many times a Step may be started with startLimit. With startLimit(1), the Step can be started only once; if a restart would start it again, Spring Batch does not execute it and throws a StartLimitExceededException instead.

stepBuilderFactory.get("testStep")
	.<String, String>chunk(10)
	.reader(testReader())
	.processor(testProcessor())
	.writer(testWriter())
	.startLimit(1) // this Step may be started at most once
	.build();
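The start-limit rule can be modeled in a few lines of plain Java. This is a sketch, not Spring Batch code: StartLimitModel is a hypothetical class, and the IllegalStateException it throws stands in for Spring Batch's StartLimitExceededException.

```java
// Plain-Java model of a step's start limit; NOT Spring Batch code.
final class StartLimitModel {
    private final int startLimit;
    private int startCount;

    StartLimitModel(int startLimit) {
        this.startLimit = startLimit;
    }

    // Called each time the job (re)starts this step.
    void start() {
        if (startCount >= startLimit) {
            // stand-in for Spring Batch's StartLimitExceededException
            throw new IllegalStateException("start limit " + startLimit + " exceeded");
        }
        startCount++;
        // ... the step's real work would run here ...
    }

    int startCount() {
        return startCount;
    }

    public static void main(String[] args) {
        StartLimitModel step = new StartLimitModel(1);
        step.start();                               // first run: allowed
        try {
            step.start();                           // restart: refused
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());     // start limit 1 exceeded
        }
    }
}
```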

step runs on every restart

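By default, a Step that has already completed is skipped when the Job is restarted. Setting allowStartIfComplete(true) (allow-start-if-complete in XML configuration) tells the framework to execute the Step on every restart, even if it completed before.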

stepBuilderFactory.get("testStep")
	.<String, String>chunk(10)
	.reader(testReader())
	.processor(testProcessor())
	.writer(testWriter())
	.allowStartIfComplete(true) // run again on restart even if already COMPLETED
	.build();
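The restart decision can be sketched as a filter over the Job's steps. This plain-Java model is not Spring Batch code: RestartModel, StepState and the step names in main are hypothetical. It only captures the rule described above: a completed step is skipped on restart unless it opted in with allowStartIfComplete.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of which steps execute when a job is restarted; NOT Spring Batch code.
final class RestartModel {

    static final class StepState {
        final boolean completedLastRun;
        final boolean allowStartIfComplete;

        StepState(boolean completedLastRun, boolean allowStartIfComplete) {
            this.completedLastRun = completedLastRun;
            this.allowStartIfComplete = allowStartIfComplete;
        }
    }

    // Returns, in order, the steps that run on restart: completed steps are skipped
    // unless they set allowStartIfComplete.
    static List<String> stepsToRun(Map<String, StepState> steps) {
        List<String> toRun = new ArrayList<>();
        for (Map.Entry<String, StepState> e : steps.entrySet()) {
            StepState s = e.getValue();
            if (!s.completedLastRun || s.allowStartIfComplete) {
                toRun.add(e.getKey());
            }
        }
        return toRun;
    }

    public static void main(String[] args) {
        Map<String, StepState> steps = new LinkedHashMap<>();      // keeps step order
        steps.put("prepareTempDir", new StepState(true, true));    // completed, allowStartIfComplete(true)
        steps.put("loadData", new StepState(true, false));         // completed, default settings
        steps.put("writeReport", new StepState(false, false));     // failed last time
        System.out.println(stepsToRun(steps)); // [prepareTempDir, writeReport]
    }
}
```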

step skip

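stepBuilderFactory.get("testStep")
	.<String, String>chunk(10)
	.reader(testReader())
	.processor(testProcessor())
	.writer(testWriter())
	.faultTolerant()                      // skip settings require a fault-tolerant step
	.skipLimit(10)                        // more than 10 skips fails the Step
	.skip(Exception.class)                // skip items that throw an Exception...
	.noSkip(FileNotFoundException.class)  // ...except FileNotFoundException
	.build();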

skipLimit (skip-limit in XML) sets the maximum number of items that may be skipped; once the number of skips exceeds this value, the whole Step fails and processing stops.
skip(Exception.class) means an item is skipped when an Exception is thrown for it. Because Exception has many subclasses, noSkip can exclude specific exceptions from skipping; here, FileNotFoundException is never skipped. These skip settings are available only after calling faultTolerant().
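The interplay of skip, noSkip and skipLimit can be modeled in plain Java. This is a simplified sketch, not the Spring Batch API: SkipPolicyModel is hypothetical, the rule lookup only walks superclasses (the closest configured class wins, which is why noSkip(FileNotFoundException.class) beats skip(Exception.class)), and an IllegalStateException stands in for Spring Batch's SkipLimitExceededException or a non-skippable failure.

```java
import java.io.FileNotFoundException;
import java.util.Map;

// Plain-Java model of skip classification plus a skip limit; NOT the Spring Batch API.
final class SkipPolicyModel {
    private final int skipLimit;
    // true = skippable, false = never skip (like noSkip); the closest matching superclass wins
    private final Map<Class<? extends Throwable>, Boolean> rules;
    private int skipCount;

    SkipPolicyModel(int skipLimit, Map<Class<? extends Throwable>, Boolean> rules) {
        this.skipLimit = skipLimit;
        this.rules = rules;
    }

    // Walks up the exception's class hierarchy and applies the first configured rule.
    private boolean isSkippable(Throwable t) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            Boolean rule = rules.get(c);
            if (rule != null) {
                return rule;
            }
        }
        return false; // exceptions with no rule are not skipped
    }

    // Records a skip for a failed item, or throws to signal that the step must fail.
    void recordSkip(Throwable t) {
        if (!isSkippable(t)) {
            throw new IllegalStateException("not skippable: " + t, t);
        }
        if (skipCount >= skipLimit) {
            // stand-in for Spring Batch's SkipLimitExceededException
            throw new IllegalStateException("skip limit " + skipLimit + " exceeded", t);
        }
        skipCount++;
    }

    int skipCount() {
        return skipCount;
    }

    public static void main(String[] args) {
        SkipPolicyModel policy = new SkipPolicyModel(10,
                Map.of(Exception.class, true, FileNotFoundException.class, false));
        policy.recordSkip(new IllegalArgumentException("bad row"));     // skipped: a subclass of Exception
        try {
            policy.recordSkip(new FileNotFoundException("input.csv")); // the noSkip rule wins
        } catch (IllegalStateException e) {
            System.out.println("step fails: " + e.getMessage());
        }
        System.out.println("skips: " + policy.skipCount());
    }
}
```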

step retry

stepBuilderFactory.get("testStep")
	.<String, String>chunk(10)
	.reader(testReader())
	.processor(testProcessor())
	.writer(testWriter())
	.faultTolerant()                                // enable fault tolerance
	.retryLimit(3)                                  // at most 3 attempts per item
	.retry(DeadlockLoserDataAccessException.class)  // retry only for this exception
	.build();

faultTolerant() enables the fault-tolerance features. retry(DeadlockLoserDataAccessException.class) makes the Step retry only when that exception is thrown, and retryLimit(3) limits how many times a failed item is tried. The limit includes the first attempt, so retryLimit(3) means at most 3 attempts in total.
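The retry rule can be sketched in plain Java. This is a model, not Spring Retry or Spring Batch code: callWithRetry is a hypothetical helper, and TransientDbException is a hypothetical stand-in for DeadlockLoserDataAccessException. As with retryLimit, maxAttempts counts the first attempt.

```java
import java.util.function.Supplier;

// Plain-Java model of per-item retry; NOT Spring Retry or Spring Batch code.
final class RetryModel {

    // hypothetical stand-in for DeadlockLoserDataAccessException
    static final class TransientDbException extends RuntimeException {
        TransientDbException(String msg) { super(msg); }
    }

    // Calls work at most maxAttempts times in total (like retryLimit), retrying only
    // when the failure is an instance of retryable; other exceptions propagate at once.
    static <T> T callWithRetry(Supplier<T> work, int maxAttempts,
                               Class<? extends RuntimeException> retryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get();
            } catch (RuntimeException e) {
                if (!retryable.isInstance(e)) {
                    throw e;       // not retryable: fail immediately
                }
                last = e;          // retryable: loop again if attempts remain
            }
        }
        throw last;                // all attempts failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TransientDbException("deadlock");
            }
            return "written";
        }, 3, TransientDbException.class);
        System.out.println(result + " after " + calls[0] + " attempts"); // written after 3 attempts
    }
}
```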

step exceptions that do not roll back

stepBuilderFactory.get("testStep")
	.<String, String>chunk(10)
	.reader(testReader())
	.processor(testProcessor())
	.writer(testWriter())
	.faultTolerant()
	.noRollback(ValidationException.class) // exception that does not require a rollback
	.build();

noRollback lists the exceptions for which the Step does not roll back the transaction.
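In simplified form, noRollback decides whether a chunk's work is kept or discarded when an exception occurs. The plain-Java model below is not Spring Batch code: NoRollbackModel and finishChunk are hypothetical, ValidationFailure is a hypothetical stand-in for ValidationException, and the "database" is just a list of committed rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java model of noRollback exception classes; NOT Spring Batch code.
final class NoRollbackModel {

    // hypothetical stand-in for the ValidationException used above
    static final class ValidationFailure extends Exception {
        ValidationFailure(String msg) { super(msg); }
    }

    final List<String> database = new ArrayList<>();   // rows that were committed
    private final Set<Class<? extends Exception>> noRollback;

    NoRollbackModel(Set<Class<? extends Exception>> noRollback) {
        this.noRollback = noRollback;
    }

    // Simulates one chunk transaction that ended with the given failure (null for none).
    // Rows are committed unless the failure is of a type that requires a rollback.
    void finishChunk(List<String> rows, Exception failure) {
        boolean rollback = failure != null
                && noRollback.stream().noneMatch(type -> type.isInstance(failure));
        if (!rollback) {
            database.addAll(rows);   // commit
        }                            // otherwise: rollback, rows are discarded
    }

    public static void main(String[] args) {
        NoRollbackModel step = new NoRollbackModel(Set.of(ValidationFailure.class));
        step.finishChunk(List.of("a", "b"), new ValidationFailure("invalid field")); // no rollback
        step.finishChunk(List.of("c"), new IllegalStateException("db down"));        // rolled back
        step.finishChunk(List.of("d"), null);                                        // normal commit
        System.out.println(step.database); // [a, b, d]
    }
}
```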

step data reread

stepBuilderFactory.get("testStep")
	.<String, String>chunk(10)
	.reader(testReader())
	.processor(testProcessor())
	.writer(testWriter())
	.readerIsTransactionalQueue() // re-read data after a rollback
	.build();

By default, Spring Batch keeps the items it has read in a buffer, so when a chunk is rolled back because of an error in processing or writing, the items are not read again. In some scenarios, however, the Reader must take part in the rollback. For example, when the Reader consumes messages from a JMS queue, a rollback returns the messages to the queue, so they have to be read again. For this case, use readerIsTransactionalQueue() so that the Step re-reads the data instead of relying on the buffer.
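A plain-Java model shows why a transactional queue forces a re-read. It is not JMS or Spring Batch code: TransactionalQueueModel and readChunk are hypothetical, and the chunkFails flag simulates an error later in the chunk. On failure the "rollback" puts the messages back at the head of the queue, so the next attempt has to take them from the queue again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java model of a transactional queue reader; NOT JMS or Spring Batch code.
final class TransactionalQueueModel {
    final Deque<String> queue = new ArrayDeque<>();

    // Takes up to n messages inside a "transaction". If the chunk fails, the rollback
    // returns them to the head of the queue in their original order.
    List<String> readChunk(int n, boolean chunkFails) {
        List<String> taken = new ArrayList<>();
        while (taken.size() < n && !queue.isEmpty()) {
            taken.add(queue.pollFirst());
        }
        if (chunkFails) {
            for (int i = taken.size() - 1; i >= 0; i--) {
                queue.addFirst(taken.get(i));   // rollback: messages go back on the queue
            }
            return List.of();
        }
        return taken;                           // commit: messages are consumed
    }

    public static void main(String[] args) {
        TransactionalQueueModel reader = new TransactionalQueueModel();
        reader.queue.addAll(List.of("m1", "m2", "m3"));
        System.out.println(reader.readChunk(2, true));   // []
        System.out.println(reader.queue);                // [m1, m2, m3]
        System.out.println(reader.readChunk(2, false));  // [m1, m2]
    }
}
```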

step transaction attribute

@Bean
public Step testStep() {
	// Configure the transaction attributes
	DefaultTransactionAttribute attribute = new DefaultTransactionAttribute();
	attribute.setPropagationBehavior(Propagation.REQUIRED.value());
	attribute.setIsolationLevel(Isolation.DEFAULT.value());
	attribute.setTimeout(30); // transaction timeout in seconds

	return this.stepBuilderFactory.get("testStep")
			.<String, String>chunk(10)
			.reader(testReader())
			.processor(testProcessor())
			.writer(testWriter())
			.transactionAttribute(attribute) // apply the transaction attributes
			.build();
}

step sequence execution


jobBuilderFactory.get("job")
		.start(stepA())
		.next(stepB()) // run in order
		.next(stepC())
		.build();
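With a purely sequential flow, a failed step fails the Job and the remaining steps do not run. The plain-Java model below sketches that rule; it is not Spring Batch code, and SequentialFlowModel plus the BooleanSupplier "steps" (true = success) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Plain-Java model of start(...).next(...).next(...); NOT Spring Batch code.
final class SequentialFlowModel {

    // Runs steps in insertion order and stops at the first step that fails.
    static List<String> run(Map<String, BooleanSupplier> steps) {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            executed.add(step.getKey());
            if (!step.getValue().getAsBoolean()) {
                break; // a failed step fails the job; later steps do not run
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
        steps.put("stepA", () -> true);
        steps.put("stepB", () -> false); // simulated failure
        steps.put("stepC", () -> true);
        System.out.println(run(steps)); // [stepA, stepB]
    }
}
```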

step conditional execution


jobBuilderFactory.get("job")
				.start(stepA()) // the step executed first
				.on("*").to(stepB()) // by default, go to stepB
				.from(stepA()).on("FAILED").to(stepC()) // if stepA's ExitStatus is "FAILED", go to stepC
				.end()
				.build();
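The patterns passed to on(...) are matched against the step's ExitStatus: "*" matches zero or more characters and "?" matches exactly one, and Spring Batch prefers the more specific transition, which is why "FAILED" wins over "*" above. The plain-Java model below is a simplified sketch, not Spring Batch's implementation: TransitionModel is hypothetical, its ranking (fewest wildcards, then longest pattern) only approximates the framework's ordering, and it returns null when nothing matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified plain-Java model of matching an ExitStatus against on(...) patterns;
// NOT Spring Batch's implementation.
final class TransitionModel {

    // '*' matches zero or more characters, '?' matches exactly one.
    static boolean matches(String pattern, String exitCode) {
        StringBuilder regex = new StringBuilder();
        for (char ch : pattern.toCharArray()) {
            if (ch == '*') {
                regex.append(".*");
            } else if (ch == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return exitCode.matches(regex.toString());
    }

    private static int wildcards(String pattern) {
        int n = 0;
        for (char ch : pattern.toCharArray()) {
            if (ch == '*' || ch == '?') {
                n++;
            }
        }
        return n;
    }

    // Picks the most specific matching pattern (fewest wildcards, then longest)
    // and returns its target step, or null if nothing matches.
    static String nextStep(Map<String, String> transitions, String exitCode) {
        String best = null;
        for (String pattern : transitions.keySet()) {
            if (!matches(pattern, exitCode)) {
                continue;
            }
            if (best == null
                    || wildcards(pattern) < wildcards(best)
                    || (wildcards(pattern) == wildcards(best) && pattern.length() > best.length())) {
                best = pattern;
            }
        }
        return best == null ? null : transitions.get(best);
    }

    public static void main(String[] args) {
        Map<String, String> fromStepA = new LinkedHashMap<>();
        fromStepA.put("*", "stepB");
        fromStepA.put("FAILED", "stepC");
        System.out.println(nextStep(fromStepA, "FAILED"));    // stepC
        System.out.println(nextStep(fromStepA, "COMPLETED")); // stepB
    }
}
```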

End exit

By default (when neither end nor fail is used), the Job runs its steps in order until the flow finishes; this is called ending. The Job then has BatchStatus=COMPLETED and ExitStatus=COMPLETED, meaning it ran successfully. Besides letting the step chain finish naturally, you can also call end() explicitly to stop the Job at a given point.

jobBuilderFactory.get("job")
		.start(step1()) // start
		.next(step2()) // run in order
		.on("FAILED").end()
		.from(step2()).on("*").to(step3()) // conditional execution
		.end()
		.build();

step1 and step2 run in order. If step2's ExitStatus is "FAILED", the Job ends immediately through end(); otherwise, step3 runs.

Fail exit

Besides end, a Job can also stop with fail. In that case BatchStatus=FAILED and ExitStatus=EARLY TERMINATION, indicating that the execution failed. The key difference from end is that a failed Job can be restarted: launching it again creates a new JobExecution for the same JobInstance, whereas a COMPLETED Job cannot be restarted. See the following example:

@Bean
public Job job() {
	return this.jobBuilderFactory.get("job")
			.start(step1()) // run step1
			.next(step2()).on("FAILED").fail() // if step2's ExitStatus is FAILED, fail the Job
			.from(step2()).on("*").to(step3()) // otherwise run step3
			.end()
			.build();
}
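The two example jobs differ only in what happens when step2 reports FAILED. The plain-Java model below summarizes the outcomes described above; it is not Spring Batch code, EndOrFailModel is hypothetical, statuses are plain strings, and step1 and step3 are assumed always to succeed.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the end() and fail() example jobs above; NOT Spring Batch code.
final class EndOrFailModel {

    enum OnStep2Failed { END, FAIL }

    static final class JobResult {
        final List<String> executedSteps = new ArrayList<>();
        String batchStatus;
        String exitStatus;
        boolean restartable;

        @Override
        public String toString() {
            return executedSteps + " " + batchStatus + "/" + exitStatus
                    + (restartable ? " (restartable)" : " (not restartable)");
        }
    }

    // Assumes step1 and step3 always succeed; only step2's exit code varies.
    static JobResult run(String step2ExitCode, OnStep2Failed policy) {
        JobResult r = new JobResult();
        r.executedSteps.add("step1");
        r.executedSteps.add("step2");
        if ("FAILED".equals(step2ExitCode)) {
            if (policy == OnStep2Failed.END) {
                r.batchStatus = "COMPLETED";        // end(): the job counts as successful
                r.exitStatus = "COMPLETED";
                r.restartable = false;              // a COMPLETED job cannot be restarted
            } else {
                r.batchStatus = "FAILED";           // fail(): the job counts as failed
                r.exitStatus = "EARLY TERMINATION";
                r.restartable = true;               // a restart creates a new JobExecution
            }
            return r;
        }
        r.executedSteps.add("step3");               // the "*" transition
        r.batchStatus = "COMPLETED";
        r.exitStatus = "COMPLETED";
        r.restartable = false;
        return r;
    }

    public static void main(String[] args) {
        System.out.println(run("FAILED", OnStep2Failed.END));
        System.out.println(run("FAILED", OnStep2Failed.FAIL));
        System.out.println(run("COMPLETED", OnStep2Failed.FAIL));
    }
}
```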

@StepScope and @JobScope

@StepScope

In Spring Batch, the low-level processing units of a job (ItemReader, ItemProcessor, ItemWriter, Tasklet) only need to exist while the batch is actually running, and once a job has started, its run parameters cannot be changed.

To let these processing units receive different parameters each time the job is launched (for example, birthDate="20210101" on the first run and birthDate="20210102" on the second), @StepScope is used together with @Value("#{jobParameters['birthDate']}") to read the required value from the job's launch parameters.

Notes on using the @StepScope annotation:

It is intended for the beans of the lowest-level processing units (ItemReader, ItemProcessor, ItemWriter, Tasklet) and is placed on the bean factory method together with @Bean.

A bean annotated with @StepScope is created only when the Step runs and is destroyed when the Step finishes; in other words, its lifecycle matches that of the Step.
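The effect of step scope can be illustrated with a plain-Java model. This is not Spring's StepScope implementation: Spring uses proxies and creates the real instance lazily on first use inside the step, while this sketch collapses that into explicit beginStep/endStep calls. StepScopeModel is hypothetical, and its factory plays the role of a @Bean @StepScope method with an @Value("#{jobParameters[...]}") argument.

```java
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of step-scoped late binding; NOT Spring's StepScope implementation.
final class StepScopeModel<T> {
    private final Function<Map<String, String>, T> factory;
    private T current; // the instance for the currently running step, if any

    StepScopeModel(Function<Map<String, String>, T> factory) {
        this.factory = factory;
    }

    // Step starts: build a fresh instance from this run's job parameters.
    T beginStep(Map<String, String> jobParameters) {
        current = factory.apply(jobParameters);
        return current;
    }

    // Step ends: discard the instance so the next run gets a new one.
    void endStep() {
        current = null;
    }

    T current() {
        return current;
    }

    public static void main(String[] args) {
        StepScopeModel<String> reader =
                new StepScopeModel<>(params -> "reader for birthDate=" + params.get("birthDate"));
        System.out.println(reader.beginStep(Map.of("birthDate", "20210101")));
        reader.endStep();
        System.out.println(reader.beginStep(Map.of("birthDate", "20210102")));
        reader.endStep();
    }
}
```

Each run builds its own instance from that run's parameters, which is what lets the second launch see birthDate=20210102.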

@JobScope

Job scope is similar to step scope: it marks a bean that is created and injected only during a specific execution period, in this case the execution of a Job.

@JobScope tells the framework to create the corresponding @Bean only while the Job is running, with one instance per JobExecution.

@JobScope
@Bean
// Read the value from jobParameters when the bean is initialized
public FlatFileItemReader<Foo> flatFileItemReader(@Value("#{jobParameters['input']}") String name) {
	return new FlatFileItemReaderBuilder<Foo>()
			.name("flatFileItemReader")
			.resource(new FileSystemResource(name))
			...
}

@JobScope
@Bean
// Alternative: read the value from the jobExecutionContext when the bean is initialized
public FlatFileItemReader<Foo> flatFileItemReader(@Value("#{jobExecutionContext['input.name']}") String name) {
	return new FlatFileItemReaderBuilder<Foo>()
			.name("flatFileItemReader")
			.resource(new FileSystemResource(name))
			...
}

Listeners

Spring Batch provides a variety of listeners that trigger your own code at specific points during processing, as summarized below:

  • JobExecutionListener: before the Job starts (beforeJob) and after it ends (afterJob)
  • StepExecutionListener: before the Step starts (beforeStep) and after it ends (afterStep)
  • ChunkListener: before a chunk starts (beforeChunk), after it completes (afterChunk) and after an error in the chunk (afterChunkError)
  • ItemReadListener: before each read (beforeRead), after it (afterRead) and after a read error (onReadError)
  • ItemProcessListener: before processing (beforeProcess), after it (afterProcess) and after a processing error (onProcessError)
  • ItemWriteListener: before writing (beforeWrite), after it (afterWrite) and after a write error (onWriteError)
  • SkipListener: when an item is skipped during reading (onSkipInRead), processing (onSkipInProcess) or writing (onSkipInWrite)

Execution order of listeners

  • JobExecutionListener.beforeJob()
  • StepExecutionListener.beforeStep()
  • ChunkListener.beforeChunk()
  • ItemReadListener.beforeRead()
  • ItemReadListener.afterRead()
  • ItemProcessListener.beforeProcess()
  • ItemProcessListener.afterProcess()
  • ItemWriteListener.beforeWrite()
  • ItemWriteListener.afterWrite()
  • ChunkListener.afterChunk()
  • StepExecutionListener.afterStep()
  • JobExecutionListener.afterJob()
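In the list above, the read and process callbacks actually fire once per item, and everything from beforeChunk to afterChunk repeats for each chunk. The plain-Java trace below makes that visible; it is not Spring Batch code, ListenerOrderModel is hypothetical, and it assumes one step, full chunks, no errors, and ignores the extra beforeRead that precedes the final end-of-input read.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java trace of listener callback order; NOT Spring Batch code.
// Assumptions: one step, every chunk is full, no read/process/write errors.
final class ListenerOrderModel {

    static List<String> trace(int chunks, int itemsPerChunk) {
        List<String> calls = new ArrayList<>();
        calls.add("beforeJob");
        calls.add("beforeStep");
        for (int c = 0; c < chunks; c++) {
            calls.add("beforeChunk");
            for (int i = 0; i < itemsPerChunk; i++) {   // the whole chunk is read first
                calls.add("beforeRead");
                calls.add("afterRead");
            }
            for (int i = 0; i < itemsPerChunk; i++) {   // then each item is processed
                calls.add("beforeProcess");
                calls.add("afterProcess");
            }
            calls.add("beforeWrite");                   // then the chunk is written once
            calls.add("afterWrite");
            calls.add("afterChunk");
        }
        calls.add("afterStep");
        calls.add("afterJob");
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace(1, 2));
    }
}
```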

If listeners are defined at more than one level, for example a parent Step defines a StepExecutionListener and the child Step defines its own, and both should run, the child must declare its listeners with the merge attribute (merge="true" on the <listeners> element in XML configuration).

Getting JobParameters

There are three ways to read the parameters in JobParameters from an ItemReader, ItemWriter or ItemProcessor:

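  1. Use the @BeforeStep annotation
@Component
@StepScope
public class PersonItemProcessor implements ItemProcessor<Person, Person> {

	private static final Logger log = LoggerFactory.getLogger(PersonItemProcessor.class);

	private JobParameters jobParameters;

	// Called before the step starts; the StepExecution carries the JobParameters
	@BeforeStep
	public void beforeStep(final StepExecution stepExecution) {
	    jobParameters = stepExecution.getJobParameters();
	    log.info("jobParameters: {}", jobParameters);
	}

	@Override
	public Person process(Person person) {
	    // jobParameters is available here; this sketch returns the item unchanged
	    return person;
	}
}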
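  2. Implement the StepExecutionListener interface (and register the listener on the Step)
@Component
@StepScope
public class MyStepExecutionListener implements StepExecutionListener {

	private StepExecution stepExecution;
	private JobParameters jobParameters;

	@Override
	public void beforeStep(StepExecution stepExecution) {
		this.stepExecution = stepExecution;
		jobParameters = stepExecution.getJobParameters();
	}

	@Override
	public ExitStatus afterStep(StepExecution stepExecution) {
		return stepExecution.getExitStatus(); // keep the step's exit status unchanged
	}
}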
  3. Declare the corresponding beans as @StepScope and inject the parameters directly with @Value
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

	@Autowired
	public JobBuilderFactory jobBuilderFactory;

	@Autowired
	public StepBuilderFactory stepBuilderFactory;

	@Bean
	@StepScope
	public FlatFileItemReader<Person> reader() {
		return new FlatFileItemReaderBuilder<Person>()
			.name("personItemReader")
			.resource(new ClassPathResource("sample-data.csv"))
			.delimited()
			.names(new String[]{"firstName", "lastName"})
			.fieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
				setTargetType(Person.class);
			}})
			.build();
	}

	@Bean
	@StepScope
	public PersonItemProcessor processor() {
		return new PersonItemProcessor();
	}

	@Bean
	@StepScope
	public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
		return new JdbcBatchItemWriterBuilder<Person>()
			.itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
			.sql("...")
			.dataSource(dataSource)
			.build();
	}

	@Bean("importUserJob") // do not make the Job @StepScope
	public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
		return jobBuilderFactory.get("importUserJob")
			.incrementer(new RunIdIncrementer())
			.listener(listener)
			.flow(step1)
			.end()
			.build();
	}

	@Bean // do not make the Step @StepScope
	public Step step1(JdbcBatchItemWriter<Person> writer) {
		return stepBuilderFactory.get("step1")
			.<Person, Person>chunk(10) // illustrative commit interval
			.reader(reader())
			.processor(processor())
			.writer(writer)
			.build();
	}
}

Note that the ItemReader, ItemWriter and ItemProcessor beans must be declared @StepScope for this late binding to work, while the Job and Step beans themselves must not be @StepScope.
Parameters can then be injected directly into the ItemReader, ItemWriter and ItemProcessor:

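public class PersonItemProcessor implements ItemProcessor<Person, Person> {

	// Resolved from the launch parameters because the bean is step-scoped
	@Value("#{jobParameters['dataUnitId']}")
	private Long dataUnitId;

	@Override
	public Person process(Person person) {
		return person; // dataUnitId is available here
	}
}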


Origin blog.csdn.net/ToBeMaybe_/article/details/130200264