What’s new in Spring Batch 5.2

Dependencies upgrade

In this release, the Spring dependencies are upgraded to the following versions:

MongoDB job repository support

This release introduces the first NoSQL job repository implementation which is backed by MongoDB. Similar to relational job repository implementations, Spring Batch comes with a script to create the necessary collections in MongoDB in order to save and retrieve batch meta-data.

This implementation requires MongoDB version 4 or later and is based on Spring Data MongoDB. In order to use this job repository, all you need to do is define a MongoTemplate and a MongoTransactionManager, which are required by the newly added MongoJobRepositoryFactoryBean:

@Bean
public JobRepository jobRepository(MongoTemplate mongoTemplate, MongoTransactionManager transactionManager) throws Exception {
	MongoJobRepositoryFactoryBean jobRepositoryFactoryBean = new MongoJobRepositoryFactoryBean();
	jobRepositoryFactoryBean.setMongoOperations(mongoTemplate);
	jobRepositoryFactoryBean.setTransactionManager(transactionManager);
	jobRepositoryFactoryBean.afterPropertiesSet();
	return jobRepositoryFactoryBean.getObject();
}

Once the MongoDB job repository is defined, you can inject it into any job or step as a regular job repository. You can find a complete example in the MongoDBJobRepositoryIntegrationTests.

New resourceless job repository

In v5, the in-memory Map-based job repository implementation was removed for several reasons. The only job repository implementation that was left in Spring Batch was the JDBC implementation, which requires a data source. While this works well with in-memory databases like H2 or HSQLDB, requiring a data source was a strong constraint for many users in our community who previously used the Map-based repository without any additional dependency.

In this release, we introduce a JobRepository implementation that does not use or store batch meta-data in any form (not even in-memory). It is a "NoOp" implementation that throws away batch meta-data and does not interact with any resource (hence the name "resourceless job repository", which is named after the "resourceless transaction manager").

This implementation is intended for use-cases where restartability is not required and where the execution context is not involved in any way (like sharing data between steps through the execution context, or partitioned steps where partition meta-data is shared between the manager and workers through the execution context, etc.).

This implementation is suitable for one-time jobs executed in their own JVM. It works with transactional steps (configured with a DataSourceTransactionManager for instance) as well as non-transactional steps (configured with a ResourcelessTransactionManager). The implementation is not thread-safe and should not be used in any concurrent environment.
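
As an illustration, here is a minimal configuration sketch of a non-transactional, single-step job that uses the resourceless job repository. It assumes the implementation class is ResourcelessJobRepository in the org.springframework.batch.core.repository.support package and that it has a no-argument constructor. The bean names, the job and step names, and the tasklet body are hypothetical. A job launcher is still needed to run the job and is not shown here.

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.ResourcelessJobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.batch.support.transaction.ResourcelessTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ResourcelessJobConfiguration {

    // Assumption: the resourceless repository can be created without any collaborator.
    @Bean
    public JobRepository jobRepository() {
        return new ResourcelessJobRepository();
    }

    // Non-transactional step: no data source is involved anywhere in this configuration.
    @Bean
    public Step step(JobRepository jobRepository) {
        return new StepBuilder("step", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    System.out.println("Running a one-time task");
                    return RepeatStatus.FINISHED;
                }, new ResourcelessTransactionManager())
                .build();
    }

    @Bean
    public Job job(JobRepository jobRepository, Step step) {
        return new JobBuilder("job", jobRepository)
                .start(step)
                .build();
    }
}
```

Since no meta-data is kept, a job defined this way cannot be restarted from where it failed.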

Composite Item Reader implementation

Similar to the CompositeItemProcessor and CompositeItemWriter, we introduce a new CompositeItemReader implementation that is designed to read data sequentially from several sources having the same format. This is useful when data is spread over different resources and writing a custom reader is not an option.

A CompositeItemReader works like other composite artifacts, by delegating the reading operation to regular item readers in order. Here is a quick example showing a composite reader that reads person data from a flat file and then from a database table:

@Bean
public FlatFileItemReader<Person> itemReader1() {
    return new FlatFileItemReaderBuilder<Person>()
            .name("personFileItemReader")
            .resource(new FileSystemResource("persons.csv"))
            .delimited()
            .names("id", "name")
            .targetType(Person.class)
            .build();
}

@Bean
public JdbcCursorItemReader<Person> itemReader2() {
    String sql = "select * from persons";
    return new JdbcCursorItemReaderBuilder<Person>()
            .name("personTableItemReader")
            .dataSource(dataSource())
            .sql(sql)
            .beanRowMapper(Person.class)
            .build();
}

@Bean
public CompositeItemReader<Person> itemReader() {
    return new CompositeItemReader<>(Arrays.asList(itemReader1(), itemReader2()));
}
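
To make the delegation order concrete, here is a plain-Java sketch of the idea. It does not use any Spring Batch class: SimpleReader, SequentialCompositeReader and fromList are hypothetical names introduced only for this illustration, and the sketch assumes the usual reader contract where null signals that a source is exhausted.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified reader contract: returns null when there is no more data. */
interface SimpleReader<T> {
    T read();
}

/** Reads from each delegate in order, moving to the next one when the current one is exhausted. */
final class SequentialCompositeReader<T> implements SimpleReader<T> {

    private final Iterator<SimpleReader<T>> delegates;
    private SimpleReader<T> current;

    SequentialCompositeReader(List<SimpleReader<T>> delegates) {
        this.delegates = delegates.iterator();
        this.current = this.delegates.hasNext() ? this.delegates.next() : null;
    }

    @Override
    public T read() {
        while (current != null) {
            T item = current.read();
            if (item != null) {
                return item;
            }
            // Current delegate is exhausted: switch to the next one, if any.
            current = delegates.hasNext() ? delegates.next() : null;
        }
        return null; // all delegates are exhausted
    }

    /** Helper for the illustration: a reader backed by an in-memory list. */
    static <T> SimpleReader<T> fromList(List<T> items) {
        Iterator<T> iterator = items.iterator();
        return () -> iterator.hasNext() ? iterator.next() : null;
    }
}
```

In this sketch, a composite built from a reader over ("a", "b") followed by a reader over ("c") returns a, b, c and then null, which mirrors reading the file first and the table second in the example above.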

New adapters for java.util.function APIs

Similar to FunctionItemProcessor that adapts a java.util.function.Function to an item processor, this release introduces several new adapters for other java.util.function interfaces like Supplier, Consumer and Predicate.

The newly added adapters are: SupplierItemReader, ConsumerItemWriter and PredicateFilteringItemProcessor. For more details about these new adapters, please refer to the org.springframework.batch.item.function package.
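
Here is a hedged sketch of how these adapters might be declared. It assumes that each adapter takes the corresponding functional interface as its single constructor argument, and that PredicateFilteringItemProcessor filters out the items that match the predicate. Both points should be checked against the Javadoc of the package mentioned above. The bean names and the sample data are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.function.ConsumerItemWriter;
import org.springframework.batch.item.function.PredicateFilteringItemProcessor;
import org.springframework.batch.item.function.SupplierItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FunctionAdaptersConfiguration {

    // Supplier adapted to a reader: the supplier returns null to signal the end of the data.
    @Bean
    public ItemReader<String> itemReader() {
        Iterator<String> names = List.of("foo", "", "bar").iterator();
        return new SupplierItemReader<String>(() -> names.hasNext() ? names.next() : null);
    }

    // Predicate adapted to a filtering processor.
    // Assumption: items for which the predicate is true (empty strings here) are filtered out.
    @Bean
    public ItemProcessor<String, String> itemProcessor() {
        return new PredicateFilteringItemProcessor<String>(String::isEmpty);
    }

    // Consumer adapted to a writer: the consumer is applied to each item.
    @Bean
    public ItemWriter<String> itemWriter() {
        return new ConsumerItemWriter<String>(item -> System.out.println(item));
    }
}
```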

Concurrent steps with blocking queue item reader and writer

The staged event-driven architecture (SEDA) is a powerful architecture style to process data in stages connected by queues. This style is directly applicable to data pipelines and easily implemented in Spring Batch thanks to the ability to design jobs as a sequence of steps.

The only missing piece here is how to read and write data to intermediate queues. This release introduces an item reader and item writer to read data from and write it to a BlockingQueue. With these two new classes, one can design a first step that prepares data in a queue and a second step that consumes data from the same queue. This way, both steps can run concurrently to process data efficiently in a non-blocking, event-driven fashion.
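
The following plain-Java sketch illustrates the idea of two stages connected by a BlockingQueue and running concurrently. It deliberately does not use the new Spring Batch reader and writer: QueueStagesSketch and runStages are hypothetical names, and detecting the end of the data with a poll timeout is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class QueueStagesSketch {

    /** Runs a producing stage and a consuming stage concurrently, connected by a bounded queue. */
    static List<String> runStages(List<String> input) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10);
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            // Stage 1: prepares the data and writes it to the queue.
            Future<Object> producer = executor.submit(() -> {
                for (String item : input) {
                    queue.put(item.toUpperCase());
                }
                return null;
            });
            // Stage 2: reads from the same queue until no item arrives within the timeout.
            Future<List<String>> consumer = executor.submit(() -> {
                List<String> output = new ArrayList<>();
                String item;
                while ((item = queue.poll(1, TimeUnit.SECONDS)) != null) {
                    output.add(item + "!");
                }
                return output;
            });
            producer.get();
            return consumer.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the stages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A stage failed", e);
        } finally {
            executor.shutdown();
        }
    }
}
```

The bounded queue applies back pressure: the first stage blocks when the queue is full, so it cannot run arbitrarily far ahead of the second stage.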

Query hints support in JPA item readers

Up until version 5.1, the JPA cursor and paging item readers did not support query hints (like the fetch size, timeout, etc.). Users were required to provide a custom query provider in order to specify custom hints.

In this release, JPA readers and their respective builders were updated to accept query hints when defining the JPA query to use.

Data class support in JDBC item readers

This release introduces a new method in the builders of JDBC cursor and paging item readers that allows users to specify a DataClassRowMapper when the type of items is a data class (Java record or Kotlin data class).

The new method, named dataRowMapper(TargetType.class), is similar to beanRowMapper(TargetType.class) and is designed to make the configuration of row mappers consistent between regular classes (Java beans) and data classes (Java records).
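
Here is a short sketch of a cursor reader configured with dataRowMapper for a Java record. The Person record, the persons table with its id and name columns, and the bean names are hypothetical, and a DataSource bean is assumed to be defined elsewhere.

```java
import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PersonReaderConfiguration {

    // Hypothetical data class: the record components are expected to match the selected columns.
    public record Person(int id, String name) {
    }

    @Bean
    public JdbcCursorItemReader<Person> personReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<Person>()
                .name("personRecordReader")
                .dataSource(dataSource)
                .sql("select id, name from persons")
                // Maps each row through the record's constructor instead of bean setters.
                .dataRowMapper(Person.class)
                .build();
    }
}
```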

Configurable line separator in RecursiveCollectionLineAggregator

Up until now, the line separator property in RecursiveCollectionLineAggregator was set to the System’s line separator value. While it is possible to change the value through a System property, this configuration style is not consistent with other properties of batch artifacts.

This release introduces a new setter in RecursiveCollectionLineAggregator that allows users to configure a custom value of the line separator without having to use System properties.

Job registration improvements

In version 5.1, the default configuration of batch infrastructure beans was updated to automatically populate the job registry by defining a JobRegistryBeanPostProcessor bean in the application context. After a recent change in Spring Framework that changed the log level in BeanPostProcessorChecker, several warnings related to the JobRegistryBeanPostProcessor were logged in a typical Spring Batch application. These warnings are due to the JobRegistryBeanPostProcessor having a dependency on a JobRegistry bean, which is not recommended and might cause bean lifecycle issues.

These issues have been resolved in this release by changing the mechanism of populating the JobRegistry from using a BeanPostProcessor to using a SmartInitializingSingleton. The JobRegistryBeanPostProcessor is now deprecated in favor of the newly added JobRegistrySmartInitializingSingleton.
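
For applications that declare the job registry population bean themselves, the migration might look like the following sketch. It assumes that JobRegistrySmartInitializingSingleton lives in the org.springframework.batch.core.configuration.support package and provides a constructor that accepts the JobRegistry. The configuration class and bean names are hypothetical.

```java
import org.springframework.batch.core.configuration.JobRegistry;
import org.springframework.batch.core.configuration.support.JobRegistrySmartInitializingSingleton;
import org.springframework.batch.core.configuration.support.MapJobRegistry;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JobRegistryConfiguration {

    @Bean
    public JobRegistry jobRegistry() {
        return new MapJobRegistry();
    }

    // Replaces a JobRegistryBeanPostProcessor bean definition.
    // Assumption: the registry is passed through the constructor.
    @Bean
    public JobRegistrySmartInitializingSingleton jobRegistrySmartInitializingSingleton(JobRegistry jobRegistry) {
        return new JobRegistrySmartInitializingSingleton(jobRegistry);
    }
}
```

Because a SmartInitializingSingleton is called back after all singletons have been instantiated, the registry is populated without the early bean initialization that triggered the warnings.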