Spring Batch

Introduction to Spring Batch

What is Spring Batch?

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques. Both simple and complex, high-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.

1. Spring Batch Architecture

The Application layer contains all batch jobs and custom code written by developers using Spring Batch. The Batch Core contains the core runtime classes necessary to launch and control a batch job, including JobLauncher, Job, and Step implementations. Both Application and Core are built on top of a common infrastructure. This infrastructure contains common readers and writers, and services such as the RetryTemplate, which are used both by application developers (ItemReader and ItemWriter) and by the core framework itself.

2. Job

A Job is an entity that encapsulates an entire batch process. As is common with other Spring projects, a Job is wired together via an XML configuration file or Java-based configuration. A Job is simply a container for Steps: it combines multiple steps that belong logically together in a flow and allows configuration of properties global to all steps, such as restartability.
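As a rough Java-configuration sketch (stepOne and stepTwo stand for hypothetical Step beans defined elsewhere), a Job that combines two steps and disables restartability could be wired like this:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class SampleJobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    // stepOne and stepTwo are assumed to be Step beans defined elsewhere
    @Bean
    public Job sampleJob(Step stepOne, Step stepTwo) {
        return jobBuilderFactory.get("sampleJob")
                .preventRestart()     // job-level property: disable restartability
                .start(stepOne)       // steps combined into one logical flow
                .next(stepTwo)
                .build();
    }
}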

 

3. Item Reader

ItemReader is an abstraction that feeds input into a Step, one item at a time. When the ItemReader has read all input records, it indicates this by returning null. ItemReader is the means for providing data from many different types of input.

Some of the different input types are:

  • Flat File: flat-file item readers read lines of data from a flat file that typically describes records with fields defined by fixed positions in the file or delimited by a special character (a sketch follows this list).
  • XML: XML ItemReaders process XML independently of the technologies used for parsing, mapping and validating objects.
  • Database: SQL-based ItemReaders read data from a database, typically using a RowMapper to map each row of the result set to an object.
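As a hedged sketch of the flat-file case (Customer is a hypothetical POJO with id, firstName and lastName properties, and customers.csv is an assumed comma-delimited file):

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class CustomerReaderConfiguration {

    // Reads comma-delimited lines from customers.csv and maps each line to a Customer bean.
    @Bean
    public FlatFileItemReader<Customer> customerReader() {
        BeanWrapperFieldSetMapper<Customer> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Customer.class);

        return new FlatFileItemReaderBuilder<Customer>()
                .name("customerReader")
                .resource(new ClassPathResource("customers.csv"))
                .delimited()                                           // fields separated by a delimiter (comma by default)
                .names(new String[] {"id", "firstName", "lastName"})
                .fieldSetMapper(fieldSetMapper)
                .build();
    }
}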

4. Item Writer

ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. An ItemWriter writes out rather than reading in: it is used for writing data to an output source and accepts a List of objects (a chunk) as input.
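A minimal sketch of a writer (the Customer type and the console output are illustrative assumptions):

import java.util.List;
import org.springframework.batch.item.ItemWriter;

// A hypothetical writer that simply logs each chunk of Customer items it receives.
public class ConsoleCustomerWriter implements ItemWriter<Customer> {

    @Override
    public void write(List<? extends Customer> items) throws Exception {
        // The framework hands over one chunk (a List of items) per call.
        for (Customer customer : items) {
            System.out.println("Writing customer: " + customer);
        }
    }
}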

5. Item Processor

ItemProcessor is an abstraction that provides transformation of the data and is the natural place for the business logic that transforms each item. If, while processing the item, it is determined that the item is not valid, returning null indicates that the item should not be written out.
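A minimal sketch of a processor that transforms hypothetical Customer items and filters out invalid ones by returning null:

import org.springframework.batch.item.ItemProcessor;

// Upper-cases the last name and skips records without one.
public class CustomerItemProcessor implements ItemProcessor<Customer, Customer> {

    @Override
    public Customer process(Customer customer) throws Exception {
        // Returning null tells Spring Batch to skip this item: it will not be written out.
        if (customer.getLastName() == null || customer.getLastName().isEmpty()) {
            return null;
        }
        customer.setLastName(customer.getLastName().toUpperCase());
        return customer;
    }
}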

A simple example of a Job with an ItemReader, an ItemProcessor and an ItemWriter is sketched below.
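As a rough Java-configuration sketch (assuming the hypothetical Customer reader, processor and writer shown above), the Step and Job could be wired like this; chunk(5) plays the role of the commit-interval described just below:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class CustomerJobConfiguration {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Step customerStep(FlatFileItemReader<Customer> customerReader) {
        return stepBuilderFactory.get("customerStep")
                .<Customer, Customer>chunk(5)            // commit-interval of 5
                .reader(customerReader)                  // hypothetical beans from the sketches above
                .processor(new CustomerItemProcessor())
                .writer(new ConsoleCustomerWriter())
                .build();
    }

    @Bean
    public Job customerJob(Step customerStep) {
        return jobBuilderFactory.get("customerJob")
                .start(customerStep)
                .build();
    }
}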

 

Reading and processing happen in a chunk-based fashion: the ItemReader reads records and passes them to the ItemProcessor one at a time.

The commit-interval indicates that the write should happen only once the chunk reaches five records. When five records have been read and processed, the chunk is sent to the ItemWriter to write the data to the output. This helps control the number of I/O operations performed when writing data.

 

6. Spring Batch Listeners

Often it is necessary to perform some operation before a Job or Step starts or after it has finished, for example to obtain a connection object or to do some cleanup at the end of the Job or Step. Listeners come in handy in these scenarios.

a. Job Execution Listener

Job Listeners allow us to intercept Job execution and to execute some custom code or logic as needed.
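As a hedged sketch, a listener implementing the JobExecutionListener interface might look like this (the class name and the logging are illustrative):

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

// Runs custom logic before and after the Job.
public class SampleJobListener implements JobExecutionListener {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // e.g. obtain a connection or other resources before the job starts
        System.out.println("Before job: " + jobExecution.getJobInstance().getJobName());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // e.g. clean up resources once the job has finished
        System.out.println("After job, status: " + jobExecution.getStatus());
    }
}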

 

To configure Job Listener:
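A sketch of the registration, assuming the hypothetical SampleJobListener above and the job configuration class from the earlier sketch:

// Inside the @Configuration class from the earlier sketch
@Bean
public Job customerJob(Step customerStep) {
    return jobBuilderFactory.get("customerJob")
            .listener(new SampleJobListener())   // invoked before and after the job runs
            .start(customerStep)
            .build();
}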

 

b. Step Execution Listener

Step Listeners allow us to intercept Step execution and to execute some custom code or logic as needed.
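A comparable sketch for a step-level listener (again, the class name and the logging are illustrative):

import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;

// Runs custom logic before and after the Step.
public class SampleStepListener implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
        System.out.println("Before step: " + stepExecution.getStepName());
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        System.out.println("Items read: " + stepExecution.getReadCount());
        // Returning null keeps the step's existing exit status.
        return null;
    }
}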

 

To configure Step listener:
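A sketch of the registration on the chunk-oriented step from the earlier configuration class, assuming the SampleStepListener above:

// Inside the @Configuration class from the earlier sketch
@Bean
public Step customerStep(FlatFileItemReader<Customer> customerReader) {
    return stepBuilderFactory.get("customerStep")
            .<Customer, Customer>chunk(5)
            .reader(customerReader)
            .processor(new CustomerItemProcessor())
            .writer(new ConsoleCustomerWriter())
            .listener(new SampleStepListener())   // invoked before and after the step runs
            .build();
}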

 

7. JobRepository

The JobRepository provides the basic CRUD operations within Spring Batch for the batch domain objects, such as JobExecution and StepExecution. It stores all the metadata about a job and its executions.

    • In-Memory Job Repository (org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean): Spring Batch provides an in-memory Map version of the job repository. This helps in scenarios where you don’t want to persist your domain objects to the database. One reason may be speed; storing domain objects at each commit point takes extra time. Another reason may be that you just don’t need to persist status for a particular job.
      Note that the in-memory repository is volatile and so does not allow restart between JVM instances. It also cannot prevent two job instances with the same parameters from being launched simultaneously, and is not suitable for use in a multi-threaded Job or a locally partitioned Step.
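A minimal Java-configuration sketch of the in-memory repository (applicable to pre-5.x versions of Spring Batch, where MapJobRepositoryFactoryBean is still available):

import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean;
import org.springframework.batch.support.transaction.ResourcelessTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class InMemoryRepositoryConfiguration {

    @Bean
    public JobRepository jobRepository() throws Exception {
        // Batch metadata is kept in in-memory maps and lost when the JVM stops.
        MapJobRepositoryFactoryBean factory =
                new MapJobRepositoryFactoryBean(new ResourcelessTransactionManager());
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}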

 

    • Database Type Repository (org.springframework.batch.core.repository.support.JobRepositoryFactoryBean): JobRepositoryFactoryBean uses JDBC DAO implementations that persist batch metadata in a database. It writes all metadata into the default tables prefixed with “BATCH_”. We can also change this default prefix, as shown in the configuration below.
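A sketch of a database-backed repository with a custom table prefix (the prefix “MYAPP_BATCH_” and the injected beans are illustrative assumptions):

import javax.sql.DataSource;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class DatabaseRepositoryConfiguration {

    @Bean
    public JobRepository jobRepository(DataSource dataSource,
                                       PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
        factory.setDataSource(dataSource);                 // batch metadata persisted via JDBC
        factory.setTransactionManager(transactionManager);
        factory.setTablePrefix("MYAPP_BATCH_");            // replaces the default "BATCH_" prefix
        factory.afterPropertiesSet();
        return factory.getObject();
    }
}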

 

 

 

8. TaskExecutors

  • SyncTaskExecutor: a TaskExecutor implementation that executes each task synchronously in the calling thread (one by one).
  • ThreadPoolTaskExecutor: this class implements Spring’s TaskExecutor interface as well as the Executor interface; all the threads are managed by Spring (a configuration sketch follows this list).
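A configuration sketch for both executors (the pool sizes and thread name prefix are illustrative assumptions):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class TaskExecutorConfiguration {

    @Bean
    public TaskExecutor syncTaskExecutor() {
        // Runs each task in the calling thread, one by one.
        return new SyncTaskExecutor();
    }

    @Bean
    public TaskExecutor threadPoolTaskExecutor() {
        // A Spring-managed thread pool.
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);
        executor.setMaxPoolSize(8);
        executor.setThreadNamePrefix("batch-");
        return executor;
    }
}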

9. Partitioning: Master-Slave

Spring Batch partitioning follows a master-slave model, where the master delegates work through a Partitioner and each slave is a batch Step. The slaves are identical Steps, and each one runs separately.

Partitioner: the Partitioner’s only responsibility is to generate execution contexts as input parameters for new step executions. Its partition method returns a Map in which each key is a unique name for a step execution and each value is the ExecutionContext carrying the data bound for that thread.
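As a sketch, a Partitioner that splits work into id ranges might look like this (the range size and the minId/maxId context keys are assumptions for illustration):

import java.util.HashMap;
import java.util.Map;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Assigns a range of ids to each partition.
public class RangePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        int rangeSize = 100;   // illustrative: 100 ids per partition

        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("minId", i * rangeSize + 1);       // data bound for this thread
            context.putInt("maxId", (i + 1) * rangeSize);
            partitions.put("partition" + i, context);         // unique name per step execution
        }
        return partitions;
    }
}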

 

Master:
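A sketch of the master (partition) step, assuming the RangePartitioner above plus slave step and task executor beans defined elsewhere:

// Inside a @Configuration class; slaveStep and taskExecutor are assumed to be beans defined elsewhere
@Bean
public Step masterStep(Step slaveStep, TaskExecutor taskExecutor) {
    return stepBuilderFactory.get("masterStep")
            .partitioner("slaveStep", new RangePartitioner())  // generates one ExecutionContext per partition
            .step(slaveStep)                                   // the identical slave step run for each partition
            .gridSize(4)                                       // number of partitions requested from the Partitioner
            .taskExecutor(taskExecutor)                        // runs the partitions on separate threads
            .build();
}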

 

Similar to the multi-threaded step’s throttle-limit attribute, the grid-size attribute prevents the task executor from being saturated with requests from a single step.

Slave:
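A sketch of the slave step, assuming a step-scoped reader that consumes the minId/maxId values bound into its partition’s ExecutionContext:

// Inside the same @Configuration class; partitionedCustomerReader is assumed to be a @StepScope bean
@Bean
public Step slaveStep(ItemReader<Customer> partitionedCustomerReader) {
    return stepBuilderFactory.get("slaveStep")
            .<Customer, Customer>chunk(5)
            .reader(partitionedCustomerReader)
            .processor(new CustomerItemProcessor())
            .writer(new ConsoleCustomerWriter())
            .build();
}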

 

Each slave step is a chunk-oriented step (a tasklet performing chunk-based processing), with its own reader, processor and writer.

10. Running a Job

We can run a Spring Batch job in two ways:

  1. Command line job runner
  2. Running within a web container

However, if running from within a web container within the scope of an HttpRequest, there will usually be one JobLauncher, configured for asynchronous job launching, that multiple requests will invoke to launch their jobs.

Running Jobs from the Command Line:

Spring Batch provides the CommandLineJobRunner implementation if you want to run a Job from the command line. When launching a job from the command line, a new JVM is instantiated for each Job, so every job has its own JobLauncher. For users who want to run their jobs from an enterprise scheduler, the command line is the primary interface. An example invocation follows the task list below.

The CommandLineJobRunner performs four tasks:

  • Load the appropriate ApplicationContext
  • Parse command line arguments into JobParameters
  • Locate the appropriate job based on arguments
  • Use the JobLauncher provided in the application context to launch the job.
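As an illustration (the classpath, configuration class, job name and parameter are assumptions for this sketch), a command-line launch might look like this:

java -cp "myapp.jar:lib/*" \
    org.springframework.batch.core.launch.support.CommandLineJobRunner \
    com.example.batch.CustomerJobConfiguration \
    customerJob \
    runDate(date)=2024/01/01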

Running Jobs from within a Web Container:

To run a Spring Batch job from within a web container we need to create an MVC controller; the request is initiated through an HttpRequest. The controller launches a Job using a JobLauncher that has been configured to launch asynchronously, which immediately returns a JobExecution. The Job will likely still be running; however, this non-blocking behaviour allows the controller to return immediately, which is required when handling an HttpRequest.
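A sketch of such a controller, assuming a JobLauncher bean backed by an asynchronous task executor (for example a SimpleJobLauncher using SimpleAsyncTaskExecutor) and the hypothetical customerJob bean from the earlier sketches:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class JobLauncherController {

    // Assumed to be configured with an asynchronous TaskExecutor, so run() returns immediately.
    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job customerJob;   // hypothetical job bean from the earlier sketches

    @RequestMapping("/launchJob")
    @ResponseBody
    public String launchJob() throws Exception {
        JobParameters params = new JobParametersBuilder()
                .addLong("startedAt", System.currentTimeMillis())   // unique parameters create a new JobInstance
                .toJobParameters();
        // With an async launcher this returns right away; the JobExecution is likely still running.
        return "Launched with status: " + jobLauncher.run(customerJob, params).getStatus();
    }
}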

 

 
