Summary: GigaSpaces Spring Batch Elastic Processing Unit
Author: Shay Hassidim, Deputy CTO, GigaSpaces
Recently tested with GigaSpaces version: XAP 8.0, Spring Batch 2.1.6
Last Update: March 2011
Batch processing usually involves complex flows built from conditional or sequential steps. These flows consume a relatively large number of CPU cycles and perform significant I/O. In such cases, the time spent accessing the data required for the processing is small compared to the duration of the processing and I/O activities themselves.
To allow batch processing systems to leverage available resources on the network/cloud, the batch processing system should be able to scale in a dynamic manner across multiple machines.
The GigaSpaces Spring Batch PU provides:
Enhanced performance:
Distributed parallel processing.
Distributed Task execution partitioning.
In-memory distributed state management.
Management and Monitoring:
Task execution queuing.
Distributed Deployment environment.
Continuous High-Availability.
Scalability:
Elastic and Dynamic scalability of the Spring batch PU instances.
Spring Batch Introduction
Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems.
Spring Batch builds upon the productivity, POJO-based development approach, and general ease of use capabilities people have come to know from the Spring Framework, while making it easy for developers to access and leverage more advanced enterprise services when necessary.
Spring Batch is not a scheduling framework.
There are many good enterprise schedulers available in both the commercial and open source spaces such as Quartz, Tivoli, Control-M, etc.
Spring Batch is intended to work in conjunction with a scheduler, not replace one.
Spring Batch provides reusable functions that are essential in processing large volumes of records, including:
Logging, Tracing, Transaction management, Job processing statistics, Job restart, Job skip, resource management, etc.
The Spring Batch runtime environment includes the following main components:
A typical batch program generally reads a large number of records from a database, file, or queue, processes the data in some fashion, and then writes back data in a modified form. Spring Batch automates this basic batch iteration, providing the capability to process similar transactions as a set, typically in an offline environment without any user interaction. Batch jobs are part of most IT projects and Spring Batch is the only open source framework that provides a robust, enterprise-scale solution.
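The read, process, write iteration described above can be sketched in plain Java. This is only an illustration of the loop, not Spring Batch's API: the class name ChunkLoopSketch, the run method, and the chunkSize parameter are all assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the basic batch iteration; not part of Spring Batch.
class ChunkLoopSketch {

    // Reads items one at a time, processes each one, and "writes" them as sets (chunks).
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> written = new ArrayList<>();   // each inner list represents one write call
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                written.add(chunk);                  // write a full chunk
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                      // write the final, partial chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> out = run(records.iterator(), s -> s.toUpperCase(), 2);
        System.out.println(out);
    }
}
```

In a real job, the reader, processor and writer roles would be filled by framework components, and the set size would be part of the step configuration.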
GigaSpaces Spring Batch PU
In GigaSpaces XAP, you can implement the Master-Worker pattern using several methods:
Task Executors - Best for scenarios where the processing activity is collocated with the data. Designed for low latency situations.
Polling Containers - A remote consumer/worker component used with simple processing scenarios.
Spring Batch - A remote consumer/worker component used with complex processing scenarios. It leverages the Spring Batch framework to manage flows.
The Polling Container and Spring Batch approaches should be used when the processing activity consumes a relatively large amount of CPU and takes a long time to complete. They are also relevant when the data required for the processing is not stored within the space, or when the time it takes to retrieve the required data from the space is much shorter than the time it takes to complete the processing.
GigaSpaces Spring Batch PU Architecture
The Spring Batch PU encapsulates all the required components to run a Spring Batch instance:
The Spring Batch PU supports the Round Robin Workers mode and the Dedicated Workers mode.
Round Robin Worker
In the Round Robin Worker mode, a Spring Batch PU instance consumes requests from all the space partitions in a round-robin manner.
Dedicated Worker
In the Dedicated Worker mode, a Spring Batch PU instance consumes requests from one specific space partition dedicated to it.
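The difference between the two modes comes down to which partition a worker polls next. The following minimal sketch illustrates that choice. The class WorkerModeSketch, its method names, and the mapping of an instance ID to a partition index are assumptions made for illustration, not the PU's actual implementation.

```java
// Hypothetical illustration of partition selection in the two worker modes.
class WorkerModeSketch {

    // Round Robin mode: the worker moves to the next partition on every poll.
    static int roundRobinPartition(int pollCount, int partitions) {
        return pollCount % partitions;
    }

    // Dedicated mode: the worker always polls the same partition.
    // Assumption: instance IDs start at 1 and map onto zero-based partition indexes.
    static int dedicatedPartition(int instanceId, int partitions) {
        return (instanceId - 1) % partitions;
    }

    public static void main(String[] args) {
        int partitions = 2;
        for (int poll = 0; poll < 4; poll++) {
            System.out.println("round robin poll " + poll + " -> partition "
                    + roundRobinPartition(poll, partitions));
        }
        System.out.println("dedicated instance 2 -> partition "
                + dedicatedPartition(2, partitions));
    }
}
```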
The Spring Batch PU Implementation
The Spring Batch PU implementation includes the following components:
Component - Description
Space - A space proxy used by the SpaceItemReader, SpaceItemProcessor and SpaceItemWriter to consume Requests and send back Results.
ItemRequest - A Request class. Generated by the Master and consumed by the SpaceItemReader.
ItemResult - A Result class. Generated by the SpaceItemProcessor and consumed by the Master.
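To make the Request and Result roles concrete, here is a minimal sketch of what such a pair of classes might look like. Only the job ID is mentioned in this document. The payload and outcome fields, the RequestResultDemo class, and its process method are hypothetical, and the classes shipped with the example may differ. A class stored in a space would typically also carry GigaSpaces annotations (for example an ID or routing property), which are omitted here to keep the sketch self-contained.

```java
// Assumed shape of the request class; field names other than jobId are illustrative.
class ItemRequest {
    private Integer jobId;     // identifies the job this request belongs to
    private String payload;    // hypothetical data to be processed

    public ItemRequest() {}    // no-argument constructor, as expected of a POJO

    public Integer getJobId() { return jobId; }
    public void setJobId(Integer jobId) { this.jobId = jobId; }
    public String getPayload() { return payload; }
    public void setPayload(String payload) { this.payload = payload; }
}

// Assumed shape of the result class; it carries the job ID back to the Master.
class ItemResult {
    private Integer jobId;
    private String outcome;    // hypothetical processing outcome

    public ItemResult() {}

    public Integer getJobId() { return jobId; }
    public void setJobId(Integer jobId) { this.jobId = jobId; }
    public String getOutcome() { return outcome; }
    public void setOutcome(String outcome) { this.outcome = outcome; }
}

class RequestResultDemo {
    // Mimics a processor turning a request into a result with the same job ID.
    static ItemResult process(ItemRequest request) {
        ItemResult result = new ItemResult();
        result.setJobId(request.getJobId());
        result.setOutcome(request.getPayload().toUpperCase());
        return result;
    }

    public static void main(String[] args) {
        ItemRequest request = new ItemRequest();
        request.setJobId(1);
        request.setPayload("record");
        ItemResult result = process(request);
        System.out.println(result.getJobId() + " " + result.getOutcome());
    }
}
```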
Create a new folder named SpringBatchPU under the gigaspaces-xap-premium\deploy folder.
Copy the example bin folder content into the gigaspaces-xap-premium\deploy\SpringBatchPU folder.
Spring Batch PU libraries
Include the following libraries in the Spring Batch PU lib folder:
spring-batch-core-2.1.6.RELEASE.jar
spring-batch-infrastructure-2.1.6.RELEASE.jar
spring-batch-test-2.1.6.RELEASE.jar
antlr-2.7.6.jar
asm-1.5.3.jar
asm-attrs-1.5.3.jar
cglib-2.1_3.jar
common-1.0-SNAPSHOT.jar
commons-collections-2.1.1.jar
commons-dbcp-1.2.1.jar
commons-pool-1.2.jar
dom4j-1.6.1.jar
ehcache-1.2.3.jar
geronimo-spec-jta-1.0.1B-rc4.jar
hibernate-3.2.6.ga.jar
hibernate-annotations-3.2.1.ga.jar
hsqldb-1.8.0.7.jar
persistence-api-1.0.jar
To speed up the Spring Batch PU deployment time, copy these libraries into the \gigaspaces-xap-premium\lib\optional\pu-common folder.
Set Deploy Tool Classpath
Add the spring-batch-core-2.1.6.RELEASE.jar to the deploy tool (GS-UI or gs CLI) CLASSPATH.
You can do that by running the following before calling the deploy command:
set PRE_CLASSPATH=C:\gigaspaces-xap-premium\deploy\SpringBatchPU\lib\spring-batch-core-2.1.6.RELEASE.jar
Deploy the Space
Deploy a space called mySpace. You may deploy a single space or a space in a partitioned topology. Here is the expected output for a partitioned deployment with two instances:
Found 1 GSMs
Deploying [datagrid] with name [mySpace] under groups [gigaspaces-8.0.0-XAPPremium-ga] and locators []
SLA Not Found in PU. Using Default SLA.
Overrding SLA cluster schema with [partitioned-sync2backup]
Overrding SLA numberOfInstances with [2]
Overrding SLA numberOfBackups with [0]
Waiting for [2] processing unit instances to be deployed...
[mySpace] [1] deployed successfully on [127.0.0.1]
[mySpace] [2] deployed successfully on [127.0.0.1]
Finished deploying [2] processing unit instances
Deploy the Spring Batch PU
Deploy the Spring Batch PU using the GS-UI or the CLI.
gs deploy -cluster total_members=2 SpringBatchPU
Here is the expected output:
Found 1 GSMs
Deploying [SpringBatchPU] with name [SpringBatchPU] under groups [gigaspaces-8.0.0-XAPPremium-ga] and locators []
SLA Not Found in PU. Using Default SLA.
Overrding SLA numberOfInstances with [2]
Overrding SLA numberOfBackups with [null]
Waiting for [2] processing unit instances to be deployed...
[SpringBatchPU] [1] deployed successfully on [127.0.0.1]
[SpringBatchPU] [2] deployed successfully on [127.0.0.1]
Finished deploying [2] processing unit instances
Run the Master
To run the master execute the following:
java com.gigaspaces.springbatch.Master
The Master will write 100 Request objects with a specific Job ID into the space and will wait for 100 Result objects with the relevant Job ID. This cycle will repeat itself 10 times.
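The Master's cycle can be illustrated with a small, self-contained simulation. In this sketch, in-memory queues stand in for the space and a local thread stands in for the Spring Batch PU workers. The class MasterCycleSketch, the runJob method, and the int array layout of a request and result are assumptions made for illustration, not the code of com.gigaspaces.springbatch.Master.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical simulation of the Master cycle; queues replace the space.
class MasterCycleSketch {

    // Writes 'requests' requests for one job and waits for the same number of results.
    static int runJob(int jobId, int requests) {
        BlockingQueue<int[]> requestQueue = new LinkedBlockingQueue<>();
        BlockingQueue<int[]> resultQueue = new LinkedBlockingQueue<>();

        // Worker stand-in: consumes each request and answers with the same job ID.
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < requests; i++) {
                    int[] request = requestQueue.take();           // {jobId, value}
                    resultQueue.put(new int[] {request[0], request[1] * 2});
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            // Master: write all the requests for this job...
            for (int i = 0; i < requests; i++) {
                requestQueue.put(new int[] {jobId, i});
            }
            // ...then block until the matching number of results has arrived.
            int received = 0;
            while (received < requests) {
                int[] result = resultQueue.take();
                if (result[0] == jobId) {
                    received++;
                }
            }
            worker.join();
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        }
    }

    public static void main(String[] args) {
        // Ten cycles of 100 requests each, mirroring the description above.
        for (int jobId = 1; jobId <= 10; jobId++) {
            int received = runJob(jobId, 100);
            System.out.println("Job " + jobId + " received " + received + " results");
        }
    }
}
```

Matching results by job ID is what lets consecutive cycles share the same space without the Master picking up results that belong to another job.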