Friday, 12 December 2014

Springing Into Batch Job

Spring Batch

A batch job is a software job that processes a large volume of data or records in a short span of time, without any human intervention. As such, batch jobs play a crucial role in the functioning of many big enterprise applications. Batch jobs have been part of the software industry since its beginning, and I am sure they will be there till the end. Some examples of batch jobs are indexing files and updating the inventory catalog of an online store.

Spring Batch, a project in the popular Spring family built on top of the Spring Framework, enables writing batch applications / jobs easily and effectively. Before Spring Batch hit the market in 2007, industry heavyweights wrote batch applications using their own proprietary technologies and legacy frameworks. Apart from defining a standard, Spring Batch provides ready-to-use components and all the advantages of the Spring Framework, such as POJOs and dependency injection, for writing complex batch applications.



Example

To show how to write a simple batch job using Spring Batch, let us consider the following scenario. An apparel company, ABC Ltd., has started its operations with its head office in Mumbai and plans to open branch offices in other parts of the country to expand its business in the coming days. So it has ramped up its workforce in the Mumbai office. Suppose that in the first year of operation the total number of employees is 1000. The Finance Department maintains the payroll info of all employees in a flat file. At the end of the financial year, it faces the mammoth task of calculating the income tax of each employee.

Now we will write a batch application that the Finance Department at the ABC Ltd. head office can use to process these employee records, calculate their taxes and store the results in a central database.

Software / Tools

1. Spring 4.1.0
2. Spring Batch 3.0.1
3. Hibernate 4.3.6
4. MySQL 5.5
5. JDK 8
6. Eclipse Luna 4.4.1
7. Maven 3.0

Steps

  • Create a Maven project in Eclipse by selecting File -> New -> Maven Project. Name the project 'BatchApp'. The project structure of BatchApp will look like the figure given below.

    Figure: Maven Project Structure of the BatchApp
  • Modify the generated POM file as follows.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com</groupId>
    <artifactId>BatchApp</artifactId>
    <version>0.0.1-SNAPSHOT</version>
 
    <properties>
        <spring-framework.version>4.1.0.RELEASE</spring-framework.version>
        <spring.batch.version>3.0.1.RELEASE</spring.batch.version>
        <hibernate.version>4.3.6.Final</hibernate.version>
        <mysql.driver.version>5.1.25</mysql.driver.version>
    </properties>
   
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context</artifactId>
            <version>${spring-framework.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-orm</artifactId>
            <version>${spring-framework.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-core</artifactId>
            <version>${spring.batch.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>${mysql.driver.version}</version>
        </dependency>
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-entitymanager</artifactId>
            <version>${hibernate.version}</version>
        </dependency>
    </dependencies>   
   
</project>
  • Below is the job configuration file, job.xml, which is located in the '/main/resources/config' folder. This is the core file of any Spring Batch job. It describes what the job is, which steps it contains and which components (e.g., readers, processors, writers) are involved.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.1.xsd
                        http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-3.0.xsd">
   
    <import resource="context.xml" />
            
    <batch:job id="taxCalculator">
      <batch:step id="step1">
        <batch:tasklet>
            <batch:chunk reader="fileReader" processor="itemProcessor"
                            writer="dbWriter" commit-interval="5">
            </batch:chunk>
        </batch:tasklet>
      </batch:step>     
    </batch:job>

    <bean id="fileReader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <property name="resource" value="classpath:input/salary.csv" />
        <property name="lineMapper" ref="lineMapper"/>
    </bean>

    <bean id="lineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
        <property name="lineTokenizer">
            <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                <property name="names" value="id,name,salary" />
            </bean>
        </property>
        <property name="fieldSetMapper">
            <bean class="com.programnplay.batch.EmployeeMapper" />
        </property>
    </bean>

    <bean id="itemProcessor" class="com.programnplay.batch.TaxProcessor"/>
    <bean id="dbWriter" class="org.springframework.batch.item.database.HibernateItemWriter">
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>
   
</beans>


As you can see from the above file, the configured job is 'taxCalculator'. It consists of a single step that reads data in chunks through 'fileReader', processes each item through 'itemProcessor' and finally writes each chunk into the database through 'dbWriter'. The chunk size is 5, as specified by the 'commit-interval' attribute, so the transaction is committed after every 5 items.
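The chunk-oriented step configured above can be pictured as a simple loop. The plain-Java sketch below is a hypothetical model, not Spring Batch's implementation: the class ChunkLoopSketch and its runStep method are made up for illustration, and the "commit" is only a call to the writer. It shows how a commit-interval of 5 splits 12 input records into chunks of 5, 5 and 2, each handed to the writer as one group; in the real job, each such group is written by HibernateItemWriter and committed in its own transaction.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical plain-Java model of a chunk-oriented step. It is NOT the Spring Batch API;
// it only mirrors the read -> process -> write cycle that commit-interval controls.
public class ChunkLoopSketch {

    // Reads items one at a time, processes each one, and hands them to the writer in
    // groups of commitInterval. Returns the size of every chunk passed to the writer.
    public static <I, O> List<Integer> runStep(Iterator<I> reader,
                                               Function<I, O> processor,
                                               Consumer<List<O>> writer,
                                               int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        List<Integer> chunkSizes = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                writer.accept(chunk);       // Spring Batch would commit the chunk's transaction after this write
                chunkSizes.add(chunk.size());
                chunk = new ArrayList<>();  // start a fresh chunk; the writer keeps the old list
            }
        }
        if (!chunk.isEmpty()) {             // final, possibly smaller, chunk
            writer.accept(chunk);
            chunkSizes.add(chunk.size());
        }
        return chunkSizes;
    }

    public static void main(String[] args) {
        List<Integer> salaries = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            salaries.add(i * 100000);
        }
        List<Integer> sizes = ChunkLoopSketch.<Integer, Integer>runStep(
                salaries.iterator(),
                salary -> salary / 10,                            // stand-in "processor"
                chunk -> System.out.println("Writing " + chunk),  // stand-in "writer"
                5);
        System.out.println("Chunk sizes: " + sizes);  // prints Chunk sizes: [5, 5, 2]
    }
}
```

Grouping writes this way keeps each transaction reasonably small while avoiding a commit per record; a larger commit-interval roughly means fewer commits but more work rolled back if a chunk fails.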

The reader is a FlatFileItemReader, a ready-made component provided by Spring Batch to read data from a flat file. Its DefaultLineMapper splits each line with a DelimitedLineTokenizer (comma is the default delimiter) into the fields id, name and salary, which EmployeeMapper then turns into an Employee object. Similarly, we have used another Spring Batch component, HibernateItemWriter, to write data into the database using the Hibernate ORM framework. The flat file we read employee information from is salary.csv, which contains employee data in the following format. It is located in the '/main/resources/input' folder.

001,Rajiba,200000
002,Rahul,340000
003,Bikash,600000
004,Rama,220000
005,Hari,300000

............................
............................
The file context.xml, which is imported by job.xml, is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:jdbc="http://www.springframework.org/schema/jdbc"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.1.xsd
                        http://www.springframework.org/schema/jdbc http://www.springframework.org/schema/jdbc/spring-jdbc-4.1.xsd">

    <bean id="jobRepository" class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="transactionManager" ref="transactionManager" />
        <property name="databaseType" value="mysql" />
    </bean>

    <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
    </bean>
   
    <bean id="transactionManager" class="org.springframework.orm.hibernate4.HibernateTransactionManager" >
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>
   
    <bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="com.mysql.jdbc.Driver" />
        <property name="url" value="jdbc:mysql://localhost:3306/test" />
        <property name="username" value="abc" />
        <property name="password" value="abc" />
    </bean>
   
    <bean id="sessionFactory" class="org.springframework.orm.hibernate4.LocalSessionFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="annotatedClasses">
            <list>
                <value>com.programnplay.batch.Employee</value>
            </list>
        </property>
    </bean>

    <jdbc:initialize-database data-source="dataSource">
        <jdbc:script location="org/springframework/batch/core/schema-drop-mysql.sql" />
        <jdbc:script location="org/springframework/batch/core/schema-mysql.sql" />
    </jdbc:initialize-database>

</beans>

 
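This file defines the jobRepository, jobLauncher, transactionManager, dataSource and sessionFactory beans, which Spring Batch needs to launch a batch job and record its progress. Because jobRepository and transactionManager use the default bean names, the batch:job definition in job.xml picks them up without referencing them explicitly. The jdbc:initialize-database element runs Spring Batch's MySQL drop and create scripts on every start-up, so the batch metadata tables are recreated each time the application starts.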
  • Next we will create the Employee class, which stores the id, name, salary and tax of an individual employee. This class is a simple JPA entity with four attributes and the corresponding getter and setter methods; the @Column annotation maps the eid field to the table's id column.
package com.programnplay.batch;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Employee {
   
    @Id
    @Column(name="id")
    private int eid;
    private String name;
    private int salary;
    private int tax;
   
    public int getEid() {
        return eid;
    }
    public void setEid(int eid) {
        this.eid = eid;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public int getSalary() {
        return salary;
    }
    public void setSalary(int salary) {
        this.salary = salary;
    }
    public int getTax() {
        return tax;
    }
    public void setTax(int tax) {
        this.tax = tax;
    }
}

  • The EmployeeMapper class maps the fields of each flat-file line to the corresponding attributes of the Employee entity.
package com.programnplay.batch;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

import com.programnplay.batch.Employee;

public class EmployeeMapper implements FieldSetMapper<Employee> {

    public Employee mapFieldSet(FieldSet fieldSet) throws BindException {
        Employee emp = new Employee();
        emp.setEid(fieldSet.readInt(0));
        emp.setName(fieldSet.readString(1));
        emp.setSalary(fieldSet.readInt(2));
        return emp;
    }
}
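EmployeeMapper reads the tokens by position, so index 0 is the id, 1 the name and 2 the salary, matching the names "id,name,salary" given to the DelimitedLineTokenizer. The sketch below mimics this mapping in plain Java so it can run without Spring; the class SalaryLineParser, the Row stand-in for Employee and the strict three-field check are assumptions made for illustration, and the real DelimitedLineTokenizer does more (for example, quoted fields and configurable delimiters). It also highlights one detail of the sample data: the ids in salary.csv carry leading zeros, but reading them as integers turns 001 into 1, and that is the value stored in the id column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java illustration of what DelimitedLineTokenizer + EmployeeMapper do
// for one line of salary.csv. It uses no Spring Batch classes.
public class SalaryLineParser {

    // Minimal stand-in for the Employee entity: only the three fields read from the file.
    public static final class Row {
        public final int id;
        public final String name;
        public final int salary;

        public Row(int id, String name, int salary) {
            this.id = id;
            this.name = name;
            this.salary = salary;
        }

        @Override
        public String toString() {
            return id + "/" + name + "/" + salary;
        }
    }

    // Splits a comma-delimited line into id, name and salary, in the same order as the
    // tokenizer's names "id,name,salary", and rejects lines without exactly three fields.
    public static Row parse(String line) {
        String[] tokens = line.split(",", -1);   // limit -1 keeps trailing empty fields
        if (tokens.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 fields but found " + tokens.length + ": " + line);
        }
        int id = Integer.parseInt(tokens[0].trim());       // "001" is read as the number 1
        String name = tokens[1].trim();
        int salary = Integer.parseInt(tokens[2].trim());
        return new Row(id, name, salary);
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (String line : new String[] {"001,Rajiba,200000", "002,Rahul,340000"}) {
            rows.add(parse(line));
        }
        System.out.println(rows);   // prints [1/Rajiba/200000, 2/Rahul/340000]
    }
}
```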

  • Let us now write the processor class, TaxProcessor, which calculates the tax of an individual employee.
package com.programnplay.batch;

import org.springframework.batch.item.ItemProcessor;

import com.programnplay.batch.Employee;

public class TaxProcessor implements ItemProcessor<Employee, Employee> {

    public Employee process(Employee emp) throws Exception {
        int sal, tax;
        System.out.println("Calculating tax of..." + emp.getName());
        sal = emp.getSalary();
        System.out.println("Salary="+sal);
        if(sal > 0 && sal <= 300000)
            tax = (int)(0.1 * sal);
        else if(sal > 300000 && sal <= 500000)
            tax = (int)(0.2 * sal);
        else
            tax = (int)(0.3 * sal);       
        System.out.println("Setting tax of "+emp.getName()+" to "+tax);
        emp.setTax(tax);   
        return emp;
    }
}


This processor calculates tax with the following rule: if the salary is less than or equal to Rs 300000, the tax is 10% of the salary; if it is greater than Rs 300000 and less than or equal to Rs 500000, the tax is 20% of the salary; otherwise it is 30% of the salary. The chosen rate applies to the whole salary, not only to the part that falls within each band.

You can replace the logic in TaxProcessor with whatever processing your own batch job needs to do.
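As a worked example, the sketch below applies the tax rule described above to the five sample rows of salary.csv. It is a plain-Java illustration, not TaxProcessor itself: the class TaxRuleExample is hypothetical, and it uses integer arithmetic (salary * percent / 100) instead of multiplying by a double, which avoids floating-point rounding; for these sample salaries both approaches give the same result. Unlike TaxProcessor, it places zero or negative salaries in the 10% band, following the rule as stated in the text.

```java
// Hypothetical worked example of the tax rule, applied to the sample rows of salary.csv.
// Plain Java, no Spring dependencies.
public class TaxRuleExample {

    // Flat rate on the whole salary, chosen by band:
    // up to 300000 -> 10%, up to 500000 -> 20%, above -> 30%.
    // (Unlike TaxProcessor, non-positive salaries fall in the 10% band here.)
    public static int tax(int salary) {
        int percent;
        if (salary <= 300000) {
            percent = 10;
        } else if (salary <= 500000) {
            percent = 20;
        } else {
            percent = 30;
        }
        return (int) ((long) salary * percent / 100);   // long avoids overflow for large salaries
    }

    public static void main(String[] args) {
        String[] names = {"Rajiba", "Rahul", "Bikash", "Rama", "Hari"};
        int[] salaries = {200000, 340000, 600000, 220000, 300000};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": salary=" + salaries[i] + ", tax=" + tax(salaries[i]));
        }
    }
}
```

For the sample rows this gives Rajiba 20000, Rahul 68000, Bikash 180000, Rama 22000 and Hari 30000; Hari's salary sits exactly on the Rs 300000 boundary and is therefore taxed at 10%.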
  • The database table that corresponds to the Employee entity has the following structure, so create an Employee table with these columns in your database.
Field     Type          Null    Key
------    -----------   -----   -----
Id        int(3)        NO      PRI
Name      varchar(30)   YES
Salary    int(8)        YES
Tax       int(5)        YES

With these three Java classes, the job configuration file, a flat file to read data from and a database table to write data to, we have defined our Spring Batch job 'taxCalculator'. Next we have to run the batch job through a client or test it through JUnit.
  • Below is the client class that starts the batch job we have just created.
package com.programnplay;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class MainApp {

    public static void main(String[] args) {
        // Load job.xml (which imports context.xml) from the classpath
        ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext("config/job.xml");
        try {
            JobLauncher jobLauncher = context.getBean("jobLauncher", JobLauncher.class);
            Job job = context.getBean("taxCalculator", Job.class);

            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Exit Status : " + execution.getStatus());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the context so singleton beans such as the SessionFactory shut down cleanly
            context.close();
        }
        System.out.println("Done");
    }
}
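MainApp launches the job with an empty JobParameters object. Spring Batch identifies a job instance by the job name plus its identifying parameters and refuses to run an instance that has already completed, throwing JobInstanceAlreadyCompleteException. This setup can run again with empty parameters only because context.xml drops and recreates the metadata tables on every start-up. The toy model below, a hypothetical JobInstanceModel class in plain Java, illustrates that rule; the parameter name run.time is an assumption. In real code, a changing parameter is typically added with JobParametersBuilder, for example `new JobParametersBuilder().addLong("run.time", System.currentTimeMillis()).toJobParameters()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of how a job repository decides whether a job may run again.
// NOT Spring Batch code: the real repository keeps this state in the BATCH_* tables and
// throws JobInstanceAlreadyCompleteException where this sketch throws IllegalStateException.
public class JobInstanceModel {

    private final Map<String, String> statusByInstance = new HashMap<>();

    // A job instance is identified by the job name plus its identifying parameters.
    private static String instanceKey(String jobName, Map<String, String> params) {
        // TreeMap makes the key independent of parameter insertion order
        return jobName + new TreeMap<String, String>(params);
    }

    public void run(String jobName, Map<String, String> params) {
        String key = instanceKey(jobName, params);
        if ("COMPLETED".equals(statusByInstance.get(key))) {
            throw new IllegalStateException("Job instance already complete: " + key);
        }
        // ... the job's steps would execute here ...
        statusByInstance.put(key, "COMPLETED");
    }

    public static void main(String[] args) {
        JobInstanceModel repository = new JobInstanceModel();
        repository.run("taxCalculator", new HashMap<String, String>());    // first run: allowed

        Map<String, String> withTimestamp = new HashMap<>();
        withTimestamp.put("run.time", String.valueOf(System.currentTimeMillis()));
        repository.run("taxCalculator", withTimestamp);                    // new parameters: new instance

        try {
            repository.run("taxCalculator", new HashMap<String, String>()); // same empty parameters again
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```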

     
If we run the MainApp client, we get a log like the following in the console. Make sure it displays the Exit Status as COMPLETED; otherwise, something has gone wrong in your batch application.

Figure: Sample Running Log of BatchApp
  • Our database is now updated with the employee info, including the tax each individual has to pay. A sample screenshot of the Employee table is given below.
     
Figure: MySQL Table with updated Tax Info

That is all about developing a simple Spring Batch job. Depending on the complexity, the processing logic and the type or source of the data we have to read and write, a batch job may need more steps and more ready-to-use components. Just imagine how much code you would have to write, how many I/O operations you would have to handle and how much time you would have to spend on even this simple batch job if there were no Spring Batch!

If you are looking for how to scale up a batch job when there are thousands of records divided into multiple files, read my post here. Please do leave your feedback in the comment box below. Catch you here very soon.

          3 comments:

          1. Nice article! Have you published the code? That would be very nice!

            1. Thanks a lot, Alexander, for your comment. I am not sure what you mean by 'publishing code'. All the code is there in the post, so you can easily refer to it and develop your own application.
