How to scan a Java project to detect annotations and execute them when the web application starts.

Java Annotation

Java annotations are an important feature introduced in Java 1.5. (You can see a Hello World example with Java annotations here.) They make configuration much easier and let you execute code at runtime. All the major frameworks, from Spring to Hibernate, use annotations extensively to bootstrap themselves. In this article we will see how to create a custom annotation and execute it dynamically at runtime using Java bytecode analysis.

Why do we need a bytecode analyzer?

Loading each and every class into memory with a ClassLoader just to inspect it is an expensive task. The better way is to use a Java bytecode analyzer: instead of loading the class, the analyzer reads the generated bytecode directly, which is faster and consumes fewer resources. In this example we will use Javassist; other bytecode analyzers such as ASM and Apache Commons BCEL are also available.
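
For example, Javassist can parse a compiled .class file straight from disk and report its runtime-visible annotations without the class ever being loaded into the JVM. The snippet below is a minimal sketch of that idea for class-level annotations (Step 5 later does the same thing for method-level ones); the class name and file path are only placeholders.

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;

import javassist.bytecode.AnnotationsAttribute;
import javassist.bytecode.ClassFile;
import javassist.bytecode.annotation.Annotation;

public class BytecodePeek {
    public static void main(String[] args) throws Exception {
        // Parse the raw bytecode of a compiled class; nothing is loaded into the JVM.
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("/path/to/SomeClass.class")));
        ClassFile cf = new ClassFile(in);
        in.close();

        AnnotationsAttribute visible =
                (AnnotationsAttribute) cf.getAttribute(AnnotationsAttribute.visibleTag);
        if (visible != null) {
            for (Annotation ann : visible.getAnnotations()) {
                System.out.println(cf.getName() + " is annotated with " + ann.getTypeName());
            }
        }
    }
}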

Step 1:

Add the Javassist dependency to your Maven configuration.

<dependency>
    <groupId>javassist</groupId>
    <artifactId>javassist</artifactId>
    <version>3.12.1.GA</version>
</dependency>
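
Note that the 3.12.x releases are published under the plain javassist groupId shown above. If you prefer a newer Javassist, recent releases live under the org.javassist groupId instead, so the coordinates would look roughly like this (the version shown is only an example):

<dependency>
    <groupId>org.javassist</groupId>
    <artifactId>javassist</artifactId>
    <version>3.18.2-GA</version>
</dependency>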

Step 2:

Add a simple annotation

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface initCache {
}

Step 3:

Use this annotation on any method, for example:


public class Test{
    @initCache
    public void testMe() {
        System.out.println("annotation working fine");
    }
}
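
Before wiring up any scanning, a quick sanity check with plain reflection confirms that the RUNTIME retention is doing its job. This is only a sketch, assuming the Test class and initCache annotation above sit in the same package (or are imported):

import java.lang.reflect.Method;

public class AnnotationCheck {
    public static void main(String[] args) throws Exception {
        Method m = Test.class.getMethod("testMe");
        // Prints true only because initCache is retained at RUNTIME.
        System.out.println("Annotated? " + m.isAnnotationPresent(initCache.class));
    }
}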

Step 4:

Register a ServletContextListener in web.xml.

<listener>
    <listener-class>
        org.test.init.context.ProjectInitializer
    </listener-class>
</listener>

Step 5:

Write the implementation in ProjectInitializer.java.

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

import javax.servlet.ServletContext;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import javassist.bytecode.AnnotationsAttribute;
import javassist.bytecode.ClassFile;
import javassist.bytecode.MethodInfo;
import javassist.bytecode.annotation.Annotation;

public class ProjectInitializer implements ServletContextListener {

    // Fully qualified name of the annotation from Step 2; adjust the package
    // to match where initCache lives in your project.
    private static final String CACHE_ANNOTATION_MARKER = "org.test.init.annotation.initCache";

    ServletContext context;

    @Override
    public void contextInitialized(ServletContextEvent servletContextEvent) {
        context = servletContextEvent.getServletContext();
        List<String> urlsList = new ArrayList<String>();
        Set<String> libJars = context.getResourcePaths("/WEB-INF/lib");

        if (libJars != null) {
            for (String jar : libJars) {
                try {
                    // Convert the context-relative path to an absolute file system path.
                    urlsList.add(context.getRealPath(jar));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }

        execute(urlsList);
    }

    @Override
    public void contextDestroyed(ServletContextEvent servletContextEvent) {
    }

    public static void execute(List<String> urlList) {
        try {
            for (String url : urlList) {
                JarFile jarFile = new JarFile(url);
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    JarEntry entry = entries.nextElement();
                    if (entry.getName().endsWith(".class")) {
                        InputStream is = jarFile.getInputStream(entry);
                        DataInputStream dstream = new DataInputStream(new BufferedInputStream(is));
                        // Parse the raw bytecode; the class is not loaded yet.
                        ClassFile cf = new ClassFile(dstream);
                        List<MethodInfo> methodList = cf.getMethods();
                        for (MethodInfo method : methodList) {
                            AnnotationsAttribute visible =
                                    (AnnotationsAttribute) method.getAttribute(AnnotationsAttribute.visibleTag);
                            if (visible != null) {
                                for (Annotation ann : visible.getAnnotations()) {
                                    if (ann.getTypeName().equals(CACHE_ANNOTATION_MARKER)) {
                                        // Only now load the class and invoke the annotated method.
                                        Class<?> c = Class.forName(cf.getName());
                                        Object obj = c.newInstance();
                                        Method m = c.getDeclaredMethod(method.getName());
                                        m.invoke(obj);
                                    }
                                }
                            }
                        }
                        dstream.close();
                    }
                }
                jarFile.close();
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This will scan every jar file in your WEB-INF/lib folder and invoke each method that carries the annotation.
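
Since execute() only needs a list of jar paths, you can also exercise the scan outside the servlet container, for example from a plain main() during development. This is just a sketch; the jar path below is a placeholder:

import java.util.Arrays;

public class ScanRunner {
    public static void main(String[] args) {
        // Point this at any jar that contains classes annotated with initCache.
        ProjectInitializer.execute(Arrays.asList("/path/to/your-app.jar"));
    }
}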


Hadoop vs RDBMS: where Hadoop scores over RDBMS

Hadoop vs RDBMS

The emergence of big data:

In the Web 2.0 era, the amount of data being generated has grown past petabytes. Programmers and business analysts want to analyze these large volumes of data to drive the business, because data is key for any business.

There are two important characteristics of big data that create the challenges.

  • Storing data fail-safe
  • Processing the data faster

1. Storing data fail-safe

Over the years the storage capacity of a single disk has increased considerably; a 1 TB hard disk is now quite normal. But the speed at which data can be read from a disk has not kept up with the growth in capacity. On average we get only about 100 MB/s, so reading all the data off a 1 TB disk takes roughly 10,000 seconds, which is close to three hours.

How to speed up reads: parallel access

One way to improve the read process is to read the data from multiple disks in parallel, so the overall computation finishes faster. The drawback of this approach is that we may end up with more disks than the actual data requires, but growing disk capacities and falling prices have made this approach affordable.

What about hardware failure?

Hardware failure is inevitable, so there must be a way to duplicate data across different storage systems: if one system fails, another picks up. We therefore need an effective distributed file system, such as HDFS, to store the data.

2. Processing the data faster:

The analysis often needs to combine data from different nodes for computations such as sorting and merging, so we need an effective programming model like MapReduce, which Hadoop is built on. The ability to process unstructured data and the slowness of disk seeks are the biggest challenges in computing over big data.
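
To make the programming model concrete, here is a minimal sketch of the classic word-count job written against Hadoop's mapreduce API: the mapper emits a (word, 1) pair for every word, and the reducer receives the sorted, merged output and sums the counts. The class names are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: runs in parallel on every split of the input data.
    public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: the framework has already sorted and merged the mapper
    // output by key, so each call sees one word with all of its counts.
    public static class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}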

Seek time:

Seek time is the time taken to move the disk head to the particular place on the disk where data is read or written, and data access largely depends on it. The B-Tree structures that a traditional RDBMS uses are good for updating and selecting individual records, but they are not as efficient as MapReduce at sorting and merging large data sets. Batch workloads, which mostly write once and read often, suit MapReduce, whereas a relational database fits data that is continuously updated.

Processing semi-structured data:

An RDBMS is a good fit when your data is organized in a structured way, such as tables, because the whole data model is built around the relationships between the data. Semi-structured data, like a spreadsheet, is organized as rows and cells, but each row and cell can hold any kind of data, and unstructured data such as image files or PDFs will not fit into a relational database at all. MapReduce works well with unstructured and semi-structured data because it interprets the data at processing time, unlike an RDBMS, which enforces structure at storage time (with constraints and data types).

Normalization:

An RDBMS is often normalized to reduce duplication, whereas distributed data processing is built on top of duplicating data across different nodes. Duplication is required so that even if one node goes down the data is not lost and the computation can continue undisturbed. Hadoop's HDFS file system and the MapReduce model are built exactly for this.
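
For instance, HDFS expresses this duplication as a per-block replication factor; a typical hdfs-site.xml entry (3 is the common default) looks like this:

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>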

Linear Scalability:

With MapReduce, if you increase the amount of input data while keeping the cluster the same, processing takes longer; if you increase the number of nodes in the cluster, processing gets faster. In other words, throughput scales roughly linearly with cluster size: double the data and double the cluster, and the job takes about the same time. This is not generally true of SQL queries running on a single database.

Recommendation:

An RDBMS is a good choice when you have gigabytes of structured data that is read and written often and needs high integrity.
Hadoop is a good choice when you have petabytes of semi-structured or unstructured data (though it handles structured data too) that is mostly written once and read many times, and that needs to be processed in batch mode with linear scaling and lower integrity requirements. People are now starting to use Hadoop for real-time analytics as well.