Introduction to data mining – Types of data mining techniques

Data mining techniques can be broadly classified as:

    1. Predictive:

  • a. Classification
  • b. Regression
  • c. Time Series Analysis
  • d. Prediction

    2. Descriptive:

  • a. Clustering
  • b. Summarization
  • c. Association Rules
  • d. Sequence Discovery
    You can refer to the chart below.

    [Figure: Data mining techniques graph]

    Now I will give a brief introduction to each type.

    Classification:
    It is often referred to as “supervised learning”. It has a predefined set of groups or models, and we predict values based on them.

    (e.g.) Airport security maintains a set of metrics and uses them to try to identify a potential terrorist.
    Regression:
    Regression uses a known functional form, such as linear or logistic, and assumes that future data will follow the same form. It then tries to predict a value by fitting that function to the data set.
    (e.g.) Investing in a pension fund. You calculate your annual income and try to predict what you will need after you retire. Then, based on the present income and the needed income, you make an investment decision. The prediction is done with a simple regression formula that is revised every year.
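The regression idea above can be sketched in a few lines. This is only a minimal illustration of a least-squares straight-line fit; the income figures and the names fit_line and predict are made up for the example and are not taken from any real pension calculation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical data: year index and annual income (in thousands).
years = [1, 2, 3, 4, 5]
income = [40.0, 42.0, 44.0, 46.0, 48.0]

slope, intercept = fit_line(years, income)
projected = predict(slope, intercept, 10)  # projected income in year 10
print(slope, intercept, projected)
```

Refitting the line each year with the newest figure corresponds to the yearly revision mentioned in the example.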
    Time series Analysis:
    In time series analysis, each attribute value is observed at successive time intervals.
    (e.g.) Buying a company's stock. Take the month-by-month performance of companies X, Y and Z, try to predict their growth over the next year, and buy stocks based on that growth.
    Prediction:
    Prediction is related to time series analysis but is not time bound. It is used to predict a value based on past and current data.
    (e.g.) The water flow of a river is measured by various monitors at different levels and at different time intervals. That information is then used to predict the future water flow.
    Clustering:
    It is widely known as unsupervised learning. It is similar to classification, except that it has no predefined groups; instead, the data itself defines the groups.
    (e.g.) Consider a supermarket that has purchase details such as age, job and purchase amount. We can group customers by age or by job and look at each group's percentage of purchases, to make a meaningful business decision about which customer group to target.
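As a rough sketch of how the data itself can define the groups, here is a tiny one-dimensional k-means on customer ages. K-means is just one common clustering algorithm, chosen here for illustration; the ages, the starting centers and the name kmeans_1d are all assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small 1-D k-means: assign each value to its nearest center,
    then move each center to the mean of its group."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its old center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical customer ages and two arbitrary starting centers.
ages = [21, 23, 25, 47, 50, 53]
centers, groups = kmeans_1d(ages, [20.0, 60.0])
print(centers, groups)
```

No age bands were defined in advance; the two groups fall out of the data.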
    Summarization:
    Summarization associates a sample subset of the data with a short description or snippet.
    Association Rules:
    It is also called link analysis. It is all about uncovering relationships among data.
    (e.g.) Amazon's “People who bought this also bought this” model.
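A minimal sketch of the “also bought” idea, assuming each purchase is a simple list of item names. The baskets and the function name also_bought are hypothetical, and this is plain co-occurrence counting rather than a description of how Amazon does it.

```python
from collections import Counter


def also_bought(baskets, item):
    """For baskets containing `item`, return (other_item, confidence) pairs,
    where confidence is the share of those baskets that also hold other_item."""
    with_item = [set(b) for b in baskets if item in b]
    if not with_item:
        return []
    counts = Counter()
    for basket in with_item:
        for other in basket - {item}:
            counts[other] += 1
    total = len(with_item)
    # Highest confidence first; ties broken alphabetically.
    return sorted(((other, n / total) for other, n in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))


# Hypothetical purchase baskets.
baskets = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "lamp"],
    ["pen"],
]
suggestions = also_bought(baskets, "book")
print(suggestions)
```

Here every basket with a book also contains a lamp, so the lamp is suggested first.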
    Sequence Discovery:
    Sequence discovery is about finding the order in which activities occur.
    (e.g.) In a shop, people often buy toothpaste after buying a toothbrush. It is all about the sequence in which users buy products; based on that, the shop owner can arrange those items near each other.
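The toothbrush-then-toothpaste pattern can be found by counting which item directly follows which in each customer's purchase history. This is only a sketch; the histories and the name follow_counts are assumptions made for the example.

```python
from collections import Counter


def follow_counts(histories):
    """Count ordered pairs (earlier, later) of consecutive purchases."""
    counts = Counter()
    for history in histories:
        for earlier, later in zip(history, history[1:]):
            counts[(earlier, later)] += 1
    return counts


# Hypothetical purchase histories, each in the order the items were bought.
histories = [
    ["toothbrush", "toothpaste", "soap"],
    ["toothbrush", "toothpaste"],
    ["soap", "toothbrush", "toothpaste"],
]
counts = follow_counts(histories)
most_common_pair, times = counts.most_common(1)[0]
print(most_common_pair, times)
```

Unlike the association example, order matters here: (toothbrush, toothpaste) and (toothpaste, toothbrush) are counted separately.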

how to view multiple log files in a single terminal

We often run many programs on multiple machines and want to watch their log files in a single terminal to monitor all the logging at once. This is easy to do with a tool called multitail on Ubuntu.

First install multitail by issuing sudo apt-get install multitail. Then create a shell script like multitail -l "ssh -i username@SERVER-1 tail -f " -l "ssh -i @SERVER-2 tail -f ". Now change the file permission with
sudo chmod -R 745 to make sure the script is runnable.

Now you can monitor many log files from your desktop.

hello world jruby

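require 'java'
# java_import replaces include_class, which newer JRuby versions no longer provide
java_import 'java.util.ArrayList'

class Base
  def getname()
    return @name
  end

  def setname(name)
    @name = name
  end

  def getPassword()
    return @password
  end

  def setPassword(password)
    @password = password
  end

  # Fills a Java ArrayList with Base objects and prints each one
  def printTest()
    puts "ok printed on base"
    list = ArrayList.new
    for i in 0..2
      base = Base.new
      base.setname("ananth" + i.to_s)
      base.setPassword("password" + i.to_s)
      list.add(base)
    end
    list.each do |v|
      puts "name #{v.getname()}"
      puts "password: #{v.getPassword()}"
    end
  end
end

class RubyObj < Base
  def print()
    puts "ok printed"
  end
end

rubyobj = RubyObj.new
# Assumed calls: the original listing is cut off after the assignment
rubyobj.print()
rubyobj.printTest()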

Python script to calculate how similar two people's tastes are, based on their ratings, using a Euclidean distance score

from math import sqrt

# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Get the list of shared_items
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1

    # If they have no ratings in common, return 0
    if len(si) == 0:
        return 0

    # Add up the squares of all the differences
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in prefs[person1] if item in prefs[person2]])

    # Identical ratings give 1; the score approaches 0 as the ratings diverge
    return 1.0 / (1 + sum_of_squares)

how to get ideas for your killer products!!!

Everyone wants to build something special. We want the pride of being listed on popular blogs like TechCrunch and of being known for our passion. But how can I really design a killer product? Where can I get the idea? Do I need to be a tech giant? Do I need to be a marketing geek? Was Zuckerberg a marketing genius? Was Jobs a tech geek? Where did they get all these ideas from?

To answer that, you need to understand what kind of products stand out in business. Take products like Groupon, the iPod, Facebook and even Google. They all serve a basic human need and make people's lives better. Your product should be one that makes people's lives a lot easier and happier. Eventually they will come to live with the product. Can you live without searching for something on Google, or checking what your faraway friends are doing on Facebook?

So how can we design a product like that? The idea is right around you. Every moment of our lives we are consuming information. Whenever you talk to people or see something, you are consuming information. Our brain is a bigger consumer of information than any device that exists in this world, and it also has very deep storage. If you listen to and analyze the information you are consuming, and ask whether things could be done better with the technology around us, that is what makes a killer product.

I realized this when I thought of developing an auto sharing mobile app. I had to get back home, which was almost 10 miles from where I was standing, and the fare was a bit expensive. I was pretty sure there was a huge crowd waiting for a bus to go near my place. I kept thinking how good it would be if someone going that way, or at least halfway, could share the fare with me. But how could I ask the huge crowd over there? Wouldn't it be nice to have an app that checks my route, suggests someone to share the ride with, and creates a quick network of people?

Coming up with an idea for a product is simple. Listen to what you are consuming, do some mining on it, examine the technology available around you, and there it is: the idea for your next killer product is ready.

what’s on java object creation-static factory method

Many of you are aware of the static factory method, which is simply a static method that returns an instance of the class. An example is public static Boolean valueOf(boolean yourVal). Now let me explain some of the advantages of using a static factory method.

Some of the advantages:

1. Whenever we invoke a constructor, as in Emp emp = new Emp(), it creates a new object. A static factory method such as Emp emp = Emp.getInstance() (where getInstance() is a static method whose return type is Emp) is not required to create a new object; it can return the same Emp object no matter how many times we call it.

2. We can create any number of static methods with different names, like getCar() returning a car object and getBreak() returning a break object. Constructors, however, must have the class name as their name, so we have to differentiate them by their parameters (overloading), which works but hurts readability.

So all good, but what about the disadvantages? Yes, there is one main disadvantage:

Since instances are obtained through static methods, the constructor is usually private, so we can't really extend the class. Some of the object-oriented features, such as inheritance, might be lost.

Product Development 2.0

Hmmm.. yes, 2.0 is a buzzword around the internet world. People have started to do web 2.0, gov 2.0, education 2.0, etc. So why not PD 2.0?

These are my opinions about product development. We can define product development as marrying available technologies to market needs. That gives us two parts:

1. Available Technologies – which your software developers know about.
2. Market Needs – which your sales team and customer care team know about.

As a developer who has been involved in lots of startups and product development companies, I can clearly see that there is no coordination or combined development strategy between these departments. Developers don't know the market need for their product, and salespeople don't know what technologies are available. At the end of the day, the product suffers.

A market-winning product can only come out of a team that is strong in technology and good at sales. If either one lags, you are out of business. Now how can we build such a team of developers and salespeople? Do you want to seat them together? Have them share lunch and beer? That won't do any good at all. How can we use technology?

Here you go. A Twitter-like tool is a perfect interaction tool for coordinating with your team. It is a simple way of expressing yourself. From a person's posts, one can easily understand what kind of person they are and how they think. It is about knowing each other's thoughts; like-minded people naturally come closer together, and who knows, they may be your steve jobs & whizz.

Another way of expressing yourself is a blog, where a salesperson can write about their sales experience, ideas and daily life, and a developer can blog about new findings and technologies. All of a sudden you will see matches between ideas and technology. That way developers get to know the strengths of their sales team and vice versa.

Most developers don't even know their company's strategy: how it is performing, what its strengths are, where it is lagging and who its competitors are. This is where blogging and tweeting from top-level management (such as the CEO) is so important. A team with a single goal in mind, a team that knows its strengths and weaknesses, can go miles.

This is how I strongly believe product development should work. Companies should set up intra-company blogging and tweeting tools (not the public blogs and Twitter) and encourage their people to use them.