Tuesday, March 12, 2013

Polymorphism of Apache Pig's Eval Function

A programming language doesn't strictly need polymorphism, but polymorphism makes code cleaner and more elegant. At the very least, it saves a few lines of code.
In Apache Pig, an Eval Function is a class that extends EvalFunc rather than a plain method, so we can't simply overload it the way we would overload an ordinary Java method: Pig always calls exec(Tuple). However, the designers of EvalFunc left two back doors:
1. The input of EvalFunc is a tuple, which makes Input Polymorphism possible.
2. EvalFunc is generic, which makes Output Polymorphism possible.

Input Polymorphism

Input Polymorphism refers to variation in the types of the input.
Because the input of an EvalFunc is a tuple, and each element of a tuple is an Object, you can put any object into a tuple and pass it to the EvalFunc.
For example:

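import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Adds two fields; each may be a chararray (String) or any numeric type.
public class Add extends EvalFunc<Double> {

    @Override
    public Double exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null;
        }
        Double da = toDouble(input.get(0));
        Double db = toDouble(input.get(1));
        if (da == null || db == null) {
            return null; // a null operand gives a null result, like Pig's arithmetic
        }
        return da + db;
    }

    // Strings are parsed; Integer, Long, Float and Double are widened.
    // Any other type (e.g. an untyped bytearray) throws ClassCastException.
    private static Double toDouble(Object o) {
        if (o == null) {
            return null;
        }
        if (o instanceof String) {
            return Double.parseDouble((String) o);
        }
        return ((Number) o).doubleValue();
    }
}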
In this example, Add parses any string argument into a double, so it can add two strings, or a string and a number.
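To try the conversion logic without a Pig installation, here is a minimal standalone sketch. It assumes a java.util.List<Object> as a stand-in for Pig's Tuple, and the class name MixedAddSketch is made up for illustration. Pig hands a chararray field to a UDF as java.lang.String and int, long, float and double fields as Integer, Long, Float and Double, all of which are java.lang.Number, so one String check plus one Number cast covers the typed cases. Untyped fields arrive as DataByteArray, which this sketch does not handle.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch: List<Object> stands in for Pig's Tuple,
// so the logic runs without the Pig jar on the classpath.
class MixedAddSketch {

    // Strings are parsed; any Number (Integer, Long, Float, Double) is widened.
    // Other types throw ClassCastException.
    static Double toDouble(Object o) {
        if (o == null) {
            return null;
        }
        if (o instanceof String) {
            return Double.parseDouble((String) o);
        }
        return ((Number) o).doubleValue();
    }

    // Mirrors Add.exec: a null field gives a null result.
    static Double add(List<Object> fields) {
        Double a = toDouble(fields.get(0));
        Double b = toDouble(fields.get(1));
        if (a == null || b == null) {
            return null;
        }
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(Arrays.asList("1.5", "2")));  // 3.5  (string + string)
        System.out.println(add(Arrays.asList("1.5", 2)));    // 3.5  (string + int)
        System.out.println(add(Arrays.asList(1.5, 2L)));     // 3.5  (double + long)
        System.out.println(add(Arrays.asList("1.5", null))); // null
    }
}
```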

Output Polymorphism

Output Polymorphism refers to variation in the type of the output.
Usually you specify the output type of an Eval Function through EvalFunc's type parameter; in the example above, the return type is Double. If you want the return type to vary, you can simply use Object as the return type.
For example:

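import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Returns the first field if it is not null, otherwise the second.
// The result keeps whatever type the chosen field has.
public class AorB extends EvalFunc<Object> {

    @Override
    public Object exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2) {
            return null;
        }
        Object a = input.get(0);
        Object b = input.get(1);
        return (a != null) ? a : b;
    }
}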
In the example above, AorB returns a if a is not null, and b otherwise, so the type of the result is whatever type the returned field has.
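A quick way to see the output type varying is a standalone sketch of the same logic. As before, java.util.List<Object> is an assumed stand-in for Pig's Tuple, and AorBSketch is a hypothetical name; the point is only that one Object-returning method hands back a String, an Integer, or a Double depending on the data.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch of AorB's logic; List<Object> stands in for Pig's Tuple.
class AorBSketch {

    // First field if not null, otherwise the second.
    // The declared type is Object, so the runtime type follows the data.
    static Object aOrB(List<Object> fields) {
        Object a = fields.get(0);
        return (a != null) ? a : fields.get(1);
    }

    // Prints the runtime class of the result next to its value.
    static void print(Object r) {
        System.out.println(r.getClass().getSimpleName() + " " + r);
    }

    public static void main(String[] args) {
        print(aOrB(Arrays.asList("x", 1)));    // String x
        print(aOrB(Arrays.asList(null, 42)));  // Integer 42
        print(aOrB(Arrays.asList(null, 2.5))); // Double 2.5
    }
}
```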

Of course, combining input and output polymorphism makes Eval Functions even more flexible and powerful.
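As one illustration of combining the two, consider a hypothetical "Plus" function (not part of Pig) that concatenates two strings but adds anything numeric. It accepts Strings or Numbers on the way in and returns either a String or a Double on the way out. The sketch below again uses java.util.List<Object> in place of Tuple so it runs standalone; in a real UDF, the body of plus would live in exec(Tuple) of a class extending EvalFunc<Object>.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical "Plus" logic, sketched without Pig: List<Object> stands in for Tuple.
// Input polymorphism: each field may be a String or a Number.
// Output polymorphism: two Strings give a String; anything else gives a Double.
class PlusSketch {

    static Object plus(List<Object> fields) {
        Object a = fields.get(0);
        Object b = fields.get(1);
        if (a == null || b == null) {
            return null;
        }
        if (a instanceof String && b instanceof String) {
            return (String) a + (String) b; // concatenate
        }
        return toDouble(a) + toDouble(b); // numeric sum, boxed to Double
    }

    // Parses Strings and widens Numbers; other types throw ClassCastException.
    static double toDouble(Object o) {
        if (o instanceof String) {
            return Double.parseDouble((String) o);
        }
        return ((Number) o).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(plus(Arrays.asList("ab", "cd"))); // abcd
        System.out.println(plus(Arrays.asList("1.5", 2)));   // 3.5
        System.out.println(plus(Arrays.asList(1, 2L)));      // 3.0
    }
}
```

Note the design choice: two numeric strings such as "1" and "2" are concatenated to "12" rather than added, because the String-and-String check comes first.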



Friday, March 8, 2013

R is interesting and powerful

I'm currently taking Jeff Leek's online course Data Analysis, which introduces the basic ideas of data analysis. The best part is that this course finally got me to start using R.
At work I sometimes need to draw graphs. I usually use Excel for simple line, bar, or pie charts, and Desmos to explore functions. I have also used MATLAB to process data.
I heard about R long ago, but I had no motivation to learn something new just for its own sake, until the data analysis course led me to explore it. Its syntax is similar to JavaScript's, which makes it easier for me to pick up.
I think the main reason I like using R is RStudio, which makes it a real rival to MATLAB.