Tuesday, March 12, 2013

Polymorphism of Apache Pig's Eval Function

Polymorphism is not strictly necessary in a programming language, but it makes code cleaner and usually saves a few lines.
In Apache Pig, an Eval Function is a class that extends EvalFunc rather than a plain method, so we can't use Java's method overloading to give one function several signatures. But the EvalFunc design leaves two back doors open:
1. The input of EvalFunc is a Tuple, which makes Input Polymorphism possible.
2. EvalFunc is generic, which makes Output Polymorphism possible.

Input Polymorphism

Input Polymorphism refers to variation in the input types.
The input of EvalFunc is a Tuple whose fields are plain Objects, so you can put a value of any type into the tuple and pass it to the EvalFunc.
For example:

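import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Add extends EvalFunc<Double> {

    @Override
    public Double exec(Tuple input) throws IOException {
        Object a = input.get(0);
        Object b = input.get(1);
        double da, db;
        // Strings are parsed; any other field is assumed to be numeric.
        if (a instanceof String) {
            da = Double.parseDouble((String) a);
        } else {
            da = ((Number) a).doubleValue();
        }

        if (b instanceof String) {
            db = Double.parseDouble((String) b);
        } else {
            db = ((Number) b).doubleValue();
        }
        return da + db;
    }
}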
In this example, the Add function parses string fields into doubles, so adding two strings, or a string and a double, works.
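The type check in Add is repeated for each field, and it does not say what should happen for a null or a non-numeric field. One way to tidy that up is a small helper, sketched here in plain Java so it runs without Pig on the classpath. The names FieldCoercion and toDouble are made up for this post and are not part of the Pig API, and returning null for a null field is an assumption about the behaviour you want.

```java
// Plain-Java sketch: no Pig classes are needed to show the coercion itself.
// FieldCoercion and toDouble are hypothetical names, not part of the Pig API.
final class FieldCoercion {

    private FieldCoercion() {
    }

    /** Converts a tuple field to a Double, or returns null when the field is null. */
    static Double toDouble(Object field) {
        if (field == null) {
            // Assumption: a null field stays null instead of raising an error.
            return null;
        }
        if (field instanceof Number) {
            // Covers Integer, Long, Float and Double fields alike.
            return ((Number) field).doubleValue();
        }
        if (field instanceof String) {
            // Throws NumberFormatException if the text is not a number.
            return Double.parseDouble((String) field);
        }
        throw new IllegalArgumentException(
                "Cannot convert " + field.getClass().getName() + " to double");
    }
}
```

Inside exec, each branch would then shrink to a call such as FieldCoercion.toDouble(input.get(0)), with a null check before adding the two values.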

Output Polymorphism

Output Polymorphism refers to variation in the output type.
Usually you have to declare the output type of an Eval Function; in the Add example above, it is Double. But if you want the return type to vary, you can declare Object as the return type.
For example:

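import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class AorB extends EvalFunc<Object> {

    @Override
    public Object exec(Tuple input) throws IOException {
        Object a = input.get(0);
        Object b = input.get(1);
        // Return the first field unless it is null.
        if (a != null) {
            return a;
        } else {
            return b;
        }
    }
}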
In the example above, AorB returns a if a is not null, and b otherwise.
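The price of returning Object is that the caller only learns the concrete type at runtime. The sketch below mirrors AorB in plain Java, with a List<Object> standing in for the Pig Tuple so it can run without Pig. FirstNonNull, pick and describe are hypothetical names used only for illustration.

```java
import java.util.List;

// Plain-Java stand-in for AorB: a List<Object> plays the role of the Pig Tuple.
// FirstNonNull, pick and describe are hypothetical names for this sketch.
final class FirstNonNull {

    private FirstNonNull() {
    }

    /** Returns the first non-null field, or null if every field is null. */
    static Object pick(List<Object> fields) {
        for (Object field : fields) {
            if (field != null) {
                return field;
            }
        }
        return null;
    }

    /** Names the runtime type of a result, which is all the caller can rely on. */
    static String describe(Object value) {
        return value == null ? "null" : value.getClass().getSimpleName();
    }
}
```

For instance, pick applied to the fields (null, "b") gives back a String, while the fields (1, "b") give back an Integer, so the same function has two different output types depending on its input.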

Of course, combining input and output polymorphism makes an Eval Function even more flexible and powerful.
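As a rough illustration of the combination, here is a plain-Java sketch of an add that accepts strings or numbers and returns a Long when both inputs are whole numbers and a Double otherwise. It is not Pig code: FlexibleAdd and its methods are hypothetical names, and the rule for choosing Long over Double is just one possible design.

```java
// Plain-Java sketch combining both ideas; not part of the Pig API.
// FlexibleAdd, add, toNumber and isIntegral are hypothetical names.
final class FlexibleAdd {

    private FlexibleAdd() {
    }

    /** Adds two fields; the result is a Long or a Double depending on the inputs. */
    static Object add(Object a, Object b) {
        Number x = toNumber(a);
        Number y = toNumber(b);
        if (isIntegral(x) && isIntegral(y)) {
            return x.longValue() + y.longValue();   // boxed to Long
        }
        return x.doubleValue() + y.doubleValue();   // boxed to Double
    }

    /** Input polymorphism: accept a Number as-is, or parse a String. */
    private static Number toNumber(Object field) {
        if (field instanceof Number) {
            return (Number) field;
        }
        if (field instanceof String) {
            String text = ((String) field).trim();
            try {
                return Long.parseLong(text);
            } catch (NumberFormatException e) {
                // Not a whole number, so fall back to a double.
                return Double.parseDouble(text);
            }
        }
        throw new IllegalArgumentException("Unsupported field: " + field);
    }

    private static boolean isIntegral(Number n) {
        return n instanceof Byte || n instanceof Short
                || n instanceof Integer || n instanceof Long;
    }
}
```

With this rule, adding 1 and "2" yields the Long 3, while adding "1.5" and 2 yields the Double 3.5.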


