Sunday, July 14, 2013

Is a Promise Actually a Wish in C++11?

A typical usage of std::promise looks like this:

// future obtained from a promise
#include <future>
#include <thread>

std::promise<int> p;
std::future<int> f = p.get_future();
std::thread([](std::promise<int>& pr) {
    try { pr.set_value(9); }                                    // fulfil the wish...
    catch (...) { pr.set_exception(std::current_exception()); } // ...or pass the failure along
}, std::ref(p)).detach();
f.wait();  // blocks until the promise is fulfilled (or broken)

It's confusing, because in real life a promise is something you are willing to do for others, not something you ask others to do.
If we called the promise here a wish, it would all make sense:
one makes a wish, then hands the wish to another (a thread) and waits for the wish to come true. The wish may not come true, because the thread can store an exception via set_exception instead of calling set_value.

Friday, April 26, 2013

What is a big data problem?

Days ago I attended a presentation by Tao Shi introducing how to detect communities in a directed graph using SVD.
He described a definition of a big data problem from a statistician's perspective; he said, and I quote, "A small data problem is a problem that can be solved by sampling, and a big data problem cannot."

Thursday, April 25, 2013

A Good Approximation for L0 Norm Regularization?

In machine learning, regularization is usually used to prevent overfitting. Different regularizers are used for different purposes; usually, the L2 norm and the L1 norm are used.
The L2 norm is differentiable everywhere, but converges slowly.
The L1 norm is non-differentiable only at 0, and converges faster than L2.
Exact L0-norm regularization is thought to be NP-hard.
The formula for the L0 norm is
$y=\sum_{n=1}^{N}\mathbf{1}\left(W_n\right)$
where 1() is an indicator function of whether $W_n$ is non-zero.
Not only is it non-differentiable at 0, the derivative at every other point is 0, which makes it unusable as a regularizer for gradient-based training.
Think about the function below:
$y=1-e^{-b\left|x\right|}$
where b is a parameter; the bigger b is, the narrower the notch at 0.

[Figure: approximation of 1()]

This would be a good approximation of 1(), as it is non-differentiable only at 0, and the derivative at every other point is
$y'=b\,e^{-b\left|x\right|}\operatorname{sign}\left(x\right)$
It looks like the reverse of an impulse.
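Putting the pieces together, the resulting approximate L0 regularizer (summing the approximation over all weights, with the same parameter b) and its gradient would be:
$R\left(W\right)=\sum_{n=1}^{N}\left(1-e^{-b\left|W_n\right|}\right)$
$\frac{\partial R}{\partial W_n}=b\,e^{-b\left|W_n\right|}\operatorname{sign}\left(W_n\right)$
which is smooth everywhere except at $W_n=0$, so gradient-based optimizers can use it directly.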



Wednesday, April 10, 2013

Sparse Coding and Matrix Factorization

Just read an article about sparse coding for deep learning, which reminded me that it looks a little like matrix factorization for recommender systems.

  • Both try to find a sparse representation of the data.
  • Both balance between two objects: the data representation vs. the features in sparse coding, and items vs. users in matrix factorization.
  • Both need an alternating optimization algorithm, since optimizing both sides jointly is non-convex (so in practice one settles for a good local optimum).
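To make the last point concrete, here is a minimal, hypothetical Java sketch of the alternating scheme for matrix factorization: fix V and take a gradient step on U, then fix U and step on V. The class and parameter names are my own invention, not from any library:

```java
import java.util.Random;

// Toy alternating minimization for R ≈ U * V^T (all names hypothetical).
public class AltMF {

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int f = 0; f < a.length; f++) s += a[f] * b[f];
        return s;
    }

    // Root-mean-square reconstruction error of U * V^T against R.
    public static double rmse(double[][] R, double[][] U, double[][] V) {
        double s = 0;
        int c = 0;
        for (int i = 0; i < R.length; i++)
            for (int j = 0; j < R[0].length; j++) {
                double e = R[i][j] - dot(U[i], V[j]);
                s += e * e;
                c++;
            }
        return Math.sqrt(s / c);
    }

    // Returns {U, V}: `iters` rounds of "fix one side, improve the other".
    public static double[][][] factor(double[][] R, int k, int iters, double lr) {
        int m = R.length, n = R[0].length;
        Random rnd = new Random(42);
        double[][] U = new double[m][k], V = new double[n][k];
        for (int i = 0; i < m; i++)
            for (int f = 0; f < k; f++) U[i][f] = 0.1 * rnd.nextDouble();
        for (int j = 0; j < n; j++)
            for (int f = 0; f < k; f++) V[j][f] = 0.1 * rnd.nextDouble();
        for (int it = 0; it < iters; it++) {
            // Step 1: hold V fixed, move U downhill on the squared error.
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++) {
                    double e = R[i][j] - dot(U[i], V[j]);
                    for (int f = 0; f < k; f++) U[i][f] += lr * e * V[j][f];
                }
            // Step 2: hold U fixed, move V.
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++) {
                    double e = R[i][j] - dot(U[i], V[j]);
                    for (int f = 0; f < k; f++) V[j][f] += lr * e * U[i][f];
                }
        }
        return new double[][][]{U, V};
    }
}
```

For sparse coding the loop has the same shape, with a sparsity penalty added to the step that updates the codes.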

Tuesday, April 2, 2013

This Community Definition is What I'm Thinking

As discussed in my previous post, study in community detection in network, I'm looking for a definition/algorithm that extracts communities from a network with the properties below:
1. high intrinsic clustering
2. communities can overlap on nodes (e.g., your family community and your friend community can overlap on you)
3. it's OK for some nodes to be single islands

Here is a paper that describes just what I wanted:
Uncovering the overlapping community structure of complex networks in nature and society
And I found it via cfinder.org

As in the paper:
"we define a community, or more precisely, a k-clique-community as a union of all k-cliques (complete sub-graphs of size k) that can be reached from each other through a series of adjacent k-cliques  (where adjacency means sharing k-1 nodes) "
Each node in a community is a member of at least one k-clique, which leads to intrinsic clustering.
A node can belong to different cliques, and those cliques can share fewer than k-1 nodes, which means they belong to different communities. This leads to overlap between communities.
Finally, there can be nodes that don't belong to any clique; they form islands.
It's tricky to decide k: if k is too high, there will be too many islands.
If k is 2, every connected component becomes one community, which destroys the overlap property.
So maybe 3 or 4 would be a good choice.
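As a concrete (if naive) sketch, here is what k-clique percolation could look like for k = 3 in Java: enumerate all triangles, merge triangles that share an edge (k-1 = 2 nodes) with union-find, and read each community off as the union of its triangles' nodes. The class name and the brute-force enumeration are my own; real tools like CFinder use much cleverer clique enumeration:

```java
import java.util.*;

public class CliquePercolation3 {

    static int find(int[] p, int x) {
        while (p[x] != x) { p[x] = p[p[x]]; x = p[x]; }
        return x;
    }

    static int shared(int[] t1, int[] t2) {
        int c = 0;
        for (int x : t1) for (int y : t2) if (x == y) c++;
        return c;
    }

    // k=3 clique percolation: a community is a union of triangles reachable
    // from each other through shared edges (k-1 = 2 common nodes).
    public static List<Set<Integer>> communities(int n, int[][] edges) {
        boolean[][] adj = new boolean[n][n];
        for (int[] e : edges) { adj[e[0]][e[1]] = adj[e[1]][e[0]] = true; }
        // 1. Enumerate all triangles (the 3-cliques).
        List<int[]> tris = new ArrayList<>();
        for (int a = 0; a < n; a++)
            for (int b = a + 1; b < n; b++)
                if (adj[a][b])
                    for (int c = b + 1; c < n; c++)
                        if (adj[a][c] && adj[b][c]) tris.add(new int[]{a, b, c});
        // 2. Union-find: merge triangles sharing exactly 2 nodes (an edge).
        int[] p = new int[tris.size()];
        for (int i = 0; i < p.length; i++) p[i] = i;
        for (int i = 0; i < tris.size(); i++)
            for (int j = i + 1; j < tris.size(); j++)
                if (shared(tris.get(i), tris.get(j)) == 2) p[find(p, i)] = find(p, j);
        // 3. Each union-find root's triangles pool their nodes into one community.
        Map<Integer, Set<Integer>> comm = new HashMap<>();
        for (int i = 0; i < tris.size(); i++) {
            Set<Integer> s = comm.computeIfAbsent(find(p, i), r -> new TreeSet<>());
            for (int v : tris.get(i)) s.add(v);
        }
        return new ArrayList<>(comm.values());
    }
}
```

Nodes that sit in no triangle never enter any community, which gives the island property for free; a node shared by triangles from two unmerged groups ends up in both communities, which gives the overlap.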



Tuesday, March 12, 2013

Polymorphism of Apache Pig's Eval Function

Polymorphism is not necessary for a programming language, but it makes our code cleaner and more beautiful. At the least, it saves us several lines of code.
In Apache Pig, an Eval Function is a class that extends EvalFunc, rather than a plain function, so we can't leverage Java's polymorphism for functions. But there are 2 back doors left/designed by the EvalFunc designers:
1. The input of an EvalFunc is a Tuple, which makes input polymorphism possible.
2. EvalFunc is generic, which makes output polymorphism possible.

Input Polymorphism

Input polymorphism refers to variance in the input.
The input of an EvalFunc is a Tuple whose elements are Objects, which means you can put any object into the tuple and pass it to the EvalFunc.
For example:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Add extends EvalFunc<Double> {

    @Override
    public Double exec(Tuple input) throws IOException {
        Object a = input.get(0);
        Object b = input.get(1);
        Double da, db;
        // Accept either a String or a Double for each argument.
        if (a instanceof String) {
            da = Double.parseDouble((String) a);  // cast needed: a is declared Object
        } else {
            da = (Double) a;
        }
        if (b instanceof String) {
            db = Double.parseDouble((String) b);
        } else {
            db = (Double) b;
        }
        return da + db;
    }
}
In the example above, the Add function parses strings into doubles, so adding two strings, or a string and a double, both work.

Output Polymorphism

Output polymorphism refers to variance in the output.
Usually you have to designate the output type of an Eval Function; in the example above, Double is the return type. But if you want the return type to vary, you can simply use Object as the return type.
For example:

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class AorB extends EvalFunc<Object> {

    @Override
    public Object exec(Tuple input) throws IOException {
        Object a = input.get(0);
        Object b = input.get(1);
        // Return the first non-null argument.
        if (a != null) {
            return a;
        } else {
            return b;
        }
    }
}
In the example above, AorB returns a if a is not null, and b otherwise.

Of course, combining input and output polymorphism makes an Eval Function even more flexible and powerful.