Sunday, June 19, 2016

Java 8: Default Method Resolution Rules

With the introduction of default methods in Java 8, it is now possible for a class to inherit the same method from multiple places (such as another class or interface). The following rules can be used to determine which method is selected in such cases:

  1. A class or superclass method declaration always takes priority over a default method
  2. Otherwise, the method with the most specific default-providing interface is used
  3. Finally, if the methods are equally specific, there will be a compiler error and you will be forced to explicitly override the method and specify which one your class should call

Let's look at a few examples and apply these rules.

Example 1:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B {
  default void name() {
    System.out.println("B");
  }
}

public class C implements A {
  @Override
  public void name() {
    System.out.println("C");
  }
}

public class D extends C implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: C

This is because, as stated in Rule 1, the method declaration of name() from the superclass C takes priority over the default methods declarations in A and B.

Example 2:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B extends A {
  @Override
  default void name() {
    System.out.println("B");
  }
}

public class C implements A {}

public class D extends C implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: B

Unlike the previous example, C does not override name(), but since it implements A, it has a default method from A. According to Rule 2, if there are no methods in the class or superclass, the most specific default-providing interface is selected. Since B extends A, it is more specific and, as a result, "B" is printed.

Example 3:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B {
  default void name() {
    System.out.println("B");
  }
}

public class D implements A, B {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: Compiler error! Duplicate default methods named name with the parameters () and () are inherited from the types B and A

In this example, there's no more-specific default-providing interface to select, so the compiler throws an error. To resolve the error, you need to explicitly override the method in D and specify which method declaration you want D to use. For example, if you want to use B's:

class D implements A, B {
  @Override
  public void name() {
    B.super.name();
  }
}

Example 4:

What does the following code print?

public interface A {
  default void name() {
    System.out.println("A");
  }
}

public interface B extends A {}

public interface C extends A {}

public class D implements B, C {
  public static void main(final String... args) {
    new D().name();
  }
}

Answer: A

The sub-interfaces B and C haven't overridden the method, so there is actually only the method from A to choose from. As a side note, if either B or C (but not both) had overridden the method, then Rule 2 would have applied. By the way, this is the diamond problem.

Saturday, June 18, 2016

Java 8: Debugging Stream Pipelines

I've found that stream pipelines can be difficult to debug because stack traces involving lambda expressions are quite cryptic. Consider the following contrived example:

import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class Test {
  public static void main(final String[] args) {
    final List<String> list = Arrays.asList("foo", null, "bar");
    list.stream()
        .map(Function.identity())
        .filter(x -> true)
        .map(String::length)
        .forEach(System.out::println);
  }
}

You may have already guessed that the code above will throw a NullPointerException when String.length is called on the null element in the list. I've added extra map and filter operations, which do nothing, just to make the example a bit more interesting. In the real world, you will probably have a number of different operations in your stream pipeline.

Running the code, produces the following stack trace:

Exception in thread "main" java.lang.NullPointerException
  at Test$$Lambda$3/455659002.apply(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source)
  at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
  at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
  at java.util.stream.ReferencePipeline.forEach(Unknown Source)
  at Test.main(Test.java:12)

The stack trace shows that a NullPointerException occurred but it doesn't tell you which operation in the pipeline failed. What does Test$$Lambda$3/455659002.apply(Unknown Source) mean and why is there no line number?! Since lambda expressions don't have a name, the compiler makes one up (similar to anonymous classes). In this case, it is Test$$Lambda$3 but that doesn't help us track the bug in our code.

So, what can we do? Let's go old-school and add some logging to our code! We can use peek to print out each element before it is consumed by the next operation in the pipeline.

import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class Test {
  public static void main(final String[] args) {
    final List<String> list = Arrays.asList("foo", null, "bar");
    list.stream()
        .peek(x -> System.out.println("Running identity on: " + x))
        .map(Function.identity())
        .peek(x -> System.out.println("Running filter on: " + x))
        .filter(x -> true)
        .peek(x -> System.out.println("Running string length on: " + x))
        .map(String::length)
        .peek(x -> System.out.println("Running print on: " + x))
        .forEach(System.out::println);
  }
}

Running it produces the following output:

Running identity map on: foo
Running filter on: foo
Running string length on: foo
Running print on: 3
3
Running identity map on: null
Running filter on: null
Running string length on: null
Exception in thread "main" java.lang.NullPointerException
  at Test$$Lambda$6/295530567.apply(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
  at java.util.stream.ReferencePipeline$11$1.accept(Unknown Source)
  at java.util.Spliterators$ArraySpliterator.forEachRemaining(Unknown Source)
  at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
  at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Unknown Source)
  at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Unknown Source)
  at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
  at java.util.stream.ReferencePipeline.forEach(Unknown Source)
  at Test.main(Test.java:16)

Great! Now we know that the NullPointerException was thrown by the string length lambda!

In general, I think stack traces involving lambdas could be improved in future versions of Java.

Sunday, June 12, 2016

Java 8: Converting Anonymous Classes to Lambda Expressions

Refactoring anonymous classes (that implement a single method) to lambda expressions, makes your code more succint and readable. For example, here's an anonymous class for a Runnable and its lambda equivalent:

// using an anonymous class
Runnable r = new Runnable() {
  @Override
  public void run() {
    System.out.println("Hello");
  }
};

// using a lambda expression
Runnable r2 = () -> System.out.println("Hello");

However, it's not always that simple!

Here are a couple of gotchas:

1. Different scoping rules

There are different scoping rules between anonymous classes and lambda expressions. For example, in lambda expressions, this and super are lexically scoped, meaning they are relative to the enclosing class, but in an anonymous class, they are relative to the anonymous class itself. Similarly, local variables declared in lambda expressions will conflict with variables declared in the enclosing class, but in anonymous classes, they are allowed to shadow variables in the enclosing class. Here is an example:

int foo = 1;
Runnable r = new Runnable() {
  @Override
  public void run() {
    // this is ok!
    int foo = 2;
  }
};

Runnable r2 = () -> {
  // compile error: Lambda expression's local variable foo cannot
  // redeclare another local variable defined in an enclosing scope.
  int foo = 2;
};

2. Overloaded methods

If you have an overloaded method, using lambda expressions can result in an ambiguous method call and will require explicit casting. Here is an example:

// Functional interface
interface Task {
  public void execute();
}

// Overloaded methods
public static void go(final Runnable r) {
  r.run();
}
public static void go(final Task t) {
  t.execute();
}

// Calling the overloaded method:

// When using an anonymous class, there is no ambiguity because
// the type of the class is explicit at instantiation
go(new Task() {
  @Override
  public void execute() {
     System.out.println("Hello");
  }
});

// When using a lambda expression, there is a compile error!
// The method go(Runnable) is ambiguous
go(() -> {
  System.out.println("Hello");
});

// This ambiguity can be solved with an explicit cast
go((Task)() -> {
  System.out.println("Hello");
});

Saturday, May 07, 2016

kdb+/q - Running a function multiple times and concatenating results

Let's say that you have a q function that you want to run for multiple inputs (for example, a list of dates) and you want to concatenate the outputs of each function call into a single table. This is demonstrated below:

// A function that returns some data for a single date.
getData:{[dt]
    t:select from data where date=dt;
    // do some more stuff
    t}

// list of dates
dates:2016.01.01+til 10

// call the function for each date and join results
result:(uj/) getData each dates;

// or, if you want to do some processing after each function call:
result:(uj/) {[dt;param]
    t:getData[dt];
    // do some stuff with param
    t}[;`foo] each dates;

Saturday, April 23, 2016

kdb+/q - Joins

You're probably familiar with SQL joins, which are used to combine data from more than one table. This post shows how you can do the same in kdb.

I'll use the following two tables to demonstrate joins in kdb:

q) show age:([] name:`alice`charlie`dave`frank; age:25 26 31 33)
name    age
-----------
alice   25
charlie 26
dave    31
frank   33

q) show work:([] name:`alice`bob`charlie`eve; dept:`ops`dev`hr`eng)
name    dept
------------
alice   ops
bob     dev
charlie hr
eve     dev

Now let's join these tables on the name field using the different kinds of kdb joins:

Left Join (lj):

In a left join, all the rows of the left table are preserved, and those from the right table are only combined if the values in the key columns match those in the left table. If there are values in the left table that don't exist in the right, the joined columns will contain nulls.

q) age lj `name xkey work
name    age dept
----------------
alice   25  ops
charlie 26  hr
dave    31
frank   33

Union Join (uj):

A union join (referred to as a full outer join in SQL) returns rows from both tables, regardless of whether there is a match. If there is a match, it returns combined rows, and if there isn't, the missing side will contain nulls.

q) (`name xkey age) uj `name xkey work
name   | age dept
-------| --------
alice  | 25  ops
charlie| 26  hr
dave   | 31
frank  | 33
bob    |     dev
eve    |     eng

Inner Join (ij):

An inner join returns only those rows where there is a match in both tables.

q) age ij `name xkey work
name    age dept
----------------
alice   25  ops
charlie 26  hr

Equi Join (ej):

This is an inner join in which you can specify the list of columns to join on.

q) ej[`name;age;work]
name    age dept
----------------
alice   25  ops
charlie 26  hr

Plus Join (pj):

This is a left join, which also sums the values of common columns (other than key columns). This type of join doesn't exist in SQL.

q) t1:([] name:`alice`charlie`dave`frank; x:1 2 3 4)
q) t2:([] name:`alice`bob`charlie`eve; x:10 10 10 10)
q) t1 pj `name xkey t2
name    x
----------
alice   11
charlie 12
dave    3
frank   4

Cross Join (cross):

This is a "cartesian product". It joins everything to everything and isn't normally the kind of join people are looking for. Each row is a combination of the rows of the first and second table, so can be a dangerous join to run against large tables.

q) `name xasc work cross age
name    dept age
----------------
alice   ops  25
alice   dev  25
alice   hr   25
alice   eng  25
charlie ops  26
charlie dev  26
charlie hr   26
charlie eng  26
dave    ops  31
dave    dev  31
dave    hr   31
dave    eng  31
frank   ops  33
frank   dev  33
frank   hr   33
frank   eng  33