Serializing Java and the corner cases

Every time you think of Java, you may think of its types as just classes and primitives. Not many, right? Well, there’s a major difference between knowing a thing and handling it. The latter brings a lot of corner cases.

This is Serializing Java, a series about writing your own serializer in case you don’t like the existing ones.

Not only types

In an earlier article, I listed the types and many basic cases of type inspection:

→ Serializing Java: inspecting data structure

so if you haven’t read that, I suggest you go there first. This time we’ll look deeper into all aspects of serialization instead of just the inspection phase.

Non-serializable things

Yes! You can’t serialize everything! There are some non-serializable things out there. I can list two kinds:

  1. fields/objects of certain classes like Thread, Socket, Stream, and probably many others that depend on external resources (external to the JVM)
  2. transient fields

There are two ways to handle both of them:

  1. ignore them completely by not mentioning them at all
  2. ignore the value, but possibly record whether it’s null or not

However, I’d prefer a mix of those approaches:

  1. ignore transient fields completely
  2. for an Object that references an external resource, record only whether it’s null or not, and nothing else
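The mixed policy above could be sketched with plain reflection. This is only an illustration under assumptions: the FieldPolicy and Sample classes, the hand-picked list of "external" types, and the "null"/"not-null" markers are all hypothetical and not taken from the serializer described in this series.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldPolicy {
    // Hypothetical sample class: regular, transient and external-resource fields.
    static class Sample {
        int count = 3;
        transient String cache = "skip me";
        Thread worker = new Thread(() -> { });
        Socket socket; // left null on purpose
    }

    // Assumption: a hand-picked list of types that wrap external resources.
    private static final Class<?>[] EXTERNAL = {
        Thread.class, Socket.class, InputStream.class, OutputStream.class
    };

    static boolean isExternal(Class<?> type) {
        for (Class<?> c : EXTERNAL) {
            if (c.isAssignableFrom(type)) {
                return true;
            }
        }
        return false;
    }

    // Maps field name -> value, or a null/not-null marker for external resources.
    static Map<String, Object> describe(Object target) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : target.getClass().getDeclaredFields()) {
            int mods = f.getModifiers();
            // Transient fields are not mentioned at all.
            // Skipping static fields is an extra assumption of this sketch.
            if (Modifier.isTransient(mods) || Modifier.isStatic(mods)) {
                continue;
            }
            Object value;
            try {
                f.setAccessible(true);
                value = f.get(target);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + f.getName(), e);
            }
            // Check the declared type and, if available, the runtime type.
            boolean external = isExternal(f.getType())
                    || (value != null && isExternal(value.getClass()));
            out.put(f.getName(), external ? (value == null ? "null" : "not-null") : value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe(new Sample()));
    }
}
```

For the Sample object, the resulting map should contain count with its value, worker and socket with only a marker, and no entry for cache.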

Array of arrays, array of arrays of arrays, …

As usual, there are two cases to cover:

  1. an explicit definition of a multi-dimensional array
  2. an implicit instance of a multi-dimensional array

The two classes below contain the same data:

but have a different structure prior to serialization.

An explicit int[][][][] field type yields a much larger structure than a plain Object. Why? A serialized type carries information about the type and its subtype. The type of int[][][][] is Array and its subtype is… Array! Its children are of type int[][][], whose subtype is int[][], whose subtype in turn is int[]. Only the last one is of type Array with a subtype of primitive integer. In the implicit case, Object is just an Object, so during the inspection phase we can’t determine anything more than that.
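To make the difference concrete, here is a minimal sketch that inspects only the declared field types. The Explicit and Implicit classes and the chain helper are hypothetical stand-ins, not the classes from the original post.

```java
final class ArrayFieldDemo {
    // Hypothetical classes: same data, different declared field types.
    static class Explicit { int[][][][] data = new int[1][1][1][1]; }
    static class Implicit { Object data = new int[1][1][1][1]; }

    // Walks the component types of a declared type, outermost first.
    static String chain(Class<?> type) {
        StringBuilder sb = new StringBuilder(type.getSimpleName());
        while (type.isArray()) {
            type = type.getComponentType();
            sb.append(" -> ").append(type.getSimpleName());
        }
        return sb.toString();
    }

    // Returns the declared (static) type of the field named "data".
    static Class<?> declaredType(Class<?> owner) {
        try {
            return owner.getDeclaredField("data").getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints "int[][][][] -> int[][][] -> int[][] -> int[] -> int"
        System.out.println(chain(declaredType(Explicit.class)));
        // prints "Object"
        System.out.println(chain(declaredType(Implicit.class)));
    }
}
```

In the implicit case the array structure only becomes visible once an actual value is inspected, for example through getClass() on the field's content.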

Of course, this is just my way of defining the data structure in my serializer. You could instead treat multi-dimensional arrays as a special case and write a flat structure where type=Array, subType=primitiveInteger and dimensions=4. One thing is certain: the explicit case can be optimized during serialization, in both speed and output size.
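The flat alternative could look roughly like the sketch below. FlatArrayType is a hypothetical name, and the sub-type is reported with Class.getName() (so "int" rather than primitiveInteger), which is an assumption of this example.

```java
final class FlatArrayType {
    final String type;     // "Array" for arrays, otherwise the class name
    final String subType;  // innermost component type, or null for non-arrays
    final int dimensions;  // 0 for non-arrays

    private FlatArrayType(String type, String subType, int dimensions) {
        this.type = type;
        this.subType = subType;
        this.dimensions = dimensions;
    }

    // Collapses nested array types into a single flat descriptor.
    static FlatArrayType of(Class<?> declared) {
        int dims = 0;
        Class<?> c = declared;
        while (c.isArray()) {
            c = c.getComponentType();
            dims++;
        }
        return dims == 0
                ? new FlatArrayType(declared.getName(), null, 0)
                : new FlatArrayType("Array", c.getName(), dims);
    }

    @Override
    public String toString() {
        return "type=" + type + ", subType=" + subType + ", dimensions=" + dimensions;
    }

    public static void main(String[] args) {
        // prints "type=Array, subType=int, dimensions=4"
        System.out.println(of(int[][][][].class));
        // prints "type=java.lang.Object, subType=null, dimensions=0"
        System.out.println(of(Object.class));
    }
}
```

The descriptor has a constant size regardless of the number of dimensions, which is where the savings in output size would come from.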

Object references

Objects referencing other objects is perfectly normal in Java. References tend to form graphs, which brings the need to handle cyclic references. How to do that is obvious during serialization, but it is NOT during deserialization!

It’s a chicken-or-egg problem. When we deserialize an object and some of its fields reference an object that hasn’t been deserialized yet, we have a problem. To avoid it, we need some container that can be referenced right away and filled in later. However, it’s still not obvious how to fill those containers: how do we identify which container belongs to which object? Well, every single object should have an ID. That brings up the topic of pointing at objects, which I have also covered:

→ Serializing Java: point at without pointers
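One possible way to realize "containers filled later" is a two-pass approach keyed by object ID. The sketch below is only an illustration: Node, NodeRecord and the integer IDs are hypothetical, and the linked article may solve this differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoPassDeserializer {
    // Hypothetical object type whose reference can form a cycle.
    static final class Node {
        String name;
        Node next;
    }

    // Hypothetical serialized form: references are stored as IDs.
    static final class NodeRecord {
        final int id;
        final String name;
        final Integer nextId; // null means "no reference"

        NodeRecord(int id, String name, Integer nextId) {
            this.id = id;
            this.name = name;
            this.nextId = nextId;
        }
    }

    static Map<Integer, Node> deserialize(List<NodeRecord> records) {
        Map<Integer, Node> byId = new HashMap<>();
        // Pass 1: create an empty container for every ID.
        for (NodeRecord r : records) {
            byId.put(r.id, new Node());
        }
        // Pass 2: fill the containers; every referenced ID already has one.
        for (NodeRecord r : records) {
            Node node = byId.get(r.id);
            node.name = r.name;
            if (r.nextId != null) {
                Node target = byId.get(r.nextId);
                if (target == null) {
                    throw new IllegalStateException("Unknown ID " + r.nextId);
                }
                node.next = target;
            }
        }
        return byId;
    }

    public static void main(String[] args) {
        // a -> b -> a: a cycle that a single pass could not resolve in order.
        Map<Integer, Node> nodes = deserialize(Arrays.asList(
                new NodeRecord(1, "a", 2),
                new NodeRecord(2, "b", 1)));
        System.out.println(nodes.get(1).next.next == nodes.get(1)); // prints "true"
    }
}
```

Because every container exists before any field is filled, the order of the records no longer matters, and cycles resolve to the same instances.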

Non-static inner class

I’ve already mentioned that an inner object contains a reference to its parent object. Read “Serializing Java: treat all your fields” to learn more. However, what if we actually need a reference to the parent’s parent?

Let’s look at this sophisticated example of inner reality:

EvenMoreInnerClass can access its great-grandfather OuterClass with the ease of an innocent baby’s touch.
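A minimal sketch of such nesting might look like this. Only the names OuterClass and EvenMoreInnerClass come from the text; the intermediate classes, the touched field and the touchOuter method are assumptions made for illustration.

```java
final class OuterClass {
    boolean touched;

    class InnerClass {
        class MoreInnerClass {
            class EvenMoreInnerClass {
                void touchOuter() {
                    // A qualified "this" reaches the outermost instance in one step.
                    OuterClass.this.touched = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Each inner instance is created in the context of its parent instance.
        OuterClass a = new OuterClass();
        OuterClass.InnerClass b = a.new InnerClass();
        OuterClass.InnerClass.MoreInnerClass c = b.new MoreInnerClass();
        OuterClass.InnerClass.MoreInnerClass.EvenMoreInnerClass d = c.new EvenMoreInnerClass();
        d.touchOuter();
        System.out.println(a.touched); // prints "true"
    }
}
```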

Now observe the instantiation of this beauty:

There’s a pattern of this$<number>. Normally, you would think the relation is transitive: if object c is created in the context of b, and b is created in the context of a, then c is in the context of a. If so, why doesn’t c have a direct pointer to a? In Java, it’s easy to just do a.b.c.d = true from the deepest object (as in the snippet above), but during serialization we can’t make this direct connection. It’s even more interesting because the this$<number> variables are numbered as if reaching any ancestor context had already been considered. You may think that’s only something the debugger shows, but hell no: Class.getDeclaredFields() actually returns the same.
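The reflection side can be checked with a small sketch like the one below. The classes A, B and C and the outerFields helper are hypothetical. Each inner class deliberately uses its enclosing instance, because newer javac versions may omit the field when an inner class never uses it.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class A {
    boolean d;

    class B {
        boolean peek() { return d; }          // uses the enclosing A

        class C {
            void touch() { A.this.d = true; } // reaches A through B
        }
    }

    // Lists "name:type" for every enclosing-instance field declared by a class.
    static List<String> outerFields(Class<?> type) {
        List<String> found = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.getName().startsWith("this$")) {
                found.add(f.getName() + ":" + f.getType().getSimpleName());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        A a = new A();
        A.B b = a.new B();
        A.B.C c = b.new C();
        c.touch();
        System.out.println(a.d);                   // prints "true"
        System.out.println(outerFields(A.B.class));
        System.out.println(outerFields(A.B.C.class));
    }
}
```

With javac, C is expected to declare a single field of this kind, numbered 1 and typed B, and nothing that points at A directly. A serializer therefore has to walk the chain one level at a time.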

This case wasn’t especially important for my application or serialization algorithm, but the quirk is worth noting.

Summary

This was probably the last post about the basics of Serializing Java. Here are all the previous ones:

There’s much more to cover in real-life (advanced) situations:

  • updating source objects based on a deserialized value tree object
  • diffing observed objects
  • dealing with various collections in an efficient manner
  • JIT optimization