Serializing Java: inspecting data structure

Every deserializer needs information about types: the data structure. In my serializer, the deserializer may sit on the other side of a network connection, so it can’t rely on the Reflection mechanism. That’s why the serializer has to discover and serialize the data structure, so that it can be sent over the network and understood by the deserializer.

This is Serializing Java, an emerging series about writing your own serializer in case you don’t like the existing ones.

Data vs data structure

It’s very important to differentiate two things. A data structure is not the data per se. My definition of data structure here is: a hierarchical description of a type, which is the direct result of inspection.

Example:

While inspecting the GameObject type, to get a full description we have to inspect all of its fields: position and size. The types of those fields are easy to inspect too. Both are Vector3, and Vector3 has 3 fields: x, y, z.

So the data structure in this case is a list of the fields in GameObject and Vector3, where each field is described by a name, a type and a parent type. As for the parent type: for instance, the field size has parent type = GameObject.
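The description above can be sketched with plain reflection on the serializer side. This is only a minimal sketch: the Vector3 and GameObject classes are reconstructed from the field names mentioned in the text, while FieldInfo and StructureInspector are hypothetical names made up for illustration. It ignores arrays, collections, enums and JDK types, which are discussed later.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Example types modelled on the names used in the text (assumed shapes).
class Vector3 { float x, y, z; }
class GameObject { Vector3 position; Vector3 size; }

// One entry of the data structure: field name, field type, parent type.
class FieldInfo {
    final String name;
    final Class<?> type;
    final Class<?> parentType;

    FieldInfo(String name, Class<?> type, Class<?> parentType) {
        this.name = name;
        this.type = type;
        this.parentType = parentType;
    }

    @Override
    public String toString() {
        return parentType.getSimpleName() + "." + name + " : " + type.getSimpleName();
    }
}

class StructureInspector {
    // Collects the fields of 'root' and, recursively, of its field types.
    static List<FieldInfo> inspect(Class<?> root) {
        List<FieldInfo> out = new ArrayList<>();
        inspect(root, new HashSet<>(), out);
        return out;
    }

    private static void inspect(Class<?> type, Set<Class<?>> visited, List<FieldInfo> out) {
        // Primitives have no fields; 'visited' makes sure each type is described once.
        if (type.isPrimitive() || !visited.add(type)) {
            return;
        }
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // only instance state belongs to the data structure
            }
            out.add(new FieldInfo(field.getName(), field.getType(), type));
            inspect(field.getType(), visited, out);
        }
    }

    public static void main(String[] args) {
        for (FieldInfo info : inspect(GameObject.class)) {
            System.out.println(info);
        }
    }
}
```

Note that Vector3 is described only once even though two fields use it, which is what keeps the structure compact on the wire.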

Pass-by: value vs reference

Java is pass-by-value:

Java works exactly like C. You can assign a pointer, pass the pointer to a method, follow the pointer in the method and change the data that was pointed to. However, you cannot change where that pointer points.

In C++, Ada, Pascal and other languages that support pass-by-reference, you can actually change the variable that was passed.

A reference is not a pointer like in C++. We can’t manipulate a reference. We either have a reference or we don’t; in the latter case we have null.
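A tiny demonstration of the pass-by-value behaviour described above. Box and PassByValueDemo are hypothetical names used only for this illustration.

```java
// Hypothetical holder type used only for this demonstration.
class Box {
    int value;

    Box(int value) {
        this.value = value;
    }
}

class PassByValueDemo {
    // Follows the reference and changes the object the caller also sees.
    static void mutate(Box box) {
        box.value = 42;
    }

    // Reassigns only the local copy of the reference; the caller's variable is untouched.
    static void reassign(Box box) {
        box = new Box(99);
    }

    public static void main(String[] args) {
        Box a = new Box(1);
        mutate(a);
        System.out.println(a.value); // 42
        reassign(a);
        System.out.println(a.value); // still 42
    }
}
```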

Based on that, we could state that serializing data in Java is all about values:

  • primitive types
  • a reference, which could be some identifier in the form of an Integer
  • null – which tells us that this is neither a primitive value nor a reference

In fact, internally a reference is something like an ID with a pointer (or actually, pointers). All of this seems flat: we have byte, short, int, long and a special case, null. Then there are Strings, which are a length plus an array of bytes. It’s fairly easy to serialize.
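To make the "flat" view concrete, here is a minimal sketch of writing such values with the JDK's DataOutputStream. The one-byte tags, the FlatWriter class and the choice of UTF-8 are all assumptions made for illustration, not the wire format used by this series.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FlatWriter {
    // Hypothetical one-byte markers; the real wire format may differ.
    static final byte TAG_NULL = 0;
    static final byte TAG_STRING = 1;
    static final byte TAG_REFERENCE = 2;

    // A String is written as a tag, a length and then the bytes themselves.
    static void writeString(DataOutputStream out, String text) throws IOException {
        if (text == null) {
            out.writeByte(TAG_NULL);
            return;
        }
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.writeByte(TAG_STRING);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // A reference is written as an integer identifier, or as the null marker.
    static void writeReference(DataOutputStream out, Integer id) throws IOException {
        if (id == null) {
            out.writeByte(TAG_NULL);
            return;
        }
        out.writeByte(TAG_REFERENCE);
        out.writeInt(id);
    }

    // Writes one primitive int, one String and one null reference.
    static byte[] demo() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(7);
            writeString(out, "hi");
            writeReference(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

In this sketch the three values take 12 bytes in total: 4 for the int, 1 + 4 + 2 for the String and 1 for the null marker.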

How many types you’ll deal with

However, it’s not that easy. Frankly speaking, Java complicates things a lot.

For starters, we may think that only these types have to be considered:

  • primitive type
  • boxed primitive type (a reference or null)
  • class

The above is basically what you see when you code your stuff as usual. You know, implementing your software, game or whatever. Here’s some Integer, here’s some Class (having multiple fields), here’s some String.

And here’s what you’ll see when implementing a custom serializer:

  • enum (!)
  • inner class or inner enum
  • null value
  • array of primitive type
  • array of boxed primitive type
  • array of (any) objects
  • array of enum
  • collections
  • cycle references
  • references to static things
  • arrays of arrays…
  • arrays of collections

Now this is some list, isn’t it? Let’s dig into some of them.

Enum

Enum is a special case. It’s often treated like a String because the code reads nicer. However, in terms of its value in memory, it’s closer to an Integer. Couldn’t we just flatten enums into Integers? Well, no. We want both pieces of information:

  • enumeration names
  • enumeration values

What’s worse, it’s a nullable thing, so it doesn’t behave like a primitive int. It’s not safe to deal with it as a primitive. Why? Well, it hasn’t been there since the first version of Java. Here’s what an enum really is: a class (every enum type implicitly extends java.lang.Enum), which was introduced before Java 6 (in JDK 1.5, specifically).

What’s a little worse is an array of enum values.
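A minimal sketch of how both pieces of information could be collected, and how a nullable enum value could still be written as an integer. Team and EnumInspector are hypothetical names, and reserving -1 for null is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical enum used only for illustration.
enum Team { RED, BLUE, GREEN }

class EnumInspector {
    // Data structure: maps each constant's value (ordinal) to its name.
    static Map<Integer, String> describe(Class<? extends Enum<?>> type) {
        Map<Integer, String> constants = new LinkedHashMap<>();
        for (Enum<?> constant : type.getEnumConstants()) {
            constants.put(constant.ordinal(), constant.name());
        }
        return constants;
    }

    // Data: an enum reference may be null, so -1 is reserved for that case (assumption).
    static int encode(Enum<?> value) {
        return value == null ? -1 : value.ordinal();
    }

    public static void main(String[] args) {
        System.out.println(describe(Team.class)); // {0=RED, 1=BLUE, 2=GREEN}
        System.out.println(encode(Team.GREEN));   // 2
        System.out.println(encode(null));         // -1
    }
}
```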

Array of primitive and boxed types

int is a primitive, while Integer is a boxed primitive. A boxed primitive is a class instance, so it can be replaced with null.

As I would love things to be efficient in terms of network throughput, I would like to write an array of boxed Integers as a series of primitive integers. Whether I can do this or not depends on the component type of the array, which in this case is int or Integer.

During serialization of the data structure, this forces me to declare whether a given array is an array of a primitive type or not.
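The check itself is available through reflection. Below is a minimal sketch; ArrayKind is a hypothetical helper, and the rule that an Integer[] may be written as plain ints only when it contains no null is my assumption about when the optimization is safe.

```java
class ArrayKind {
    // True for int[], long[], etc.; false for Integer[], Object[] and non-arrays.
    static boolean isPrimitiveArray(Class<?> type) {
        return type.isArray() && type.getComponentType().isPrimitive();
    }

    // An Integer[] can be flattened to primitive ints only if no element is null
    // (assumption of this sketch: null has no primitive representation).
    static boolean canWriteAsPrimitives(Integer[] values) {
        for (Integer value : values) {
            if (value == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPrimitiveArray(int[].class));                      // true
        System.out.println(isPrimitiveArray(Integer[].class));                  // false
        System.out.println(canWriteAsPrimitives(new Integer[] { 1, null, 3 })); // false
    }
}
```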

Array of everything

Let’s inspect this class:

This is a very simple and common case where the component type of a collection is not definite: an element could be an instance of a subclass. Array component types are treated covariantly. What it means, basically, is that I could instantiate an object of the GameEntity class, which extends Entity, and put that object into entities. That makes things harder. An array of Entity (Entity[]) can’t be inspected deeply during serialization of the data structure. It’s possible (or rather: it only makes sense) to inspect the type of each array element during serialization of the data.
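A small sketch of why the declared component type is not enough. Entity, GameEntity and the entities field come from the text, but their fields, the World holder class and the ElementTypes helper are assumptions made for illustration.

```java
// Assumed shapes for the classes named in the text.
class Entity { int id; }
class GameEntity extends Entity { String name; }

// Hypothetical holder of the 'entities' array.
class World { Entity[] entities; }

class ElementTypes {
    // The declared component type says Entity, but each element has to be
    // asked for its own runtime class while the data is being serialized.
    static Class<?>[] runtimeTypes(Object[] array) {
        Class<?>[] types = new Class<?>[array.length];
        for (int i = 0; i < array.length; i++) {
            types[i] = (array[i] == null) ? null : array[i].getClass();
        }
        return types;
    }

    public static void main(String[] args) {
        World world = new World();
        world.entities = new Entity[] { new Entity(), new GameEntity(), null };

        // Known at structure-inspection time: only the declared component type.
        System.out.println(world.entities.getClass().getComponentType().getSimpleName()); // Entity

        // Known only at data-serialization time: the actual type of each element.
        for (Class<?> type : runtimeTypes(world.entities)) {
            System.out.println(type == null ? "null" : type.getSimpleName());
        }
    }
}
```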

Summary: Discovering data structure

Let’s remember: there’s data and there’s data structure.

I want to inspect only those types that need to be transmitted over the network, not the whole world of classes in a JVM process.

So, serializing all of this is a process that can’t be separated into serializing the structure first and the data second. Well, it could be, but read the sentence above, or go back to my previous articles to understand more about my needs here:

Serializing Java: why I work on new serializer?

Artemis Entity Tracker – inspecting your game state through network

References