Serializing Java: why am I working on a new serializer?

April 9, 2017

Why reinvent the wheel again? Why are the popular serializers useless for my project, Entity Tracker?

This is Serializing Java - an emerging series about writing your own serializer in case you don’t like the existing ones.

So? Another “invention”?

Actually, yeah. What I’m currently working on is not a standard serializer. It has more specific use cases because of the Entity Tracker project.

I don’t need a serializer that simply converts an in-memory JVM object into bytes and back. What I need is realtime inspection of objects over the network. In other words, it’s like having a remote debugger similar to those popular in IDEs, but with only the Variables/Watches part (as a dynamically expandable tree, though).

While Entity Tracker has a specific UI for this, the general view layer for this serializer would be JSON or a rendered tree that can be expanded dynamically.

There’s one more thing - since I need a serializer, a deserializer is needed for sure. However, my **deserializer** is not going to recreate objects. The UI does not have the class types, remember? It’s unable to create objects of unavailable types.
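To illustrate what "not recreating objects" could look like, here is a minimal sketch of a generic tree node that the UI side might build instead of typed objects. The `ValueNode` class and its fields are hypothetical names made up for this example, not Entity Tracker's actual API; the point is only that type names travel as plain text, so no `Class<?>` is needed on the receiving side.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical UI-side model: a generic node instead of a recreated object.
class ValueNode {
    final String name;      // field name
    final String typeName;  // type name as plain text, no Class<?> required
    final Object value;     // leaf value, or null for composite nodes
    final List<ValueNode> children = new ArrayList<>();

    ValueNode(String name, String typeName, Object value) {
        this.name = name;
        this.typeName = typeName;
        this.value = value;
    }

    ValueNode add(ValueNode child) {
        children.add(child);
        return this;
    }

    /** Renders the node and its children as an indented text tree. */
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(name).append(" : ").append(typeName);
        if (value != null) sb.append(" = ").append(value);
        sb.append('\n');
        for (ValueNode child : children) child.render(sb, depth + 1);
    }
}
```

A tree like this can be rendered, expanded or converted to JSON without the inspected project's classes ever being loaded by the GUI.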

Have I done research about this? You know, reinvention…

Yes, I’ve done a ton of research.

There is such a thing as Java Object Inspector - visually it works the way I want, but it’s not remote and doesn’t have an option to watch objects. It lacks even more than that.

Without remote inspection I couldn’t debug an Android game on PC.

Then there is Kryo - a library for cloning and serialization, pretty useful for the network case since it caches information about inspected types. However, it always wants to serialize as much of an object as it can. I need partial inspection to enable watching just some fields or objects. The worse part is that both peers (or server and clients) need to contain the bytecode of the class types, in some “common lib”. I don’t want users to provide and share their libs with my Entity Tracker GUI. For more about the network issues, read on.

There’s also FST, which has this feature: “optionally en/decode any Serializable object graph to JSON”. However, JSON is not efficient in terms of size. The same goes for XStream or Jackson. XStream does feature “Requires no modifications to objects”, but still, XML is big. Also, I don’t expect BSON (binary JSON) would change much. On top of that, redundantly rediscovering the structure of every object is a no-go in terms of performance.

Google’s Protocol Buffers make you define the structure of the data. But I prefer the following situation…

The need: No changes to user code

I want my Entity Tracker to be fully transparent to the user. Ideally, the user wouldn’t have to change anything in their project code just to inspect the object world for a while.

In other words - my serializer built into Entity Tracker will analyze objects and type structures and send all of that. What’s even more important - we’re not going to analyze all existing types - only the needed ones. Accessing some objects having fields of unknown types

No interfaces!

ISerializable? Give me a break. That’s useful for making your stuff as performant as possible, but it’s not automatic. Thus, we go with another requirement…

No annotations

No @Transient fields, no @PostSerializer or @PreSerialize events, no exclusion strategies / policies. There is no need for any of this.

No schemas

Protocol Buffers or Colfer need external definitions of the data structure. I would rather rely on reflection, since projects are very sensitive to changes between in-dev relaunches.
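As a rough idea of what relying on reflection means here, the sketch below lists the instance fields of an arbitrary class without any interface, annotation or schema. `TypeDescriber` and `Position` are hypothetical names used only for this example; a real implementation would also need to handle arrays, generics and nested types.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: discovers the structure of a type at runtime.
class TypeDescriber {
    /** Returns "name:type" entries for all instance fields, superclasses included. */
    static List<String> describe(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                // Static and compiler-generated fields are not part of the object's state.
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                out.add(f.getName() + ":" + f.getType().getSimpleName());
            }
        }
        return out;
    }
}

// Sample user class: plain fields, nothing serializer-specific.
class Position {
    float x, y;
    String label;
}
```

Calling `TypeDescriber.describe(Position.class)` yields one entry per field, such as `x:float`. Because the structure is read at runtime, a renamed or added field shows up after the next relaunch with no schema to update.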

No need for extensive features

I don’t need exclusion strategies, versioning that provides backward compatibility, security or alternative output formats.

OutOfMemoryError / StackOverflowError

Basically, cyclic reference detection. Some serializers, like protostuff, already deal with it.

However, I detect it and instead of inspecting the object deeper, I just leave a note that there’s a reference to the same type.
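A minimal sketch of this kind of cycle handling is shown below. It assumes that cycles are detected by object identity along the current traversal path and that JDK types are treated as leaves; `CycleSafeDumper` and `Node` are hypothetical names, and the `<ref ...>` note format is made up for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical walker: emits a short note instead of following a cycle.
class CycleSafeDumper {
    static String dump(Object root) {
        StringBuilder sb = new StringBuilder();
        // Identity-based set: two equal() objects are still distinct instances.
        Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, onPath, sb);
        return sb.toString();
    }

    private static void walk(Object obj, Set<Object> onPath, StringBuilder sb) {
        if (obj == null) { sb.append("null"); return; }
        Class<?> type = obj.getClass();
        // Assumption: JDK types (String, boxed primitives, ...) are leaves.
        if (type.getName().startsWith("java.")) { sb.append(obj); return; }
        if (!onPath.add(obj)) {
            // Already being inspected higher up: leave a note, do not recurse.
            sb.append("<ref ").append(type.getSimpleName()).append(">");
            return;
        }
        sb.append(type.getSimpleName()).append("{");
        String sep = "";
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            sb.append(sep).append(f.getName()).append("=");
            try {
                walk(f.get(obj), onPath, sb);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            sep = ", ";
        }
        sb.append("}");
        onPath.remove(obj);
    }
}

// Sample type that can form a cycle.
class Node {
    String name;
    Node next;
    Node(String name) { this.name = name; }
}
```

With two nodes pointing at each other, the walk terminates and the second visit to the first node is rendered as `<ref Node>` rather than recursing until the stack overflows.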

Performance

They can’t diff

Well, some libraries can do versioning and provide some diffing. However, my only goal is to visually show the user what has changed since the last time they looked at their data. I could serialize the whole object and compute a diff. However, in gamedev frequent serialization would be bad, really bad. Even if a serialization library allocates little memory, there’s still some work for the CPU.
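One way to avoid serializing whole objects just to find a diff is to compare only the watched fields against their last seen values. The sketch below is an assumption about how that could be done, not a description of Entity Tracker's implementation; `FieldWatcher` and `Health` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical watcher: remembers the last seen value of selected fields only.
class FieldWatcher {
    private final Object target;
    private final Map<Field, Object> lastSeen = new LinkedHashMap<>();

    FieldWatcher(Object target, String... fieldNames) {
        this.target = target;
        try {
            for (String name : fieldNames) {
                Field f = target.getClass().getDeclaredField(name);
                f.setAccessible(true);
                lastSeen.put(f, f.get(target));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns the names of watched fields whose value changed since the previous call. */
    List<String> poll() {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<Field, Object> entry : lastSeen.entrySet()) {
            Object now;
            try {
                now = entry.getKey().get(target);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (!Objects.equals(now, entry.getValue())) {
                changed.add(entry.getKey().getName());
                entry.setValue(now); // remember the new value for the next poll
            }
        }
        return changed;
    }
}

// Sample component with two watched fields.
class Health {
    int hp = 100;
    int armor = 5;
}
```

Polling costs one reflective read per watched field, so the work scales with what the user is actually looking at rather than with the size of the object graph. Note that `Field.get` boxes primitives, so an allocation-sensitive version would use the typed getters such as `getInt`.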

The Size

It’s going to be sent over the network. Thus, I:

cache types and automatically give them IDs
serialize to bytes, with no useless strings for identifying types
cache walk instructions for traversing objects of a given type and send them over the network
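The first two points can be sketched roughly as follows: a type's name is written once, together with a numeric ID, and every later mention costs only a marker byte and the ID. The wire layout (1 marker byte, 2-byte ID, then the name via `writeUTF`) and the `TypeIdWriter` class are assumptions made for this example, not the actual format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical writer: assigns IDs to types and avoids repeating type names.
class TypeIdWriter {
    private final Map<Class<?>, Integer> ids = new HashMap<>();
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    /** Writes a type reference: the full name on first sight, 3 bytes afterwards. */
    void writeType(Class<?> type) {
        try {
            Integer id = ids.get(type);
            if (id == null) {
                id = ids.size();              // IDs are handed out automatically
                ids.put(type, id);
                out.writeByte(1);             // marker: a new type definition follows
                out.writeShort(id);
                out.writeUTF(type.getName()); // the only place the name is sent
            } else {
                out.writeByte(0);             // marker: type already known to the peer
                out.writeShort(id);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of bytes written so far. */
    int size() {
        return bytes.size();
    }
}
```

The receiving side would keep the mirror mapping from ID to name, which is all the GUI needs since it never loads the classes themselves.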

The Network

Let’s say our to-be-inspected software is the server, so it contains the .class data for the to-be-inspected classes. The GUI of the visual inspector is then the client side: an observer that doesn’t have info about class types from .class bytecode. And no, I don’t want to send the bytecode through the network. If you still don’t understand why: size, network speed and the need for frequent relaunches of the inspected software (that’s simply how development goes, right?). Also, most of the time I don’t need a huge number of types.

Answer to those problems

no external dependencies on serializers, because none of them fits my use cases
just a crafted solution for specific needs; general usage may be a future outcome after extracting it from Entity Tracker
be inspired by other works while working on my own stuff

More technically:

cache type structures
structures are serializable (able to be sent over the network)
inspect specific types only when it’s needed

References

Resources

jvm-serializers - comparison of JVM serializers in terms of performance
Why Java’s serialization slower than 3rd party APIs?
