Serializing Java: why am I working on a new serializer?

Why reinvent the wheel again? Why are popular serializers useless for my project, Entity Tracker?

This is Serializing Java, an emerging series about writing your own serializer in case you don’t like the existing ones.

So? Another “invention”?

Actually, yeah. What I’m currently working on is not a standard serializer. It has more specific use cases because of the Entity Tracker project.

I don’t need a serializer that simply converts an in-memory JVM object into bytes and back. What I need is realtime inspection of objects over the network. In other words, it’s like having a remote debugger similar to those popular in IDEs, but with only the Variables/Watches part (though as a dynamically expandable tree).

While Entity Tracker has a specific UI for this, the general view layer for this serializer would be JSON or a rendered tree that can be expanded dynamically.

There’s one more thing: since I need a serializer, a deserializer is needed as well. However, my deserializer is not going to recreate objects. The UI does not have the class types, remember? It can’t create objects of types it doesn’t have.
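To make the "no class types on the UI side" idea concrete, here is a minimal sketch of a generic tree the deserializer could build instead of real objects. The `ValueNode` class and its fields are hypothetical names made up for this illustration, not Entity Tracker's actual code; type names arrive as plain strings, so the viewer never needs the user's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side node: a field name, a type name received as text,
// and either a leaf value or child nodes. No user classes are required.
final class ValueNode {
    final String name;
    final String typeName;
    final Object value; // String or boxed primitive for leaves, null for composite nodes
    final List<ValueNode> children = new ArrayList<>();

    ValueNode(String name, String typeName, Object value) {
        this.name = name;
        this.typeName = typeName;
        this.value = value;
    }

    // Appends a child and returns this node so calls can be chained.
    ValueNode add(ValueNode child) {
        children.add(child);
        return this;
    }

    // Renders the subtree as text, two spaces of indentation per level.
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(name).append(": ").append(typeName);
        if (value != null) sb.append(" = ").append(value);
        sb.append('\n');
        for (ValueNode child : children) child.render(sb, depth + 1);
    }
}
```

A UI could then walk such nodes and expand children on demand, regardless of which classes exist on the inspected side.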

Have I done research about this? You know, reinvention…

Yes, I’ve done a ton of research.

There is such a thing as Java Object Inspector. Visually it works the way I want, but it’s not remote and has no option to watch objects. And it lacks more than that.

Without remote inspection I couldn’t debug an Android game on PC.

Or there is Kryo, a library for cloning and serialization, pretty useful for the network case since it caches information about inspected types. However, it always wants to serialize as much of an object as it can. I need partial inspection to enable watching only some fields or objects. The worse part is that both peers (or server and clients) need to have the bytecode for the class types, some kind of “common lib”. I don’t want users to provide and share their libs with my Entity Tracker GUI. For more about network issues, read further.

There’s also FST, which has this feature: “optionally en/decode any Serializable object graph to JSON”. However, JSON is not efficient in terms of size. Same with XStream or Jackson. XStream does feature “Requires no modifications to objects”, but still, XML is big. Also, I don’t expect BSON (binary JSON) would change much. On top of that, redundantly rediscovering the structure of every object is a no-go in terms of performance.

Google’s Protocol Buffers makes you define the structure of the data. But I prefer the following situation…

Want: No changes to user code

I want my Entity Tracker to be fully transparent to the user. Ideally, the user won’t have to change anything in their project code just to inspect the object world for a while.

In other words, my serializer built into Entity Tracker will analyze objects and type structures and send all of these. What’s even more important: we’re not going to analyze all existing types, only the needed ones. Accessing some objects having fields of unknown types

No interfaces!

ISerializable? Give me a break. That’s useful for making your stuff as performant as possible, but it’s not automatic. Thus, we go with another requirement…

No annotations

No @Transient fields, @PostSerializer or @PreSerialize events, or exclusion strategies / policies. There is no need for this.

No schemas

Protocol Buffers or Colfer need external definitions of the data structure. I would rather rely on reflection, since a project under development changes between relaunches and any schema would be very sensitive to that.
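As a rough illustration of the reflection-only approach (no interfaces, no annotations, no schema), the sketch below discovers the instance fields of an arbitrary class and reads one of them. `FieldScanner` and `Position` are hypothetical names for this example only; the real serializer may collect and cache this information differently.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class FieldScanner {
    // Lists "name:type" for every non-static field, walking up the superclass chain.
    static List<String> describe(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
                out.add(f.getName() + ":" + f.getType().getName());
            }
        }
        return out;
    }

    // Reads a single field by name, including private ones.
    static Object read(Object target, String fieldName) {
        for (Class<?> c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(fieldName);
                f.setAccessible(true);
                return f.get(target);
            } catch (NoSuchFieldException e) {
                // Not declared on this class; try the superclass next.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalArgumentException("no field " + fieldName);
    }
}

// A plain user class: no interface, no annotation, no schema.
final class Position {
    private float x = 1.5f;
    private float y = 2f;
    static int instances; // static, so the scanner skips it
}
```

Because the structure is read from the running classes, a renamed or added field simply shows up after the next relaunch.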

No need for extensive features

Exclusion strategies, versioning providing backward compatibility, security, alternative output formats: I need none of these.

OutOfMemoryError / StackOverflowError

Basically, cyclic reference detection. Some serializers like protostuff deal with it already.

However, I detect it and, instead of inspecting the object further down, I just leave a note that there’s a reference to the same type.
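One common way to detect cycles is to track the objects on the current traversal path by identity. The sketch below does that with an `IdentityHashMap`-backed set and emits a short `<ref Type>` note instead of descending again. It is an assumption about how this could be done, not the actual Entity Tracker code; `CycleAwareWalker` and `LinkedEntity` are hypothetical names, and for brevity only fields declared directly on the runtime class of user objects are visited.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class CycleAwareWalker {
    // Objects on the current path, compared by identity rather than equals().
    private final Set<Object> onPath = Collections.newSetFromMap(new IdentityHashMap<>());

    String walk(Object obj) {
        StringBuilder sb = new StringBuilder();
        walk(obj, sb);
        return sb.toString();
    }

    private void walk(Object obj, StringBuilder sb) {
        if (obj == null) { sb.append("null"); return; }
        // Treat strings and boxed primitives as leaves.
        if (obj instanceof String || obj instanceof Number
                || obj instanceof Boolean || obj instanceof Character) {
            sb.append(obj);
            return;
        }
        Class<?> type = obj.getClass();
        if (!onPath.add(obj)) {
            // Already being inspected higher up: leave a note, do not recurse.
            sb.append("<ref ").append(type.getSimpleName()).append('>');
            return;
        }
        sb.append(type.getSimpleName()).append('{');
        boolean first = true;
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            if (!first) sb.append(", ");
            first = false;
            sb.append(f.getName()).append('=');
            try {
                f.setAccessible(true);
                walk(f.get(obj), sb);
            } catch (IllegalAccessException e) {
                sb.append("<inaccessible>");
            }
        }
        sb.append('}');
        onPath.remove(obj); // leaving this object, so siblings may show it again
    }
}

// Minimal type that can form a cycle.
final class LinkedEntity {
    LinkedEntity next;
}
```

With two `LinkedEntity` instances pointing at each other, the walk stops at the second visit of the first object, so neither the stack nor the heap can blow up.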

Performance

They can’t diff

Well, some libraries can do versioning and provide some diffing. However, my only goal is to visually show the user what has changed since the last time they looked at their data. I could serialize the whole object and find a diff. However, in gamedev frequent serializing would be bad, really bad. Even if a serialization library allocates little memory, there’s still work for the CPU.

The Size

It’s going to be sent over the network. Thus, I do:

  1. cache types and automatically give them IDs
  2. serialize to bytes, no useless strings for determining types
  3. cache walk instructions for traversing objects of a given type and send them over the network
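Points 1 and 2 of the list above can be sketched as follows: a registry hands out small integer IDs per type, and the writer sends the full type name only the first time a type appears. `TypeRegistry`, `TypedWriter`, and the exact wire layout (2-byte ID, 1-byte "first time" flag, then the name) are assumptions made for this example, not the real format; cached walk instructions (point 3) are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class TypeRegistry {
    private final Map<Class<?>, Integer> ids = new HashMap<>();

    boolean isKnown(Class<?> type) {
        return ids.containsKey(type);
    }

    // Returns the existing ID or assigns the next free one.
    int idOf(Class<?> type) {
        Integer id = ids.get(type);
        if (id == null) {
            id = ids.size();
            ids.put(type, id);
        }
        return id;
    }
}

final class TypedWriter {
    private final TypeRegistry registry = new TypeRegistry();
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    // Writes a type reference: the name is included only on first use.
    void writeTypeRef(Class<?> type) {
        try {
            boolean firstTime = !registry.isKnown(type);
            out.writeShort(registry.idOf(type));
            out.writeBoolean(firstTime);
            if (firstTime) out.writeUTF(type.getName());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of bytes written so far.
    int size() {
        return bytes.size();
    }
}
```

In this layout the first reference to `java.lang.String` costs the ID, the flag, and the name, while every later reference costs three bytes, which is where the size savings over string-tagged formats come from.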

The Network

Let’s say our to-be-inspected software is a server, so it contains the .class data for the to-be-inspected classes. Now, the GUI of the visual inspector is the client side: an observer that doesn’t have info about class types from .class bytecode. And no, I don’t want to send the bytecode through the network. If you still don’t understand why: size, network speed, plus the need for frequent relaunches of the inspected software (that’s simply how development goes, right?). Also, most of the time I don’t need a huge number of types.

Answer to those problems

  1. no external dependencies on serializers because none of them fits my use cases
  2. just a crafted solution for specific needs; general usage may come later, after extracting it from Entity Tracker
  3. be inspired by other works and simultaneously work on my own stuff

In tech matters:

  • cache type structures
  • structures are serializable (able to be sent over the network)
  • inspect specific types only when it’s needed

  • Pascal de Kloe

    No dependencies and no code changes? Performance and reflection? At some point you have to make a decision what’s most important. Implement the following and you’re pretty close. https://github.com/pascaldekloe/colfer/issues/15

      • Pascal de Kloe

        Any progress? :-)

        • Well, sorry but no. A lot has happened since then (e.g. I got a new, very interesting job) and this project is paused for now. I stopped at the point where the whole serialization works in all cases I’ve found, and then tried to integrate it with the GUI. It’s now more complicated than before, so it needs a smart and efficient GUI updater. Swing (used before) was not a good choice for this, so I decided to try Elm (a functional language). I wrote almost the whole deserializer in Elm. Almost, because it’s so strict about types (no polymorphism!) that designing a type system around it is non-trivial, and I’m missing a few details there. When I’m done with the deserializer in Elm, I can write the GUI based on Virtual DOM. I have considered writing a Virtual DOM in Java, but Swing is so bad for this…
