server-client protocol buffer specs

It’s no secret around here that I’ve been burning the candle at both ends to complete what we call “The Great Server-Client Divide” — a year-long task that has been in planning since the very start.

With big-data applications, it’s challenging to get a server (a simulation state machine) to interact somewhat generically with any number of clients without compromising on performance. We studied the principles and mechanics of this issue and slowly arrived at a viable solution, one requiring extreme software-engineering care.

For our engineering analysis software, we navigated many performance compromises. One notable compromise (relative to game engines) has been maintaining both high-precision (FP64) and low-precision (FP32) data sets for computation versus rendering: every iteration we must convert and buffer relevant results from device to host in order to maintain a global state with which clients can interact.
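
As a minimal illustration of that conversion step (the function and buffer names here are mine, not XCOMPUTE’s actual API), the per-iteration down-conversion amounts to a narrowing copy:

    // Sketch of the FP64 -> FP32 down-conversion performed each iteration
    // before render-ready results are buffered from device to host.
    // Illustrative names, not XCOMPUTE's actual API.
    #include <cstddef>
    #include <vector>

    std::vector<float> ToRenderPrecision(const std::vector<double>& sim) {
        std::vector<float> render(sim.size());
        for (std::size_t i = 0; i < sim.size(); ++i)
            render[i] = static_cast<float>(sim[i]);  // narrowing copy
        return render;
    }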

(Still, we are finding that with proper software design the bottleneck is compute on GPU-like devices rather than I/O bandwidth over PCIe, so this extra conversion step is not responsible for any slowdown. We’re measuring and reporting more than a 25x speed-up over CPU-only execution.)

A server-client architecture utilizes a central server host with any number of OpenCL-compatible devices and a filesystem. One or more clients can connect to the server over a network connection, communicating their needs and accepting pre-packaged data from the server. Each client renders the data using its local GPU device.

XCOMPUTE has gone through several thousand iterations to get where we are, and along the way we developed high-level and low-level optimizations and generalizations that further expand our capabilities and performance. For instance, we are approaching the minimum number of operations needed to synchronize arbitrary numerical data, and our C++ code syntax makes all of these operations clear and human-readable.

It should come as little surprise that there would eventually be a high degree of data-structure unification (via dynamic compile-time and run-time tricks), and that the messages required to save/load could be reused in wide-scale communication protocols. After all, both cases require serialization and de-serialization infrastructure, so if the encoding/decoding format is flexible and nearly run-time-optimal, why not unify all I/O? Especially if it is easily parallelized and permits flexible usage and sharing with users.

That is exactly what we did: we implemented “protocol buffers,” using a schema definition file to build an array of sources, headers, and libraries that are later linked into the larger application at compile time. There are no run-time libraries…it’s essentially a code generator.

The protobuf definition file assigns variable names to types and to a specific integer field number; repeated and embedded messages are also possible. Developers get a clear way to package messages, and the proto definition file can be made publicly available so that external applications (in almost any language) can interface natively without compromising the legal intellectual property of the actual code-base. It’s just an interface.
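
To make that concrete, here’s a hedged sketch of what such a definition file can look like (the package and message names are illustrative, not XCOMPUTE’s published schema):

    // Illustrative schema only -- not the actual XCOMPUTE definitions.
    syntax = "proto3";

    package example;

    // Each field gets a name, a type, and an integer tag that
    // identifies it on the wire.
    message Vector {
      string key = 1;              // property identifier, e.g. "pressure"
      repeated double values = 2;  // repeated field, packed in proto3
    }

    // Messages may embed other messages, and repeat them.
    message Geometry {
      string name = 1;
      repeated Vector fields = 2;  // embedded, repeated message
    }

Running this through the generator (protoc --cpp_out=… in Google’s implementation) yields the sources and headers that are linked in at compile time, per the workflow described above.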

I’m only aware of two good protocol buffer libraries, both authored by the same person (first at Google, then on his own). The only major limitation I’ve encountered is that in both libraries (for various reasons) the maximum message size is limited to about 2^30 bytes, or about 1 GB. This caps the size of any one system, but should work well for most cases: large problems should be decomposed into manageable systems, not one huge homogeneous domain with poor numerical complexity.
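
Here’s a hedged sketch of how one might respect that ceiling in C++, using the illustrative Vector message from above and a chunk size of the caller’s choosing:

    // Sketch: decompose a large array into several messages so that no
    // single serialized message approaches the ~1 GB wire-format ceiling.
    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <vector>
    #include "example.pb.h"  // generated from the illustrative schema

    constexpr std::size_t kMaxMessageBytes = 1ull << 30;  // ~1 GB

    std::vector<std::string> SerializeChunked(const std::string& key,
                                              const std::vector<double>& data,
                                              std::size_t chunk_elems) {
        std::vector<std::string> wire;
        for (std::size_t i = 0; i < data.size(); i += chunk_elems) {
            example::Vector msg;
            msg.set_key(key);
            const std::size_t end = std::min(data.size(), i + chunk_elems);
            for (std::size_t j = i; j < end; ++j) msg.add_values(data[j]);
            if (msg.ByteSizeLong() < kMaxMessageBytes)  // guard the cap
                wire.push_back(msg.SerializeAsString());
        }
        return wire;
    }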

I could talk for days about message design: how it sort of parallels your class structures, and how it is also sort of its own thing! Being introspective about “what constitutes a message” can yield huge optimizations across your application in practice, because poorly encapsulated messages tend to carry repetitive or unnecessary data for a given context. Ideally, you’d transmit only what is needed, especially given bandwidth constraints. If you can constrain this to a finite set of messages, you’re off to a great start.
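
For instance, a request that names what it needs, rather than echoing whole objects back and forth, might look like this (again illustrative, not part of XCOMPUTE’s schema):

    // Illustrative: transmit references and selectors, not copies.
    message FieldRequest {
      uint64 geometry_id = 1;  // refer to server-side state by id
      string key = 2;          // which property array to return
    }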

Another really neat byproduct of server-client message unification is that servers already expect self-contained protobuf messages in order to perform operations, such as creating new objects (geometries, algorithms, etc.). A command-line interface (CLI) can therefore construct protobuf messages and invoke macro-level commands, just like a client. One can access a simulation via client, CLI, or through files on disk.
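
A sketch of that idea, assuming the illustrative FieldRequest message above and a hypothetical Dispatch() standing in for the server’s real entry point:

    // Sketch: a CLI builds the same protobuf a network client would send.
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include "example.pb.h"

    void Dispatch(const std::string& wire_bytes);  // hypothetical server hook

    int main(int argc, char** argv) {
        if (argc < 3) {
            std::cerr << "usage: xcli <geometry_id> <key>\n";
            return 1;
        }
        example::FieldRequest req;
        req.set_geometry_id(std::stoull(argv[1]));
        req.set_key(argv[2]);
        Dispatch(req.SerializeAsString());  // same path as a remote client
        return 0;
    }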

Applied to numerical computing, we developed four protocol buffer definition files, each applicable to a specific context:

  • vector – array-like data that can benefit from arena allocations (see the sketch after this list)
  • geometry – topological information specific to a domain
  • setup – numerical system configuration data and associativity
  • meta – user preferences for a specific system
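
To make the arena point concrete: Google’s C++ implementation can place generated message objects in arena blocks when the schema opts in, which pays off for large repeated arrays. A hedged sketch with illustrative names:

    // Illustrative only; XCOMPUTE's real vector definitions differ.
    syntax = "proto3";

    package example;

    option cc_enable_arenas = true;  // allow arena allocation in C++

    message PropertyArray {
      string key = 1;
      repeated double values = 2;
    }

On the C++ side, such messages can then be constructed via google::protobuf::Arena, amortizing allocation and destruction across many message objects.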

XCOMPUTE has implemented these messages for finite-element and finite-volume methods, and we are formalizing support for finite-difference, lattice-Boltzmann, and advanced geometric representations. The following unified XCOMPUTE file types somewhat correspond to the aforementioned messages:

  • *.xco – numerical data array for a specific property-key (parallel CPU)
  • *.xcg – topology data for a specific geometry (structured / unstructured)
  • *.xcs – system setup (main project file, recursive directories)
  • *.xcm – metaobject media profile (technically a new 3d media format)
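
Since each file type is just a serialized message on disk, save/load reduces to protobuf’s stream API. A sketch with illustrative names (actual *.xco contents are XCOMPUTE-specific):

    // Sketch: persist a message as the same byte-stream it travels as.
    #include <fstream>
    #include <string>
    #include "example.pb.h"

    bool SaveArray(const example::PropertyArray& msg, const std::string& path) {
        std::ofstream out(path, std::ios::binary);
        return out && msg.SerializeToOstream(&out);
    }

    bool LoadArray(example::PropertyArray* msg, const std::string& path) {
        std::ifstream in(path, std::ios::binary);
        return in && msg->ParseFromIstream(&in);
    }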

RSA or other encryption can wrap the serialized byte-stream as necessary. When you purchase an XCOMPUTE license, you receive a copy of these definitions along with a Creative Commons Attribution-NoDerivatives license, allowing anyone to use them in their own projects and, we hope, integrate with ours!
