pyrolistical an hour ago

Compressed JSON is good enough and requires less human communication initially.

Sure, it will blow up in your face when a field goes missing or a value changes type.

People who advocate paying the higher cost ahead of time to perfectly type the entire data structure AND propose a process to perform version updates to sync client/server are going to lose most of the time.

The zero cost of starting with JSON is too compelling even if it has a higher total cost due to production bugs later on.

When judging which alternative will succeed, lower perceived human cost beats lower machine cost every time.

This is why JSON is never going away, until it gets replaced by something with an even lower human communication cost.

  • morshu9001 44 minutes ago

    I've gone the all-JSON route many times, and pretty soon it starts getting annoying enough that I lament not using protos. I'm actually against static types in languages, but the API is one place they really matter (the other is the DB). Google made some unforced mistakes on proto usability/popularity, though.

  • esafak an hour ago

    It won't go away in the same way COBOL won't. That does not mean we should be using it everywhere for greenfield projects.

  • DyslexicAtheist an hour ago

    > People who advocate paying the higher cost ahead of time to perfectly type the entire data structure AND propose a process to do perform version updates to sync client/server are going to lose most of the time.

    That's true. But people would also rather argue about security vulnerabilities than get it right from the get-go. Why spend an extra 15 minutes of effort during design when you can spend 3 months revisiting the ensuing problem later?

    • jstanley an hour ago

      Alternatively: why spend an extra 15 mins on protobuf every other day, when you can put off the 3-month JSON-revisiting project forever?

pzmarzly 2 hours ago

> With Protobuf, that’s impossible.

Unless your servers and clients push at different times and are thus compiled with different versions of your specs, in which case many safety bets are off.

There are ways to be mostly safe (never reuse field IDs, use unknown-field-friendly copying methods, etc.), but distributed systems are distributed systems, and protobuf isn't a silver bullet that can solve every problem on the author's list.

On the upside, it seems like protobuf3 fixed a lot of stuff I used to hate about protobuf2. Issues like:

> if the field is not a message, it has two states:

> - ...

> - the field is set to the default (zero) value. It will not be serialized to the wire. In fact, you cannot determine whether the default (zero) value was set or parsed from the wire or not provided at all

are now gone if you stick to using protobuf3 + `message` keyword. That's really cool.
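
For the curious, here's roughly how that presence distinction surfaces in the Python bindings, assuming a hypothetical schema with `message Settings { int32 volume = 1; }` and `message User { int32 age = 1; Settings settings = 2; }` compiled to `example_pb2`:

    from example_pb2 import User  # hypothetical generated module

    u = User()
    print(u.age)  # 0 -- indistinguishable from an explicitly set 0
    # u.HasField("age") would raise ValueError: plain proto3 scalars carry no presence bit
    print(u.HasField("settings"))  # False -- message-typed fields do carry presence
    u.settings.volume = 0          # mutating a subfield marks the parent as set
    print(u.HasField("settings"))  # True, even though volume holds the zero value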

  • connicpu an hour ago

    Regardless of whether you use JSON or Protobuf, the only way to be safe from version tears in your serialization format is to enforce backwards compatibility in your CI pipeline: test that the new version of your service creates responses that are usable by older versions of your clients, and vice versa.

  • brabel 2 hours ago

    No type system survives going through a network.

    • dhussoe 36 minutes ago

      Yes, but any sane JSON parsing library (Rust Serde, kotlinx-serialization, Swift, etc.) will raise an error when a value has the wrong type or a required field is missing. And any JSON parsing callsite is very likely also an IO callsite, so you need to handle errors there anyway; all IO can fail. Then you log it, or recover, or do whatever you do when IO fails in any other way in that situation.

      This seems like a problem only if you use JSON.parse or json.loads and then just cross your fingers and hope the types are correct, basically doing the silent equivalent of casting an "any" type to some structure you assume is correct, rather than strictly parsing ("parse, don't validate") into a typed structure before handing it off to other code.
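
      A minimal sketch of that strict-parsing boundary in Python, with pydantic (v2) standing in for "any sane parsing library" and a made-up payload shape:

          from pydantic import BaseModel, ValidationError

          class User(BaseModel):  # hypothetical payload shape
              id: int
              name: str

          try:
              user = User.model_validate_json('{"id": "not-a-number"}')
          except ValidationError as err:
              # Both the wrong type for `id` and the missing `name` are
              # reported here, at the IO boundary, not deep in the app.
              print(err)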

codewritero 2 hours ago

I love to see people advocating for better protocols and standards, but given the title I expected the author to present something that would be better in the sense of supporting the same or more use cases with better efficiency and/or ergonomics, and I don't think protobuf does that.

Protobuf has advantages, but due to its strict schema requirement it is missing support for a ton of use cases where JSON thrives.

A much stronger argument could be made for CBOR as a replacement for JSON for most use cases. CBOR has the same schema flexibility as JSON but has a more concise encoding.
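
A rough sense of the size difference, using the third-party cbor2 library in Python (byte counts are for this particular document):

    import json

    import cbor2  # third-party CBOR implementation

    doc = {"id": 7, "name": "ada", "active": True}
    print(len(json.dumps(doc).encode("utf-8")))  # 40 bytes
    print(len(cbor2.dumps(doc)))                 # 22 bytes, same data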

  • port11 2 hours ago

    I think the strict schema of Protobuf might be one of the major improvements, as most APIs don't publish a JSON schema? I've always had to use ajv or superstruct to make sure payloads match a schema; Protobuf doesn't need that (supposedly).

  • youngtaff 2 hours ago

    We need browsers to support CBOR APIs… and it shouldn’t be that hard as they all have internal implementations now

morshu9001 an hour ago

Protos are great. Last time I did a small project in NodeJS, I set up a server that defines the entire API in a .proto and serves each endpoint as either proto or json, depending on the content header. Even if the clients want to use json, at least I can define the whole API in proto spec instead of something like Swagger.

So my question is: why didn't Google just provide that as a library? The setup wasn't hard, but it wasn't trivial either. They also bait most people with gRPC, which is its own separate annoying thing that requires HTTP/2, which even Google's own cloud products don't support well (e.g. App Engine).
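
For what it's worth, the core of that setup is small; a rough sketch of the dual proto/JSON endpoint idea, in Python rather than Node, with Flask and a hypothetical generated `users_pb2` module:

    from flask import Flask, Response, request
    from google.protobuf.json_format import MessageToJson
    import users_pb2  # hypothetical generated module

    app = Flask(__name__)

    @app.route("/users/<int:user_id>")
    def get_user(user_id):
        # The whole API shape lives in the .proto; the wire format is negotiated.
        msg = users_pb2.User(id=user_id, name="example")
        if request.headers.get("Accept") == "application/x-protobuf":
            return Response(msg.SerializeToString(), mimetype="application/x-protobuf")
        # Otherwise fall back to the canonical proto3 JSON mapping.
        return Response(MessageToJson(msg), mimetype="application/json")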

P.S. Text proto is also the best static config language. More readable than JSON, less error-prone than YAML, more structure than both.

  • noctune 36 minutes ago

    You might be interested in https://connectrpc.com/. It's basically what you describe, though it's not clear to me how well supported it is.

    • morshu9001 32 minutes ago

      Yeah, that one looked good. I don't remember why I didn't use it that time; maybe I just felt it was easy enough to DIY that I didn't want another dep. The thing is, Google themselves had to lead the way on this if they wanted protobuf to be mainstream like JSON.

brabel 2 hours ago

Mandatory comment about ASN.1: a protocol from 1984 that already did what Protobuf does, with more flexibility. Yes, it's a bit ugly, but if you stick to the DER encoding it's really not worse than Protobuf at all. Check out the Wikipedia example:

https://en.wikipedia.org/wiki/ASN.1#Example_encoded_in_DER

Protobuf is ok but if you actually look at how the serializers work, it's just too complex for what it achieves.

  • zzo38computer an hour ago

    I also think ASN.1 DER is better (there are other formats, but in my opinion, DER is the only good one, because BER is too messy). I use it in some of my stuff, and when I can, my new designs also use ASN.1 DER rather than using JSON and Protobuf etc. (Some types are missing from standard ASN.1 but I made up a variant called "ASN.1X" which adds some additional types such as key/value list and some others. With the key/value list type added, it is now a superset of the data model of JSON, so you can convert JSON to ASN.1X DER.)

    (I wrote an implementation of DER encoding/decoding in C, which is public domain and FOSS.)

  • morshu9001 an hour ago

    ASN.1 is way overengineered to the point of making it hard to support. You don't need inheritance for example.

    • zzo38computer an hour ago

      It is not necessary to use or implement all of the data types and other features of ASN.1; you can implement only the features you are using. Since DER uses the same framing for all data types, it is possible to skip past any fields you do not care about (although in some cases you will still need to check a field's type to determine whether or not an optional field is present; fortunately the type can be checked easily, even if it is not a type you implement).
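
      The framing really is small; here's a toy reader, for illustration (single-byte tags and definite lengths only; real DER also has a multi-byte "high tag number" form):

          def read_tlv(buf: bytes, pos: int = 0):
              tag = buf[pos]
              length = buf[pos + 1]
              pos += 2
              if length & 0x80:  # long form: low 7 bits count the length octets
                  n = length & 0x7F
                  length = int.from_bytes(buf[pos:pos + n], "big")
                  pos += n
              value = buf[pos:pos + length]
              return tag, value, pos + length  # check the tag, or just skip ahead

          # 02 01 05 encodes the INTEGER 5; an unknown tag could be skipped the same way.
          tag, value, end = read_tlv(bytes.fromhex("020105"))
          assert (tag, value) == (0x02, b"\x05")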

      • morshu9001 39 minutes ago

        Yes but I don't want to worry about what parts of the spec are implemented on each end. If you removed all the unnecessary stuff and formed a new standard, it'd basically be protobuf.

        • zzo38computer 25 minutes ago

          I do not agree. Which parts are necessary depends on the application; there is not one good way to do it for everyone (and Protobuf is too limited). You will need to implement the parts specific to your schema/application on each end, and if the format does not have the data types you want then you must add them in a messier way (especially when using JSON).

  • strongpigeon an hour ago

    > Protobuf is ok but if you actually look at how the serializers work, it's just too complex for what it achieves.

    Yeah. I do remember a lot of workloads at Google where most of the CPU time was spent serializing/deserializing protos.

  • bloppe 2 hours ago

    What makes it too complex in your opinion?

  • dgan an hour ago

    I honestly looked for an encoder/decoder for a Python/C++ application and couldn't find anything usable; I guess I would need to contact the purchasing department for a license (?), while with protobuf I can make the decision myself, all alone.

  • pphysch an hour ago

    > ASN.1, a protocol from 1984, already did what Protobuf does, with more flexibility.

    After working heavily with SNMP across a wide variety of OEMs, this flexibility becomes a downside. Or SNMP/MIBs were specified at the wrong abstraction level, where the ASN.1 flexibility gives mfgs too much power to do insane and unconventional things.

    • morshu9001 an hour ago

      Yeah same, ASN.1 was a nightmare when I was dealing with LTE

written-beyond 2 hours ago

Idk, I built a production system where all data transfers, client to server and server to client, were protobuf, and it was a pain.

Technically it sounds really good, but the actual act of managing it is hell. That, or I need a lot of practice to use them; at that point, shouldn't I just use JSON and get on with my life?

  • Arainach an hour ago

    What issues did you have? In my experience, most things that could be called painful with protobuf would be bigger pains with things like JSON.

    Making changes to messages in a backwards-compatible way can be annoying, but JSON allowing you to shoot yourself in the foot will take more time and effort to fix when it's corrupting data in prod than protobuf giving you a compile error would.

    • written-beyond an hour ago

      Well, at the bare minimum: setting up proto files and knowing where they live across many projects.

      If they live in their own project, making a single project buildable with a git clone gets progressively more complex.

      You now need submodules to pull in your protobuf definitions.

      You also need the protobuf toolchain to be available in the environment you just cloned to. If that environment has the wrong version, the build fails; it starts to get frustrating pretty fast.

      Compare that to JSON: yes, I don't get versioning and a bunch of other fancy features, but... I get to finish my work, build, and test pretty quickly.

dhussoe 28 minutes ago

I like that JSON parsing libraries (Serde etc.) allow you to validate nullability constraints at parse-time. Protobuf's deliberate lack of support for required fields means that either you kick that down to every callsite, or you need to build another parsing layer on top of the generated protobuf code.

Now, there is a serde_protobuf crate (I haven't used it) that I assume lets you enforce nullability constraints, but one of the article's points is that you can use the generated code directly and:

> No manual validation. No JSON parsing. No risk of type errors.

But this is not true: nullability errors are type errors. Manual validation is still required (except you should parse, not validate) to make sure that all the fields your app expects are present in the response. And "manual" validation (again: parse, don't validate) is not necessary with any good JSON parsing library; the library handles it.
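
Concretely, that extra layer looks something like this in Python (`users_pb2.User` is a hypothetical generated class with a string `name` and a message-typed `settings` field holding a `theme` string):

    from dataclasses import dataclass

    import users_pb2  # hypothetical generated module

    @dataclass(frozen=True)
    class ValidatedUser:
        name: str
        theme: str

    def parse_user(msg: users_pb2.User) -> ValidatedUser:
        # The nullability checks protobuf won't do for you, done once at the boundary.
        if not msg.name:                  # proto3 can't distinguish unset from ""
            raise ValueError("name is required")
        if not msg.HasField("settings"):  # message fields do carry presence
            raise ValueError("settings is required")
        return ValidatedUser(name=msg.name, theme=msg.settings.theme)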

wg0 43 minutes ago

Protos don't work out of the box in any browser, as far as I could tell when I last checked, unless you're willing to deploy a proxy in front to do the translation, and it requires an extra dependency in the browser as well.

Plus - tooling.

JSON might not be the smallest or strictest format, but it gets the job done.

If JSON's size or performance is causing you to go out of business, you surely have bigger problems than JSON.

jtrn an hour ago

I like Python-like indentation, but I usually read Python in an IDE or code blocks. JSON in a non-monospace environment might be problematic with some fonts. Hell, I pass JSON around in emails and word processors all the time.

Jemaclus 2 hours ago

"Better than JSON" is a pretty bold claim, and even though the article makes some great cases, the author is making some trade-offs that I wouldn't make, based on my 20+ year career and experience. The author makes a statement at the beginning: "I find it surprising that JSON is so omnipresent when there are far more efficient alternatives."

We might disagree on what "efficient" means. OP is focusing on computer efficiency, whereas, as you'll see, I tend to optimize for human efficiency (and, let's be clear, JSON is efficient _enough_ for 99% of computer cases).

I think the "human readable" part is often an overlooked pro by hardcore protobuf fans. One of my fundamental philosophies of engineering historically has been "clarity over cleverness." Perhaps the corollary to this is "...and simplicity over complexity." And I think protobuf, generally speaking, falls in the cleverness part, and certainly into the complexity part (with regards to dependencies).

JSON, on the other hand, is ubiquitous, human readable (clear), and simple (little-to-no dependencies).

I've found in my career that there's tremendous value in not needing to execute code to see what a payload contains. I've seen a lot of engineers (including myself, once upon a time!) take shortcuts like using bitwise values and protobufs and things like that to make things faster or to be clever or whatever. And then I've seen those same engineers, or perhaps their successors, find great difficulty in navigating years-old protobufs, when a JSON payload is immediately clear and understandable to any human, technical or not, upon a glance.

I write MUDs for fun, and one of the things that older MUD codebases do is that they use bit flags to compress a lot of information into a tiny integer. To know what conditions a player has (hunger, thirst, cursed, etc), you do some bit manipulation and you wind up with something like 31 that represents the player being thirsty (1), hungry (2), cursed (4), with haste (8), and with shield (16). Which is great, if you're optimizing for integer compression, but it's really bad when you want a human to look at it. You have to do a bunch of math to sort of de-compress that integer into something meaningful for humans.
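
In code, that de-compression step looks roughly like this (Python for brevity):

    # The "bunch of math" needed to turn a flag integer like 31 back into
    # something meaningful for humans.
    CONDITIONS = {1: "thirsty", 2: "hungry", 4: "cursed", 8: "haste", 16: "shield"}

    def decode(flags: int) -> list[str]:
        return [name for bit, name in CONDITIONS.items() if flags & bit]

    print(decode(31))  # ['thirsty', 'hungry', 'cursed', 'haste', 'shield']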

Similarly with protobuf, I find that it usually optimizes for the wrong thing. To be clear, one of my other fundamental philosophies about engineering is that performance is king and that you should try to make things fast, but there are certainly diminishing returns, especially in codebases where humans interact frequently with the data. Protobufs make things fast at a cost, and that cost is typically clarity and human readability. Versioning also creates more friction. I've seen teams spend an inordinate amount of effort trying to ensure that both the producer and consumer are using the same versions.

This is not to say that protobufs are useless. It's great for enforcing API contracts at the code level, and it provides those speed improvements OP mentions. There are certain high-throughput use-cases where this complexity and relative opaqueness is not only an acceptable trade off, but the right one to make. But I've found that it's not particularly common, and people reaching for protobufs are often optimizing for the wrong things. Again, clarity over cleverness and simplicity over complexity.

I know one of the arguments is "it's better for situations where you control both sides," but if you're in any kind of team with more than a couple of engineers, this stops being true. Even if your internal API is controlled by "us," that "us" can sometimes span 100+ engineers, and you might as well consider it a public API.

I'm not a protobuf hater; I just think that the vast majority of engineers could go through their careers without ever touching protobufs, never miss them, never need them, and never find themselves in a situation where eking out that extra performance is truly worth the hassle.

  • Aldipower an hour ago

    Great writing, thanks. There are of course two sides, as always. I think that, especially for larger teams and large projects, Protobuf in conjunction with gRPC can work well, with the backwards-compatibility feature making it very hard to break things.

  • Arainach an hour ago

    If you want human-readable, there are text representations of protobuf for use at rest (checked-in config files, etc.) while still being more efficient over the wire.

    In terms of human effort, a strongly typed schema, rather than one where you have to sanity-check everything, saves far more time in the long run.
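
    For example, the Python bindings expose the text format directly (`users_pb2.User` is a hypothetical generated class):

        from google.protobuf import text_format
        import users_pb2  # hypothetical generated module

        msg = users_pb2.User(id=7, name="ada")
        print(text_format.MessageToString(msg))
        # id: 7
        # name: "ada"
        parsed = text_format.Parse('id: 7 name: "ada"', users_pb2.User())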

catchmeifyoucan 2 hours ago

I wonder if we can write an API w/ JSON the usual way and change the final packaging to send it over protobuf.

  • bglusman an hour ago

    Sure... https://protobuf.dev/programming-guides/json/

    I was pushing at one point for us to have some code in our protobuf parsers that would essentially allow reading messages in either JSON or binary format. To be fair, there's some overhead that way, from doing some kind of try/catch, but for some use cases I think it's worth it...

esafak 2 hours ago

It's premature and maybe presumptuous of him to be advertising protobufs when he hasn't heard of the alternatives yet. I'll engage the article after he discovers them...

spagoop 2 hours ago

Is it just me or is this article insanely confusing? With all due respect to the author, please remember to copy-edit LLM-assisted writing.

There is a really interesting discussion underneath this about the limitations of JSON, along with potential alternatives, but I can't help but distrust the writing because of how much it sounds like an LLM.

  • port11 2 hours ago

    I don't think it's LLM-generated or even assisted. It's kinda like how I write when I don't want to really argue a point but rather get to the good bits.

    Seems like the author just wanted to talk about Protobuf without bothering too much about the issues with JSON (though some are mentioned).

  • dkdcio 2 hours ago

    Do you have any evidence that the author used an LLM? Focusing on the content, instead of the tooling used to write the content, leads to a lot more productive discussions.

    I promise you cannot tell LLM-generated content from non-LLM-generated content. What you think you're detecting is poor quality, which is orthogonal to the tooling used.

    • spagoop 2 hours ago

      Fair point. To be constructive: LLMs seem to love lists and emphasizing random words/phrases with bold, and those two are everywhere here. Not a smoking gun, but enough to tune out.

      I am not dismissing this as slop, and I actually have no beef with using LLMs to write, but yes, as you call out, I think it's just poorly written, or perhaps I'm not the specific audience for this.

      Sorry if this is bad energy, I appreciate the write up regardless.

  • pavel_lishin an hour ago

    > Is it just me or is this article insanely confusing?

    I didn't find it confusing.

    I found it unconvincing, but the argument itself was pretty clear. I just disagreed with it.

wilg 2 hours ago

One of the best parts of Protobuf is that there's a fully compatible JSON serialization and deserialization spec, so you can offer a parallel JSON API with minimal extra work.
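
A sketch of that round trip in Python, via google.protobuf.json_format (the message class is a hypothetical generated one):

    from google.protobuf import json_format
    import users_pb2  # hypothetical generated module

    msg = users_pb2.User(id=7, name="ada")
    as_json = json_format.MessageToJson(msg)             # serve this to JSON clients
    back = json_format.Parse(as_json, users_pb2.User())  # and accept it back in
    assert back == msg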

taco_emoji an hour ago

I really hate this blog post format of presenting the new thing you discovered as if it's objectively better than the previous thing. No it's not; you just like it better, which is FINE. Just own it.

volemo 2 hours ago

S-expressions exist since 1960, what more do you need? /s