Where it all ends up – the database

It’s a bit much to try and tackle the entire web application stack all at once. It can often be helpful to look at a particular area, see how the various technologies fit with your end goal, then let that guide you towards complementary technologies.

Database first?

One of the most approachable decisions is the data persistence layer. I think this is a great way to start narrowing your focus. Databases are pleasantly abstracted from applications, with generally usable middleware to make integration with everything else doable. Whether you use Redis or PostgreSQL or MongoDB, you can usually move from one to the other without impacting the majority of your application logic. That’s not to say it’ll be easy, just that you *can*. I recently took part in a MySQL to PostgreSQL migration, and it definitely had its quirks, but it was quite managable. Also, one of the only real manifestations of “I made a bad choice!” is performance suffers at scale – that’s a great problem to have!

NoSQL seems to be all the rage these days, I’m suspicious it’s mostly because people don’t want to learn SQL or genrally don’t know what they want to do, which is fair-ish. Certainly key-value stores like MongoDB come across as kind of a cop out. It seems like most data is pretty well structured, and worst case scenario you can store JSON in PostgreSQL, so some of the more “modern” choices seem a bit suspect. That’s not to say PostgreSQL solves all problems for all people. Graph databases like Neo4j are a great example of an excellent reason to use a different technology – it overlaps very little with what RDBMSs excel at. On the other hand, is a timeseries database a special enough case to warrant something like InfluxDB, or would you be better of with more traditional choices? That’s a much tougher call to make. For the general case, and for my own purposes, I feel like the ultimate answer is two fold.

You’re guaranteed to have structured data in your application. I contend that while it’s marginally more work to get started with a schema and have to deal with migrations and types, those are all good problems to have. Sure it sucks when you’re just starting and you can’t do anything until you write a migration or a schema, but as soon as you get out of that very first step and have data you want to keep around, it’s SO much easier when you have some structure and integrity. The bugs that arise from unstructured data storage feel far too much like the bugs that are only possible with dynamically typed languages. Use a SQL database for data that fits in it. Your user has a name and a phone number – sure some day you may add an address, but fundamentally I have a hard time believing the model will shift enough to justify a schema-less technology.

You’re guaranteed to have unstructured or semi-structured data in your application. This is where low friction when soliciting JSON is particularly valuable. You don’t want to have to translate from your form submission to your database, you want some magical ORM angel to swoop down and save you from knowing what you’re doing. Someone filled out an arbitrary 3/4 of a form – do you put that in your database? Probably not! You desperately don’t want to sacrifice data integrity, and you’re immediately going to run into constraints you’ve put in place that provide value in general but provide friction while iterating over invalid data to get it to a usable state. Side note: this is one of the fundamental design issues I have with ActiveRecord in Rails – it doesn’t provide any real separation from the application perspective between an “immutable” object that can be constructed from data in the database and the super mutable object that is destined to maybe one day make it into the database. To address that, storing things directly as JSON, or alternatively looking at a parallel deployment of someting like MongoDB or Redis (depending on use cases) may make more sense. Once they’ve been completed and validated, then you can fit them into the structure you’ve defined.

 

For my own foray into end-to-end web development, I’ve decided I’ll have a strong bias towards technologies that allow me to use PostgreSQL.

Pros:

  • It’s the data persistence technology I’m most familiar with (bit of a trump card)
  • It’s got a huge community to provide tooling and help
  • It extremely mature, which means its rock solid and scalable

Cons:

  • It’s not that cool, what with its rules and structure

In the same sense that I feel pretty strongly that static typing (when implemented properly in a language) is undeniably “better” than dynamic typing, I’m pretty suspicious that most of the reasons people use NoSQL are bullshit. I’d love to see some numbers that show modern MongoDB (for example) panning out better than PostgreSQL for… well… anything really. The only thing I’d buy is the same argument for dynamic typing. It’s “easier” at the very outset of a project, largely because it doesn’t make you stop and figure out what you’re doing.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s