Let’s look at a few ways we can name our database fields. This is mostly subjective, with some personal opinion mixed in for added velocity.
To me, this is straight forward.
- All table names should be lower snake_case e.g.
- Table names should be plural e.g.
settings, etc. This slightly breaks down for words where the plural is the singular, for example
- Joining tables should be a merge of the tables they join e.g.
accounts_people, the order isn’t important.
- No upper case, ever
- No spaces, ever
Try to avoid meaningless words like
content. They are so general in nature, they could mean anything or nothing.
Almost every table having unique rows should have an
id field. Please do NOT name it
accounts.account_id, if you do this, you should be sad and feel ashamed. Such meta in the field names is just verbose nonsense. It should go without saying,
id should be unique.
We should address why auto-increment is a short sighted database design decision. If you like auto-increment, it’s likely you’ve only worked on smaller database projects. When a project grows bigger you need flexible solutions auto-increment is incapable of.
Seed Data: When using auto-increment, it’s impossible to create preexisting seed data for your database which does not conflict when reloading or working with data already in the database. This is a big problem. You will want to load up seed data, or preexisting rows in a table and using auto-increment, you would first have to insert the row, then get the inserted ID (hint in PG you would using
RETURNING *in your INSERT statement)
Sequences Auto increment uses an internal sequence, and different environments can easily get out of sync. As an experiment trying this query.
CREATE TABLE testing ( id SERIAL name TEXT ); INSERT INTO testing (name) VALUES (E'Shane'),(E'Barry'),(E'Bob'); DELETE FROM testing; INSERT INTO testing (name) VALUES (E'Shane'),(E'Barry'),(E'Bob'); SELECT * FROM testing;
You’ll notice the ids will be 4,5,6. Why is this? Well, the sequence does not get reset when you remove rows. It just keeps going. When you’re developing on your machine and someone else is adding rows, then your sequences will not be in line. This will need to be manually managed to make sure staging and productio do NOT get different ids for rows you may want to insert.
Uniqueness: Auto-increment ids are NOT unique. Enough said I guess. This causes a problem when you may want to combine similar tables.
Predictable: Auto-increment ids are entirely predictable. If ids get exposed to outside operators, this will open attack surfaces for bad actors, i.e. hackers love auto-incremented integers as ids.
Good question, it’s rather simple. The solution to the unique, unpredictable, pre-generated values for IDs is to use a UUID (or GUID).
Some minor drawbacks to using a UUID
- Cannot sort by UUID. Since they are not predictable or orderly, it’s impossible to sort them in any useful way. But lets face it, why are you sorting by an ID in the first place?
- They are long and ugly. There is no getting around this fact, compared to normal ID integers, UUIDs are long and ugly. They are difficult to relate verbally and would take more space in a URL. UUIDs are not really for human consumption.
- They take more space in the database. I consider this a moot point. The days of shaving bytes of data to conserve space have long since past. These minor differences are virtually irrelevant when we have hard drives with terabytes of space.
First, you’ll need to add support for it. It’s likely your version of PG already has it ready at your finger tips, but you need to enable the following plugins when you design your database.
CREATE EXTENSION IF NOT EXISTS "pgcrypto"; CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
Next, you would use it for your IDs like such
CREATE TABLE people ( id UUID PRIMARY KEY DEFAULT gen_random_uuid() );
You’ll see in this type of setup, we can provide a pre-generated UUID or fall back to an auto-generated one. The pre-existing would be when you have seed data providing the ID.
We’ve come to the part of the show where opinions will be most visible. What do you name your fields? First, I’m a big proponent of providing comments for all fields. Comments in postgres can be added to almost anything. Here we see a comment added to a column.
COMMENT ON COLUMN people.first_name IS 'The first name of the person';
Then, I would think of a name which fully describes the field.
images.url is not a good choice, because an image could have different types of URLs might needing to be tracked. Think maybe
images.placeholder_url. Remember: Try not to be short sighted in your database design, assume your schema will change, and prepare for it.
Again, avoid watered down, meaningless words like
Next, make sure you are not using reserved words in SQL or in the app language primarily in use. A perfect example is
class. While in most cases, it will not be a problem, it does get confusing when you are working with data and has conflicting reserved words.
A lot of database designs will use the convention
emails.person_id as a foreign key to the field
people.id. I can see the value in this, because it provides hints in your application layer. The value is just an ID to a foreign field. I do not like this convention, as I mentioned previously it’s too meta. As a database administrator you know it’s a foreign key, so it’s almost certainly an ID, putting it in the field name is redundant. As for the application layer, they should be cognizant of table structure and understand
emails.person is an ID.
- Add comments to your fields
- Avoid generic, easily confused naming
- Stay clear of obvious reserved words
_idto foreign key fields? It’s up to you
We’ll stop here, but you can imagine there are many different ways that we could postulate about database field naming. Some are better than others, but in the end, it needs to make sense to you and your teams. Do the best you can, adapt, and overcome.
I’ve been slightly swayed by the article How I Write SQL by Sehrope Sarkuni. It makes some good points.
- Names of tables should be singular
- FK should be like
account_idinstead of just
The only thing worse than bad naming conventions is multiple naming conventions. If your existing project already has a standard approach to naming its database objects then keep using it.
Did this help you out? It took me a few days to piece together all this information together, I hope this saves you some time (who knows, maybe the future me will be thankful I wrote this down). Let me know your thoughts. [email protected]