How to Choose a Primary Key: Auto Increment vs UUID vs Natural Keys

Should a database primary key use auto increment, UUID, or a natural key? This article compares the trade-offs from performance, security, and scalability perspectives.

When you first start designing a database, one question comes up very quickly: what should you use as the primary key?

A primary key has to be unique, so many people instinctively think of fields that are already unique in the real world.

For example, if every person has a unique Social Security number, could you just use that as the primary key?

That is where the real discussion starts, because “can be used as a primary key” and “is a good primary key” are not the same thing.

This article walks through the most common primary key choices and the trade-offs behind each one.

Are business-generated values a good fit for primary keys?

Short answer: if a column can truly guarantee uniqueness, then yes, it can technically be used as a primary key.

In practice, though, business-generated values are often a poor fit. The main reasons are performance, security, and long-term flexibility.

1. Index performance

A field like a Social Security number is usually stored as a string, which is longer and more expensive to compare than an integer.

In a typical B+ Tree index, short and increasing integer values are usually more efficient than string-based primary keys. That is one of the main reasons auto increment IDs remain attractive.

2. Privacy and security

Primary keys often show up everywhere: URLs, API responses, logs, query conditions, and integrations with other systems.

If you use business-meaningful or sensitive values such as Social Security numbers, phone numbers, or employee IDs as primary keys, you are exposing information that should not sit at the center of your data model. From both a security and compliance perspective, that is usually a bad trade-off.

3. Flexibility

Even if performance and privacy were not an issue, flexibility often becomes the real problem later.

Here is a simple example.

Suppose your first design assumes that one Social Security number maps to one game account, so you decide to use it as the primary key.

Later, the business rules change, and one person is now allowed to link up to three game accounts.

At that point, the original schema immediately becomes awkward. And if other tables already reference that primary key as a foreign key, the cost of refactoring can become very high.

What is usually a better choice?

If business fields are not ideal as primary keys, the two most common options in practice are usually these:

  1. Auto incrementing IDs
  2. UUIDs

Auto incrementing IDs

Auto incrementing IDs are the classic and still the most common primary key design.

In SQL Server, this is commonly implemented with IDENTITY. In PostgreSQL, it is often implemented with SERIAL (which is backed by a sequence) or with GENERATED ALWAYS AS IDENTITY / GENERATED BY DEFAULT AS IDENTITY.

The benefits are straightforward:

  1. Short values with efficient indexes
  2. Sequential inserts that are friendly to B+ Tree indexes
  3. Usually generated by the database, so application code does not need to manage them

If your system is a monolith, uses a single database, or simply does not need IDs to be created outside the database, auto incrementing IDs are usually a very stable choice.
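As a minimal sketch of database-generated IDs, the snippet below uses Python's built-in sqlite3 module with an in-memory database. SQLite is chosen only because it runs anywhere; the `users` table and the `create_user` helper are hypothetical names for illustration, and the IDENTITY and SERIAL syntax of other engines differs.

```python
import sqlite3

# In-memory SQLite database, used only so the sketch is runnable anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the database
        name TEXT NOT NULL
    )
    """
)


def create_user(name: str) -> int:
    """Insert a row and return the ID the database generated for it."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cur.lastrowid


first_id = create_user("alice")
second_id = create_user("bob")
print(first_id, second_id)  # prints: 1 2
```

The application never chooses the ID; it only learns the value after the insert, which is exactly the property that becomes inconvenient when IDs are needed before the row is persisted.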

UUID

The main advantage of UUIDs is that they do not depend on a database-side sequence. They can be generated in application code or by the database itself, which makes them useful in distributed systems, multi-service architectures, or workflows where IDs must be created offline.

The most common UUID versions are roughly these:

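  • v1: generated from a timestamp and the MAC address of the generating machine
  • v4: generated from 122 random bits and still the most common version
  • v7: a millisecond timestamp followed by random bits, so values sort roughly by creation time and are generally more index-friendly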
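The difference between the versions can be seen with Python's standard uuid module, which provides `uuid1()` and `uuid4()`. The `uuid7_sketch` helper below is a hypothetical, hand-written illustration of the v7 bit layout (48-bit millisecond timestamp, version, random bits, variant, more random bits), written out so the sketch does not depend on a recent Python release. Treat it as a teaching aid rather than a production generator.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: assemble a UUID with the v7 layout by hand.

    Bit layout (most significant first):
    48-bit Unix time in ms | 4-bit version | 12 random bits |
    2-bit variant | 62 random bits
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits

    value = (timestamp_ms & ((1 << 48) - 1)) << 80
    value |= 0x7 << 76   # version 7
    value |= rand_a << 64
    value |= 0b10 << 62  # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


v1 = uuid.uuid1()  # timestamp + node ID
v4 = uuid.uuid4()  # random

# Fixed timestamps (one millisecond apart) keep the ordering check deterministic.
earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)

print(v1.version, v4.version, earlier.version)  # prints: 1 4 7
print(earlier < later)                          # prints: True
```

Because the timestamp occupies the most significant bits, v7 values created later compare greater, which is what keeps inserts close to the right-hand edge of a B+ Tree index.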

Different databases generate UUIDs in different ways:

| Database | UUID function | Default return format |
| --- | --- | --- |
| PostgreSQL | gen_random_uuid() | uuid type, displayed as a standard string with hyphens |
| MySQL | UUID() | Standard string with hyphens |
| SQL Server | NEWID() | UNIQUEIDENTIFIER |
| Oracle (older) | SYS_GUID() | RAW(16) without hyphens |
| Oracle (23ai) | UUID() | Standards-compliant RAW(16) |

In Oracle, storing UUIDs as RAW(16) instead of VARCHAR2(36) is usually more efficient because it saves more than 50% of the storage space and improves index performance.
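The storage difference is easy to check from application code. This small sketch compares the textual form of a UUID with its raw 16-byte form, which is what a RAW(16) or similar binary column would hold; the variable names are illustrative.

```python
import uuid

value = uuid.uuid4()

as_text = str(value)  # 32 hex digits plus 4 hyphens
as_raw = value.bytes  # the same 128 bits as raw bytes

saving = 1 - len(as_raw) / len(as_text)
print(len(as_text), len(as_raw), f"{saving:.0%}")  # prints: 36 16 56%

# The raw form loses nothing: it converts back to the same UUID.
assert uuid.UUID(bytes=as_raw) == value
```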

Can UUIDs really never collide?

Short answer: in theory they can, but in practice it is extremely unlikely.

You can think about it from three angles.

1. How small is the probability?

Using UUID v4 as the common example, you get around $2^{122}$ possible combinations, which is about $5.3 \times 10^{36}$.

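That number is so large that a table holding a billion UUIDs has a collision probability of roughly $10^{-19}$. Put differently, you would have to generate about a billion UUIDs every second for roughly 86 years before the chance of even one collision reached 50%.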
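These magnitudes can be estimated with the standard birthday-bound approximation, $p \approx 1 - e^{-n(n-1)/(2N)}$, where $N = 2^{122}$ is the number of possible v4 values and $n$ is the number of UUIDs generated. The function and variable names below are illustrative, and the result is an approximation that assumes a perfectly uniform random source.

```python
import math

UUID_V4_SPACE = 2 ** 122  # number of distinct v4 values


def collision_probability(n: int, space: int = UUID_V4_SPACE) -> float:
    """Approximate chance of at least one collision among n random IDs."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n * (n - 1) / (2 * space))


p_billion = collision_probability(10 ** 9)    # a billion IDs in total
p_trillion = collision_probability(10 ** 12)  # a trillion IDs in total

# Number of IDs needed before a collision becomes a coin flip (p = 0.5).
ids_for_half = math.sqrt(2 * UUID_V4_SPACE * math.log(2))

print(f"{p_billion:.1e}  {p_trillion:.1e}  {ids_for_half:.2e}")
```

Even a trillion stored IDs leave the probability far below anything that matters in practice; a collision only becomes likely once the count reaches the order of $10^{18}$.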

2. If duplicates happen in the real world, what usually causes them?

If UUID duplication actually shows up, the problem is usually not that the UUID space is too small. It is more often caused by a broken randomness source.

Common causes include:

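  1. Reused random seeds, for example processes or virtual machines cloned from the same starting state
  2. Poor library implementations
  3. Random generators that are not cryptographically secure or do not have enough entropy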

3. What can you do if collisions still worry you?

If you operate in a domain where collision risk matters a lot, such as finance, accounting, or other high-reliability systems, there are at least two practical safeguards:

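  1. Enforce uniqueness at the database layer with a primary key or unique constraint, so a duplicate is rejected instead of silently stored
  2. Use a time-ordered format such as UUID v7 or ULID when it fits the use case, since two values can then collide only if they are generated within the same millisecond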
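The first safeguard can be sketched as follows: the database rejects a duplicate key, and the application simply generates a new ID and tries again. The example uses sqlite3 so that it is self-contained; the `payments` table, the `insert_payment` helper, and the deliberately broken ID source are all hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)"
)


def insert_payment(amount_cents, new_id=lambda: str(uuid.uuid4()), max_attempts=3):
    """Insert a row, generating a fresh ID if the database reports a duplicate."""
    for _ in range(max_attempts):
        candidate = new_id()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO payments (id, amount_cents) VALUES (?, ?)",
                    (candidate, amount_cents),
                )
            return candidate
        except sqlite3.IntegrityError:
            # In this table the only constraint that can fail for valid input
            # is the primary key, so a retry with a new ID is appropriate.
            continue
    raise RuntimeError("could not generate a unique ID")


# Simulate a broken generator that hands out the same value twice.
canned = iter(["dup", "dup", "fresh"])
first = insert_payment(100, new_id=lambda: next(canned))
second = insert_payment(200, new_id=lambda: next(canned))
print(first, second)  # prints: dup fresh
```

With a healthy random source the retry branch should essentially never run, so a retry that does fire is worth logging as a sign that the generator is misbehaving.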

If you use auto increment IDs, how do you stop people from guessing them?

Many teams avoid auto increment IDs not because of performance, but because sequential values are easy to guess.

For example, if a URL is /user/1001, users will naturally try /user/1002 or /user/1003. But the real issue is not that a primary key is sequential. The real issue is that authorization should never rely only on a primary key value.

If your goal is just to make external IDs less predictable, you can consider:

  1. Not exposing internal table primary keys directly
  2. Providing a separate public-facing ID
  3. Using Hashids, ULID, or UUID as an external identifier when appropriate

Predictability leaks more than individual records. In an e-commerce system, if order numbers are exposed directly as sequential values, people can often estimate how many orders the business receives in a day just by comparing IDs issued at different times.

Hashids is one of the most widely used libraries for this purpose, with implementations in PHP, Python, JavaScript, Go, Java, and many other languages. It converts an integer ID into a short string by using a salt, for example 1 -> jR. The encoding is obfuscation rather than encryption: if the salt is weak or enough samples are collected, the original pattern may still be inferred.

It is worth stressing that these techniques improve obscurity, not authorization. Real security still depends on proper backend access control.
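A minimal sketch of both ideas together, assuming a hypothetical `orders` table: the internal primary key stays an auto-incrementing integer, a separate random `public_id` is what leaves the system, and every lookup still checks ownership. The helper names and the choice of a random UUID as the public identifier are assumptions made for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal only
        public_id   TEXT NOT NULL UNIQUE,               -- exposed in URLs and APIs
        owner_id    INTEGER NOT NULL,
        total_cents INTEGER NOT NULL
    )
    """
)


def create_order(owner_id: int, total_cents: int) -> str:
    """Store an order and return only its public identifier."""
    public_id = uuid.uuid4().hex
    with conn:
        conn.execute(
            "INSERT INTO orders (public_id, owner_id, total_cents) VALUES (?, ?, ?)",
            (public_id, owner_id, total_cents),
        )
    return public_id


def get_order(public_id: str, requesting_user_id: int):
    """Return the order only if the requester owns it."""
    row = conn.execute(
        "SELECT owner_id, total_cents FROM orders WHERE public_id = ?",
        (public_id,),
    ).fetchone()
    # Same answer for "does not exist" and "not yours", so nothing leaks.
    if row is None or row[0] != requesting_user_id:
        return None
    return {"public_id": public_id, "total_cents": row[1]}


order_ref = create_order(owner_id=1, total_cents=500)
print(get_order(order_ref, requesting_user_id=1) is not None)  # prints: True
print(get_order(order_ref, requesting_user_id=2))              # prints: None
```

The ownership check is what provides the protection. The random public ID only removes the ability to enumerate or count records.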

How should you choose in practice?

If you want a short summary, the rules of thumb are usually these:

  1. For a monolith with one database and strong performance requirements, prefer auto increment IDs
  2. For distributed systems, multi-service architectures, or workflows that need IDs before persistence, consider UUIDs
  3. Do not use business-meaningful or sensitive fields directly as the primary key

The ideal primary key is stable, simple, free of business meaning, and unlikely to force a schema rewrite when requirements change.

In many cases, the better design is to keep the business-meaningful field as a separate column with a unique constraint instead of making it the primary key.
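The game account scenario from earlier can be sketched this way. All table, column, and index names are hypothetical, the Social Security number is an obviously fake placeholder, and SQLite is used only to keep the example runnable. The business rule "one account per person" lives in a unique index rather than in the primary key, so relaxing the rule later does not touch any key or foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE persons (
        id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no business meaning
        ssn TEXT NOT NULL UNIQUE                -- business field, still unique
    );
    CREATE TABLE game_accounts (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        person_id INTEGER NOT NULL REFERENCES persons(id),
        nickname  TEXT NOT NULL
    );
    -- Original business rule: at most one account per person.
    CREATE UNIQUE INDEX one_account_per_person ON game_accounts(person_id);
    """
)

INSERT_ACCOUNT = "INSERT INTO game_accounts (person_id, nickname) VALUES (?, ?)"

person_id = conn.execute(
    "INSERT INTO persons (ssn) VALUES (?)", ("000-00-0000",)
).lastrowid
conn.execute(INSERT_ACCOUNT, (person_id, "main"))

# Under the original rule, a second account is rejected.
try:
    conn.execute(INSERT_ACCOUNT, (person_id, "alt1"))
    second_allowed_before = True
except sqlite3.IntegrityError:
    second_allowed_before = False

# The rule changes: drop the index. Keys and foreign keys stay as they are.
conn.execute("DROP INDEX one_account_per_person")
for nickname in ("alt1", "alt2"):
    conn.execute(INSERT_ACCOUNT, (person_id, nickname))

account_count = conn.execute(
    "SELECT COUNT(*) FROM game_accounts WHERE person_id = ?", (person_id,)
).fetchone()[0]
print(second_allowed_before, account_count)  # prints: False 3
```

Had the Social Security number been the primary key of the accounts table, the same change would have required a new key and an update to every table that referenced it.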

This post is licensed under CC BY 4.0 by the author.