This document describes approach for generaiting a short, human friendly and unique entities identifiers without collisions.
API-resource IDs should not be easy predictable because of the following reasons:
-
Even though the proper permissions validation needs to be implemented to control API resource access, in case of a vulnerability a predictable ID format allows to leverage the vulnarebility for a much bigger set of resources via IDs enumeration. If by having set of IDs it's hard to predict what other IDs could have been in the system the vulnerability's impact gets descreased.
-
Sequential integer IDs allow to estimate number of entities in the system which also may be beneficial for the competitors.
In order to address the challeges listed above a radnom ID generation may be used like UUID or just a random string of a certain length in the given alphabet.
But leveraging this approach requires IDs length to be increased in order to
avoid collisions during ID generation. If there are
This follows from the following fact: there are
In fact this is exactly well known Birthday Problem and the probability formula has approximation:
The probability to meet at least one collision during
Approximation significantly simplifies this probability calculation for the big
values of
In order to make IDs easily readable and writable by humans the ID alphabet must
be limited to exclude the characters than may seem pretty similar in some fonts
and thus may cause confusion. Numbers
It has
If an entity representation is stored in a PostgreSQL table with an integer
primary key there are not more than
From the above follows, for
From the other side to represent any of
This means the ID length needs to be almost doubled in order to avoid collisions during random generations which makes such approach not very human friendly because a 7-characters IDs is obviously easier to type instead of a 13-characters ID.
Bellow an alternative approach is proposed for IDs generation. This approach
is collisions free and therefore does not require excess characters so only
A sequence of numbers
Applying for a sequence a bijective function
One example of such a function is RSA-encryption.
Let's denote
Let
Then ID generation algorithm is the following:
Consequence 1: This approach may be applied to an existing records in a database withput any DDL: IDs may be calculated "on the fly" during API calls without storing the encrypted IDs in a database.
Note 1: Consequence 1 is applicable only if there is no need to do a prefix search for IDs. In that case because of the randomization IDs would be better to be stored in a database explicitly.
Note 2: The approach proposed uses RSA encryption but this is not the only
possible option. Other asymmetric encryptions could also work. The symmetric
algorithms (DES, AES) use a certain block sizes (48 bit and more) and because
of that could not provide the necessary short ID length (
Python implementation example is provided in this file. It
should be executed as a command line script (python src/genid.py
) and provides
the following subcommands:
- inc -- generate RSA-encrypted sequence numbers (IDs)
- key -- print possible
$(e, d)$ key-paris - primes -- list prime numbers in the given range
- rng -- generate random 14-characters IDs, can be used to compare generation performance
This approach can also we applied to an existing database data quite easily: either use Consequence 1, or store the IDs generated in a column, this folder together with Justfile (see Justfile docs for more details about the tool) provides examples for that, to run them execute the following commands:
Run a PostgreSQL container docker compose up -dDuring startup it will create 2 tables in the main database with 10k rows in each. The table demo_labtest is used for integer PK table example, the table demo_assesement is used for uuid PK table example.
Apply database migrations just migrateIt will provide the necessary columns and functions.
Insert more data just insert demo_labtest just insert demo_assesementCheck results just psqlSelect table rows to check.
Note 3: There are several options to choose public and private keys