Generate CUIDs in Python

A CUID (Collision-resistant Unique Identifier) is built by combining several components: the current timestamp, a counter, a machine fingerprint, and random characters. Below is an example implementation of CUID generation in Python.

Steps to Generate a CUID in Python

  1. Timestamp: Get the current timestamp in milliseconds.
  2. Counter: Use a counter to distinguish CUIDs generated within the same millisecond.
  3. Fingerprint: Generate a machine-specific fingerprint.
  4. Randomness: Add random characters to further reduce the risk of collisions.
  5. Base36 Encoding: Encode the numeric components (timestamp and counter) in Base36 to keep the ID compact.

Example Code

Here's a Python implementation to generate a CUID:

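import hashlib
import random
import socket
import time

BASE36_CHARS = '0123456789abcdefghijklmnopqrstuvwxyz'

# Counter state is kept at module level so it persists between calls.
# (Defined inside the function, it would be reset on every call.)
_counter = 0
_last_timestamp = 0


def encode_base36(value):
    """Convert a non-negative integer to a Base36 string."""
    if value == 0:
        return '0'
    result = ''
    while value > 0:
        result = BASE36_CHARS[value % 36] + result
        value //= 36
    return result


def get_machine_fingerprint():
    """Return the first 4 hex characters of the MD5 hash of the hostname."""
    try:
        hostname = socket.gethostname()
        hash_value = hashlib.md5(hostname.encode('utf-8')).hexdigest()
        return hash_value[:4]
    except Exception:
        # Fall back to a fixed fingerprint if the hostname is unavailable.
        return '0000'


def get_random_string(length):
    """Return a string of `length` random Base36 characters."""
    return ''.join(random.choice(BASE36_CHARS) for _ in range(length))


def generate_cuid():
    """Build a CUID from timestamp, counter, fingerprint, and random parts."""
    global _counter, _last_timestamp

    timestamp = int(time.time() * 1000)  # current time in milliseconds

    # Increment the counter for IDs created within the same millisecond.
    if timestamp == _last_timestamp:
        _counter += 1
    else:
        _last_timestamp = timestamp
        _counter = 0

    timestamp_base36 = encode_base36(timestamp)
    counter_base36 = encode_base36(_counter)
    fingerprint = get_machine_fingerprint()
    random_string = get_random_string(4)

    return f'c{timestamp_base36}{counter_base36}{fingerprint}{random_string}'


# Example usage
print(generate_cuid())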
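
Because the components are concatenated in a fixed order, an ID can also be split back into its parts. The following is a minimal sketch rather than part of the implementation above: parse_cuid is a hypothetical helper, and it assumes the layout c + timestamp + counter + fingerprint + random with an 8-character Base36 timestamp (which holds for millisecond timestamps between roughly 1972 and 2059) and 4-character fingerprint and random parts.

```python
from datetime import datetime, timezone


def parse_cuid(cuid):
    """Split an ID laid out as c + timestamp(8) + counter + fingerprint(4) + random(4)."""
    if not cuid.startswith('c') or len(cuid) < 18:
        raise ValueError(f'not a CUID in the expected layout: {cuid!r}')
    timestamp_ms = int(cuid[1:9], 36)  # int(text, 36) decodes Base36
    counter = int(cuid[9:-8], 36)      # variable-length field in the middle
    return {
        'timestamp_ms': timestamp_ms,
        'created_at': datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        'counter': counter,
        'fingerprint': cuid[-8:-4],
        'random': cuid[-4:],
    }


# Hand-built sample: timestamp "lp000000", counter "a", fingerprint "1f3b", random "x9k2"
parts = parse_cuid('clp000000a1f3bx9k2')
print(parts['counter'], parts['fingerprint'], parts['random'])  # prints: 10 1f3b x9k2
```

Slicing the fixed-width fields from both ends is what lets the counter, the only variable-length field, be recovered from the middle.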

Explanation

  1. Base36 Encoding:
    • The encode_base36 function converts a non-negative integer to a Base36 string.
  2. Timestamp:
    • timestamp = int(time.time() * 1000) gets the current timestamp in milliseconds.
  3. Counter:
    • A counter distinguishes CUIDs generated within the same millisecond. It increments when the timestamp matches that of the previous CUID and resets to 0 otherwise, so the counter and the last timestamp must persist between calls.
  4. Machine Fingerprint:
    • get_machine_fingerprint hashes the machine's hostname with MD5 and keeps the first four hexadecimal characters of the digest, falling back to '0000' if that fails.
  5. Random String:
    • get_random_string generates a random string of the specified length using Base36 characters.
  6. Combine Components:
    • The generate_cuid function concatenates the letter c, the timestamp, the counter, the fingerprint, and the random part to form the final CUID.
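
The counter described in step 3 only helps if its state is shared safely between callers. One possible way to arrange that, sketched here under assumptions, is to hold the state in an object guarded by a lock. CuidGenerator and to_base36 are hypothetical names introduced for illustration, and the sketch departs from the steps above in three ways: it swaps random for the standard-library secrets module, pads the counter to four characters, and keeps counting if the system clock steps backwards.

```python
import hashlib
import secrets
import socket
import threading
import time

BASE36_CHARS = '0123456789abcdefghijklmnopqrstuvwxyz'


def to_base36(value):
    """Encode a non-negative integer as a Base36 string."""
    digits = ''
    while True:
        value, remainder = divmod(value, 36)
        digits = BASE36_CHARS[remainder] + digits
        if value == 0:
            return digits


class CuidGenerator:
    """Hypothetical thread-safe generator; not part of any CUID library."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_timestamp = 0
        self._counter = 0
        # Compute the fingerprint once instead of on every call.
        try:
            hostname = socket.gethostname()
        except OSError:
            hostname = ''
        self._fingerprint = hashlib.md5(hostname.encode('utf-8')).hexdigest()[:4]

    def generate(self):
        now = int(time.time() * 1000)
        with self._lock:
            if now <= self._last_timestamp:
                # Same millisecond, or the clock stepped back: reuse the
                # last timestamp and advance the counter.
                self._counter += 1
            else:
                self._last_timestamp = now
                self._counter = 0
            timestamp, counter = self._last_timestamp, self._counter
        random_part = ''.join(secrets.choice(BASE36_CHARS) for _ in range(4))
        # zfill pads the counter to at least 4 characters.
        counter_part = to_base36(counter).zfill(4)
        return f'c{to_base36(timestamp)}{counter_part}{self._fingerprint}{random_part}'


generator = CuidGenerator()
ids = [generator.generate() for _ in range(10_000)]
print(len(set(ids)) == len(ids))  # prints: True
```

Within a single generator instance, every ID carries a distinct timestamp and counter pair, so uniqueness inside one process does not depend on the random part. Across processes or machines it still does.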

Summary

This Python implementation of CUID generation uses only standard-library modules such as time, hashlib, random, and socket to handle timestamps, hashing, randomness, and the machine fingerprint. The resulting CUIDs are compact and collision-resistant, which makes them suitable for many applications, but they are not guaranteed to be unique: two IDs collide if they share the same timestamp, counter, fingerprint, and random string. Adjustments can be made based on specific requirements, such as changing how a failed hostname lookup is handled or lengthening the random string portion.
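
To get a feel for how the length of the random string affects collision risk, the birthday approximation gives a rough estimate. The sketch below is illustrative only: collision_probability is a hypothetical helper, and it assumes a group of IDs that agree on every other component (for example, IDs created in the same millisecond by separate processes on one host) with uniformly random Base36 suffixes.

```python
import math


def collision_probability(id_count, random_length, alphabet_size=36):
    """Approximate chance that at least two of `id_count` IDs share a random suffix."""
    space = alphabet_size ** random_length   # number of possible suffixes
    pairs = id_count * (id_count - 1) / 2    # number of ID pairs that could collide
    # Birthday approximation: 1 - exp(-pairs / space), computed stably with expm1.
    return -math.expm1(-pairs / space)


for length in (4, 6, 8):
    print(length, f'{collision_probability(100, length):.2e}')
```

With 100 such IDs and a 4-character suffix the estimate is about 0.3%, and each additional character divides it by roughly 36.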