
Non-Comparison Sorts: Counting, Radix, Bucket

Comparison-based sorting has a proven lower bound of Ω(n log n) comparisons. But when your data has special properties (bounded integers, fixed-length keys, or uniform distributions), non-comparison sorts can achieve O(n) time. These algorithms are useful for feature binning, histogram construction, and high-throughput data processing in ML.

1. Counting Sort

Counting sort works by counting the occurrences of each value and using cumulative counts to place elements directly. It runs in O(n + k) time where k is the range of input values.

Property    Value
Time        O(n + k), where k = range of values
Space       O(n + k)
Stable?     Yes (when implemented correctly)
Constraint  Only works for non-negative integers (or values mappable to non-negative integers)

def counting_sort(arr: list, max_val: int = None) -> list:
    """Stable counting sort for non-negative integers.

    Time:  O(n + k) where k = max value
    Space: O(n + k)

    How it works:
    1. Count occurrences of each value
    2. Compute cumulative counts (prefix sum)
    3. Place elements in output using cumulative counts
    """
    if not arr:
        return []

    if max_val is None:
        max_val = max(arr)

    # Step 1: Count occurrences
    count = [0] * (max_val + 1)
    for val in arr:
        count[val] += 1

    # Step 2: Cumulative count (prefix sum)
    # count[i] now = number of elements <= i
    for i in range(1, len(count)):
        count[i] += count[i - 1]

    # Step 3: Build output array (iterate backwards for stability)
    output = [0] * len(arr)
    for i in range(len(arr) - 1, -1, -1):
        val = arr[i]
        count[val] -= 1
        output[count[val]] = val

    return output

# Example: sort user ratings (0-5 scale)
ratings = [3, 1, 4, 1, 5, 2, 3, 5, 3, 2, 1, 4]
sorted_ratings = counting_sort(ratings, max_val=5)
print(sorted_ratings)  # [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
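The stability noted in the table above matters once you sort records rather than bare integers. As a quick illustration (the helper name and sample data here are ours, not from the text), a counting sort over (user, rating) pairs keeps tied ratings in their original order:

```python
def counting_sort_records(records, key, max_key):
    """Stable counting sort of records by an integer key in [0, max_key]."""
    count = [0] * (max_key + 1)
    for rec in records:
        count[key(rec)] += 1
    for i in range(1, len(count)):      # prefix sums: count[i] = # of keys <= i
        count[i] += count[i - 1]
    output = [None] * len(records)
    for rec in reversed(records):       # backwards pass preserves stability
        count[key(rec)] -= 1
        output[count[key(rec)]] = rec
    return output

reviews = [("alice", 3), ("bob", 1), ("carol", 3), ("dave", 1)]
by_rating = counting_sort_records(reviews, key=lambda r: r[1], max_key=5)
print(by_rating)
# Ties keep their original order: bob before dave, alice before carol
```

This stability is exactly what makes counting sort usable as the per-digit subroutine in radix sort.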

ML Application: Feature Binning with Counting Sort

Counting sort is perfect for discretizing continuous features into bins. When you convert float features to integer bin indices, counting sort processes them in O(n) instead of O(n log n).

def bin_features(values: list, num_bins: int) -> dict:
    """Bin continuous features into equal-width bins using counting sort.

    ML use case: Feature discretization for gradient boosting trees
    (XGBoost, LightGBM use histogram-based binning internally).

    Time:  O(n + num_bins)
    Space: O(n + num_bins)
    """
    if not values:
        return {}

    min_val, max_val = min(values), max(values)
    bin_width = (max_val - min_val) / num_bins if max_val != min_val else 1

    # Map each value to a bin index
    bin_indices = []
    for v in values:
        idx = min(int((v - min_val) / bin_width), num_bins - 1)
        bin_indices.append(idx)

    # Count sort the bin indices
    count = [0] * num_bins
    for idx in bin_indices:
        count[idx] += 1

    # Build histogram
    histogram = {}
    for i in range(num_bins):
        bin_start = min_val + i * bin_width
        bin_end = bin_start + bin_width
        histogram[f"[{bin_start:.2f}, {bin_end:.2f})"] = count[i]

    return histogram

# Example: bin model confidence scores
scores = [0.12, 0.45, 0.67, 0.89, 0.23, 0.91, 0.34, 0.78, 0.56, 0.95]
hist = bin_features(scores, num_bins=5)
for bin_range, count in hist.items():
    print(f"  {bin_range}: {count}")
# Output:
#   [0.12, 0.29): 2
#   [0.29, 0.45): 2
#   [0.45, 0.62): 1
#   [0.62, 0.78): 2
#   [0.78, 0.95): 3

2. Radix Sort

Radix sort processes digits from least significant to most significant, using a stable sort (usually counting sort) as a subroutine for each digit position. It runs in O(d(n + k)) where d is the number of digits and k is the base.

Property    Value
Time        O(d(n + k)), where d = digits, k = base (10 for decimal)
Space       O(n + k)
Stable?     Yes
Constraint  Works for integers or fixed-length strings

def radix_sort(arr: list) -> list:
    """LSD Radix Sort for non-negative integers.

    Time:  O(d * (n + 10)) where d = number of digits
    Space: O(n + 10)

    Processes digits from least significant to most significant,
    using counting sort as a subroutine for each digit.
    """
    if not arr:
        return []

    max_val = max(arr)
    result = arr.copy()
    exp = 1  # Current digit position (1s, 10s, 100s, ...)

    while max_val // exp > 0:
        result = counting_sort_by_digit(result, exp)
        exp *= 10

    return result

def counting_sort_by_digit(arr: list, exp: int) -> list:
    """Stable counting sort on a specific digit position.

    exp=1 sorts by ones digit, exp=10 by tens, etc.
    """
    n = len(arr)
    output = [0] * n
    count = [0] * 10  # Digits 0-9

    # Count occurrences of each digit
    for val in arr:
        digit = (val // exp) % 10
        count[digit] += 1

    # Cumulative count
    for i in range(1, 10):
        count[i] += count[i - 1]

    # Build output (backwards for stability)
    for i in range(n - 1, -1, -1):
        digit = (arr[i] // exp) % 10
        count[digit] -= 1
        output[count[digit]] = arr[i]

    return output

# Example: sort user IDs
user_ids = [170, 45, 75, 90, 802, 24, 2, 66]
sorted_ids = radix_sort(user_ids)
print(sorted_ids)  # [2, 24, 45, 66, 75, 90, 170, 802]
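The constraint row above also mentions fixed-length strings. A minimal sketch of that case (the function name is illustrative): one stable counting pass per character position, moving from the rightmost character to the leftmost.

```python
def radix_sort_strings(words):
    """LSD radix sort for equal-length ASCII strings.

    Sorts by character position from rightmost to leftmost,
    doing a stable counting sort on each position.
    """
    if not words:
        return []
    width = len(words[0])
    assert all(len(w) == width for w in words), "strings must be equal length"
    result = list(words)
    for pos in range(width - 1, -1, -1):
        count = [0] * 256                       # one bucket per byte value
        for w in result:
            count[ord(w[pos])] += 1
        for i in range(1, 256):                 # prefix sums
            count[i] += count[i - 1]
        output = [None] * len(result)
        for w in reversed(result):              # backwards pass for stability
            count[ord(w[pos])] -= 1
            output[count[ord(w[pos])]] = w
        result = output
    return result

print(radix_sort_strings(["dab", "cab", "fad", "bad", "ace"]))
# ['ace', 'bad', 'cab', 'dab', 'fad']
```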

ML Application: Sorting Retrieval Results by Quantized Score

def sort_retrieval_results(doc_ids: list, scores: list) -> list:
    """Sort retrieval results by quantized score using radix sort.

    In retrieval systems, scores are often quantized to integers
    (e.g., BM25 scores * 1000). Radix sort handles millions of
    results faster than comparison sort.

    Time:  O(d * n) where d = digits in max score
    Space: O(n)
    """
    # Pair (quantized_score, doc_id) and sort by score
    n = len(doc_ids)
    paired = list(zip(scores, doc_ids))

    # Sort by score using radix sort on quantized values
    max_score = max(scores) if scores else 0
    exp = 1

    while max_score // exp > 0:
        # Counting sort by current digit
        output = [None] * n
        count = [0] * 10

        for score, doc_id in paired:
            digit = (score // exp) % 10
            count[digit] += 1

        for i in range(1, 10):
            count[i] += count[i - 1]

        for i in range(n - 1, -1, -1):
            digit = (paired[i][0] // exp) % 10
            count[digit] -= 1
            output[count[digit]] = paired[i]

        paired = output
        exp *= 10

    # Return (doc_id, score) pairs sorted by score (ascending)
    return [(doc_id, score) for score, doc_id in paired]

# Example: BM25 retrieval results
doc_ids = [101, 205, 303, 412, 509, 678]
bm25_scores = [2340, 1876, 3102, 987, 2340, 1543]
results = sort_retrieval_results(doc_ids, bm25_scores)
print("Sorted by BM25 score:")
for doc_id, score in results:
    print(f"  Doc {doc_id}: {score}")
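The radix sort shown here assumes non-negative integers. A common extension for signed data (a sketch of ours, not part of the original code) shifts everything by the minimum value, sorts, then shifts back:

```python
def radix_sort_signed(arr):
    """LSD radix sort extended to negative integers.

    Shifts values into a non-negative range, sorts, shifts back.
    """
    if not arr:
        return []
    offset = min(arr)
    shifted = [v - offset for v in arr]         # now all values >= 0
    max_val, exp, result = max(shifted), 1, shifted
    while max_val // exp > 0:                   # one pass per decimal digit
        count = [0] * 10
        for v in result:
            count[(v // exp) % 10] += 1
        for i in range(1, 10):                  # prefix sums
            count[i] += count[i - 1]
        output = [0] * len(result)
        for v in reversed(result):              # backwards pass for stability
            d = (v // exp) % 10
            count[d] -= 1
            output[count[d]] = v
        result = output
        exp *= 10
    return [v + offset for v in result]         # undo the shift

print(radix_sort_signed([3, -12, 0, 7, -1]))  # [-12, -1, 0, 3, 7]
```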

3. Bucket Sort

Bucket sort distributes elements into buckets based on their value range, sorts each bucket individually, then concatenates. It achieves O(n) average time when elements are uniformly distributed.

Property      Value
Average Time  O(n + k) when data is uniformly distributed
Worst Time    O(n²) when all elements land in one bucket
Space         O(n + k), where k = number of buckets
Stable?       Yes (if the inner sort is stable)

def bucket_sort(arr: list, num_buckets: int = 10) -> list:
    """Bucket sort for floating-point numbers in [0, 1).

    Time:  O(n) average when data is uniformly distributed
    Space: O(n + num_buckets)

    Perfect for sorting probability scores, normalized features,
    and other float data in [0, 1) range.
    """
    if not arr:
        return []

    # Create empty buckets
    buckets = [[] for _ in range(num_buckets)]

    # Distribute elements into buckets
    for val in arr:
        bucket_idx = min(int(val * num_buckets), num_buckets - 1)
        buckets[bucket_idx].append(val)

    # Sort each bucket individually (buckets stay small on average)
    for bucket in buckets:
        bucket.sort()  # Python's built-in Timsort; cheap on small buckets

    # Concatenate all buckets
    result = []
    for bucket in buckets:
        result.extend(bucket)

    return result

# Example: sort model prediction probabilities
probabilities = [0.78, 0.12, 0.95, 0.34, 0.67, 0.89, 0.23, 0.56, 0.91, 0.45]
sorted_probs = bucket_sort(probabilities)
print(sorted_probs)
# [0.12, 0.23, 0.34, 0.45, 0.56, 0.67, 0.78, 0.89, 0.91, 0.95]
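The O(n²) worst case from the table earlier is worth seeing concretely. This small sketch (helper name is ours) counts how many elements land in each bucket for a uniform input versus a skewed one:

```python
import random

def bucket_fill_counts(arr, num_buckets=10):
    """Count how many elements land in each bucket for values in [0, 1)."""
    counts = [0] * num_buckets
    for val in arr:
        counts[min(int(val * num_buckets), num_buckets - 1)] += 1
    return counts

uniform = [random.random() for _ in range(1000)]
skewed = [random.random() * 0.1 for _ in range(1000)]   # everything in [0, 0.1)

print(bucket_fill_counts(uniform))  # roughly 100 per bucket
print(bucket_fill_counts(skewed))   # all 1000 pile into bucket 0
```

When the data is skewed, one bucket holds nearly everything, and sorting that bucket dominates the runtime.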

Generalized Bucket Sort for Any Range

def bucket_sort_general(arr: list, num_buckets: int = 10) -> list:
    """Generalized bucket sort for any numeric range.

    Maps values to [0, 1) range then applies bucket sort.

    Time:  O(n) average
    Space: O(n + num_buckets)
    """
    if len(arr) <= 1:
        return arr.copy()

    min_val = min(arr)
    max_val = max(arr)

    if min_val == max_val:
        return arr.copy()

    range_val = max_val - min_val
    buckets = [[] for _ in range(num_buckets)]

    for val in arr:
        # Normalize to [0, 1) then map to bucket
        normalized = (val - min_val) / range_val
        bucket_idx = min(int(normalized * num_buckets), num_buckets - 1)
        buckets[bucket_idx].append(val)

    result = []
    for bucket in buckets:
        bucket.sort()
        result.extend(bucket)

    return result

# ML Application: Sort feature values for percentile computation
feature_values = [23.5, 67.8, 12.1, 89.3, 45.6, 34.2, 78.9, 56.7, 91.0, 15.4]
sorted_features = bucket_sort_general(feature_values)
n = len(sorted_features)
percentiles = {
    "p25": sorted_features[n // 4],
    "p50": sorted_features[n // 2],
    "p75": sorted_features[3 * n // 4],
    "p90": sorted_features[int(0.9 * n)]
}
print(f"Feature percentiles: {percentiles}")

When to Use Non-Comparison Sorts

  • Counting Sort: small integer range (k << n²), stability needed. ML applications: sorting categorical features, label distributions, rating histograms.
  • Radix Sort: fixed-length integers, strings, or keys. ML applications: sorting embedding indices, IP addresses, document IDs at scale.
  • Bucket Sort: uniformly distributed floats or continuous data. ML applications: sorting probability scores, normalized features, percentile computation.
Common mistake: using counting sort when k (the value range) is huge. Sorting 1,000 elements with values up to 10 billion would require a 10-billion-entry count array. Use radix sort instead: each pass handles one decimal digit with a size-10 count array and an O(n) output buffer, so roughly 10 passes sort the data in O(d * (n + 10)) time with only O(n) extra space.
💡 Key takeaways:
  • Non-comparison sorts break the O(n log n) barrier by exploiting data structure (integers, bounded range, uniform distribution)
  • Counting sort is ideal when k (value range) is O(n) — directly applicable to categorical features and small integer data
  • Radix sort handles large integers by processing one digit at a time — used in database index sorting and log processing
  • Bucket sort is perfect for uniformly distributed floats — exactly the distribution of probability scores from sigmoid/softmax outputs
  • XGBoost and LightGBM use histogram-based binning internally, which is essentially counting sort on discretized features