Efficiently Remove Duplicate Rows from a 2D NumPy Array
Suppose you have a 2D NumPy array and you want to remove duplicate rows (or columns). In this blog post, I'll show you a trick you can use to do this more efficiently than using np.unique(A, axis=0). This algorithm has time complexity $O(\max(n \log n, nm))$ for an $n \times m$ matrix, and works almost surely. By "almost surely" I mean that it is a randomized algorithm that works correctly with probability 1.

To find the unique rows of a matrix A, the algorithm generates a random vector x of real numbers, computes the product y = A x, and then finds the unique elements of y. The indices of the unique elements of y are the same as the indices of the unique rows of A with probability 1. Identical rows always produce identical entries of y, while two distinct rows a_i and a_j produce the same entry only if x satisfies (a_i - a_j) . x = 0, an event that has probability zero when x is drawn from a continuous distribution.

Below is an example to demonstrate how it works:

> import numpy as np
> A = np.random.randint(low=0, high=3, size=(10, 2))
> A
array([[2, 1],
       [1, 2],
       [0, 0],
       [2, 2],
       [1, 2],
       [0, 0],
       [0, 2],
       ...
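
The procedure described above (draw a random x, compute y = A x, take the unique entries of y) might be sketched as the following function. This is a minimal sketch rather than the post's own code: the function name `unique_rows_random_projection`, the `rng` parameter, the choice of a standard normal distribution for x, and the small fixed matrix are all assumptions made for illustration. Note also that the probability-1 argument is about real numbers, so in floating-point arithmetic it holds only approximately.

```python
import numpy as np


def unique_rows_random_projection(A, rng=None):
    """Return sorted indices of the first occurrence of each distinct row of A.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)  # accepts None, an int seed, or a Generator
    A = np.asarray(A)
    # Random real vector with one entry per column (assumed: standard normal).
    x = rng.standard_normal(A.shape[1])
    # Project every row onto x; this costs O(n m).
    y = A @ x
    # Unique entries of the 1D vector y; sorting costs O(n log n).
    # return_index gives the index of the first occurrence of each value.
    _, idx = np.unique(y, return_index=True)
    # Sort so the surviving rows keep their original order.
    return np.sort(idx)


# Small fixed matrix (an assumption for illustration) with repeated rows.
A = np.array([[2, 1], [1, 2], [0, 0], [2, 2], [1, 2], [0, 0], [0, 2]])
idx = unique_rows_random_projection(A, rng=0)
unique_A = A[idx]
print(idx)
print(unique_A)
```

Unlike np.unique(A, axis=0), which returns the rows in sorted order, this sketch keeps the rows in their order of first appearance. To remove duplicate columns instead, the same function can be applied to A.T.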