Numpy shuffle multidimensional array by row only, keep column order unchanged
How can I shuffle a multidimensional array by row only in Python (that is, without shuffling the columns)?
I am looking for the most efficient solution, because my matrix is very large. Is it also possible to do this efficiently in place on the original array (to save memory)?
Example:
import numpy as np

X = np.random.random((6, 2))
print(X)
Y = ???shuffle by row only, not cols???
print(Y)
Given this original matrix:
[[ 0.48252164  0.12013048]
 [ 0.77254355  0.74382174]
 [ 0.45174186  0.8782033 ]
 [ 0.75623083  0.71763107]
 [ 0.26809253  0.75144034]
 [ 0.23442518  0.39031414]]
the output should have the rows shuffled but not the columns, e.g.:
[[ 0.45174186  0.8782033 ]
 [ 0.48252164  0.12013048]
 [ 0.77254355  0.74382174]
 [ 0.75623083  0.71763107]
 [ 0.23442518  0.39031414]
 [ 0.26809253  0.75144034]]
That's what numpy.random.shuffle() is for:
>>> X = np.random.random((6, 2))
>>> X
array([[ 0.9818058 ,  0.67513579],
       [ 0.82312674,  0.82768118],
       [ 0.29468324,  0.59305925],
       [ 0.25731731,  0.16676408],
       [ 0.27402974,  0.55215778],
       [ 0.44323485,  0.78779887]])
>>> np.random.shuffle(X)
>>> X
array([[ 0.9818058 ,  0.67513579],
       [ 0.44323485,  0.78779887],
       [ 0.82312674,  0.82768118],
       [ 0.29468324,  0.59305925],
       [ 0.25731731,  0.16676408],
       [ 0.27402974,  0.55215778]])
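For a reproducible run, the same in-place row shuffle can be sketched with a seeded generator. The use of np.random.default_rng (NumPy 1.17 or later), the seed value, and the small integer matrix are assumptions made for illustration, not part of the answer above. Note that shuffle modifies the array in place and returns None, so its result should not be assigned back to X.

```python
import numpy as np

rng = np.random.default_rng(0)        # seed 0 is an arbitrary choice for reproducibility
X = np.arange(12).reshape(6, 2)       # rows are [0, 1], [2, 3], ..., [10, 11]
original = X.copy()

result = rng.shuffle(X)               # reorders the rows of X in place (axis 0)

print(result)                         # shuffle returns None
print(X)                              # same rows as before, in a new order
```

Because each row is moved as a unit, the pairing of values inside a row is preserved.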
You can also use np.random.permutation to generate a random permutation of row indices and then index into the rows of X using np.take with axis=0. Also, np.take facilitates overwriting the input array X itself with the out= option, which would save us memory. Thus, the implementation would look like this:
np.take(X,np.random.permutation(X.shape[0]),axis=0,out=X)
Sample run 
In [23]: X
Out[23]:
array([[ 0.60511059,  0.75001599],
       [ 0.30968339,  0.09162172],
       [ 0.14673218,  0.09089028],
       [ 0.31663128,  0.10000309],
       [ 0.0957233 ,  0.96210485],
       [ 0.56843186,  0.36654023]])
In [24]: np.take(X,np.random.permutation(X.shape[0]),axis=0,out=X);
In [25]: X
Out[25]:
array([[ 0.14673218,  0.09089028],
       [ 0.31663128,  0.10000309],
       [ 0.30968339,  0.09162172],
       [ 0.56843186,  0.36654023],
       [ 0.0957233 ,  0.96210485],
       [ 0.60511059,  0.75001599]])
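A small check of what the np.take call does may help: row i of the result is row idx[i] of the original array. This is only a sketch; the seeded generator, the variable names idx and before, and the small test matrix are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed, for reproducibility only
X = np.arange(12, dtype=float).reshape(6, 2)
before = X.copy()                              # kept only so the result can be checked

idx = rng.permutation(X.shape[0])              # random ordering of the row indices 0..5
np.take(X, idx, axis=0, out=X)                 # write the reordered rows back into X

# Row i of X is now row idx[i] of the original array.
matches = np.array_equal(X, before[idx])
print(matches)
```

The copy named before exists only for the comparison; the shuffle itself does not need it.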
Additional performance boost
Here's a trick to speed up np.random.permutation(X.shape[0]) with np.argsort():
np.random.rand(X.shape[0]).argsort()
Speedup results 
In [32]: X = np.random.random((6000, 2000))
In [33]: %timeit np.random.permutation(X.shape[0])
1000 loops, best of 3: 510 µs per loop
In [34]: %timeit np.random.rand(X.shape[0]).argsort()
1000 loops, best of 3: 297 µs per loop
Thus, the shuffling solution could be modified to 
np.take(X,np.random.rand(X.shape[0]).argsort(),axis=0,out=X)
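The reason the argsort trick works is that sorting n random floats and recording where each one came from yields every index from 0 to n-1 exactly once, in random order. A minimal sketch of that check follows; the value of n and the variable names are arbitrary choices for illustration.

```python
import numpy as np

n = 10
idx = np.random.rand(n).argsort()     # positions that would sort n random floats

# A valid permutation contains every index 0..n-1 exactly once.
is_permutation = np.array_equal(np.sort(idx), np.arange(n))
print(idx, is_permutation)
```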
Runtime tests 
These tests include the two approaches listed in this post and the np.random.shuffle based one in @Kasramvd's solution.
In [40]: X = np.random.random((6000, 2000))
In [41]: %timeit np.random.shuffle(X)
10 loops, best of 3: 25.2 ms per loop
In [42]: %timeit np.take(X,np.random.permutation(X.shape[0]),axis=0,out=X)
10 loops, best of 3: 53.3 ms per loop
In [43]: %timeit np.take(X,np.random.rand(X.shape[0]).argsort(),axis=0,out=X)
10 loops, best of 3: 53.2 ms per loop
So it seems these np.take based approaches are worth using only if memory is a concern; otherwise the np.random.shuffle based solution looks like the way to go.
After a bit of experimenting, I found that the most memory- and time-efficient way to shuffle the rows of an ndarray is to shuffle the index and take the data from the shuffled index:
rand_num2 = np.random.randint(5, size=(6000, 2000))
perm = np.arange(rand_num2.shape[0])
np.random.shuffle(perm)
rand_num2 = rand_num2[perm]
In more detail: here I am using memory_profiler to measure memory usage and Python's built-in "time" module to record time, comparing all the previous answers.
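One detail worth keeping in mind with the index-based approach: indexing with an integer array (advanced indexing) returns a new array rather than a view, so the original data and the shuffled result briefly coexist in memory. The sketch below illustrates this on a small array; the seeded generator and the names data, perm and shuffled are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)            # arbitrary seed
data = np.arange(12).reshape(6, 2)

perm = rng.permutation(data.shape[0])     # shuffled row indices
shuffled = data[perm]                     # advanced indexing builds a new array

shares = np.shares_memory(shuffled, data)
print(shares)                             # reports whether the two arrays overlap in memory

# The inverse permutation restores the original row order.
restored = shuffled[np.argsort(perm)]
```

Keeping perm around also makes it possible to undo the shuffle later, as the last line shows.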
import time

import numpy as np


def main():
    # Shuffle the data itself
    rand_num = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    np.random.shuffle(rand_num)
    print('Time for direct shuffle: {0}'.format(time.time() - start))

    # Shuffle the index and get the data from the shuffled index
    rand_num2 = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    perm = np.arange(rand_num2.shape[0])
    np.random.shuffle(perm)
    rand_num2 = rand_num2[perm]
    print('Time for shuffling index: {0}'.format(time.time() - start))

    # Using np.take()
    rand_num3 = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    np.take(rand_num3, np.random.rand(rand_num3.shape[0]).argsort(), axis=0, out=rand_num3)
    print("Time taken by np.take, {0}".format(time.time() - start))


if __name__ == "__main__":
    main()
Result for Time
Time for direct shuffle: 0.03345608711242676     # 33.4 ms
Time for shuffling index: 0.019818782806396484   # 19.8 ms
Time taken by np.take, 0.06726956367492676       # 67.2 ms
Memory profiler Result
Line #    Mem usage    Increment   Line Contents
================================================
    39  117.422 MiB    0.000 MiB   @profile
    40                             def main():
    41                                 # shuffle data itself
    42  208.977 MiB   91.555 MiB       rand_num = np.random.randint(5, size=(6000, 2000))
    43  208.977 MiB    0.000 MiB       start = time.time()
    44  208.977 MiB    0.000 MiB       np.random.shuffle(rand_num)
    45  208.977 MiB    0.000 MiB       print('Time for direct shuffle: {0}'.format((time.time() - start)))
    46
    47                                 # Shuffle index and get data from shuffled index
    48  300.531 MiB   91.555 MiB       rand_num2 = np.random.randint(5, size=(6000, 2000))
    49  300.531 MiB    0.000 MiB       start = time.time()
    50  300.535 MiB    0.004 MiB       perm = np.arange(rand_num2.shape[0])
    51  300.539 MiB    0.004 MiB       np.random.shuffle(perm)
    52  300.539 MiB    0.000 MiB       rand_num2 = rand_num2[perm]
    53  300.539 MiB    0.000 MiB       print('Time for shuffling index: {0}'.format((time.time() - start)))
    54
    55                                 # using np.take()
    56  392.094 MiB   91.555 MiB       rand_num3 = np.random.randint(5, size=(6000, 2000))
    57  392.094 MiB    0.000 MiB       start = time.time()
    58  392.242 MiB    0.148 MiB       np.take(rand_num3, np.random.rand(rand_num3.shape[0]).argsort(), axis=0, out=rand_num3)
    59  392.242 MiB    0.000 MiB       print("Time taken by np.take, {0}".format((time.time() - start)))
From the documentation of numpy.random.shuffle(): "Modify a sequence in-place by shuffling its contents. This function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remains the same."
You can shuffle the elements within each row of a two-dimensional array A, independently for every row, using the np.vectorize() function:
shuffle = np.vectorize(np.random.permutation, signature='(n)->(n)')
A_shuffled = shuffle(A)
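To make clear what this vectorized call does, here is a small sketch. It permutes the entries inside each row separately, so rows stay in their original positions while the column order within each row changes. That is a different operation from reordering whole rows. The array A and the name shuffle_within_rows are assumptions chosen for illustration.

```python
import numpy as np

# Apply np.random.permutation to each 1-D row of a 2-D input.
shuffle_within_rows = np.vectorize(np.random.permutation, signature='(n)->(n)')

A = np.arange(12).reshape(3, 4)       # rows: [0..3], [4..7], [8..11]
A_shuffled = shuffle_within_rows(A)

print(A_shuffled)                     # each row holds the same values as before, reordered
```

If the goal is to move whole rows while keeping the column order, as in the question, np.random.shuffle or an index permutation is the appropriate tool instead.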
Here's one way to avoid loops completely and build the required array: given an array X with n columns, construct an array Y with n copies of X. Create a mask to select the i-th column from the i-th copy of X in the array Y. Reassign a column-shuffled copy of X to the relevant indices of Y using the mask on Y.
I tried many solutions, and at the end I used this simple one:
import numpy as np
from sklearn.utils import shuffle

x = np.array([[1, 2], [3, 4], [5, 6]])
print(shuffle(x, random_state=0))
Output:
[[5 6]
 [3 4]
 [1 2]]
If you have a 3D array, loop over the first axis (axis=0) and apply this function to each 2D slice, like:
np.array([shuffle(item) for item in array_3d])
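The same per-slice idea can be sketched with NumPy alone, without scikit-learn. Iterating over a 3D array yields 2D views, so shuffling each view in place changes the original array. The seeded generator and the names array_3d, before and block are assumptions for illustration; this is not the code from the answer above.

```python
import numpy as np

rng = np.random.default_rng(3)                  # arbitrary seed
array_3d = np.arange(24).reshape(2, 4, 3)       # 2 blocks, each with 4 rows and 3 columns
before = array_3d.copy()                        # kept only for comparison

for block in array_3d:      # each block is a 2-D view into array_3d
    rng.shuffle(block)      # reorder the rows of this block in place

print(array_3d)
```

Each block gets its own independent row order, and no new 3D array is built.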
You could use the numpy.random.permutation function to generate the index array and use it to shuffle multiple arrays. For example:
def randomize(a, b):
    # Generate the permutation index array.
    permutation = np.random.permutation(a.shape[0])
    # Shuffle the arrays by giving the permutation in the square brackets.
    shuffled_a = dataset
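A minimal sketch of shuffling two arrays with one shared permutation, so that matching rows stay matched, might look as follows. The function name shuffle_in_unison, the optional rng parameter, and the example arrays are hypothetical and introduced only for illustration.

```python
import numpy as np

def shuffle_in_unison(a, b, rng=None):
    """Return copies of a and b with rows reordered by one shared permutation."""
    if a.shape[0] != b.shape[0]:
        raise ValueError("a and b must have the same number of rows")
    rng = np.random.default_rng() if rng is None else rng
    permutation = rng.permutation(a.shape[0])   # one index order used for both arrays
    return a[permutation], b[permutation]

features = np.arange(10).reshape(5, 2)          # rows: [0, 1], [2, 3], ..., [8, 9]
labels = np.arange(5) * 2                       # labels[i] == features[i, 0]

shuffled_features, shuffled_labels = shuffle_in_unison(
    features, labels, np.random.default_rng(4)  # arbitrary seed
)
print(shuffled_features)
print(shuffled_labels)
```

Because both arrays are indexed with the same permutation, each label still sits next to the row it belonged to. This returns new arrays rather than shuffling in place.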
Comments
 Option 1: shuffled view onto an array. I guess that would mean a custom implementation. (Almost) no impact on memory usage, obviously some impact at runtime. It really depends on how you intend to use this matrix.
 Option 2: shuffle the array in place with np.random.shuffle(x). The docs state that "this function only shuffles the array along the first index of a multidimensional array", which is good enough for you, right? Obviously, some time is taken at startup, but from that point on it's as fast as the original matrix.
 Compared to np.random.shuffle(x), shuffling the index of the ndarray and getting the data from the shuffled index is a more efficient way to solve this problem. For a more detailed comparison, refer to my answer below.
 I wonder if this could be sped up by numpy, maybe taking advantage of concurrency.
 @GeorgSchölly I think this is the most optimized approach available in Python. If you want to speed it up, you need to make changes to the algorithm.
 I completely agree. I just realized that you are using np.random instead of the Python random module, which also contains a shuffle function. I'm sorry for causing confusion.
 This shuffle is not always working, see my new answer here below. Why is it not always working?
 Is there a way to choose the axis on which the shuffling should be done (for a >2D array)? Or is it always implicitly the first dimension that is taken into account? @Kasramvd
 This sounds nice. Can you add timing information to your post, comparing your np.take with the standard shuffle? np.random.shuffle is faster on my system (27.9 ms) than your take (62.9 ms), but as I read in your post, there is a memory advantage?
 @robert Just added, check it out!
 Hi, can you provide the code that produces this output?
 I lost the code that produced the memory_profiler output, but it can be reproduced very easily by following the steps in the given link.
 What I like about this answer is that if I have two matched arrays (which coincidentally I do) then I can shuffle both of them and ensure that data in corresponding positions still match. This is useful for randomising the order of my training set.