NumPy - Splitting array by known sizes

pstatix :

I need to take an array and split it column-wise by known widths.

Say I have an array:

>>> arr
array([[b'13456789'],
       [b'45678912']], dtype='|S8')

I need to convert it using widths that are known beforehand:

>>> arr
array([[b'13', b'456', b'7', b'89'],
       [b'45', b'678', b'9', b'12']], 
       dtype=[('a', '|S2'), ('b', '|S3'), ('c', '|S1'), ('d', '|S2')])

Using numpy.genfromtxt would seem ideal, but I'm reading from a binary file whose string-encoded data follows a 160-byte header. The header holds other data that is not encoded as strings, so I use struct.unpack for it. However, numpy.genfromtxt doesn't seem to support binary files or have an offset parameter, although its delimiter parameter would be useful.

yatu :

Since you know the field widths in bytes beforehand, you can take a view of the array with the desired structured dtype:

arr = np.array([[b'13456789'],
                [b'45678912']], dtype='|S8')

arr.view('S2,S3,S1,S2')

array([[(b'13', b'456', b'7', b'89')],
       [(b'45', b'678', b'9', b'12')]],
      dtype=[('f0', 'S2'), ('f1', 'S3'), ('f2', 'S1'), ('f3', 'S2')])
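To get the field names from the question ('a' to 'd') instead of the default f0 to f3, one option is to pass an explicit structured dtype to view. A minimal sketch, assuming the same arr as above; the variable names fields and rec are only illustrative:

```python
import numpy as np

arr = np.array([[b'13456789'],
                [b'45678912']], dtype='|S8')

# Field names and widths taken from the question; the widths sum to 8 bytes,
# which matches the itemsize of the original 'S8' dtype.
fields = np.dtype([('a', 'S2'), ('b', 'S3'), ('c', 'S1'), ('d', 'S2')])

# Reinterpret the same memory as records; no data is copied.
rec = arr.view(fields)

# Each field can then be pulled out by name as its own column.
col_a = rec['a']
col_b = rec['b']
```

Because view does not copy, rec shares memory with arr, so writing to one is visible through the other.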

You just have to make sure that the size in bytes of the new dtype is a divisor of the size of the original dtype. This means that you could also do something like:

arr.view('S2,S2')
array([[(b'13', b'45'), (b'67', b'89')],
       [(b'45', b'67'), (b'89', b'12')]],
      dtype=[('f0', 'S2'), ('f1', 'S2')])
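The divisor rule can be checked before taking the view. The sketch below covers only the case shown here, viewing as a dtype smaller than the original; splits_evenly is a hypothetical helper, not a NumPy function:

```python
import numpy as np

arr = np.array([[b'13456789'],
                [b'45678912']], dtype='|S8')

def splits_evenly(a, new_dtype):
    """Hypothetical helper: True if a's itemsize is a multiple of new_dtype's."""
    return a.dtype.itemsize % np.dtype(new_dtype).itemsize == 0

ok = splits_evenly(arr, 'S2,S2')  # 8 bytes split into 4-byte records
bad = splits_evenly(arr, 'S3')    # 8 bytes cannot be split into 3-byte items

# NumPy itself refuses the incompatible view with a ValueError.
try:
    arr.view('S3')
    raised = False
except ValueError:
    raised = True
```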

The same applies to an array of fixed-size Unicode strings, for instance:

a = np.array([['peach'],
              ['apple']], dtype='U5')

a.view('U1')
array([['p', 'e', 'a', 'c', 'h'],
       ['a', 'p', 'p', 'l', 'e']], dtype='<U1')
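For the binary file described in the question, one possible approach is to skip the header with the offset argument of np.frombuffer and read the records directly with the structured dtype. This is only a sketch: the file contents are simulated in memory, the 160-byte header is filled with zeros, and the field names come from the question.

```python
import numpy as np

HEADER_SIZE = 160  # header length stated in the question

# Simulated file contents (assumption): a zero-filled header, then two 8-byte records.
raw = bytes(HEADER_SIZE) + b'13456789' + b'45678912'

fields = np.dtype([('a', 'S2'), ('b', 'S3'), ('c', 'S1'), ('d', 'S2')])

# Skip the header and interpret the remaining bytes as fixed-width records.
records = np.frombuffer(raw, dtype=fields, offset=HEADER_SIZE)
```

The result is a read-only 1-D array with one record per 8 bytes. For a file on disk, np.fromfile accepts a similar offset argument in recent NumPy versions.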
