Constructing 3D Pandas DataFrame

Can't you just use a panel?

import numpy as np
import pandas as pd

A = ['one', 'two' ,'three']
B = ['start','end']
C = [np.random.randint(10, 99, 2)]*6
df = pd.DataFrame(C,columns=B  )
p={}
for a in A:
    p[a]=df
panel= pd.Panel(p)
print panel['one']

First, I think you need to fill C to represent missing values

In [341]: max_len = max(len(sublist) for sublist in C)
In [344]: for sublist in C:
     ...:     sublist.extend([np.nan] * (max_len - len(sublist)))

In [345]: C
Out[345]: 
[[7, 11, 56, 45],
 [20, 21, 74, 12],
 [42, nan, nan, nan],
 [52, nan, nan, nan],
 [90, 213, 9, nan],
 [101, 34, 45, nan]]

Then, convert to a numpy array, transpose, and pass to the DataFrame constructor along with the columns.

In [288]: C = np.array(C)
In [289]: df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(zip(A,B)))

In [349]: df
Out[349]: 
     one         two       three     
   start  end  start  end  start  end
0      7   20     42   52     90  101
1     11   21    NaN  NaN    213   34
2     56   74    NaN  NaN      9   45
3     45   12    NaN  NaN    NaN  NaN

As @Aaron mentioned in a comment above, panels have been deprecated. Also, @tlnagy mentioned his dataset would be likely to expand to more than 3 dimensions in the future.

This sounds like a good use-case for the xarray package, which provides semantically labelled arrays of arbitrarily many dimensions. Pandas and xarray have strong conversion support, and panels have been deprecated in favour of using xarray.

Initial setup of the problem.

import numpy as np

A = np.array([[7,11,56,45], [20,21,74,12]]).T
B = np.array([[42], [52]]).T
C = np.array([[90,213,9], [101, 34, 45]]).T

You can then create a three dimensional xarray.DataArray object like so:

import xarray

output_as_dataarray = xarray.concat(
    [
        xarray.DataArray(
            X,
            dims=["record", "edge"],
            coords={"record": range(X.shape[0]), "edge": ["start", "end"]},
        )
        for X in (A, B, C)
    ],
    dim="descriptor",
).assign_coords(descriptor=["A", "B", "C"])

We turn our three 2D numpy arrays into xarray.DataArray objects, and then concatenate them together along a new dimension.

Our output looks like so:

<xarray.DataArray (descriptor: 3, record: 4, edge: 2)>
array([[[  7.,  20.],
        [ 11.,  21.],
        [ 56.,  74.],
        [ 45.,  12.]],

       [[ 42.,  52.],
        [ nan,  nan],
        [ nan,  nan],
        [ nan,  nan]],

       [[ 90., 101.],
        [213.,  34.],
        [  9.,  45.],
        [ nan,  nan]]])
Coordinates:
  * record      (record) int64 0 1 2 3
  * edge        (edge) <U5 'start' 'end'
  * descriptor  (descriptor) <U1 'A' 'B' 'C'

Tags:

Python

Pandas