Tutorial 3. Loading data into the Layer object

This section covers how to properly subclass the DataLoader class when a specialized loader is required.

First, let’s load the required data.

[1]:
# Load EBSD data
import numpy as np

EBSD = np.genfromtxt(
    "./data/SiC_in_NiSA.ctf", dtype=float, skip_header=15, delimiter="\t", names=True
)
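
As a side note on the loading step, the call above relies on names=True, which makes np.genfromtxt read the column names from the first line after the skipped header and return a structured array. The following is a minimal sketch on a made-up, in-memory table; the text content and the variable names are illustrative assumptions, not the actual .ctf file.

```python
import io

import numpy as np

# Made-up stand-in for a tab-separated file: one header line to skip,
# one line of column names, then two data rows.
text = "junk header\nPhase\tX\tY\n2\t0.0\t0.0\n2\t0.2776\t0.0\n"

table = np.genfromtxt(
    io.StringIO(text), dtype=float, skip_header=1, delimiter="\t", names=True
)

# The result is a 1-dimensional structured array, one record per data row.
print(table.dtype.names)  # ('Phase', 'X', 'Y')
print(table["X"])         # values of the X column
```

Checking dtype.names like this is a quick way to confirm the column order before writing a format string for column_parser.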

Boilerplate

We will deal with the low-level API first. This pattern will be used repeatedly. You can wrap this boilerplate in your own function for convenience.

[2]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography

layer_ebsd = Layer(
    data=column_parser(EBSD, format_string="dxydddddddd"),
    container=Container2D,
    dataloader=XYDLoader,
    transformer=Homography,
)
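
One possible way to wrap the boilerplate above is sketched below. The helper name make_xyd_layer and its parameters are hypothetical; it only forwards to the Layer and column_parser calls exactly as they appear in this tutorial, and it assumes return_unspecified defaults to True.

```python
def make_xyd_layer(table, format_string, return_unspecified=True):
    """Hypothetical convenience wrapper around the Layer boilerplate.

    `table` is a 2-dimensional array-like or structured array and
    `format_string` is the column_parser format string.
    """
    # Imports are kept local so the helper can be defined before pyxc is needed.
    from pyxc.core.container import Container2D
    from pyxc.core.layer import Layer
    from pyxc.core.loader import XYDLoader
    from pyxc.core.processor.arrays import column_parser
    from pyxc.transform.homography import Homography

    return Layer(
        data=column_parser(
            table,
            format_string=format_string,
            return_unspecified=return_unspecified,
        ),
        container=Container2D,
        dataloader=XYDLoader,
        transformer=Homography,
    )
```

With such a helper, the cell above would reduce to layer_ebsd = make_xyd_layer(EBSD, "dxydddddddd").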

Using XYDLoader

The XYDLoader object is useful for loading 2-dimensional array-like data or structured arrays. The first and second columns of the array must contain the numeric X and Y values, so the data must be arranged in this format before they are passed to XYDLoader.

Using column_parser function

The column_parser function is a utility for preparing data. It reorders columns according to the provided format string: x and y mark the columns that contain the x and y coordinates, and _ marks a column to ignore. All other characters are treated as data columns.

2-dimensional array-like

Let’s continue with an example, starting with a 2-dimensional array-like.

[3]:
arr = np.random.random((3, 4))
arr
[3]:
array([[0.85474924, 0.35127279, 0.20732681, 0.89227011],
       [0.46410084, 0.14510209, 0.34306286, 0.84824198],
       [0.01501363, 0.14626721, 0.95489051, 0.44638695]])

Suppose the second and third columns hold x and y, while the first column is a data column. As you can see, the x and y columns are moved to the first and second positions, and the fourth column, which the format string does not cover, is kept at the end.

[4]:
column_parser(arr, "dxy")
[4]:
array([[0.35127279, 0.20732681, 0.85474924, 0.89227011],
       [0.14510209, 0.34306286, 0.46410084, 0.84824198],
       [0.14626721, 0.95489051, 0.01501363, 0.44638695]])

Alternatively, you can exclude columns by marking them with _ or by explicitly setting return_unspecified to False.

[5]:
column_parser(arr, "_dxy")
[5]:
array([[0.20732681, 0.89227011, 0.35127279],
       [0.34306286, 0.84824198, 0.14510209],
       [0.95489051, 0.44638695, 0.14626721]])
[6]:
column_parser(arr, "dxy", return_unspecified=False)
[6]:
array([[0.35127279, 0.20732681, 0.85474924],
       [0.14510209, 0.34306286, 0.46410084],
       [0.14626721, 0.95489051, 0.01501363]])
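
The column orders in the three outputs above are consistent with a simple rule: x first, y second, then the data columns in their original order, then any columns beyond the end of the format string unless return_unspecified is False. The sketch below reproduces only that index bookkeeping in plain Python. The function column_order is hypothetical and is not part of pyxc; the requirement of exactly one x and one y is an assumption of this sketch.

```python
def column_order(format_string, n_columns, return_unspecified=True):
    """Return source column indices in output order (illustrative only)."""
    if len(format_string) > n_columns:
        raise ValueError("format string is longer than the number of columns")
    if format_string.count("x") != 1 or format_string.count("y") != 1:
        raise ValueError("format string needs exactly one 'x' and one 'y'")

    x_index = format_string.index("x")
    y_index = format_string.index("y")
    # Every character other than 'x', 'y' and '_' marks a data column.
    data_indices = [i for i, ch in enumerate(format_string) if ch not in "xy_"]

    order = [x_index, y_index] + data_indices
    if return_unspecified:
        # Columns not covered by the format string are appended at the end.
        order += list(range(len(format_string), n_columns))
    return order


print(column_order("dxy", 4))                            # [1, 2, 0, 3]
print(column_order("_dxy", 4))                           # [2, 3, 1]
print(column_order("dxy", 4, return_unspecified=False))  # [1, 2, 0]
```

For a plain NumPy array, arr[:, order] with these index lists yields the same column arrangement as the outputs shown above.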

Structured array

A structured array works exactly the same way. Let’s use the EBSD data we loaded previously.

[7]:
EBSD
[7]:
array([(2.,  0.    ,  0.   , 11., 0., 160.45, 47.733, 233.82, 1.0211, 160., 255.),
       (2.,  0.2776,  0.   , 10., 0., 160.15, 47.888, 233.74, 1.3246, 161., 255.),
       (2.,  0.5553,  0.   , 10., 0., 160.14, 47.928, 234.  , 1.3319, 161., 255.),
       ...,
       (2., 26.932 , 20.546, 11., 0., 159.7 , 48.268, 235.35, 1.2136, 154., 255.),
       (2., 27.21  , 20.546, 10., 0., 159.24, 48.137, 234.97, 0.861 , 159., 255.),
       (2., 27.487 , 20.546, 10., 0., 158.98, 48.32 , 235.21, 1.125 , 162., 255.)],
      dtype=[('Phase', '<f8'), ('X', '<f8'), ('Y', '<f8'), ('Bands', '<f8'), ('Error', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8'), ('MAD', '<f8'), ('BC', '<f8'), ('BS', '<f8')])

Let’s say we need X, Y, Phase, Euler1, Euler2, and Euler3. Then the format string should be dxy__ddd. We also don’t want the trailing MAD, BC, and BS columns, so we explicitly set return_unspecified to False.

[8]:
column_parser(EBSD, "dxy__ddd", return_unspecified=False)
[8]:
array([( 0.    ,  0.   , 2., 160.45, 47.733, 233.82),
       ( 0.2776,  0.   , 2., 160.15, 47.888, 233.74),
       ( 0.5553,  0.   , 2., 160.14, 47.928, 234.  ), ...,
       (26.932 , 20.546, 2., 159.7 , 48.268, 235.35),
       (27.21  , 20.546, 2., 159.24, 48.137, 234.97),
       (27.487 , 20.546, 2., 158.98, 48.32 , 235.21)],
      dtype={'names': ['X', 'Y', 'Phase', 'Euler1', 'Euler2', 'Euler3'], 'formats': ['<f8', '<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [8, 16, 0, 40, 48, 56], 'itemsize': 88})
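
The 'offsets' and 'itemsize' entries in the dtype above are what NumPy reports when fields of a structured array are selected without repacking the underlying records. Plain NumPy multi-field indexing produces the same kind of dtype; whether column_parser uses this mechanism internally is an assumption. The sketch below uses a small made-up array and shows how numpy.lib.recfunctions.repack_fields returns a compact copy if one is needed.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Small made-up structured array with four float64 fields (32 bytes per record).
records = np.array(
    [(2.0, 0.0, 0.0, 11.0), (2.0, 0.2776, 0.0, 10.0)],
    dtype=[("Phase", "<f8"), ("X", "<f8"), ("Y", "<f8"), ("Bands", "<f8")],
)

# Multi-field indexing selects and reorders fields; the record size is unchanged.
selected = records[["X", "Y", "Phase"]]
print(selected.dtype.names)     # ('X', 'Y', 'Phase')
print(selected.dtype.itemsize)  # 32

# repack_fields drops the unused bytes and returns a packed copy.
packed = rfn.repack_fields(selected)
print(packed.dtype.itemsize)    # 24
```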

Data processed with a proper format string are compatible with XYDLoader, so you can load them into the Layer.

[9]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography

layer_ebsd = Layer(
    data=column_parser(EBSD, format_string="dxy__ddd", return_unspecified=False),
    container=Container2D,
    dataloader=XYDLoader,
    transformer=Homography,
)
layer_ebsd.container
[9]:
Container2D([(   0, nan, nan,  0.    ,  0.   , 2., 160.45, 47.733, 233.82),
             (   1, nan, nan,  0.2776,  0.   , 2., 160.15, 47.888, 233.74),
             (   2, nan, nan,  0.5553,  0.   , 2., 160.14, 47.928, 234.  ),
             ...,
             (7497, nan, nan, 26.932 , 20.546, 2., 159.7 , 48.268, 235.35),
             (7498, nan, nan, 27.21  , 20.546, 2., 159.24, 48.137, 234.97),
             (7499, nan, nan, 27.487 , 20.546, 2., 158.98, 48.32 , 235.21)],
            dtype=[('row', '<u2'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8')])

Loading image data

Loading image data is straightforward: just use ImageLoader. Image data are 2- or 3-dimensional array-like objects with a shape such as (i, j, k). Each channel of the image is stored in serialized (flattened) form under the column name Channel_{integer}. Let’s make some sample image data.

[10]:
im3channel = np.random.random((4, 4, 3))

Then, we can easily load the data into the Layer.

[11]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography

layer_im3c = Layer(
    data=im3channel,
    container=Container2D,
    dataloader=ImageLoader,
    transformer=Homography,
)
layer_im3c.container
[11]:
Container2D([( 0, nan, nan, 0., 0., 0.05420856, 4.73385223e-01, 0.70881433),
             ( 1, nan, nan, 1., 0., 0.39539594, 6.95276945e-01, 0.7717764 ),
             ( 2, nan, nan, 2., 0., 0.78372797, 9.02366167e-03, 0.1971367 ),
             ( 3, nan, nan, 3., 0., 0.7051739 , 3.47993489e-01, 0.05760242),
             ( 4, nan, nan, 0., 1., 0.77743903, 5.55172691e-01, 0.18899349),
             ( 5, nan, nan, 1., 1., 0.00738951, 3.55744846e-01, 0.06707628),
             ( 6, nan, nan, 2., 1., 0.29474259, 4.33824197e-01, 0.52470037),
             ( 7, nan, nan, 3., 1., 0.31682623, 7.05669480e-01, 0.44795738),
             ( 8, nan, nan, 0., 2., 0.76955486, 3.10971323e-04, 0.40151769),
             ( 9, nan, nan, 1., 2., 0.50360737, 4.22517705e-01, 0.89678063),
             (10, nan, nan, 2., 2., 0.15507781, 6.83865494e-01, 0.87749098),
             (11, nan, nan, 3., 2., 0.77473154, 3.93492165e-01, 0.62440758),
             (12, nan, nan, 0., 3., 0.17169831, 3.91749726e-01, 0.47013635),
             (13, nan, nan, 1., 3., 0.88395893, 8.45796766e-01, 0.35550729),
             (14, nan, nan, 2., 3., 0.90666611, 6.02317342e-01, 0.88776172),
             (15, nan, nan, 3., 3., 0.68885151, 1.53849234e-01, 0.22015321)],
            dtype=[('row', '<u2'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Channel_0', '<f8'), ('Channel_1', '<f8'), ('Channel_2', '<f8')])
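
To make the serialized layout above more concrete, the following NumPy-only sketch flattens a small image into one record per pixel with x_raw, y_raw, and Channel_{integer} fields. It is not the ImageLoader implementation. It assumes that x_raw follows the second array axis and y_raw the first, which matches x_raw changing fastest in the output above but cannot be confirmed from a square image.

```python
import numpy as np

# Made-up image with 2 rows, 3 columns and 2 channels.
image = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)
n_rows, n_cols, n_channels = image.shape

# Pixel coordinates: yy follows axis 0, xx follows axis 1 (assumed convention).
yy, xx = np.mgrid[0:n_rows, 0:n_cols]

dtype = [("x_raw", "<f4"), ("y_raw", "<f4")]
dtype += [(f"Channel_{c}", "<f8") for c in range(n_channels)]
flat = np.zeros(n_rows * n_cols, dtype=dtype)

# Row-major flattening makes the x coordinate change fastest.
flat["x_raw"] = xx.ravel()
flat["y_raw"] = yy.ravel()
for c in range(n_channels):
    flat[f"Channel_{c}"] = image[:, :, c].ravel()

print(flat.dtype.names)  # ('x_raw', 'y_raw', 'Channel_0', 'Channel_1')
print(flat.shape)        # (6,)
```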

A Container2D can also be initialized directly from raw x and y coordinates and a data array.

[12]:
x = np.linspace(0, 1, 10)
y = np.linspace(2, 10, 10)
data = np.random.random((10, 3))
example_container = Container2D(x_raw=x, y_raw=y, data=data)

As you can see, example_container is now initialized correctly.

[13]:
example_container
[13]:
Container2D([(0, nan, nan, 0.        ,  2.       , 0.28405352, 0.93266047, 0.01919117),
             (1, nan, nan, 0.11111111,  2.8888888, 0.82168929, 0.87435924, 0.90302795),
             (2, nan, nan, 0.22222222,  3.7777777, 0.00857905, 0.99363009, 0.78449845),
             (3, nan, nan, 0.33333334,  4.6666665, 0.77338238, 0.83640523, 0.9569604 ),
             (4, nan, nan, 0.44444445,  5.5555553, 0.86332771, 0.07118916, 0.41191495),
             (5, nan, nan, 0.5555556 ,  6.4444447, 0.99697855, 0.8878413 , 0.77650463),
             (6, nan, nan, 0.6666667 ,  7.3333335, 0.59973423, 0.17786739, 0.19939695),
             (7, nan, nan, 0.7777778 ,  8.222222 , 0.58948337, 0.7378187 , 0.94562527),
             (8, nan, nan, 0.8888889 ,  9.111111 , 0.91790323, 0.00856963, 0.43669122),
             (9, nan, nan, 1.        , 10.       , 0.54568669, 0.37494042, 0.90019696)],
            dtype=[('row', '<u2'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Channel_0', '<f8'), ('Channel_1', '<f8'), ('Channel_2', '<f8')])

We didn’t provide a structured array, so the column names for the data are determined automatically: Channel_0, Channel_1, and Channel_2.

[14]:
example_container.dtype.names
[14]:
('row', 'x', 'y', 'x_raw', 'y_raw', 'Channel_0', 'Channel_1', 'Channel_2')
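
If named columns are preferred, a plain array can be turned into a structured array with NumPy before it is handed over. The sketch below uses numpy.lib.recfunctions.unstructured_to_structured; the field names are made up, and the expectation that Container2D keeps the names of a structured array passed as data is inferred from the statement above, not verified here.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

data = np.random.random((10, 3))

# Attach illustrative names to the three columns; each field keeps float64.
named = rfn.unstructured_to_structured(
    data, names=["signal_a", "signal_b", "signal_c"]
)

print(named.dtype.names)  # ('signal_a', 'signal_b', 'signal_c')
print(named.shape)        # (10,)
```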