Tutorial 3. Loading data into the Layer object
This section covers how to properly subclass the DataLoader class when a specialized loader is required.
First, let’s load the required data.
[1]:
# Load EBSD data
import numpy as np
EBSD = np.genfromtxt(
"./data/SiC_in_NiSA.ctf", dtype=float, skip_header=15, delimiter="\t", names=True
)
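With names=True, numpy.genfromtxt reads the field names from the first line after the skipped header lines. That is why EBSD comes back as a structured array whose fields are the .ctf column names (Phase, X, Y, ...). skip_header=15 is specific to this file's header length. Below is a small self-contained illustration on in-memory text. The two header lines are placeholders, not real .ctf content, and the data rows are copied from the first EBSD rows printed later in this tutorial.

```python
import io

import numpy as np

# Placeholder text in the same shape as the data section of a .ctf file:
# dummy header lines, then a tab-separated line of column names, then one
# row per measurement point (only three columns kept for brevity).
text = (
    "dummy header line 1\n"
    "dummy header line 2\n"
    "Phase\tX\tY\n"
    "2\t0\t0\n"
    "2\t0.2776\t0\n"
)

# skip_header drops the header; names=True turns the next line into field names.
table = np.genfromtxt(
    io.StringIO(text), dtype=float, skip_header=2, delimiter="\t", names=True
)
print(table.dtype.names)  # field names taken from the column-name line
```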
Boilerplate
We will start with the low-level API. This pattern is used repeatedly, so you may want to wrap this boilerplate in your own function for convenience.
[2]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography
layer_ebsd = Layer(
data=column_parser(EBSD, format_string="dxydddddddd"),
container=Container2D,
dataloader=XYDLoader,
transformer=Homography,
)
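One way to wrap this boilerplate is to pre-bind the keyword arguments that repeat in every Layer(...) call with functools.partial, so only data has to be supplied. The helper make_layer_factory is hypothetical and not part of pyxc. To keep the sketch self-contained, it takes the classes as arguments instead of importing them.

```python
from functools import partial


def make_layer_factory(layer_cls, container, dataloader, transformer):
    """Hypothetical helper: return a callable that builds layers with the
    given container, data loader and transformer already filled in."""
    return partial(
        layer_cls, container=container, dataloader=dataloader, transformer=transformer
    )


# Intended usage with pyxc (not executed here):
#   new_xyd_layer = make_layer_factory(Layer, Container2D, XYDLoader, Homography)
#   layer_ebsd = new_xyd_layer(data=column_parser(EBSD, format_string="dxydddddddd"))
```

Because partial only forwards keyword arguments, the same factory works for any class that accepts data, container, dataloader and transformer as keywords, as Layer does in the cell above.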
Using XYDLoader
XYDLoader loads 2-dimensional array-like data or structured arrays. It requires the first and second columns to hold the numeric X and Y values, so the data must be prepared in that format before it is passed to XYDLoader.
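The requirement can be checked before building a layer. The sketch below is a hypothetical pre-flight check, check_xyd, written under the assumption that "numeric X and Y first" is the only structural requirement. It does not reproduce XYDLoader's actual validation. The optional xy_names argument compares the first two field names of a structured array with the expected coordinate names.

```python
import numpy as np


def check_xyd(arr, xy_names=None):
    """Hypothetical pre-flight check before passing data to XYDLoader.

    Assumes XYDLoader wants numeric X and Y in the first two columns of a
    plain 2-D array, or in the first two fields of a structured array.
    Raises ValueError when that does not hold; returns None otherwise.
    """
    arr = np.asarray(arr)
    if arr.dtype.names is not None:
        # Structured array: look at the first two fields.
        names = arr.dtype.names
        if arr.ndim != 1 or len(names) < 2:
            raise ValueError("expected a 1-D structured array with at least two fields")
        for name in names[:2]:
            if not np.issubdtype(arr.dtype[name], np.number):
                raise ValueError(f"field {name!r} is not numeric")
        # Optional: also require the first two fields to carry the expected names.
        if xy_names is not None and tuple(names[:2]) != tuple(xy_names):
            raise ValueError(f"first two fields are {names[:2]}, expected {tuple(xy_names)}")
    else:
        # Plain array: needs at least two columns of a numeric dtype.
        if arr.ndim != 2 or arr.shape[1] < 2:
            raise ValueError("expected a 2-D array with at least two columns")
        if not np.issubdtype(arr.dtype, np.number):
            raise ValueError("array is not numeric")
```

A purely numeric check cannot tell which columns are coordinates. The raw EBSD array passes check_xyd because Phase, X and Y are all numeric, but it fails with xy_names=("X", "Y") because Phase comes first. Reordering the columns fixes this.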
Using the column_parser function
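column_parser is a utility function for preparing data: it reorders columns according to a format string. In the format string, x and y mark the columns that contain the x and y coordinates, and _ marks a column to ignore. Every other character marks a data column. Columns beyond the end of the format string are unspecified; by default they are appended after the data columns.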
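To make these rules concrete, here is a NumPy-only sketch for plain 2-D arrays. parse_columns is a hypothetical name and this is not pyxc's implementation. Requiring exactly one x and one y is an assumption of the sketch. The resulting column order is x, y, the data columns in format-string order, and then, optionally, the unspecified columns.

```python
import numpy as np


def parse_columns(arr, format_string, return_unspecified=True):
    """NumPy-only sketch of the reordering rules described for column_parser.

    Handles plain 2-D arrays only. Output columns: x, y, data columns (in
    format-string order), then any columns past the end of the format string.
    """
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array")
    # Assumption of this sketch: exactly one x column and one y column.
    if format_string.count("x") != 1 or format_string.count("y") != 1:
        raise ValueError("format string needs exactly one 'x' and one 'y'")
    if len(format_string) > arr.shape[1]:
        raise ValueError("format string is longer than the number of columns")
    x_idx = format_string.index("x")
    y_idx = format_string.index("y")
    # Every character other than x, y and _ marks a data column.
    data_idx = [i for i, c in enumerate(format_string) if c not in "xy_"]
    order = [x_idx, y_idx] + data_idx
    if return_unspecified:
        # Columns not covered by the format string keep their relative order.
        order += list(range(len(format_string), arr.shape[1]))
    return arr[:, order]
```

For a 3×4 array, parse_columns(a, "dxy") selects columns [1, 2, 0, 3], "_dxy" selects [2, 3, 1], and "dxy" with return_unspecified=False selects [1, 2, 0]. These are the same reorderings column_parser produces in the examples that follow.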
2-dimensional array-like
Let’s start with a 2-dimensional array-like example.
[3]:
arr = np.random.random((3, 4))
arr
[3]:
array([[0.21854676, 0.0324817 , 0.54188185, 0.32167891],
[0.74546414, 0.41421726, 0.33751401, 0.88210784],
[0.87324441, 0.71617293, 0.62579317, 0.40981908]])
Suppose the second and third columns hold x and y, and the first column holds data, so the format string is dxy. In the output, the x and y columns move to the first and second positions, followed by the data column and then the unspecified fourth column.
[4]:
column_parser(arr, "dxy")
[4]:
array([[0.0324817 , 0.54188185, 0.21854676, 0.32167891],
[0.41421726, 0.33751401, 0.74546414, 0.88210784],
[0.71617293, 0.62579317, 0.87324441, 0.40981908]])
To exclude columns, mark them with _ in the format string, or set return_unspecified=False to drop every column that the format string does not cover.
[5]:
column_parser(arr, "_dxy")
[5]:
array([[0.54188185, 0.32167891, 0.0324817 ],
[0.33751401, 0.88210784, 0.41421726],
[0.62579317, 0.40981908, 0.71617293]])
[6]:
column_parser(arr, "dxy", return_unspecified=False)
[6]:
array([[0.0324817 , 0.54188185, 0.21854676],
[0.41421726, 0.33751401, 0.74546414],
[0.71617293, 0.62579317, 0.87324441]])
Structured array
Structured arrays work exactly the same way. Let’s use the EBSD data loaded earlier.
[7]:
EBSD
[7]:
array([(2., 0. , 0. , 11., 0., 160.45, 47.733, 233.82, 1.0211, 160., 255.),
(2., 0.2776, 0. , 10., 0., 160.15, 47.888, 233.74, 1.3246, 161., 255.),
(2., 0.5553, 0. , 10., 0., 160.14, 47.928, 234. , 1.3319, 161., 255.),
...,
(2., 26.932 , 20.546, 11., 0., 159.7 , 48.268, 235.35, 1.2136, 154., 255.),
(2., 27.21 , 20.546, 10., 0., 159.24, 48.137, 234.97, 0.861 , 159., 255.),
(2., 27.487 , 20.546, 10., 0., 158.98, 48.32 , 235.21, 1.125 , 162., 255.)],
dtype=[('Phase', '<f8'), ('X', '<f8'), ('Y', '<f8'), ('Bands', '<f8'), ('Error', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8'), ('MAD', '<f8'), ('BC', '<f8'), ('BS', '<f8')])
Suppose we need X, Y, Phase, Euler1, Euler2, and Euler3. The format string is then dxy__ddd: Phase is data, X and Y are the coordinates, Bands and Error are skipped with _, and the three Euler angles are data. We don’t want the trailing MAD, BC, and BS columns, so we also set return_unspecified=False.
[8]:
column_parser(EBSD, "dxy__ddd", return_unspecified=False)
[8]:
array([( 0. , 0. , 2., 160.45, 47.733, 233.82),
( 0.2776, 0. , 2., 160.15, 47.888, 233.74),
( 0.5553, 0. , 2., 160.14, 47.928, 234. ), ...,
(26.932 , 20.546, 2., 159.7 , 48.268, 235.35),
(27.21 , 20.546, 2., 159.24, 48.137, 234.97),
(27.487 , 20.546, 2., 158.98, 48.32 , 235.21)],
dtype={'names': ['X', 'Y', 'Phase', 'Euler1', 'Euler2', 'Euler3'], 'formats': ['<f8', '<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [8, 16, 0, 40, 48, 56], 'itemsize': 88})
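The dtype printed for [8] has explicit offsets [8, 16, 0, 40, 48, 56] and an itemsize of 88. That is what NumPy multi-field indexing produces: a view onto the original 88-byte records, not a packed copy. This suggests that column_parser selects fields this way, although the tutorial does not say so. The sketch below uses a hypothetical select_fields helper and a two-row stand-in for EBSD whose values are copied from the rows printed in [7].

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Two-row stand-in for EBSD, with the same field names as the .ctf columns.
fields = ["Phase", "X", "Y", "Bands", "Error", "Euler1", "Euler2", "Euler3", "MAD", "BC", "BS"]
ebsd = np.array(
    [
        (2.0, 0.0, 0.0, 11.0, 0.0, 160.45, 47.733, 233.82, 1.0211, 160.0, 255.0),
        (2.0, 0.2776, 0.0, 10.0, 0.0, 160.15, 47.888, 233.74, 1.3246, 161.0, 255.0),
    ],
    dtype=[(name, "<f8") for name in fields],
)


def select_fields(arr, format_string, return_unspecified=True):
    """Hypothetical: apply the format-string rules to a structured array."""
    names = arr.dtype.names
    picked = [names[format_string.index("x")], names[format_string.index("y")]]
    picked += [names[i] for i, c in enumerate(format_string) if c not in "xy_"]
    if return_unspecified:
        picked += list(names[len(format_string):])
    # Indexing with a list of field names returns a view that keeps the
    # original byte offsets and itemsize.
    return arr[picked]


view = select_fields(ebsd, "dxy__ddd", return_unspecified=False)
packed = rfn.repack_fields(view)  # contiguous copy holding only the selected fields
print(view.dtype)
```

If a compact, independent array is needed (for example before saving to disk), numpy.lib.recfunctions.repack_fields drops the unused bytes. In this sketch, that shrinks each record from 88 to 48 bytes.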
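Data processed with an appropriate format string is compatible with XYDLoader and can be loaded into a Layer. The resulting container holds a row index, x and y (NaN at this point), the raw coordinates x_raw and y_raw, and then the selected data fields.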
[9]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography
layer_ebsd = Layer(
data=column_parser(EBSD, format_string="dxy__ddd", return_unspecified=False),
container=Container2D,
dataloader=XYDLoader,
transformer=Homography,
)
layer_ebsd.container
[9]:
Container2D([( 0, nan, nan, 0. , 0. , 2., 160.45, 47.733, 233.82),
( 1, nan, nan, 0.2776, 0. , 2., 160.15, 47.888, 233.74),
( 2, nan, nan, 0.5553, 0. , 2., 160.14, 47.928, 234. ),
...,
(7497, nan, nan, 26.932 , 20.546, 2., 159.7 , 48.268, 235.35),
(7498, nan, nan, 27.21 , 20.546, 2., 159.24, 48.137, 234.97),
(7499, nan, nan, 27.487 , 20.546, 2., 158.98, 48.32 , 235.21)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8')])
Loading image data
Loading image data is straightforward: just use ImageLoader. Image data are 2- or 3-dimensional array-like objects with shape (i, j) or (i, j, k). Each channel is flattened and stored in its own column named Channel_{integer}. Let’s make some sample image data.
[10]:
im3channel = np.random.random((4, 4, 3))
Then we can load the image into a layer.
[11]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography
layer_im3c = Layer(
data=im3channel,
container=Container2D,
dataloader=ImageLoader,
transformer=Homography,
)
layer_im3c.container
[11]:
Container2D([( 0, nan, nan, 0., 0., 0.48352113, 0.65644934, 0.62632192),
( 1, nan, nan, 1., 0., 0.60310654, 0.49799451, 0.82988335),
( 2, nan, nan, 2., 0., 0.66448403, 0.08736397, 0.44356613),
( 3, nan, nan, 3., 0., 0.74440296, 0.31501181, 0.33852412),
( 4, nan, nan, 0., 1., 0.25765867, 0.9337544 , 0.01316468),
( 5, nan, nan, 1., 1., 0.78892876, 0.46884765, 0.85809962),
( 6, nan, nan, 2., 1., 0.24444665, 0.96828119, 0.90607162),
( 7, nan, nan, 3., 1., 0.98810052, 0.00771672, 0.73195071),
( 8, nan, nan, 0., 2., 0.90485324, 0.56224866, 0.85691854),
( 9, nan, nan, 1., 2., 0.40320464, 0.35602567, 0.2806191 ),
(10, nan, nan, 2., 2., 0.97604856, 0.29447735, 0.8645997 ),
(11, nan, nan, 3., 2., 0.40194131, 0.83200091, 0.86588967),
(12, nan, nan, 0., 3., 0.95771104, 0.75235027, 0.77791571),
(13, nan, nan, 1., 3., 0.24173232, 0.29699636, 0.77702952),
(14, nan, nan, 2., 3., 0.88252335, 0.48796859, 0.79854297),
(15, nan, nan, 3., 3., 0.91600357, 0.25005287, 0.17678003)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Channel_0', '<f8'), ('Channel_1', '<f8'), ('Channel_2', '<f8')])
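The printed container shows how the image was serialized. Pixels are listed row by row, x_raw is the column index j, y_raw is the row index i, and channel k of each pixel goes to Channel_k. This ordering is inferred from the output above, not from pyxc's source. Here is a NumPy-only sketch of the same flattening, using a hypothetical flatten_image helper.

```python
import numpy as np


def flatten_image(im):
    """Sketch of the flattening visible in the ImageLoader output.

    Inferred, not taken from pyxc: pixels in row-major order, x_raw = column
    index j, y_raw = row index i, channel c stored under "Channel_c".
    """
    im = np.asarray(im)
    if im.ndim == 2:
        im = im[:, :, np.newaxis]  # treat a single-channel image as k = 1
    if im.ndim != 3:
        raise ValueError("expected an array of shape (i, j) or (i, j, k)")
    n_rows, n_cols, n_channels = im.shape
    # np.indices gives the row index grid first, then the column index grid.
    y_raw, x_raw = np.indices((n_rows, n_cols))
    channels = {f"Channel_{c}": im[:, :, c].ravel() for c in range(n_channels)}
    return x_raw.ravel().astype(float), y_raw.ravel().astype(float), channels
```

Applied to a (4, 4, 3) array, this yields 16 rows whose x_raw runs 0, 1, 2, 3, 0, ... and whose y_raw runs 0, 0, 0, 0, 1, ..., matching the x_raw and y_raw columns printed above.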
A Container2D can also be created directly from x_raw and y_raw coordinates and a data array.
[12]:
x = np.linspace(0, 1, 10)
y = np.linspace(2, 10, 10)
data = np.random.random((10, 3))
example_container = Container2D(x_raw=x, y_raw=y, data=data)
Displaying example_container shows that it was initialized correctly.
[13]:
example_container
[13]:
Container2D([(0, nan, nan, 0. , 2. , 0.52242415, 0.30731735, 0.72132394),
(1, nan, nan, 0.11111111, 2.8888888, 0.62465566, 0.04702027, 0.70809232),
(2, nan, nan, 0.22222222, 3.7777777, 0.9732913 , 0.59099104, 0.34125888),
(3, nan, nan, 0.33333334, 4.6666665, 0.49964169, 0.7824484 , 0.27183625),
(4, nan, nan, 0.44444445, 5.5555553, 0.91979214, 0.08942081, 0.02636504),
(5, nan, nan, 0.5555556 , 6.4444447, 0.32363013, 0.29185491, 0.03624065),
(6, nan, nan, 0.6666667 , 7.3333335, 0.10039757, 0.42025748, 0.72725504),
(7, nan, nan, 0.7777778 , 8.222222 , 0.55001476, 0.16645509, 0.04034066),
(8, nan, nan, 0.8888889 , 9.111111 , 0.96013546, 0.94048045, 0.37950566),
(9, nan, nan, 1. , 10. , 0.41519736, 0.81725484, 0.15236122)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Channel_0', '<f8'), ('Channel_1', '<f8'), ('Channel_2', '<f8')])
We passed a plain (non-structured) array as data, so the data columns were named automatically: Channel_0, Channel_1, and Channel_2.
[14]:
example_container.dtype.names
[14]:
('row', 'x', 'y', 'x_raw', 'y_raw', 'Channel_0', 'Channel_1', 'Channel_2')
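All the containers printed in this tutorial share one layout: row (int32); x and y (float32, NaN in every printout here); x_raw and y_raw (float32); then the data columns. The data columns keep the field names of a structured array, or are named Channel_0, Channel_1, ... for plain data. The function below is a hypothetical NumPy-only mimic of that layout, not Container2D's implementation. It leaves x and y as NaN because none of the examples above fill them in.

```python
import numpy as np


def build_container_like(x_raw, y_raw, data):
    """Hypothetical mimic of the Container2D field layout printed above.

    Structured data keeps its field names; plain data gets Channel_0, ...
    """
    x_raw = np.asarray(x_raw, dtype=np.float32)
    y_raw = np.asarray(y_raw, dtype=np.float32)
    data = np.asarray(data)
    if data.dtype.names is not None:
        names = list(data.dtype.names)
        columns = [data[name] for name in names]
    else:
        data = data.reshape(len(x_raw), -1)  # a 1-D array becomes one channel
        names = [f"Channel_{i}" for i in range(data.shape[1])]
        columns = [data[:, i] for i in range(data.shape[1])]
    dtype = [("row", "<i4"), ("x", "<f4"), ("y", "<f4"), ("x_raw", "<f4"), ("y_raw", "<f4")]
    dtype += [(name, col.dtype) for name, col in zip(names, columns)]
    out = np.empty(len(x_raw), dtype=dtype)
    out["row"] = np.arange(len(x_raw))
    out["x"] = np.nan  # not filled in by this sketch
    out["y"] = np.nan
    out["x_raw"] = x_raw
    out["y_raw"] = y_raw
    for name, col in zip(names, columns):
        out[name] = col
    return out
```

With x = np.linspace(0, 1, 10), y = np.linspace(2, 10, 10) and a (10, 3) float array, the field names come out as ('row', 'x', 'y', 'x_raw', 'y_raw', 'Channel_0', 'Channel_1', 'Channel_2'), the same as [14].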