Tutorial 6. Subclassing the DataLoader class
We need to populate the Layer object!
In this tutorial, we will learn how to subclass the DataLoader class. This is usually unnecessary, since XYDLoader and ImageLoader already cover about 95% of scientific data formats (by the author’s estimate). Nevertheless, in this tutorial we will implement an EBSDLoader class to load EBSD data more conveniently.
This library is highly modular, so a central class is needed to combine the functionality offered by the different classes. The Layer object plays exactly this role.
Let’s start with the most important detail: “The provided data is loaded into the container by the dataloader, and the sampling distortion of the loaded data is corrected by the transformer.”
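To make this division of work concrete, here is a toy version of the flow in plain NumPy. Every function in it (toy_loader, toy_container, toy_homography) is hypothetical and only mimics the roles described in the quote; pyxc’s actual classes and signatures differ. The transformer step assumes the correction is a 3x3 homography applied to the raw coordinates, and, based on the separate x/x_raw columns in the Container2D output, it assumes corrected coordinates are stored next to the raw ones.

```python
import numpy as np

# Hypothetical, simplified stand-ins for the dataloader -> container -> transformer
# flow. These are NOT pyxc classes; they only illustrate the three roles.


def toy_loader(raw):
    """Dataloader role: split raw (N, 3) rows into x, y, and data (assumed layout)."""
    raw = np.asarray(raw, dtype=float)
    return raw[:, 0], raw[:, 1], raw[:, 2:]


def toy_container(x_raw, y_raw, data):
    """Container role: store raw coordinates and data; corrected x, y start as NaN."""
    out = np.zeros(
        len(x_raw),
        dtype=[("x", "f8"), ("y", "f8"), ("x_raw", "f8"), ("y_raw", "f8"), ("value", "f8")],
    )
    out["x"] = np.nan
    out["y"] = np.nan
    out["x_raw"] = x_raw
    out["y_raw"] = y_raw
    out["value"] = data[:, 0]
    return out


def toy_homography(container, H):
    """Transformer role: map raw coordinates through a 3x3 homography H."""
    pts = np.column_stack([container["x_raw"], container["y_raw"], np.ones(len(container))])
    mapped = pts @ H.T  # row i becomes H @ [x_i, y_i, 1]
    container["x"] = mapped[:, 0] / mapped[:, 2]  # divide by the homogeneous coordinate
    container["y"] = mapped[:, 1] / mapped[:, 2]
    return container


raw = [[0.0, 0.0, 1.5], [1.0, 0.0, 2.5], [0.0, 1.0, 3.5]]
H = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0, 1.0]])  # scale by 2, then shift by (+1, -1)
result = toy_homography(toy_container(*toy_loader(raw)), H)
print(result["x"], result["y"])
```

Keeping the three steps separate means a new data format only needs a new loader; the container and the transformer are reused unchanged.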
At the moment, the only available container object is Container2D; a dedicated container for 3D-sampled data will be released later. This means our goal is to populate Container2D correctly, and to do so you need to implement a DataLoader that converts the provided data into the format Container2D expects.
First, let’s take a look at the Container2D object!
The Container2D class is initialized with x, y, and data columns (passed as the x_raw, y_raw, and data keyword arguments in the example below). x and y are 1-dimensional array-like objects, while data is a structured array or a 2-dimensional array. If data is not a structured array, the column names are generated automatically as Channel_0, Channel_1, and so on.
Container2D is a subclass of the NumPy structured array, so you can use any NumPy function that works with structured arrays.
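Because Container2D subclasses a NumPy structured array, the usual structured-array idioms should apply to it. The sketch below exercises them on a plain NumPy array whose fields mimic the Container2D layout printed in this tutorial (an assumption for illustration; nothing here imports pyxc): field access, boolean masking, sorting by a named field, and multi-field selection.

```python
import numpy as np

# A plain NumPy structured array mimicking the Container2D field layout
# (row, x, y, x_raw, y_raw, then data channels). Not built with pyxc.
dtype = [("row", "<i4"), ("x", "<f4"), ("y", "<f4"),
         ("x_raw", "<f4"), ("y_raw", "<f4"),
         ("Channel_0", "<f8"), ("Channel_1", "<f8")]
arr = np.zeros(4, dtype=dtype)
arr["row"] = np.arange(4)
arr["x"] = np.nan
arr["y"] = np.nan
arr["x_raw"] = [0.0, 1.0, 2.0, 3.0]
arr["y_raw"] = [5.0, 5.0, 6.0, 6.0]
arr["Channel_0"] = [0.9, 0.1, 0.5, 0.3]
arr["Channel_1"] = [1.0, 2.0, 3.0, 4.0]

# Field access returns an ordinary ndarray for one column.
print(arr["Channel_0"].mean())

# Boolean masking keeps whole records.
bright = arr[arr["Channel_0"] > 0.4]
print(bright["row"])

# Sorting by a named field with np.sort(order=...).
by_value = np.sort(arr, order="Channel_0")
print(by_value["row"])

# Multi-field indexing selects a subset of columns.
coords = arr[["x_raw", "y_raw"]]
print(coords.dtype.names)
```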
[1]:
# Build a Container2D directly from x, y, and data arrays
from pyxc.core.container import Container2D
import numpy as np
x = np.linspace(0, 1, 10)
y = np.linspace(2, 10, 10)
data = np.random.random((10, 3))
example_container = Container2D(x_raw=x, y_raw=y, data=data)
As you can see, example_container is now initialized correctly.
[2]:
example_container
[2]:
Container2D([(0, nan, nan, 0. , 2. , 0.9092607 , 0.95272012, 0.16540477),
(1, nan, nan, 0.11111111, 2.8888888, 0.68787674, 0.79661816, 0.53415113),
(2, nan, nan, 0.22222222, 3.7777777, 0.45964945, 0.76458326, 0.47344345),
(3, nan, nan, 0.33333334, 4.6666665, 0.14351352, 0.53747687, 0.72598613),
(4, nan, nan, 0.44444445, 5.5555553, 0.59945074, 0.56949856, 0.2094515 ),
(5, nan, nan, 0.5555556 , 6.4444447, 0.07509524, 0.02320218, 0.22068752),
(6, nan, nan, 0.6666667 , 7.3333335, 0.87607138, 0.15929025, 0.95797195),
(7, nan, nan, 0.7777778 , 8.222222 , 0.32153398, 0.92016871, 0.66088006),
(8, nan, nan, 0.8888889 , 9.111111 , 0.55449608, 0.92521225, 0.91784831),
(9, nan, nan, 1. , 10. , 0.63283821, 0.82753116, 0.49863366)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Channel_0', '<f8'), ('Channel_1', '<f8'), ('Channel_2', '<f8')])
Because we did not provide a structured array, the data column names were generated automatically: Channel_0, Channel_1, and Channel_2.
[3]:
example_container.dtype.names
[3]:
('row', 'x', 'y', 'x_raw', 'y_raw', 'Channel_0', 'Channel_1', 'Channel_2')
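The automatic naming can be reproduced with plain NumPy. name_channels below is a hypothetical helper that follows the same Channel_i convention, built on numpy.lib.recfunctions.unstructured_to_structured; it is a sketch of the idea, not how Container2D does it internally.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def name_channels(data):
    """Hypothetical helper: give an unstructured array Channel_i field names.

    Structured input is returned unchanged; a 1-D array is treated as one channel.
    This follows the naming convention shown above, not pyxc's actual code.
    """
    data = np.asarray(data)
    if data.dtype.names is not None:
        return data  # already has column names
    if data.ndim == 1:
        data = data[:, np.newaxis]  # shape (N,) -> (N, 1)
    if data.ndim != 2:
        raise ValueError("expected a 1-D or 2-D array")
    dtype = np.dtype([(f"Channel_{i}", data.dtype) for i in range(data.shape[1])])
    return rfn.unstructured_to_structured(data, dtype=dtype)


src = np.random.random((10, 3))
named = name_channels(src)
print(named.dtype.names)  # ('Channel_0', 'Channel_1', 'Channel_2')

labelled = np.zeros(2, dtype=[("Euler1", "f8"), ("Euler2", "f8")])
print(name_channels(labelled).dtype.names)  # ('Euler1', 'Euler2')
```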
Subclassing the DataLoader class to load special data
Okay, we’ve been using EBSD data as an example for a while. Let’s make a data loader class that loads EBSD data directly. We will assume we only need Phase, X, Y, and Euler 1-3.
First, let’s do this with the default XYDLoader. Note that we use column_parser to extract the x, y, and data columns.
[4]:
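# Load data into the layer
import numpy as np
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography
# Read the tab-delimited .ctf file, skipping its 15-line header;
# names=True takes the column names from the first remaining line
EBSD = np.genfromtxt(
    "./data/SiC_in_NiSA.ctf", dtype=float, skip_header=15, delimiter="\t", names=True
)
# "dxy__ddd" keeps Phase (d), X (x), Y (y), skips the next two columns (_),
# and keeps Euler1-3 (ddd) as data columns
layer_ebsd = Layer(
    data=column_parser(EBSD, format_string="dxy__ddd", return_unspecified=False),
    container=Container2D,
    dataloader=XYDLoader,
    transformer=Homography,
)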
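The format string can be read as one character per column. Judging from the columns that end up in the container (Phase and Euler1-3 as data), d marks a data column, x and y mark the coordinates, and _ skips a column. split_by_format below is a hypothetical re-implementation of that idea for illustration only; pyxc’s column_parser may behave differently, in particular for columns past the end of the format string (here they are simply dropped, an assumption loosely mirroring return_unspecified=False). The rows are made-up, CTF-like records.

```python
import numpy as np


def split_by_format(arr, fmt):
    """Hypothetical sketch of format-string column selection.

    One character per field, in order:
      'x' -> x coordinate, 'y' -> y coordinate,
      'd' -> data column,  '_' -> skipped column.
    Fields beyond len(fmt) are dropped. Illustration only; not pyxc's column_parser.
    """
    names = arr.dtype.names
    if len(fmt) > len(names):
        raise ValueError("format string is longer than the number of fields")
    x = y = None
    data_fields = []
    for code, name in zip(fmt, names):
        if code == "x":
            x = arr[name]
        elif code == "y":
            y = arr[name]
        elif code == "d":
            data_fields.append(name)
        elif code != "_":
            raise ValueError(f"unknown format code {code!r}")
    data = arr[data_fields]  # multi-field view keeps the original field names
    return x, y, data


# Two made-up, CTF-like rows (one trailing column is left outside the format string).
fields = ["Phase", "X", "Y", "Bands", "Error", "Euler1", "Euler2", "Euler3", "MAD"]
rows = np.array(
    [(2, 0.0, 0.0, 8, 0, 160.0, 48.0, 234.0, 0.5),
     (2, 0.25, 0.0, 8, 0, 160.5, 47.5, 233.5, 0.6)],
    dtype=[(name, "f8") for name in fields],
)
x, y, data = split_by_format(rows, "dxy__ddd")
print(data.dtype.names)  # ('Phase', 'Euler1', 'Euler2', 'Euler3')
```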
Alternatively, we can put the whole parsing logic inside the DataLoader class itself: just implement it in the prep() method.
[5]:
from pyxc.core.loader import DataLoaderBase
from pyxc.core.processor.arrays import xyd_splitter, column_parser
class EBSDLoader(DataLoaderBase):
    """A subclass of DataLoaderBase for loading EBSD data.
    Keeps the Phase, X, Y, and Euler1-3 columns of a parsed .ctf file
    and splits them into x, y, and data.
    """
    def prep(self, data):
        # Select columns with the same format string as before, then split
        # the result into x, y, and data parts
        x_serial, y_serial, prepped_data = xyd_splitter(
            column_parser(data, "dxy__ddd", return_unspecified=False)
        )
        return x_serial.flatten(), y_serial.flatten(), prepped_data
Now you can see that the EBSDLoader class can directly handle the EBSD data.
[6]:
# Pass the container class and the raw EBSD array, then call the loader
EBSDLoader(Container2D, EBSD)()
[6]:
Container2D([( 0, nan, nan, 0. , 0. , 2., 160.45, 47.733, 233.82),
( 1, nan, nan, 0.2776, 0. , 2., 160.15, 47.888, 233.74),
( 2, nan, nan, 0.5553, 0. , 2., 160.14, 47.928, 234. ),
...,
(7497, nan, nan, 26.932 , 20.546, 2., 159.7 , 48.268, 235.35),
(7498, nan, nan, 27.21 , 20.546, 2., 159.24, 48.137, 234.97),
(7499, nan, nan, 27.487 , 20.546, 2., 158.98, 48.32 , 235.21)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8')])
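The loader pattern itself is simple enough to sketch without pyxc. ToyLoaderBase, flat_container, and GridImageLoader below are hypothetical: the base class mirrors the call pattern EBSDLoader(Container2D, EBSD)() seen above by storing a container factory and raw data and running prep() when called, and the subclass only overrides prep(). How DataLoaderBase actually builds the container is an assumption here.

```python
import numpy as np


class ToyLoaderBase:
    """Hypothetical base class (not pyxc's DataLoaderBase).

    The constructor stores a container factory and raw data; calling the
    instance runs prep() and builds the container from its output.
    """

    def __init__(self, container, data):
        self.container = container
        self.data = data

    def prep(self, data):
        # Subclasses override this to return (x, y, data).
        raise NotImplementedError

    def __call__(self):
        x, y, prepped = self.prep(self.data)
        return self.container(x_raw=x, y_raw=y, data=prepped)


def flat_container(x_raw, y_raw, data):
    """Minimal stand-in for a container: one record per point."""
    out = np.zeros(len(x_raw), dtype=[("x_raw", "f4"), ("y_raw", "f4"), ("value", "f8")])
    out["x_raw"] = x_raw
    out["y_raw"] = y_raw
    out["value"] = data
    return out


class GridImageLoader(ToyLoaderBase):
    """Hypothetical subclass: flattens a 2-D image into (x, y, value) records."""

    def prep(self, data):
        image = np.asarray(data, dtype=float)
        rows, cols = np.indices(image.shape)  # pixel row/column indices
        return cols.flatten(), rows.flatten(), image.flatten()


image = np.arange(6.0).reshape(2, 3)
result = GridImageLoader(flat_container, image)()
print(result)
```

Because the base class owns container construction, each new format only has to answer one question in prep(): which arrays are x, y, and data.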