Tutorial 3. Loading data into the Layer object
This section covers how to properly subclass the DataLoader class when a specialized loader is required.
First, let’s load the required data.
[1]:
# Load EBSD data
import numpy as np
EBSD = np.genfromtxt(
"./data/SiC_in_NiSA.ctf", dtype=float, skip_header=15, delimiter="\t", names=True
)
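If the CTF file is not at hand, the effect of skip_header and names=True can be seen on a tiny inline sample. This is only a sketch: the text below is made up for illustration (one junk header line instead of fifteen, three columns instead of eleven) and is not taken from SiC_in_NiSA.ctf.

```python
import io

import numpy as np

# Made-up, tab-delimited sample: one header line to skip, then the column names.
text = "junk header line\nPhase\tX\tY\n2\t0.0\t0.0\n2\t0.2776\t0.0\n"

sample = np.genfromtxt(
    io.StringIO(text), dtype=float, skip_header=1, delimiter="\t", names=True
)

# With names=True the first line after the skipped header supplies the field
# names, and the result is a 1-dimensional structured array.
print(sample.dtype.names)  # ('Phase', 'X', 'Y')
print(sample["X"])
```

Each record is one row of the file, and the columns are accessed by name rather than by position.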
Boilerplate
We will deal with the low-level API first. This pattern will be used repeatedly, so you may want to wrap the boilerplate in your own function for convenience.
[2]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography
layer_ebsd = Layer(
data=column_parser(EBSD, format_string="dxydddddddd"),
container=Container2D,
dataloader=XYDLoader,
transformer=Homography,
)
Using XYDLoader
The XYDLoader object loads 2-dimensional array-like data or structured arrays. It requires the first and second columns of the array to contain the numeric X and Y values, so the data must be arranged in this format before it is passed to XYDLoader.
Using column_parser function
The column_parser function is a utility for refining data. It reorders columns according to the provided format string: x and y mark the columns that contain the x and y coordinates, _ marks a column to ignore, and every other character marks a data column.
2-dimensional array-like
Let’s continue with an example, starting with a 2-dimensional array-like.
[3]:
arr = np.random.random((3, 4))
arr
[3]:
array([[0.92877668, 0.89444663, 0.87047591, 0.91687574],
[0.0361073 , 0.67364892, 0.88375987, 0.2524567 ],
[0.68495167, 0.07070214, 0.7652993 , 0.04813512]])
Let’s assume the second and third columns hold x and y, while the first column is a data column. As you can see, the x and y columns are moved to the first and second positions, and the fourth column, which the format string does not mention, is kept at the end.
[4]:
column_parser(arr, "dxy")
[4]:
array([[0.89444663, 0.87047591, 0.92877668, 0.91687574],
[0.67364892, 0.88375987, 0.0361073 , 0.2524567 ],
[0.07070214, 0.7652993 , 0.68495167, 0.04813512]])
Alternatively, you can exclude columns by marking them with _ or by explicitly setting return_unspecified to False.
[5]:
column_parser(arr, "_dxy")
[5]:
array([[0.87047591, 0.91687574, 0.89444663],
[0.88375987, 0.2524567 , 0.67364892],
[0.7652993 , 0.04813512, 0.07070214]])
[6]:
column_parser(arr, "dxy", return_unspecified=False)
[6]:
array([[0.89444663, 0.87047591, 0.92877668],
[0.67364892, 0.88375987, 0.0361073 ],
[0.07070214, 0.7652993 , 0.68495167]])
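The reordering shown in the three cells above can be imitated with plain NumPy. The function below is a hypothetical stand-in named reorder_columns, written only to make the format-string rules concrete; it is not the pyxc implementation, and it handles plain 2-dimensional arrays only.

```python
import numpy as np


def reorder_columns(arr, format_string, return_unspecified=True):
    """Hypothetical stand-in that imitates the column_parser outputs above."""
    arr = np.asarray(arr)
    x_idx = format_string.index("x")
    y_idx = format_string.index("y")
    # Every character other than x, y and _ marks a data column.
    data_idx = [i for i, ch in enumerate(format_string) if ch not in "xy_"]
    order = [x_idx, y_idx] + data_idx
    if return_unspecified:
        # Columns beyond the end of the format string are appended unchanged.
        order += list(range(len(format_string), arr.shape[1]))
    return arr[:, order]


# Use column indices as values so the new order is easy to read off.
demo = np.arange(12, dtype=float).reshape(3, 4)
print(reorder_columns(demo, "dxy")[0])   # [1. 2. 0. 3.]
print(reorder_columns(demo, "_dxy")[0])  # [2. 3. 1.]
print(reorder_columns(demo, "dxy", return_unspecified=False)[0])  # [1. 2. 0.]
```

In every case x comes first, y second, then the data columns in their original order, which matches the outputs of cells [4] to [6].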
Structured array
For a structured array it works exactly the same way. Let’s use the EBSD data we loaded previously.
[7]:
EBSD
[7]:
array([(2., 0. , 0. , 11., 0., 160.45, 47.733, 233.82, 1.0211, 160., 255.),
(2., 0.2776, 0. , 10., 0., 160.15, 47.888, 233.74, 1.3246, 161., 255.),
(2., 0.5553, 0. , 10., 0., 160.14, 47.928, 234. , 1.3319, 161., 255.),
...,
(2., 26.932 , 20.546, 11., 0., 159.7 , 48.268, 235.35, 1.2136, 154., 255.),
(2., 27.21 , 20.546, 10., 0., 159.24, 48.137, 234.97, 0.861 , 159., 255.),
(2., 27.487 , 20.546, 10., 0., 158.98, 48.32 , 235.21, 1.125 , 162., 255.)],
dtype=[('Phase', '<f8'), ('X', '<f8'), ('Y', '<f8'), ('Bands', '<f8'), ('Error', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8'), ('MAD', '<f8'), ('BC', '<f8'), ('BS', '<f8')])
Let’s say we need X, Y, Phase, Euler1, Euler2, and Euler3. The format string should then be dxy__ddd. We also don’t want the trailing MAD, BC, and BS columns, so we explicitly set return_unspecified to False.
[8]:
column_parser(EBSD, "dxy__ddd", return_unspecified=False)
[8]:
array([( 0. , 0. , 2., 160.45, 47.733, 233.82),
( 0.2776, 0. , 2., 160.15, 47.888, 233.74),
( 0.5553, 0. , 2., 160.14, 47.928, 234. ), ...,
(26.932 , 20.546, 2., 159.7 , 48.268, 235.35),
(27.21 , 20.546, 2., 159.24, 48.137, 234.97),
(27.487 , 20.546, 2., 158.98, 48.32 , 235.21)],
dtype={'names': ['X', 'Y', 'Phase', 'Euler1', 'Euler2', 'Euler3'], 'formats': ['<f8', '<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [8, 16, 0, 40, 48, 56], 'itemsize': 88})
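The dtype printed above lists offsets and an itemsize of 88, which is what NumPy reports when several fields are selected from a structured array as a view. Whether column_parser selects fields this way internally is an assumption; the sketch below only shows the NumPy behaviour on a small made-up array with four of the EBSD field names.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Made-up records that reuse four of the EBSD field names.
records = np.array(
    [(2.0, 0.0, 0.0, 11.0), (2.0, 0.2776, 0.0, 10.0)],
    dtype=[("Phase", "<f8"), ("X", "<f8"), ("Y", "<f8"), ("Bands", "<f8")],
)

# Indexing with a list of names returns a view: the fields appear in the
# requested order but keep their original byte offsets and record size.
subset = records[["X", "Y", "Phase"]]
print(subset.dtype.names)     # ('X', 'Y', 'Phase')
print(subset.dtype.itemsize)  # 32, the size of a full original record

# repack_fields copies the selection into a tightly packed layout.
packed = rfn.repack_fields(subset)
print(packed.dtype.itemsize)  # 24
```

The unselected fields still occupy memory in the view, which explains why the itemsize stays at the size of a full record until the array is repacked.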
Correctly processed data (with a proper format string) is compatible with XYDLoader, so you can load it into the Layer.
[9]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography
layer_ebsd = Layer(
data=column_parser(EBSD, format_string="dxy__ddd", return_unspecified=False),
container=Container2D,
dataloader=XYDLoader,
transformer=Homography,
)
layer_ebsd.container
[9]:
Container2D([( 0, nan, nan, 0. , 0. , 2., 160.45, 47.733, 233.82),
( 1, nan, nan, 0.2776, 0. , 2., 160.15, 47.888, 233.74),
( 2, nan, nan, 0.5553, 0. , 2., 160.14, 47.928, 234. ),
...,
(7497, nan, nan, 26.932 , 20.546, 2., 159.7 , 48.268, 235.35),
(7498, nan, nan, 27.21 , 20.546, 2., 159.24, 48.137, 234.97),
(7499, nan, nan, 27.487 , 20.546, 2., 158.98, 48.32 , 235.21)],
dtype=[('row', '<u2'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8')])
Loading image data
Loading image data is straightforward: just use ImageLoader. Image data are 2- or 3-dimensional array-like objects with the shape (i, j) or (i, j, k). Each channel of the image is stored in serialized form under the column name Channel_{integer}. Let’s make some sample image data.
[10]:
im3channel = np.random.random((4, 4, 3))
Then we can easily load the data into the layer.
[11]:
# Load data into the layer
from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography
layer_im3c = Layer(
data=im3channel,
container=Container2D,
dataloader=ImageLoader,
transformer=Homography,
)
layer_im3c.container
[11]:
Container2D([( 0, nan, nan, 0., 0., 0.07663469, 0.88212315, 0.78914158),
( 1, nan, nan, 1., 0., 0.80959609, 0.0925775 , 0.51279725),
( 2, nan, nan, 2., 0., 0.20136356, 0.1289199 , 0.17604795),
( 3, nan, nan, 3., 0., 0.44349787, 0.65429355, 0.02392033),
( 4, nan, nan, 0., 1., 0.23167965, 0.6432105 , 0.85546794),
( 5, nan, nan, 1., 1., 0.40806222, 0.67816605, 0.13953804),
( 6, nan, nan, 2., 1., 0.48789052, 0.75402624, 0.99041575),
( 7, nan, nan, 3., 1., 0.10106112, 0.11615678, 0.53378242),
( 8, nan, nan, 0., 2., 0.54398295, 0.20209857, 0.81530971),
( 9, nan, nan, 1., 2., 0.05061458, 0.07861014, 0.26327131),
(10, nan, nan, 2., 2., 0.44572252, 0.04409116, 0.86691355),
(11, nan, nan, 3., 2., 0.25417377, 0.17514218, 0.81463213),
(12, nan, nan, 0., 3., 0.79069704, 0.87524102, 0.77381171),
(13, nan, nan, 1., 3., 0.4289442 , 0.66303235, 0.72313443),
(14, nan, nan, 2., 3., 0.05647999, 0.59419312, 0.2199056 ),
(15, nan, nan, 3., 3., 0.86261879, 0.24816038, 0.13209068)],
dtype=[('row', '<u2'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Channel_0', '<f8'), ('Channel_1', '<f8'), ('Channel_2', '<f8')])
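The serialized layout above, with one row per pixel and x_raw varying fastest, can be reproduced with a plain NumPy reshape. This is a sketch under one assumption that the 4 x 4 example cannot confirm: x is taken to be the column index (second axis) and y the row index (first axis). It is not the ImageLoader implementation.

```python
import numpy as np

# A small non-square image, so that rows and columns cannot be confused.
image = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)  # (rows, columns, channels)
n_rows, n_cols, n_channels = image.shape

# Assumption: x is the column index and y is the row index.
y_raw, x_raw = np.mgrid[0:n_rows, 0:n_cols]

# One row per pixel: x, y, then one value per channel.
flat = np.column_stack(
    [x_raw.ravel(), y_raw.ravel(), image.reshape(-1, n_channels)]
)
print(flat.shape)  # (6, 4)
print(flat[1])     # [1. 0. 2. 3.]
```

Row 1 holds the pixel at x = 1, y = 0 together with its two channel values, mirroring the ordering of the Container2D output.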
[12]:
x = np.linspace(0, 1, 10)
y = np.linspace(2, 10, 10)
data = np.random.random((10, 3))
example_container = Container2D(x_raw=x, y_raw=y, data=data)
As you can see, example_container is now initialized correctly.
[13]:
example_container
[13]:
Container2D([(0, nan, nan, 0. , 2. , 0.33694682, 0.19854985, 0.10199133),
(1, nan, nan, 0.11111111, 2.8888888, 0.8322274 , 0.45244863, 0.2476717 ),
(2, nan, nan, 0.22222222, 3.7777777, 0.18030242, 0.72097806, 0.30218732),
(3, nan, nan, 0.33333334, 4.6666665, 0.35674441, 0.13494588, 0.41146857),
(4, nan, nan, 0.44444445, 5.5555553, 0.51191495, 0.55930943, 0.60476699),
(5, nan, nan, 0.5555556 , 6.4444447, 0.5274495 , 0.35951617, 0.79689532),
(6, nan, nan, 0.6666667 , 7.3333335, 0.22327619, 0.80358869, 0.36978613),
(7, nan, nan, 0.7777778 , 8.222222 , 0.27191543, 0.36553741, 0.02672543),
(8, nan, nan, 0.8888889 , 9.111111 , 0.24711506, 0.3925559 , 0.6586755 ),
(9, nan, nan, 1. , 10. , 0.69465431, 0.48741596, 0.01988862)],
dtype=[('row', '<u2'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Channel_0', '<f8'), ('Channel_1', '<f8'), ('Channel_2', '<f8')])
We didn’t provide a structured array, so the column names for the data are generated automatically as Channel_0, Channel_1, and Channel_2.
[14]:
example_container.dtype.names
[14]:
('row', 'x', 'y', 'x_raw', 'y_raw', 'Channel_0', 'Channel_1', 'Channel_2')
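To make the column layout concrete, a table with the same field names and dtypes can be assembled by hand in NumPy. This is only an illustration of the layout printed above, with the Channel_{integer} names generated in a loop; it is not how Container2D is implemented, and the variable name table is hypothetical.

```python
import numpy as np

x = np.linspace(0, 1, 10)
y = np.linspace(2, 10, 10)
data = np.random.random((10, 3))

# Generate one name per unnamed data column.
names = [f"Channel_{i}" for i in range(data.shape[1])]

# Same fixed fields and dtypes as in the Container2D output above.
dtype = [("row", "<u2"), ("x", "<f4"), ("y", "<f4"), ("x_raw", "<f4"), ("y_raw", "<f4")]
dtype += [(name, "<f8") for name in names]

table = np.zeros(len(x), dtype=dtype)
table["row"] = np.arange(len(x))
table["x"] = np.nan  # left as NaN, as in the output above
table["y"] = np.nan
table["x_raw"] = x
table["y_raw"] = y
for i, name in enumerate(names):
    table[name] = data[:, i]

print(table.dtype.names)
```

Because x_raw and y_raw are stored as 32-bit floats, values such as 1/9 are shown with float32 precision (0.11111111), while the data columns keep full 64-bit precision.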