Tutorial 4. Performing queries
In this tutorial, we will query data stored in a layer. Let's set up a simple layer object.
[1]:
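import numpy as np

from pyxc.core.layer import Layer
from pyxc.core.processor.arrays import column_parser
from pyxc.core.container import Container2D
from pyxc.core.loader import ImageLoader, XYDLoader
from pyxc.transform.homography import Homography

# Read the tab-separated EBSD export (.ctf): skip the 15 header lines and
# take the field names from the column-title row (names=True).
EBSD = np.genfromtxt(
    "./data/SiC_in_NiSA.ctf", dtype=float, skip_header=15, delimiter="\t", names=True
)

# Wrap the point data in a Layer. format_string has one character per column;
# 'x' and 'y' appear to mark the coordinate columns and 'd' the data columns.
layer_ebsd = Layer(
    data=column_parser(EBSD, format_string="dxydddddddd"),
    container=Container2D,
    dataloader=XYDLoader,
    transformer=Homography,
)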
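The meaning of format_string is not spelled out in this tutorial. A .ctf file from HKL Channel 5 stores its columns in the order Phase, X, Y, Bands, Error, Euler1, Euler2, Euler3, MAD, BC, BS, which lines up with "dxydddddddd" if each character tags one column. The sketch below illustrates that reading with a hypothetical helper, describe_format; it is not part of pyxc, and the role names are assumptions.

```python
# Hypothetical helper, not part of pyxc: pairs each column name with the role
# its character in format_string appears to assign.
ROLES = {"x": "x coordinate", "y": "y coordinate", "d": "data"}


def describe_format(format_string, column_names):
    """Return (column, role) pairs, one per character of format_string."""
    if len(format_string) != len(column_names):
        raise ValueError("format_string needs exactly one character per column")
    return [(name, ROLES[char]) for char, name in zip(format_string, column_names)]


# Standard column order of an HKL Channel 5 .ctf file.
ctf_columns = ["Phase", "X", "Y", "Bands", "Error", "Euler1",
               "Euler2", "Euler3", "MAD", "BC", "BS"]

roles = describe_format("dxydddddddd", ctf_columns)
for name, role in roles:
    print(f"{name:>6}: {role}")
```

If the reading is right, only X and Y are tagged as coordinates, which matches the x and y fields that show up in the query results.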
There are two ways to query data: at a single coordinate or at multiple coordinates at once.
A single-point query is more flexible: you receive the correlation results for that point and can run your own analysis on them. A multi-point query is more convenient but more limited.
Let's see!
Single point query
You can query the data at a single point. In addition to the columns stored in the container object, the result contains several extra columns:

1. query_index: index of the query point, for internal reference. This is explained in the multi-point section.
2. distance: Euclidean distance between the query coordinate and the matched data point.
3. x-coordinates: x coordinate of the query.
4. y-coordinates: y coordinate of the query.

Note that the result contains several x- and y-related columns. Read this carefully:

1. x: distortion-corrected x
2. y: distortion-corrected y
3. x_raw: x value as initially supplied, before correction
4. y_raw: y value as initially supplied, before correction
5. x-coordinates: x of the query point
6. y-coordinates: y of the query point
[2]:
layer_ebsd.query(5, 5)
[2]:
array([(1818, 4.9977, 4.9977, 4.9977, 4.9977, 2., 9., 0., 160.46, 48.224, 234.31, 1.1127, 161., 255., 0, 0.00325239, 5, 5)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Bands', '<f8'), ('Error', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8'), ('MAD', '<f8'), ('BC', '<f8'), ('BS', '<f8'), ('query_index', '<i8'), ('distance', '<f8'), ('x-coordinates', '<i8'), ('y-coordinates', '<i8')])
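As a quick check of what the distance column means, the value can be recomputed by hand from the query point (5, 5) and the matched point's coordinates in the output above (x and x_raw coincide here, as do y and y_raw). The result agrees with the reported 0.00325239 to within about 1e-6; the small remaining difference is likely because x and y are stored as float32 ('<f4').

```python
import math

# Query coordinate passed to layer_ebsd.query(5, 5).
qx, qy = 5.0, 5.0
# Coordinates (x, y) of the matched point, copied from the output above.
px, py = 4.9977, 4.9977

# Euclidean distance between the query and the matched point.
distance = math.hypot(qx - px, qy - py)
print(f"{distance:.8f}")
```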
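There are two important options, cutoff and output_number. If no data point lies within cutoff of the query coordinate, you get no result: query returns an empty array and issues a warning. For example, the nearest point to (5, 5) is about 0.0033 away, so a cutoff of 0.0001 finds nothing: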
[3]:
layer_ebsd.query(5, 5, cutoff=0.0001)
/home/docs/checkouts/readthedocs.org/user_builds/pyxc/envs/latest/lib/python3.11/site-packages/pyxc/core/layer.py:317: UserWarning: Couldn't find the matching point. Please ignore rows containing NaN.
warn("Couldn't find the matching point. Please ignore rows containing NaN.")
[3]:
array([],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Bands', '<f8'), ('Error', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8'), ('MAD', '<f8'), ('BC', '<f8'), ('BS', '<f8'), ('query_index', '<i8'), ('distance', '<f8'), ('x-coordinates', '<f8'), ('y-coordinates', '<f8')])
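The behaviour of cutoff and output_number can be mimicked with a brute-force nearest-neighbour search. The sketch below is a hypothetical stand-in, not pyxc's implementation: toy_query keeps the points within cutoff, sorts them nearest first, truncates the list to output_number, and warns when nothing is left. Its default parameter values are placeholders, not pyxc's defaults. The sample points are the five neighbours of (5, 5) reported in cell [4].

```python
import math
import warnings

# Corrected (x, y) coordinates of the five neighbours of (5, 5) from cell [4].
POINTS = [(4.9977, 4.9977), (4.9977, 5.2753), (5.2753, 4.9977),
          (4.72, 4.9977), (4.9977, 4.72)]


def toy_query(x, y, points, cutoff=1.0, output_number=1):
    """Hypothetical brute-force query (defaults are placeholders):
    (distance, point) pairs within cutoff, nearest first, at most output_number."""
    hits = []
    for px, py in points:
        d = math.hypot(x - px, y - py)
        if d <= cutoff:
            hits.append((d, (px, py)))
    hits.sort()  # nearest first
    if not hits:
        warnings.warn("Couldn't find the matching point.")
    return hits[:output_number]


print(toy_query(5, 5, POINTS))                                  # nearest point only
print(toy_query(5, 5, POINTS, cutoff=0.0001))                   # empty, with a warning
print(len(toy_query(5, 5, POINTS, cutoff=5, output_number=5)))  # all five neighbours
```

A larger cutoff widens the search, while output_number only limits how many of the found points are returned.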
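To retrieve more data points, set cutoff and output_number explicitly: cutoff is the search radius and output_number the maximum number of points returned. As the output shows, the points are sorted by increasing distance.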
[4]:
layer_ebsd.query(x=5, y=5, cutoff=5, output_number=5)
[4]:
array([(1818, 4.9977, 4.9977, 4.9977, 4.9977, 2., 9., 0., 160.46, 48.224, 234.31, 1.1127, 161., 255., 0, 0.00325239, 5, 5),
(1918, 4.9977, 5.2753, 4.9977, 5.2753, 2., 10., 0., 161. , 48.286, 234.02, 1.1432, 167., 255., 0, 0.27530963, 5, 5),
(1819, 5.2753, 4.9977, 5.2753, 4.9977, 2., 10., 0., 161.09, 48.425, 233.9 , 1.1092, 151., 255., 0, 0.27530963, 5, 5),
(1817, 4.72 , 4.9977, 4.72 , 4.9977, 2., 10., 0., 160.55, 48.13 , 233.82, 1.3118, 159., 255., 0, 0.28000965, 5, 5),
(1718, 4.9977, 4.72 , 4.9977, 4.72 , 2., 9., 0., 160.78, 48.349, 234.04, 1.0346, 159., 255., 0, 0.28000965, 5, 5)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Bands', '<f8'), ('Error', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8'), ('MAD', '<f8'), ('BC', '<f8'), ('BS', '<f8'), ('query_index', '<i8'), ('distance', '<f8'), ('x-coordinates', '<i8'), ('y-coordinates', '<i8')])
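Because query returns a NumPy structured array, its fields can be used directly for your own analysis. The sketch below rebuilds a few fields of the cell [4] output by hand so that it runs without pyxc; with a live layer you would use the array returned by layer_ebsd.query(...) instead. The analysis itself (averaging MAD over the neighbours and over the closest ones) is only an illustration.

```python
import numpy as np

# A subset of the fields returned by
# layer_ebsd.query(x=5, y=5, cutoff=5, output_number=5), copied from the output above.
neighbours = np.array(
    [
        (1818, 4.9977, 4.9977, 1.1127, 0.00325239),
        (1918, 4.9977, 5.2753, 1.1432, 0.27530963),
        (1819, 5.2753, 4.9977, 1.1092, 0.27530963),
        (1817, 4.72, 4.9977, 1.3118, 0.28000965),
        (1718, 4.9977, 4.72, 1.0346, 0.28000965),
    ],
    dtype=[("row", "<i4"), ("x", "<f4"), ("y", "<f4"),
           ("MAD", "<f8"), ("distance", "<f8")],
)

# Rows come back nearest first.
assert np.all(np.diff(neighbours["distance"]) >= 0)

# Example analysis: mean MAD of all neighbours, and of those closer than 0.28.
close = neighbours[neighbours["distance"] < 0.28]
print(neighbours["MAD"].mean())
print(close["row"], close["MAD"].mean())
```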
Multi point query
Multi-point queries are more convenient: execute_queries retrieves data for many points in one call. With large data sets, execute_queries may take a minute or two before the queries start; this is normal, because it is preparing parallel execution.
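In cell [5] the two lists are paired element-wise: the output contains three queries, (4.1, 4.5), (4.2, 4.6) and (4.3, 4.7), not a 3 × 3 grid. To query every node of a regular grid, one option is to build the pairs with np.meshgrid and flatten them. The grid bounds and spacing below are arbitrary example values.

```python
import numpy as np

# Arbitrary example grid: 5 x values in [4.0, 5.0] and 5 y values in [4.5, 5.5].
grid_x = np.linspace(4.0, 5.0, 5)
grid_y = np.linspace(4.5, 5.5, 5)

# meshgrid gives every (x, y) combination; ravel flattens both arrays into
# equal-length sequences so that xs[i] pairs with ys[i].
gx, gy = np.meshgrid(grid_x, grid_y)
xs = gx.ravel().tolist()
ys = gy.ravel().tolist()

print(len(xs), len(ys))
# With a loaded layer: layer_ebsd.execute_queries(xs, ys)
```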
[5]:
xs = [4.1, 4.2, 4.3]
ys = [4.5, 4.6, 4.7]
layer_ebsd.execute_queries(xs, ys)
Maximum worker: 6
Executing queries: 100%|██████████| 3/3 [00:00<00:00, 13781.94it/s]
[5]:
array([(1615, 4.1647, 4.4424, 4.1647, 4.4424, 2., 10., 0., 161.55, 48.938, 233.88, 0.96 , 172., 255., 0, 0.0866248 , 4.1, 4.5),
(1715, 4.1647, 4.72 , 4.1647, 4.72 , 2., 10., 0., 161.2 , 49.065, 233.51, 1.0923, 162., 255., 1, 0.12508412, 4.2, 4.6),
(1715, 4.1647, 4.72 , 4.1647, 4.72 , 2., 10., 0., 161.2 , 49.065, 233.51, 1.0923, 162., 255., 2, 0.13677015, 4.3, 4.7)],
dtype=[('row', '<i4'), ('x', '<f4'), ('y', '<f4'), ('x_raw', '<f4'), ('y_raw', '<f4'), ('Phase', '<f8'), ('Bands', '<f8'), ('Error', '<f8'), ('Euler1', '<f8'), ('Euler2', '<f8'), ('Euler3', '<f8'), ('MAD', '<f8'), ('BC', '<f8'), ('BS', '<f8'), ('query_index', '<i8'), ('distance', '<f8'), ('x-coordinates', '<f8'), ('y-coordinates', '<f8')])
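In this output, query_index records which input point each row answers, and the x-coordinates and y-coordinates columns echo that point. The sketch below uses values copied from cell [5] to map results back to their queries and to flag queries that landed on the same data point: queries 1 and 2 both matched row 1715. The hand-built array stands in for the real return value of execute_queries.

```python
from collections import defaultdict

import numpy as np

xs = [4.1, 4.2, 4.3]
ys = [4.5, 4.6, 4.7]

# Fields copied from the execute_queries output above.
results = np.array(
    [(1615, 0, 0.0866248, 4.1, 4.5),
     (1715, 1, 0.12508412, 4.2, 4.6),
     (1715, 2, 0.13677015, 4.3, 4.7)],
    dtype=[("row", "<i4"), ("query_index", "<i8"), ("distance", "<f8"),
           ("x-coordinates", "<f8"), ("y-coordinates", "<f8")],
)

# Each result row carries the index of the query point it answers.
for r in results:
    i = r["query_index"]
    assert (r["x-coordinates"], r["y-coordinates"]) == (xs[i], ys[i])

# Group query indices by matched data row to spot queries that hit the same point.
hits = defaultdict(list)
for r in results:
    hits[int(r["row"])].append(int(r["query_index"]))
shared = {row: q for row, q in hits.items() if len(q) > 1}
print(shared)  # {1715: [1, 2]}
```

Query points closer together than the data spacing can resolve to the same data point, which is worth checking before treating results as independent.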
Query performance tip
Keep cutoff and output_number small. In the timings below, reducing cutoff from 10 to 1 makes the query roughly nine times faster (7.33 ms to 807 µs); reducing output_number from 1000 to 1 then lowers the time further, to 530 µs.
[13]:
%%timeit
layer_ebsd.query(5, 5, cutoff=10, output_number=1000)
7.33 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[14]:
%%timeit
layer_ebsd.query(5, 5, cutoff=1, output_number=1000)
807 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
[15]:
%%timeit
layer_ebsd.query(5, 5, cutoff=1, output_number=10)
584 µs ± 11.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
[16]:
%%timeit
layer_ebsd.query(5, 5, cutoff=1, output_number=1)
530 µs ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
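The speed-ups follow directly from the %%timeit means. The sketch below only does that arithmetic on the four mean timings from the cells above; it does not re-run any query, and the run-to-run spread (±) is ignored.

```python
# Mean times per query from the %%timeit cells above, in microseconds,
# keyed by (cutoff, output_number).
timings_us = {
    (10, 1000): 7330.0,
    (1, 1000): 807.0,
    (1, 10): 584.0,
    (1, 1): 530.0,
}

baseline = timings_us[(10, 1000)]
speedups = {key: baseline / t for key, t in timings_us.items()}
for (cutoff, output_number), t in timings_us.items():
    print(f"cutoff={cutoff:>2}, output_number={output_number:>4}: "
          f"{t:7.1f} µs, {speedups[(cutoff, output_number)]:4.1f}x faster than the baseline")
```

Most of the gain comes from the smaller cutoff; output_number adds a smaller improvement on top.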