How to use the process area
Follow along with this step-by-step guide to learn about the ProcessArea.
Open in Google Colab
Open the how-to guide as an interactive notebook in Google Colab or download the notebook to run it locally.
Create a process area
A process area specifies the area of interest by a set of coordinates of the bottom left corner of each tile.
By default, a new instance of the ProcessArea has no coordinates.
You can access the coordinates of the process area with the coordinates attribute, which is a numpy array of shape (n, 2) and data type int32.
import aviary
process_area = aviary.ProcessArea()
print(process_area.coordinates)
[]
If you already have the coordinates, you can pass them to the initializer of the ProcessArea.
import numpy as np
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
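To make the bottom left corner convention concrete, the following minimal sketch derives the extent of each tile from its coordinates and a tile size. It uses plain Python only; `tile_bounds` is a hypothetical helper for illustration and not part of aviary, and the tile size of 128 is an assumption taken from the later examples in this guide.

```python
def tile_bounds(coordinates, tile_size):
    """Return (x_min, y_min, x_max, y_max) of a tile.

    Hypothetical helper, not part of aviary: the coordinates are
    interpreted as the bottom left corner of a square tile.
    """
    x_min, y_min = coordinates
    return x_min, y_min, x_min + tile_size, y_min + tile_size


coordinates = [
    (363084, 5715326),
    (363212, 5715326),
    (363084, 5715454),
    (363212, 5715454),
]

# Assumed tile size of 128, as used in the later examples
bounds = [tile_bounds(c, tile_size=128) for c in coordinates]

for b in bounds:
    print(b)
```

With a tile size of 128, the four tiles together cover the square from (363084, 5715326) to (363340, 5715582).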
We can visualize the process area given the tile size.
You can set the coordinates of an already created process area with the coordinates attribute.
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
],
dtype=np.int32,
)
process_area.coordinates = coordinates
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]]
We can visualize the process area given the tile size.
A process area behaves like a sequence of coordinates, so it supports indexing, length and iteration.
You can access the coordinates of the process area with the index operator.
coordinates_1 = process_area[0]
coordinates_2 = process_area[1]
print(coordinates_1)
print(coordinates_2)
(363084, 5715326)
(363212, 5715326)
You can slice the process area with the index operator and the : operator to create a new process area from a subset of the coordinates.
sliced_process_area = process_area[:-1]
print(sliced_process_area.coordinates)
[[ 363084 5715326]]
A process area has a length, which is equal to the number of coordinates, i.e. the number of tiles.
print(len(process_area))
2
You can iterate over the coordinates of the process area.
for coordinates in process_area:
    print(coordinates)
(363084, 5715326)
(363212, 5715326)
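The behavior shown above (an index returns a tuple, a slice returns a new object, and `len` and `for` work as expected) corresponds to Python's sequence protocol. The following is a minimal sketch of that protocol; `TileCoordinates` is a hypothetical stand-in written for illustration and does not reflect how ProcessArea is actually implemented.

```python
class TileCoordinates:
    """Hypothetical stand-in for illustration, not aviary's ProcessArea."""

    def __init__(self, coordinates):
        self._coordinates = [tuple(c) for c in coordinates]

    def __getitem__(self, index):
        # A slice returns a new container, an integer returns one tuple
        if isinstance(index, slice):
            return TileCoordinates(self._coordinates[index])
        return self._coordinates[index]

    def __len__(self):
        return len(self._coordinates)

    def __iter__(self):
        return iter(self._coordinates)


tiles = TileCoordinates([(363084, 5715326), (363212, 5715326)])

first = tiles[0]      # a single (x_min, y_min) tuple
sliced = tiles[:-1]   # a new TileCoordinates without the last entry

print(first)
print(len(tiles), len(sliced))
print(list(tiles))
```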
Create a process area from a bounding box
You can create a process area from a bounding box with the from_bounding_box class method.
bounding_box = aviary.BoundingBox(
x_min=363084,
y_min=5715326,
x_max=363340,
y_max=5715582,
)
process_area = aviary.ProcessArea.from_bounding_box(
bounding_box=bounding_box,
tile_size=128,
quantize=False,
)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
We can visualize the process area given the tile size.
The red polygon represents the bounding box.
You can set the tile size of the process area with the tile_size parameter.
If the width or height of the bounding box is not divisible by the tile size, the tiles will extend beyond the bounding box.
bounding_box = aviary.BoundingBox(
x_min=363084,
y_min=5715326,
x_max=363340,
y_max=5715582,
)
process_area = aviary.ProcessArea.from_bounding_box(
bounding_box=bounding_box,
tile_size=96,
quantize=False,
)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363180 5715326]
[ 363276 5715326]
[ 363084 5715422]
[ 363180 5715422]
[ 363276 5715422]
[ 363084 5715518]
[ 363180 5715518]
[ 363276 5715518]]
We can visualize the process area given the tile size.
The red polygon represents the bounding box.
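The tiling without quantization can be reproduced with two nested ranges. The sketch below is plain Python and only mirrors the outputs shown above; `tile_bounding_box` is a hypothetical helper, and the row by row ordering is an assumption read off those outputs rather than taken from aviary's source.

```python
def tile_bounding_box(x_min, y_min, x_max, y_max, tile_size):
    """Return the bottom left corners of the tiles covering a bounding box.

    Hypothetical helper, not part of aviary. Tiles start at (x_min, y_min)
    and are listed row by row, from bottom to top.
    """
    return [
        (x, y)
        for y in range(y_min, y_max, tile_size)
        for x in range(x_min, x_max, tile_size)
    ]


# 256 is divisible by 128: 2 x 2 tiles that fit the bounding box exactly
tiles_128 = tile_bounding_box(363084, 5715326, 363340, 5715582, tile_size=128)

# 256 is not divisible by 96: 3 x 3 tiles, the last row and column overshoot
tiles_96 = tile_bounding_box(363084, 5715326, 363340, 5715582, tile_size=96)

print(len(tiles_128), len(tiles_96))
print(tiles_96[-1])
```

With a tile size of 96, the last tile starts at x = 363276 and ends at 363372, which is 32 units beyond x_max = 363340.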
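You can quantize the process area with the quantize parameter.
If the coordinates of the bounding box are not divisible by the tile size, they are snapped to multiples of the tile size, so that the tiles still cover the whole bounding box. In the example below, x_min = 363084 becomes 363008 (2836 × 128).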
This might be useful when you want to ensure matching tiles for different process areas.
bounding_box = aviary.BoundingBox(
x_min=363084,
y_min=5715326,
x_max=363340,
y_max=5715582,
)
process_area = aviary.ProcessArea.from_bounding_box(
bounding_box=bounding_box,
tile_size=128,
quantize=True,
)
print(process_area.coordinates)
[[ 363008 5715200]
[ 363136 5715200]
[ 363264 5715200]
[ 363008 5715328]
[ 363136 5715328]
[ 363264 5715328]
[ 363008 5715456]
[ 363136 5715456]
[ 363264 5715456]]
We can visualize the process area given the tile size.
The red polygon represents the bounding box.
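The quantized output above is consistent with rounding the lower bounds down and the upper bounds up to the nearest multiple of the tile size before tiling. The following sketch shows that arithmetic in plain Python; `quantize_bounding_box` is a hypothetical helper, and the rounding rule is an assumption inferred from the printed coordinates, not from aviary's source.

```python
def quantize_bounding_box(x_min, y_min, x_max, y_max, tile_size):
    """Snap a bounding box outward to multiples of the tile size.

    Hypothetical helper, not part of aviary: lower bounds are rounded
    down, upper bounds are rounded up.
    """
    def round_down(value):
        return value // tile_size * tile_size

    def round_up(value):
        return -(-value // tile_size) * tile_size

    return round_down(x_min), round_down(y_min), round_up(x_max), round_up(y_max)


quantized = quantize_bounding_box(363084, 5715326, 363340, 5715582, tile_size=128)
q_x_min, q_y_min, q_x_max, q_y_max = quantized

# Tile the quantized bounding box row by row
tiles = [
    (x, y)
    for y in range(q_y_min, q_y_max, 128)
    for x in range(q_x_min, q_x_max, 128)
]

print(quantized)
print(len(tiles))
```

Because every quantized process area with the same tile size lies on the same grid, tiles from different process areas either coincide exactly or do not overlap at all.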
Create a process area from a geodataframe
You can create a process area from a geodataframe with the from_gdf class method.
import geopandas as gpd
from shapely.geometry import box
gdf = gpd.GeoDataFrame(
geometry=[box(363084, 5715326, 363340, 5715582)],
crs='EPSG:25832',
)
process_area = aviary.ProcessArea.from_gdf(
gdf=gdf,
tile_size=128,
quantize=False,
)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
We can visualize the process area given the tile size.
The red polygon represents the geodataframe.
The geodataframe may contain multiple polygons, e.g. the northern districts of Gelsenkirchen.
url = (
'https://raw.githubusercontent.com/geospaitial-lab/aviary/main'
'/docs/how_to_guides/api/data/districts.geojson'
)
gdf = gpd.read_file(url)
process_area = aviary.ProcessArea.from_gdf(
gdf=gdf,
tile_size=256,
quantize=True,
)
print(process_area.coordinates)
[[ 364288 5713664]
[ 364544 5713664]
[ 364800 5713664]
...
[ 363008 5721856]
[ 363264 5721856]
[ 363520 5721856]]
We can visualize the process area given the tile size.
The red polygons represent the districts.
Create a process area from a JSON string
You can create a process area from a JSON string with the from_json class method.
json_string = (
'[[363084, 5715326], '
'[363212, 5715326], '
'[363084, 5715454], '
'[363212, 5715454]]'
)
process_area = aviary.ProcessArea.from_json(json_string=json_string)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
We can visualize the process area given the tile size.
Add, subtract or intersect process areas
You can add two process areas with the + operator.
The resulting process area contains the union of the two process areas, so coordinates that occur in both process areas appear only once.
coordinates_1 = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area_1 = aviary.ProcessArea(coordinates=coordinates_1)
coordinates_2 = np.array(
[
[363212, 5715454],
[363340, 5715454],
[363212, 5715582],
[363340, 5715582],
],
dtype=np.int32,
)
process_area_2 = aviary.ProcessArea(coordinates=coordinates_2)
print(process_area_1.coordinates)
print(process_area_2.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
[[ 363212 5715454]
[ 363340 5715454]
[ 363212 5715582]
[ 363340 5715582]]
process_area = process_area_1 + process_area_2
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]
[ 363340 5715454]
[ 363212 5715582]
[ 363340 5715582]]
We can visualize the process area given the tile size.
The red polygons represent the first process area and the blue polygons represent the second process area.
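You can subtract two process areas with the - operator.
The resulting process area contains the coordinates of the first process area that are not part of the second process area.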
coordinates_1 = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area_1 = aviary.ProcessArea(coordinates=coordinates_1)
coordinates_2 = np.array(
[
[363212, 5715454],
[363340, 5715454],
[363212, 5715582],
[363340, 5715582],
],
dtype=np.int32,
)
process_area_2 = aviary.ProcessArea(coordinates=coordinates_2)
print(process_area_1.coordinates)
print(process_area_2.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
[[ 363212 5715454]
[ 363340 5715454]
[ 363212 5715582]
[ 363340 5715582]]
process_area = process_area_1 - process_area_2
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]]
We can visualize the process area given the tile size.
The red polygons represent the first process area and the blue polygons represent the second process area.
You can intersect two process areas with the & operator.
The resulting process area contains only the coordinates that are part of both process areas.
coordinates_1 = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area_1 = aviary.ProcessArea(coordinates=coordinates_1)
coordinates_2 = np.array(
[
[363212, 5715454],
[363340, 5715454],
[363212, 5715582],
[363340, 5715582],
],
dtype=np.int32,
)
process_area_2 = aviary.ProcessArea(coordinates=coordinates_2)
print(process_area_1.coordinates)
print(process_area_2.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
[[ 363212 5715454]
[ 363340 5715454]
[ 363212 5715582]
[ 363340 5715582]]
process_area = process_area_1 & process_area_2
print(process_area.coordinates)
[[ 363212 5715454]]
We can visualize the process area given the tile size.
The red polygons represent the first process area and the blue polygons represent the second process area.
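The three operators behave like set operations on the coordinates that keep the original order. The sketch below reproduces the results shown above with plain Python lists of tuples; the function names are hypothetical and the implementation is an illustration, not aviary's.

```python
def union(first, second):
    """Coordinates of both lists, without duplicates, in order of appearance."""
    return list(dict.fromkeys(first + second))


def difference(first, second):
    """Coordinates of the first list that are not in the second list."""
    excluded = set(second)
    return [c for c in first if c not in excluded]


def intersection(first, second):
    """Coordinates of the first list that are also in the second list."""
    shared = set(second)
    return [c for c in first if c in shared]


coordinates_1 = [
    (363084, 5715326),
    (363212, 5715326),
    (363084, 5715454),
    (363212, 5715454),
]
coordinates_2 = [
    (363212, 5715454),
    (363340, 5715454),
    (363212, 5715582),
    (363340, 5715582),
]

added = union(coordinates_1, coordinates_2)
subtracted = difference(coordinates_1, coordinates_2)
intersected = intersection(coordinates_1, coordinates_2)

print(len(added), len(subtracted), len(intersected))
```

The two process areas share exactly one tile, (363212, 5715454), which is why the union has seven coordinates instead of eight.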
Append coordinates to the process area
You can append coordinates to the process area with the append method.
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
process_area = process_area.append((363340, 5715582))
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]
[ 363340 5715582]]
We can visualize the process area given the tile size.
If you append coordinates that already exist, the process area will not change.
process_area = process_area.append((363340, 5715582))
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]
[ 363340 5715582]]
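In the examples above, the result of append is assigned back to process_area, which suggests that the method returns a process area rather than only modifying the existing one. A minimal sketch of such a duplicate-aware append on a plain list of tuples could look like this; `append_coordinates` is a hypothetical helper and not aviary's implementation.

```python
def append_coordinates(coordinates, new_coordinates):
    """Return a new list with new_coordinates added, unless already present.

    Hypothetical helper, not part of aviary. The input list is not modified.
    """
    if new_coordinates in coordinates:
        return list(coordinates)
    return [*coordinates, new_coordinates]


coordinates = [
    (363084, 5715326),
    (363212, 5715326),
    (363084, 5715454),
    (363212, 5715454),
]

appended_once = append_coordinates(coordinates, (363340, 5715582))
appended_twice = append_coordinates(appended_once, (363340, 5715582))

print(len(coordinates), len(appended_once), len(appended_twice))
```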
Chunk the process area
You can chunk the process area into multiple process areas with the chunk method.
This might be useful when you want to run multiple pipelines in distributed environments.
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
process_areas = process_area.chunk(num_chunks=2)
for process_area in process_areas:
    print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]]
[[ 363084 5715454]
[ 363212 5715454]]
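Chunking amounts to splitting the coordinates into consecutive groups of nearly equal size. The sketch below shows one way to do this in plain Python; `chunk_coordinates` is a hypothetical helper, and how aviary distributes a remainder when the number of tiles is not divisible by the number of chunks is an assumption here (the larger chunks come first).

```python
def chunk_coordinates(coordinates, num_chunks):
    """Split coordinates into num_chunks consecutive, nearly equal parts.

    Hypothetical helper, not part of aviary. If the length is not
    divisible by num_chunks, the first chunks get one extra element.
    """
    base_size, remainder = divmod(len(coordinates), num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        size = base_size + (1 if i < remainder else 0)
        chunks.append(coordinates[start:start + size])
        start += size
    return chunks


coordinates = [
    (363084, 5715326),
    (363212, 5715326),
    (363084, 5715454),
    (363212, 5715454),
]

chunks = chunk_coordinates(coordinates, num_chunks=2)

for c in chunks:
    print(c)
```

Each chunk can then be handed to a separate worker, and the union of all chunks is the original process area.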
Filter the process area
You can filter the process area with the filter method.
This method applies a CoordinatesFilter to the coordinates of the process area.
In this example, we will filter the process area based on geospatial data with the GeospatialFilter.
You can remove coordinates of tiles that are within the polygons in the geodataframe with the difference mode.
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)
gdf = gpd.GeoDataFrame(
geometry=[box(363212, 5715454, 363468, 5715710)],
crs='EPSG:25832',
)
print(process_area.coordinates)
print(gdf)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
geometry
0 POLYGON ((363468 5715454, 363468 5715710, 3632...
from aviary.geodata import GeospatialFilter
geospatial_filter = GeospatialFilter(
tile_size=128,
gdf=gdf,
mode=aviary.GeospatialFilterMode.DIFFERENCE,
)
process_area = process_area.filter(coordinates_filter=geospatial_filter)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]]
We can visualize the process area given the tile size.
The red polygon represents the geodataframe.
You can remove coordinates of tiles that don't intersect with the polygons in the geodataframe with the intersection mode.
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)
gdf = gpd.GeoDataFrame(
geometry=[box(363212, 5715454, 363468, 5715710)],
crs='EPSG:25832',
)
print(process_area.coordinates)
print(gdf)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
geometry
0 POLYGON ((363468 5715454, 363468 5715710, 3632...
geospatial_filter = GeospatialFilter(
tile_size=128,
gdf=gdf,
mode=aviary.GeospatialFilterMode.INTERSECTION,
)
process_area = process_area.filter(coordinates_filter=geospatial_filter)
print(process_area.coordinates)
[[ 363212 5715454]]
We can visualize the process area given the tile size.
The red polygon represents the geodataframe.
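The two modes can be illustrated without geopandas by replacing the polygon with an axis-aligned rectangle. The sketch below is a simplification under that assumption: `filter_tiles` and its helpers are hypothetical, and treating tiles that merely touch the rectangle as non-intersecting is inferred from the outputs shown above, not from aviary's source.

```python
def tile_box(coordinates, tile_size):
    """Return (x_min, y_min, x_max, y_max) of a tile."""
    x_min, y_min = coordinates
    return x_min, y_min, x_min + tile_size, y_min + tile_size


def is_within(inner, outer):
    """True if the inner box lies completely inside the outer box."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def overlaps(a, b):
    """True if the boxes share some area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def filter_tiles(coordinates, box, tile_size, mode):
    """Hypothetical stand-in for a geospatial filter on rectangles."""
    if mode == 'difference':
        # Remove tiles that are within the rectangle
        return [c for c in coordinates
                if not is_within(tile_box(c, tile_size), box)]
    if mode == 'intersection':
        # Keep only tiles that overlap the rectangle
        return [c for c in coordinates
                if overlaps(tile_box(c, tile_size), box)]
    raise ValueError(f'Unknown mode: {mode}')


coordinates = [
    (363084, 5715326),
    (363212, 5715326),
    (363084, 5715454),
    (363212, 5715454),
]
box = (363212, 5715454, 363468, 5715710)

kept_difference = filter_tiles(coordinates, box, 128, mode='difference')
kept_intersection = filter_tiles(coordinates, box, 128, mode='intersection')

print(kept_difference)
print(kept_intersection)
```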
Convert the process area to a geodataframe
You can convert the process area to a geodataframe with the to_gdf method.
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
gdf = process_area.to_gdf(
tile_size=128,
epsg_code=25832,
)
print(gdf)
geometry
0 POLYGON ((363212 5715326, 363212 5715454, 3630...
1 POLYGON ((363340 5715326, 363340 5715454, 3632...
2 POLYGON ((363212 5715454, 363212 5715582, 3630...
3 POLYGON ((363340 5715454, 363340 5715582, 3632...
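Each row of the geodataframe is the square polygon of one tile. The sketch below builds the WKT text of such a polygon in plain Python; `tile_to_wkt` is a hypothetical helper, and the vertex order (starting at the bottom right corner, counterclockwise) is an assumption chosen to match the truncated output above.

```python
def tile_to_wkt(coordinates, tile_size):
    """Return the WKT polygon of a tile given its bottom left corner.

    Hypothetical helper, not part of aviary. The ring starts at the
    bottom right corner and runs counterclockwise.
    """
    x_min, y_min = coordinates
    x_max, y_max = x_min + tile_size, y_min + tile_size
    ring = [
        (x_max, y_min),
        (x_max, y_max),
        (x_min, y_max),
        (x_min, y_min),
        (x_max, y_min),  # close the ring
    ]
    points = ', '.join(f'{x} {y}' for x, y in ring)
    return f'POLYGON (({points}))'


wkt = tile_to_wkt((363084, 5715326), tile_size=128)
print(wkt)
```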
Convert the process area to a JSON string
You can convert the process area to a JSON string with the to_json method.
coordinates = np.array(
[
[363084, 5715326],
[363212, 5715326],
[363084, 5715454],
[363212, 5715454],
],
dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)
print(process_area.coordinates)
[[ 363084 5715326]
[ 363212 5715326]
[ 363084 5715454]
[ 363212 5715454]]
json_string = process_area.to_json()
print(json_string)
[[363084, 5715326], [363212, 5715326], [363084, 5715454], [363212, 5715454]]
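The JSON string is a plain list of [x_min, y_min] pairs, so it can also be read and written with Python's standard json module, for example when another tool produces or consumes the coordinates. The following sketch shows a round trip; it does not use aviary, and the variable names are chosen for illustration only.

```python
import json

json_string = (
    '[[363084, 5715326], '
    '[363212, 5715326], '
    '[363084, 5715454], '
    '[363212, 5715454]]'
)

# Parse the JSON string into a list of (x_min, y_min) tuples
coordinates = [tuple(pair) for pair in json.loads(json_string)]

# Serialize the tuples back into a JSON string
round_trip = json.dumps([list(pair) for pair in coordinates])

print(coordinates[0])
print(round_trip == json_string)
```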