How to use the process area

Follow this step-by-step guide to learn how to use the ProcessArea.

Create a process area

A process area specifies the area of interest as a set of coordinates, one per tile, each marking the bottom-left corner of that tile.

By default, a new instance of the ProcessArea has no coordinates.
You can access the coordinates of the process area with the coordinates attribute, which is a numpy array of shape (n, 2) and data type int32.

import aviary

process_area = aviary.ProcessArea()

print(process_area.coordinates)

Output

[]
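
The output above is `[]` even though the coordinates are described as an array of shape (n, 2). A minimal sketch with plain numpy, assuming an empty process area stores zero rows of two columns (this is an assumption, not a statement about aviary's internals):

```python
import numpy as np

# Assumption: an empty set of tile coordinates is an int32 array with 0 rows and 2 columns.
empty_coordinates = np.empty((0, 2), dtype=np.int32)

print(empty_coordinates.shape)
print(empty_coordinates)
```

An array like this still has two columns, so it can be concatenated with other (n, 2) arrays without special handling.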

If you already have the coordinates, you can pass them to the initializer of the ProcessArea.

import numpy as np

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

We can visualize the process area given the tile size.

You can set the coordinates of an already created process area with the coordinates attribute.

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
    ],
    dtype=np.int32,
)
process_area.coordinates = coordinates

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]]

We can visualize the process area given the tile size.

A process area behaves like a sequence of coordinates, so it supports indexing, slicing, length and iteration.

You can access the coordinates of a single tile with the index operator.

coordinates_1 = process_area[0]
coordinates_2 = process_area[1]

print(coordinates_1)
print(coordinates_2)

Output

(363084, 5715326)
(363212, 5715326)

You can slice the process area with the usual slice notation to create a new process area that contains a subset of the coordinates.

sliced_process_area = process_area[:-1]

print(sliced_process_area.coordinates)

Output

[[ 363084 5715326]]

A process area has a length, which is equal to the number of coordinates, i.e. the number of tiles.

print(len(process_area))

Output

2

You can iterate over the coordinates of the process area.

for coordinates in process_area:
    print(coordinates)

Output

(363084, 5715326)
(363212, 5715326)
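
To see how indexing, slicing, length and iteration fit together, here is a minimal sketch of a container that behaves the same way. The class `TileCoordinates` is hypothetical and exists only for illustration; it is not how aviary implements the ProcessArea.

```python
import numpy as np


class TileCoordinates:
    """Hypothetical stand-in for a process area (not aviary's implementation)."""

    def __init__(self, coordinates: np.ndarray) -> None:
        self.coordinates = coordinates

    def __len__(self) -> int:
        # One row per tile
        return len(self.coordinates)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields a new container holding the selected rows
            return TileCoordinates(self.coordinates[index])
        x_min, y_min = self.coordinates[index]
        # A single index yields a plain (x_min, y_min) tuple
        return int(x_min), int(y_min)

    def __iter__(self):
        for index in range(len(self)):
            yield self[index]


tiles = TileCoordinates(
    np.array([[363084, 5715326], [363212, 5715326]], dtype=np.int32)
)
first = tiles[0]
sliced = tiles[:-1]
as_list = list(tiles)

print(first, len(tiles), len(sliced), as_list)
```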

Create a process area from a bounding box

You can create a process area from a bounding box with the from_bounding_box class method.

bounding_box = aviary.BoundingBox(
    x_min=363084,
    y_min=5715326,
    x_max=363340,
    y_max=5715582,
)
process_area = aviary.ProcessArea.from_bounding_box(
    bounding_box=bounding_box,
    tile_size=128,
    quantize=False,
)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

We can visualize the process area given the tile size.
The red polygon represents the bounding box.
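
The coordinates above follow a regular grid that starts at the bottom-left corner of the bounding box. A minimal numpy sketch that reproduces them, assuming tiles are laid out row by row from the bottom (the helper `tile_coordinates` is hypothetical, not part of aviary):

```python
import numpy as np


def tile_coordinates(x_min, y_min, x_max, y_max, tile_size):
    """Return the bottom-left corner of each tile, row by row from the bottom."""
    xs = np.arange(x_min, x_max, tile_size, dtype=np.int32)
    ys = np.arange(y_min, y_max, tile_size, dtype=np.int32)
    # With the default 'xy' indexing, x varies fastest when the grids are flattened
    grid_x, grid_y = np.meshgrid(xs, ys)
    return np.column_stack([grid_x.ravel(), grid_y.ravel()])


coordinates = tile_coordinates(363084, 5715326, 363340, 5715582, tile_size=128)

print(coordinates)
```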

You can set the tile size of the process area with the tile_size parameter.
If the width or height of the bounding box is not divisible by the tile size, the last row and column of tiles will extend beyond the bounding box.

bounding_box = aviary.BoundingBox(
    x_min=363084,
    y_min=5715326,
    x_max=363340,
    y_max=5715582,
)
process_area = aviary.ProcessArea.from_bounding_box(
    bounding_box=bounding_box,
    tile_size=96,
    quantize=False,
)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363180 5715326]
 [ 363276 5715326]
 [ 363084 5715422]
 [ 363180 5715422]
 [ 363276 5715422]
 [ 363084 5715518]
 [ 363180 5715518]
 [ 363276 5715518]]

We can visualize the process area given the tile size.
The red polygon represents the bounding box.
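
The arithmetic behind the nine tiles can be checked by hand. This short sketch, written for one axis and assuming the number of tiles is rounded up, shows how far the tiles extend beyond the bounding box:

```python
import math

x_min, x_max = 363084, 363340
tile_size = 96

extent = x_max - x_min  # width of the bounding box
tiles_per_axis = math.ceil(extent / tile_size)  # partial tiles count as full tiles
covered = tiles_per_axis * tile_size  # width covered by the tiles
overshoot = covered - extent  # how far the last tile extends beyond x_max
x_origins = [x_min + i * tile_size for i in range(tiles_per_axis)]

print(extent, tiles_per_axis, covered, overshoot)
print(x_origins)
```

The same calculation applies to the y axis, which gives 3 x 3 tiles in total.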

You can quantize the process area with the quantize parameter.
If the coordinates of the bounding box are not divisible by the tile size, they will be snapped outward to the nearest multiples of the tile size, so the tiles align with a global grid.
This might be useful when you want to ensure matching tiles for different process areas.

bounding_box = aviary.BoundingBox(
    x_min=363084,
    y_min=5715326,
    x_max=363340,
    y_max=5715582,
)
process_area = aviary.ProcessArea.from_bounding_box(
    bounding_box=bounding_box,
    tile_size=128,
    quantize=True,
)

print(process_area.coordinates)

Output

[[ 363008 5715200]
 [ 363136 5715200]
 [ 363264 5715200]
 [ 363008 5715328]
 [ 363136 5715328]
 [ 363264 5715328]
 [ 363008 5715456]
 [ 363136 5715456]
 [ 363264 5715456]]

We can visualize the process area given the tile size.
The red polygon represents the bounding box.
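
The quantized coordinates above are all multiples of 128. A minimal sketch of one way to obtain them, assuming the minimum is rounded down and the maximum is rounded up to the next multiple of the tile size (the helper `quantize_bounds` is hypothetical, and aviary may compute this differently):

```python
def quantize_bounds(x_min, y_min, x_max, y_max, tile_size):
    """Expand the bounds outward to the nearest multiples of tile_size."""
    q_x_min = x_min // tile_size * tile_size  # round down
    q_y_min = y_min // tile_size * tile_size
    q_x_max = -(-x_max // tile_size) * tile_size  # round up
    q_y_max = -(-y_max // tile_size) * tile_size
    return q_x_min, q_y_min, q_x_max, q_y_max


bounds = quantize_bounds(363084, 5715326, 363340, 5715582, tile_size=128)
x_origins = list(range(bounds[0], bounds[2], 128))
y_origins = list(range(bounds[1], bounds[3], 128))

print(bounds)
print(x_origins)
print(y_origins)
```

Because every process area quantized with the same tile size snaps to the same grid, tiles from different process areas line up exactly.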

Create a process area from a geodataframe

You can create a process area from a geodataframe with the from_gdf class method.

import geopandas as gpd
from shapely.geometry import box

gdf = gpd.GeoDataFrame(
    geometry=[box(363084, 5715326, 363340, 5715582)],
    crs='EPSG:25832',
)
process_area = aviary.ProcessArea.from_gdf(
    gdf=gdf,
    tile_size=128,
    quantize=False,
)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

We can visualize the process area given the tile size.
The red polygon represents the geodataframe.

The geodataframe may contain multiple polygons, e.g. the northern districts of Gelsenkirchen.

url = (
    'https://raw.githubusercontent.com/geospaitial-lab/aviary/main'
    '/docs/how_to_guides/api/data/districts.geojson'
)
gdf = gpd.read_file(url)
process_area = aviary.ProcessArea.from_gdf(
    gdf=gdf,
    tile_size=256,
    quantize=True,
)

print(process_area.coordinates)

Output

[[ 364288 5713664]
 [ 364544 5713664]
 [ 364800 5713664]
 ...
 [ 363008 5721856]
 [ 363264 5721856]
 [ 363520 5721856]]

We can visualize the process area given the tile size.
The red polygons represent the districts.

Create a process area from a json string

You can create a process area from a json string with the from_json class method.

json_string = (
    '[[363084, 5715326], '
    '[363212, 5715326], '
    '[363084, 5715454], '
    '[363212, 5715454]]'
)
process_area = aviary.ProcessArea.from_json(json_string=json_string)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

We can visualize the process area given the tile size.

Add, subtract or intersect process areas

You can add two process areas with the + operator.
The resulting process area contains the union of the two process areas, so coordinates that appear in both are included only once.

coordinates_1 = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area_1 = aviary.ProcessArea(coordinates=coordinates_1)

coordinates_2 = np.array(
    [
        [363212, 5715454],
        [363340, 5715454],
        [363212, 5715582],
        [363340, 5715582],
    ],
    dtype=np.int32,
)
process_area_2 = aviary.ProcessArea(coordinates=coordinates_2)

print(process_area_1.coordinates)
print(process_area_2.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]
[[ 363212 5715454]
 [ 363340 5715454]
 [ 363212 5715582]
 [ 363340 5715582]]

process_area = process_area_1 + process_area_2

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]
 [ 363340 5715454]
 [ 363212 5715582]
 [ 363340 5715582]]

We can visualize the process area given the tile size.
The red polygons represent the first process area and the blue polygons represent the second process area.
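
Conceptually, the addition is an order-preserving union of the two coordinate arrays. A minimal sketch with numpy and the standard library, assuming first occurrences are kept in their original order (the helper `union` is hypothetical, not aviary's implementation):

```python
import numpy as np


def union(coordinates_1, coordinates_2):
    """Combine two (n, 2) arrays and drop duplicate tiles, keeping the order."""
    stacked = np.concatenate([coordinates_1, coordinates_2]).tolist()
    # dict.fromkeys keeps the first occurrence of each tile and preserves order
    tiles = dict.fromkeys(map(tuple, stacked))
    return np.array(list(tiles), dtype=np.int32).reshape(-1, 2)


a = np.array(
    [[363084, 5715326], [363212, 5715326], [363084, 5715454], [363212, 5715454]],
    dtype=np.int32,
)
b = np.array(
    [[363212, 5715454], [363340, 5715454], [363212, 5715582], [363340, 5715582]],
    dtype=np.int32,
)
combined = union(a, b)

print(combined)
```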

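You can subtract two process areas with the - operator.
The resulting process area contains the coordinates of the first process area that are not part of the second process area.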

coordinates_1 = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area_1 = aviary.ProcessArea(coordinates=coordinates_1)

coordinates_2 = np.array(
    [
        [363212, 5715454],
        [363340, 5715454],
        [363212, 5715582],
        [363340, 5715582],
    ],
    dtype=np.int32,
)
process_area_2 = aviary.ProcessArea(coordinates=coordinates_2)

print(process_area_1.coordinates)
print(process_area_2.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]
[[ 363212 5715454]
 [ 363340 5715454]
 [ 363212 5715582]
 [ 363340 5715582]]

process_area = process_area_1 - process_area_2

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]]

We can visualize the process area given the tile size.
The red polygons represent the first process area and the blue polygons represent the second process area.

You can intersect two process areas with the & operator.
The resulting process area contains only the coordinates that are part of both process areas.

coordinates_1 = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area_1 = aviary.ProcessArea(coordinates=coordinates_1)

coordinates_2 = np.array(
    [
        [363212, 5715454],
        [363340, 5715454],
        [363212, 5715582],
        [363340, 5715582],
    ],
    dtype=np.int32,
)
process_area_2 = aviary.ProcessArea(coordinates=coordinates_2)

print(process_area_1.coordinates)
print(process_area_2.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]
[[ 363212 5715454]
 [ 363340 5715454]
 [ 363212 5715582]
 [ 363340 5715582]]

process_area = process_area_1 & process_area_2

print(process_area.coordinates)

Output

[[ 363212 5715454]]

We can visualize the process area given the tile size.
The red polygons represent the first process area and the blue polygons represent the second process area.
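
Subtraction and intersection can both be read as set operations on the tile coordinates. A minimal sketch, assuming the order of the first process area is preserved (the helpers `difference` and `intersection` are hypothetical, not aviary's implementation):

```python
import numpy as np


def difference(coordinates_1, coordinates_2):
    """Keep the tiles of the first array that are not in the second."""
    other = set(map(tuple, coordinates_2.tolist()))
    kept = [row for row in coordinates_1.tolist() if tuple(row) not in other]
    return np.array(kept, dtype=np.int32).reshape(-1, 2)


def intersection(coordinates_1, coordinates_2):
    """Keep the tiles of the first array that are also in the second."""
    other = set(map(tuple, coordinates_2.tolist()))
    kept = [row for row in coordinates_1.tolist() if tuple(row) in other]
    return np.array(kept, dtype=np.int32).reshape(-1, 2)


a = np.array(
    [[363084, 5715326], [363212, 5715326], [363084, 5715454], [363212, 5715454]],
    dtype=np.int32,
)
b = np.array(
    [[363212, 5715454], [363340, 5715454], [363212, 5715582], [363340, 5715582]],
    dtype=np.int32,
)
subtracted = difference(a, b)
shared = intersection(a, b)

print(subtracted)
print(shared)
```

The `reshape(-1, 2)` keeps the (n, 2) shape even when no tiles remain.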

Append coordinates to the process area

You can append coordinates to the process area with the append method.

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

process_area = process_area.append((363340, 5715582))

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]
 [ 363340 5715582]]

We can visualize the process area given the tile size.

If you append coordinates that already exist, the process area will not change.

process_area = process_area.append((363340, 5715582))

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]
 [ 363340 5715582]]

Chunk the process area

You can chunk the process area into multiple process areas with the chunk method.
This might be useful when you want to run multiple pipelines in distributed environments.

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

process_areas = process_area.chunk(num_chunks=2)

for chunk in process_areas:
    print(chunk.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]]
[[ 363084 5715454]
 [ 363212 5715454]]
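
If you want to reason about the chunk sizes, `np.array_split` gives a comparable split on a plain coordinate array. This is a sketch under the assumption that chunks are contiguous and as evenly sized as possible; it does not call aviary.

```python
import numpy as np

coordinates = np.array(
    [[363084, 5715326], [363212, 5715326], [363084, 5715454], [363212, 5715454]],
    dtype=np.int32,
)

# Split the rows into 2 contiguous chunks
chunks = np.array_split(coordinates, 2)

# array_split also accepts chunk counts that do not divide the number of rows
uneven_chunks = np.array_split(coordinates, 3)

for chunk in chunks:
    print(chunk)
print([len(chunk) for chunk in uneven_chunks])
```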

Filter the process area

You can filter the process area with the filter method.
This method applies a CoordinatesFilter to the coordinates of the process area.

In this example, we will filter the process area based on geospatial data with the GeospatialFilter.
You can remove coordinates of tiles that are within the polygons in the geodataframe with the difference mode.

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)

gdf = gpd.GeoDataFrame(
    geometry=[box(363212, 5715454, 363468, 5715710)],
    crs='EPSG:25832',
)

print(process_area.coordinates)
print(gdf)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]
                                            geometry
0  POLYGON ((363468 5715454, 363468 5715710, 3632...

from aviary.geodata import GeospatialFilter

geospatial_filter = GeospatialFilter(
    tile_size=128,
    gdf=gdf,
    mode=aviary.GeospatialFilterMode.DIFFERENCE,
)
process_area = process_area.filter(coordinates_filter=geospatial_filter)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]]

We can visualize the process area given the tile size.
The red polygon represents the geodataframe.

You can remove coordinates of tiles that don't intersect with the polygons in the geodataframe with the intersection mode.

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)

gdf = gpd.GeoDataFrame(
    geometry=[box(363212, 5715454, 363468, 5715710)],
    crs='EPSG:25832',
)

print(process_area.coordinates)
print(gdf)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]
                                            geometry
0  POLYGON ((363468 5715454, 363468 5715710, 3632...

geospatial_filter = GeospatialFilter(
    tile_size=128,
    gdf=gdf,
    mode=aviary.GeospatialFilterMode.INTERSECTION,
)
process_area = process_area.filter(coordinates_filter=geospatial_filter)

print(process_area.coordinates)

Output

[[ 363212 5715454]]

We can visualize the process area given the tile size.
The red polygon represents the geodataframe.
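
To make the two modes concrete, here is a simplified sketch for the special case of a single axis-aligned rectangle, using only numpy. It assumes that tiles which merely touch the rectangle along an edge or corner do not count as intersecting, which is consistent with the outputs shown in this guide. The helper `tile_relation_masks` is hypothetical; the GeospatialFilter itself works with arbitrary polygons.

```python
import numpy as np


def tile_relation_masks(coordinates, tile_size, rect):
    """Return (within, overlaps) masks for rect = (x_min, y_min, x_max, y_max)."""
    r_x_min, r_y_min, r_x_max, r_y_max = rect
    t_x_min = coordinates[:, 0]
    t_y_min = coordinates[:, 1]
    t_x_max = t_x_min + tile_size
    t_y_max = t_y_min + tile_size
    # A tile is within the rectangle if all four of its edges lie inside it
    within = (
        (t_x_min >= r_x_min) & (t_y_min >= r_y_min)
        & (t_x_max <= r_x_max) & (t_y_max <= r_y_max)
    )
    # A tile overlaps the rectangle if they share an area, not just an edge
    overlaps = (
        (t_x_min < r_x_max) & (t_x_max > r_x_min)
        & (t_y_min < r_y_max) & (t_y_max > r_y_min)
    )
    return within, overlaps


coordinates = np.array(
    [[363084, 5715326], [363212, 5715326], [363084, 5715454], [363212, 5715454]],
    dtype=np.int32,
)
within, overlaps = tile_relation_masks(
    coordinates, tile_size=128, rect=(363212, 5715454, 363468, 5715710)
)
difference_result = coordinates[~within]  # drop tiles within the rectangle
intersection_result = coordinates[overlaps]  # keep tiles overlapping the rectangle

print(difference_result)
print(intersection_result)
```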

Convert the process area to a geodataframe

You can convert the process area to a geodataframe with the to_gdf method.

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

gdf = process_area.to_gdf(
    tile_size=128,
    epsg_code=25832,
)

print(gdf)

Output

                                            geometry
0  POLYGON ((363212 5715326, 363212 5715454, 3630...
1  POLYGON ((363340 5715326, 363340 5715454, 3632...
2  POLYGON ((363212 5715454, 363212 5715582, 3630...
3  POLYGON ((363340 5715454, 363340 5715582, 3632...
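
Each polygon in the output is the square that spans one tile. A minimal sketch of how the corners follow from the bottom-left coordinates and the tile size, using the same vertex order as the output above (the helper `tile_corners` is hypothetical):

```python
def tile_corners(x_min, y_min, tile_size):
    """Return the four corners of a tile, starting at the bottom-right corner."""
    x_max = x_min + tile_size
    y_max = y_min + tile_size
    return [(x_max, y_min), (x_max, y_max), (x_min, y_max), (x_min, y_min)]


corners = tile_corners(363084, 5715326, tile_size=128)

print(corners)
```

A polygon's well-known text repeats the first vertex at the end to close the ring, so the printed geometry has five points per tile.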

Convert the process area to a json string

You can convert the process area to a json string with the to_json method.

coordinates = np.array(
    [
        [363084, 5715326],
        [363212, 5715326],
        [363084, 5715454],
        [363212, 5715454],
    ],
    dtype=np.int32,
)
process_area = aviary.ProcessArea(coordinates=coordinates)

print(process_area.coordinates)

Output

[[ 363084 5715326]
 [ 363212 5715326]
 [ 363084 5715454]
 [ 363212 5715454]]

json_string = process_area.to_json()

print(json_string)

Output

[[363084, 5715326], [363212, 5715326], [363084, 5715454], [363212, 5715454]]
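
The json string is a nested list of [x_min, y_min] pairs, so it can also be produced and parsed with the standard library. A minimal round-trip sketch with `json` and numpy that does not call aviary:

```python
import json

import numpy as np

coordinates = np.array(
    [[363084, 5715326], [363212, 5715326], [363084, 5715454], [363212, 5715454]],
    dtype=np.int32,
)

# Convert to nested Python lists first, because json cannot serialize numpy arrays
json_string = json.dumps(coordinates.tolist())

# Parse the string and restore the int32 array
restored = np.array(json.loads(json_string), dtype=np.int32)

print(json_string)
print(np.array_equal(coordinates, restored))
```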