These tutorials make use of the publicly available Petawawa Research Forest (PRF) dataset provided by the Canadian Forest Service (CFS). The PRF is the oldest research forest in Canada, dating back to 1918. As such, the area has a rich historical record of remote sensing and forest inventory data, and has recently become a remote sensing "supersite".
The tutorials are also hosted at the following link:
https://subornaa.github.io/Data-Analytics-Tutorials
You can view the full PRF dataset at the following link:
https://opendata.nfis.org/mapserver/PRF.html
These tutorials are designed to be run on Google Colab. The Jupyter notebooks can also be downloaded and run locally, but a local Python environment will need to be set up. We recommend using the uv Python package manager to do this. uv can install the packages listed in the pyproject.toml file located in this repository. If you wish to set up a local Python environment using an alternative package manager, such as conda, please consult the pyproject.toml file to get the correct package versions.
The tutorials use several datasets from the PRF, which are described in more detail in the following sections.
All data used in the tutorials is available on Google Drive at this link
File: trees.csv
Individual tree measurements were taken at permanent sample plots (PSPs) across the PRF in 2018. A data dictionary summarizing trees.csv is provided below. In this dataset, each row is a tree and each column is an attribute (e.g., height).
Column | Definition |
---|---|
PlotName | Unique plot identifier |
TreeID | Unique tree identifier |
species | Tree common species name |
Origin | Origin. N = natural (includes coppice), P = planted |
Status | Status. L = Live, D = Dead (only includes decayclass 1 & 2) |
DBH | Diameter at breast height (cm) |
CrownClass | Crown class (D = Dominant; C = Codominant; I = Intermediate; OS = Overtopped/suppressed; A = Anomaly; E = Emergent) |
DecayClass | Decay class (1 = recently dead, top is intact; 2 = greater than 50 % coarse; >3 = dead for several years) |
height | Tree top height in meters |
baha | Basal area/ha = Dbh * Dbh * 0.00007854 * stems |
codom | Whether a tree is codominant or not (either Y or N) |
mvol | Gross merchantable volume (m³/ha) |
tvol | Gross total volume (m³/ha) |
biomass | Aboveground biomass (kg/ha) |
size | Sawlog size category (Poles, Under, Small, Medium, Large) |
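The `baha` definition above can be checked by hand. The sketch below is a minimal illustration, not the code used to build trees.csv. It assumes that `stems` is a per-hectare expansion factor (the number of stems per hectare that one measured tree represents), which the data dictionary does not state explicitly. The function name and example values are hypothetical. The constant 0.00007854 is approximately π/40000, which converts a squared diameter in centimeters to a cross-sectional area in square meters.

```python
import math

# Constant from the data dictionary; approximately pi / 40000, which turns
# DBH^2 (cm^2) into a stem cross-sectional area in m^2.
BA_CONSTANT = 0.00007854


def basal_area_per_ha(dbh_cm, stems_per_ha):
    """Basal area (m^2/ha) following baha = Dbh * Dbh * 0.00007854 * stems.

    ASSUMPTION: `stems_per_ha` is the number of stems per hectare that a
    single measured tree represents; the data dictionary does not define it.
    """
    return dbh_cm * dbh_cm * BA_CONSTANT * stems_per_ha


# Hypothetical example values, not taken from trees.csv.
example_dbh_cm = 30.0
example_stems = 16.0
example_baha = basal_area_per_ha(example_dbh_cm, example_stems)
print(round(example_baha, 4))

# The dictionary constant and pi / 40000 agree to about nine decimal places.
print(abs(BA_CONSTANT - math.pi / 40000) < 1e-9)
```

Comparing the output of such a function against the `baha` column for a few rows is a quick way to confirm how `stems` was defined for this dataset.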
File: plots.gpkg
Field plots in the PRF containing the trees in trees.csv are georeferenced, and their locations are provided in the plots.gpkg file. This is a spatial point dataset stored in a GeoPackage (.gpkg) file.
Each field plot is circular, with a radius of 14.1 m (an area of approximately 625 m²). Note that this dataset is in a point format (i.e., it contains only the XY coordinates of plot centers).
Column | Definition |
---|---|
Plot | Unique identifier for plot, equivalent to "PlotName" column in trees.csv |
Date | String representing the date when the plot was visited. |
Northing | Y coordinate in the CRS (see the boundary.gpkg description) |
Easting | X coordinate in the CRS (see the boundary.gpkg description) |
Source | Device used to collect the coordinates of the plot center. Note that the spatial accuracy of the coordinates varies between devices. |
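Because plots.gpkg stores only plot centers, plot geometry has to be derived from the 14.1 m radius. The following is a minimal sketch of the plot area and a point-in-plot check. The function names and coordinates are hypothetical, and the coordinates are assumed to be in a projected CRS with units of meters.

```python
import math

PLOT_RADIUS_M = 14.1  # plot radius stated in the dataset description


def plot_area_m2(radius_m=PLOT_RADIUS_M):
    """Area of a circular plot in square meters."""
    return math.pi * radius_m ** 2


def in_plot(center_easting, center_northing, x, y, radius_m=PLOT_RADIUS_M):
    """True if point (x, y) lies within the circular plot.

    ASSUMPTION: all coordinates share one projected CRS with units of meters.
    """
    return math.hypot(x - center_easting, y - center_northing) <= radius_m


# Hypothetical plot center and query points (not real PRF coordinates).
center = (1000.0, 2000.0)
print(round(plot_area_m2(), 1))           # close to the nominal 625 m^2
print(in_plot(*center, 1010.0, 2005.0))   # about 11.2 m from the center
print(in_plot(*center, 1015.0, 2000.0))   # 15 m from the center
```

In a GeoPandas workflow, the equivalent plot polygons would typically be created by buffering the point geometries by the plot radius.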
File: boundary.gpkg
Polygon dataset including the boundary of the Petawawa Research Forest (PRF). The data is projected in the WGS 84 / UTM zone 18N coordinate reference system (CRS). All other spatial datasets are projected in this CRS unless otherwise specified.
File: water.gpkg
Polygons delineating water bodies in the PRF including lakes, wetlands, rivers, and creeks.
File: als_metrics.tif
LiDAR (airborne laser scanning, ALS) derived metrics (i.e., statistical summaries) in raster format spanning the PRF. This raster can be used as a proxy for the forest canopy height.
The raster contains 67 bands (each band is one metric), which are described below. Some descriptions cover a range of bands due to redundancy.
Band(s) | Metric Name(s) | Description |
---|---|---|
B1 | avg_95 | Average height trimmed at 95% of max height |
B2 | avg | Average height |
B3 - B11 | b10, b20, b30, ... , b90 | Decile % of points between 0 and 99% height |
B12 - B24 | dns_2m, dns_4m, dns_5m, ... , dns_25m | Density percentage of all returns Xm - 49m divided by all returns |
B25 | kur_95 | Kurtosis height trimmed at 95% of max height |
B26 - B38 | p01, p05, p10, ... , p99 | Height percentiles |
B39 | qav | Average square height |
B40 | skew_95 | Skewness height trimmed at 95% of max height |
B41 - B64 | d0_2, d2_4, d4_6, ..., d46_48 | Number of returns from X-Y meters divided by all returns |
B65 | std_95 | Standard deviation of height trimmed at 95% of max height |
B66 | vci_1mbin | Vertical Complexity Index (VCI) with a 1 m bin |
B67 | vci_0.5bin | Vertical Complexity Index (VCI) with a 0.5 m bin |
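When working with a 67-band raster, it helps to have a quick way to check which metric group a band number belongs to. The sketch below encodes the band ranges from the table above as a lookup. The names `BAND_GROUPS` and `metric_group` are hypothetical helpers, and the group labels are shorthand for the table rows rather than actual band names stored in the file.

```python
# (first_band, last_band, metric group) copied from the band table above.
# Band numbers are 1-based, matching the "B1 ... B67" labels.
BAND_GROUPS = [
    (1, 1, "avg_95"),
    (2, 2, "avg"),
    (3, 11, "b10 ... b90 (decile % of points)"),
    (12, 24, "dns_* (density percentage)"),
    (25, 25, "kur_95"),
    (26, 38, "p01 ... p99 (height percentiles)"),
    (39, 39, "qav"),
    (40, 40, "skew_95"),
    (41, 64, "d*_* (returns per height bin)"),
    (65, 65, "std_95"),
    (66, 66, "vci_1mbin"),
    (67, 67, "vci_0.5bin"),
]


def metric_group(band):
    """Return the metric group for a 1-based band number (1 to 67)."""
    for first, last, name in BAND_GROUPS:
        if first <= band <= last:
            return name
    raise ValueError(f"band must be between 1 and 67, got {band}")


print(metric_group(2))
print(metric_group(30))
print(metric_group(67))
```

Libraries such as rasterio also number bands starting from 1 when reading, whereas a NumPy array holding all bands is 0-indexed along the band axis, so an offset of one is needed when indexing the array directly.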
Files:
- petawawa_s2_2018.tif
- petawawa_s2_2024.tif
Sentinel-2 (S2) is a European Space Agency multispectral satellite constellation including 3 sensors. S2 imagery contains 12 bands spanning the visible, near infrared, and shortwave infrared portions of the electromagnetic spectrum, with a spatial resolution ranging from 10 m to 60 m depending on the band. The table below summarizes all the S2 bands. For the purpose of this analysis, all S2 imagery was resampled to a 10 m resolution. Note that resampling does not change the fact that some bands have an inherently coarser native resolution.
Band | Wavelength (S2A / S2B) | Description |
---|---|---|
B1 | 443.9nm / 442.3nm | Aerosols |
B2 | 496.6nm / 492.1nm | Blue |
B3 | 560nm / 559nm | Green |
B4 | 664.5nm / 665nm | Red |
B5 | 703.9nm / 703.8nm | Red Edge 1 |
B6 | 740.2nm / 739.1nm | Red Edge 2 |
B7 | 782.5nm / 779.7nm | Red Edge 3 |
B8 | 835.1nm / 833nm | NIR |
B8A | 864.8nm / 864nm | Red Edge 4 |
B9 | 945nm / 943.2nm | Water vapor |
B11 | 1613.7nm / 1610.4nm | SWIR 1 |
B12 | 2202.4nm / 2185.7nm | SWIR 2 |
We include two time steps of S2 imagery of the PRF, from 2018 and 2024, to support temporal analysis.
S2 imagery was processed in Google Earth Engine (GEE) using the following script:
https://code.earthengine.google.com/e0a63220c15068398d6d432be5e3ccb8
The dataset is described in more detail at the link below:
https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED
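With two time steps available, a common first step in temporal analysis is to compare a vegetation index between years. The sketch below computes the normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), from B8 and B4 and takes the 2024 minus 2018 difference. It is an illustration only: the function names and reflectance values are hypothetical, and the band-position lookup assumes the GeoTIFFs store bands in the same order as the table above, which should be verified against the files.

```python
# ASSUMPTION: bands are stored in the order listed in the S2 band table.
S2_BAND_ORDER = ["B1", "B2", "B3", "B4", "B5", "B6", "B7",
                 "B8", "B8A", "B9", "B11", "B12"]


def band_position(name):
    """1-based position of a named S2 band, under the ordering assumption."""
    return S2_BAND_ORDER.index(name) + 1


def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    Returns None when NIR + Red is zero, where NDVI is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical surface reflectance values for a single pixel.
ndvi_2018 = ndvi(nir=0.40, red=0.05)
ndvi_2024 = ndvi(nir=0.30, red=0.10)

# B9 is not the 9th band, because B8A precedes it in the table.
print(band_position("B9"))
# A negative difference indicates lower NDVI in 2024 than in 2018.
print(round(ndvi_2024 - ndvi_2018, 3))
```

The same arithmetic applies element-wise to whole raster bands once they are loaded as arrays. Pixels where NIR + Red equals zero need explicit handling in that case as well.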
File: forest_point_cloud.las
Files:
- forest_point_cloud.las
- forest_point_cloud_footprint.gpkg
LiDAR (airborne laser scanning) point cloud of a forested subset area in the PRF. Data is provided in the LAS file format, and includes XYZ coordinates of LiDAR returns.
The spatial coverage (i.e., footprint) of the LAS file is provided in the associated forest_point_cloud_footprint.gpkg file.
Artificial intelligence (AI) tools, specifically large language models (LLMs), are no doubt useful for programming and data analysis with Python. However, we must be cautious when adopting these tools, as we would with any new tool. To maximize the learning outcomes of this tutorial series, we encourage users to always think critically about the code they are using, especially if it has been generated by AI.
Using AI for basic tasks (e.g., removing columns from a data frame) is an effective way to save time. However, we recommend that you try to perform higher-level planning and program design yourself without relying too heavily on AI.
It is also worth noting that the information available to AI is in many cases out of date. Since Python packages are updated regularly (sometimes weekly), code provided to you by AI may well be out of date. We strongly recommend always consulting the Python documentation as the primary source of information when trying to solve a problem.
You can learn more about the PRF in the following publications:
This tutorial series is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). You may share and adapt the content, and may distribute your contributions under the same license, but you must give appropriate credit and cannot use the material for commercial purposes.