2. Using deforest on the command line
2.1. Downloading and Preprocessing Data
Instructions for downloading and pre-processing data from Sentinel-2 can be found at sen2mosaic.
Note
SMFM deforest is designed for analysis of dense time series of Sentinel-2 data, which requires access to substantial file storage and processing power. We recommend use of a cloud platform, where data do not have to be downloaded or pre-processed on a local machine. See, for example, the ‘Data and Information Access Services’ (DIAS) platforms that provide centralised access to Copernicus data.
2.2. Calibrating SMFM deforest
SMFM deforest will work best where local data are used for calibration. This is performed in two steps: (i) Extracting training data, and (ii) Training the classifier.
2.2.1. Extracting training data
Training pixels are defined by either a geotiff image or a shapefile marking locations of stable forest and stable non-forest. Classification features are derived for a series of Sentinel-2 images, and a random selection of pixel values is extracted for the forest and non-forest classes. The output is an array of feature values used to train a classifier to predict probabilities of forest and non-forest.
usage: extract.py [-h] [-te XMIN YMIN XMAX YMAX] [-e EPSG] [-res N]
[-t SHP/TIF] [-f [VALS [VALS ...]]] [-nf [VALS [VALS ...]]]
[-fn NAME] [-l 1C/2A] [-mi N] [-mp N] [-o DIR] [-n NAME]
[-p N] [-v]
[FILES [FILES ...]]
Extract indices from Sentinel-2 data to train a classifier of forest cover.
Returns a numpy .npz file containing pixel values for forest/nonforest.
positional arguments:
FILES Sentinel 2 input files (level 2A) in .SAFE format.
Specify one or more valid Sentinel-2 .SAFE, a
directory containing .SAFE files, multiple tiles
through wildcards (e.g. *.SAFE/GRANULE/*), or a text
file listing files. Defaults to processing all tiles
in current working directory.
required arguments:
-te XMIN YMIN XMAX YMAX, --target_extent XMIN YMIN XMAX YMAX
Extent of output image tile, in format <xmin, ymin,
xmax, ymax>.
-e EPSG, --epsg EPSG EPSG code for output image tile CRS. This must be UTM.
Find the EPSG code of your output CRS at
https://www.epsg-registry.org/.
-res N, --resolution N
Specify a resolution to output.
-t SHP/TIF, --training_data SHP/TIF
Path to training data geotiff/shapefile.
-f [VALS [VALS ...]], --forest_values [VALS [VALS ...]]
Values indicating forest in the training GeoTiff or
shapefile
-nf [VALS [VALS ...]], --nonforest_values [VALS [VALS ...]]
Values indicating nonforest in the training GeoTiff or
shapefile
optional arguments:
-fn NAME, --field_name NAME
Shapefile attribute name to search for training data
polygons. Defaults to all polygons. Required where
inputting a shapefile as training_data.
-l 1C/2A, --level 1C/2A
Input image processing level, '1C' or '2A'. Defaults
to '2A'.
-mi N, --max_images N
Maximum number of input tiles to extract data from.
Defaults to all valid tiles.
-mp N, --max_pixels N
Maximum number of pixels to extract from each image
per class. Defaults to 5000.
-o DIR, --output_dir DIR
Output directory. Defaults to current working
directory.
-n NAME, --output_name NAME
Specify a string to precede output filename. Defaults
to 'S2'.
-p N, --n_processes N
Maximum number of tiles to process in parallel. Bear
in mind that more processes will require more memory.
Defaults to 1.
-v, --verbose Make script verbose.
For example, to extract training data from a directory containing Sentinel-2 data from tile 36KWD (~/S2_data/), specifying an appropriate bounding box and resolution (-r, -e, -te), with training data contained in a geotiff (~/training_data.tif) coded as stable forest (-f 1) and stable non-forest (-nf 2), limiting extraction to 100 images (--max_images 100) and using 10 processes (-p 10):
deforest extract ~/S2_data/ -r 20 -e 32736 -te 399980 7790200 609780 7900000 -t ~/training_data.tif --max_images 100 -f 1 -nf 2 -v -p 10
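The sampling step described above (a random selection of pixels per class, capped by --max_pixels) can be pictured with a short Python sketch. This is an illustration only, not the code in extract.py: the function name sample_training_pixels, the flat list of label values and the fixed random seed are all assumptions made for the example.

```python
import random


def sample_training_pixels(labels, forest_values, nonforest_values,
                           max_pixels=5000, seed=0):
    """Pick up to max_pixels random pixel indices for each class.

    labels is a flat sequence of training-map values, one per pixel
    (hypothetical layout, chosen to keep the example small).
    """
    rng = random.Random(seed)
    forest_values = set(forest_values)
    nonforest_values = set(nonforest_values)

    # Collect the candidate pixel indices for each class.
    candidates = {"forest": [], "nonforest": []}
    for index, value in enumerate(labels):
        if value in forest_values:
            candidates["forest"].append(index)
        elif value in nonforest_values:
            candidates["nonforest"].append(index)

    # Draw without replacement, never more than the class contains.
    return {
        name: sorted(rng.sample(indices, min(max_pixels, len(indices))))
        for name, indices in candidates.items()
    }


# Toy training map: 8 forest pixels (1), 3 non-forest (2), 4 unlabelled (0).
labels = [1] * 8 + [2] * 3 + [0] * 4
picked = sample_training_pixels(labels, forest_values=[1],
                                nonforest_values=[2], max_pixels=5)
print(len(picked["forest"]), len(picked["nonforest"]))
```

In this toy case the forest class is capped at 5 pixels, while all 3 non-forest pixels are kept because the class is smaller than the cap.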
2.2.2. Training the model
SMFM deforest uses a Random Forest model to predict the probability of forest in each input Sentinel-2 image. This model can be calibrated using training data from the region of interest.
The training function takes a series of labelled forest and non-forest pixels (see ‘Extracting training data’) as input and returns a calibrated model (a .pkl file). The process also returns a series of plots that can be used to assess model performance.
usage: train.py [-h] [-m N] [-n NAME] [-o PATH] DATA
Ingest Sentinel-2 data to train a random forest model to predict the
probability of a pixel being forested. Returns a calibrated model and QA
graphics.
positional arguments:
DATA Path to .npz file containing training data, generated
by extract.py
optional arguments:
-m N, --max_samples N
Maximum number of samples to train the classifier
with. Smaller sample sizes will run faster and produce
a simpler model, possibly at the cost of predictive
power. Defaults to 100,000 points.
-n NAME, --output_name NAME
Specify a string to precede output filename. Defaults
to name of input training data.
-o PATH, --output_dir PATH
Directory to save the classifier. Defaults to the
current working directory.
For example, using the output of deforest extract:
deforest train S2_training_data.npz
2.3. Classification and change detection
SMFM deforest uses a two-step process to produce change maps: (i) classification of individual Sentinel-2 images, and (ii) change detection.
2.3.1. Image classification
Sentinel-2 images are classified into a continuous probability of forest in each non-masked pixel. Inputs can be either Sentinel-2 L1C data or L2A data (preferable). The output is a set of geotiffs with pixel values from 0 to 100%, with a set extent, resolution and coordinate reference system (UTM).
usage: classify.py [-h] [-te XMIN YMIN XMAX YMAX] [-e EPSG] [-r N] [-m PKL]
[-l 1C/2A] [-p N] [-n NAME] [-o DIR]
[FILES [FILES ...]]
Process Sentinel-2 to match a predefined CRS and classify each to show a
probability of forest (0-100%) in each pixel.
required arguments:
-te XMIN YMIN XMAX YMAX, --target_extent XMIN YMIN XMAX YMAX
Extent of output image tile, in format <xmin, ymin,
xmax, ymax>.
-e EPSG, --epsg EPSG EPSG code for output image tile CRS. This must be UTM.
Find the EPSG code of your output CRS at
https://www.epsg-registry.org/.
-r N, --resolution N Specify a resolution to output.
optional arguments:
FILES Sentinel 2 input files in .SAFE format. Specify one or
more valid Sentinel-2 .SAFE files, a directory
containing .SAFE files, or multiple granules through
wildcards (e.g. *.SAFE/GRANULE/*). Defaults to
processing all granules in current working directory.
-m PKL, --model PKL Path to .pkl model, produced with train.py. Defaults
to a test model, trained on data from Chimanimani in
Mozambique.
-l 1C/2A, --level 1C/2A
Processing level to use, either '1C' or '2A'. Defaults
to level 2A.
-p N, --n_processes N
Maximum number of tiles to process in parallel. Bear
in mind that more processes will require more memory.
Defaults to 1.
-n NAME, --output_name NAME
Specify a string to precede output filename. Defaults
to 'S2'.
-o DIR, --output_dir DIR
Optionally specify an output directory
For example, to classify probability of forest in all images in a directory containing Sentinel-2 data from tile 36KWD (~/S2_data/), specifying an appropriate bounding box and resolution (-r, -e, -te), and a calibrated model named S2_model.pkl:
deforest classify ~/S2_data/ -r 20 -e 32736 -te 399980 7790200 609780 7900000 -m S2_model.pkl
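When classifying many tiles it can help to assemble the command from Python and to check the size of the output grid before starting a long run. The sketch below uses only the flags shown above. The helper names classify_command and output_shape are hypothetical, and the grid calculation assumes that pixels tile the target extent exactly.

```python
import os


def output_shape(extent, resolution):
    """Return (rows, cols) of a grid covering extent at the given resolution."""
    xmin, ymin, xmax, ymax = extent
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("extent must be given as xmin, ymin, xmax, ymax")
    rows = round((ymax - ymin) / resolution)
    cols = round((xmax - xmin) / resolution)
    return rows, cols


def classify_command(input_dir, extent, epsg, resolution, model=None):
    """Build the argument list for 'deforest classify' (flags as documented)."""
    command = [
        "deforest", "classify", os.path.expanduser(input_dir),
        "-r", str(resolution),
        "-e", str(epsg),
        "-te", *[str(value) for value in extent],
    ]
    if model is not None:
        # Without -m the tool falls back to its bundled test model.
        command += ["-m", model]
    return command


extent = (399980, 7790200, 609780, 7900000)
command = classify_command("~/S2_data/", extent, epsg=32736,
                           resolution=20, model="S2_model.pkl")
shape = output_shape(extent, 20)
print(command)
print(shape)
```

For the example extent at 20 m resolution this gives a grid of 5490 rows by 10490 columns. The returned list is in the form accepted by subprocess.run, which avoids shell quoting issues.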
2.3.2. Change detection
The final step is to combine the time series of forest probability images under a Bayesian framework to detect changes in forest cover. The output is two geotiffs, one providing the year of change, the other an early warning of pixels flagged as possible changes at the final time step.
usage: change.py [-h] [-t N] [-b N] [-o DIR] [-n NAME] FILES [FILES ...]
Process probability maps to generate a map of deforestation year and warning
estimates of upcoming events.
required arguments:
FILES A list of files output by classify.py, specifying
multiple files using wildcards.
optional arguments:
-t N, --threshold N Set a threshold probability to identify deforestation
(between 0 and 1). High thresholds are more strict in
the identification of deforestation. Defaults to 0.99.
-b N, --block_weight N
Set a block weighting threshold to limit the range of
forest/nonforest probabilities. Set to 0 for no block-
weighting. Parameter cannot be set higher than 0.5.
-o DIR, --output_dir DIR
Optionally specify an output directory. If nothing
specified, downloads will output to the present
working directory, given a standard filename.
-n NAME, --output_name NAME
Optionally specify a string to precede output
filename. Defaults to the same as input files.
For example, using default change detection parameters and a set of classified images from classify.py:
deforest change ./*.tif
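To give a feel for how a time series of forest probabilities can be combined under a Bayesian framework, the sketch below applies a generic iterative Bayesian update to a single pixel. It is not taken from change.py and may differ from what the tool does: the function names, the starting prior of 0.5, the block weight of 0.1 and the reading of block weighting as a clamp on probabilities to the range [b, 1 - b] are all assumptions made for illustration.

```python
def clamp(probability, block_weight):
    """Limit a probability to [block_weight, 1 - block_weight] (assumed meaning)."""
    return min(max(probability, block_weight), 1.0 - block_weight)


def bayes_update(prior, p_nonforest):
    """Update the probability of deforestation with one new observation."""
    numerator = prior * p_nonforest
    denominator = numerator + (1.0 - prior) * (1.0 - p_nonforest)
    return numerator / denominator if denominator > 0 else prior


def detect_change(forest_probs, threshold=0.99, block_weight=0.1, prior=0.5):
    """Return the index of the first image at which change is flagged, or None.

    forest_probs holds one pixel's probability of forest (0 to 1) per image,
    in date order.
    """
    if not 0.0 <= block_weight <= 0.5:
        raise ValueError("block_weight must be between 0 and 0.5")
    posterior = prior
    for index, p_forest in enumerate(forest_probs):
        p_nonforest = clamp(1.0 - p_forest, block_weight)
        posterior = bayes_update(posterior, p_nonforest)
        if posterior >= threshold:
            return index
    return None


# Two forest observations followed by a run of non-forest observations.
first_flag = detect_change([1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(first_flag)
```

In this toy series the pixel is flagged at index 6: two non-forest observations are needed to cancel the earlier forest evidence, and three more to push the posterior past 0.99. Clamping also shows why some form of block weighting is useful, since an unclamped probability of exactly 0 or 1 would let a single image decide the result.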