- Name: Satyam Sinha
- Organisation: NumFOCUS
- Sub-Organisation: Data Retriever
- Project: neonwranglerpy - Tree health and mortality from NEON data
- Mentors: Henry Senyondo, Sergio Marconi, Ethan White
- Pull Requests: Link to all my contributions
  - Retrieve and Predict Deepforest Boxes on NEON API
  - Deepforest prediction on NEON RGB Airborne Observation Platform (AOP) Data
  - Performing a geospatial join between Vegetation Structure (VST) data and RGB predictions
  - Extracting RGB training data
  - Extracting LiDAR training data
  - Extracting Hyperspectral Imaging (HSI) data
The project aims to build a new function that populates NEON field and remote sensing data for DeepForest, which will help us build a baseline classification of tree health status. The NEON Data API provides access to open ecological data, and the neonwranglerpy package helps retrieve, clean, and format that data so it is ready for ecological analysis by researchers. With this project, we will be able to train a multi-class classifier in DeepForest and improve the automated alignment of DeepForest boxes to NEON individual tree coordinates, provided as stem locations.
At the very beginning of the project, I accomplished the following tasks:
- Function to retrieve data from NEON API
- Find the sites with siteCode = “OSBS”
- Pick one plot ID
- Filter Data
- Use deepforest boxes for trees
- Extract from RGB data
- Retrieve field data using the retrieve_vst_data function
- Collect RGB tiles
- Download RGB data for all tiles and generate boxes
With the help of the above steps, I was able to create an independent function that retrieves data from the NEON API and predicts boxes using DeepForest's predict_image model.
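A minimal sketch of this step is shown below. The retrieve_vst_data import path and arguments are my assumptions about the neonwranglerpy API, and the image path is hypothetical; the DeepForest calls follow the library's standard predict_image workflow.

```python
from deepforest import main
from neonwranglerpy.lib.retrieve_vst_data import retrieve_vst_data  # assumed import path

# Retrieve NEON Vegetation Structure (VST) data for the OSBS site
# (arguments are illustrative, not the exact signature).
vst = retrieve_vst_data(site="OSBS", savepath="data/", save_files=True)

# Load the pre-trained DeepForest release model and predict tree boxes
# on a single RGB image (hypothetical path).
model = main.deepforest()
model.use_release()
boxes = model.predict_image(path="data/OSBS_plot.tif")
print(boxes[["xmin", "ymin", "xmax", "ymax", "score"]].head())
```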
For DeepForest prediction on NEON, we used retrieve_aop_data, which requires the field data from retrieve_vst_data, to create the predict_aop_data function. The pipeline for achieving this function is listed below:
- Retrieve NEON Vegetation Structure (VST) data
- Filter data based on the site
- Retrieve AOP Data
- Find image path
- Open the raster file
- Get the bounding box coordinates and raster extent
- Map and iterate the VST data over the rows
- Run prediction using the predict_image function
This function will further help us perform a geospatial join between the Vegetation Structure (VST) data and the RGB predictions.
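A rough sketch of that pipeline follows, assuming retrieve_aop_data has already downloaded RGB tiles to data/aop/; the glob pattern and file layout are illustrative, not the exact neonwranglerpy conventions.

```python
import glob
import os

import rasterio
from deepforest import main

model = main.deepforest()
model.use_release()

predictions = []
for tile_path in glob.glob("data/aop/*_image.tif"):  # hypothetical tile pattern
    # Read the bounding box and resolution of the tile; these are later used
    # to map pixel-space boxes back to UTM coordinates.
    with rasterio.open(tile_path) as src:
        bounds = src.bounds
        resolution = src.res
    # Predict tree boxes on the tile (results are in pixel coordinates).
    boxes = model.predict_image(path=tile_path)
    if boxes is not None:
        boxes["image_path"] = os.path.basename(tile_path)
        predictions.append(boxes)
```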
After prediction on the AOP data, each tree appears in the prediction dataframe, and we convert the boxes to a shapefile using the boxes_to_shapefile function:
- Convert the predictions into a GeoDataFrame using the boxes_to_shapefile function
- Concatenate all the GeoDataFrames into one dataframe
- Perform a spatial join between the predicted GeoDataFrame and the field data
- Clean the merged data based on duplicated coordinates
Using the above function, we will extract the RGB training data for deep learning classification.
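A minimal sketch of the join, continuing from the predictions list and the vst dataframe above. boxes_to_shapefile comes from deepforest.utilities; the VST easting/northing column names are assumptions and may differ in the actual data.

```python
import geopandas as gpd
import pandas as pd
from deepforest.utilities import boxes_to_shapefile

# Convert each tile's pixel-space boxes into georeferenced polygons.
geo_frames = [boxes_to_shapefile(df, root_dir="data/aop", projected=True)
              for df in predictions]
boxes_gdf = gpd.GeoDataFrame(pd.concat(geo_frames, ignore_index=True))

# Turn the VST stem locations into point geometries (column names assumed).
vst_gdf = gpd.GeoDataFrame(
    vst,
    geometry=gpd.points_from_xy(vst["itcEasting"], vst["itcNorthing"]),
    crs=boxes_gdf.crs,
)

# Keep stems that fall inside a predicted box, then drop duplicated
# coordinates so each stem matches at most one box.
merged = gpd.sjoin(vst_gdf, boxes_gdf, how="inner", predicate="within")
merged = merged.drop_duplicates(subset=["itcEasting", "itcNorthing"])
```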
For RGB deep learning classification, we updated the function so that it extracts all the tree images based on plant status, following the steps below:
- Map the canopy position
- Mask duplicated entries based on coordinates
- Clean duplicated predictions based on stem diameter, height, and canopy position
- Iterate over all the rows and create boxes based on the geometry
- Save the images
With the help of the above steps, we were able to save all the tree RGB images for further classification.
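Continuing from the merged GeoDataFrame above, the sketch below shows one way to save a fixed-size RGB crop per labelled tree; the buffer size, tile path, plant-status check, and column names are illustrative rather than the exact extract_training_data implementation.

```python
import os

import pandas as pd
import rasterio
from rasterio.windows import from_bounds

os.makedirs("train/rgb", exist_ok=True)
buffer = 2.0  # metres around the stem location (assumed crop size)

with rasterio.open("data/aop/OSBS_tile_image.tif") as src:  # hypothetical tile
    for _, row in merged.iterrows():
        if pd.isna(row.get("plantStatus")):  # keep only trees with a health label
            continue
        x, y = row["itcEasting"], row["itcNorthing"]
        window = from_bounds(x - buffer, y - buffer, x + buffer, y + buffer,
                             transform=src.transform)
        patch = src.read(window=window)  # (bands, rows, cols) array
        profile = src.profile.copy()
        profile.update(height=patch.shape[1], width=patch.shape[2],
                       transform=src.window_transform(window))
        out_path = f"train/rgb/{row['individualID']}.tif"  # assumed ID column
        with rasterio.open(out_path, "w", **profile) as dst:
            dst.write(patch)
```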
For LiDAR deep learning classification, we created a new function that extracts all the tree data as NumPy arrays, following the steps below:
- Retrieve LiDAR data using retrieve_aop_data with dpID="DP1.30003.001"
- Check the pattern of the file names using the unique tiles
- Read the files using laspy
- Filter the data based on coordinates
- Save the NumPy array in .npy format
With the help of the above steps, we were able to get all the tree .npy files for further classification.
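A short sketch of the LiDAR step, again continuing from the merged dataframe; the .laz file name, buffer size, and column names are placeholders, and laspy 2.x is assumed.

```python
import os

import laspy
import numpy as np

os.makedirs("train/lidar", exist_ok=True)
buffer = 2.0  # metres around each stem (assumed)

# Point cloud tile downloaded via retrieve_aop_data with dpID="DP1.30003.001"
# (hypothetical file name).
las = laspy.read("data/aop/OSBS_classified_point_cloud.laz")
points = np.vstack((las.x, las.y, las.z)).T  # (N, 3) array of x, y, z

for _, row in merged.iterrows():
    x, y = row["itcEasting"], row["itcNorthing"]
    # Keep only the points that fall inside the buffer around this stem.
    mask = ((points[:, 0] > x - buffer) & (points[:, 0] < x + buffer) &
            (points[:, 1] > y - buffer) & (points[:, 1] < y + buffer))
    np.save(f"train/lidar/{row['individualID']}.npy", points[mask])
```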
The project's main goal is to create a machine learning model that can assess tree health and mortality based on the deep learning data created above. After achieving all the above goals, we can further build a machine learning model that can run on any AOP data provided.
During the GSoC period, my mentors Henry Senyondo and Sergio Marconi motivated me to write blogs and tutorials explaining my work on this project. Below is a list of all my blogs:
| Description | Blog Link |
| --- | --- |
| Creating Python Package From Scratch — Community Bonding Period [GSoC’23 NumFOCUS] | Link |
| Retrieve and Predict Deepforest Boxes on NEON API — 1st Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Predict boxes on GeoDataFrame — 2nd Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Predict Airborne Observation Platform (AOP) Data — 3rd Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Extract Training data — 4th Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Cleaning the Merged Data and Testing with Pytest — 5th Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Extracting LiDAR data and finalising extract_training_data function — 6th Biweekly Blog GSoC’23 [NumFOCUS] | Link |
For me, the last three months have been an incredible learning experience, and I am grateful for everything I've learned. My programming skills improved significantly through my participation in GSoC. At first, I had only basic knowledge of Python. However, during the community bonding stage, I had the chance to develop Python packages.
This project not only helped me better understand Python but also introduced me to popular packages like NumPy, pandas, GeoPandas, Rasterio, and Shapely, which are widely used in scientific and geospatial programming. Furthermore, I learned how to write clear and informative docstrings to explain the purpose of functions and methods. I also gained insights into creating effective test cases to ensure the reliability of my code. Overall, GSoC gave me practical, hands-on experience that significantly enhanced my programming skills. I'm excited to continue using these skills in future projects.