
Google Summer of Code 2023 Final Work


Proposed Objectives

  • Retrieve and Predict Deepforest Boxes on NEON API
  • Deepforest prediction on NEON RGB Airborne Observation Platform (AOP) Data
  • Performing a geospatial join between Vegetation Structure (VST) Data and RGB prediction
  • Extracting RGB training data
  • Extracting LiDAR training data
  • Extracting Hyperspectral imaging (HSI) data

Objectives Summary

The project aims to build a new function that populates NEON field and remote sensing data for DeepForest, which will help us build a baseline classification for tree health status. The NEON Data API provides open access to ecological data, and the neonwranglerpy package helps retrieve and clean that data and deliver it in a format ready for ecological analysis by researchers. With the help of this project, we will be able to train multi-class classification in DeepForest. By this, we will improve the automated alignment of DeepForest boxes to NEON individual tree coordinates, provided as stem locations.

Objectives Completed

Retrieve and Predict Deepforest Boxes on NEON API

At the very beginning of the project, I accomplished the following tasks:

  • Function to retrieve data from NEON API
  • Find the siteCodes with siteCode = “OSBS”
  • Pick one plot ID
  • Filter Data
  • Use deepforest boxes for trees
  • Extract from RGB data
  • Retrieve field data using the retrieve_vst_data function
  • Collect RGB tiles
  • Download RGB data for all tiles and generate boxes

With the help of the above steps, I was able to create an independent function that retrieves data from the NEON API and predicts boxes using the DeepForest predict_image model.
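A minimal sketch of this workflow is shown below. The neonwranglerpy import path and the retrieve_vst_data arguments are assumptions drawn from this report, and the tile path is a placeholder; the DeepForest calls follow that package's public predict_image API.

```python
# Sketch: retrieve NEON field (VST) data and predict tree boxes with DeepForest.
# NOTE: the neonwranglerpy import path and argument names are assumptions based
# on this report; check the package documentation for the exact signatures.
from deepforest import main
from neonwranglerpy.retriever import retrieve_vst_data  # hypothetical import path

# Vegetation structure (field) data for the OSBS site
vst = retrieve_vst_data(site="OSBS", savepath="data/")

# Pre-trained DeepForest release model
model = main.deepforest()
model.use_release()

# Predict bounding boxes for one downloaded RGB tile (placeholder path)
boxes = model.predict_image(path="data/rgb/OSBS_tile_001.tif")
print(boxes.head())  # xmin, ymin, xmax, ymax, label, score
```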

Deepforest prediction on NEON RGB Airborne Observation Platform (AOP) Data

For DeepForest prediction on NEON, we used retrieve_aop_data, which requires field data (i.e., the output of retrieve_vst_data), to create the predict_aop_data function. The pipeline for this function is listed below:

  • Retrieve Neon Vegetation Structure (VST) Data
  • Filter data based on the site
  • Retrieve AOP Data
  • Find image path
  • Open the raster file
  • Get the bounding box coordinates and raster extent
  • Map and iterate the VST data over the rows
  • Predict boxes using the predict_image function

This function will further help us perform a geospatial join between the Vegetation Structure (VST) data and the RGB predictions.
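The sketch below outlines this pipeline. The neonwranglerpy import path, the retrieve_aop_data arguments, and the NEON product ID for the RGB mosaic are assumptions; rasterio and DeepForest are used through their public APIs, and all file paths are placeholders.

```python
# Sketch of the predict_aop_data pipeline: download the RGB AOP tiles that cover
# the VST plots, read tile bounds with rasterio, and predict boxes with DeepForest.
# NOTE: the neonwranglerpy imports and arguments are assumptions; paths are placeholders.
import rasterio
from deepforest import main
from neonwranglerpy.retriever import retrieve_vst_data, retrieve_aop_data  # hypothetical

vst = retrieve_vst_data(site="OSBS", savepath="data/")        # field data, filtered by site
retrieve_aop_data(data=vst, year=2019, dpID="DP3.30010.001",  # RGB camera mosaic tiles
                  savepath="data/aop/")

tile_path = "data/aop/OSBS_2019_tile.tif"  # placeholder tile name
with rasterio.open(tile_path) as src:
    bounds = src.bounds  # (left, bottom, right, top): used to map VST stems onto this tile
    crs = src.crs        # coordinate reference system of the raster

model = main.deepforest()
model.use_release()
prediction = model.predict_image(path=tile_path)  # DataFrame: xmin, ymin, xmax, ymax, score
```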

Performing a geospatial join between Vegetation Structure (VST) Data and RGB prediction

After prediction on the AOP data, we get every tree in a prediction dataframe, and we convert the boxes to a shapefile using the boxes_to_shapefile function.

  • Convert the predictions into geo dataframes using the boxes_to_shapefile function
  • Concatenate all the geo dataframes into one dataframe
  • Perform a spatial join between the predicted geo dataframe and the field data
  • Clean the merged data based on duplicated coordinates

Using the above function, we can extract the RGB training data for deep learning classification.
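A sketch of this join follows, assuming `predictions` holds the per-tile DeepForest dataframes and `vst` holds the field data from the earlier steps. boxes_to_shapefile comes from DeepForest's utilities module; the stem coordinate column names (itcEasting/itcNorthing) are assumptions.

```python
# Sketch of the geospatial join between DeepForest predictions and VST field data.
# Assumes `predictions` is a list of per-tile dataframes from predict_image (each
# with an image_path column) and `vst` is the field dataframe; the itcEasting /
# itcNorthing column names are assumptions.
import pandas as pd
import geopandas as gpd
from deepforest.utilities import boxes_to_shapefile

tile_dir = "data/aop/"  # folder holding the rasters the predictions came from
geo_frames = [boxes_to_shapefile(df, root_dir=tile_dir, projected=True)
              for df in predictions]
boxes_gdf = gpd.GeoDataFrame(pd.concat(geo_frames, ignore_index=True))

# Build point geometries for the VST stem locations, then join stems to boxes
vst_gdf = gpd.GeoDataFrame(
    vst,
    geometry=gpd.points_from_xy(vst["itcEasting"], vst["itcNorthing"]),
    crs=boxes_gdf.crs,
)
merged = gpd.sjoin(vst_gdf, boxes_gdf, how="inner", predicate="within")

# Drop trees that map onto the same coordinates more than once
merged = merged.drop_duplicates(subset=["itcEasting", "itcNorthing"])
```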

Extracting RGB training data

For RGB deep learning classification, we updated the function so that it extracts all the tree images based on plant status, by following the steps below:

  • Map the canopy position
  • Mask duplicated entries based on coordinates
  • Clean duplicated predictions based on stem diameter, height, and canopy position
  • Iterate over all the rows and create boxes based on the geometry
  • Save the images

With the help of the above steps, we were able to save all the tree RGB images for further classification.
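The image-saving step can be sketched as below, assuming `merged` is the joined geodataframe from the previous section; the tile path is a placeholder, and the individualID/plantStatus columns are assumed only for naming the output files.

```python
# Sketch of cropping a per-tree RGB chip for each joined record and saving it.
# Assumes `merged` is the geodataframe from the spatial join; the tile path and
# the individualID / plantStatus columns used for file naming are assumptions.
import os
import rasterio
from rasterio.windows import from_bounds

out_dir = "data/rgb_crops/"
os.makedirs(out_dir, exist_ok=True)

with rasterio.open("data/aop/OSBS_2019_tile.tif") as src:  # placeholder tile
    for _, row in merged.iterrows():
        minx, miny, maxx, maxy = row.geometry.bounds
        window = from_bounds(minx, miny, maxx, maxy, transform=src.transform)
        chip = src.read(window=window)  # (bands, height, width) pixels for one tree

        profile = src.profile.copy()
        profile.update(height=chip.shape[1], width=chip.shape[2],
                       transform=src.window_transform(window))

        out_path = os.path.join(out_dir, f"{row['individualID']}_{row['plantStatus']}.tif")
        with rasterio.open(out_path, "w", **profile) as dst:
            dst.write(chip)
```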

Extracting LiDAR training data

For LiDAR deep learning classification, we created a new function that extracts all the tree data as NumPy arrays, by following the steps below:

  • Retrieve LiDAR data using retrieve_aop_data where dpID="DP1.30003.001"
  • Check the file name pattern using the unique tiles
  • Read the file using laspy
  • Filter the data based on coordinates
  • Save the NumPy array in .npy format

With the help of the above steps, we were able to get all the tree .npy files for further classification.
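A minimal sketch of the per-tree point-cloud extraction follows; laspy and NumPy are used through their public APIs, while the LiDAR file name and the example bounding box are placeholders.

```python
# Sketch of extracting one tree's LiDAR points with laspy and saving them as .npy.
# The LiDAR file name and the example bounding box (UTM metres) are placeholders.
import os
import laspy
import numpy as np

las = laspy.read("data/lidar/OSBS_2019_tile.laz")  # tile downloaded with dpID="DP1.30003.001"

# Keep only the points that fall inside one tree's predicted box
xmin, xmax, ymin, ymax = 404870.0, 404880.0, 3285100.0, 3285110.0  # assumed box
x, y, z = np.asarray(las.x), np.asarray(las.y), np.asarray(las.z)
mask = (x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax)
points = np.vstack((x[mask], y[mask], z[mask])).T  # (n_points, 3) array

os.makedirs("data/lidar_crops", exist_ok=True)
np.save("data/lidar_crops/tree_0001.npy", points)
```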

Future Work

Extracting Hyperspectral imaging (HSI) data

The project's main goal is to create a machine learning model that can assess tree health and mortality based on the deep learning models described above. After achieving all the above goals, we can further create a machine learning model that can run on any AOP data provided.

Tutorials and Blogs

During the GSoC period, my mentors Henry Senyondo and Sergio Marconi motivated me to write blogs and tutorials explaining my work on this project. Below is a list of all my blogs:

  • Creating Python Package From Scratch — Community Bonding Period [GSoC’23 NumFOCUS]: Link
  • Retrieve and Predict Deepforest Boxes on NEON API — 1st Biweekly Blog GSoC’23 [NumFOCUS]: Link
  • Predict boxes on GeoDataFrame — 2nd Biweekly Blog GSoC’23 [NumFOCUS]: Link
  • Predict Airborne Observation Platform (AOP) Data — 3rd Biweekly Blog GSoC’23 [NumFOCUS]: Link
  • Extract Training data — 4th Biweekly Blog GSoC’23 [NumFOCUS]: Link
  • Cleaning the Merged Data and Testing with Pytest — 5th Biweekly Blog GSoC’23 [NumFOCUS]: Link
  • Extracting LiDAR data and finalising extract_training_data function — 6th Biweekly Blog GSoC’23 [NumFOCUS]: Link

For me, the last three months have been an incredible learning experience, and I am grateful for everything I've learned. My programming skills improved significantly through my participation in GSoC. At first, I had only basic knowledge of Python. However, during the community bonding stage, I had the chance to develop Python packages.

This project not only helped me better understand Python but also introduced me to popular packages like NumPy, Pandas, GeoPandas, Rasterio, and Shapely, which are widely used in general and geo-scientific programming. Furthermore, I learned how to write clear and informative docstrings to explain the purpose of functions and methods. I also gained insights into creating effective test cases to ensure the reliability of my code. Overall, GSoC gave me practical, hands-on experience that significantly enhanced my programming skills. I'm excited to continue using these skills in future projects.
