- Name: Satyam Sinha
- Organisation: NumFOCUS
- Sub-Organisation: Data Retriever
- Project: neonwranglerpy - Tree health and mortality from NEON data
- Mentors: Henry Senyondo, Sergio Marconi, Ethan White
- Pull Requests: Link to all my contributions
  - Retrieve and Predict Deepforest Boxes on NEON API
  - Deepforest prediction on NEON RGB Airborne Observation Platform (AOP) Data
  - Performing a geospatial join between Vegetation Structure (VST) data and RGB predictions
  - Extracting RGB training data
  - Extracting LiDAR training data
  - Extracting Hyperspectral Imaging (HSI) data
The project aims to build a new function that populates NEON field and remote sensing data for DeepForest, which will help us build a baseline classification of tree health status. The NEON Data API provides access to open ecological data, and the neonwranglerpy package helps retrieve, clean, and format that data so it is ready for ecological analysis by researchers. With this project, we will be able to train a multi-class classifier in DeepForest and improve the automated alignment of DeepForest boxes to NEON individual tree coordinates, provided as stem locations.
At the very beginning of the project, I accomplished the following tasks:
- Function to retrieve data from NEON API
- Find the sites with siteCode = “OSBS”
- Pick one plot ID
- Filter Data
- Use deepforest boxes for trees
- Extract from RGB data
- Retrieve field data using the retrieve_vst_data function
- Collect RGB tiles
- Download RGB data for all tiles and generate boxes
With the help of the above steps, I was able to create an independent function that retrieves data from the NEON API and predicts boxes using DeepForest's predict_image model.
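A minimal sketch of this step is shown below. The retrieve_vst_data import path and arguments are my assumptions about the neonwranglerpy API, and the image path is hypothetical; the DeepForest calls follow the library's standard predict_image workflow.

```python
from deepforest import main
from neonwranglerpy.lib.retrieve_vst_data import retrieve_vst_data  # assumed import path

# Retrieve NEON Vegetation Structure (VST) data for the OSBS site
# (arguments are illustrative, not the exact signature).
vst = retrieve_vst_data(site="OSBS", savepath="data/", save_files=True)

# Load the pre-trained DeepForest release model and predict tree boxes
# on a single RGB image (hypothetical path).
model = main.deepforest()
model.use_release()
boxes = model.predict_image(path="data/OSBS_plot.tif")
print(boxes[["xmin", "ymin", "xmax", "ymax", "score"]].head())
```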
For DeepForest prediction on NEON, we used retrieve_aop_data, which requires the field data from retrieve_vst_data, to create the predict_aop_data function. The pipeline for achieving this function is listed below:
- Retrieve NEON Vegetation Structure (VST) data
- Filter data based on the site
- Retrieve AOP Data
- Find image path
- Open the raster file
- Get the bounding box coordinates and raster extent
- Map and iterate the VST data over the rows
- Run prediction using the predict_image function
This function will further help us perform a geospatial join between the Vegetation Structure (VST) data and the RGB predictions.
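A rough sketch of that pipeline follows, assuming retrieve_aop_data has already downloaded RGB tiles to data/aop/; the glob pattern and file layout are illustrative, not the exact neonwranglerpy conventions.

```python
import glob
import os

import rasterio
from deepforest import main

model = main.deepforest()
model.use_release()

predictions = []
for tile_path in glob.glob("data/aop/*_image.tif"):  # hypothetical tile pattern
    # Read the bounding box and resolution of the tile; these are later used
    # to map pixel-space boxes back to UTM coordinates.
    with rasterio.open(tile_path) as src:
        bounds = src.bounds
        resolution = src.res
    # Predict tree boxes on the tile (results are in pixel coordinates).
    boxes = model.predict_image(path=tile_path)
    if boxes is not None:
        boxes["image_path"] = os.path.basename(tile_path)
        predictions.append(boxes)
```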
After prediction on the AOP data, each tree appears in the prediction dataframe, and we convert the boxes to a shapefile using the boxes_to_shapefile function:
- Convert the predictions into a GeoDataFrame using the boxes_to_shapefile function
- Concatenate all the GeoDataFrames into one dataframe
- Perform a spatial join between the predicted GeoDataFrame and the field data
- Clean the merged data based on duplicated coordinates
Using the above function, we will extract the RGB training data for deep learning classification.
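A minimal sketch of the join, continuing from the predictions list and the vst dataframe above. boxes_to_shapefile comes from deepforest.utilities; the VST easting/northing column names are assumptions and may differ in the actual data.

```python
import geopandas as gpd
import pandas as pd
from deepforest.utilities import boxes_to_shapefile

# Convert each tile's pixel-space boxes into georeferenced polygons.
geo_frames = [boxes_to_shapefile(df, root_dir="data/aop", projected=True)
              for df in predictions]
boxes_gdf = gpd.GeoDataFrame(pd.concat(geo_frames, ignore_index=True))

# Turn the VST stem locations into point geometries (column names assumed).
vst_gdf = gpd.GeoDataFrame(
    vst,
    geometry=gpd.points_from_xy(vst["itcEasting"], vst["itcNorthing"]),
    crs=boxes_gdf.crs,
)

# Keep stems that fall inside a predicted box, then drop duplicated
# coordinates so each stem matches at most one box.
merged = gpd.sjoin(vst_gdf, boxes_gdf, how="inner", predicate="within")
merged = merged.drop_duplicates(subset=["itcEasting", "itcNorthing"])
```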
For RGB deep learning classification, we updated the function so that it extracts all the tree images based on plant status, following the steps below:
- Map the canopy position
- Mask duplicated entries based on coordinates
- Clean duplicated predictions based on stem diameter, height, and canopy position
- Iterate over all the rows and create boxes based on the geometry
- Save the images
With the help of the above steps, we were able to save all the tree RGB images for further classification.
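Continuing from the merged GeoDataFrame above, the sketch below shows one way to save a fixed-size RGB crop per labelled tree; the buffer size, tile path, plant-status check, and column names are illustrative rather than the exact extract_training_data implementation.

```python
import os

import pandas as pd
import rasterio
from rasterio.windows import from_bounds

os.makedirs("train/rgb", exist_ok=True)
buffer = 2.0  # metres around the stem location (assumed crop size)

with rasterio.open("data/aop/OSBS_tile_image.tif") as src:  # hypothetical tile
    for _, row in merged.iterrows():
        if pd.isna(row.get("plantStatus")):  # keep only trees with a health label
            continue
        x, y = row["itcEasting"], row["itcNorthing"]
        window = from_bounds(x - buffer, y - buffer, x + buffer, y + buffer,
                             transform=src.transform)
        patch = src.read(window=window)  # (bands, rows, cols) array
        profile = src.profile.copy()
        profile.update(height=patch.shape[1], width=patch.shape[2],
                       transform=src.window_transform(window))
        out_path = f"train/rgb/{row['individualID']}.tif"  # assumed ID column
        with rasterio.open(out_path, "w", **profile) as dst:
            dst.write(patch)
```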
For LiDAR deep learning classification, we created a new function that extracts all the tree data as NumPy arrays, following the steps below:
- Retrieve LiDAR data using retrieve_aop_data with dpID="DP1.30003.001"
- Check the pattern of the file names using the unique tiles
- Read the files using laspy
- Filter the data based on coordinates
- Save the NumPy array in .npy format
With the help of the above steps, we were able to get all the tree .npy files for further classification.
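A short sketch of the LiDAR step, again continuing from the merged dataframe; the .laz file name, buffer size, and column names are placeholders, and laspy 2.x is assumed.

```python
import os

import laspy
import numpy as np

os.makedirs("train/lidar", exist_ok=True)
buffer = 2.0  # metres around each stem (assumed)

# Point cloud tile downloaded via retrieve_aop_data with dpID="DP1.30003.001"
# (hypothetical file name).
las = laspy.read("data/aop/OSBS_classified_point_cloud.laz")
points = np.vstack((las.x, las.y, las.z)).T  # (N, 3) array of x, y, z

for _, row in merged.iterrows():
    x, y = row["itcEasting"], row["itcNorthing"]
    # Keep only the points that fall inside the buffer around this stem.
    mask = ((points[:, 0] > x - buffer) & (points[:, 0] < x + buffer) &
            (points[:, 1] > y - buffer) & (points[:, 1] < y + buffer))
    np.save(f"train/lidar/{row['individualID']}.npy", points[mask])
```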
The project's main goal is to create a machine learning model that can assess tree health and mortality based on the deep learning data created above. After achieving all the above goals, we can further build a machine learning model that can run on any AOP data provided.
During the GSoC period, my mentors Henry Senyondo and Sergio Marconi motivated me to write blogs and tutorials explaining my work on this project. Below is a list of all my blogs:
| Description | Blog Link |
| --- | --- |
| Creating Python Package From Scratch — Community Bonding Period [GSoC’23 NumFOCUS] | Link |
| Retrieve and Predict Deepforest Boxes on NEON API — 1st Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Predict boxes on GeoDataFrame — 2nd Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Predict Airborne Observation Platform (AOP) Data — 3rd Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Extract Training data — 4th Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Cleaning the Merged Data and Testing with Pytest — 5th Biweekly Blog GSoC’23 [NumFOCUS] | Link |
| Extracting LiDAR data and finalising extract_training_data function — 6th Biweekly Blog GSoC’23 [NumFOCUS] | Link |
For me, the last three months have been an incredible learning experience, and I am grateful for everything I've learned. My programming skills improved significantly through my participation in GSoC. At first, I had only basic knowledge of Python. However, during the community bonding stage, I had the chance to develop Python packages.
This project not only helped me better understand Python but also introduced me to popular packages like NumPy, pandas, GeoPandas, Rasterio, and Shapely, which are widely used in scientific and geospatial programming. Furthermore, I learned how to write clear and informative docstrings to explain the purpose of functions and methods. I also gained insights into creating effective test cases to ensure the reliability of my code. Overall, GSoC gave me practical, hands-on experience that significantly enhanced my programming skills. I'm excited to continue using these skills in future projects.