✍ Kang Zhi Yong
🛠️ 🇲🇾 Data Enthusiast | About Myself

In 2021, I graduated with a bachelor's degree in biological sciences, with only limited programming knowledge and with R and SPSS as my primary tools for statistical analysis. My journey took an exciting turn a year and a half ago, when I embarked on a master's research program in marine biogeochemistry. This new academic pursuit demanded the analysis of vast datasets from satellite observations and autonomous robotic platforms in the ocean. 🌊 🌏 🛰

Since then, I've been on a gradual learning curve, expanding my expertise in various domains, including shell scripting, distributed computing, cloud computing, and even venturing into the realm of machine learning. The path has not been without its challenges, particularly when tackling the intricate mathematical expressions inherent to machine learning concepts. My transformation from a biology graduate to a data enthusiast has been both challenging and rewarding, as I continue to bridge the gap between my biological background and the burgeoning world of data science and engineering. 🤖 📊

Recent Work

Top 100 Latest Repositories on GitHub

An end-to-end data engineering project that ingests data from the GitHub API into an Amazon S3 bucket. For more info, visit my GitHub.

ETL using Apache Airflow

Ingests flood monitoring data at 15-minute intervals through an open API provided by the Malaysian government, running in either a local or a cloud environment. For more info, visit my GitHub.

Noting down my thoughts

I wrote down my thoughts on the project on Medium to engage with the community and seek advice. For more info, check out my Medium article.

Chlorophyll-a anomalies in Malaysian waters

A side project within my knowledge domain that visualises chlorophyll-a anomalies in Malaysian waters, a proxy for the health of the environment. For more info, visit my GitHub.

IoT data streaming

Developing an automated feeding machine for the aquaculture industry. The project is still in development. For updates, visit my GitHub.

Customer Segmentation

Performs customer segmentation with K-means clustering on data published on Kaggle. For more info, visit my GitHub.

Wine Origin Classification

Performs supervised classification of wine origin on a dataset published in the UC Irvine Machine Learning Repository, using DecisionTreeClassifier and RandomForestClassifier. For more info, visit my GitHub.

Ocean Hackathon 2022-LSTM Model

Performs time-series modelling with an LSTM built using Keras in TensorFlow. The model was trained on satellite observations for the coast of Sabah. For more info, visit my GitHub.

Ocean Hackathon 2022-Flood Alert App

Visualises historical flooding hotspots and rainfall patterns in Kuching city, Sarawak, to support an app prototype built in Figma within 48 hours. For more info, visit my GitHub.

Data Engineering Zoomcamp Cohort 2024

An end-to-end pipeline for visualising the sales of a home-grown Malaysian tech coffee start-up. For my notes and portfolio from the Zoomcamp, visit my GitHub.