NoteNotes – Music Library Management Web App
A fully functional end-to-end prototype of a personalized music library management application designed to simplify and centralize the management of song collections, setlists, and music resources. All design and code are my own; song data is sourced through the Spotify API.
- Programming Languages: Python (Dash by Plotly)
- Tech Stack: GitHub, Spotify API
- Skills: Web Development, UI/UX Design, API Integration, Full Stack Development, Version Control (Git), Agile Development, Project Management
- Outputs:
Repository (GitHub)
2022 SF Sidewalk Cleanliness Report Replication and Extended Analysis
A replication and extension of the results of the “2022 Street and Sidewalk Report,” conducted by the Department of Public Works (DPW) across San Francisco.
- Programming Languages: Python
- Tech Stack: GitHub, Colab, Jupyter Notebook
- Skills: NLP, Data Wrangling, Exploratory Data Analysis (EDA), Statistical Analysis, Geospatial Analysis, Replication Study, Machine Learning, Regression Analysis, Model Optimization, Model Interpretation, Data Visualization
- Outputs:
Report (Google Doc),
Code (GitHub)
MBTI Classification with NLP
A text classification model that aims to predict a user's MBTI type from a sample post.
- Programming Languages: Python
- Tech Stack: GitHub, Colab, Jupyter Notebook
- Skills: NLP, Data Wrangling, Data Analysis, Data Visualization, Machine Learning, Model Optimization
- Outputs:
Report (Google Doc),
Code (Google Colab),
Repository (GitHub),
Presentation (Beautiful.ai),
Webapp (WIP)
HiveQL Netflix Demo
A presentation and demo of HiveQL queries, run in Hue, that answer Big Data questions about a dataset expanding on the 2023 Netflix Engagement Report. The queries investigate relationships between show metrics, yearly popularity trends, and high-level questions about viewer interests.
- Programming Languages: SQL (HiveQL)
- Tech Stack: Hue, Hadoop
- Skills: Data Wrangling, Data Analysis, Big Data Querying, Data Interpretation, Report Writing, Presentation Skills, Demo Creation and Presentation
- Outputs:
HTML File of Hue Notebook,
Presentation (Beautiful.ai)
Unraveling the Complexity: The Nexus Between Homelessness and Housing Prices in the San Francisco Bay Area and Throughout the State of California
An investigation of the relationship between homelessness and housing prices across three levels: San Francisco, the Bay Area, and California. Employs Machine Learning (ML), Artificial Intelligence (AI), and statistical techniques for exploratory and predictive analytics (e.g., regression models and NLP for text extraction).
- Programming Languages: Python
- Tech Stack: GitHub, Colab, Jupyter Notebook
- Skills: Data Collection, Data Cleaning and Preprocessing, Data Wrangling, Exploratory Data Analysis (EDA), Statistical Analysis, Machine Learning, Model Optimization, Dimensionality Reduction, Model Evaluation, Model Interpretation, Data Visualization
- Outputs:
Report (Google Doc),
Code (Google Colab),
Repository (GitHub),
Presentation (Beautiful.ai)
SQL for Data Science Course and Certification
Earned a 99.82% final grade in the four-course "Learn SQL Basics for Data Science" Specialization, a series that takes 4-6 months to complete. Through hands-on projects and real-world case studies, I strengthened my foundation in advanced SQL techniques, data analysis, data visualization, decision-making, distributed computing with Spark SQL, and A/B testing.
- Programming Languages: SQL, SQLite, Spark SQL, Python
- Tech Stack: Apache Spark, Databricks, Delta Lake, Mode
- Skills: SQL Querying, SQL Data Retrieval, Data Wrangling, Data Quality Assurance, Data Analysis, A/B Testing, Distributed Computing, Engineering Data Pipelines, Partitioning, Project Development and Proposal Writing, Data Storytelling
- Outputs:
Certificate Verification
Running a Multi Node Hadoop Cluster on a Mac with Docker to Test the HDFS’s Fault Tolerance
This Medium article is a tutorial on creating a multi-node Hadoop cluster containerized with Docker and on testing the fault tolerance of its HDFS in the context of the CAP theorem for distributed systems.
- Programming Languages: Bash
- Tech Stack: Docker, Hadoop, HDFS, Bash Terminal
- Skills: Distributed Systems Understanding, Fault Tolerance Testing, Data Replication, HDFS Architecture Comprehension, Containerization with Docker, Experimental Design and Analysis, System Monitoring and Reporting, Technical Documentation
- Outputs:
Medium Article