Nathan Torento

Data Scientist

Data Scientist based in SF with 1-2 years of marketing-focused data science and programming experience at a tech startup and a real estate company. I have a B.S. in Data Science and am completing an M.S. in Business Analytics. Dean's Lister and Published Research Author.

I believe in solving problems and making the world a better place through data and hard work centered on compassion. Proud San Franciscan and Filipino. Feel free to reach out to me at:

Portfolio Selection

Python, Machine Learning, Statistical Analysis, Data Visualization, Web Dev

NoteNotes – Music Library Management Web App

A fully functional end-to-end prototype of a personalized music library management application designed to simplify and centralize the management of song collections, setlists, and music resources. All design and code are my own; songs are sourced through the Spotify API.

  • Programming Languages: Python (Dash by Plotly)
  • Tech Stack: GitHub, Spotify API
  • Skills: Web Development, UI/UX Design, API Integration, Full Stack Development, Version Control (Git), Agile Development, Project Management
  • Outputs: Repository (GitHub)

2022 SF Sidewalk Cleanliness Report Replication and Extended Analysis

A replication and extension of the results of the “2022 Street and Sidewalk Report” conducted by the Department of Public Works (DPW) across San Francisco.

  • Programming Languages: Python
  • Tech Stack: GitHub, Colab, Jupyter Notebook
  • Skills: NLP, Data Wrangling, Exploratory Data Analysis (EDA), Statistical Analysis, Geospatial Analysis, Replication Study, Machine Learning, Regression Analysis, Model Optimization, Model Interpretation, Data Visualization
  • Outputs: Report (Google Doc), Code (GitHub)

MBTI Classification with NLP

A text classification model that aims to determine the MBTI type of a user based on an example post.

HiveQL Netflix Demo

A presentation with a demo of queries on Hue that answer Big Data questions using HiveQL on a dataset expanding on the 2023 Netflix Engagement Report, focusing on investigating relationships between show metrics, yearly popularity trends, and high-level questions about viewer interests.

  • Programming Languages: SQL (HiveQL)
  • Tech Stack: Hue, Hadoop
  • Skills: Data Wrangling, Data Analysis, Big Data Querying, Data Interpretation, Report Writing, Presentation Skills, Demo Creation and Presentation
  • Outputs: HTML File of Hue Notebook, Presentation (

Unraveling the Complexity: The Nexus Between Homelessness and Housing Prices in the San Francisco Bay Area and Throughout State of California

An investigation of the relationship between homelessness and housing prices across three levels: San Francisco, the Bay Area, and California. Employs Machine Learning (ML), Artificial Intelligence (AI), and statistical techniques for exploratory and predictive analytics (e.g., regression models and NLP for text extraction).

SQL for Data Science Course and Certification

Earned a 99.82% final grade in the four-course "Learn SQL Basics for Data Science Specialization", a series that takes 4-6 months to complete. Through hands-on projects and real-world case studies, I strengthened my foundation in advanced SQL techniques, data analysis, data visualization, decision-making, distributed computing with Spark SQL, and A/B testing.

  • Programming Languages: SQL, SQLite, Spark SQL, Python
  • Tech Stack: Apache Spark, Databricks, Delta Lake, Mode
  • Skills: SQL Querying, SQL Data Retrieval, Data Wrangling, Data Quality Assurance, Data Analysis, A/B Testing, Distributed Computing, Engineering Data Pipelines, Partitioning, Project Development and Proposal Writing, Data Storytelling
  • Outputs: Certificate Verification

Running a Multi Node Hadoop Cluster on a Mac with Docker to Test the HDFS’s Fault Tolerance

This article, published on Medium, is a tutorial on creating a multi-node Hadoop cluster containerized with Docker and on testing the fault tolerance of its HDFS in light of the CAP theorem for distributed systems.

  • Programming Languages: Bash
  • Tech Stack: Docker, Hadoop, HDFS, Bash Terminal
  • Skills: Distributed Systems Understanding, Fault Tolerance Testing, Data Replication, HDFS Architecture Comprehension, Containerization with Docker, Experimental Design and Analysis, System Monitoring and Reporting, Technical Documentation
  • Outputs: Medium Article