Clustering Algorithm Performance on NY City Transit Data

Link to Jupyter notebook: https://github.com/cdtalley/Data-Science-Portfolio/blob/main/Unsupervised_Learning_Capstone_New_York_City_Bus_Data.ipynb

This dataset comes from the NYC MTA bus data stream service. It records, in 10-minute increments, each bus's location, route, bus stop, and schedule. Clustering this data can serve several purposes, such as analyzing traffic flow, finding which bus stops are most frequented, or identifying which bus routes are most congested. I reduced the data's dimensionality using PCA and t-SNE, then applied the KMeans and DBSCAN clustering algorithms to cluster the bus data. Finally, I tested the effectiveness of the clustering algorithms and the dimensionality reduction with statistical testing.
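The workflow described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes scikit-learn, uses synthetic `make_blobs` data as a stand-in for the MTA features, and picks the cluster count, `eps`, and `min_samples` arbitrarily. The silhouette score is used here as one possible quality measure; the post does not say which statistical tests were run. t-SNE is left out to keep the sketch short.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric bus features (assumption: NOT the MTA data).
X, _ = make_blobs(n_samples=300, centers=4, n_features=6, random_state=0)

# Standardize so no single feature dominates, then project to 2 components.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Cluster the reduced data with both algorithms (parameters are illustrative).
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_pca)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)


def cluster_quality(points, labels):
    """Silhouette score over non-noise points, or None if it is undefined."""
    mask = labels != -1  # DBSCAN marks noise points with the label -1
    if len(set(labels[mask])) < 2:
        return None  # silhouette needs at least two clusters
    return silhouette_score(points[mask], labels[mask])


kmeans_score = cluster_quality(X_pca, kmeans_labels)
dbscan_score = cluster_quality(X_pca, dbscan_labels)
print("KMeans silhouette:", kmeans_score)
print("DBSCAN silhouette:", dbscan_score)
```

Scores closer to 1 suggest tighter, better separated clusters. On real transit data, `eps` and the number of clusters would need tuning rather than the fixed values assumed here.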

Link to dataset: https://www.kaggle.com/stoney71/new-york-city-transport-statistics?select=mta_1706.csv

Published by Drake Talley on January 5, 2021

