Link to Jupyter notebook: https://github.com/cdtalley/Data-Science-Portfolio/blob/main/Unsupervised_Learning_Capstone_New_York_City_Bus_Data.ipynb
This dataset is from the NYC MTA buses data stream service. It records, in 10-minute increments, each bus's location, route, bus stop, and schedule. Clustering this data can serve several purposes, such as analyzing traffic flow, identifying the most frequented bus stops, or finding the most congested bus routes. I reduced the data's dimensionality using PCA and t-SNE, then applied the KMeans and DBSCAN clustering algorithms to cluster the bus data. I then tested the effectiveness of the clustering algorithms and the dimensionality reduction with statistical testing.
Link to dataset: https://www.kaggle.com/stoney71/new-york-city-transport-statistics?select=mta_1706.csv
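
For readers who want a feel for the workflow before opening the notebook, here is a minimal sketch of the pipeline described above: scale the features, reduce them with PCA and t-SNE, cluster with KMeans and DBSCAN, and score the result. It is not the notebook's code. The data is a synthetic stand-in generated with make_blobs rather than the MTA file, the parameter values (n_clusters, eps, min_samples, perplexity) are illustrative guesses, and the silhouette score is only one assumed way to evaluate the clusters; the notebook may use different tests.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric bus features (assumption: NOT the MTA data).
X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)

# Scale first so no single feature dominates the distance computations.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two dimensions with each technique.
X_pca = PCA(n_components=2, random_state=0).fit_transform(X_scaled)
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_scaled)

# Cluster the PCA projection (parameter values are illustrative only).
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)

# Silhouette score lies in [-1, 1]; higher means better-separated clusters.
kmeans_score = silhouette_score(X_pca, kmeans_labels)
print(f"KMeans silhouette: {kmeans_score:.3f}")

# DBSCAN labels noise points as -1; score only the clustered points,
# and only when at least two clusters were found.
clustered = dbscan_labels != -1
if len(set(dbscan_labels[clustered])) >= 2:
    dbscan_score = silhouette_score(X_pca[clustered], dbscan_labels[clustered])
    print(f"DBSCAN silhouette (noise excluded): {dbscan_score:.3f}")
else:
    print("DBSCAN found fewer than two clusters; try a different eps.")
```

To try this on the real data, X would be replaced with the numeric columns of the Kaggle CSV after cleaning, and eps and n_clusters would need tuning for that data.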