# This notebook uncovers the popular Citi Bike routes

A popular route starts at a popular start station and ends at a popular end station. A station counts as popular if it ranks among the top stations by rides started there (for start stations) or by rides ended there (for end stations).<br>
We are interested in the routes between these popular stations.<br>
This definition is more restrictive because departures and arrivals are ranked separately, rather than ranking each station by the sum of the two.
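
The definition above can be sketched on a toy table. This is a minimal illustration, not the notebook's actual data: the rides and the `TOP_N` constant are made up for the example (the cells further down use the top 10), and only the column names match the dataset.

```python
import pandas as pd

# Toy rides table (hypothetical data); column names follow the Citi Bike dataset.
rides = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3", "r4", "r5", "r6"],
    "start_station_name": ["A", "A", "A", "B", "C", "C"],
    "end_station_name":   ["B", "B", "C", "C", "B", "A"],
})

TOP_N = 2  # assumption for the toy example; small enough to inspect by hand

# Rank departures and arrivals separately.
top_starts = rides["start_station_name"].value_counts().nlargest(TOP_N).index
top_ends = rides["end_station_name"].value_counts().nlargest(TOP_N).index

# A ride is on a popular route only if BOTH conditions hold:
# its start is a top departure station and its end is a top arrival station.
popular = rides[
    rides["start_station_name"].isin(top_starts)
    & rides["end_station_name"].isin(top_ends)
]
print(popular)
```

Here the top departure stations are A and C, and the top arrival stations are B and C, so rides r4 (starts at B) and r6 (ends at A) are dropped.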

## Import essential libraries

In [58]:
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import glob
import os

In [59]:
# Mount the drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [60]:
df = pd.read_csv('/content/drive/MyDrive/afterschool_projects/ny_bike/citi_sampled_data_2023_small.csv')



In [61]:
df.shape

(35004, 20)

In [62]:
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual,ride_date,ride_year,ride_month,ride_day,is_weekend,duration,duration_minutes
0,866D455C8D1C2A22,electric_bike,2023-10-08 17:14:27.462,2023-10-08 17:25:21.035,West St & Liberty St,5184.08,W 16 St & The High Line,6233.05,40.711701,-74.014969,40.743349,-74.006818,member,2023-10-08,2023,10,8,True,0 days 00:10:53.573000,10.9
1,E3B6FF4A05756395,classic_bike,2023-08-06 08:51:43.861,2023-08-06 09:01:41.778,William St & Pine St,5065.12,N Moore St & Hudson St,5470.02,40.707179,-74.008873,40.719961,-74.008443,casual,2023-08-06,2023,8,6,True,0 days 00:09:57.917000,10.0
2,03B60B94F91D48B1,electric_bike,2023-10-22 22:29:25.387,2023-10-22 22:47:47.328,E 53 St & 3 Ave,6617.02,LaGuardia Pl & W 3 St,5721.14,40.757457,-73.969121,40.72917,-73.998102,member,2023-10-22,2023,10,22,True,0 days 00:18:21.941000,18.4
3,04AB87E8DFAC305F,electric_bike,2023-04-19 15:49:00.015,2023-04-19 15:56:09.629,Amsterdam Ave & W 66 St,7149.05,3 Ave & E 62 St,6762.04,40.774397,-73.984702,40.763126,-73.965269,member,2023-04-19,2023,4,19,False,0 days 00:07:09.614000,7.2
4,3E02C4169CD1CA5E,classic_bike,2023-04-25 15:45:45.132,2023-04-25 16:25:46.709,University Pl & E 8 St,5755.14,Washington Pl & 6 Ave,5838.09,40.731437,-73.994903,40.732241,-74.000264,member,2023-04-25,2023,4,25,False,0 days 00:40:01.577000,40.0


## Filter the dataset for top stations


In [None]:
# Count rides for each (start station, end station, month) combination
dff = df.groupby(['start_station_name', 'end_station_name', 'ride_month']).agg(route_cnt=('ride_id', 'count')).reset_index()
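
One thing worth checking at this step: by default, pandas `groupby` drops rows whose grouping key is missing (`dropna=True`), so rides without a recorded start or end station would silently fall out of the counts. Whether this dataset contains such rows is not shown here; the sketch below uses a hypothetical three-ride table to show how the gap could be measured.

```python
import numpy as np
import pandas as pd

# Hypothetical rides; r2 has no recorded end station.
rides = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3"],
    "start_station_name": ["A", "A", "B"],
    "end_station_name": ["B", np.nan, "A"],
    "ride_month": [1, 1, 2],
})

keys = ["start_station_name", "end_station_name", "ride_month"]

# Rows with at least one missing grouping key.
n_missing = int(rides[keys].isna().any(axis=1).sum())

# Same aggregation pattern as the cell above; NaN keys are dropped by default.
counted = rides.groupby(keys).agg(route_cnt=("ride_id", "count")).reset_index()
total_counted = int(counted["route_cnt"].sum())

print(f"{n_missing} ride(s) excluded, {total_counted} of {len(rides)} counted")
```

If the excluded share is large, passing `dropna=False` to `groupby` would keep those rides as their own group instead.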

In [None]:
# Total departures per start station
start_station = dff.groupby('start_station_name').agg(station_cnt=('route_cnt', 'sum')).reset_index()

In [None]:
# Keep the 10 stations with the most departures
start_station = start_station.sort_values(by='station_cnt', ascending=False).head(10)

In [None]:
# Total arrivals per end station
end_station = dff.groupby('end_station_name').agg(station_cnt=('route_cnt', 'sum')).reset_index()

In [None]:
# Keep the 10 stations with the most arrivals
end_station = end_station.sort_values(by='station_cnt', ascending=False).head(10)

In [None]:
# Keep only routes that start at a top-10 departure station
# and end at a top-10 arrival station
filtered_dff = dff[
    (dff['start_station_name'].isin(start_station['start_station_name'])) &
    (dff['end_station_name'].isin(end_station['end_station_name']))
]

In [None]:
filtered_dff

Unnamed: 0,start_station_name,end_station_name,ride_month,route_cnt
533,1 Ave & E 68 St,University Pl & E 14 St,5,1
538,1 Ave & E 68 St,W 21 St & 6 Ave,9,1
539,1 Ave & E 68 St,W 31 St & 7 Ave,6,1
932,11 Ave & W 41 St,Broadway & W 58 St,4,1
962,11 Ave & W 41 St,W 21 St & 6 Ave,12,1
...,...,...,...,...
33792,West St & Chambers St,West St & Chambers St,7,1
33793,West St & Chambers St,West St & Chambers St,9,1
33794,West St & Chambers St,West St & Chambers St,11,1
33795,West St & Chambers St,West St & Liberty St,4,2
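
Because `ride_month` is one of the grouping keys, the table above has one row per route per month, and most monthly counts are 1. To rank routes across the whole year, the monthly rows could be summed per route. The sketch below is one way to do that; it uses a few rows copied from the output above as a stand-in for `filtered_dff`, and the names `total_rides` and `active_months` are introduced only for this example.

```python
import pandas as pd

# Stand-in for filtered_dff: rows 538, 33792, 33793 and 33794 from the output above.
monthly = pd.DataFrame({
    "start_station_name": ["1 Ave & E 68 St"] + ["West St & Chambers St"] * 3,
    "end_station_name": ["W 21 St & 6 Ave"] + ["West St & Chambers St"] * 3,
    "ride_month": [9, 7, 9, 11],
    "route_cnt": [1, 1, 1, 1],
})

# Collapse the month dimension: total rides and number of distinct months per route.
route_totals = (
    monthly.groupby(["start_station_name", "end_station_name"], as_index=False)
    .agg(total_rides=("route_cnt", "sum"), active_months=("ride_month", "nunique"))
    .sort_values("total_rides", ascending=False)
)
print(route_totals)
```

On the real data, replacing `monthly` with `filtered_dff` would give one row per popular route.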


## Export the popular routes

In [None]:
# popular_route.to_csv('/content/drive/MyDrive/afterschool_projects/ny_bike/popular_route.csv', index=False)

In [None]:
filtered_dff.to_csv('/content/drive/MyDrive/afterschool_projects/ny_bike/popular_routes.csv', index=False)

In [None]:
filtered_dff.shape

(67, 4)

## Most popular routes overall (all stations)

In [63]:
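# Count rides per route across all stations, with no top-station filter.
# Note: despite its name, station_cnt holds the number of rides on each route.
df2 = df.groupby(['start_station_name', 'end_station_name']).agg(station_cnt=('ride_id', 'count')).reset_index()

In [65]:
# Sort routes from most to least ridden
df2 = df2.sort_values(by='station_cnt', ascending=False)

In [70]:
# Keep the 20 busiest routes
df2 = df2.head(20)

In [71]:
# Save the top 20 routes
df2.to_csv('/content/drive/MyDrive/afterschool_projects/ny_bike/popular_routes_general.csv', index=False)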
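
The group, sort, and head steps in this section can also be written as a single chain with `DataFrame.nlargest`. This is an optional alternative, sketched on hypothetical rides; the column name `route_cnt` is a choice made for the example and differs from the `station_cnt` name used in the cells above.

```python
import pandas as pd

# Hypothetical rides; column names follow the Citi Bike dataset.
rides = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3", "r4", "r5"],
    "start_station_name": ["A", "A", "B", "A", "C"],
    "end_station_name": ["B", "B", "C", "C", "A"],
})

TOP_K = 2  # the section above keeps 20; 2 suits the toy table

top_routes = (
    rides.groupby(["start_station_name", "end_station_name"])
    .agg(route_cnt=("ride_id", "count"))
    .reset_index()
    .nlargest(TOP_K, "route_cnt")  # sort descending and keep the first TOP_K rows
)
print(top_routes)
```

When several routes tie at the cutoff, `nlargest` keeps the first ones it encounters by default; passing `keep="all"` would return every tied route instead.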