Cyclistic Case Study

6 minute read

In today’s data-driven world, the ability to analyze and interpret data is crucial for making informed decisions. As part of the Data Analytics Certificate program, this capstone project focuses on the Cyclistic bike-sharing dataset, which provides a rich source of information on bike usage patterns in a major metropolitan area. By leveraging this dataset, I aim to uncover insights into user behavior, identify trends, and provide actionable recommendations to enhance the bike-sharing experience. This project not only demonstrates the practical application of data analytics skills but also contributes to the growing field of sustainable urban transportation.

About the Company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geo-tracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and return to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships.

Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Define the Problem

The main problem for the director of marketing and marketing analytics team is this: Design marketing strategies aimed at converting Cyclistic’s casual riders into annual members. There are three questions that will guide this future marketing program. For the scope of this project, I will answer the following questions:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

By analyzing the data, we can identify broad patterns within the two groups. Understanding these differences will enable us to create more accurate customer profiles for each group. These insights will assist the marketing analytics team in designing high-quality, targeted marketing strategies to convert casual riders into members. For the Cyclistic executive team, these insights will help maximize the number of annual members and drive future growth for the company.

The Task

Analyze historical bike trip data to identify trends in how annual members and casual riders use Cyclistic bikes differently and create a data-driven digital media campaign to increase ridership.

Environment Setup

readr for loading CSV files
scales for better graph scales
tidyverse and dplyr for knitting file
conflicted to manage and set conflict repairs
ggplot2 for visuals
fontawesome & paletteer for a pretty notebook

Load Data from CSV

q1_2019 <- read_csv("CSV/Divvy_Trips_2019_Q1.csv")
q1_2020 <- read_csv("CSV/Divvy_Trips_2020_Q1.csv")

Clean and Combine

Verify Column Names

colnames(q1_2019)
colnames(q1_2020)

Rename Columns of ‘q1_2019.csv’ for Consistency

q1_2019 <- rename(q1_2019
 ,ride_id = trip_id
 ,rideable_type = bikeid
 ,started_at = start_time
 ,ended_at = end_time
 ,start_station_name = from_station_name
 ,start_station_id = from_station_id
 ,end_station_name = to_station_name
 ,end_station_id = to_station_id
 ,member_casual = usertype
 )

Inspect the Dataframes

str(q1_2019)
str(q1_2020)

Convert ride_id and rideable_type columns to CHR

q1_2019 <- mutate(q1_2019, ride_id = as.character(ride_id), rideable_type = as.character(rideable_type))

str(q1_2019)

Create and Clean New Table ‘all_trips’

Stack Quarter Dataframes into One

all_trips <- bind_rows(q1_2019, q1_2020)

Remove lat, long

all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, tripduration))

Load ‘all_trips’ and Preview

#colnames(all_trips)
#nrow(all_trips)
#dim(all_trips)
#head(all_trips)
#str(all_trips)
summary(all_trips) #only one line needed

Clean ‘member_casual’ Column for Consistency

#table(all_trips$member_casual) #view table for memberships
all_trips <- all_trips %>%
  mutate(member_casual = recode(member_casual ,"Subscriber" = "member" ,"Customer" = "casual"))
table(all_trips$member_casual) #verify table update contains 2 variables

Split Date into Columns for Better Aggregation

all_trips$date <- as.Date(all_trips$started_at) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")

Add ‘ride_length’ Calculation and Make Numeric

all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)
#is.factor(all_trips$ride_length)
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length) #verify numeric

Remove the Rows of Bad Data

data frame include a few hundred entries when bikes were removed for service

all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]

Prepare Descriptive Analysis

Analyze Ride Lengths

#calculate all ride lengths
#mean(all_trips_v2$ride_length) #average (total ride length / rides)
#median(all_trips_v2$ride_length) #midpoint ride length
#max(all_trips_v2$ride_length) #longest ride
#min(all_trips_v2$ride_length) #shortest ride
# or more simply on one line
summary(all_trips_v2$ride_length)

Compare Casual and Members

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)

Correct Day of Week Order

all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN =mean)

Analyze Data by Type and Weekday

all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>% #creates weekday field using wday
    group_by(member_casual, weekday) %>% #groups by usertype and weekday
      summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
        arrange(member_casual, weekday)

Visualize Number of Rides by Rider Type

all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
    group_by(member_casual, weekday) %>%
      summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
        arrange(member_casual, weekday) %>%
          ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
          geom_col(position = "dodge") +
          scale_x_discrete(name="Weekday") +
          scale_y_continuous(labels=comma, name="Number of Rides") +
          labs(title = "Number of Rides by Rider Type", fill= "Casual vs Member") +
          scale_fill_paletteer_d("PrettyCols::Bold")

numrider

Visualize Average Duration by Rider Type

all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
    group_by(member_casual, weekday) %>%
      summarise(number_of_rides = n(), average_duration = mean(ride_length)) %>%
        arrange(member_casual, weekday) %>%
          ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
          geom_col(position = "dodge") +
          scale_x_discrete(name="Weekday") +
          scale_y_continuous(name="Number of Rides") +
          labs(title = "Duration by Rider Type", fill= "Casual vs Member") +
          scale_fill_paletteer_d("PrettyCols::Bold")

numrider

Visualise Gender and Age Division of Riders

all_trips_v2 %>%
  filter(!is.na(gender)) %>%
  filter(birthyear >= 1950L & birthyear <= 2003L & !is.na(birthyear)) %>%
  ggplot() +
  aes(x = birthyear, color = gender, fill = gender) +
  geom_histogram(bins = 20L, show.legend = FALSE) +
  facet_wrap(vars(member_casual, gender)) +
  scale_x_continuous(name="Birth Year") +
  scale_y_continuous(name="Number of Rides") +
  labs(title = "Gender and Age Division of Riders") +
  scale_fill_paletteer_d("PrettyCols::Bold") +
  scale_color_paletteer_d("PrettyCols::Bold")

numrider

Export Summary File for Further Analysis

counts <- aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual +
                      all_trips_v2$day_of_week, FUN = mean,)
write.csv(counts, file = 'output_CSV/avg_ride_length.csv')

Observations

Casual users have significantly longer rides on Sundays compared to members.
Casual users have much shorter rides on weekdays compared to members.
Casual users typically have longer ride lengths than members.

Recommendations

Create weekend specific memberships plans, the “Weekend Warrior” plan.
Highlight benefits for daily commuters, the “Bike 2 Work” plan.
Implement a tiered membership to reward longer rides.
Create loyalty program with points, redeemed for merch from local partners.

Find this Project on the Web!

Kaggle Notebook in Python
Kaggle Notebook in R
Tableau Dashboard

Share on

Twitter Facebook LinkedIn

Gregory Whitmore

Cyclistic Case Study

About the Company

Define the Problem

The Task

Environment Setup

Load Data from CSV

Clean and Combine

Verify Column Names

Rename Columns of ‘q1_2019.csv’ for Consistency

Inspect the Dataframes

Convert ride_id and rideable_type columns to CHR

Create and Clean New Table ‘all_trips’

Stack Quarter Dataframes into One

Remove lat, long

Load ‘all_trips’ and Preview

Clean ‘member_casual’ Column for Consistency

Split Date into Columns for Better Aggregation

Add ‘ride_length’ Calculation and Make Numeric

Remove the Rows of Bad Data

Prepare Descriptive Analysis

Analyze Ride Lengths

Compare Casual and Members

Correct Day of Week Order

Analyze Data by Type and Weekday

Visualize Number of Rides by Rider Type

Visualize Average Duration by Rider Type

Visualise Gender and Age Division of Riders

Export Summary File for Further Analysis

Observations

Recommendations

Find this Project on the Web!

Share on

You May Also Enjoy

Data Scraping with AI

Human Resource Analytics

Python and the Weather

Tableau Dashboard: Customer Analysis