Customer Segmentation Analysis

Client Goal

To uncover Instacart's sales patterns, gain insights from them, and develop strategies for better customer segmentation.

Project Summary

Overview

Instacart, already a successful online grocery company, wants to understand more about its sales patterns. Through this exploration, it hopes to discover insights that advance sales and marketing efforts through customer segmentation.

Purpose and Context

This project was built as part of the Career Foundry “Become a Data Analyst” curriculum. The “client” for this fictional project is Instacart, and the project leveraged real company data.

My Role

For this project, I served as a business intelligence analyst. I worked with the data from start to finish, providing the business insights and final report deliverable.

Tools and Analytical Techniques

This project was created with the following tools:

  • Microsoft Excel
  • Python (including pandas, NumPy, Matplotlib, Seaborn, SciPy)

The skills used to complete this project include:

  • Data wrangling
  • Data merging
  • Deriving variables
  • Grouping data
  • Aggregating data
  • Population flows
  • Insight development
  • Data visualization
  • Report development

Project

Project Scope and Planning

Align requirements, project scope, and desired outcomes of project.

The focus of the project was to discover the different types of customers in Instacart's database and their purchasing behaviors. By better understanding its customers, Instacart could better target its marketing strategy.

Specifically, I set out to discover:

  • Busiest sales days and hours
  • Price range groupings and related sales trends
  • Department / product popularity

The analysis was designed to inform the marketing strategy by creating customer profiles to drive targeted campaigns.

Data Prep and Exploration

Determine and collect data for project, then profile, clean, and explore.

Given the datasets for this project were provided by Career Foundry, I was able to start with the profile-clean-explore parts of this phase right away.

This phase took the majority of the project's time and proved critical to the analysis and insights that came later.

Profile

Conducted basic descriptive exploratory tasks to understand dimensions, fields, data types, source, and relevance.

Also, merged department, orders, products, and customer datasets together to prepare for deeper exploration.
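The merge step can be sketched with chained pandas left joins, starting from the most granular table (one row per product in an order) and attaching lookup tables to it. This is a minimal sketch on tiny made-up tables, not the project's actual code: the ID and timing columns mirror the public Instacart files, while the customer columns (`state`, `age`) are assumptions about the fabricated customer dataset.

```python
import pandas as pd

# Hypothetical miniature tables. order_id, user_id, product_id, department_id,
# order_dow and order_hour_of_day mirror the public Instacart files; the
# customer columns (state, age) are assumptions about the fabricated data.
departments = pd.DataFrame({"department_id": [4, 16], "department": ["produce", "dairy eggs"]})
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Banana", "Milk", "Spinach"],
    "department_id": [4, 16, 4],
    "prices": [0.5, 3.2, 2.0],
})
orders = pd.DataFrame({
    "order_id": [10, 11],
    "user_id": [100, 101],
    "order_number": [1, 1],
    "order_dow": [0, 3],
    "order_hour_of_day": [9, 14],
})
order_products = pd.DataFrame({"order_id": [10, 10, 11], "product_id": [1, 2, 3]})
customers = pd.DataFrame({"user_id": [100, 101], "state": ["Maine", "Texas"], "age": [34, 70]})

# Left joins keep every order line; validate= raises an error if a lookup
# key is unexpectedly duplicated, which would silently multiply rows.
merged = (
    order_products
    .merge(orders, on="order_id", how="left", validate="many_to_one")
    .merge(products, on="product_id", how="left", validate="many_to_one")
    .merge(departments, on="department_id", how="left", validate="many_to_one")
    .merge(customers, on="user_id", how="left", validate="many_to_one")
)
print(merged[["order_id", "product_name", "department", "state"]])
```

Using `validate` is a design choice worth keeping on the full dataset: a duplicated product or customer key would otherwise inflate order counts without any warning.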

Clean

Data cleanup was needed to ensure the data was ready for analysis:

  • Changed data types of variables and renamed columns
  • Fixed mixed-type variables
  • Addressed missing values and duplicates
  • Per the CFO's request, removed order information for customers who placed fewer than five orders
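The cleaning steps above might look like the following in pandas. This is a minimal sketch on hypothetical rows: the column names are assumptions, dropping rows with a missing product name is only one possible way to handle missing values, and "fewer than five orders" is interpreted here as fewer than five distinct `order_id` values per customer.

```python
import pandas as pd

# Hypothetical merged order lines: user 1 has five orders, user 2 only two.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "order_id": [10, 11, 12, 13, 14, 14, 20, 21, 21],
    "product_id": [5, 6, 7, 8, 9, 9, 5, 6, 7],
    "prod_name": ["Banana", "Milk", "Eggs", "Bread", "Tea", "Tea", "Banana", None, "Eggs"],
})

# Rename a terse column to something clearer.
df = df.rename(columns={"prod_name": "product_name"})
# Remove rows that exactly duplicate another row.
df = df.drop_duplicates()
# Drop rows with no product name (imputing or flagging are alternatives).
df = df.dropna(subset=["product_name"])
# Downcast integer IDs to a smaller type to reduce memory use.
df = df.astype({"user_id": "int32", "order_id": "int32", "product_id": "int32"})
# Keep only customers with at least five distinct orders.
orders_per_user = df.groupby("user_id")["order_id"].transform("nunique")
df = df[orders_per_user >= 5]
```

`transform("nunique")` broadcasts each customer's order count back onto every row, so the filter can be applied as a plain boolean mask.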

Explore

Used various Python functions to start exploring and organizing data:

  • Created new dataframe subsets to support analysis
  • Derived new variables
  • Created flags to start categorizing customer behavior
  • Created summary columns with descriptive statistics
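Derived variables, flags, and summary columns can be sketched with a grouped `transform` plus `np.select`. This is a hypothetical illustration: the report does not state how its flags were defined, so the loyalty thresholds below (more than 40 orders for "loyal", more than 10 for "regular") are assumptions, as are the column names.

```python
import numpy as np
import pandas as pd

# Hypothetical order lines; order_number counts a customer's orders over time.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "order_number": [41, 45, 12, 3, 2],
    "prices": [4.0, 6.0, 10.0, 2.0, 8.0],
})

# Summary columns: per-customer statistics broadcast back onto every row.
df["max_order"] = df.groupby("user_id")["order_number"].transform("max")
df["mean_price"] = df.groupby("user_id")["prices"].transform("mean")

# Loyalty flag; the 40 / 10 thresholds are illustrative assumptions.
df["loyalty_flag"] = np.select(
    [df["max_order"] > 40, df["max_order"] > 10],
    ["Loyal customer", "Regular customer"],
    default="New customer",
)
print(df[["user_id", "max_order", "mean_price", "loyalty_flag"]])
```

`np.select` evaluates the conditions in order, so a customer with 45 orders matches the first condition and is never relabeled as "regular".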

Analysis, Insights, and Visualization

Interpret data patterns and trends to uncover most impactful elements for project.

In this phase, there was a heavy focus on exploring the data through visualizations to identify relationships and patterns between variables.

Here, I started to explore which variables could be used to segment the customer base by demographic information.

As the segments were built out, patterns emerged that could shape how these groups might be leveraged in marketing campaigns.

Leveraged the Matplotlib and Seaborn libraries to explore the data with:

  • histograms
  • bar charts
  • line charts
  • scatterplots

Here are some of the key insights identified in this phase, answering some primary business questions for the project.

Instacart's Busiest Times

Key Trends:

  • The busiest days of the week are Saturday and Sunday.
  • The busiest hours of the day are daytime hours. 
Figure 1 - weekends (days 0 and 1) are the most popular days for customers to place orders
Figure 2 - daytime hours are the most popular time for customers to place orders
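Counting orders by day and by hour is the aggregation behind charts like these. The sketch below uses made-up rows; `order_dow` and `order_hour_of_day` are columns in the public Instacart orders file, and the day labels follow this report's reading of the codes (0 = Saturday, 6 = Friday), which is an interpretation rather than something the dataset documents.

```python
import pandas as pd

# Hypothetical orders table with Instacart-style day and hour columns.
orders = pd.DataFrame({
    "order_id": range(1, 9),
    "order_dow": [0, 0, 0, 1, 1, 3, 5, 0],
    "order_hour_of_day": [10, 11, 10, 15, 9, 7, 22, 10],
})
# Day labels follow this report's convention (0 = Saturday, 6 = Friday).
day_names = {0: "Saturday", 1: "Sunday", 2: "Monday", 3: "Tuesday",
             4: "Wednesday", 5: "Thursday", 6: "Friday"}

# Orders per day, busiest first, and orders per hour in clock order.
orders_by_day = orders["order_dow"].map(day_names).value_counts()
orders_by_hour = orders["order_hour_of_day"].value_counts().sort_index()
print(orders_by_day)
print("Busiest hour:", orders_by_hour.idxmax())
```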

Purchase Timing Trends with Pricing

Key Trends

  • The highest-priced products are purchased on Friday and Saturday.
  • There is no distinct pattern related to price and time of day. The price of the item purchased varies throughout the day.
Figure 3 - more expensive items tend to be purchased on Friday (6) and Saturday (0)
Figure 4 - there is no clear pattern of when during the day customers are purchasing a specific priced item
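Price-by-time trends like these can be summarized with a grouped mean of item price. This is a sketch on hypothetical rows; using the mean is an assumption (the charts may instead plot individual purchase prices), and `prices` refers to the fabricated price column described in the Dataset section.

```python
import pandas as pd

# Hypothetical order lines with day, hour, and item price.
lines = pd.DataFrame({
    "order_dow": [0, 0, 6, 6, 2, 2, 3],
    "order_hour_of_day": [9, 18, 12, 20, 8, 13, 10],
    "prices": [12.0, 10.0, 11.0, 13.0, 4.0, 6.0, 5.0],
})

# Average price of items bought on each day (0 = Saturday, 6 = Friday, per this report).
price_by_day = lines.groupby("order_dow")["prices"].mean()
# The same aggregation by hour could feed a line chart of price through the day.
price_by_hour = lines.groupby("order_hour_of_day")["prices"].mean()
print(price_by_day.nlargest(2))
```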

Purchasing Trends with Price Points

Key Trends

In the analysis, items were categorized as low, medium, or high in price range using breakpoints of $5 and $15.

  • Mid-range items were by far the most frequently purchased.
  • High-priced items were purchased very infrequently.
Figure 5 - mid-range products are popular across all customer segments
Figure 6 - mid-range products are popular across all customer segments
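The $5 and $15 breakpoints can be applied with `pd.cut`. The report does not say which side of each breakpoint is inclusive, so this sketch assumes low is $5 or less, mid-range is above $5 up to and including $15, and high is above $15; the product rows are hypothetical.

```python
import numpy as np
import pandas as pd

products = pd.DataFrame({
    "product_name": ["Gum", "Cheese", "Salmon", "Rice"],
    "prices": [1.5, 5.0, 18.0, 15.0],
})

# Assumed bins (right-inclusive): (-inf, 5], (5, 15], (15, inf).
products["price_range"] = pd.cut(
    products["prices"],
    bins=[-np.inf, 5, 15, np.inf],
    labels=["Low-range", "Mid-range", "High-range"],
)
print(products)
```

Because `pd.cut` returns a categorical, a later `value_counts()` on `price_range` keeps all three ranges, even one with no purchases.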

Department Purchasing Trends

Key Trend: The majority of orders for all customer segments are from the produce, dairy/eggs, snack, and beverage departments.

Figure 7 - the majority of orders from these customer segments are from the produce, dairy/eggs, snack, and beverage departments
Figure 8 - the majority of orders from these customer segments are from the produce, dairy/eggs, snack, and beverage departments
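A department share per segment is a normalized cross-tabulation. In this hypothetical sketch, the department names match ones in the public Instacart data, while the segment labels are placeholders; each row gives the fraction of that segment's order lines in each department.

```python
import pandas as pd

# Hypothetical order lines tagged with department and customer segment.
lines = pd.DataFrame({
    "department": ["produce", "produce", "dairy eggs", "snacks",
                   "beverages", "pets", "produce", "frozen"],
    "profile": ["Segment A", "Segment B", "Segment A", "Segment A",
                "Segment B", "Segment B", "Segment A", "Segment A"],
})

# normalize="index" divides each row by its total, so every row sums to 1.
dept_share = pd.crosstab(lines["profile"], lines["department"], normalize="index")
print(dept_share.round(2))
```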

Customer Segment Distribution

Key Trends

The majority of customers are:

  • Married with Children and an Average Income level
  • Adults (age 25-64) with Average Income level
Figure 9 - customer profile distribution showing Married with Children and Average Income group as largest
Figure 10 - customer profile distribution showing Adults with Average Income group as largest
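An Age-Income segment like this can be built by binning both variables and concatenating the labels. The report defines Adults as ages 25-64 and names "young adult" and "elder" groups, so this sketch places the other age edges accordingly. The income cut-offs are hypothetical placeholders, since the report does not state them.

```python
import numpy as np
import pandas as pd

customers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "age": [22, 40, 64, 71],
    "income": [45_000, 95_000, 150_000, 60_000],
})

# Age bands: (0, 24] young adult, (24, 64] adult, 65+ elder.
customers["age_group"] = pd.cut(
    customers["age"], bins=[0, 24, 64, np.inf],
    labels=["Young adult", "Adult", "Elder"],
)
# Hypothetical income cut-offs.
customers["income_level"] = pd.cut(
    customers["income"], bins=[0, 50_000, 120_000, np.inf],
    labels=["Low income", "Average income", "High income"],
)
customers["age_income_profile"] = (
    customers["age_group"].astype(str) + " / " + customers["income_level"].astype(str)
)
print(customers["age_income_profile"].value_counts(normalize=True))
```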

Challenges and Decisions in this Phase

  • The main challenge in this phase was choosing variables that would help distinguish the various customer segments.
  • To address this challenge, I created two segment categories that combined a few key demographic variables. The first segment combined marital status, income, and number of children. The second segment combined age and income.
  • Together, the two segment categories covered many key demographic factors and provided a broad range of customer types.
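The first segment category (marital status, number of children, income) can be sketched with `np.select`. This is a hypothetical illustration: the column names (`marital_status`, `n_dependants`, `income_level`) and their values are assumptions, and treating every non-married customer as "single" is a simplification chosen here, not a rule stated in the report.

```python
import numpy as np
import pandas as pd

# Hypothetical customer attributes.
customers = pd.DataFrame({
    "marital_status": ["married", "single", "married", "divorced/widowed"],
    "n_dependants": [2, 1, 0, 0],
    "income_level": ["Average income", "Low income", "High income", "Average income"],
})

# Conditions are checked in order; anyone not married is treated as single.
household = np.select(
    [
        (customers["marital_status"] == "married") & (customers["n_dependants"] > 0),
        customers["n_dependants"] > 0,
        customers["marital_status"] == "married",
    ],
    ["Married with children", "Single with children", "Married without children"],
    default="Single without children",
)
customers["lifestyle_profile"] = (
    pd.Series(household, index=customers.index) + " / " + customers["income_level"]
)
print(customers["lifestyle_profile"])
```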

Storytelling and Presentation

Assemble actionable recommendations to drive the key outcomes for stakeholder presentation.

In this phase, I assembled the final report showing:

  • Analysis methodology
  • Various results and insights
  • Recommendations identified through the analysis

Final Recommendations

The following were the primary recommendations presented in the final report.

Target weekdays and early mornings for ads to help boost sales during these less busy times.

Run ads and/or offer coupons for higher priced items Sunday through Thursday when sales of these items are less frequent.

Run ads for lower priced items and offer coupons or other incentives on higher priced items to boost sales on these less frequently purchased items.

Run ads for and incentivize purchases in departments with lower sales (all departments outside of produce, dairy/eggs, snack, and beverage).

Run ads that target our frequent, but not yet categorized as “loyal” customers. Incentivize their purchases to turn them “loyal.”

Target ads to the customer base in the Northeastern states, where sales are the lowest. Offer incentives to this group to encourage more activity.

Offer incentives to customer segments with the lowest sales: “single with children”, “young adult” and “elder” populations.

Challenges and Decisions in this Phase

  • The challenge in this phase was to create a report layout where the recommendations were clear and could be tied back to the insights derived from the analysis. With so much data and so many visualizations, distilling them into a consumable format was a design challenge.
  • My approach was to create a linear layout that tied the business question to the answer and the recommendation. I also made sure that the visualizations associated with the answers were referenced clearly.
  • This layout made it easy for stakeholders to connect each question to its data and follow where each recommendation came from.

Dataset

There were three sources of data for this project.

  1. The main dataset used for analysis was “The Instacart Online Grocery Shopping Dataset 2017”.
  2. The second dataset, customer data, and the “prices” column in the products dataset were both fabricated for the purpose of this project and provided by Career Foundry.
  3. The third dataset, used to categorize states into regions, came from the “List of regions of the United States” on Wikipedia.
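Regional categorization comes down to a state-to-region lookup. This sketch assumes the four U.S. Census Bureau regions (one of the groupings on the Wikipedia list) and a `state` column holding full state names; the project's actual regional grouping and column name may differ.

```python
import pandas as pd

# Assumed grouping: the four U.S. Census Bureau regions (50 states plus DC).
REGIONS = {
    "Northeast": ["Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island",
                  "Vermont", "New Jersey", "New York", "Pennsylvania"],
    "Midwest": ["Illinois", "Indiana", "Michigan", "Ohio", "Wisconsin", "Iowa", "Kansas",
                "Minnesota", "Missouri", "Nebraska", "North Dakota", "South Dakota"],
    "South": ["Delaware", "Florida", "Georgia", "Maryland", "North Carolina", "South Carolina",
              "Virginia", "District of Columbia", "West Virginia", "Alabama", "Kentucky",
              "Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas"],
    "West": ["Arizona", "Colorado", "Idaho", "Montana", "Nevada", "New Mexico", "Utah",
             "Wyoming", "Alaska", "California", "Hawaii", "Oregon", "Washington"],
}
# Invert to a flat state -> region dictionary for Series.map.
STATE_TO_REGION = {state: region for region, states in REGIONS.items() for state in states}

# Hypothetical customer rows.
customers = pd.DataFrame({"user_id": [1, 2, 3], "state": ["Vermont", "Texas", "Oregon"]})
customers["region"] = customers["state"].map(STATE_TO_REGION)
print(customers)
```

Any state name spelled differently from the dictionary keys maps to NaN, so checking `customers["region"].isna().sum()` after the mapping is a quick way to catch mismatches.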

Sources

The Instacart data for analysis was pulled from Kaggle on January 2, 2024. 

The customer dataset was pulled from Career Foundry.

Data for regional categorization of states was pulled from Wikipedia.