Introduction

With the advent of the big-data era, companies are increasingly inclined to analyze customer purchasing behavior in order to design targeted marketing strategies that encourage consumers to complete transactions.

Customer groups of different ages and income levels obviously have different consumption habits. Taking age as an example, young people are more receptive to online and social-media advertisements; moreover, because they typically have less savings, they pay closer attention to prices and promotions.
In this context, Starbucks has built an experimental system that simulates user purchase data and analyzes it to find patterns in customer behavior, so that promotions can be targeted more precisely and revenue optimized.

Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free).
Some users might not receive any offer during certain weeks.
Not all users receive the same offer.

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. It is a simplified version of the real app: the underlying simulator has only one product, whereas Starbucks actually sells dozens of products.

The task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type.

More precisely, I aim to answer two questions:

  1. What are the main factors that drive customers or customer groups to complete an offer?
  2. Given offer characteristics and user demographics, can we predict whether a customer will complete the offer effectively? Furthermore, how much money will the customer spend?

Data Set

  • portfolio.json

containing offer ids and meta data about each offer (duration, type, etc.)

| Column | Data Type | Explanation | Total Count | NaN Count |
| --- | --- | --- | --- | --- |
| id | str | id of the offer | 10 | - |
| offer_type | str | type of offer; values: 'bogo', 'discount', 'informational' | 10 | - |
| difficulty | int | the minimum spend required to complete the offer | 10 | - |
| reward | int | reward granted after completing the offer | 10 | - |
| duration | int | the valid duration of the offer | 10 | - |
| channels | str list | the channels through which the offer is sent | 10 | - |
  • profile.json
    demographic data for each customer

| Column | Data Type | Explanation | Total Count | NaN Count |
| --- | --- | --- | --- | --- |
| age | int | the age of the customer | 14825 | 2175 |
| became_member_on | int | the enrollment date of the customer, e.g. 20170101 | 17000 | - |
| gender | str | the gender of the customer; values: 'male', 'female', 'other' | 17000 | - |
| id | str | the id of the customer | 17000 | - |
| income | float | the income of the customer | 14825 | 2175 |
  • transcript.json
    records for transactions, offers received, offers viewed, and offers completed. It shows user purchases made on the app, including the timestamp of each purchase and the amount of money spent.

| Column | Data Type | Explanation | Total Count | NaN Count |
| --- | --- | --- | --- | --- |
| person | str | the id of the customer | 306534 | - |
| event | str | the type of record; values: 'offer received', 'offer viewed', 'transaction', 'offer completed' | 306534 | - |
| time | int | the time of the event (hours) | 306534 | - |
| value | str dict | contains either the id of an offer or the amount of a transaction | 306534 | - |

Data Cleaning and Preprocessing

To begin, I examined the NaN values; they are all bound to the unusual age value 118. After these NaN records were deleted, the distributions of gender, age and income look as follows.
[Figure: distributions of gender, age and income]
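As a minimal sketch of this cleaning step (assuming the line-delimited JSON files provided with the project; the actual code lives in the repository), it might look like:

```python
import pandas as pd

# Load the demographic data (profile.json is line-delimited JSON in this project).
profile = pd.read_json('profile.json', orient='records', lines=True)

# The records with missing gender/income all carry the placeholder age 118.
print(profile.loc[profile['age'] == 118, ['gender', 'income']].isna().mean())

# Drop those records before plotting the gender, age and income distributions.
profile_clean = profile.dropna(subset=['gender', 'income'])
profile_clean = profile_clean[profile_clean['age'] != 118]
```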

Next, I wrangled the transcript data by unpacking the value column into separate offer id and amount columns. In addition, I merged in the 'duration' and 'offer_type' information bound to each offer id for further preprocessing. Below is a description of the wrangled transcript (a sketch of the extraction step follows the table):

| Column | Type | Explanation | Total Count | NaN Count |
| --- | --- | --- | --- | --- |
| person | int | id of the customer | 272762 | - |
| event | str object | state of the record: 'offer received', 'offer viewed', 'transaction', 'offer completed' | 272762 | - |
| time | float | time at which the record happened | 272762 | - |
| amount | float | money paid in this record; exists only when event is 'transaction' | 272762 | - |
| offer_id | str object | the offer_id bound to this record; '-1' means there is no offer | 272762 | 123957 of '-1' value |
| duration | float | the valid duration of the offer | 272762 | 123957 |
| offer_type | str object | the offer type | 272762 | 123957 |
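A rough sketch of this extraction (assuming the raw transcript's value dicts use the keys 'offer id', 'offer_id' and 'amount', and that the files are loaded as line-delimited JSON; the full logic is in data_preprocessing_class.py):

```python
import pandas as pd

transcript = pd.read_json('transcript.json', orient='records', lines=True)
portfolio = pd.read_json('portfolio.json', orient='records', lines=True)

# Unpack the 'value' dict into separate offer_id and amount columns.
# The raw data uses both 'offer id' and 'offer_id' as keys, depending on the event type.
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id', '-1')))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount', 0.0))

# Merge in the offer meta data (duration, offer_type) via the offer id.
transcript = transcript.merge(
    portfolio[['id', 'duration', 'offer_type']],
    how='left', left_on='offer_id', right_on='id').drop(columns='id')
```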

Using the data set above and following a self-designed program flow chart (file: data_preprocessing_class.py), I map each transaction record directly to the related person and offer.
[Figure: data preprocessing flow chart (flow_chart.jpg)]
Meanwhile, I adjust the data types of some features (e.g. the list-valued channel column in portfolio is unfolded into separate columns) and transform some features into a standard form (e.g. the member enrollment date is converted from int to date), as sketched below.
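For instance, the channel list and the enrollment date could be handled roughly like this (a sketch building on the data frames loaded above; cf. Reference [1] for the multi-value dummies trick):

```python
# Unfold the list-valued 'channels' column into 0-1 columns (email, mobile, social, web).
channel_dummies = portfolio['channels'].str.join('|').str.get_dummies()
portfolio = pd.concat([portfolio.drop(columns='channels'), channel_dummies], axis=1)

# Convert the integer enrollment date (e.g. 20170101) into a real date and
# derive the membership length in days up to 2019-01-01.
profile['became_member_on'] = pd.to_datetime(
    profile['became_member_on'].astype(str), format='%Y%m%d')
profile['member_days'] = (pd.Timestamp('2019-01-01') - profile['became_member_on']).dt.days
```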

Then I divide all customers into 12 segments according to 'age' and 'income', so that group-level characteristics become visible.

More importantly, considering the different ways customers can respond to an offer, I divide all transactions into 4 groups:


1. none_offer
never received an offer

2. no_care_offer
received the offer but ignored it ("don't care")

3. tried_offer
made some transactions, but did not complete the offer within its duration

4. effective_offer
completed the offer

Finally, I obtain a wrangled data set labeled with segments and response groups, described as follows (file: model_dataset_raw):

| Column | Type | Explanation | Total Count | NaN Count |
| --- | --- | --- | --- | --- |
| person | int | id of the customer | 66506 | - |
| offer_id | str object | values '-1', '0'-'9' representing the 10 offers; '-1' means no offer received | 66506 | - |
| time_received | float | time when the offer was received; NaN means not received | 66506 | 5 |
| time_viewed | float | time when the offer was viewed; NaN means not viewed | 66506 | 16646 |
| time_transaction | str object | time(s) when transaction(s) took place; '' means no transaction; ',3.0,5.0' means two transactions under this offer, one at time 3.0 and one at time 5.0 | 66506 | 8754 |
| time_completed | float | time when the offer was completed; NaN means not completed | 66506 | 26099 |
| amount_with_offer | float | how much money was paid under this offer; 0.0 means no transaction | 66506 | - |
| label_effective_offer | int | label marking the completion level of the offer; see Notice 1 below | 66506 | - |
| reward | float | reward after completing the offer | 66501 | 5 |
| difficulty | float | the minimum spend required to complete the offer | 66501 | 5 |
| duration | float | the valid duration of the offer; NaN implies the offer_id is '-1' | 66501 | 5 |
| offer_type | str object | 'bogo', 'discount', 'informational' | 66501 | 5 |
| email | float | one channel used to send the offer | 66501 | 5 |
| mobile | float | one channel used to send the offer | 66501 | 5 |
| social | float | one channel used to send the offer | 66501 | 5 |
| web | float | one channel used to send the offer | 66501 | 5 |
| gender | str object | 'male', 'female', 'other' | 66506 | - |
| age | int | age of the customer | 66506 | - |
| income | float | yearly income of the customer | 66506 | - |
| member_days | int | days from the enrollment date to 2019-01-01 | 66506 | - |
| label_group | str object | 4 groups of response to offers; values see Notice 2 below | 66506 | - |
| label_seg | int | 1-12: 12 segments based on age and income | 66506 | - |

Notice 1: label_effective_offer (a label describing the completion level of an offer)
(Note: this label does not use any 'offer viewed' information)

| Value | Meaning |
| --- | --- |
| 1 | for an informational offer, there is at least one transaction within the duration; for other offers, there is an 'offer completed' event |
| 0 | for an informational offer, 'offer received' exists but there is no valid transaction within the duration; for other offers, there is no 'offer completed' event, although there may be some transaction amount within the duration that does not fulfil the requirement |
| -1 | the initial label; when there is no 'offer received', the label stays -1 |
| -2 | customers who only have transactions during the whole experiment period and were never sent an offer |

Notice 2: label_group (4 groups of response to offers)

| Group | received | viewed | validly completed | transaction amount | Scenario | Logical expression |
| --- | --- | --- | --- | --- | --- | --- |
| 1. none_offer | 0 | 0 | 0 | - | has not received the offer | label_effective_offer.isin([-1, -2]) & time_viewed == NaN |
| 2. no_care | 1 | 0 | - | - | received but not viewed; regarded as "don't care" | label_effective_offer.isin([0, 1]) & time_viewed == NaN |
| 2. no_care | 1 | 1 | 0 | = 0.0 | received and viewed, but no transaction | label_effective_offer == 0 & amount == 0.0 & time_viewed.notnull() |
| 2. no_care | 1 | 1 | 1 (viewed after completion) | - | received, but completed unintentionally, i.e. viewed only after completing | label_effective_offer == 1 & time_viewed > time_completed |
| 3. tried | 1 | 1 | 0 | > 0.0 | received, viewed, made transactions, but the amount is less than 'difficulty' | label_effective_offer == 0 & amount > 0.0 & time_viewed.notnull() |
| 4. effective_offer | 1 | 1 | 1 (viewed before completion) | - | viewed before completing; an effective offer | label_effective_offer == 1 & time_viewed < time_completed |
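Translated into pandas, the logical expressions above might be applied roughly as follows (a sketch; df stands for the wrangled per-person, per-offer table described earlier):

```python
import numpy as np

viewed = df['time_viewed'].notnull()
eff = df['label_effective_offer']

conditions = [
    eff.isin([-1, -2]) & ~viewed,                                  # 1. none_offer
    eff.isin([0, 1]) & ~viewed,                                    # 2. no_care: received, never viewed
    (eff == 0) & (df['amount_with_offer'] == 0.0) & viewed,        # 2. no_care: viewed, no transaction
    (eff == 1) & (df['time_viewed'] > df['time_completed']),       # 2. no_care: viewed only after completing
    (eff == 0) & (df['amount_with_offer'] > 0.0) & viewed,         # 3. tried
    (eff == 1) & (df['time_viewed'] < df['time_completed']),       # 4. effective_offer
]
choices = ['none_offer', 'no_care', 'no_care', 'no_care', 'tried', 'effective_offer']

df['label_group'] = np.select(conditions, choices, default='unlabeled')
```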

Notice 3: label_seg (12 segments based on age and income)

| Segment # | Age Group (edges included, experiment in 2018) | Income |
| --- | --- | --- |
| 1 | Millennials (≤21 & 22-37) | low |
| 2 | Millennials (≤21 & 22-37) | medium |
| 3 | Millennials (≤21 & 22-37) | high |
| 4 | Gen X (38-53) | low |
| 5 | Gen X (38-53) | medium |
| 6 | Gen X (38-53) | high |
| 7 | Baby Boomers (54-72) | low |
| 8 | Baby Boomers (54-72) | medium |
| 9 | Baby Boomers (54-72) | high |
| 10 | Silent (73-90 & 91+) | low |
| 11 | Silent (73-90 & 91+) | medium |
| 12 | Silent (73-90 & 91+) | high |

Income Level:

| Income | Values ($) |
| --- | --- |
| low | 30,000-50,000 |
| medium | 50,001-82,500 |
| high | 82,501-120,000 |
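Deriving label_seg from these bins can be done compactly, for example (a sketch; df is again the hypothetical working table, and the bin edges follow the two tables above):

```python
import pandas as pd

# Age groups: Millennials (<=37), Gen X (38-53), Baby Boomers (54-72), Silent (73+).
age_bin = pd.cut(df['age'], bins=[0, 37, 53, 72, 200], labels=[0, 1, 2, 3]).astype(int)

# Income levels: low (30,000-50,000), medium (50,001-82,500), high (82,501-120,000).
income_bin = pd.cut(df['income'], bins=[29999, 50000, 82500, 120000], labels=[0, 1, 2]).astype(int)

# label_seg 1-12: three income levels nested inside four age groups.
df['label_seg'] = age_bin * 3 + income_bin + 1
```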

Notice 4: offer_id (the 10 kinds of offers)

| offer_id | type | duration | requirement | reward |
| --- | --- | --- | --- | --- |
| 0 | bogo | 7 | 10 | 10 |
| 1 | bogo | 5 | 10 | 10 |
| 2 | informational | 4 | - | - |
| 3 | bogo | 7 | 5 | 5 |
| 4 | discount | 10 | 20 | 5 |
| 5 | discount | 7 | 7 | 3 |
| 6 | discount | 10 | 10 | 2 |
| 7 | informational | 3 | - | - |
| 8 | bogo | 5 | 5 | 5 |
| 9 | discount | 7 | 10 | 2 |

Exploratory Analysis

Below is the structure of the analysis data:
[Figure: structure of the analysis data]
The data set has been divided into 12 segments based on age and income, and each segment contains 4 response groups.
By combining these response groups in different ways, there are 3 questions to explore.

| # | Question | Combined group set 1 | Combined group set 2 |
| --- | --- | --- | --- |
| 1 | Offers Distribution | received no offer (group: none_offer) | received an offer (groups: no_care, tried, effective_offer) |
| 2 | Interest Distribution towards different offers | don't care (group: no_care) | care (groups: tried, effective_offer) |
| 3 | Difficulty Distribution of different offers | tried but not completed (group: tried) | effectively completed (group: effective_offer) |

In addition, I will use the IIR index to discuss whether an offer is significantly popular with customers.

Q1 Offers Distribution

[Figure: offer distributions across the 12 segments (Offer_Distributions.png)]

  • In general
  1. Only 5 people never received an offer (offer_id is '-1').
    1. Two are in segment #7; one each in segments #8, #9 and #11.
    2. They are all over 50 years old and have been members for more than one year. They seem to be regular customers who don't need offers.
  2. Offer distribution by income: compare segment #3 with segment #12.
    1. Young people have less money.
    2. Older people tend to have more savings.
  3. Offer distribution by age: compare segment #1 with segment #10.
    1. In the low-income group, older people seem to receive fewer offers than young people.
  • In segments (subplots)
  1. Within each segment, people receive almost the same number of offers. See the average line.

Q2 Interest Distribution towards different offers

[Figure: interest distributions across the 12 segments (Interest_Distributions.png)]

  • In general
  1. Customers care more about offers 0, 1, 5, 6 and 8.
  2. Offers 2, 3 and 4 attract little interest.
  3. For offers 7 and 9, some customers care and some don't.

From the observations above, we can conclude that:

  1. 'social' is an important channel for attracting people to complete an offer.
  2. 'bogo' offers with medium difficulty are more popular.
  3. 'discount' offers with lower difficulty are more popular.
  • In segments (subplots)
  1. Offers 0, 1
    1. The high-income groups show less interest than the other income groups (see segments #3, #6, #9, #12).
  2. Offers 5, 6, 8
    1. Customers show great interest in all segments.

Q3 Difficulty Distribution of different offers

[Figure: difficulty (completion) distributions across the 12 segments (Difficulty_Distributions.png)]

  • Summary: level of completion for each offer
  1. Offers 5, 6 and 8 are completed more often.
    1. Most are 'discount' offers, and the 'social' channel is involved.
    2. The 'difficulty' of the 'bogo' offer among them is modest.
  2. Offers 0 and 1 are harder to complete.
    1. The 'difficulty' of these 'bogo' offers is relatively high.
  3. Offer 7 is an informational offer.
    1. Richer customers don't care about it (see segments #6, #9, #12 compared with the average line).
    2. It is more attractive to less wealthy people (see the average line).
  4. Offers 2, 3 and 4 are more attractive to middle-aged and older people with medium income (see segments #5, #8).
  5. People with high income tend to complete all offers (see segments #6, #9, #12),
    1. even offers 0 and 1.

Q4 Index IIR: is the offer significantly popular?

Definition of IIR (Incremental Response Rate):

IIR = n_treat / N_treat - n_ctrl / N_ctrl

| Symbol | Meaning |
| --- | --- |
| n_treat | number of purchasers in the treated group (customers who received the offer) |
| N_treat | total number of customers in the treated group |
| n_ctrl | number of purchasers in the control group (customers who did not receive the offer) |
| N_ctrl | total number of customers in the control group |
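In code, the per-offer IIR could be computed roughly like this (a sketch; the treated group holds the customers who received the offer, the control group those who did not, and transaction_cnt is the transaction-count column from the feature table):

```python
import pandas as pd

def iir(treated: pd.DataFrame, control: pd.DataFrame) -> float:
    """Incremental response rate of one offer, given treated and control customer tables."""
    purchasers_treat = (treated['transaction_cnt'] > 0).sum()
    purchasers_ctrl = (control['transaction_cnt'] > 0).sum()
    return purchasers_treat / len(treated) - purchasers_ctrl / len(control)
```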

[Figure: IIR per offer across the 12 segments (IIR.png)]

  • In general: which offers seem popular?
  1. Offers 2, 3 and 4 have a positive IIR.
    1. For offer 4, the difficulty is 20; perhaps few people intend to complete it, which may explain its high IIR.
    2. For offer 3, the difficulty is modest and the reward is decent, so it is popular.
  2. Offers 0 and 1 have a strongly negative IIR,
    1. especially for low-income people (see segments #1, #4, #7, #10).
  • In segments (subplots)
  1. Wealthy people don't seem very excited about receiving offers (see segments #6, #9, #12).
    1. Perhaps they are too wealthy to be motivated by the offer rewards.

Summary for Data Exploration

  1. Channel used to send an offer
    • Sending through 'social' works better.
  2. Type of offer
    • People prefer 'discount', because the reward is real money, compared with 'bogo' (another item of the same kind) and 'informational' (just information).
  3. Content of the offer
    • If the 'difficulty' is too high, e.g. 20, people have less desire to complete the offer.
    • 5-10 may be a good range for 'difficulty'.
  4. For all offers, considering interest (care) and difficulty (level of completion):
    • Offers 5, 6 and 8 are more attractive and easier to complete: they could be sent to all customers.
    • Offers 0 and 1 are hard to complete, but could be sent to high-income customers.
    • Offers 2, 3 and 4 are more attractive to middle-aged and older customers with medium income (segments #5, #8).
    • Offer 7 suits lower-income customers (perhaps the information itself appeals to them).

Feature Extraction

Based on model_dataset_raw above, all potential features are extracted after the object columns (offer_id, offer_type, gender, etc.) are transformed into 0-1 dummy variables, as sketched below.
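The 0-1 transformation itself is a one-liner with pandas get_dummies (a sketch; dataset stands for model_dataset_raw loaded as a DataFrame):

```python
import pandas as pd

features = pd.get_dummies(
    dataset,
    columns=['gender', 'offer_type', 'label_group', 'offer_id'],
    prefix=['gender', 'offer_type', 'group', 'offer'])
# Yields 0-1 columns such as gender_F/gender_M/gender_O,
# group_no_care/group_tried/group_effective_offer and offer_0 ... offer_9
# (the exact suffixes follow the stored category values).
```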

| Feature | Type | Explanation | Total Count | NaN Count |
| --- | --- | --- | --- | --- |
| person | int | id of the customer | 66501 | - |
| time_received | float | time when the offer was received | 66501 | - |
| time_viewed | float | time when the offer was viewed | 66501 | - |
| transaction_cnt | int | count of the customer's transactions under this offer | 66501 | - |
| time_completed | float | time when the offer was completed | 66501 | - |
| amount_with_offer | float | how much money was paid under this offer; 0.0 means no transaction | 66501 | - |
| amount_total | float | total amount of money paid by the customer | 66501 | - |
| offer_received_cnt | float | count of all offers received | 66501 | - |
| reward | float | reward after completing the offer | 66501 | - |
| difficulty | float | the minimum spend required to complete the offer | 66501 | - |
| duration | float | the valid duration of the offer | 66501 | - |
| email | float | one channel used to send the offer | 66501 | - |
| mobile | float | one channel used to send the offer | 66501 | - |
| social | float | one channel used to send the offer | 66501 | - |
| web | float | one channel used to send the offer | 66501 | - |
| age | int | age of the customer | 66501 | - |
| income | float | yearly income of the customer | 66501 | - |
| member_days | int | days from the enrollment date to 2019-01-01 | 66501 | - |
| label_seg | int | 1-12: 12 segments based on age and income | 66501 | - |
| gender_F, gender_M, gender_O | int | 0-1 variables for gender | 66501 | - |
| group_effective_offer, group_no_care, group_tried | int | 0-1 variables for the response group (the none_offer group has been removed) | 66501 | - |
| offer_0, offer_1, ..., offer_9 | int | 0-1 variables for the 10 kinds of offers | 66501 | - |


Model

I wonder whether machine learning can uncover some interesting patterns in the data, especially in the following situations:

  1. An offer is about to be sent to a customer: will this offer be effective?
  2. An offer has already been sent to a customer: is this offer effective?
  3. Given a customer's basic information, which offer should be recommended as the most effective?

To answer these questions, I build a model pipeline, sketched below:

  • Select features and a target (different issues use different features and targets)
  • Select classifiers and compare their performance
  • Tune the parameters of the best-performing classifier via grid search
  • Analyze the results
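A condensed sketch of this pipeline with scikit-learn (the feature matrix X and target y are assembled per issue as listed in the tables below; the parameter grid is illustrative):

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Step 1: X, y = features and target selected for the issue at hand.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 2: compare several classifiers with default parameters.
for clf in [KNeighborsClassifier(), DecisionTreeClassifier(),
            RandomForestClassifier(), GradientBoostingClassifier()]:
    clf.fit(X_train, y_train)
    print(type(clf).__name__, accuracy_score(y_test, clf.predict(X_test)))

# Step 3: grid-search the parameters of the best-performing classifier.
param_grid = {'n_estimators': [100, 200], 'learning_rate': [0.05, 0.1], 'max_depth': [3, 5]}
grid = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=3, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```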


Notice: A neural network model is also built for a regression analysis.

Issue 1: An offer is about to be sent to a customer. Will it be effective?

| Object | Description |
| --- | --- |
| Data Set | subset of the data covering 3 offer response groups: no_care, tried, effective_offer |
| Target | label_group. 0: the customer does not care about the offer; 1: within the duration of the offer, the customer tried or completed the transactions |
| Features | age: basic info about the customer |
| | income: basic info about the customer |
| | member_days: basic info about the customer |
| | gender_*: basic info about the customer (3 kinds of 0-1 variables) |
| | offer_*: offer id (10 kinds of 0-1 variables) |
| | amount_total: amount paid across all of the customer's transactions |
| | offer_received_cnt: number of offers received |
| | time_received: time this offer was received |

Model training result:
[Figure: Issue 1 model training results]

Issue 1 Summary:

  1. The first model, KNeighborsClassifier, serves as a reference model.
  2. SVC and NuSVC take much longer to train, so I continue without them.
  3. DecisionTreeClassifier and RandomForestClassifier both score very high in training. Is something special going on?
    • Note that the test time is much shorter than the training time. A possible reason is that the data set has such a simple structure that the models predict the same result for almost every input (see the deployment of Issue 2: no matter how I change the input data, the prediction stays the same).
  4. Overall, the prediction accuracy is around 70%, so the models do not seem very appropriate for this situation.

Issue 2: An offer has already been sent to a customer. Is it effective?

| Object | Description |
| --- | --- |
| Data Set | subset of the data covering 3 offer response groups: no_care, tried, effective_offer |
| Target | label_group. 0: the customer does not care about the offer; 1: within the duration of the offer, the customer tried or completed the transactions |
| Features | age: basic info about the customer |
| | income: basic info about the customer |
| | member_days: basic info about the customer |
| | gender_*: basic info about the customer (3 kinds of 0-1 variables) |
| | offer_*: offer id (10 kinds of 0-1 variables) |
| | amount_total: amount paid across all of the customer's transactions |
| | offer_received_cnt: number of offers received |
| | time_received: time this offer was received |
| | amount_with_offer: amount paid under this offer |
| | time_viewed: time this offer was viewed (0.0 if never viewed) |

Model training result:
[Figure: Issue 2 model training results]

Issue 2 Summary:

  1. As a reference model, KNeighborsClassifier performs reasonably well.
  2. DecisionTreeClassifier and RandomForestClassifier both achieve a perfect training score. Is something special going on?
    • Note that the test time is much shorter than the training time. A possible reason is that the data set has such a simple structure that the models predict the same result for almost every input (see the deployment of Issue 2: no matter how I change the input data, the prediction stays the same).

P.S.: I’ve used the GradientBoostingClassifier as the target model to deploy my project.

Issue 3: Given a customer's basic information, which offer should be recommended as the most effective?

| Object | Description |
| --- | --- |
| Data Set | subset of 2 offer response groups (at least one transaction exists): tried, effective_offer |
| Target | offer_* (10 classes). 0: this offer_id is not effective for the customer; 1: this offer_id is effective |
| Features | age: basic info about the customer |
| | income: basic info about the customer |
| | member_days: basic info about the customer |
| | gender_*: basic info about the customer (3 kinds of 0-1 variables) |
| | amount_total: amount paid across all of the customer's transactions |
| | offer_received_cnt: number of offers received |
| | time_received: time this offer was received |
| | amount_with_offer: amount paid under this offer |
| | time_viewed: time this offer was viewed (0.0 if never viewed) |

Model training result:
[Figure: Issue 3 model training results]

Issue 3 Summary:

  1. Overall, the prediction performance is poor for all models.
    • Still, DecisionTreeClassifier and RandomForestClassifier score high in training. What is going on?
      • Note that the test time is much shorter than the training time. A possible reason is that the data set has such a simple structure that the models predict the same result for almost every input (see the deployment of Issue 2: no matter how I change the input data, the prediction stays the same).
  2. Note that GradientBoostingClassifier is not directly suitable for this multi-label problem, so I wrapped it in MultiOutputClassifier(GradientBoostingClassifier()), as sketched below.
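The wrapper mentioned in point 2 might be used as follows (a sketch; Y_offers_train stands for the 10 binary offer_* target columns, X_train/X_test for the Issue 3 feature matrices):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

# GradientBoostingClassifier fits a single target, so the wrapper trains one
# booster per offer_* column and predicts all 10 labels at once.
multi_clf = MultiOutputClassifier(GradientBoostingClassifier())
multi_clf.fit(X_train, Y_offers_train)
Y_pred = multi_clf.predict(X_test)   # shape: (n_samples, 10)
```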

Additional Issue: Neural Network for Regression

| Object | Description |
| --- | --- |
| Data Set | the full model_dataset |
| Target | amount_total (float): the total amount of money paid by a customer |
| Features | age: basic info about the customer |
| | income: basic info about the customer |
| | member_days: basic info about the customer |
| | gender_*: basic info about the customer (3 kinds of 0-1 variables) |
| | reward: bound to the offer |
| | difficulty: bound to the offer |
| | duration: bound to the offer |
| | email: bound to the offer |
| | mobile: bound to the offer |
| | social: bound to the offer |
| | web: bound to the offer |
| | transaction_cnt: count of all transactions for the customer |
| | offer_received_cnt: count of offers received by the customer |
| | group_effective_offer: group label (effective_offer) |
| | group_no_care: group label (no_care) |
| | group_tried: group label (tried) |
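The exact network architecture used in the project is not reproduced here; purely as a stand-in sketch with scikit-learn's MLPRegressor (hypothetical names X_train, y_amount_train, etc.), a comparable regression setup could look like:

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature scaling matters for neural networks; amount_total is the regression target.
reg = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42))
reg.fit(X_train, y_amount_train)
print('test R^2:', reg.score(X_test, y_amount_test))
```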

Model training result:
[Figure: neural network regression training results]

Additional Issue Summary:

This regression experiment appears fruitless: the training loss is huge and stagnates.


Results: Deployment of Issue 2

An offer has already been sent to a customer. Is it effective?
Classifier: GradientBoostingClassifier
Feature importance ranking:
[Figure: feature importance ranking]
Deployment - input data:
[Figure: deployment input (result_submit.png)]
Deployment - result:
[Figure: deployment prediction (result_pred.png)]
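Conceptually, the deployment step just feeds one customer/offer record through the trained classifier. A sketch (the model file name, feature_columns list and feature values are hypothetical):

```python
import joblib
import pandas as pd

model = joblib.load('gbc_issue2.pkl')   # hypothetical path to the trained GradientBoostingClassifier

record = {'age': 35, 'income': 60000.0, 'member_days': 400,
          'gender_F': 1, 'gender_M': 0, 'gender_O': 0,
          'amount_total': 120.0, 'offer_received_cnt': 4,
          'time_received': 168.0, 'amount_with_offer': 15.0, 'time_viewed': 174.0}
record.update({f'offer_{i}': int(i == 5) for i in range(10)})   # the customer received offer 5

sample = pd.DataFrame([record])[feature_columns]   # reuse the training column order
print('effective?', model.predict(sample)[0])
print('probability:', model.predict_proba(sample)[0])
```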


Conclusion

  1. Two analysis methods: heuristic exploration and model building
    1. The model-based approach does not fit reality well: for example, in the deployment of Issue 2, the prediction stays the same no matter how I change the input data.
    2. The heuristic exploration, by contrast, makes more sense.
    3. In this case we could try an unsupervised machine learning method such as clustering. See References [6].
  2. About the data set
    1. With a larger amount of data, supervised machine learning might yield more reasonable findings.
    2. Besides, there is no customer-id feature; a customer exists only as a member of a segment group. If the transactions of an individual customer occurred more frequently, patterns of consuming behavior might emerge.
  3. About the segmentation
    1. Here I segment by age and income. Other segmentation schemes, e.g. age and gender, could also be used.

For more details of this project, please refer to my GitHub repository.
I would like to thank Udacity and Starbucks for all their support, especially the teaching assistants.

References

[1] Create dummies from a column with multiple values in pandas
[2] Starbucks Capstone Challenge: Using Starbucks app user data to predict effective offers
[3] Starbucks Promotion Optimization
[4] generations-and-age
[5] single taxable income
[6] Investigating Starbucks Customers Segmentation using Unsupervised Machine Learning