Affine – 1/30/2011
Profiles 15th Oct, 2015
favorable transformations
Affine Analytics
Preparation – Microsof
Interview
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
Interviewer: Alperen Kok, Pratik Chawla, Daisy Wang
Topic of Discussion: Campaign Management end to end as a service
Key points to be kept in mind before the interview:
1. Say this to client - "I might take some time during our conversation to answer your questions due
to technological delays in communication as sometimes there is lag when we communicate from
offshore
2. Take some time to listen to the complete question before jumping on the answers
Following is a list of questions that were asked to different candidates -
1. Please walk me through your resume specially the area where you have worked on campaign
management piece for any of your client
2. What data did you use in your project and how you cleaned it up
Transaction Data, Membership Data, Campaign Data – Promotions
3. What all assumptions were taken while designing the campaign and how you will decide on a
certain hypothesis if business doesn’t provide you any input
4. Have you ever changed the hypothesis after receiving the data instead of using business input
a. Happens often – some hypothesis may sound intuitive, but the data proves it otherwise.
Give an example of an hypothesis that was proved to be counter intuitive after initial
exploratory analysis
5. How will you measure the campaign? Have you ever done it earlier by yourself
a. Measuring a campaign:
i. Test/Control –
ii. Pre and Post Campaign -
iii. Forecasting
iv. Tracking some of the KPIs – Change in Sales, purchase frequency, Customer
Acquisition, Conversion Rate, CLTV, traffic, ARPU
b. Yes, take an example from your project and explain how you did it
6. How you determine the size of control group for any campaign
a. Test and control group can be divided into any proportions (i.e 20:80, 50:50, 80:20 etc).
The best proportion to choose depends on the size of the sample. If the sample size is
small it is preferred to use 50:50 ratio as this reaches the significance level soon etc.
7. Based on what criteria will you decide the length of campaign measurement program as well as
campaign as a whole for upgrading Microsoft 365
a. Historical similar campaigns; gradual drop in the responses
b. If no history, then test and control
c. Historically we would know the conversion rate. Once we have determined the sample
size for a campaign, and leveraging the conversion rate from above, we can find out how
much time it will take to reach the sample size. This time should be the length of time to
carry out the campaign.
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
8. How will you decide the most effective channel for campaign while designing a subscription
based product campaign in our case its Microsoft 365
a. Historical information on subscription products like Azure; Target Customer Segments;
Response Modeling – Predict the probability of customer responses across channels; if
no history, then test and control
9. How you determine the effectiveness of a campaign what all KPIs will you measure
a. Change in Sales, Number of Leads, Lifetime Value – customer worth, Customer
Acquisition Cost, Conversion Rate, Click through rates (Clicks/Impressions)
10. Business case – if I were to send email campaigns and in product feature campaign which
channel I should choose and why.
a. Case 1 - I have some history of email only,
b. Case 2 – I have no history what so ever and will be doing it for the first time ever
Answer: Case 1 and Case 2 – Look for similar campaigns historically and see which once
performs better. Or, carry out a test campaign on both channels and see the
performance. The one with better results should be chosen as the preferred channel
11. How should I decide the number of customers to approach based on above case for each
campaign
a. First decide how much lift you want to see in the data and then corresponding to the lift
calculate the sample size requirement for a significant campaign. Use the formula –
i. SS = (z*σ/c)2 , where ss = sample size, z = z value for 95% confidence i.e 1.96
and c is confidence interval or lift required.
ii. For conversion proportions, use formula SS = (z/c)2*P(1-P) where p is conversion
proportion
12. If I send email to everyone and only few of the customers got in product feature how will I
decided the effectiveness of a channel in this scenario
a. Compare the responses from each of those channels and decide the better one based on
different KPIs
13. If I have no restrictions on number of customers to reach out then how many should I approach
optimally for each of the channel
a. Again, decide the desired lift and calculate the optimum sample size. Sending out
campaigns to everyone consumes time and money
14. How will you calculate the lift in the respective campaigns? What all KPIs will you be considering
to determine the lift
a. Pre/Post or Test/Control or Time Series Forecasting
b. Change in Sales, Number of Leads, Lifetime Value – customer worth, Customer
Acquisition Cost, Conversion Rate, Click through rates (Clicks/Impressions)
15. How you calculate the significance in lift and how can you say it is significant and we should
launch nationally
Lift is increase in Sales in response to promotions. Z-test helps decide whether the lift is
significant (z-test)
16. Suggest few KPIs if I go ahead with just email campaign. Which conversion rate do you think
would be most important to take action.
a. KPIs that generate activity by a campaign - Conversion Rate, Click through rate,
Subscription Rate (subscribe/unsubscribe), Open Rate, Delivery Rate (Delivered/Sent) –
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
Delivered = (sent – bounced), Forwards/Shares, Complaints in the form of Junk/Spam,
Some Performance Indicators – Conversion Rate, Revenue, Gross Profits
17. What is the industry average in conversion rate of e-commerce websites do you have any
ballpark figure in mind
a. For ecommerce websites, it is about 2-3%, but this varies depending on factors like –
what Channels, product categories, devices, holidays/seasonality, platforms etc.
18. What all tools you have worked on in your career. Have you ever worked on Cosmos/Hadoop/R
19. Hypothesis tests, 2sample t-test, 1 sample t-test, z-test
20. SQL related:
a. Joining two tables (simple left joins)
b. Get the count of elements and average no. of days since start_date for each segment
Using the function today()
i. Select Segment_ID, count(elements), avg(today()-activity_start_date)
From table
Where activity status = ‘A’
Group by segment_ID
21. Probability Questions:
a. Probability of pair of cards from a deck of 52 cards
b. Probability of any card from the deck of 52 cards
c. Probability of pair of cards from a deck of 52 cards if I can pick the first card anyway I
want
If Daisy Wang is the interviewer, please keep the actual formulae handy for things like sample
size calculation, and formulae/examples of hypothesis test, 2 sample t-test, 1 sample t-test
22. Some of the SQL Questions :
a. Display entire content on the table (Select *)
b. Get employee count in every project (Simple Group by query)
c. Update query (Update table_name set x = 2 where y = 3)
d. Simple Left Join query (select a.id, count(distinct employee_id) from projects as a left
join employees_projects as b on a.id=b.project_id group by a.id )
e. Determine the minimum license date and the subscription date for all the subscribers
and calculate the difference in both the dates (in days) ( Use function datediff in SQL)
i. Finally, calculate the count of the customers falling in each of these difference of
days (Using Group by function)
23. Some of the R Questions :
a. Import csv file from desktop (read.csv)
b. Arithmetic operation to one column in a data frame (xyz$salary = xyz$salary*2)
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
c. For loop syntax (for (I in 1:50)) {}
d. Summarize a data frame (summary(data))
24. How to decide on the sample size?
a. See answer number 11
25. If there is new app that is launched, what KPI’s do we have to look at? What to be done next
once I know what KPI’s to track
a. No of downloads
b. Average daily usage time
c. Daily average users (DAU) etc
d. Send campaigns to all the users who are using the app regularly and see the effect on
daily usage time of these customers. This would be a campaign where you are trying to
keep the users engaged
26. I am have sample data (2000) of a survey(1-5). What can I tell about the population based on
this data?
27. AB testing. (How to decide on what should be null and alternate hypothesis, test statistic and
significance level), Confidence Intervals definition.
28. What is the difference between having and group by in sql
a. Having clause – Used to filter or put cut off on all the aggregated records
b. Group by – Used to specify level of aggregation of the data
29. Walk through a project where you have worked on logistic regression
30. What all checks you performed while selecting variables?
31. How you conclude your model is good? What was the value of AUC
32. How will you go about if you got a new dataset?
33. Asked to write a query by using group by, where condition
34. Dept. with highest no. of students enrolled
a. Select top 1 dname from ( select B.did, B.dname, c.count(sid) from B left join C on
B.did=C.did group by B.did, B.dname order by C.count(sid) desc )x
35. country that chose a specific dept. the most
a. Select top 1 Scountry from (select Scountry, count(Did) from A left join C on A.sid = C.sid
where A.did = ‘Specific department’s id’ group by Scountry order by count(did) desc)x
36. Dept. that was not preferred by any student
a. Select did, count(sid) from C group by did having count(sid) = 0
37. Tables for above3 questions –
A
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
Sid Sname Sage Scountry
B
Did Dname HOD
C
Did Sid
38. You are given a college database. Prepare a report of students by Geo and department. (Explain
your approach towards building that report)
39. If state data is missing in the above question, what approach will you take to build the report
40. Behavioural question – what will you do if you are met with a conflict with the business
stakeholders in a meeting room. Situation- Your numbers after imputing missing data doesnot
seem correct to the stakeholders.
41. How will you convince the stakeholders that your approach for solving a given problem is correct
at a presentation.
42. You started the campaign last week, within 1 week you want to measure the effectiveness of the
campaign. How will you do that?
a. You can do that by measuring the KPIs like open rate, click through rate etc
43. How will you decide on what time frame to run the case-control or pre-post analysis on?
a. See answer 7c
44. What is the difference between a Where and a Having clause?
a. Where – First level of data filtering at a record level
b. Having – Use to filter aggregated values. Used in conjunction with group by clause
45. Which will run faster and why? SELECT * FROM TABLE WHERE COL1=”abc” or SELECT COUNT(*)
FROM TABLE WHERE COL1=”abc”
a. Select count(*) will run faster as it takes a lot of time to display all the records on screen
i.e to run select *
46. You have a table with salaries of employees, how will you get the 10 th highest salary (keep in
mind that multiple employees might have the same salaries)
a. Select top 1 salary from (select distinct top 10 salary from emp order by salary desc) x
order by salary
47. What is a UNION? What is the difference between UNION and UNION ALL
a. Used to append records from two tables
b. Union de-dupes the records and then appends all the records. Union all doesn’t dedupe
the records hence as a results if there are common records in two table, it will be shown
in union all but not in union
48. You have IDs on 2 tables, you want the IDs which are in the first table, but not in the second one.
Answer with both JOIN and SubQuery
a. Do a left join where b.ID is null ( here b is the second table)
b. Select ID from a where id not in (select id from b)
49. One question on campaign management. How do you measure the success of an email
campaign to shift users to mobile version of the software? If we send out emails to 100,000
existing customers and there is 4% increase in the mobile version users, how do you measure
the success of the campaign?
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
a. Use test control within the sample. Send emails to the test group and do not send to the
control group. Measure all KPIs and check if the lift is significant or not.
b. Do t- test to see if the lift is significant. If significant, the campaign would be a success
50. Suppose you carried out a campaign and the p-value comes out to be 0.5, what can you can
about the campaign? What can you do to make it significant?
a. The campaign is inconclusive
b. Increase the sample size to make it significant
51. Comparing decision tree based model and logistic regression based model, how can you
determine which one is a better model?
a. Look at the AUC, AIC and other parameters to decide which is a better model
52. What if difference between rank and dense_rank windows function?
a. Rank – gives the same rank to all the records with similar partition, the ranking for the
next partition has a gap and depends on the number of previous record
b. Dense_rank - gives the same rank to all the records with similar partition, the ranking for
the next partition doesnot have a gap and is the next number in the sequence. Example -
IDs Rank Dense_rank
A 1 1
A 1 1
A 1 1
B 4 2
B 4 2
C 6 3
Questions for them -
1. How soon will we start if everything falls in place
2. Which team will I be joining? How big is that team and what are their roles right now?
Jairaj & Sheldon: XBox
1. Suppose you are asked to do a marketing campaign for a bank to promote a credit card balance
transfer offer. What should you do? What kind of customers should you target? Suppose there
are several APR offer, how do you determine which APR offer is the optimal?
c. First step should be to find the appropriate customers to send these campaigns to. This
should be done by looking historically at the customers data, seeing if he has done
balance transfer ever, have he opted for EMI on purchases etc
d. Carry out test campaigns to see which APR bucket has the best conversion rate and then
expand it to a larger audience
2. Have you ever used Hadoop, Python?
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
3. There is a table which has three columns: user1, user2, and number of messages sent from
user1 to user2. Please write code to create a new table which has the total number of messages
sent between user1 and user2.
4. LTV: suppose the Xbox Live Gold membership churn rate is 5%. How to calculate the customer
LTV?
a. Assume the churn period and use the decay rate formula to get the total time and then
multiply it with the subscription charge to get the life time value
5. How to do customer segmentation?
a. Decide all the clustering variables based on business i.e variables which determine
records which are similar/distinct from each other. Prepare the data set, decide the
number of segments from the scree plot and run k-mean to get the segments.
6. Design a marketing campaign with Xbox Live subscription discount? How to determine the best
discount rate? What’s the objective and constraint?
a. Divide the test group into different groups based on the discount rate. See which once
performs the best. That could be the best discount rate
7. Why choosing Microsoft Xbox Live?
a. Free games every month
• Access to online play
• large discounts on games + DLC
•Xbox live player chat/party
•Full access to all entertainment apps such as sports and Netflix (some services require
third party subscription)
8. There are two tables with the following structure –
SubscriptionID SubscriptionDate
S1 SD1
S2 SD2
S3 SD3
MachineID SubscriptionID ActivationDate MachineFamily
M1 S1 D1 Android
M2 S1 D2 Android
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
PC
What metrics would you compute if the business seeks the information about the difference in
subscription and activation?
Write a sql query to create a table at subscription level to find out the metric mentioned above.
9. There are people who are churning from office 365 subscription. What will you do if the
stakeholders want you to identify the churners before 5 or 6 months of their final renewal date?
What kind of approach would you suggest?
What should be the dependent and independent variables in your model?
What is the output of your model and how will you flag the players who are about to churn?
How will you measure the accuracy?
How will you identify the time period for your analysis i.e. whether it should be 6 or 5 or 4
months?
a. Build a churn model to identify potential churners
b. Dependent variable – Churn flag (1,0) in next 6 months. Independent variables – All
other factors live daily average usage time, frequency of use etc
c. Output will be an equation which will score all the potential customers. Accuracy can be
calculated by accuracy formula
d. Churn period could be found out by basic EDA
10. Tell me about your background. Kind of work you have done
11. Tell me about a predictive analytics problem that you have worked on
12. Why do you want to work for the Xbox
13. What are the kind of problems you think you would be working on?
14. Do you have experience working on Cosmos
15. Have you worked in dynamic and fast paced projects, maybe more than 1 at a time? How did
you ensure quality and timeliness?
16. 5. Role Player: Let’s say a client asks you to deliver 3 analyses in a week. How will you handle it?
Jose Question
1. Tell me about yourself?
2. Tell me about your job profile (i.e. what sort of work you do on daily basis)?
3. What technology you work on?
4. Tell me about any of your project
5. Have you worked on SSAS?
6. What kind of role do you prefer? (Individual contributor or team)
7. What sort of preparation you would do if you have to join me from next week?
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
None of the questions Jose asked were technical. He just wanted to get an idea of our past experience
and how were we able to bring value to the business. Below are a few more points that will help
structure our answers while in interview with Jose –
1. He prefers a person who can work independently. Being a team player is good but the role that
he usually looks for relates to someone working independently
2. He expects that the person working for him will take full ownership of the work allocated. He
does not want to get involved in multiple follow up discussions (related to doubts)
3. He is interested in discussing our past projects that were quick turnaround (ad-hocs) and had
good insights from the business
4. He does not look for exact answers to the questions he asks. He just wants to get an idea about
how well can you approach the problem. Similarly, if asked about a tool like SSAS that we are not
aware of, we just need to convince him, that we have worked on similar or more complex tools
in past and will be able to learn this one as well in short time span
Interviewee (Jose) – Mohd Azad, Kumar Sanjog
Reach out to above people to know more in detail about the interviews.
Daisy’s Interview
1. What do you about the team and the role? Do you have any questions about it?
2. Tell me about yourself, your background, and why you are interested in this position.
3. Walk me through a project that you have worked on, starting with the business objective, then
walk me through what data you used, how you cleaned and modeled the data, and what tools
you used. Specifically, if you have a project where you used Python. (Some follow up questions
that were project specific)
4. What is the difference between WHERE and HAVING clauses in SQL?
5. If I have a client who hosts an app with us, what KPIs would you use to tell them how the app is
performing?
6. What questions do you have for me?
Alperen interview
1. What analytical tools you are familiar with?
2. Rate your SQL skill from 1-10.
3. How is your R skill? What kind of task you use it for?
4. Tell me more about business case where you used predictive modeling technique
5. What is decision tree? When to use decision tree, when to use regression? Is it interchangeable?
6. What is difference between rpart and randomforest?
7. How do we evaluate the model?
8. Pick one project and explain the process from beginning to end.
9. What is your role in the project? How did you and your team collaborate? Who is the person
who do hands on analytics? Who is the person who gather requirement?
10. We will have a lot of hands analysis. Are you comfortable with that?
11. Any questions for me?
Alperen Interview - Shashank
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
1. Please walk me through your resume specially the area where you have worked on campaign
management piece for any of your client
Yes, I did. I worked for a large manufacturer in India where we optimized the marketing by
testing the delivery time and day of the week for maximizing revenue and increase lead
generation. Our recommendations included the channel of marketing and when to contact. We
used predictors such as demographics, past campaign performance and Product variants. The
solution we arrived at predicts the lead conversion from each channel by days of a week and the
times during the day. Our recommendations included the best channel to contact a lead, the best
day of the week and the best times of the day to contact. By calculating the estimated
probability that a lead converts we reduced the number of leads to be targeted and thereby ran
the campaign with the allocated budget
We used data such as campaignOffers (ex: discounts, in-house financing, package options etc),
variantdetails, region, campaignduration, customer info such as age, incomebracket,
highestdegree, dependentscount, gender, marital status, citydweller indicator, past campaign
information such as channel, day of the week, time of the day, conversion indicator, product info
such as product variantnames, productprice, productdiscounts, financingavailable etc
We built out models and iterated to finalize on random forests
For model evaluation, we built out some metrics such as the ROC (receiving operator
characteristic) curve and calculated the area under curve, we created confusion matrix to
compare the observed outcome to the predicted outcome to compute
Accuracy = (tp+tn)/(tp+tn+fp+fn)
Precision = tp/(tp+fp)
Our recommendations included the channel of marketing and when to contact. We used
predictors such as demographics, past campaign performance and Product variants. The solution
we arrived at predicts the lead conversion from each channel by days of a week. Our
recommendations included the best channel to contact a lead by the best day of the week. By
calculating the estimated probability that a lead converts we reduced the number of leads to be
targeted and thereby ran the campaign with the allocated budget. The campaign resulted in a
sales growth excess of 40 percent on a year over year basis.
2. Did you work on any other models other than the ones mentioned above
3. How hands on are you when it comes to using SQL and R
4. Are you someone who works with minimal supervision
5. Pull an Output for two columns customer_id, Tenure - (Calculated field defined as number of
days from the start_date for user_id with the earliest start_date)
Table 1: User_Segment - one row per user
User_id(unique_id) Start_date IsActive(1,0) Channel(source User_Segment(high,
of customer) , medium, low)
(web, retail,
others)
1 2016/10/21 1 web medium
Table2: Customer_Users_Association – one row per user and one customer can have many users
User_id Customer_id
A ffi n e C o n fi d e n ti a l Page 11
Affine – 1/30/2011
Profiles 15th Oct, 2015
1 1
2 1
3 2
4 2
Select t1.customer_id,datediff(day, t1.start_date, getdate()) as Tenure (Select a.customer_id,
a.user_id, b.start_date ,rank() over (partition by a.customer_id order by b.start_date asc) as
rank_for_earliest_start_date from customer_users a join user_segment b on a.user_id =
b.user_id )t1 where t1.rank_for_earliest_start_date = 1
6. Do you have any questions for me
A ffi n e C o n fi d e n ti a l Page 11