Data Science

For Fun And For Profit

@lukasvermeer
lukasvermeer.nl/datascience

Questions?

Presentations often end with this slide.

This is not one of those presentations.

Questions!

Presenters will often try to give you answers.

I am not one of those presenters.

What is Data Science?

If you’re not using

data

is it really proper

science?

What is science?

*a quick recap

Question
Hypothesis
Prediction
Experiment
Analysis
Question: Do people understand science?
Hypothesis: Everybody in this audience now does.
Prediction: Anyone I ask here will know.
Experiment: "Can you tell me how science works?"
Analysis: [insert results here]

Was this proper science?

Of course not!

But why?

Was science the problem,

or was I doing it wrong?

Science is easy;

good science

is very difficult.

This is not a problem of

scale.

randomized

controlled

experiments

the gold standard for empirical science

A|B

testing

Humans excel at pattern recognition.

We even find patterns where there are none to be found.

Significance can help.

But it is no panacea.

Mo' data. Mo' significant results.

The more we look, the more we see.
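A rough sketch of why looking more produces more "significant" results. It assumes every test is independent and run at the same significance level; the function name and the 0.05 default are illustrative assumptions, not something from the slides.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    comes out 'significant' when no real effect exists anywhere."""
    return 1 - (1 - alpha) ** num_tests


# The more comparisons we run, the more likely one of them "wins" by chance.
for num_tests in (1, 5, 20, 100):
    print(f"{num_tests:>3} tests: {chance_of_false_positive(num_tests):.0%}")
```

Real metrics are rarely independent, so treat the numbers as an illustration of the direction of the effect, not as an exact rate.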


Not all patterns are in the data.

Some emerge from the way data is analysed.

Simpson's Paradox.

Why sailors are more likely to drown whilst wearing a life-jacket than without.

Week 1: Cautious beginnings.

Base variant 99% - Treatment 1%.

Base (red): 5.0% (49,500 / 990,000)

Treatment (green): 5.5% (550 / 10,000)

Total conversion rate, data from one week: green wins with p = 0.023.
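The slides do not say which test produced the p-value. As a minimal sketch, assuming a pooled two-proportion z-test with a two-sided alternative, the week 1 counts give a p-value of about 0.023. The function name is hypothetical.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Week 1 counts from the slide: base (red) first, treatment (green) second.
p_week1 = two_proportion_p_value(49_500, 990_000, 550, 10_000)
print(f"week 1: p = {p_week1:.3f}")
```

The same function can be pointed at the counts on the later slides; whether that matches the presenter's own calculation is an assumption.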

Week 2: Ramp up the new thing!

Base variant 50% - Treatment 50%.

Base (red): 4.7% (69,500 / 1,490,000)

Treatment (green): 4.4% (22,550 / 510,000)

Total conversion rate, data from two weeks: red wins with p < 0.001.





Wait. Did that trend just reverse!?

Or are we holding it wrong?

Week 2: Excluding data from week 1.

Global conversion trending downwards.

Base (red): 4.0% (20,000 / 500,000)

Treatment (green): 4.4% (22,000 / 500,000)

Total conversion rate, data from one week: green wins with p < 0.001.





Green consistently beat red by 10%.

In the real world, you are unlikely to be that lucky.

Simpson's Paradox is lurking.

Trends can disappear when groups are combined. There are always more groups.

(Sailors are more likely to wear life-jackets in bad weather.)
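The reversal can be reproduced from the per-week counts shown earlier. This is a small sketch; the dictionary layout and helper names are assumptions made for illustration.

```python
# (conversions, visitors) per week and variant, taken from the slides.
weeks = {
    "week 1": {"base": (49_500, 990_000), "treatment": (550, 10_000)},
    "week 2": {"base": (20_000, 500_000), "treatment": (22_000, 500_000)},
}


def rate(conversions, visitors):
    return conversions / visitors


def pooled(variant):
    """Sum conversions and visitors for one variant across all weeks."""
    conversions = sum(weeks[week][variant][0] for week in weeks)
    visitors = sum(weeks[week][variant][1] for week in weeks)
    return conversions, visitors


# Within each week the treatment converts better than the base variant.
for name, counts in weeks.items():
    lift = rate(*counts["treatment"]) / rate(*counts["base"]) - 1
    print(f"{name}: treatment lift {lift:+.1%}")

# Pooled over both weeks the sign flips, because most treatment traffic
# arrived in week 2, when conversion was lower for everyone.
pooled_lift = rate(*pooled("treatment")) / rate(*pooled("base")) - 1
print(f"pooled: treatment lift {pooled_lift:+.1%}")
```

The traffic split (99/1, then 50/50) is what ties the variant to the week, much as bad weather ties life-jackets to drowning.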
  1. Prediction.
  2. ???????
  3. PROFIT!

Supermarket Shopping Basket Decision Tree Predictor 2.0™


Customer data:
  1. Sex:
  2. Age:
  3. Hair:
> execute
Model output:
  1. ???????
  2. ???????
  3. ???????
Some predictions are pointless.

Profit does not automatically follow from knowing what lies ahead.

Do not over-optimize for a proxy.

Business-specific domain knowledge is not a luxury. Focus on the money.

The four P's of Data Science.

p, P(A), presentation, and profit.
References: