From YouTube: Time series in overview - Sci-fu community meeting 2.1
Description
A presentation and discussion about Time series and R
A
And so, if we want to do Python moving toward Clojure, then I could go first.
B
Perfect, I'm fine with either one. You'd have to wait a couple of minutes for me, because I'm setting up a new project to demo this stuff. You know how that is. So if you want to go ahead, Tim.
A
I won't stop and talk about the technologies as we go through, but we can talk about them later, and I will try to call out when I use a particular technology. So this is Windows 10. I use conda to manage environments; conda is more environment management than Maven. It does do dependency resolution, but it also actually sets up a shim environment, and so I use it not only for Python but also for Java.
A
We talked about Jupyter, but here's Jupyter. I use JupyterLab, which has a few bells and whistles added on. On AWS, a lot of the built-in AWS stuff that they're selling as a machine learning suite is just JupyterLab with some stuff built in, but it's mostly JupyterLab. JupyterLab is Jupyter notebooks with some file management, and you can also start other kernels, which are the runtime environments for Jupyter; you can even start a bash shell and do some other online editing. That doesn't matter as much on your own machine, but when you're using remote infrastructure it comes in really handy to be able to install packages and that kind of stuff. I'm going to talk about the classical idea of stationarity in time series data.
A
So we were talking about what we mean by time series, and there is a precise statistical meaning for all of these terms, so I'll try to cover some of those. Stationarity is a property of time series data such that the mean and variance remain constant over time.
A
The reason that's important is because a lot of the classic assumptions that you can make when you're trying to predict on time series are based on the property of stationarity. So a lot of transformations are geared toward getting a time series to be stationary, so that you can then make further assumptions, and from a subject matter perspective we see that a lot in these data science areas: a lot of the work of the early data transformations is to see to what extent the data has certain statistical properties.
A
In a lot of non-time-series settings we want to see if the data is Gaussian, right, if it's a normal distribution. You'll see that a lot of statistical tests assume a normal distribution, because that's where everybody has done the work. That's where people got their PhDs: they proved certain things under a set of assumptions, and one of those assumptions is that the data is basically distributed like this.
A
So stationarity is another one of those technical terms that has some precise meanings, but let's see if we can get an intuitive understanding of it first. This is a Jupyter notebook. It has a Python 3 kernel running with all these libraries already installed, and you can just hit Shift+Enter and it will execute that cell and then highlight the next cell, so you can see which cells have run and roughly what order they ran in. This uses data that's in a CSV, and it's really simple: just a month and then the number of air passengers for some location. Pandas is the library built in here, and we'll look at it a little more; that's kind of what we're modeling Tablecloth on, and it is modeled on R's data frames.
A
It's going to parse a string with the year and month and turn that into a date object, a datetime object; this is the function, it's called dateparse. Then we can add some new parameters to read_csv to say: as you read this in, take this Month column, which is based on the header of the CSV, and parse dates on it, even though it's not in the default date format, on the column called Month. Actually, before I run that: I haven't run it yet, so we still have the old data, and I'm going to show what the index is. The index on this data is shown in this left-hand column. It's not part of the data; it's something the data frame added to the CSV representation, and currently it's just a range of integers that starts at 0 and goes up to 144, rather than starting at 1.
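A minimal sketch of that loading step, assuming the classic AirPassengers.csv with columns Month and #Passengers (the file and column names are assumptions, and date_parser is deprecated in newer pandas in favor of date_format):

```python
import pandas as pd
from datetime import datetime

# Parse strings like "1949-01" into datetime objects.
dateparse = lambda d: datetime.strptime(d, '%Y-%m')

# Without an index_col, the frame keeps a plain integer RangeIndex;
# Month is parsed into datetime64 values as it is read.
data = pd.read_csv('AirPassengers.csv',
                   parse_dates=['Month'],
                   date_parser=dateparse)

print(data.index)   # RangeIndex(start=0, stop=144, step=1)
print(data.dtypes)  # Month: datetime64[ns], #Passengers: int64
```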
A
And so now we have a data frame that still has the number of passengers as a column; in fact that's the only column. It knows that this Month is now something called the index, and the index is this other object that starts at a 1949 date. When it read in this Month, where before we saw the text was just 1949-01, it made it January 1st, I'm sorry, it made it the first day of that month, and made it a datetime object. And yeah, here it's a datetime64 object, and there are 144 of them.
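A minimal sketch of that re-indexing step, with the same assumed names:

```python
# Make the parsed Month column the index (an in-place mutation,
# a pandas habit discussed later in the talk).
data.set_index('Month', inplace=True)

print(data.columns)  # ['#Passengers'] is now the only column
print(data.index)    # DatetimeIndex(['1949-01-01', ...], length=144)

# Pull the single column out as an indexed Series for what follows.
ts = data['#Passengers']
```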
A
We see that a lot in R too: the reason we're doing a certain transformation is because that's what's expected by some of the functions we're going to use, and these functions all expect an indexed series rather than a data frame itself. So we'd either have to keep doing those transformations as we pass the data around, or, now, we just have this ts series, and we can also look at ts.index, right. It's a Series object, which really is just pandas underneath.
A
So now we can see what's in there. I imported matplotlib, and a lot of times if you go through tutorials you'll see matplotlib.pyplot imported as plt.
A
This is pylab, and it is just a wrapper around matplotlib.pyplot that lets you show plots inline. So you may see in other tutorials something like %matplotlib inline, or whatever it is; in Python notebook speak this is called a magic. Jupyter has its own little mini language where you can tell it to turn on certain features, and so you can tell Jupyter notebooks to show matplotlib plots inline; or, if you import it with this wrapper around it, then that's what it does.
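The two inline-plotting setups being contrasted, as a notebook-cell sketch (the % lines are Jupyter magic syntax, not plain Python):

```python
# Notebook cell: the magic turns on inline rendering of plots,
# alongside the conventional pyplot import.
%matplotlib inline
import matplotlib.pyplot as plt

# Alternative seen in some tutorials: one magic that imports the
# matplotlib and numpy namespaces and enables inline display.
# %pylab inline
```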
A
If this date column were a column rather than the index, then the access to it would look completely different. So that's something else that came from R, and that I personally always have to look up when I'm trying to slice out portions of pandas data frames, either to get certain columns or to get certain rows. That's something I don't think we need to replicate in Clojure: Clojure has built-in syntax for accessing pieces of larger collections that hopefully we can leverage.
D
Yeah, okay, because I'm looking at this chart, and the x-axis and the y-axis.
D
And
I'm
wondering
what
makes
this
different
from
any
other
chart
which
has
numbers
on
some
numbers
on
the
x-axis
and
some
numbers
on
the
y-axis
and.
A
Yeah, and we gave it some, yes; it recognizes those things built in. So if you look at ts.loc, the actual values are only 204, 188, 235. They have no meaning by themselves, but we've assigned meaning by giving it this index, and it knows, because we told it, that this index is a datetime object. So yeah, there's a lot of assumptions built in, especially in this.
B
Yeah, also, I think right now it's just the time on the x-axis that is the only difference, but I think another big difference with time series is the aggregation rules: you're only allowed to aggregate based off of the rows preceding, which we haven't quite gotten to yet, I think.
A
Sure, and if you're doing some kind of signal analysis, then you probably don't use the term seasonality. But the fact that we call this trend of seeing the same shape over and over "seasonality", which has a time connotation, right. So time is even built into what we call this type of cyclic occurrence.
A
So this is one of the seasons, and it looks like people travel more toward the summer in 1954, and they travel less toward the end of the year, with a little peak around the holidays. We can explain those based on our knowledge as subject matter experts in the culture of what happens over the course of the year. And again, we didn't have to tell plot much at all; it inferred a lot just based on this and gave us correct axes, you know, with some assumptions built in.
B
One random quick question. I know you didn't write the pandas API, but do you have any intuition about why you use .loc to do the time slicing, as opposed to just the indices directly? Like, why do you do ts.loc with brackets rather than just ts with brackets?
A
I think it's because Wes McKinney (I'll talk about him a little bit in a minute) was appealing to R users, and the R access for these things has its own specific idiom, again weird and particular to this type of object in R.
A
If you're going to slice it a certain way, then you use certain idioms, and the Python idioms kind of reflect the R way of doing it. So yes, I always get it wrong: I never know when to use loc versus iloc versus a truth-value expression inside the brackets. I always have to go look it up.
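A sketch of the three access styles being contrasted, assuming the ts series from above:

```python
# Label-based: with a DatetimeIndex, date strings act as labels,
# and .loc slices are inclusive of both endpoints.
ts.loc['1949-01-01']          # a single value (112)
ts.loc['1949-01':'1949-06']   # the first six months

# Position-based: integer positions, end-exclusive like Python lists.
ts.iloc[0:6]                  # also the first six months

# Boolean mask: a truth-value expression inside the brackets.
ts[ts > 300]                  # only the months above 300 passengers
```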
A
Here is just another transformation: we're going to take the log. This will take the log of every value in the pandas series, and the shape looks very similar, but it's actually been smoothed out a little bit at the peaks. Let me...
A
So we can get both of these on the screen at the same time. This is the log transformation of this, and you can see that the severity of the distance between this peak and this peak has been smoothed out. This distance here in 1960 is much bigger than the distance here in 1949, and here, in the log, that's been smoothed out. That's another transformation you'll see a lot in data science: transformations that try to smooth out extremes in a reversible way.
A
A lot of the functions just work better; they're more efficient because they don't have to jump around as much, or there are some other assumptions. Again, the subject matter experts, the mathematicians in this area, have decided that they want to smooth out extremes for whatever reason, and so that's just a common transformation. So now we have this ts_log series, because all we did was apply the log to every one of those numbers. So now, what does stationarity mean?
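The log step as code; np.log works elementwise on a pandas Series:

```python
import numpy as np

# Natural log of every value; the result is a new Series with the
# same DatetimeIndex, just on a compressed scale.
ts_log = np.log(ts)
ts_log.plot()
```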
A
That's what Wes McKinney did: the data he was dealing with in Python at AQR was time series data. It was financial data, so there's a lot of time series functionality built into the concept of the pandas data frame, and this will be one example of that; there's just a built-in function for this.
A
This series knows about rolling, and so I give it a period for how far to look back, and I can ask it for the mean. So I'll take the mean of the previous 12 occurrences, and now we can plot all of those.
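A sketch of that rolling computation (the 12-month window matches the talk; plt assumes the earlier matplotlib import, and the colors are assumptions matching the plot described later):

```python
# Each point becomes a statistic over the previous 12 observations;
# the leading window's worth of points has no full window yet.
rolling_mean = ts_log.rolling(12).mean()
rolling_std = ts_log.rolling(12).std()

plt.plot(ts_log, color='blue', label='log series')
plt.plot(rolling_mean, color='red', label='rolling mean')
plt.plot(rolling_std, color='black', label='rolling std')
plt.legend(loc='best')
```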
A
The standard deviation is so far away that it kind of messes up the plot, so I'm just going to comment that out. So now this is the standard deviation... that, of course, is not the mean. I'm sorry, this is the mean; let me fix that up. In Jupyter notebooks you can have code paragraphs or markdown paragraphs, so this one is obviously markdown.
A
We could do the same thing with the ts.
C
Because some processes tend to grow in a way that is proportional to their size. So if my...
C
...budget is bigger, I may tend to earn more, for example, if I'm that kind of business. Some economic processes have this property: the bigger they are, the faster they tend to grow, and then the curve would not be linear, as we see in this plot. But also the variance of increments, the variance of changes, will grow as the process grows. Typically, many economic processes tend to have this property, and after the log transformation they sometimes become more linear and the variances become more stationary, if that makes sense.
A
And we can see that here. If I wanted to write a function, maybe a fourth or fifth degree polynomial that had this many turns in it, it's hard to see that I could write one polynomial that described both what was going on here and what was going on here. But after the log transformation, it's much easier to see that the same equation could potentially govern here.
A
You know, govern 1955 and govern 1959, right. So I could write an equation that could approximate the seasonality for any given season, and that's much clearer to see when we take the log; we wouldn't see that without the log transformation. As Daniel said, part of what we're seeing here is just the natural growth, the economic growth of the airline industry. We're seeing that it really is the same equation; it's just scaled over time.
D
Okay, I think that kind of answered a question that I had, where I wasn't seeing the purpose of this. Is it just to make the graph, the visualization, fit on a chart more easily, or was there an actual purpose?
D
And I think, Tim, you answered my question there: yes, it is to fit it more easily, but it also allows you to discern some patterns that may not be easily visible when you have a huge difference in the scale. In this example it may not be too bad, right, the normal one.
D
Also,
you
can
discern
a
pattern,
but
if
the
scale
was
much
bigger
right
between
the
mints
and
the
max,
then
this
you
know
figuring
out
a
pattern
is,
is
a
little
bit
more
difficult,
and
this
seems
to
allow
that
you
know
a
way
to
understand.
Okay,
they
seem
similarity,
even
though
the
values
are
different.
The
pattern
is
still
the
same.
A
Oh, they tried this and this and this and all of these transformations, and this is the one that made sense. That's really the art of this: no matter how much we try to automate things, some people, with their set of experiences, are just better at figuring out the transformations that are going to reveal the pattern. And fortunately, like Daniel said, log transformations are based on the natural log, and the natural log's base e governs the growth of many, many natural processes, including in economics.
A
So more people came, but the fact that this many people were traveling in 1954 actually makes the number of people traveling in 1960 bigger, right. And it's the same thing with rabbits: if rabbits are having babies down here, then those same rabbits are having babies up here, and their babies are having babies, so they grow exponentially. It turns out that's governed by e, and we can undo the growth that e does by taking the natural log.
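A tiny numeric illustration of that point (the starting value and growth rate are made up):

```python
import numpy as np

t = np.arange(12)
growth = 100.0 * np.exp(0.05 * t)  # exponential growth, 5% per step

# The natural log undoes e: log(growth) = log(100) + 0.05 * t,
# a straight line, so the increments after the log are constant.
print(np.allclose(np.diff(np.log(growth)), 0.05))  # True
```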
D
So Tim, if I understand correctly, what you just said is: for rabbit airlines, where you have rabbits which follow a natural order, this kind of makes sense, but for a people airline, with access to contraception and other measures, this may not make sense, because it doesn't follow nature-driven rules.
A
Other kinds of growth would not do this. But again, that's part of the art and experience that data scientists bring to this. In web data, a lot of times we'll see a gamma distribution, which is like people waiting at a bus stop: a gamma distribution has bigger mass over on the left and tends to fall off.
A
So data scientists spend a lot of time on all of those things, and the thing for us is that's why plotting is really important. In real life, coming to data that you don't understand, you have to try to determine these patterns, and the reason you're looking for a particular distribution is because the tools that you try to use later to predict are predicated on one of these distributions, because that's what the mathematicians who did that work studied. They said...
A
...if you can get this into a gamma distribution, then I can say pretty certainly that this is going to happen. And it's the same thing with stationarity: they're going to say, if you can get this time series to a stationary situation, then you can make these predictions about it with certain confidence values.
C
So that is a step on the way, but we still do not have a complete model of how these data were generated. We could go further and model them more completely, but these are a few steps on the way, if that makes sense, and past models of typical data of this kind help us reason about what kind of transformation would be useful. We all know that people tend to say the pandemic is exponential in certain phases of it.
A
I was going to say: do you want me to wrap up, or hand over to James, or continue until the hour?
B
I'm fine. I want to see the dramatic conclusion of this. We don't have stationarity; are we screwed, what's going to happen, right?
A
A naive thing we could do: we have this trend, and we have these ups and downs around this trend, so we can just subtract. In pandas, subtracting one series from another series just gives you the result as another series. So I can say, for every point here, subtract the mean, and that's going to squish everything down. Let's see what that does.
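That naive detrending as a sketch, continuing the names used above (the variable name is an assumption):

```python
# pandas aligns the two Series on their shared index and subtracts
# elementwise, so every point loses its local 12-month mean.
ts_log_ma_diff = ts_log - rolling_mean

# The leading window's worth of points has no rolling mean yet,
# so those values come out as NaN.
print(ts_log_ma_diff.head(12))
```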
A
And let's see what it actually is. It's just numbers, except now they're squished around this line, and we're missing some. You can see in the red that we're missing some down here, because for the first 12 months we couldn't do anything, and that shows up as not-a-number. In other languages that could be called missing, or it could be represented by a dot, but in pandas it's represented by this...
A
...this NaN thing, not a number. We'll actually drop those later. So now we can plot our moving average diff, the result of that subtraction, and now we look a lot better, right, if we have this goal of getting the mean and variance to remain constant over time. Note that we've zoomed in on the plot.
A
Here's zero, and we're only going up to 0.3 and down to negative 0.2, and we can see that the mean and variance now look much more like they remain constant over time. But stationarity, that idea of remaining constant over time, is actually a technical term that has technical definitions. We can't just look and say, well, that does look better, it looks more stationary, right, just as a value judgment we might bring.
A
So we'll look at the statsmodels Python library. There's a time series analysis package called tsa, statsmodels.tsa. It has a bunch of essentially fancier ways to do what we did with this subtraction. That's why I called it naive: we just took this rolling mean and subtracted it out.
A
Two people, I'm assuming, two PhDs. And if you really want to know what that means, it has to do with the presence of a unit root, and it follows this equation and all of this stuff. And then you have the advanced... where's my link... sorry, the augmented Dickey-Fuller test, which does even more stuff.
A
But that is wrapped up for us in this library, and we can see what it does. We imported the adfuller method from this library, and now we can apply it to our moving average diff. So we're going to see: is this stationary? And we're passing it a parameter. When you hear data science people talk about tuning hyperparameters, this is what they mean: there are actually a bunch of default values for this adfuller test.
A
There are a bunch of different autolag parameters, and you could test over all of those; we're just going to look at one. And it blows up, because we still have these null values, places where there are no values. So you'll see a cleanup called dropna, in R and other languages too. In any situation, dealing with missing values is a big deal: if somebody does not answer a question on a survey, you have to decide what to do with that. Do you drop it? Do you make it a zero?
A
Do you average all the other answers and give it that? Cleaning up data is a really big deal, but in time series you will often see: just drop it, we'll fill it in later, we don't care. So dropna is another built-in time series thing. That's something else we should consider.
A
What kinds of functions should work on these time series things, what should be automatically available? And, as it is everywhere in Python and pandas, you never know whether you are mutating something in place or getting a new one.
A
So yeah, that's just one of the frustrations of Python after coming to Clojure and then back to Python: you're always kind of guessing, or you have to read the docs or do experiments to see whether you're mutating or not. In this one we're going to mutate in place, so we're not going to give it a new name; we're going to do that in place.
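A sketch of the cleanup plus the test (adfuller is the statsmodels function named above; the autolag value is one of its standard options):

```python
from statsmodels.tsa.stattools import adfuller

# Drop the leading NaNs in place; adfuller rejects missing values.
ts_log_ma_diff.dropna(inplace=True)

# Augmented Dickey-Fuller test; autolag is one of the tunable
# (hyper)parameters that otherwise take library defaults.
adf_stat, p_value, used_lags, n_obs, crit_values, icbest = adfuller(
    ts_log_ma_diff, autolag='AIC')

print('ADF statistic:', adf_stat)
print('p-value:', p_value)  # low p-value: reject the "not stationary" null
```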
A
So I can just do that, and this is just a nicer printing of it. And Daniel, you can definitely pick apart my explanation of p-values as a non-statistician, but here's my explanation of p-values: the null hypothesis, what we're trying to disprove, is that this is not stationary. Is it fair to say, nope, that's wrong, this is stationary? That's what the p-value tells us.
A
If it's low, and a lot of times we'll use an arbitrary threshold of 0.05, we'll say: yes, it is fair for me to say that this is stationary, even though there's actually a chance that just randomly, or through some other thing I don't understand, this is non-stationary. I'm going to say, yeah, this is good enough; this test is good enough for me to reject the null hypothesis.
A
So now let's look at some others. This is just going to put together a few of those things: we're going to pass in any time series as a function argument, calculate the rolling mean and standard deviation, and plot them all together, so that when we look at...
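A sketch of such a helper, combining the pieces shown so far; the function name is an assumption, though this pattern is common in pandas time series tutorials:

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

def test_stationarity(series, window=12):
    """Plot rolling statistics and print augmented Dickey-Fuller results."""
    rolling_mean = series.rolling(window).mean()
    rolling_std = series.rolling(window).std()

    plt.plot(series, color='blue', label='original')
    plt.plot(rolling_mean, color='red', label='rolling mean')
    plt.plot(rolling_std, color='black', label='rolling std')
    plt.legend(loc='best')
    plt.title('Rolling mean and standard deviation')
    plt.show()

    result = adfuller(series.dropna(), autolag='AIC')
    print('ADF statistic:', result[0])
    print('p-value:', result[1])
    print('critical values:', result[4])
```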
A
Right, and whether that's a real observation or not, whether those are related, would be something to look for, because they sure look opposite here. But then when we look here, here's another place where the mean gets lower again, and it doesn't seem like we have that.
A
So I think maybe you could conclude that's random, but there are other tests you could do to say: okay, I have this thing and this thing, are they correlated? Just looking at this part here, I might say they're negatively correlated, right: when this goes down, this goes up. So you can run a test between these two series, look at the r-squared value, and say, yeah...
A
...these are correlated or not correlated. And same thing, you'd get a p-value, right, and if it's low enough you may say, oh, I'm comfortable with saying these are negatively correlated. But just looking at the plot, I think we might say no; even though they look like it here, overall I'm not going to say these are negatively correlated, because look here: this guy goes down and down and down, and this guy still stays about the same. So I'm going to say this little bit is interesting, but...
A
Right, which is why it couldn't start until here: the average of all 12 of these data points is, you know, 0.15 or whatever, the average of the previous 12, so we can't start it until there. And that's why for Dickey-Fuller we had to drop those, because Dickey-Fuller did not like all those NAs. We ended up having to drop those out of our series in order to get these Dickey-Fuller results.
A
This is not stationary, right. This is just a different way of saying what we observe: this trend is not stationary. But when we look at the same one we did before, the moving average diff, then we're going to say: oh, that's more likely to be stationary. And of course I called those naive ways of subtracting things out; there are more sophisticated ones, because seasonality is such an important part of this kind of time series.
A
And the decomposition is again just time series; each piece is just another series, which of course is mostly not-numbers at the beginning and the end. But let's go look at the plot; this is just going to plot all of these seasonality results together. So here's our original, and here's the trend. They probably did not use a rolling 12 months; they did something fancier, I don't know what they did. Now we're back to...
A
With that log transformation, they were able to come up with a consistent seasonality function. Again, I don't know what it is, they didn't tell me, but here's what they did: they took the seasonality out of that data and came up with a consistent function that governs it, and then they also subtracted all of those things out for us. They subtracted out the trend, they subtracted out the seasonality, and they gave us the differences, the residuals, and now we can test the stationarity of that.
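A sketch of that decomposition step with statsmodels, reusing the test_stationarity helper sketched earlier; passing period=12 explicitly is an assumption, since a monthly DatetimeIndex can also let it be inferred:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(ts_log, period=12)

# Three series come back: the trend, the repeating seasonal shape,
# and the residuals, which are NaN at both ends.
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Test whether what is left over is stationary.
test_stationarity(residual.dropna())
```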
A
Okay, so yeah, this is way zoomed in, right. This one goes from negative 0.2 to 0.3; this one goes from negative 0.1 to 0.005, so this is zoomed in by a factor of ten at least. So even though this doesn't look like a straight line at this granularity, if we zoomed out it would look like a straight line, and we can see on Dickey-Fuller that this is now two times ten to the negative eight.
A
We did it manually, yeah, by this subtraction up here, or wherever we did it; we just subtracted the values. They did the same thing, except it's built into that library: when we get this decomposition object, inside it there's a series called trend, a series called seasonal, and a series called resid, and so we just plotted those.
D
I like this graph, the one with the blue and the red and the black lines, and... tell me if my interpretation is correct, because I think this is part of the learning, right. The blue lines are the original values, not the log values, right?
A
We can see that there really is a single equation that governs the seasonality over this time period, and they were able to subtract that out, which means that now we can make some predictions. We can see the other, fancier methods; we can introspect in Python by doing a dir(), and we can see some fancier ways. Somebody got their PhD for each of these things, and we can read about these processes and see fancier ways of either de-trending, or of taking a stationary set of data and making predictions on it.
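The introspection mentioned here, for completeness:

```python
# List everything the decomposition object exposes: besides trend,
# seasonal, and resid, you can see the rest of its attributes.
print(dir(decomposition))
```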
A
So that's really why we did all that data transformation: the data scientist is now going to be able to say, okay, with this assumption of stationarity that I can now be confident in, I can make predictions into the future and give you a confidence range. I'm pretty sure this is what's going to happen in March of 1962.
D
So the trend here is... I mean, the red line is the mean of the values, and what the standard deviation is showing, with its fluctuations, is how much variation there is between the min and the max, right, off the...
A
So we're not looking at those for fun; we're looking at them because those people said we need to look at mean and variance, and we're using standard deviation as our variance so we can put it on the plot and it makes sense. That's why we're focusing on those: the subject matter experts gave us those measures as something to be interested in.
B
That was excellent, a very good explanation, too, for those of us who are just starting to pick some of this work up, because I never would have guessed stationarity, for instance. I'd be applying things on data where it didn't make sense or anything.
A
Yeah, for me, I treat it like a business problem with its own subject matter experts, and I treat those people as having knowledge that I don't have, even though I might be somewhat familiar with some things. I may understand terminology from a mathematical perspective, but that doesn't mean I understand the context of their jargon and why they're doing certain things. And that was why I started getting into this in the first place: in big data stuff, in, you know, 2012, 2013, people were asking me to make...
A
Well, the PhD work somebody else did, that we're going to leverage, expects the data in this format, and there are probably really, really good reasons for that, but I don't need to understand them. I just need to understand that my subject matter experts were not crazy, that they were building on the people who came before them and not reinventing things, and that the format this data needs is this way. That's why they were asking me to make new tables with literally thousands of columns.
D
So Tim, I think you gave a very good example of where an application-specific information model is needed, based on the core data, going back to the ontology and information model and all that I was mentioning earlier, right.
D
You are taking the same data that was there in a fully normalized form, with the star schema and all that, and then you are creating a denormalized view of it. Now, whether you put those extra columns in the same place, or you put them into a different data store, like a derived data store, and provide that to your statisticians, that is an information model you're serving for the purpose.
A
Right, and my interest in it now is... so back then we had certain massively parallel database systems, and if you didn't want to do Hadoop and write MapReduce yourself, then you used one of those other systems, which essentially meant that in order to take these models that had been built on very small subsets and apply them to a bigger set in a parallel fashion...
A
...I had to rewrite them in SQL, which some models can do and some models can't. You kind of have to learn that subject area to know which of those is possible, which of them is feasible, and to estimate projects: how long it's going to take and whether that's worth the investment in order to be able to run this model on live data in 10 seconds. And sometimes it's worth a really, really big investment. Like at an automotive search company I worked at: if I pop this ad randomly in front of just random site users, only 0.01 percent of people are going to click on it.
A
But if I pre-qualify these people based on their search history, you know, the history I've seen on my site, they're searching for red sedans or whatever, based on five elements of their search history, I can pop this ad and now there's a 0.001 percent chance they're going to click on it. You know, I made it like a million times more likely, out of all these unlikely things, that they're going to click on this ad.
A
Then it's worth the investment of taking weeks or months, at least, to rewrite a model that this data scientist, the subject matter expert, came up with in R or SAS, rewriting it in this massively parallel system just to get that result. Well, now we don't have to rewrite that stuff. We can kind of converge on tools, and that's what we're talking about: how can we get these subject matter experts to use a tool set that we can then apply distributed computing to, that we can apply to a bunch of new situations?
A
We can take the work they've done and much more easily leverage the technology than we could seven or eight years ago.
A
So, you know, I think that's one thing we're doing, and there's the work that Chris and Chris and James have been doing with, you know, libpython, and now there's libjulia. So data scientists... the new cool kids, I don't know if you all know, the cool kids now use Julia; they moved beyond Python, they use Julia. Why is it better?
C
Yeah, so maybe let us think for a moment; we are past the end of this session. Maybe a few of you can stay a bit more. Tim, was there anything else to present?
C
In this case we do have all the functions, and so if anybody wants to meet later this week and hack a Tim-like notebook in Clojure, then let us do it, and we can also try to do it offline and talk about it. But yeah, here, in this case, it was an example of something that is all accessible and really enlightening.
D
Today this session was at 7:00 a.m. my time, so I'm just looking at my calendar for tomorrow, and I am good; I can start an hour earlier, if that works for Daniel.
C
Yeah, so let us call it Sci-fu 2.3 and write it in the stream, and maybe more people will join, and we will be focused on just making it happen, I hope. Okay, so what about today? Today we will have a session in three hours, right, a little less than three hours, and then we will have a demo by James about sequential things happening in the tech.ml.dataset platform.
C
And after that, maybe we can make some plan about what we're building. And if you have time before the next session, then maybe some of you could look for data sets, for some problems that we would like to be able to solve. These could be notebooks on Kaggle that you find clear and enlightening, something that would be helpful to have in our implementation too, or maybe some data problem that you find interesting, and then we will have this collection of things that we need to solve.
C
Does it make sense? Maybe someone has time to search for some data sets, and I will try to prepare some short demo, just something short so that we can see the ergonomics. And I think one thing we didn't discuss about Tim's demo is what an index is.