Description
Marcos Alcozer, Analytics Engineer at Alcozer Consulting LLC, presented on how K-12 education leverages analytics engineering using Dagster.
🌟 Socials 🌟
Check out (and star!) our GitHub ➡️ https://github.com/dagster-io/dagster
Check out our Documentation ➡️ https://docs.dagster.io/
Join our Slack ➡️ http://dagster-slackin.herokuapp.com
Visit our Website ➡️ https://dagster.io/
Follow us on Twitter ➡️ https://twitter.com/dagsterio
The responsibilities of an analytics engineer, broadly speaking, also apply to those of us in K-12 education. However, we do have our own K-12-specific goals. We do this work because our goal is to provide a complete view of a student. There are many things we want to know about students to ensure that they are on a path to success.
We want to know who they are. We want to know their grade level, their race, their ethnicity, and whether they have a learning disability, and we want to know things like their attendance and their grades. So let's look at the challenge we face with data in K-12: our data is disparate, spread across many different systems, and that may sound familiar and similar to your own environment. Student data, attendance, grades, assessments, and so on are housed in various systems, and unfortunately, Fivetran and Stitch just don't have connectors for us.
Sometimes we have to log into a vendor's system and download an extract of our student data, and sometimes we don't get anything at all and have to engage in conversations with that vendor to see what can be put on their roadmap. Now, once we get the data out, similar data can be represented very differently. In the left column we have the student last name. We might call that "student last name", but other vendors may call it "last name".
Some may call it "family name", and others "surname". So that's just a look at how disparate the data can be in terms of location, access, and what it actually looks like. At the end of the day, people are our customers: our educators, our school leaders, our students, our families. They just want the insight for their specific education organization: think school district, think state, think charter management organization.
They just want to know: what is their student attrition? How many students have left since the start of the school year? How many students are chronically absent, and who are they? Which students have a learning disability, or an IEP as we call it? And when you look at the data, how can you cut it by different dimensions, looking at things like attendance by race and ethnicity, or by gender, et cetera? So let's look at how we're solving that in K-12.
So the solution is being built through a community led by the Ed-Fi Alliance. It's a 501(c)(3) nonprofit, and the Ed-Fi Alliance has committed to publishing all of its technology open source under an Apache license.
The community works on a set of rules for collecting and organizing student data, and those rules manifest themselves in the Ed-Fi data standard. The Ed-Fi data standard is a set of RESTful APIs, so the screenshot on the right may look familiar to you: it's just a Swagger document. But the power is that all of the endpoints have been decided by the community, and then we can go to the vendors and say, this is how we collect student data.
We use these endpoints, the endpoints have these payloads, and they use this nomenclature. It sets the tone for the conversation, so we are all talking similarly and our conversations themselves can be interoperable.
So let's look at this diagram; here is where the collection of disparate operational data gets really cool. Through either state mandates, philanthropic funding, or just good intentions, the vendors create an Ed-Fi API client that takes their operational data and sends it back to customers via the customer's Ed-Fi API. Said another way, it is the responsibility of the vendor to submit data back to the education organization.
The education organization does not pull data from the source systems; they stand up their own Ed-Fi API that follows the data standard, and then they receive the data back. We have had a lot of success with states creating mandates. For example, in the states of Texas, Arizona, and Wisconsin, you can only do business in the state if you maintain an Ed-Fi integration.
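As a rough sketch of what that vendor-side responsibility looks like: the client maps its internal records into the Ed-Fi resource shape and POSTs them to the district's API. The `lastSurname` field and the `/data/v3/ed-fi/` resource path follow the Ed-Fi data standard, but treat the internal field names (`sis_id`, etc.) and the base URL here as illustrative assumptions, not the talk's actual code.

```python
# Hypothetical vendor-side mapping to the Ed-Fi /ed-fi/students resource.
# Internal field names (sis_id, first_name, ...) are made up for illustration.

def to_edfi_student(record: dict) -> dict:
    """Map a vendor's internal student record to an Ed-Fi students payload."""
    return {
        "studentUniqueId": record["sis_id"],
        "firstName": record["first_name"],
        "lastSurname": record["last_name"],  # Ed-Fi's name for "last name"
        "birthDate": record["dob"],          # ISO 8601 date string
    }


def edfi_resource_url(base_url: str, resource: str) -> str:
    """Build the resource URL on the education organization's Ed-Fi API."""
    return f"{base_url.rstrip('/')}/data/v3/ed-fi/{resource}"


# A real client would obtain an OAuth2 token (client-credentials grant)
# and POST the payload, e.g. with requests:
#   requests.post(edfi_resource_url(base, "students"),
#                 json=payload, headers={"Authorization": f"Bearer {token}"})
```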
So Ed-Fi eases the process for education organizations to receive their operational data via a common data standard and model. This solves the operational piece but does not touch on analytics, and that is where Dagster comes in, once an education organization has their disparate operational data in a common operational data store, or ODS.
So here's that Swagger document I shared, and the Dagster graph today looks a little something like this. It's your traditional ELT job, where you've got this central extract and load, but I've also got to run some ops at the top, because the Ed-Fi API has this idea of change versions, which allows me to pull deltas. I can pull all data for an education organization on the first run, and then on subsequent runs I can pull just what has been changed or deleted.
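The first-run-versus-delta decision can be sketched as a small helper. Ed-Fi change queries use `minChangeVersion`/`maxChangeVersion` query parameters; the function below is a minimal illustration of that logic, not the talk's actual op code.

```python
# Sketch of change-version logic: a full pull on the first run,
# delta pulls (changed or deleted records only) afterwards.
from typing import Optional


def build_query_params(previous_version: Optional[int], newest_version: int) -> dict:
    """Return Ed-Fi change-query params for a full or delta pull."""
    if previous_version is None:
        # First run: no lower bound, pull everything up to the newest version.
        return {"maxChangeVersion": newest_version}
    # Subsequent runs: only records changed since the last stored version.
    return {
        "minChangeVersion": previous_version + 1,
        "maxChangeVersion": newest_version,
    }
```

An op at the top of the graph would look up `newest_version` from the API's change-version endpoint and `previous_version` from the previous run's stored state.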
So I run those ops first and then I move into the extract and load. This is all one single op today because, as I'm fetching from the API, I'm using a generator to yield results back and upload to the data lake, reducing the memory footprint. It's definitely a to-do on my side to split that out if I can, and I will likely reach out to all of you on Slack for some thought partnership at some point. The final op then runs the Ed-Fi models.
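The generator pattern described here can be sketched as follows; `fetch_pages` stands in for the real paged API client and the upload call is stubbed out, so this is an illustration of the memory-footprint idea rather than the actual op.

```python
# Streaming extract-and-load: yield one API page at a time and upload
# each page as it arrives, so only one page is ever held in memory.
from typing import Iterator


def fetch_pages(resource: str, page_size: int = 2) -> Iterator[list]:
    """Stand-in for paged Ed-Fi API calls; yields one page of records at a time."""
    records = [{"id": i} for i in range(5)]  # pretend API data
    for start in range(0, len(records), page_size):
        yield records[start:start + page_size]


def extract_and_load(resource: str) -> int:
    """Stream pages straight to the lake instead of accumulating them."""
    total = 0
    for page in fetch_pages(resource):
        # upload_to_lake(resource, page) would write raw JSON to GCS here.
        total += len(page)
    return total
```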
Now, something that Sandy said earlier in this meeting really caught my ear: being able to mix assets and ops. That's going to be super powerful, because in this example I'm running a bunch of ops to produce an initial asset, which then produces additional assets via dbt tooling. If you haven't checked out software-defined assets, I really recommend it, because this graph is hiding the dbt models, those assets, and this is what software-defined assets allow you to do.
This is what that data lake looks like. I'm using Hive partitioning inside of Google Cloud Storage so I can have my various API resources, and so my data is segmented by school year; if you've used Hive partitioning, you know this equals-sign convention. The Dagster jobs always just extract and load into the data lake: nothing gets deleted, everything gets preserved, and everything is put there as raw JSON.
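The layout described above can be sketched as a path builder; the bucket and file names are illustrative, but the `key=value` segment is the Hive-partitioning convention that BigQuery external tables understand.

```python
# Hive-style partition layout for the GCS data lake: one prefix per
# API resource, with data segmented by school year via key=value.

def lake_path(bucket: str, resource: str, school_year: int, filename: str) -> str:
    """Build a Hive-partitioned object path for one raw JSON file."""
    return f"gs://{bucket}/{resource}/school_year={school_year}/{filename}"
```

For example, `lake_path("edfi-sandbox", "students", 2023, "part-0001.json")` yields `gs://edfi-sandbox/students/school_year=2023/part-0001.json`.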
Then, in BigQuery, I'm able to have tables that are external tables. If I look under Details, it's an external table pointing to my sandbox environment: the bucket that has my raw JSON, using Hive partitioning. So I can build dbt models that use SQL to query this external table, which is really just my portal into my data lake.
From there I can create my various data marts, things like a grades fact table, my dims, and what have you. So that's just a little look at that piece. A few other things I wanted to mention: this work can get pretty complex, so let's talk about community and see how that's taking place.
As a K-12 analytics engineer, I maintain open source repositories that implement everything you've seen here. You can access them at k12analyticsengineering.dev.
Anyone can learn how to do this, do it themselves, and have the community for support; again, it's all free and publicly accessible. One more thing before we move into questions: there are times when a vendor does not build an Ed-Fi API client to send data back to an education organization. There, we leverage Dagster as well.
For example, I maintain an open source repo that extracts data from the Google Forms API and submits that data to the Ed-Fi API via its survey endpoints. That's another part of this: when the work gets even more complex, we look to the community to solve things together and to share those resources so that we can all benefit, because we all have the same goal, which is to bring that data together to provide insights back to our educators, so that they know where and how to help students.
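The kind of reshaping such a bridge performs can be sketched as below. The input follows the general shape of a Google Forms API response, but the output field names are illustrative, not the exact Ed-Fi survey domain schema, and the real repo's code will differ.

```python
# Hypothetical Google Forms -> Ed-Fi bridge step: reshape one form
# response into a survey-response-style payload for the Ed-Fi API.
# Output field names are illustrative, not the exact Ed-Fi schema.

def to_survey_response(form_id: str, response: dict) -> dict:
    """Reshape one Google Forms API response into an Ed-Fi-style payload."""
    return {
        "surveyIdentifier": form_id,
        "surveyResponseIdentifier": response["responseId"],
        "responseDate": response["createTime"][:10],  # keep the date portion
        "answers": [
            {"questionId": qid, "textResponse": ans}
            for qid, ans in response["answers"].items()
        ],
    }
```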