From YouTube: Education & Workforce WG: Case study: Applied Data Science & ML Capstone for externally-funded project
Description
August 2021 South Big Data Hub Education & Workforce Working Group:
Case study: Applied Data Science & Machine Learning Capstone designed for externally-funded projects; Dr. Arko Barman; Rice University

So today, I'm going to talk about a data science capstone program in which students work on externally funded projects. First, I'll start off by describing our center a little bit. We lovingly call it the D2K Lab, because its full name is a bit of a mouthful.

We provide a lot of interdisciplinary experiences for students, and the focus is immersive, or experiential, learning. Our flagship course is the D2K capstone course, which I'm going to talk about in detail. But we also have a very interesting consulting course, which we call the consulting clinic. We also have courses on intro to data science, and we're building more data science courses, like intro to deep learning, NLP, and other areas.

We also host a number of programs and events throughout the year, like hackathons, workshops, seminars, and so on. We have partners and affiliates who work and collaborate with us. They include researchers from inside Rice and from the Texas Medical Center, which is a really big center for hospitals and medical research in Houston. We also work with a lot of government and non-profit organizations, like the City of Houston, the Houston Police Department, and so on, and we also get a lot of projects from industry.

So let me move on to our capstone program. The capstone program was inspired by other capstone programs in engineering and computer science, where there are client-facing projects and students work directly with clients to solve real-world problems.

In our capstone program, students are generally split into teams of four to six, usually for a semester, and sometimes for year-long projects, that is, a long project split across two semesters.

All our projects are client-sponsored and use real-world data, and we encourage the projects to be open-ended. All our teams are integrated both vertically and horizontally. Vertically, that means we have undergrads, master's students, and PhD students all enrolled in this program; horizontally, we have students from multiple disciplines across the campus.

In fact, this capstone course is a requirement for some degree programs at Rice and an elective for others. For example, it is a requirement for the data science minor, and it's an elective in computer science, statistics, ECE, applied math, physics, and a bunch of other majors.

The challenge for this program is the client-facing aspect. As I said, the teams work with actual clients outside the D2K Lab itself, which very often entails working with data owned by the client.

What we provide is a template sponsored research agreement, which needs to be signed by both the client and us. It includes details on how data will be transferred, how data will be used, and so on.

Non-confidential data, such as publicly available data or data with fewer restrictions, is shared directly with Rice. Confidential data is made available to students only in a controlled environment: for example, a server from which they cannot remove the data, or a machine on AWS or Azure from which they cannot move the data.

The curriculum is basically the development of the project by the students. We provide some lectures, materials, and guidance on what we call a data science pipeline: how to design one and what technologies they might use to build it. We also discuss ethics with them, including data privacy, fairness, and other ethical concerns in data science.

We have three rounds of presentations and reports due, and we also check the students' software twice a year and again for the final. We try to ensure that they follow best practices in software development, so that the code is already at a professional level and they can hand it off directly to their clients, who will have no problem using the software.

Just quickly, an overview of how we grade. We place some emphasis on individual grading, because this is a capstone program and students work in teams, which makes individual grading difficult. So what we encourage is peer evaluations.

The instructors also evaluate the contributions of the students, and sometimes, if we have questions, we ask the team to write a one-page memo describing the contributions of every student. That's how we grade them, based on individual contributions and class participation, which includes discussions in the classroom and participation in the small team meetings that we have with the instructors.

But the bulk of the grading is the team products. We have three rounds of presentations and reports, plus software checks, and the interim evaluations constitute 15% of that. The bulk of the grading goes into the final report, with some parts going into the presentation and the software.

I'm going to present the results for 2019-2020, because 2020-2021 is not yet over, since we are teaching this course over the summer as well. As you can see, we have decent demographic diversity.

The male-to-female ratio is not really one to one, but we're trying to get there. And, as you can see, although the course is offered primarily for undergraduates in the data science minor, a lot of graduate students also enroll, and we have good diversity in terms of departments. A major part of the student body consists of statistics, CS, and ECE students.

For this academic year, there were 15 teams and a total of 76 students. We actually create the teams by hand: all the instructors sit down and build the teams, making sure that there are men and women on every team. Every team includes both undergrad and grad students, and we try to have at least two majors represented on each team.
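The hand-built team rules just described can be written down as a simple checker that flags a proposed team breaking any of them. This is a minimal sketch, not the D2K Lab's actual tooling (the talk says teams are formed by hand); the `Student` record, the `team_violations` function, and the "M"/"F" and "undergrad"/"grad" encodings are hypothetical. The 4-6 size bound comes from the team sizes mentioned earlier in the talk, and the two-majors rule is treated as a check even though the talk describes it as something the instructors try for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    # Hypothetical record; field encodings are illustrative assumptions.
    name: str
    gender: str  # "M" or "F"
    level: str   # "undergrad" or "grad"
    major: str


def team_violations(team, min_size=4, max_size=6):
    """Return human-readable rule violations for one team (empty list if none)."""
    problems = []
    # Teams of four to six students.
    if not (min_size <= len(team) <= max_size):
        problems.append(f"size {len(team)} outside {min_size}-{max_size}")
    # Men and women on every team.
    if not {"M", "F"} <= {s.gender for s in team}:
        problems.append("team does not include both men and women")
    # Both undergrad and grad students on every team.
    if not {"undergrad", "grad"} <= {s.level for s in team}:
        problems.append("team does not mix undergrad and grad students")
    # Aim for at least two majors per team.
    if len({s.major for s in team}) < 2:
        problems.append("fewer than two majors represented")
    return problems


if __name__ == "__main__":
    team = [
        Student("A", "F", "undergrad", "Statistics"),
        Student("B", "M", "grad", "CS"),
        Student("C", "M", "undergrad", "ECE"),
        Student("D", "F", "grad", "CS"),
    ]
    print(team_violations(team))  # → []
```

Returning a list of violations rather than a single boolean lets instructors see every reason a draft team fails and decide which rules to relax by hand.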

Looking at the feedback, 54% of students said they had an outstanding experience and 32% said the experience was good. A lot of students, 87%, said that the course was extremely challenging, and 89% said that they would be more confident tackling a large-scale project after taking this course. So we believe that students are satisfied and that we're giving them a good level of training to go out into the real world and solve real-world data science problems.

As for the diversity of the teams, as you can see, we have a fair balance: industry teams working with energy companies, finance companies, and tech companies, and some teams working on projects from the Texas Medical Center, as I mentioned. There are also teams working for not-for-profit organizations and even for local government agencies. And, as you can see in the data science area column, there is a wide variety of projects that we get.

The sponsors are mostly satisfied, but sometimes they're not. So it's a fine balance between trying to satisfy sponsors, ensuring that the learning outcomes are achieved, ensuring student satisfaction, and so on. It's a very fine balance to get all of these working together.