From YouTube: Education & Workforce WG: A software and analytics tool to map graduate data science courses.
Description
Presenter: Somya D. Mohanty
Institution: University of North Carolina – Greensboro

So, just to give you a little bit of background on this project: I was involved, as a faculty member, in developing the curriculum for our newly established informatics and analytics program. This was about two years ago. We wanted to figure out which courses should be included as part of the informatics and analytics program, and also as part of the data science concentration that we have in our department. The question basically became: whom do we know? Who knows a lot of people who offer courses that are similar in concept to data science? That becomes a little tricky.

So we wanted to take a quantitative approach, to see whether we could figure out which courses are related to data science by using data science techniques. That's where this whole project started. Within a particular university, you have course descriptions. So basically, we wanted to develop an automated model that can analyze these course descriptions, especially the words and the sequences of words present in them, with the ultimate goal of answering questions like the following.

Let's say we have five thousand or ten thousand courses in the university. Which topics or domains do these courses fall under? What are the similarities between courses? What are the dissimilarities? And if we aggregate higher, what profiles form for different departments, for their colleges, and then for the university as a whole? Then we can compare some of the underlying gaps within a particular university, and the differences or overlaps between universities.

If we look at UNCG, it has, I would say, about three thousand to four thousand courses: about 1,600 graduate courses and about 2,000 undergraduate courses.

So we have this data, and if you think about it, no human can read through all of these courses and figure out which are data science and which are not. That's where the whole quantitative approach comes in.

This is the distribution of these different courses. UNCG has a different profile than engineering-oriented or science-oriented universities. We place a lot of emphasis on departments such as history, biology, and teacher education; all of those have a very high number of courses and are much more active in the area. You can see the distribution of graduate, undergraduate, and introductory courses in these different departments.

The problem we have is basically the course descriptions themselves. At UNCG we are limited to 30 words for a course description, and if you include the prerequisite and corequisite descriptions in it, that really shortens the information you are able to provide. So we have maybe about 20 to 25 words to figure out the context of a course.

Taking that further, our approach was basically to take all of these course descriptions, do some pre-processing and basic natural language processing, and then feed the processed descriptions through two different models, which are staged. The first extracts the topics out of a course description.
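
The talk does not spell out the pre-processing steps. A minimal sketch of basic NLP cleanup for a short course description could look like the code below. It assumes lowercasing, removal of prerequisite/corequisite clauses (since they eat into the 30-word budget), and a small hypothetical stopword list. It also adds optional bigrams to keep some word-sequence information. The names `preprocess`, `with_bigrams`, and `STOPWORDS`, and the course code in the example, are illustrative and not the project's code.

```python
import re

# Hypothetical stopword list: a small illustrative subset, not the project's list.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "for", "on", "with",
    "this", "that", "is", "are", "be", "or", "as", "by", "course", "students",
}

# Assumption: requisite clauses look like "Prerequisite: ..." up to the next period.
REQUISITE_CLAUSE = re.compile(r"(pre|co)requisites?:[^.]*\.?")


def preprocess(description: str) -> list[str]:
    """Lowercase, drop requisite clauses, tokenize on letters, remove stopwords."""
    text = REQUISITE_CLAUSE.sub(" ", description.lower())
    tokens = re.findall(r"[a-z]+", text)
    # Very short tokens (e.g. leftover course-code letters) carry little topical signal.
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def with_bigrams(tokens: list[str]) -> list[str]:
    """Append adjacent-token bigrams (joined with '_') to keep some word-order information."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


tokens = preprocess("Introduction to regression and machine learning. Prerequisite: STA 271.")
print(tokens)  # ['introduction', 'regression', 'machine', 'learning']
print(with_bigrams(tokens))  # the same tokens plus bigrams such as 'machine_learning'
```

Bigrams are formed after stopword removal here, so they can join words that were not strictly adjacent in the original text. That is a simplification.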

So a course description will be linked to a particular set of topics. We then take those topics and the course descriptions and group them higher, into hierarchies of topics, and that's how we get to a mapping of the course descriptions present at a university.

Here we are using a very well-known technique called topic modeling. Course descriptions are sets of words present in documents. You look at the inherent frequencies of words across different documents to figure out which documents could fall under a particular topic. So we are grouping these courses into different topics and binning them into topic groups.
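
The talk names topic modeling but not the specific model. Latent Dirichlet Allocation (LDA) is the most widely used topic model, so the sketch below assumes it. It is a minimal, dependency-free collapsed Gibbs sampler that turns tokenized descriptions into per-course topic proportions and per-topic word counts. The hyperparameters (`alpha`, `beta`, `iterations`) and the function names are illustrative assumptions, not the project's settings.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] is the estimated share of
    topic k in document d; topic_word[k] maps each word to its count in topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    ndk = [[0] * num_topics for _ in docs]    # tokens of doc d assigned to topic k
    nkw = [defaultdict(int) for _ in topics]  # count of word w in topic k
    nk = [0] * num_topics                     # total tokens in topic k
    z = []                                    # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Collapsed conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    return doc_topic, nkw


def top_words(topic_word, n=5):
    """Highest-count words per topic: the keywords that characterize each topic."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda x: (-x[1], x[0]))[:n] if c > 0]
        for counts in topic_word
    ]
```

For about 2,000 descriptions and 25 topics, a library implementation is the more practical route, such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation`. The sampler above only shows the mechanics. The talk does not say which criterion was used to settle on 25 topics.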

Now, the interesting part here is that each course has a distribution of topics. That means each course can be linked to several topics, and each topic has a set of courses linked to it. So it's a bidirectional distribution that we can look at.

Taking a look at the topics that form out of the graduate courses at UNCG: we evaluated a large number of possible topic counts and concluded that about 25 topics are a good enough measure to distribute all of the courses. For each topic, there is a set of keywords that highlights it. So this is a set of 25 topics learned from about 2,000 graduate course descriptions, and these are the keywords that form those topics. You can see that these topics are fairly distinct on their own, and each of them can be related to the courses present at the university level. So, for example, this course has a dominant topic of number…
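
Given each course's topic proportions, such as the rows a topic model produces, you can read off both directions of the distribution described above, plus a course's dominant topic. This sketch assumes a hypothetical threshold of 0.2 for deciding that a course is "linked" to a topic. The course IDs and proportions are made up for illustration.

```python
def dominant_topic(doc_topic, course):
    """Index of the topic with the largest share in this course's distribution."""
    row = doc_topic[course]
    return max(range(len(row)), key=row.__getitem__)


def topics_for_course(doc_topic, course, threshold=0.2):
    """Course -> topics direction: topics whose share meets the (hypothetical) threshold."""
    return [t for t, share in enumerate(doc_topic[course]) if share >= threshold]


def courses_for_topic(doc_topic, topic, threshold=0.2):
    """Topic -> courses direction: courses whose share of this topic meets the threshold."""
    return [c for c, row in doc_topic.items() if row[topic] >= threshold]


# Hypothetical topic proportions for three courses over three topics.
doc_topic = {
    "C1": [0.70, 0.25, 0.05],
    "C2": [0.10, 0.15, 0.75],
    "C3": [0.45, 0.50, 0.05],
}
print(dominant_topic(doc_topic, "C3"))     # 1
print(topics_for_course(doc_topic, "C1"))  # [0, 1]
print(courses_for_topic(doc_topic, 2))     # ['C2']
```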

So we take this further, to the second stage, where we map this into a network-based graph. The nodes are either courses or topics, and a course is related to a topic with a similarity score. We evaluate how similar a course description is to a topic, and we associate an edge from the course node to the topic node.
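
One way to sketch this second stage is a bipartite graph stored as a weighted adjacency dictionary. The talk does not say how the course-to-topic similarity score is computed. This sketch assumes cosine similarity between a course's word counts and a topic's word weights, with a hypothetical cutoff `min_score` below which no edge is added. The tagged-tuple node names are also an assumption.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_course_topic_graph(courses, topics, min_score=0.1):
    """Bipartite course-topic graph as a weighted adjacency dict.

    courses: {course_id: [tokens]}; topics: {topic_id: {word: weight}}.
    Nodes are tagged tuples ("course", id) / ("topic", id); each edge is stored in
    both directions with the similarity score as its weight. Courses with no edge
    at or above min_score do not appear in the result.
    """
    graph = {}
    for course_id, tokens in courses.items():
        counts = Counter(tokens)
        for topic_id, word_weights in topics.items():
            score = cosine(counts, word_weights)
            if score >= min_score:
                course_node, topic_node = ("course", course_id), ("topic", topic_id)
                graph.setdefault(course_node, {})[topic_node] = score
                graph.setdefault(topic_node, {})[course_node] = score
    return graph


# Hypothetical courses and topic-word weights.
graph = build_course_topic_graph(
    {"C1": ["regression", "data", "model"], "C2": ["history", "war"]},
    {1: {"regression": 0.5, "data": 0.5}, 2: {"history": 0.6, "war": 0.4}},
)
print(round(graph[("course", "C1")][("topic", 1)], 3))  # 0.816
```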

What works out is something similar to this example, where a course might be related to multiple topics, with an edge weight, or similarity score, between them. For example, this is a gray topic cluster that is forming, and here could be a community of courses and topics that forms another group.

This is something we want to analyze, to see what domain areas form from these courses, then from the topics, and then from the higher-level topics that form out of them, which we call super topics.

So, to look at the data: this is what the entire mapping of almost 2,000 courses looks like for UNCG at the graduate level. The small nodes you see here are courses. They are all linked to topics, and of course each course can be linked to several topics. What we are able to do is group these communities of topics and courses together into super clusters, which are identified by the colors you see here.

Just to give you an idea, these are the topics that were formed, and these are the super clusters that were formed from these topic areas.
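
The talk does not name the algorithm used to group courses and topics into super clusters. As a stand-in, the sketch below uses weighted label propagation, a simple community detection method. Louvain modularity optimization, available in NetworkX, is another common choice. The sketch expects a `{node: {neighbor: weight}}` adjacency dictionary with symmetric edges. The function names are illustrative.

```python
from collections import defaultdict


def label_propagation(graph, max_iter=100):
    """Weighted, asynchronous label propagation over {node: {neighbor: weight}}.

    Every node starts with its own label and repeatedly adopts the label with the
    largest total edge weight among its neighbors. Nodes are visited in a fixed
    order and ties are broken deterministically, so results are reproducible.
    """
    order = sorted(graph, key=repr)
    labels = {node: node for node in order}
    for _ in range(max_iter):
        changed = False
        for node in order:
            totals = defaultdict(float)
            for neighbor, weight in graph[node].items():
                totals[labels[neighbor]] += weight
            if not totals:
                continue
            best = max(totals.values())
            candidates = sorted((lab for lab, s in totals.items() if best - s < 1e-12), key=repr)
            # Keep the current label when it is among the best, to reduce oscillation.
            new_label = labels[node] if labels[node] in candidates else candidates[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def super_clusters(labels):
    """Group nodes that share a label: one set per super cluster."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return list(groups.values())
```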

If we look over here, we were able to identify that topics number one, two, and twenty-one, which have these words, were assessed as a super-topic cluster that falls into the data science domain. Again, this was done qualitatively, by analyzing this smaller number of topics, but we were able to identify that super cluster number five contains the topics related to data science. Now we take this and map it out.

This is a cleaner graph, where you can see topics 2, 21, and 1, which form the data science cluster. Because topics are related to courses, we can take these topics and figure out which courses are most similar to the super cluster that we identified as data science.
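
You can sketch the ranking of courses against the data science super cluster by summing each course's edge weights to the cluster's topics (1, 2, and 21, per the talk). Summing rather than taking the maximum is an assumption. So are the function name and the `{course: {topic: score}}` input format, and the example scores are hypothetical.

```python
# Topics the talk assigns to the data science super cluster.
DATA_SCIENCE_TOPICS = {1, 2, 21}


def rank_courses(course_topic_scores, cluster_topics, top_n=10):
    """Rank courses by total edge weight to the topics of one super cluster.

    course_topic_scores: {course_id: {topic_id: similarity}}.
    Courses with no edge into the cluster are left out.
    """
    scored = []
    for course, scores in course_topic_scores.items():
        total = sum(w for topic, w in scores.items() if topic in cluster_topics)
        if total > 0:
            scored.append((course, total))
    # Highest score first; the course ID breaks ties so the order is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical scores for illustration only.
example = {
    "C1": {1: 0.6, 21: 0.2, 7: 0.1},
    "C2": {2: 0.3},
    "C3": {7: 0.9},
}
print([course for course, _ in rank_courses(example, DATA_SCIENCE_TOPICS)])  # ['C1', 'C2']
```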

If we look at this further, these are the courses from the Department of Mathematics and Statistics. These courses, which are mostly statistics courses, are related to the data science cluster.

We also evaluated Educational Research Methodology, which had some data science related courses, and MSIA, our informatics and analytics program, which of course had some data science courses. We also wanted to analyze other departments that you would think are not data science related, to see whether they fall under the category of data science or have any courses related to it.

So we took a look at the English Department and its course offerings, and you can see here that they are widely dissimilar to this particular topic area; they are mostly related to this other side. We took this further: we wanted to analyze, per department, which courses they offer that are related to the concepts of data science.

To our surprise, the order was Educational Research Methodology, Statistics, Teacher Education, Mathematics, and then Computer Science. That is the ranking of departments by the data science related courses they offer. If we instead take the ratio of data science related courses to the total number of courses each department offers, Statistics of course pops up first, then ERM, then HHS (Health and Human Sciences), Supply Chain Management (from the business school), Sociology, and only then does Computer Science come into the picture.
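
The two department rankings described above can be sketched as follows: one by raw count of data science related courses, the other by the ratio of those courses to each department's total. The input format, a list of `(department, is_data_science)` pairs, is an assumption. The talk does not specify how a course gets flagged as data science related (for instance, by a score threshold against the super cluster). The department names in the example are hypothetical.

```python
from collections import Counter


def department_rankings(courses):
    """Rank departments by data science course count and by data science share.

    courses: iterable of (department, is_data_science) pairs.
    Returns (by_count, by_ratio), each a list of (department, value) pairs.
    """
    total, ds = Counter(), Counter()
    for department, is_ds in courses:
        total[department] += 1
        if is_ds:
            ds[department] += 1
    by_count = sorted(((d, ds[d]) for d in total), key=lambda x: (-x[1], x[0]))
    by_ratio = sorted(((d, ds[d] / total[d]) for d in total), key=lambda x: (-x[1], x[0]))
    return by_count, by_ratio


# Hypothetical departments: "BIG" offers more data science courses, "SMALL" a larger share.
courses = (
    [("BIG", True)] * 3 + [("BIG", False)] * 7
    + [("SMALL", True)] * 2 + [("SMALL", False)]
    + [("OTHER", False)] * 5
)
by_count, by_ratio = department_rankings(courses)
print([d for d, _ in by_count])  # ['BIG', 'SMALL', 'OTHER']
print([d for d, _ in by_ratio])  # ['SMALL', 'BIG', 'OTHER']
```

As in the talk, the two orderings can disagree: a large department can lead on count while a smaller, more specialized one leads on share.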

Nope, I think you're good, keep going.

Okay. And we are able to identify which courses are related to data science. You can see the course descriptions and the course titles that are most related to data science from Statistics, and this is from Educational Research Methodology.

Okay, so I'm going to stop sharing this right now and re-share something else that we came up with quite recently.

This is a simple dashboard that we built for Georgia State University, where we are also analyzing their course descriptions to figure out which topics and which emergent data science courses are being offered. We can do a keyword search and figure out which courses they offer and which departments offered them. The dashboard then pops up the course descriptions that match the keywords given as a search term. Again, this is a very preliminary prototype, but we believe it could be useful as a tool for students. They could use it to figure out electives, find the courses they want to take in a particular domain, or discover interdisciplinary courses.
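
The dashboard's internals are not described, so the keyword search below is only a minimal sketch of the behavior mentioned. It matches query terms against course descriptions and reports the department, title, and matched keywords. The catalog fields (`department`, `title`, `description`), ranking by the number of matched terms, and the example entries are all assumptions.

```python
import re


def keyword_search(catalog, query):
    """Return (department, title, matched_terms) for courses matching any query term.

    catalog: list of dicts with "department", "title", and "description" keys.
    Results are ordered by how many distinct query terms matched, then by title.
    """
    terms = set(re.findall(r"[a-z]+", query.lower()))
    results = []
    for course in catalog:
        words = set(re.findall(r"[a-z]+", course["description"].lower()))
        matched = sorted(terms & words)
        if matched:
            results.append((course["department"], course["title"], matched))
    results.sort(key=lambda r: (-len(r[2]), r[1]))
    return results


# Hypothetical catalog entries for illustration.
catalog = [
    {"department": "STA", "title": "Applied Regression",
     "description": "Linear regression and machine learning methods for data."},
    {"department": "CSC", "title": "Data Mining",
     "description": "Machine learning for large data sets."},
    {"department": "ENG", "title": "Modern Poetry",
     "description": "Poetry of the twentieth century."},
]
for department, title, matched in keyword_search(catalog, "machine learning regression"):
    print(department, title, matched)
# STA Applied Regression ['learning', 'machine', 'regression']
# CSC Data Mining ['learning', 'machine']
```

Exact word matching keeps the sketch simple. A real dashboard would more likely match against the pre-processed tokens or the learned topics, so related terms can surface too.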