From YouTube: South Hub Education & Workforce WG | Insights on Curriculum Design in Data Science & Big Data Courses
Description
November 2021
Presenter: Dr. Da Yan; University of Alabama at Birmingham
A
Thank you, Renata, for the introduction. It's a great pleasure to have the chance to share my experience in teaching data science classes here at the South Big Data Hub Education and Workforce group. Today I will be sharing some of my experience teaching data science classes, which are actually most of the classes I teach at my university. The three data science courses I currently teach are Foundations of Data Science, Machine Learning, and Deep Learning. Every year I teach the first two courses for our MSDS program, the Master of Science in Data Science, which started a couple of years ago and has been immensely successful.
A
My class sizes these days are way above 100 students, and I experience a lot of challenges in teaching, but I think the students are liking it, and they're happy with all these new data science materials that they're learning. In our program we actually have four fundamental courses, the required courses.
A
...to the other data science classes without much difficulty. So my topic today will be really basic, not particular to any domain. I know a lot of people are talking about social science, or bioinformatics and medicine, but here I'm really talking about very fundamental data science and machine learning topics and my experience in teaching these classes. Our program actually has two plans. One plan is the regular, course-based one. In the other plan...
A
Actually, students can do, for example, a master's thesis with an advisor, and right now I have a student working on a master's thesis for a computer vision project. In general, our department is really very small. Right now we are growing, but we only have about 10 faculty members, so we are organized around three directions. One is data analytics.
A
Basically, my area is databases, data mining, and machine learning, so I'm mainly in data analytics. There are also a few faculty members exploring cybersecurity, and we also do high-performance computing. I also do a little bit of high-performance computing, because I develop big data systems that are distributed. Another important aspect of data science is really the interdisciplinary, domain-driven way of doing computer science.
A
Basically, we really look at applications and the domain, so we have a strong flavor of promoting the interdisciplinary aspects of the program, where students can also take business courses, bio-related courses, or medicine-related courses after taking the basic data science classes and techniques. Long story short, from my experience with teaching over these years, I have three things that I would like to share with the audience here.
A
One is that for data science there's more emphasis on hands-on programming, so hands-on programming is really very important. Initially, when I taught, I taught a lot about the theoretical aspects, especially because there's a textbook by, I think, John Hopcroft and the other authors that is more like theoretical computer science, so the students are liking...
A
Another thing is that we need to give students a very clear big picture of data science, because these days we always hear words like big data, data science, and AI, so it's a little bit confusing. There are boundaries, and it's important that students understand what these things are, what is domain-specific, and what the fundamental skills underneath are. If you look at, for example, CSRankings, they don't have a discipline called data science or big data.
A
So it's actually a combination of multiple areas. If students have a clear picture, they know what they really like, where they can move to, and how to play a role in the data science world. And finally, what I found important is balancing the teaching of the basics and the frontier, because data science, especially deep learning and AI, is moving like crazy these days, and if we teach a lot of the frontier, it will also occupy a lot of the time in which we teach the basics.
A
So, firstly, I will say a few words on the programming side. I have a very strong emphasis on programming in my teaching. If you look at, let's say, 30 years ago, or at when I was in my undergraduate, my instructor teaching me C++ always mentioned Pascal, and Fortran as a language, and in the early days there was also the COBOL language. You know, for some important systems, the US government is trying to find programmers, and it's very difficult.
A
So those are the old days and the programming languages that were popular then. When I was doing my undergraduate in computer science, C/C++ was the must as a first language. If you look at maybe 10 years ago, Java was more popular, because it removes all those headaches like memory leaks and having to delete objects yourself. And these days Python is absolutely dominating. In statistics, maybe R and a lot of its packages are still necessary, but if you look at data science, people are predominantly using Python.
A
So what should be the first language that students choose? My courses go with Python, and I think that's the right choice for data science, but I don't think it's the right choice for computer science. These are two different tracks and programs.
A
From my perspective, for computer science, if you are building basic things, you still use C/C++, and a lot of Python packages are actually written in C/C++, using something like pybind11 or a similar API. For software engineering, at least in my days, most students started from Java, and these days I see a trend moving to Flask, also in Python, and there's Node.js for the backend. So there's a...
A
I think technology moves fast, and people are moving to languages that allow us to broaden the participation of people in computing. But of course there's a trade-off. I think Python is the right choice for data science really because data science is what I think is called applied computer science: it's application-driven. If you look at how some people in the early days talked about data science, it's really computer science and mathematics combined with the domain.
A
You need to work on a particular domain, whether it's social computing, bioinformatics, or medical imaging, et cetera. So it's really a particular domain that you apply your computer science and math skills to, and it's also fast-track learning. If you look at a regular computer science program, we need to teach all kinds of things like operating systems, compilers, all these things.
A
These days we teach a lot more of the data science related skills and play down a little bit the very fundamental computer science side. Another thing that I find very important is to build an impression on the students: they need to know that they need to learn a lot of tools and learn their models, so that they know how to use the tools.
A
Basically, we are not the tool builders if we are in data science, but we need to know a lot of the analytics, statistics, and machine learning tools so that we can use them. One interesting thing I find in recent decades is that everything you find you need... I also do a lot of collaboration with people in other domains.
A
So whatever we need, you can almost always find a Python library that's already there, with some functions that you can call to achieve the goal. These days people are moving forward crazily in deep learning; a lot of machine learning branches have made non-trivial improvements in performance because of deep learning. We have TensorFlow from Google and PyTorch from Facebook, which are the two dominating frameworks, and there's...
A
You train the model, and once the parameters are learned, you call predict on your new data, so you can see that it's really simple. It makes it a lot easier to broaden participation to other students who previously were really afraid of learning C++, for example. Now they can use all these data science tools to do projects for their domain of interest much more easily. But I would like to point out that these are just a set of libraries that my research group has been using.
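A minimal sketch of the train-then-predict pattern described here, assuming scikit-learn (which the talk names elsewhere) and its bundled iris dataset. The dataset, the choice of LogisticRegression, and the variable names are illustrative assumptions, not taken from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (an assumption made for illustration).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: train the model, which learns its parameters from the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 2: call predict on new data.
predictions = model.predict(X_test)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same fit/predict shape, so swapping in another model usually changes only the line that constructs it.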
A
I think that's very good, but there are many, many libraries; it's a flat structure in data science. For example, for natural language processing, NLTK is the basic toolkit for processing natural languages, with the tokenizer, parsing, et cetera, and Gensim is for, I think, topic modeling. Hugging Face is really popular these days. I recently checked, and they have transformer models, 70-something transformer models.
A
[unclear], and cv2 is for OpenCV. If you work with geospatial data there's GeoPandas, and there's also Folium for visualizing your data on the map. Of course we also have the deep learning libraries like PyTorch and TensorFlow. If you do graph mining or graph visualization there's NetworkX, and a lot of other libraries are in Python. So these are a lot of tools. If students want to do anything in those different domains,
A
They can learn these libraries and then really work on real projects. For example, NetworkX: if you do social science projects and analyze social networks, you can use NetworkX, and if you work with geospatial data you can use GeoPandas. One thing I find interesting is that we can view Python as a kind of shell: a lot of things underneath are actually implemented in C or C++ for efficiency reasons, which can be tens of times faster.
A
So even NumPy operations are actually implemented that way, because they're array and matrix operations, and they use cache-conscious algorithms there. If you look at the scikit-learn library, when you do the testing they always ask you to give the entire test set with n samples, all the sample rows, and they give back the labels together in one batch, because it's faster than giving them one data point at a time with a for loop in Python. So actually it's a...
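To make the batching point concrete, here is a small sketch assuming scikit-learn and NumPy. The synthetic data and the choice of DecisionTreeClassifier are assumptions made for illustration. Both ways of calling predict give the same labels; the single batched call is the intended usage and is typically much faster, because the loop over rows then runs inside compiled code rather than in Python.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (illustrative only): the label depends on the first feature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(50, 4))

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Batched: pass the whole (n_samples, n_features) array in one call.
batch_pred = model.predict(X_test)

# Row by row: one Python-level call per sample (same result, more overhead).
loop_pred = np.array([model.predict(row.reshape(1, -1))[0] for row in X_test])

same_labels = np.array_equal(batch_pred, loop_pred)
print("same labels:", same_labels)
```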
A
A
So having students write everything themselves could be even worse, because the performance may suffer, since underneath a lot of libraries are actually in C and C++. Another important thing we are seeing these days is the Jupyter Notebook. Originally, I think, it was just for Python, so it was called IPython Notebook, and these days it also supports MATLAB, R, all those interactive languages.
A
But if you really want to do big projects, I don't think a Jupyter notebook will be sufficient, because you need to debug and keep track of all those classes in different files; then maybe an IDE like Visual Studio or PyCharm would be a better choice. And if you really want to go deeper into PyTorch, they have CUDA extensions, if you have some backpropagation operation you want to implement more efficiently.
A
So you probably need to learn these extensions and how to program them. What I would say is that it's much easier to enter this area, but if some students really want to go into the depths, there are a lot of things they need to learn. And another thing, and this is of course just my view...
A
I want students to have a clear big picture of data science, and one thing I want students to know is that data science is really not all about machine learning. We also have data cleaning at the very beginning, which can be really time-consuming. If you want to analyze the results, usually you want to do some data visualization and see how things go. And for training machine learning models we might need annotation, which can be non-trivial.
A
If you want to do computer vision and NLP tasks, you need to use some tools. Also, machine learning is not all about deep learning, even though deep learning is crazily sweeping the world and is everywhere in machine learning. Kaiming He's ResNet paper is now getting 91,000 citations, even though it was published in 2016, and we are seeing even more machine learning papers these days than, for example, bio-related papers on bioRxiv. And so one confusion among students is, like, maybe a decade ago...
A
Right, what was really hot was Hadoop and Spark, also Cassandra, all those infrastructures. My view of these is that they are data engineering, not data science. Usually in a company you will do all of these: you store all the streaming data you have, and then, when you want to do analytics, you use ETL tools, those extraction tools, to get the data you want to analyze, and from there on it's data science. So we are teaching mainly...
A
We do have a course on big data programming, which aims to teach those things down here. The students also need to know what they need to learn for using Python. Learning basic Python is not enough, because everything these days is numeric computing in machine learning. For example, in scikit-learn they have array-like, which is actually all the training data and test data.
A
Actually, array-like means you can put it in a NumPy array. Also, for example, if you look at PyTorch and TensorFlow, they are backpropagation versions of the NumPy operations. So if students know NumPy functions well, they will easily learn PyTorch and TensorFlow; if not, it will be difficult for them to understand things. I also give students ideas like what classification and regression are,
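As a rough illustration of the "array-like" idea and of why basic Python alone is not enough, the sketch below converts a plain nested list into a NumPy array and applies whole-array operations to it. The values and variable names are made up for illustration.

```python
import numpy as np

# A plain Python nested list is "array-like": it can be converted to an ndarray.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X = np.asarray(rows)            # shape (3, 2): 3 samples, 2 features

# Whole-array operations replace explicit Python loops.
col_means = X.mean(axis=0)      # mean of each feature column
centered = X - col_means        # broadcasting subtracts the means from every row

print(X.shape)
print(col_means)
print(centered)
```

PyTorch and TensorFlow tensors offer many operations with similar names and broadcasting behavior, which is one reason familiarity with NumPy tends to carry over.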
A
What are discriminative models and generative models? Discriminative models are more like regression; generative models have probability, and you generate things, like generating natural language, et cetera. There are also different schemes, like reinforcement learning and active learning. What are those concepts? In my machine learning teaching I really want to give them a clear idea. Then, if you look at recent works in deep learning, a lot of things are actually just combinations of them. For example, you have object localization.
A
You first want to give a bounding box location: the x, y center point and then the width and height. They are numerical values, so it's actually a regression. Then whether what's in the box is a cat or a dog is a classification. So it's actually a combination of those things. If students have a clear concept of basic machine learning, it will help them understand more complex models. If you have a mask, like in Mask R-CNN,
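The "regression plus classification" view of object localization can be sketched as a combined loss. This is a simplified illustration in NumPy, not the loss of any particular detector: the squared-error box term, the equal weighting of the two terms, and all names and numbers are assumptions.

```python
import numpy as np

def localization_loss(pred_box, true_box, class_logits, true_class):
    """Hypothetical combined loss: box regression plus class classification."""
    # Regression part: squared error on (center_x, center_y, width, height).
    box_loss = np.mean((pred_box - true_box) ** 2)
    # Classification part: softmax cross-entropy over the class scores.
    shifted = class_logits - class_logits.max()          # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    class_loss = -log_probs[true_class]
    return box_loss + class_loss, box_loss, class_loss

pred_box = np.array([0.48, 0.52, 0.20, 0.30])   # predicted box (normalized units)
true_box = np.array([0.50, 0.50, 0.20, 0.30])   # ground-truth box
class_logits = np.array([2.0, 0.5])             # raw scores for [cat, dog]

total, box_loss, class_loss = localization_loss(
    pred_box, true_box, class_logits, true_class=0
)
print(total, box_loss, class_loss)
```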
A
Basically, it's a mask where there's a pixel-level classification: if you have an object, whether each pixel belongs to that object or not. It's a binary classification on the pixel; it's still classification. These days we also have more advanced models like anchor-less models. Previously we needed to give the anchor boxes; now there's no need for this manual and cumbersome setting. And there's CenterNet, where people do a heat map thing.
A
First, this is again more like a binary map: what's the probability that each pixel is the center point of an object. So, moving to, let's say, deep learning: this is a slide from Papers with Code, and it's crazy. If you look at it, there are maybe tens of directions with papers listed here, and it's a daunting task for students to move forward. But does that mean they need to learn endlessly?
A
My answer is yes and no. Firstly, they need to learn the fundamentals. A lot of concepts are reused by all those works, just tuned to different data types and applications, like attention and, let's say, the self-attention mechanism for transformer models, autoregressive models, and all those concepts. These are just two examples. Take graph neural networks: if they know these concepts well, it's actually a combination of those techniques in different domains.
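As one example of such a reusable concept, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The shapes, the random weights, and the function name are illustrative assumptions; real transformer layers add multiple heads, masking, and learned parameters.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Similarity of every position with every other position.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax turns the scores into attention weights.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of the value vectors.
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))          # 4 tokens, 8 features each
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(X, Wq, Wk, Wv)
print(output.shape, weights.shape)
```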
A
Of course, for domain-specific things they still need to spend some effort. In natural language, the metric could be BPC, or it could be the BLEU score for machine translation, et cetera. There are benchmarks like GLUE and the Stanford Question Answering Dataset. So there are some domain-specific things they need to spend time on, but they should know that everything underneath is actually machine learning and deep learning: classification, regression, those kinds of concepts.
A
And finally, what I want to say is that we need to balance the basics and the frontier. A lot of times, because we are doing research and are really happy with our area, we teach a lot about what we are doing research on, for a specific project. But because I'm teaching really basic courses like Foundations of Data Science, I need to prepare all students so that they can move in different directions in data science.
A
So I cannot really spend most of the time talking about my research. We need a balance between teaching the basics and teaching the frontier. The frontier is moving really fast, at a crazy pace. But if you look at most of our students, they graduate and find jobs in a company. Some are IT-related, and some may not even be; it may be the banking industry,
A
Maybe business, maybe medicine-related. So we really don't need them to go to the research frontier, but they need to know the basics so that they can use all those tools to work on products. So I put a lot of emphasis on teaching students the basics, the big picture, the big ideas, and also how to program, and on solidifying their programming skills with projects. But I do balance the teaching.
A
If there are students who are really good and really interested in research, we have some research projects where they can do an advanced project, push the frontier, and delve into a specific topic. For example, we have a final programming project in which they can work on more advanced projects, and other students...
A
Okay. So, as a summary, that's all that I want to share about my teaching. We do have some support, including an educational grant, from the National Science Foundation, and I also acknowledge the South Big Data Hub. We get a lot of resources from the Big Data Hub; for example, we had cloud resources from Azure,
A
I think back in 2017, and also some training supported by the South Big Data Hub, a trip to Chapel Hill for Azure training. We are also looking at further opportunities in the South Big Data Hub, and I do want to say thank you for all those very helpful resources that help build a community in the South for data science. And finally, I would like to thank my employer for providing all the resources that allow me to do all the things that I'm doing. Thank you.