From YouTube: CDF SIG MLOps meeting 2020-08-13a
A
Yeah, so I'm looking forward to getting stuck into that and making sure that all the MLOps stuff is moving ahead on that.
A
So, you know, I've got some idea of what the structure is like now, which is good. I just need to make some time to actually fire an instance up and start playing with it.
D
Hi, hey, sorry, I was chatting to James Rawlings and James Strachan just before, so that always goes over time. So how are you folks doing?
D
Yeah, that's doing it right here. I've got a t-shirt on because it feels like springtime here, so I'm pretty happy about that. It was a pretty short winter this year, so I can't really complain, but the days are getting longer and it's getting warmer. I think you're just getting over your heat wave, so I think autumn is about to start, yeah.
D
Yeah, that's unpleasant, there in the heat, so yeah. I guess this week... did you have anything you want to bring up, Cara or Terry?
A
So, from my perspective, we've just finished off the last few items in the technology requirements section.
A
I've done another three this week; we just need to fire through the last handful.
D
I forgot to take a bunch of notes last week on some of those points we were talking about, so maybe we could tear into those last three today.
D
Maybe if I shared a screen and typed, we could do that. Before that, I had an interesting discussion about Metaflow and Netflix. Metaflow was something I think I mentioned last time, and I spoke to, I guess, the lead developer of it at Netflix, and they were very interested in the MLOps roadmap. In fact, he said he was working on an internal memo at Netflix, to sort of pass around, on their principles of MLOps, which overlap very much with this. So he's going to take another look at it, and if he can share it with me, that would be good too, because it's kind of a large-scale validation of some of those ideas.

D
There was a really good blog they put out recently: they did an integration with Amazon Step Functions, so they use Step Functions as a backend for orchestrating the workflows. Metaflow is more like a high-level flow config that just glues different tools together, and behind the scenes at Netflix they have sort of a closed-source orchestrator, but they wanted to make it work with something that was off the shelf, so that was Amazon Step Functions. So that could be something else. He was interested in Tekton from that point of view, and he mentioned it's a sister project of Spinnaker, so he's very aware of the CDF. So it could well be, if it becomes successful the way Spinnaker did, that they could become interested in the CDF, which I thought was interesting. But we'll see; at this stage they're kind of thrashing pretty hard on it, and they're getting outside contributions and stuff. So they want that to settle down, and to see it has some life to live outside of Netflix, and then they'd be interested in perhaps looking at a place to host it.
D
So I thought that was interesting, because to me I view it as almost an instantiation of a lot of the principles in the MLOps roadmap. Maybe not all of them, but I guess that's part of the idea of the roadmap: it gathers a bunch of challenges and requirements and possible solutions, and this is one of them that, you know, solves a bunch of these things. Yeah.
D
It definitely comes from the angle of building stuff for people who are interested in data science and machine learning; not necessarily only data scientists, but heavily on the data science side. So, building tools that guide them to do it the right way, but without being too prescriptive.
D
So it was pretty interesting, and they run things at quite a scale there. He was saying, for multiple countries they would maybe train a model for every language in that country, so you can imagine the fan-out of that. And they're retraining those; some of it is on cron schedules, some of it is when the data becomes available from some upstream data source.
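A minimal sketch of that kind of per-language fan-out as a Metaflow flow. The flow name, language list and scoring are hypothetical stand-ins; FlowSpec, @step and foreach are real Metaflow constructs:

```python
from metaflow import FlowSpec, step

class PerLanguageTrainFlow(FlowSpec):
    """Hypothetical sketch: fan out one training branch per language."""

    @step
    def start(self):
        # In practice this list would come from the upstream data source.
        self.languages = ["en", "es", "fr", "de"]
        self.next(self.train, foreach="languages")

    @step
    def train(self):
        self.language = self.input                 # one branch per language
        self.score = len(self.language) / 10.0     # stand-in for a real metric
        self.next(self.join)

    @step
    def join(self, inputs):
        # Collect the fanned-out branches and keep the best-scoring one.
        self.best = max(inputs, key=lambda i: i.score).language
        self.next(self.end)

    @step
    def end(self):
        print("selected model:", self.best)

if __name__ == "__main__":
    PerLanguageTrainFlow()
```

Run locally with `python flow.py run`; the Step Functions integration mentioned above lets the same DAG be executed on AWS instead.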
D
So I know this is recorded, so I can't share everything that we talked about, because, you know, Netflix is Netflix. But it was fascinating: they're applying a lot of these principles at scale, and doing it in a way that everyone at Netflix that does machine learning and data science goes through this platform. So I thought that was a good validation of some of these ideas, and there was one blog post...
D
I just put it in the show notes, the minutes: the Metaflow and Amazon one, which I thought was a really good blog post on the philosophy of it from a data science point of view.
D
On their philosophy of things, and why they use a DAG, a directed acyclic graph, to do things. So that's worth a look. Hi, is that Almog? I'm not sure if I'm saying your name right. I'm not sure if you've been here before when I've been here; I don't know if we've met.
E
Hi, we've met on the other meeting, with Terry. I'm a technical entrepreneur; I founded a few companies, and now I'm investigating and researching the innovation in the MLOps world. So I'm here to listen and help with whatever you want to do. I can help, so feel free, please, to ask me to do stuff; that way I'll do it, you know, because if you ask, I can't say no. Yeah, I already volunteered.
D
Yeah, well, it's great to have you along. It's an interesting area, the whole MLOps thing. The more time I spend on things, the more I realize a lot of it is about just handling the data. I must have spent most of the past week just trying to... I'm working on a conversational sort of user interface right now, and there's a bunch of different models doing things behind the scenes that can take their time.
D
They prepare the data, but then the more real-time stuff I'm finding a bit frustrating, because everything's time-sensitive and it's all about getting the data in a timely fashion. Yeah, most of MLOps seems to be about handling data. It's almost like, by the time you get to training the model and deploying it, that's the fun part. It's the dangerous part, but it's also the fun part, so yeah.
D
I guess the next thing: last time we mentioned CFPs and conferences and things like that. Has anyone done any submissions, or heard of anything interesting, to start sort of evangelizing or socializing the roadmap?
A
I think we've probably missed most of the events this year; things tend to be...
D
Yeah, yeah. So I guess one to keep an eye out for: has KubeCon closed? I don't know whether this is within the remit of that, but they certainly would get a bigger audience. I'm pretty sure their main event is late in the year. I don't know if they are...
D
So I guess the next thing is to dive into those final few items. Did you want to show your screen, Terry, or do you want me to?
D
So I made a change to the online learning section: I changed the wording to say it's not out of scope, but it's not something we're looking at at this time, or words to that effect.
A
Yeah, well, what we do is flag certain things as difficult challenges, which are typically on a long time horizon. And that's the point of this, which we're going to come on to next: this is basically giving an indicator of when you can expect certain capabilities to be available, and for something like that, which is very challenging...
A
You would flag it as "research required", and then you would potentially not see any development activity on this time horizon, but you would leave it as an outstanding item that somebody needs to pick up. And those are typically the things that some people will be very interested in, because they'll be looking at the longer-term wins, and where it's worth investing in a big bet for something downstream.
A
Managing assets and security. So this is the latest PR, and then again, yeah.
A
Yeah, so this is an interesting one, because, you know, the practicalities of models are that there is nearly always some sort of trade-off between different factors, and there are often going to be conflicting drivers on customers as they're trying to build solutions.
D
So these trade-offs are in terms of almost design decisions. Currently, I imagine, if it's sort of a human factor, then it would be a note, or some comments, in a notebook or a document somewhere, saying: maybe we couldn't do things at the zip or postcode level; we had to reduce the resolution. The trade-off is that we couldn't be as fine-grained, because of some legal constraints. Or, one thing I was looking at was tracking sentiment on individual comments that could be traced back to individuals, which is not always what you want, because that sort of information could be used the wrong way. So would a trade-off be that you deliberately almost blunt the system, or blunt the data in some way, as a positive trade-off for privacy? Or, in some cases, you go:

D
we require these pieces of personally identifying information to train the model, but it's not going to affect... you won't be able to go backwards, and you have to explain. Am I on the right track in terms of what you mean by trade-off?
A
Yeah, I mean, often different types of compliance actually introduce competing trade-offs. For example, if you're faced with GDPR, but also some new AI legislation about explainability, those things will be in direct conflict, because the more explainable you make the solution, the less privacy.
A
So there are certainly going to be big trade-offs in a triangle between accuracy, privacy and explainability, but typically there are multiple other areas where you will also have these sorts of trade-offs, where there are basically different features in a dataset and you can't optimize for all features. So you're ending up with a balance between weights on certain features that will give you a certain behavior out of the system.
A
So, for example, what you might want to be able to do is tune parameters in such a way that you end up with several models as points on a continuum between different endpoints in the trade-off space. Say you've got a triangle represented by the extremes of accuracy, privacy and explainability.
D
And yeah, there are ensemble sorts of things, and AutoML tools can do things like that, because they just try a bunch of things and you just pick the... In fact, Netflix do that; that was one of the examples they used with Metaflow. They would, you know, do a hundred different variants of it in parallel, sort of almost brute-force it out, and then part of the flow pipeline would be to pick the one

D
that's good enough, based on some constraints. And some of those options might, in this case, have more privacy challenges than others, and then you could weight it accordingly and pick the one... You know, if you value privacy over all else, you want to be as explainable as possible, but... yeah, I like that triangle idea. So could we just take some notes on this, or should we open a pull request and just put some dot points of what you just said? Because it's easy to lose track.
D
So the next one... yeah, the triangle thing is great. I know we can't really include a visual in there; it would look a bit weird. But if we could have that somewhere, that's kind of an interesting idea. It's like... what's the distributed computing version of that? The CAP theorem: consistency, availability and partition tolerance.
B
So, with model training as it's done now, usually these formulations that are checking many different variants of a model, like with different parameters, they are looking at precision versus recall. Often those are the two standard ones, so this would be beyond that: whatever system you have in place to compare your different potential models.
D
I don't know for a fact what Netflix do, but yeah, they try all those permutations of hyperparameters. I'm guessing they're just looking for accuracy, so that's simpler, but you could have some others. I don't know how, and maybe this is part of what Terry's thinking for the requirement: how do you sort of quantify the privacy trade-off?
D
Maybe it's how many... you know, maybe if there are 10 features of personally identifying information, something that uses three scores better on privacy than something that uses seven. And maybe, if all seven features are used to train it, the best model is the one that has better accuracy; although accuracy and precision have a specific meaning in models, so say the "goodness": the best model by whatever your metric is.

D
You kind of have to... that's the trade-off then. If you could say "I'm using three of these, but I get this score", or "I use seven of these and I get that score", the trade-off is either "well, that score is actually good enough, so I'm going to use fewer of them", or "that score's not good enough, so I'm going to take more of the PII features in there". Is that right?
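A minimal sketch of that kind of weighted pick, assuming each candidate variant already carries an accuracy score, a privacy score derived from how many PII features it used, and an explainability score. All names, weights and numbers here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float        # e.g. held-out accuracy in [0, 1]
    privacy: float         # e.g. 1 - (PII features used / PII features available)
    explainability: float  # e.g. a score from an explanation tool, in [0, 1]

def pick_best(candidates, w_acc=0.4, w_priv=0.4, w_expl=0.2, min_accuracy=0.7):
    """Weighted compromise across the accuracy/privacy/explainability triangle.

    Candidates below the accuracy floor are rejected outright; the rest are
    ranked by a weighted sum encoding how much we value each corner.
    """
    viable = [c for c in candidates if c.accuracy >= min_accuracy]
    if not viable:
        raise ValueError("no candidate meets the accuracy floor")
    return max(viable, key=lambda c: w_acc * c.accuracy
                                     + w_priv * c.privacy
                                     + w_expl * c.explainability)

# Hypothetical fan-out of variants, as in the brute-force example above.
candidates = [
    Candidate("uses-7-pii-features", accuracy=0.91, privacy=0.3, explainability=0.5),
    Candidate("uses-3-pii-features", accuracy=0.86, privacy=0.7, explainability=0.6),
]
# With privacy-heavy weights, the three-feature variant wins despite lower accuracy.
print(pick_best(candidates, w_acc=0.3, w_priv=0.6, w_expl=0.1).name)
```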
A
Yes. I think the existing situation is, to a large extent, that people set thresholds and then they're aiming to get one parameter above a threshold, whereas the practical application in the future is going to be much more about having to tune for the best compromise against multiple parameters.
A
And try and get them to resolve against different points in our triangle, and then you'll compare the behaviors of those against a set of overarching tests, and then select from that group the one which is the best compromise for the application.
E
If I can ask: so currently, for what it looks like, the measurement, the KPI, for a good model is how accurate it is, and what we're saying is we need to add more measurements, to see it like a triangle: fairness, accuracy and privacy, and you need to take all of these into account.
D
Yes, it is, and I don't think anyone's really... yeah, maybe people are already doing this, but it'd be surprising. I imagine you could come up with... the way I think of it is that you have a set of so many features that you may or may not use in one of those permutations, and then, when you look at the output of that, you've got all the precision and accuracy of the model itself, and then you've got how many features of that set were used.
D
Yeah, the biases come into it at that point, based on the data, and I guess the role of MLOps there is to have that sort of trail or record of how you got there. People will ask "how come it doesn't work for this group of people?", and then you can go back and go: well, here's the training set of data; it's from this nation or this city; they're not represented. And then you can go: well, that's a problem. What do we do about that?
D
And then you could inject extra data, or you could add, I guess, unit tests or acceptance tests, if you like, to go: we won't accept a model that doesn't at least work for these cases, and fix it up that way. There's definitely a role for MLOps to look at that.
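A minimal sketch of that kind of acceptance test, assuming per-group predictions and labels are available; the grouping and the accuracy floor are hypothetical:

```python
def per_group_accuracy(preds, labels, groups):
    """Accuracy broken down by a sensitive grouping column."""
    by_group = {}
    for p, y, g in zip(preds, labels, groups):
        correct, total = by_group.get(g, (0, 0))
        by_group[g] = (correct + (p == y), total + 1)
    return {g: c / t for g, (c, t) in by_group.items()}

def test_model_works_for_every_group(preds, labels, groups, floor=0.75):
    """Reject the model if any group falls below the accuracy floor."""
    scores = per_group_accuracy(preds, labels, groups)
    failing = {g: s for g, s in scores.items() if s < floor}
    assert not failing, f"model rejected, groups below {floor}: {failing}"
```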
But yeah, this is something that people are discovering every day. There was that... I think it was a Kickstarter startup the other week that was guessing people's gender based on their email address, is that right? It was all over Twitter, and it was pretty obvious it was a bad idea, because you put in such-and-such a name, which could be anything, and it would say a 60% chance of whatever.
D
Yeah, it's codifying the biases of the data. And then, you know, the project got cancelled, but people were saying "oh, but this is just reflecting what the data is". But I think we're at a point where people aren't accepting that. It's like: yes, the data says that, but we should be doing better than just echoing what the real world has, so yeah.
A
So that actually brings us on to the next item, which is pretty much the law of unintended consequences: if we actually have to validate all of these models for fairness and bias, then that means we actually have to hold special category data.
A
Data that actually indicates things like race and religion, gender, sexual orientation, stuff like that, because we need that data to actually detect the bias.
A
So what we're going to see is a whole period in which there are knee-jerk legislative attempts to fix the problem, and those knee-jerk responses will actually make things worse in other areas, because we'll get more data breaches as a result of having to store more data to address the previous concerns.
A
So this piece really needs to address data classification and data protection, and also, you know, probably some degree of auditing, in terms of managing the day-to-day level of sensitivity of the information that's flowing through your MLOps system.
D
So, when you say special category data: some of that, in different countries, is protected data. There must already be some exceptions in place around that, like, for example, in insurance policies for vehicles or other things, even life insurance.
D
Here, certainly, they find out quite a lot about you; in fact, you're obliged to tell them. But normally, in a normal workplace, professional setting, they can't ask that. In my country you can't even ask someone how old they are, professionally; you can't ask about their marital status, or religion, or ethnicity. Any of that stuff, you can't actually ask as an employer.
D
They can ask how old you are? No, they can't; you're actually not allowed. But everyone volunteers it. I mean, there are certain forms where you write down your date of birth, but there are rules around age. Obviously you need to know that for... that's a very trivial example, but for getting car insurance, for example. So I assume there's already some...
D
They do that, that's what I'm saying: they do know. So they must have some special exception, or... yeah. So that's more on the legal aspect, but I guess my point is more that this isn't the first time this handling of sensitive data, protected data or special category data has come up. Banks and insurance companies all over the world do this every day. Maybe lawyers too, but then it's probably filed away in a cabinet, or on a USB drive that they just leave at a bar or something; anyway, that's how you hear about this stuff. But I guess what I'm saying is, in terms of the data security, it might not be that new solutions or anything are required; it's more that considerations for the data might matter more here than in, say, a typical software system.
D
You could argue the data is more sensitive, so someone building one of these systems should be more aware of encryption at rest. Google even brought out that system that has things encrypted, you know, almost all the way, so that the data stays encrypted as long as possible.
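Not the Google system being referred to, but as a minimal illustration of keeping training data encrypted at rest, here is a sketch using the Python cryptography library's Fernet API; the file names are hypothetical:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice this lives in a KMS/secret store, not in code
f = Fernet(key)

# Encrypt the training data before it ever hits shared storage...
with open("train.csv", "rb") as src:
    blob = f.encrypt(src.read())
with open("train.csv.enc", "wb") as dst:
    dst.write(blob)

# ...and decrypt only inside the training job, keeping plaintext off disk.
rows = f.decrypt(open("train.csv.enc", "rb").read()).decode().splitlines()
```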
D
These are things you might not care about if you were building an e-commerce system, for example. But if you were building an e-commerce system that was tracking every click and every transaction ever, and a whole lot about the customer, and, you know, maybe the mobile app that they're using tracks their location, then suddenly you've got, I guess, an escalation of data that you would have to handle more carefully than you would have in the past.
B
Can I just ask, it's more of a, I guess, slightly technical question, just checking something. The way we have it structured in the roadmap: the more sensitive data needs to be collected and kept in order to check that your models don't have bias.
B
That's how we have it structured here, the unintended-consequences kind of thing. And would it... that is because you should split your test data from your overall data set and keep it separate from your training data.
B
This would be a difference from that standard practice, which is probably best practice for checking your model. But if you had data that had been cleansed, or more anonymized, so we weren't collecting this data, and if it existed it was being taken out of the data set, and you trained your model with that...
A
The challenge is that the bias is in the data rather than in the model, so you're actually needing to add information into the source data, to flag the fact that some of the items have bias, so that you can then train against those to even out the model that you get.
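One common way to "train against" that flagged skew (an illustration, not necessarily what's being proposed here) is to reweight samples so that under-represented groups count more in a weighted training loss; the grouping column is hypothetical:

```python
from collections import Counter

def balancing_weights(groups):
    """Per-sample weights inversely proportional to group frequency.

    Samples from under-represented groups get larger weights, so a
    weighted training loss treats every group as equally important.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# Hypothetical grouping column alongside the training rows.
groups = ["city_a"] * 900 + ["city_b"] * 100
weights = balancing_weights(groups)
print(weights[0], weights[-1])  # city_a samples ~0.56, city_b samples 5.0
```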
B
That's assuming you know the degree of skew, and a lot of the time, I mean, partly it's just people not thinking about it. But when you think of things like the mortgage scandal, where certain postcodes were really penalized under the AI because they were predominantly black postcodes... okay, so in that case, yes, we have bias in reality, and it will be shown in the data even if you're not collecting for it. But what you could have is a test data set that was checking for this. Like: the model you've used, does it actually, in a real-world, but synthetic real-world, setting result in all these side effects that we don't want, say racism or sexism or whatever? So that would enable you to keep some sensitive data out. And of course it differs between models: if you're in medical care, you probably need a lot more of that data, but, you know, for a lot of models...
D
Perhaps, you know, specific examples might be easier to synthesize; it's not uncommon to synthesize data to balance some training set, but you have to know a lot about what you're synthesizing, because it's always risky to train on data that isn't real. But people still do it: people will balance data sets and drop a certain percentage of things. In this case, Terry, how would... if there was some bias detected?
D
It's one thing to sort of fail it and reject it, but how is that corrected? This sensitive category of data that you've got your hands on: does that have to be fed back into the model itself, if you want to correct it with real data? Or can you adjust weights on things? Maybe, with enough data... surely it's not completely black and white, where the source data is completely missing.
D
Yeah, I mean, in some cases you might not, because you might want your model to reflect reality, but you just hand-code heuristics from the output of that. So, in the mortgage case, you might go: well, if the person's from this protected category, or we know these postcodes, based on the data, are problematic...
D
You know, almost hacking around it: you're using the model for where you want to use the model, and then you're using... yeah, ensemble approaches work that way, effectively; they just have different models for different sections of the data, rather than one big bundle. So I'm sure there are lots of technical solutions there. But if reality is biased in a way that's not suitable for your business and its decision-making, then you basically have to either bend reality or bend the decisions, to ensure fairness and equity.
A
Yeah, but again, from this perspective it's actually irrelevant what the solution would be to rectifying the underlying...
D
Yeah, so even if you could synthesize things to validate it for some parts of your pipeline, or retrain it to correct the balance, at least initially, and probably regularly, you would need to be getting data into the system that is sensitive, because you wouldn't know... You could go a fair way with synthetic data, creating fake personas that match different things, but at some point that will drift from reality, and then you'll end up back where you started, with it miscategorizing under-represented people, because your synthetic personas are not filling the gaps. Right, so yeah, I guess that's...
D
Yeah, I mean, in traditional software, if you're building on a system that has a database of some sort, which pretty much everything does, then it's not uncommon for people to want to have some subset of the production data to work on. But in general people won't have a whole copy of it, or they'll have a scrubbed version of it, or it'll be a very small portion and it won't at all reflect real production. Typically, in the machine learning development and data science realm, that doesn't really cover it.
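For contrast, a minimal sketch of the kind of scrubbed subset that works for traditional software; the column names and hashing scheme are hypothetical:

```python
import csv, hashlib, itertools

PII_COLUMNS = {"name", "email", "postcode"}  # hypothetical sensitive columns

def scrubbed_subset(src_path, dst_path, n_rows=1000):
    """Copy the first n rows, replacing PII values with stable hashes."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in itertools.islice(reader, n_rows):
            for col in PII_COLUMNS & set(row):
                row[col] = hashlib.sha256(row[col].encode()).hexdigest()[:12]
            writer.writerow(row)
```

As the discussion notes, a subset like this is usually enough to exercise application code, but not representative enough to train a model on.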
D
So you're going to be handling what would be production data just to get anything done. It might be a subset of it, scaled down to get some algorithms right, but essentially it will be the real, sensitive stuff, and it'll be a lot of it too; even a small subset is still a lot more. Whereas in normal software development it would be fine to have a barely populated system, with just enough to exercise the functionality you would really work across.
A
I think that's the point that we want to get across in the technology requirement: that all MLOps systems will be expected to work with highly sensitive data, and need to take that into account at a design level. Because otherwise we're going to go into a period where there'll be multiple data breaches, and every time it's going to be somebody attacking the MLOps system to get access to the data.
D
That's a great point. I mean, yeah, DevOps-type infrastructure is sometimes the target of fairly advanced attacks, because it has its fingers in everywhere, and this just magnifies that. Because if you've got every ML engineer with a copy of this sensitive data, that's a huge attack surface: all their laptops and things like that.

D
And laptops, you know, are not as uniform and securable as a server. It's just the way it is; they're very customized. So I guess there's one... sorry, sure.
D
So the black box approach: is that to protect the IP, or is that to protect the data, or both? I guess it's the...
A
So that means you actually need to provide protections that can detect large amounts of data being fired at particular model services, because that might indicate that someone is trying to reverse-engineer your model.
D
I mean, that would be a similar pattern of attack to the other ones, like extracting data about individuals, membership inference attacks, by just throwing data at it and seeing if it does something different, yeah. So...
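A minimal sketch of the detection described above: a sliding-window counter that flags clients firing unusually many queries at a model service; the window size and threshold are hypothetical:

```python
import time
from collections import defaultdict, deque

class QueryFloodDetector:
    """Flag clients whose query rate suggests model-extraction probing."""

    def __init__(self, window_seconds=60, max_queries=500):
        self.window = window_seconds
        self.max_queries = max_queries
        self.history = defaultdict(deque)  # client_id -> recent query timestamps

    def record(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        q.append(now)
        while q and now - q[0] > self.window:  # drop timestamps outside the window
            q.popleft()
        return len(q) > self.max_queries       # True -> suspicious, warn or throttle

detector = QueryFloodDetector(window_seconds=60, max_queries=500)
if detector.record("client-42"):
    print("possible model-extraction attempt, raise an alert")
```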
D
I guess the analogy here, in, for example, web apps, would be protections against, you know, request forgery and injection attacks and things like that. There's a bunch of... there's the OWASP Top 10, or whatever it is. There are obviously a lot more ways to attack things, but that's a good, generic sort of baseline. So it's analogous to that.
A
Yeah, so you would expect there would be a need for validation-level tooling, to allow you to exercise your models against things like adversarial attacks during the integration phase of your development.
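A minimal sketch of such an integration-phase check, assuming the model is callable as a plain predict function; a real suite would use proper adversarial attacks (e.g. FGSM) rather than random noise, and the tolerance here is hypothetical:

```python
import random

def adversarial_smoke_test(predict, inputs, epsilon=0.01, trials=20, max_flips=0):
    """Fail the build if tiny input perturbations change the model's predictions.

    predict: callable mapping a list of floats to a label.
    inputs:  representative feature vectors from the integration test set.
    """
    flips = 0
    for x in inputs:
        base = predict(x)
        for _ in range(trials):
            noisy = [v + random.uniform(-epsilon, epsilon) for v in x]
            if predict(noisy) != base:
                flips += 1
                break
    assert flips <= max_flips, f"{flips} inputs flipped under +/-{epsilon} noise"
```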
E
More like a firewall protection, or more like a sandbox, where you put your model in and you run a lot of tests before production?
A
So again, using the OWASP model: you're going to have OWASP Top 10 penetration testing done during your build, but then you would also want some sort of OWASP RASP-style deployment in your application that is actively monitoring incoming traffic, and either warning about or reacting to additional attacks on the service.
D
If it follows the analogy of other software, most attacks are fairly generic; usually software is compromised by casting a very wide net, and then one out of ten thousand servers will respond with something. There's also spear-phishing and more targeted things, and social engineering (social engineering is outside the scope of this), and there might be a very specific model that someone wants to attack, in which case the generic things won't help a lot. But that's the same with websites today. It's like...
D
I had a bunch of friends that worked on the Blogger platform (I guess it's still around) at Google, and they would come under DDoS all the time, because there were political blogs on there and there were state actors wanting to attack them. Their strategy was just to make things scale so much that the DDoSes weren't DDoSes anymore. They couldn't really prevent it, but they could make it strong enough, harden it enough, that it didn't actually matter.
D
So I'm sure there are analogies here as well. Although in this case it's not so much a protection of... I mean, there might be protections around availability, although generally models are fairly efficient at runtime execution (maybe it's more the training of things), but it's more the IP protection and defending against inference attacks. So, you know... I guess... sorry, you go.
A
No, I think that's actually the point. The point is that what we're going to find is that, from a legal perspective, governments are going to force us down a route that leads to having to capture more information, and that, if we're not going to then create a big privacy problem, we're actually going to have to ensure that the implementations of these solutions are built to an appropriate standard to mitigate the risks of that happening.
A
So we have to think ahead of all of this, and plan for the situation where we end up left trying to reconcile a set of poor decisions that are imposed on us in law, and make sure that we're not creating a technical vulnerability under those circumstances.
D
Like, for example, GDPR, whilst well intended, has had an unintended side effect of strengthening the position of Facebook and Google in the face of other, more diverse competitors. That certainly was not intended, but it's one of the effects it has had, because no one else can really afford to operate in certain markets because of it, and that actually makes things worse for privacy, because it drives more data into Facebook's and Google's hands than would have gone there before.
D
So that's one example of an unintended kind of consequence of well-meaning legislation. So I guess, Terry, you're saying that there could be a similar thing here, if governments... so.
A
What we're seeing is a large number of efforts to introduce...
D
That's a really good point, because so many people talk about explainability, and I don't know what they mean by it. Are they thinking there would be something like how a human would explain something? Because how a human explains something is never really the truth: you'd have to go back to how they learned it, who taught them, and no one ever tells the truth of how they really know something.
D
So there's this false assumption that you can't explain without giving away the whole chain of things. So yeah, I think that's a great example of...
A
Well, the only way to demonstrate that from a legal perspective is to actually record the gender and the sexual orientation and the religion and the race of everyone in the data set, and then show that your model is not discriminating against any of those factors when you run data through it. So you actually have to, by law, record information that you otherwise don't need, in order to prove that you're complying with the legislation that has been put in place. You actually make everything worse by trying to make it better.
D
Well, I think I was going to look at the last one, the intrinsic protection, because that sounds interesting. The middle one, the escalation of data categories, I could have a run at and then pass it around, and we can... or maybe we need to tag each other in the document and then flesh things out there, because it sounds like there's more discussion on that. But certainly the last one we could definitely knock out pretty quickly, and then we've just got this one.
D
But other than that, I think I'm pooped, so we'll call it a day. This is really good stuff. If we need to, next time we can dig more into the escalation of data categories, because I think this is an interesting topic. I did see in the news this week that in the UK, I think, they have banned police departments from using facial recognition in crowds, which is fascinating, like basically a blanket ban. Or is that incorrectly reported?
D
In the general case it's too tempting. People want that sci-fi future; you've seen the Hollywood movies where it picks someone from the crowd. But I did see that in the news, and yeah, I thought that was interesting.
D
All right, well, yeah, thanks everyone, and good conversation as always: some good notes, and some good stuff to look at next time. Then we can sort of go on to the final bit of the road mapping, filling in squares of when things can happen and what might be out there. All right, keep the ideas flowing, and chat next time.