From YouTube: Kubernetes & Data Engineering - Salvador Cruz, Bonzzu
Description
Why has Kubernetes become so important in the field of data engineering? It is well known that containers have become popular because of their portability, and that Kubernetes is a robust and scalable tool. Let's discover how K8s can be useful in a data lake or lakehouse.
Contact Salvador at:
- https://www.linkedin.com/in/salvador-cruz-0840aaa3
- https://twitter.com/salva_sgcg
A
Good afternoon, everyone. It is a pleasure to have you still tuned in to the professionals edition of Kubernetes Community Day. I have the pleasure of introducing our guest speaker for this session, Salvador Cruz. Nice to meet you. The time is all yours. Thank you very much.
B
Well, thanks for the introduction. Today I'm going to talk a little about Kubernetes and data engineering, above all because Kubernetes is a very powerful tool that we can use today. We are going to see how we can take advantage of it on the big data side. Let's review the agenda.
B
This is what I am going to be talking about: why use containers for data processing. Likewise, we are going to review why Kubernetes has grown in popularity lately in this field. We are going to see some big data practices; this part is largely based on the experience I have had working on projects with clients and users. We are also going to see some of the tools that can be quite useful.
B
These are just some of the whole range that we have, but I think they are quite useful. We will conclude with some recommendations about what can be done in Kubernetes and which approach best suits your project, and then wrap up the topics in general. Now, let's start with why containers. Let's start by mentioning that containers offer us standardization, as you well know.
B
A container is an isolated environment in which we can have our libraries and packages. Containers also help us make migrations in a simple way. If we want, for example, to run this container on a machine that has Linux or on a machine that has Windows, it will be practically the same, because it offers us that standard of use; that is, we do not have to do any additional configuration or anything of the sort. That is why containers are so popular today.
B
They help us with many tasks of this type, especially for the applications we want to develop now. They also make it easier to support repetitive tasks and jobs. Speaking specifically about data engineering, this focus on repetitive tasks matters because, as you know, in the world of big data
B
tools, temporary compute is used, which means we will perhaps be autoscaling incrementally according to the large amount of data we are working with. We are also going to see why containers offer us better support, especially for a microservices-oriented architecture in the case of Kubernetes, since we can have the microservices for our data platform configured in different ways.
B
Let's say streaming, data processing, analysis, and machine learning models are all clearly separated, and I think that is a better way to manage the whole application. For example, applications used to be monolithic, so maintaining them was quite complex because we had to modify all the dependencies and the entire system. Microservices have helped us keep all this administration more separated, which
B
has brought us many advantages. Now we are going to talk about why Kubernetes has become so popular in the data field. First of all, there is container orchestration. As you well know, before Kubernetes it was a little more difficult to do that configuration and administration, because many things had to be done manually and the connections between the containers had to be ensured, so Kubernetes helps make this a little easier. Speaking specifically about data engineering,
B
this is a very important point, since orchestration will always be involved; that is, we will almost always be talking about temporary compute. We also have the advantage of a declarative definition. This means that, with simple templates, we will be able to configure the whole application and everything we need for our data pipeline, whether we want to allocate memory or register the service. This is a great advantage because we only have to work with the templates.
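As a rough illustration of the declarative definition described here, the sketch below builds a Deployment as plain data and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The name `etl-worker`, the image `example.org/etl-worker:1.0`, and the helper `deployment_manifest` are hypothetical and were not mentioned in the talk.

```python
import json


def deployment_manifest(name, image, replicas, memory):
    """Build a Kubernetes Deployment (apps/v1) as plain data."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Desired number of pods; the cluster works to keep this true.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Memory allocation is declared, not scripted.
                        "resources": {
                            "requests": {"memory": memory},
                            "limits": {"memory": memory},
                        },
                    }]
                },
            },
        },
    }


# Hypothetical values, chosen only for illustration.
manifest = deployment_manifest("etl-worker", "example.org/etl-worker:1.0", 3, "512Mi")
print(json.dumps(manifest, indent=2))
```

Because the whole definition is data, it can live in a repository and be reviewed like any other change.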
B
We can keep everything together, perhaps in a repository, and work collaboratively; that is one of the advantages. Kubernetes also helps us a lot in maintaining the health of the execution layer. This means that we will always have a desired state. If we want a certain number of Kubernetes pods, it will look for a way to always have that availability for the service being requested and, as you well know, it also has the capacity to take action.
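The desired-state idea can be sketched as a toy reconciliation step. This is not how the Kubernetes controllers are actually implemented; it only illustrates, under simplified assumptions, comparing the observed pods with the requested count and deciding what to do. The pod dictionaries and the `reconcile` function are hypothetical.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move from the observed state to the desired state."""
    healthy = [p for p in running_pods if p["healthy"]]
    # Unhealthy pods are always replaced.
    actions = [("delete", p["name"]) for p in running_pods if not p["healthy"]]
    missing = desired_replicas - len(healthy)
    if missing > 0:
        # Too few healthy pods: create the difference.
        actions += [("create", None)] * missing
    elif missing < 0:
        # Too many healthy pods: remove the surplus.
        actions += [("delete", p["name"]) for p in healthy[:-missing]]
    return actions


observed = [{"name": "a", "healthy": True}, {"name": "b", "healthy": False}]
print(reconcile(3, observed))
```

Running the same step repeatedly is what keeps the service available when a pod fails.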
B
Also, something that complements Kubernetes is autoscaling, which works impressively well. As the incoming information grows, it can help ensure that everything we need keeps working correctly; it will scale depending on the configuration we provide.
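For reference, the Horizontal Pod Autoscaler documentation describes its core calculation as the current replica count multiplied by the ratio of the current metric to the target metric, rounded up. The sketch below applies that formula with minimum and maximum bounds; it leaves out the tolerance and stabilization behaviour of the real autoscaler, and the numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Replica count suggested by the documented HPA ratio, clamped to the bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods at 90% average CPU with a 60% target.
print(desired_replicas(4, 90, 60, min_replicas=1, max_replicas=10))
```

The upper bound is the part of the configuration that keeps growth in data from turning into unbounded growth in pods.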
We also have parity between environments, because sometimes there are many differences between the development environment and the production environment.
B
This is where using templates helps, because it gives us the parity we need to run precisely the tests we normally need, in an environment that is closer to production. We are also going to have faster iterations, in the sense that we will be working with the code and with the Kubernetes configurations, and we can work together with other teams.
B
We can do the CI/CD part, and in this case it would be something similar to DevOps, but more focused on data: DataOps. DataOps is not only DevOps applied to data; it has a lot to do with automating the data pipelines you have on your platform. Another advantage of using Kubernetes, and surely one reason it has become so popular, is that we can use Helm. As you know,
B
Well, Helm is a tool that helps us deploy faster, because it has pre-configured applications that only require us to pass parameters for a minimal configuration, and with that we will have applications running in a short time.
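The idea of a pre-configured application plus a few parameters can be sketched as a merge of user-supplied values over a chart's defaults, which is roughly how Helm combines a chart's `values.yaml` with overrides. The chart defaults, the image name, and the `merge_values` helper below are hypothetical; this is an illustration of the idea, not Helm's own code.

```python
def merge_values(defaults, overrides):
    """Recursively lay user overrides on top of chart defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested maps are merged key by key rather than replaced.
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults shipped with a chart.
chart_defaults = {
    "replicaCount": 1,
    "image": {"repository": "example.org/kafka", "tag": "1.0"},
    "persistence": {"enabled": True, "size": "8Gi"},
}
# The minimal configuration a user passes in.
user_values = {"replicaCount": 3, "persistence": {"size": "50Gi"}}

values = merge_values(chart_defaults, user_values)
print(values)
```

Only two parameters were supplied; everything else comes from the pre-configured defaults.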
Now, among the best practices for both big data and Kubernetes that I can recommend, the first is to keep the images small, especially because, when we are building Docker or container images to get a container,
B
sometimes we install libraries that we perhaps do not use, so we should try to build the containers with only the libraries we need. Also, keep things isolated if possible. As everyone knows, we can have a pod in Kubernetes, and that pod can have more than one container, but sometimes, because of configuration issues or other reasons, that can make the whole administration more difficult.
B
So the recommendation is one container per pod in this case, but only if possible; some services require more than one container to function, so it depends on the use case. We must also verify the base images, especially since they are what we will be using as the root, the pillars of our project, and we have to make sure they are reliable and sufficiently secure, among other things.
B
Next, we should also use namespaces and labels. This is a practice that is also well established in Kubernetes. We can use them to separate the services and quickly identify the applications, so it is something I strongly recommend.
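A small sketch of why namespaces and labels make applications easy to find: selection is an equality match on the namespace plus every key in the selector, which is the basic form of a Kubernetes label selector. The resource names, namespaces, and label keys below are hypothetical.

```python
def select(resources, namespace, selector):
    """Names of resources in a namespace whose labels match every selector pair."""
    return [
        r["name"]
        for r in resources
        if r["namespace"] == namespace
        and all(r["labels"].get(k) == v for k, v in selector.items())
    ]


# Hypothetical workloads on a data platform.
resources = [
    {"name": "kafka-0", "namespace": "streaming", "labels": {"app": "kafka", "tier": "broker"}},
    {"name": "spark-driver", "namespace": "processing", "labels": {"app": "spark", "tier": "driver"}},
    {"name": "spark-exec-1", "namespace": "processing", "labels": {"app": "spark", "tier": "executor"}},
]

print(select(resources, "processing", {"app": "spark"}))
```

With `kubectl`, the equivalent lookup would be along the lines of `kubectl get pods -n processing -l app=spark`.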
As for the containers themselves, it is advisable not to use the root user, above all for security reasons. You only have to grant the necessary permissions on the folders and executables the user needs access to.
B
But beyond that, I would not recommend using root, because it is a very open permission.
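One way to express the non-root recommendation in a pod definition is through the `securityContext` fields of the Pod API. The sketch below builds such a pod as plain data; the pod name, the image, and the user id 1000 are hypothetical, and the image itself would still need files owned by or readable to that user.

```python
import json


def non_root_pod(name, image, uid=1000):
    """Pod definition that refuses to run its container as root."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level settings apply to every container in the pod.
            "securityContext": {
                "runAsNonRoot": True,
                "runAsUser": uid,
                "runAsGroup": uid,
                "fsGroup": uid,
            },
            "containers": [{
                "name": name,
                "image": image,
                # Container-level hardening.
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                },
            }],
        },
    }


pod = non_root_pod("spark-job", "example.org/spark-job:1.0")
print(json.dumps(pod, indent=2))
```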
You should also use Services to expose all the containers we have running within Kubernetes. They help us identify them quickly, say through a port or a URL, depending on how we want to call the service. It is very important to always configure the Service; it can be public or private.
B
This depends on your design, but you have to remember to always identify the services in that way.
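The public versus private choice can be sketched with the Service `type` field: `ClusterIP` keeps the service reachable only inside the cluster, while `LoadBalancer` asks the platform for an external address. The service name, label, namespace, and ports below are hypothetical, and the DNS helper assumes the default `cluster.local` cluster domain.

```python
def service_manifest(name, app_label, port, target_port, public=False):
    """Service that routes traffic to pods carrying the given app label."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # ClusterIP is internal only; LoadBalancer exposes it externally.
            "type": "LoadBalancer" if public else "ClusterIP",
            "selector": {"app": app_label},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def internal_dns_name(service, namespace):
    """In-cluster DNS name, assuming the default cluster domain."""
    return f"{service}.{namespace}.svc.cluster.local"


private_svc = service_manifest("kafka", "kafka", 9092, 9092)
public_svc = service_manifest("dashboard", "dashboard", 80, 8080, public=True)
print(private_svc["spec"]["type"], internal_dns_name("kafka", "streaming"))
```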
On this topic, one point focused on the big data side: it is recommended to run everything related to HDFS on a separate node. HDFS, well,
B
is the distributed file system, so, as you know, all the MapReduce tasks are done there. It is important to note that if it has to communicate with other nodes, that adds latency. So it would be advisable to run everything in a single place, and you will get somewhat better performance.
B
If TensorFlow is used, it will make it easier to train the models for the data science team, or perhaps the data engineering team, depending on the use they want to give it. We also have a workflow tool that is one of the tools I have seen the most; it is used a lot for job scheduling, so I think this tool is very important in a data platform.
B
It is quite focused on flows, so you can get a lot of use out of it, and it will help us, above all, to automate the processes so that what we do is faster. We also have another tool that is dedicated to the whole machine learning part. Basically, we have the raw data, we prepare the data, then we do the transformations, and we do the training.
A
B
This can work even for proofs of concept, when you want to run a very quick test, so you can check out all those tools. From my experience, I have used several of them a lot, for example Kafka and Spark. They are quite useful; for example, I used them with Helm, and it helped me a lot to set up configurations super fast. So the truth is that Kubernetes offers us many benefits.
B
But perhaps if we are working with little data, it may not be necessary. We will have to look at how available these tools need to be. Take Kafka, for example: if I want to configure Kafka to run streams 24/7, it can be a good solution, but if it is a streaming service that works only a few hours a day, then maybe I should analyze or rethink what I am dealing with. In this case, we have to look at how frequently the jobs run to know
B
whether Kubernetes really is the best solution and, above all, calculate the costs and look at autoscaling to see whether Kubernetes supports all the data load we need. I am sure it will, but as autoscaling goes on, the cost also increases.
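A back-of-the-envelope comparison makes the point about job frequency and cost concrete. The node count, the hourly rate of 10 cents per node, and the 30-day month are all hypothetical numbers chosen for the arithmetic; real pricing depends on the provider and instance type.

```python
def monthly_cost_cents(nodes, hours_per_day, rate_cents_per_node_hour, days=30):
    """Compute cost as node-hours times an hourly rate, in whole cents."""
    return nodes * hours_per_day * days * rate_cents_per_node_hour


# Hypothetical figures: 3 nodes at 10 cents per node-hour.
always_on = monthly_cost_cents(nodes=3, hours_per_day=24, rate_cents_per_node_hour=10)
four_hours = monthly_cost_cents(nodes=3, hours_per_day=4, rate_cents_per_node_hour=10)

print(always_on / 100, four_hours / 100)
```

With these made-up figures, a workload that runs four hours a day uses one sixth of the node-hours of an always-on cluster, which is why the execution frequency is worth checking before committing to a permanent setup.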
B
A
B
I think it depends a lot on how much you have implemented, but I think Kubernetes was born precisely from that need. There were configurations that were too manual and had to be done by those configuring and maintaining the system, so Kubernetes does come to solve those needs in an automatic way. So yes, it may be that it can easily replace it.
A
B
I think it depends a lot on where you have all the infrastructure configured. The advantage of Kubernetes is that it is agnostic; that is, I can move from one cloud provider to another, because we have a clear, declarative definition. The only thing we need to have our cluster is simply the templates, and it works the same way on one provider or another, so I think it makes no difference.
B
A
B
DataOps, basically, and as I already mentioned, is not only DevOps applied to data; I think there is a bit of a misconception about that. What we want to ensure in a DataOps process is the automation of the pipeline while, at the same time, making sure the quality of our data is not lost. So we have to take care of two things, instead of just automating the whole process.
B
Here we have to concentrate on the code and, apart from that, on the data. That is the aspect where the data enters the cycle. It is not just an iteration like the one we know from DevOps; it is more like, let's say, an end-to-end flow. We start in the part where the programmer, or the data engineer, only cares about the code.
B
We make sure of the quality of the data and that the code is working with the appropriate data. In the end, where the automation ends is where we deliver the output to the data scientists for subsequent analysis, machine learning, or something like that, and there a different flow would start, because different teams are the ones who work on it.
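The two concerns described here, automating the pipeline and protecting data quality, can be sketched as a pipeline step that refuses to run on rows that fail a check. The field names, the sample rows, and both helper functions are hypothetical; real pipelines would usually rely on a dedicated validation tool.

```python
def quality_report(rows, required_fields):
    """List (row index, field) pairs where a required value is missing."""
    issues = []
    for index, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                issues.append((index, field))
    return issues


def run_step(rows, required_fields, transform):
    """Apply the transformation only if the input passes the quality gate."""
    issues = quality_report(rows, required_fields)
    if issues:
        raise ValueError(f"data quality check failed: {issues}")
    return [transform(row) for row in rows]


clean = [{"id": 1, "amount": 10}, {"id": 2, "amount": 0}]
print(run_step(clean, ["id", "amount"], lambda row: row["amount"] * 2))
```

Failing the step early keeps bad data from reaching the teams that consume the pipeline's output.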
B
I think the main thing is to start playing with Docker. Containers are the foundation of Kubernetes, so I think we have to start by looking at microservices and how containers work, and understand that a container is immutable, which sometimes requires us to design them to be ephemeral.
B
Also, look at what you want to do, because we can use microservices for fairly standard applications or for data applications, and I think those are different things. So I would go to the basics: microservices, containers, Kubernetes configuration, and the templates. The truth is that this has helped me a lot, so those are my general recommendations.
A
B
I just want to thank you for the time you gave me to share these ideas. The truth is that I am very passionate about data engineering, although I am currently working in development. These are disciplines that it is time to combine. We can do very cool things; we just have to see how to configure them, how to automate as much as possible, and then ensure the quality of the product being delivered.
A
Definitely. Many thanks, and thank you also to our viewers. Just a reminder that the next talk is on another track, which you can see in the agenda section, and that we have a book raffle: if you post the hashtags on Twitter, KCD Guatemala among them, you can take part in the raffle. Without further ado, thank you very much, Salvador, and thank you very much to our viewers. Until next time. Thank you.