From YouTube: An Introduction to Fluentd
Description
Here you will learn
1) Why logs are important
2) The challenges of collecting and consuming logs
3) How Fluentd works - and solves those challenges
4) How to configure Fluentd
In this video, I want to give you a short introduction to Fluentd, an open source log data collector. To understand what that actually means, I will first explain why we actually need logs, then the challenges of collecting and consuming application logs, how Fluentd works and how it solves those challenges, and finally how to configure Fluentd as a user.
Let's say we have a microservices application deployed in a Kubernetes cluster: two applications in Node.js, a couple of Python applications, maybe databases, a message broker and other services. All these applications talk to each other and produce log data, so each of these services is logging information about what the application is doing.

Now, what is some of the information these applications are logging, and why do we need this log data?
This may be compliance data, for example, if you're required to log some specific information, depending on your industry, in order to be compliant. It could be for your application security, for example to detect suspicious requests in your application by logging all access attempts with IP address, user ID, etc., or to log who is accessing what and when. And an obvious use for log data is debugging your application when there is an error, by analyzing all the application logs to find the cause.
However, as you can imagine, it's difficult to analyze loads of data in raw log files, so it's not really fit for human consumption. And without a user interface or visualization for this data, how do you analyze logs properly, especially across applications? By checking each application's log file and trying to match up similar timestamps across applications? Also, logs coming from different applications will be in different formats, with different timestamps, log levels, etc.

Another option could be to log directly into a log database like Elasticsearch, for example, to then have a visualization of this data.
However, in this case, each application developer must add a library for Elasticsearch, configure it to connect to Elasticsearch and send those logs, and also configure the proper format. So again, there are some challenges with this option as well.

Now, what about the third-party applications in your cluster, like the databases and the message broker? Also, in Kubernetes, requests go through the nginx ingress controller. So what if you want to see those logs too? Or what about system logs? You can't control how they look. So how do you collect logs from all these different data sources?
All of these are challenges of collecting and consuming logs in complex applications with tons of useful data: you have loads of data which you can't really consume and analyze, because you don't have it all in one place in a unified format to be able to visualize it properly. So lots of valuable data is essentially wasted. What is a good solution to that challenge?
A technology that lets you collect all the data, regardless of where it comes from, transform it into a unified format and bring it all into one place, so that you can then use that data again for compliance, debugging, etc. And that's exactly what Fluentd does. Fluentd also does this reliably, meaning that a network outage or a data spike shouldn't mess up the data collection; Fluentd handles such cases as well. So how does Fluentd work, and how does it do all of this?
Fluentd collects log data from all the data sources in your cluster: it can be your own applications, third-party applications, all of it. Now, these logs that Fluentd collected will be of different forms and formats, like JSON format, nginx format, maybe some custom format, and so on. So Fluentd will process them and reformat them in a uniform way. On top of that, you can enrich your data with Fluentd, so you can add additional information to each log entry, like pod name, namespace, container name and so on.
So, for example, you can later group logs of the same pod or logs of the same namespace, or you can even modify the data in a log entry. So now you're streaming your logs from all the applications in one unified format through Fluentd.
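
As a rough sketch of what such enrichment can look like in a Fluentd configuration, the snippet below uses the kubernetes_metadata filter plugin (from the fluent-plugin-kubernetes_metadata_filter gem) to attach pod name, namespace and container name to every record, plus a record_transformer filter to add a custom field. The kubernetes.** tag and the cluster_name field are illustrative assumptions, not something the video prescribes.

```
# Hypothetical enrichment stage; assumes logs are tagged kubernetes.*
# and that fluent-plugin-kubernetes_metadata_filter is installed.
<filter kubernetes.**>
  @type kubernetes_metadata        # adds pod name, namespace, container name, labels, ...
</filter>

<filter kubernetes.**>
  @type record_transformer         # built-in filter for adding or modifying fields
  <record>
    cluster_name demo-cluster      # example of extra, custom metadata
  </record>
</filter>
```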
What happens to these logs after Fluentd processes them? Well, in most cases the goal is obviously to visualize them nicely, so we can do some analysis on them. Fluentd can send these logs to any destination you want: this could be Elasticsearch, MongoDB, S3, Kafka, etc.
In addition, you can very easily configure that routing in Fluentd, which is a great thing about Fluentd, because it gives you a lot of flexibility compared to alternative tools: you can send any data from any data source to any destination or storage. This flexibility also comes from the fact that Fluentd is not tied to any particular backend, so you have a wide choice of such destination targets without vendor lock-in when using Fluentd.
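
To give a feel for how that routing is configured, here is a minimal, hypothetical sketch of two match sections in a Fluentd configuration file. The tags (app.**, nginx.**), the Elasticsearch host and the S3 bucket name are all assumptions, and the elasticsearch and s3 output plugins have to be installed separately.

```
# Hypothetical routing: application logs go to Elasticsearch,
# nginx ingress logs are archived in S3.
<match app.**>
  @type elasticsearch
  host elasticsearch.logging.svc   # assumed in-cluster Elasticsearch service
  port 9200
  logstash_format true             # write daily logstash-YYYY.MM.DD indices
</match>

<match nginx.**>
  @type s3
  s3_bucket demo-log-archive       # assumed bucket; credentials e.g. via an IAM role
  s3_region us-east-1
  path nginx/
</match>
```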
Now you're probably wondering what you, as a Fluentd user, need to configure and how you can actually use Fluentd. First, you must install Fluentd in Kubernetes as a DaemonSet. A DaemonSet is a component that runs on each Kubernetes node, so if you have five nodes, they will all have a Fluentd pod running on them. You then configure Fluentd using a Fluentd configuration file.
In that file you first define the data sources: these are all the applications from which Fluentd will start collecting the logs. So first, you configure which application logs you want Fluentd to collect.
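
As a minimal sketch of what such a source section can look like, the snippet below uses the built-in tail input to follow the container log files on each node; the paths, the tag and the JSON parse assumption are based on a typical Kubernetes setup, not taken from the video.

```
# Hypothetical source: tail the container log files on each node.
<source>
  @type tail
  path /var/log/containers/*.log          # typical container log location on a node
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*                        # tag used later to route these logs
  read_from_head true
  <parse>
    @type json                            # assumes the container runtime writes JSON lines
  </parse>
</source>
```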
Second, you configure how these data entries will be processed line by line: you parse each log into individual key-value pairs, so you have log level, message, date, user ID, IP address, etc. And you do that in Fluentd using parsers.
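
As an illustration, here is a hedged sketch of a parser applied as a filter: it takes the raw message field of a hypothetical application log and splits it into key-value pairs with a regular expression. The tag, the field names and the assumed log line layout are all illustrative, not something prescribed by Fluentd.

```
# Hypothetical parser: split a plain-text log line into key-value pairs.
# Assumed line layout: "<time> <log_level> <user_id> <ip_address> <message>"
<filter app.access>
  @type parser
  key_name message                        # field that holds the raw log line
  <parse>
    @type regexp
    expression /^(?<time>[^ ]+) (?<log_level>\w+) (?<user_id>\d+) (?<ip_address>[^ ]+) (?<message>.*)$/
    time_key time
    time_format %Y-%m-%dT%H:%M:%S
  </parse>
</filter>
```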
What can also happen is that the backend, the output target, is not accessible; it can happen that Elasticsearch is down or MongoDB isn't reachable. In that case, Fluentd will handle it by automatically retrying to send the logs until that endpoint becomes available again. And in addition to that, you can also cluster your Fluentd setup to make it even more performant and highly available.
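
That retry behaviour is driven by Fluentd's buffering. As a rough sketch, an output section can be given a file buffer and retry settings like the ones below; the tag, host and paths are again just illustrative assumptions.

```
# Hypothetical output with a file buffer, so logs survive restarts
# and are retried until Elasticsearch is reachable again.
<match app.**>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  <buffer>
    @type file
    path /var/log/fluentd-buffers/app     # buffered chunks are persisted here
    flush_interval 10s                    # how often chunks are flushed to the output
    retry_forever true                    # keep retrying until the endpoint is back
    retry_max_interval 30                 # cap the exponential backoff at 30 seconds
  </buffer>
</match>
```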
I should mention here that this is just one of the use cases of Fluentd, namely logging in Kubernetes. However, logging is a very important topic in IoT applications too, or in non-containerized applications running on bare-metal servers, for example, and many projects are using Fluentd for those use cases as well. So Fluentd can be used in many different environments.