Description
Fast Data on-Ramp with Apache Pulsar on K8 - Timothy Spann, StreamNative
As the Apache Pulsar community grows, more and more connectors will be added. To enhance the availability of sources and sinks and to make use of the greater Apache streaming community, joining forces between Apache NiFi and Apache Pulsar is a perfect fit.
Apache NiFi also adds the benefits of ELT, ETL, data crunching, transformation, validation, and batch data processing. Once data is ready to be an event, NiFi can launch it into Pulsar at light speed. I will walk through how to get started, cover some use cases and demos, answer questions, and discuss benefits to the ecosystem.
https://www.datainmotion.dev/
https://github.com/tspannhw
I have a little demo, but based on the timing, I'm not showing it. I'll be at booth S91 over by the cafe tomorrow and the next couple of days if you want to see some of this in the real world. Now, let's look at two different technologies working together in open source that help out with a lot of different workloads.
One of them is Apache NiFi. It is an open source project that I kind of think of as a Swiss Army knife: any time you're trying to get data from any type of place and you don't want to write any code, you could start off on one pod, scale it up to thousands, and it's very easy to use to get data, whether it's in a batch or a stream. Just get it started, get it into your data pipeline.
They run pretty fast, so we could do real-time apps or we could do batch apps. One of the nice features here is that no matter how much data you put into the message queue, we could scale to whatever level you need, because we can tier out automatically to whatever kind of storage you have, whether that's S3, some S3-compatible store, ADLS, the Hadoop file system, whatever it may be.
The cluster is pretty straightforward: you've got a number of brokers and the BookKeeper nodes for storage, and they scale up independently using anything that's standard on Kubernetes, or YARN, or wherever you may be running. And again, like I mentioned, this happens automatically. You don't have to know about it; once you have it configured, it'll just do it.
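As a sketch of what that deployment can look like, the official Apache Pulsar Helm chart lets you size brokers and bookies independently (the release name and replica counts here are illustrative; check the chart's values.yaml for the exact keys in your version):

```shell
# Add the Apache Pulsar chart repo and install a small cluster.
# broker.replicaCount and bookkeeper.replicaCount scale each tier on its own.
helm repo add apache https://pulsar.apache.org/charts
helm repo update
helm install pulsar apache/pulsar \
  --set broker.replicaCount=3 \
  --set bookkeeper.replicaCount=4
```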
You set policies that say: if data's over a certain age, or if data's over a certain size, you could have it automatically go out there, but you could still consume it as if it were in the regular local storage. That makes it very easy for you to have a messaging system that goes on forever, and if you need to start back from the beginning, you could do that.
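As a rough sketch of how that tiering is configured (the bucket name and threshold below are made up for illustration, and the exact property names vary a bit by Pulsar version), you point the brokers at an offload driver and then set a threshold per namespace:

```properties
# broker.conf (sketch): offload older ledger segments to an S3-compatible store
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=pulsar-offload-demo
s3ManagedLedgerOffloadRegion=us-east-1
```

Then something like `bin/pulsar-admin namespaces set-offload-threshold --size 10G public/default` tells Pulsar to start offloading a topic's older data once its local ledgers pass 10 GB; consumers keep reading the topic transparently, whether the data is local or tiered.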
We could also run functions within this architecture. Functions are kind of nice; it's kind of like having your own AWS Lambda that you run yourself in Kubernetes. What's cool with this is we support Java, Go, and Python with a very simple API: just deploy it, and we have all the Kubernetes operators and Helm charts you need to do it. The common use case internally: we use this in open source Apache Pulsar to do sources and sinks.
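To give a feel for how small that API is, here's a minimal Python function sketch (my own illustrative example, not one from the talk). In the simplest "native" form, a Pulsar Function is just a Python function: the argument is one message's payload, and the return value gets published to the configured output topic:

```python
# uppercase_fn.py -- a minimal native Pulsar Function (illustrative example).
# Deployed with something like (topic names are placeholders):
#   bin/pulsar-admin functions create --py uppercase_fn.py \
#       --classname uppercase_fn.process \
#       --inputs persistent://public/default/in \
#       --output persistent://public/default/out

def process(input):
    """Receive one message payload; the return value is published downstream."""
    return input.upper()
```

Pulsar also has a richer SDK form where you subclass the SDK's `Function` class and get a context object for logging, metrics, and publishing to arbitrary topics, but the native form above is the whole API for simple transforms.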
I didn't really tell you what Pulsar is; I've been going pretty fast here, trying to get everything in. But it's messaging and streaming, and they're not the same thing. Streaming is like Kafka and Kinesis: you want things in order, you're thinking about CDC, an event at a time, very fast, things like Flink. We operate that way if you want to. If you decide you want to do work queues or messaging, you send a message and don't care about order, don't care who gets it.
And we have a native connector that my friend and I worked on to make sure we can connect to NiFi. So now, if I get that data started in the system, you do very simple workflows in NiFi, which is nice, no coding, and then you've got your data in Pulsar. Just to show you how we do it: we have the open source operators and we've got a couple of managed ones, depending on how you want to run it. One of the really cool features I didn't mention is we can talk other messaging protocols.
So if you want to talk Kafka, we'll act as if we're Kafka. You want to do MQTT, you want to do RocketMQ, you want to do RabbitMQ: we can look like all of those and interoperate with any kind of messaging at the same time. So send a Pulsar message, pull it out as if it was Kafka, and mix and match as many as you want, at the same time.
If you don't want to run it yourself, you could run it in our cloud; we'll do it for you. We have a free tier you can start off with, and you can run it as if you just had Kafka that scales infinitely, and you could have a million topics without having to worry about brokers or anything like that; it runs all the Kafka stuff if you need to do that. I've got links to all my stuff here. I don't know how much time I have.
I know I'm going pretty quick here, but I only have a couple of minutes, so I'll just go on through quickly. I'll give all these slides out so you can get to all the links. If you want to see demos, examples, different things we work with, I'm in booth S91 the next couple of days; I'll show you some different demos: microservices, Spark, Flink.
We interoperate with a lot of different things, and as part of the Apache projects, you know, we work well with pretty much all of them, whether it's Kafka, Spark, what have you. I'll just show you some command line, because people like to see that, and a link to how we can autoscale our Pulsar Functions, which are our microservices in Kubernetes, just using some custom metrics; it's pretty straightforward.
Want some details on anything? Want better pictures? I don't know, want to see Batman and Robin more? I don't know. Yes?

[Audience question]
So it's really designed to do something. It could be routing, transformation, machine learning; we've got someone who implemented a SQL engine in there. It's really: something happens, do something. And what's nice is it's triggered by something going into a Pulsar topic, which, like I said, you could have millions of topics, broken down with multi-tenancy into tenants and namespaces.
If you needed a longer-running one, I'd probably say run Spark, or run Flink, or run a Google function or something else. I don't really want someone sitting in there for hours; one of these doesn't really make sense for that, and something like one of those other infrastructures would make more sense. This is really: an event comes in, and maybe I want to do real-time NLP on it, on one piece of data. One event, one log, that sort of thing. We tend to tell people: if you want to do joins, do it with Flink SQL; if you want to do ETL, Spark. We don't want to do everything in the world; we do enough with messaging and streaming. But we needed these functions, so we opened it up for all the infrastructure to use. Now we might have more questions.
[Audience question]
Yeah, there's a couple of different cloud companies in China that have created in-memory data warehouses with Flink and Pulsar together, and those are in the hundred-petabyte range, and it's fast enough that it's used for, you know, Singles' Day in China; it's part of that infrastructure, so real-time transactions, pretty powerful. I don't know if I'd use it for scientific computing, but you know Flink can do a lot in memory. We could run as much as you need in memory, and you know what's nice, too, is once it's in Pulsar.
Since, especially with the tiered storage, maybe I'll keep 50 terabytes in recent local BookKeeper storage and then put the rest, the other 500 petabytes, in S3 storage. And then I can look back and I can rerun everything that's ever happened for topics, in order, and I don't have to do any special code for that.
I could just point to the earliest offset and do that, and I could do that with the native client drivers, which support like the top 16 languages out there, or I could do that with Spark or Flink; we're first-class connectors for both of those projects. Pretty straightforward to do.
This is a typical app that I do. I have some app doing something that gets data into Pulsar, then have a function do something. For mine, it's breaking data up: I get data I pull out of a couple of different REST sources, clean it up based on where it should go, reformat it, put it into a couple of different topics, and then, as those events pop up, a Spark ETL grabs a batch of them and drops them into a table, and then I've got Flink SQL running continuously.
So as events come in, it's updating its SQL results, and that SQL can go into another topic. It can go into, you know, a file system. It can go into something like HBase or Kudu, or any of the data stores that Flink supports. So it's a nice way. This is a toy application I wrote; I know there's a lot of different areas, but each section is very simple. The client libraries are pretty straightforward, whether you're doing Java, Python, Go, Rust, Kotlin, or Scala.
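The "clean it up based on where it should go" step in that toy app can be sketched in a bit of plain Python (the field names and topic names below are invented for illustration; the real app's schemas aren't shown in the talk). Inside a Pulsar Function you'd pair this routing decision with the SDK context's publish call; keeping the decision itself a pure function makes it easy to unit test:

```python
# Sketch of a clean-and-route step for REST payloads (illustrative names).
import json

# Map a payload's "source" field to a per-destination Pulsar topic.
ROUTES = {
    "weather": "persistent://public/default/weather",
    "energy": "persistent://public/default/energy",
}
DEFAULT_TOPIC = "persistent://public/default/unmatched"

def route(raw: str) -> tuple:
    """Parse one REST payload, normalize it, and pick an output topic."""
    record = json.loads(raw)
    source = record.get("source", "").lower()
    topic = ROUTES.get(source, DEFAULT_TOPIC)
    # Reformat: keep only the fields the downstream Spark/Flink jobs expect.
    cleaned = json.dumps({"source": source, "value": record.get("value")})
    return topic, cleaned
```

A Spark ETL job or Flink SQL query then just subscribes to each destination topic; the routing logic never needs to know who consumes it.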