From YouTube: Clickhouse proof of concept for applied ML
A: Hello and welcome to this video. Today we're going to be talking about our journey to create services that talk to ClickHouse, getting us up and running in Kubernetes. My name is Stephen Brainer; I'm a senior back-end engineer with the Applied ML team.
A: The goal was an end-to-end flow: Python and Golang services writing into ClickHouse, with the results visualized in other software (we use Grafana for this), so we could get an end-to-end picture of how the whole ClickHouse process would work when deployed in Kubernetes in a simplified fashion, and then export what we learned to our existing applications.

Problem statement: we need to move some of our data off Postgres, because we wanted better throughput for our analytics.
A: Postgres is a great database. It's wonderful for CRUD-based operations and your standard OLTP workloads; online transactional processing is what that acronym stands for, if you use those terminologies. But we want to move towards more of an analytics database that we could use for getting distributions of model performance, doing analytics, and a series of other things.
A: So for that we decided to use ClickHouse. ClickHouse has some wonderful benefits in that it is a column-oriented database instead of a row-oriented database, which allows for really, really fast processing of analytics and ad hoc queries.
A
It
has
an
incredibly
rich
table
engine
which
we
will
not
be
covering
in
depth
in
this
meeting,
which
allows
you
to
be
incredibly
flexible
with
how
you
process
your
queries
in
that
way,
it's
very
good
for
that.
It
is
not
good
as
a
transactional
processing
database
similar
to
postpress.
So
if
you
have
to
do
a
bunch
of
updates
click,
us
is
not
your
jam,
wouldn't
recommend
it
for
that,
there's
also
a
series
of
really
well
supported
baked
in
functions.
B: So how did we actually iterate? Well, before anything else, we did some research.
A: Yeah, so the first step was spinning up a ClickHouse instance on Kubernetes. We leveraged the Altinity Kubernetes operator for this, which makes the whole process significantly easier. Instead of having to maintain all of the Kubernetes manifests yourself, it provides you a templated system which allows you to spin it up. I would highly recommend that anyone spinning up ClickHouse on Kubernetes use that operator, as opposed to most any other approach.
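For reference, the operator turns the whole deployment into one small custom resource that it expands into StatefulSets, Services, and ConfigMaps. A minimal sketch (the resource name and namespace here are hypothetical, not values from the talk or the repo):

```yaml
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "poc-demo"          # hypothetical name
  namespace: "clickhouse"   # hypothetical namespace
spec:
  configuration:
    clusters:
      - name: "demo"
        layout:
          shardsCount: 1    # single-node layout for a PoC
          replicasCount: 1
```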
A: We tried a number of different database engines and table engines to create a database of dogs that tracked things like breed, height, and weight, and then a service that generated random data and just spammed ClickHouse to see if we could hit caps for inserts, which we were able to do; more on that in the documentation attached to the repository. Once we finished that, we were able to visualize all the data in Grafana, using the ClickHouse plugin and a pretty rudimentary knowledge of SQL.
C: This is an example dashboard; we're going to come back and talk through how it got here. Let's start with our Kubernetes manifest. This is the Kubernetes manifest that defines the Grafana dashboard and the underlying tools you need. There's a ConfigMap to define all the plugins we need.
C
We
are,
of
course,
leveraging
click
house
in
this
demo,
so
the
click
house
plugin,
is
in
here
that
dashboard,
you
saw
we're
going
to
see
again
in
a
moment,
is
related
to
click
house
data
quality
testing.
There's
a
persistent
volume
claim
which
allows
you
to
store
data
for
your
grafana
dashboards
super
useful.
If
you
want
to
ever
visualize
it
again.
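The PersistentVolumeClaim behind Grafana's storage is a small piece of YAML; a generic sketch (the name and size are placeholders, not the repo's actual values):

```yaml
# PVC backing Grafana's data directory, so dashboards survive pod restarts.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-storage   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi        # placeholder size
```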
C: Lifecycle failure probes and resource limits are currently not fully set. And this is the Service that allows us to visualize and look at the dashboard itself. You can get this set up on your local machine if you connect to a cluster. You'll notice here I am connected to our GKE recommender-viewer cluster, and if I port-forward in the appropriate namespace to port 3000, I can now go over here, open localhost:3000, and get Grafana.
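Assuming the Grafana Service is named `grafana` and listens on 3000 (the names here are guesses, not taken from the repo), that port-forward step looks like:

```shell
# Tunnel local port 3000 to the Grafana service in its namespace.
kubectl port-forward --namespace <grafana-namespace> svc/grafana 3000:3000
# Now http://localhost:3000 serves the Grafana UI.
```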
C
The
data
analysis
dashboard
that
I
mentioned
earlier
is
probably
probably
on
the
front
page.
You
can
jump
into
it
and
just
scroll
through
it
and
take
a
look
at
our
digital
quality
checks
and
a
bunch
of
data
checks
that
this
actually
came
with
the
click
house
plugin
for
grafana.
C: The plugin ecosystem will actually come with default dashboards. If you just want to explore ClickHouse directly via SQL, you can do that by going through the Explore tab on the left; it jumps in here. I like the SQL editor, but, pardon me, you can use the query builder if you are not a SQL-happy person.
C
Check
find
all
of
our
tables,
we
go
and
here's
a
little
demo
table
that
we
have
that.
We
talked
about
later
that
we
talked
about
in
a
different
video.
There
you
go.
That
is
everything
about
how
to
set
up
and
where
to
find
all
the
information
about
the
plc
demo
for
click
house
and
our
usage
of
grafana.
B: Besides that, we also added persistent volumes, because you don't want the pods destroyed and, with that, the whole database destroyed. So don't forget to add persistent volumes for your data. And since we have both Python and Golang in our repositories, we also decided to copy the toy service, rewriting the Python in Go.
B
So
the
standard
way
using
this
function,
but
we
could
also
use
sql
open,
just
create
a
standard
sqli
connection,
which
is
quite
nice
because
first
is
standard
library
and
second,
because
if
we
ever
want
to
replace
click
house
in
the
future,
we
don't
have
to
replace
the
whole
code
base.
We
can
just
replace
this
part
and
the
other
bits
that
depend
on
the
this
sql
connection
can
pretty
much
stay,
as
is.
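The point of going through sql.Open is that application code only touches the generic database/sql interfaces, so the driver is the only ClickHouse-specific piece. A sketch of that shape (the table and column names reuse the toy example, the DSN is a placeholder, and a real run needs a registered ClickHouse driver such as clickhouse-go):

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// insertQuery builds a parameterized INSERT; it knows nothing about
// which database sits behind the *sql.DB handle.
func insertQuery(table string, cols []string) string {
	placeholders := strings.TrimSuffix(strings.Repeat("?, ", len(cols)), ", ")
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		table, strings.Join(cols, ", "), placeholders)
}

// insertDog goes through the generic interface only; swapping ClickHouse
// out later means changing the sql.Open call, not this code.
func insertDog(db *sql.DB, breed string, heightCm, weightKg float64) error {
	_, err := db.Exec(insertQuery("dogs", []string{"breed", "height_cm", "weight_kg"}),
		breed, heightCm, weightKg)
	return err
}

func main() {
	// With a ClickHouse driver imported for its side effects, opening the
	// connection would look something like (DSN is a placeholder):
	//   db, err := sql.Open("clickhouse", "clickhouse://localhost:9000/default")
	fmt.Println(insertQuery("dogs", []string{"breed", "height_cm", "weight_kg"}))
}
```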
B: So this is how we insert. Now, one risk here is that these parameters have to be in the right order.
B: One thing that's good to know with this advanced practice: we can use the ch struct tag to define how the fields will be named in the actual insert. Without it, this would be saved as capital D, capital S, capital L, but that is not how we want to save our database entries; instead, we want to use snake case. All we have to do is provide this ch tag, and it will be saved in the right format.
A: Related to the persistent volume problem, there is one other caveat when you're working with ClickHouse: if, as we did, you set up your table engine as an in-memory table, it will not be persisted to disk, and your persistent volume will not help you. We made that mistake so you don't have to; now you know the pitfall.
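In table-engine terms, the difference is just the ENGINE clause in the DDL (the table and columns here reuse the toy example, not the real schema):

```sql
-- Memory engine: rows live only in RAM; a pod restart loses them
-- even with a persistent volume attached.
CREATE TABLE dogs_mem (breed String, height_cm Float64) ENGINE = Memory;

-- MergeTree: rows are written to disk under the ClickHouse data path,
-- which is what the persistent volume actually preserves.
CREATE TABLE dogs (breed String, height_cm Float64)
ENGINE = MergeTree ORDER BY breed;
```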
B: All right, so this is pretty much how far we got with the PoC. We also uploaded some data we had lying around: that was 5.15 gigabytes of data, almost 130,000 entries, and we could upload it in 5 to 10 seconds. So we can see that it's quite an efficient engine. As for what we didn't implement in the PoC, there are things we may want to come back to in the future.
A: We also looked at user management and thought that was a problem worth solving at some point, but it does not have to be solved right away for this sort of demonstration, as the Altinity operator does do a lot of that for you. However, it does so in a very static way, so we may have to address that in future. We also looked at various proxy solutions for balancing load across multiple clusters and across multiple insertions, which again is not part of the PoC but is worth looking at.
A
Some
of
those
proxy
solutions
also
allow
for
more
granular
user
management.
So
we
will
be
circling
back
to
take
a
look
at
those
in
future,
so
yeah
key
takeaways.
We
had
a
wonderful
time
doing
this,
it
was
click
house
is
an
incredibly
powerful
system.
We
were
not
quite
aware
of
how
many
different
options
they
are
and
we
were
not
fully
able
to
to
traverse
all
the
way
down
all
of
the
options
truly
impressive
system
that
we
were
happy
to
build
up.
A: We got a basic ClickHouse instance up, got some very chatty random-data-generating services running, and got a visualization going in Grafana. Much of what we've done here can be lifted. There are still, as we mentioned earlier, things around secrets, users, and config control; the sort of things that bridge between ClickHouse-specific and Kubernetes-specific concerns, which I think other teams would want to take on board if they were to take it to this stage. Or wait a little while, and we'll probably hammer those out in our next iteration.
A
So
there
is
one
sort
of
caveat
when
you're
dealing
with
data
insertion
using
an
atomic
database
engine
if
you're
not
familiar
with
the
difference
in
a
database
engine
and
a
table
engine
is
there's
a
doc
in
our
docs
that
links
mostly
to
the
click
house
docs.
So
you
can
go
directly
there
for
that
or
read
ours.
A
When
you're
dealing
with
an
atomic
database
engine,
you
have
to
set
insert
limits.
The
initial
theory
is
that
you
should,
where
the
initial
thought
perhaps
is
that
you
might
want
to
set
that
limit
as
high
as
possible,
so
you
can
insert
as
much
data
as
quickly
as
possible.
A
There
are
also
drawbacks
to
that,
because
click
house
internally
will
have
to
be
inserting
and
sorting
up
this
data.
So
you
want
to
keep
that
within
a
relatively
optimal
margin
for
your
insert
period
and
otherwise
register
period
and
also
the
batch
sizes,
avoiding
dropping
and
avoiding
issues
and
avoiding
really
lagging
out
click
house.
So
there
are
some
limitations
to
atomic
engines.
There
are
definitely
some
guarantees
in
that.
You
have
atomic
transactional
guarantees
on
inserts
on
database
renames
or
very
table
renames,
and
several
other
things
there.
So
there's
trade-offs.
A
There
read
the
docs,
be
mindful
juxtaposing
atomic
engines
versus
ordinary
engines
for
the
database
as
a
database
engine
there
is
advantageous
using
an
ordinary
engine.
Despite
the
fact
you
don't
get
the
atomic
guarantees
you
get
higher,
insert
volumes
and
faster
inserts
if
you're
just
uploading
a
bulk
bit
of
data
and
want
to
do
some
work,
as
was
the
case
when
we
were
loading.
Some
of
the
v2
model
data.