From YouTube: Turbocharge Your Data with Cube & Dagster
Description
Integrate Cube with Dagster to seamlessly connect your data transformation pipelines and orchestration tools with your semantic layer. Handle your coarse-grained transformation tasks with Dagster and land the data in your warehouse, then define your business logic and metrics in Cube to provide consistent analytics throughout the organization. Use event-based integrations to trigger your pre-aggregation builds efficiently and easily, saving time and compute resources.
A: In today's webinar we're talking about some pretty exciting stuff: we're going to show you how you can turbocharge your data pipelines with Cube and Dagster.
A: We have Brian and Tony representing Cube, and Pedram from Dagster, to talk about all things Dagster and Cube. But before we get started, just a few logistical pieces: if you have any questions along the way, definitely enter them in the Q&A and click submit, and we'll answer them during the live Q&A portion at the end of the webinar. The recording of the webinar will be available on demand right on our events page.
A: So you can check it out at any time right after the live show. What we'll be discussing today: what is Cube, what is Dagster, then we'll jump into data orchestration, and then we'll close out with a live demo of the Dagster and Cube integration in action, with Q&A to follow. So without further ado, I'm going to hand it off to Brian to get things started. Welcome, Brian.
B: Hey, thanks, Nathan, I appreciate it. Hey everybody, my name is Brian Bickel. I'm the head of Partnerships here at Cube, and I have the good fortune to manage all of our ISV relationships and our SI relationships, so I get to work with Dagster, and I'm happy to be here today.
B: So as Nathan said, I'm going to start us off by talking about Cube. Cube is a universal semantic layer product, and our mission is to power the next generation of data applications across an organization. To unpack that a little bit: in the modern data world there are a lot of different ways to use data, whether that's multiple business intelligence tools, embedded analytics and customer-facing use cases, and now these emerging use cases around AI agents that we're all sort of forced to deal with.
B: If you think about all of those different use cases, the major challenge that we see is that customers have a lot of different data sources, and then a lot of different downstream use cases that they're trying to map all these different tools to. One of the common approaches in the past was simply doing the data modeling in each of your individual downstream tools, whether that was in a business intelligence tool or with some kind of lightweight data model in a notebook or something like that.
B: This creates a problem, because every time we add a new data tool, that involves repeating ourselves and repeating our data model. It creates a situation where we end up with a lot of unnecessary extra work, and a lot of opportunity for models to become stale or inconsistent, and for different teams to end up with different calculations, different metric definitions, or potentially different insights and disagreements based on incorrect data.
B: Doing that modeling once in a semantic layer instead also allows for a lot of flexibility, because the data models and the architecture can change alongside your needs as you grow your business. Now, from a features perspective, let's talk about exactly what a semantic layer is. We break this down into four major groups of functionality. As far as what Cube connects to: Cube connects to relational databases, cloud data warehouses, lakehouses, query engines of all shapes, and a lot of oddball time-series databases as well.
B: Typically, if something can speak SQL, we can usually connect to it and start to ingest data from it. Once inside Cube, we provide four major sorts of functionality: data modeling, access control, caching, and APIs. To start to break some of that down: for data modeling, we are a code-first data modeling experience where users can use either JavaScript or YAML, and that YAML can be templated; Python-augmented YAML is coming in an upcoming release.
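For readers who haven't seen Cube's code-first modeling, here is a minimal sketch of what a YAML cube definition can look like; the cube, table, and member names are illustrative rather than taken from the demo.

```yaml
cubes:
  - name: orders
    sql_table: public.orders   # any table your warehouse can serve

    measures:
      - name: count
        type: count
      - name: total_amount
        sql: amount
        type: sum

    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time
```

Because the model is plain text, it can live in git and move through development and production branches exactly as Brian describes next.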
B: You can build your data models there, and then you can version control them in different flavors of git and move them through development to production branches. Once your data model is in a place where you like it, we do access control: access control that's able to do things like column masking, role-based access control, row-level security, and other interesting multi-tenant requirements that you may have. And then our caching engine is a product that we built.
B: We call it Cube Store, and it is all custom development built around fast pre-aggregation. It is an aggregate-aware system, so if you build a roll-up for, let's say, a time dimension, or a specific dimension that you're interested in rolling up, you don't have to inform any of your BI endpoints or anything else downstream about it, and you don't have to change your definitions or point to a different view like you would with something like a materialized view.
B: Whenever a query comes in, we figure out the most performant cache or pre-aggregation to solve the problem for you, and then we rewrite the query. Finally, everything leaves Cube as a REST API, a GraphQL API, or via our SQL API, which speaks Postgres. The SQL API is how we get to most BI tools; that's typically what they'll be consuming. REST and GraphQL may be more suitable if you're building your own custom front end with a framework like React, Angular, or Vue, or any of a myriad of other products you could use.
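As a rough illustration of the REST side, a downstream app can send Cube a JSON query over HTTP; the deployment URL, token, and member names below are placeholders, so treat this as a sketch rather than a copy-paste example.

```python
import requests

CUBE_URL = "https://your-deployment.cubecloud.dev/cubejs-api/v1/load"  # placeholder
CUBE_TOKEN = "YOUR_SIGNED_JWT"  # placeholder API token

# A Cube query names measures, dimensions, and time dimensions from the data model.
query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {"dimension": "orders.created_at", "granularity": "month"}
    ],
}

resp = requests.post(CUBE_URL, headers={"Authorization": CUBE_TOKEN}, json={"query": query})
resp.raise_for_status()
for row in resp.json()["data"]:
    print(row)
```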
B: And then this emerging category of AI agents and chatbots sort of splits on this: some of them consume REST and some of them are writing their own SQL, so they might consume different endpoints.
B: As far as the benefits go: if you have multiple downstream tools, it's tempting to have different data models in each one, or to end up with inconsistency and not be able to keep the data models or the access control policies similar between all of them. Performance is improved in a lot of ways by our pre-aggregation and caching engine, and with the ability to not define data models and security within each downstream tool, but to do it once, you improve flexibility and time to value and future-proof yourself against downstream changes.
B: To kind of orient you on where the semantic layer sits, if you aren't familiar with the idea: in the modern data stack we are usually adjacent to the data sources, whether that's cloud data warehouses like I mentioned, relational database products, lakehouses, and other things. We are not a transformation tool, but we do typically connect to coarse-grained transformations that are landed back in the data source. That could be through dbt, it could be SQLMesh, it could be a number of other products.

B: Relevant to the conversation today, we do have an integration with Dagster as an orchestration tool, which we'll talk about later. But again, we don't compete with orchestration; we run right alongside it as part of the modern data stack. And then downstream you can see all the use cases we talked about already. With that, I'd like to hand it over to Pedram to tell us about Dagster.
C: To really understand Dagster, I want to first talk about data engineering and where we're at today. I think a lot of us are familiar with the pains of data engineering. There's probably no doubt in any of your minds that there are just too many tools, and it seems to be getting worse. Even with the proliferation of the modern data stack, things have become so fragmented that now, as a data engineer, you have to go and look through five, six, seven different apps just to find out where something went wrong and what's going on in your data.
C: And that's just part of the problem. Let's say you do find it: once you do, you probably don't have the right context to understand what's going on. Traditionally, when we write orchestration tasks, we have these big tasks that do a whole bunch of things, but too often the task doesn't really tell us what's inside of it. So if we're trying to understand what happened with one of our models, there's no clear sense of which task created that model in our database, or what table came from where. We lack this full context on what's going on within orchestration, and things tend to be a bit of a black box.
C: The other side is that as teams start to grow, we start to see a lot of silos happening, and often there's very little shared context between teams, because things are so hard to integrate and we don't have all that context.
C: It's also hard to have a local dev environment that allows for easy testing. For example, I remember when I used to use Airflow a bunch, I had to install Kubernetes on my laptop just to test a pipeline that I was going to push into production. That was a really heavy lift, and often you end up pushing things to production over and over again, just hoping for something to work, which is not something we love. Those pains are really what drove Dagster to exist.
C: It was a tool designed to solve these fundamental data engineering problems that, in a lot of ways, software engineers have already figured out, but where we haven't really had the right tools in place. What Dagster enables you to do is write your code and test your data pipelines in pure Python. You don't need to start creating abstract classes in order to do simple testing; everything works pretty much out of the box, you've got great monitoring, and the UI is really intuitive. And what's nice is, because we look at things in a little bit of a different way (instead of focusing on tasks, everything is based on an asset, which I'll talk about in a second), you get all these nice ancillary benefits like being able to track data lineage and metadata.
C: So let's talk a little bit about data orchestration. There are really two overall styles of orchestration. There's the traditional task-based orchestration that tools like Prefect and Airflow are really good at, and what they're focused on is, again, that task: you say, I need to do this thing in order to get some outcome, here are all the steps I've got to do; you wrap that in a task, call that a DAG, and then you ship it. And because of that, you really only have cron scheduling.
C: A cron schedule is (a) not efficient and (b) not always accurate. If you have to do retries and that type of stuff, the schedule quickly gets out of hand. The other part is that it's always very hard to test: if you've ever tried to test an Airflow DAG yourself, you know that's not an easy thing to do. And because everything is centered around tasks, you kind of lose access to all the things you actually care about.
C: You don't have any metadata, for example, on the tables and how many rows were written, unless you explicitly pull those things down. On the data orchestration side, which is the model Dagster uses, we've kind of spun things around, and, to be honest, it took me a while to wrap my head around what that actually means, but once I did, it unlocked a lot of powerful features for me. Instead of thinking about tasks, which is "I need to do this thing, let's get it done", we think about the asset: the fundamental thing we actually want to produce. By flipping that around, we now have a nice way of thinking about the things we actually care about.
C: The task is kind of the imperative way of writing things; the data asset itself is the declarative thing that you want. And so when we start thinking about the table that we need, or the report that we need, or the model that we need, we can start to do a lot of really interesting things.
C: We can define upstream and downstream dependencies for that specific model, rather than trying to figure out which tasks it takes to create that thing. Because of that, we can move beyond just cron, and we get event-driven and SLA-based scheduling. So if there's a report that is really important and you want it refreshed every hour, you can now do that, and upstream of it you can say: here's the minimum set of things I need to run in order to make sure this asset is fresh.
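A small sketch of what that SLA-style scheduling looks like in Dagster code, assuming a Dagster release where `FreshnessPolicy` is available; the asset names and bodies are made up for illustration.

```python
from dagster import Definitions, FreshnessPolicy, asset


@asset
def daily_tickets():
    # Placeholder upstream asset; in practice this would pull from a source system.
    return list(range(100))


# Declare the outcome we care about: keep this report no more than 60 minutes stale.
# Dagster can then work out the minimum set of upstream runs needed to honor it.
@asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=60))
def hourly_report(daily_tickets):
    return {"ticket_count": len(daily_tickets)}


defs = Definitions(assets=[daily_tickets, hourly_report])
```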
C: And, conversely, if you have something that runs once a day and you don't really need it to be that up to date, you can say that this asset doesn't require that much compute and can wait until the next day to run. The other nice thing is that, because we're thinking in terms of assets, these things become really easy to test, especially with Dagster. We have a system of resources, so you can assign different resources based on your environment. For example, in production we can say: run Kubernetes, run AWS, run Snowflake; but then, when we're testing, we can say: do the same operations, but use DuckDB and local file storage. Because we've written these things in a decoupled way, it's very simple and very inexpensive to test. And this final thing is, I think, what becomes really interesting, and you'll see this in the demo: when we start thinking in terms of assets instead of tasks, you now have a system of record.
C: You can start to see your entire pipeline in terms of the things it's producing, rather than the things it's doing, which I think is a real big unlock. So I've talked a little bit at a high level about what an asset is, but let's take a quick look at how you even write an asset.
C: It's really not that different from what you're probably used to. An asset is just a wrapper around a function that produces a thing that you care about, and here we have three assets: country_stats, change_model, and continent_stats.
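A minimal sketch of those three assets, with placeholder bodies; the point is that each one is an ordinary Python function plus a decorator, and that naming an upstream asset as a function argument is what wires the dependency.

```python
from dagster import asset


@asset
def country_stats():
    # Produce the raw per-country table (placeholder data).
    return [{"country": "US", "pop": 331}, {"country": "CA", "pop": 38}]


@asset
def change_model(country_stats):
    # Depending on `country_stats` by parameter name gives us lineage for free.
    return [{**row, "pop_change": row["pop"] * 0.01} for row in country_stats]


@asset
def continent_stats(change_model):
    # Roll the modelled rows up one level.
    return {"North America": sum(row["pop"] for row in change_model)}
```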
C: And this goes beyond just the assets that you create and care about; it also extends into integrations. For example, we integrate with a lot of modern tools such as Airbyte, dbt, and Cube as well, and we have these integrations in the library, so you can leverage them and very quickly get your entire dbt repository available to you with all the assets materialized.
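For dbt specifically, pulling a whole project in is roughly one call; the paths are illustrative, and the exact helper depends on your dagster-dbt version (newer releases use a `@dbt_assets` decorator instead).

```python
from dagster_dbt import load_assets_from_dbt_project

# Every model in the dbt project becomes a Dagster asset, with dependencies
# read from dbt's own graph, so lineage shows up in the UI automatically.
dbt_assets = load_assets_from_dbt_project(
    project_dir="analytics_dbt",          # path to your dbt project (illustrative)
    profiles_dir="analytics_dbt/config",  # path to profiles.yml (illustrative)
)
```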
C: Now you get introspection into all those models and their dependencies, which is really powerful, because you can see that this dbt model has the Airbyte orders table as a source, while this other one has the users table as its source, and so when those are updated independently of each other, you can incrementally update things as needed.
C: So that was the talk; let's jump into a demo. We're going to do this live, so let's see how it goes. We'll start with our friend the terminal, and I just wanted to show what a sample project looks like; I'll send a GitHub link to this as well.
C: Really, there's not a lot in here: there are assets, constants, and resources, so I'll start with resources. A resource is really that connection to the outside world, and in this case I just have one simple resource, a DuckDB resource, and this is the two lines of code I have to write to initialize that resource.
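Those "two lines" look roughly like this, assuming the dagster-duckdb integration and a local database path of your choosing; older Dagster releases configure the resource slightly differently.

```python
from dagster import Definitions
from dagster_duckdb import DuckDBResource

# One resource definition, shared by every asset that needs a DuckDB connection.
duckdb_resource = DuckDBResource(database="data/birds.duckdb")  # path is illustrative

defs = Definitions(assets=[], resources={"duckdb": duckdb_resource})
```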
C: If there's something I want that's a little bit more complex, for example this custom Sling resource, I can write the code for that as well. So for things that are supported out of the box, it's quite simple and relatively easy to get a resource connected, and for things that might be your own custom tooling, it doesn't take a lot of effort to create those resources either. You can see I have this one function called sync: I ask it for a source table and a destination file, I run some operation, and then I return the number of rows and the file size.
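A hand-rolled resource along those lines might look like the sketch below; the class name, its field, and the sync return values are illustrative, not the actual code from the demo repo.

```python
from dagster import ConfigurableResource


class SlingResource(ConfigurableResource):
    """Illustrative custom resource wrapping an external replication tool."""

    connection_uri: str  # e.g. a Postgres connection string

    def sync(self, source_table: str, destination_file: str) -> dict:
        # In the real resource this would invoke Sling to copy `source_table`
        # into `destination_file`; here we only show the shape of the result.
        rows_synced = 0
        file_size_bytes = 0
        return {"rows": rows_synced, "file_size": file_size_bytes}
```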
C: The resource is used by a thing called an asset, and an asset, like I said, is just a function that has that decorator attached to it. In this example I'm taking a bunch of data from various sources such as DuckDB and the internet: I'm downloading zip files and extracting them, I'm getting some data from Postgres, and I'm getting some data from an API for some SaaS application, for example. All of these things can be represented as assets.
C: Using those resources, I can start to extract those assets, and you'll see that represented in Dagster shortly. So I'll just quickly show what these assets look like. I have this asset decorator; I give it a compute kind of python, so I get a nice picture at the end, and I give it a group, if that's helpful. And really, these are all the lines of code I needed to extract that asset.
C: I can use an existing function I already had in order to download that asset and extract it, and then I just output some metadata. I can do that for the different types of checklists that I have here. I can even do it for, let's say, DuckDB: I can ingest the CSVs, use DuckDB to run some straight SQL in there, and then create an asset like that. So overall it's just very simple Python; there's not a lot of boilerplate around it.
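Putting those pieces together, one of those raw-data assets might be sketched like this; the names, group, and metadata keys are illustrative.

```python
from dagster import MetadataValue, Output, asset


@asset(compute_kind="python", group_name="raw_data")
def raw_sites():
    # Download and parse the file however you already do it (placeholder here).
    rows = [{"site_id": 1, "state": "WA"}, {"site_id": 2, "state": "NY"}]

    # Attaching metadata means every materialization records row counts, paths,
    # and so on, which Dagster then plots over time in the asset view.
    return Output(
        rows,
        metadata={
            "num_rows": len(rows),
            "preview": MetadataValue.md("| site_id | state |\n|---|---|\n| 1 | WA |"),
        },
    )
```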
C: This is the repo, and if I load Dagster and make sure it's loaded, we're here. All that work I did in the background you can see visualized here; it's really intuitive to understand. All the raw data is my various data sources: for example, these are coming from the web and I'm using Python to download them, I'm using Sling to get data from a Postgres database, and I'm using Steampipe to get data from Mastodon's API. So it's easy to connect various different sources and resources to get that data.
C: I can click here and materialize, for example, only the raw data if I wanted to, or I might as well just click materialize all and run the entire pipeline. So this pipeline takes the raw data, DuckDB ingests some of it, there's a little bit of light aggregation going on, and then this asset here, all_birds, is really the culmination of all this work: we take all this data, we join it together, and we create that dbt model, and you can see the upstream data feeding it.
C: Things are sequenced properly; anything that can run in parallel will. I don't have to specify which things should run and which shouldn't; all that sequencing is done for us by Dagster. Then, as these assets are created, they're materialized, and what that means is we get this nice little feature here: we see this "view asset" link and can click on it. Maybe I'll make this a little bit bigger.
C: This is the sites dataset; it's an asset for the table I've created. There's a table name, database name, the path, and the number of rows, and what's nice is, as I run this over and over again, I'll get a plot of the row count and the size over time, so I can keep an eye on these things. If something went wrong, for example, I'll have a sense of what happened there.
C: So I'll go back into our asset graph here, and we can see everything is materialized. If, for instance, I updated this raw dataset on its own, you'll see, once that's updated, that Dagster is aware that tickets and daily tickets are now out of date. Here you can see that the upstream data is now out of date, and so I can choose to click these two here and materialize those, or I can click this little triangle and say materialize anything that's missing.
D: Things can run as needed and be triggered when something else completes further up in the stack. We don't have to try and try again every five minutes during the morning rush, depending on when a file lands, so that keeps us very efficient on the database and transformation front. So now we're going to see how Cube builds on this foundation and delivers the business value, the last mile, through to the rest of the organization.
D: We'll start by taking a look under the hood of Cube to see how we get to a pre-aggregation that we want to keep up to date. First, we've got that model, not the duck but the birds model, on DuckDB, so that'll be a fun pun here. What we've modeled on top of it is four cubes that pick up that data. We've got bird_toots: this is the whimsical name for the messages on Mastodon.
D: This is not something else, for those of you with an eyebrow raised right now. We've also got birds, sites, and species from the observations. These are related together, and we handle those relations through a series of joins. Basically, we've created a couple of keys that handle the relationship between those two, or rather between those three. Sites are basically where the bird was seen.
D: Species is the type, with more information on the kind of bird that was observed, and then the birds table is just the big table of observations. So in our dimensions we can define those custom keys to help with any composite keys that we need for joins, we can bring in latitude and longitude, and we can compute dates.
D: A lot of the quantifiable fields that we'd expect to hold quantities are actually string fields in the source data, which brought in numbers or NAs. So instead of nulls or spaces or zeros, it was NAs and numbers.
D: We want to make sure that we're handling those things, and we can do that through Cube's kind of last-mile data cleansing here in the SQL: we just cast those quantifiable columns as integers, and we use TRY_CAST, which returns null if the cast fails, so it won't error out on us. And then we're also taking some derived metrics as well.
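In the YAML data model, that last-mile cleansing is just SQL on the member definition; the column and member names here are illustrative.

```yaml
    dimensions:
      - name: how_many
        sql: "TRY_CAST(how_many AS INTEGER)"  # 'NA' strings become NULL instead of erroring
        type: number

    measures:
      - name: total_birds_observed
        sql: "TRY_CAST(how_many AS INTEGER)"
        type: sum
```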
D: So, for example, we can get the number of observations, and we can get the number of hours that were spent (people, I imagine, sitting in little hides or on a chair in the field, enjoying nature and counting things), and we can tabulate how long was spent per sighting. We can also see how many observations there were per sighting: you know, a big flock rolls by and we see 20 birds; that's an interesting point to call out here.
D: We've got a number of different metrics that we can calculate. They need to be called derived metrics because we're already using aggregates from within the Cube data model, and we can do pre-aggregations as well based on that. So maybe we'll take a look in the playground first, and as we're building this data model, let me jump back to views real quick.
D: We don't necessarily need our end users to know that birds, sites, and species are separate tables, or even separate cubes; they're just interested in bird sightings as a concept, and we want to put all those dimensions and measures together in a way that lets them simply reference each of the fields that they want. In Cube we call that concept views, and so we're going to bring through a few fields from the birds cube (these five fields), and we're going to bring a couple of fields from sites: latitude and longitude.
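A view along those lines could be sketched in YAML like this; the member names are illustrative, and `join_path` walks the joins defined between the cubes.

```yaml
views:
  - name: bird_sightings
    cubes:
      - join_path: birds
        includes:
          - count
          - observation_date
          - total_birds_observed
      - join_path: birds.sites
        includes:
          - latitude
          - longitude
      - join_path: birds.species
        includes:
          - english_name
```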
D: With that we can get a visual on where these birds are being seen, and then we're going to get the actual English name instead of a species code from the species table. So on the playground side, we can run a query against this view and see what the most common birds are; in this case 2017 is the date range I've got filtered here.
D: How many bird sightings were there? You can see people were busy out there counting birds: 500,000 mourning doves seen in that year alone, and the list goes on. There are quite a few birds here, but we can use the playground to test our initial model and make sure that our computations make sense and that the numbers we're computing over in our data model are accurate.
D: Once we've got a couple of queries that we like here, one of the things that we're going to be curious about and very interested in is the performance aspect. So what's a good candidate for pre-aggregation, and why not go directly to the source every time? We start by identifying a slow query.
D: For example, in our query history here in Cube Cloud, we can see which queries were run, how long they took, and whether they hit the cache or hit the upstream source database. For those that did not hit the cache, for example this 1.6-second query: not bad, but it could be faster.
D: We can go into pre-aggregations and accelerate that, and here we're defining which measures and dimensions we want to include, as well as the time dimension and the granularity we want to include it at. That generates a roll-up for us, roll-up code in either YAML or JavaScript, the two formats that Cube supports, and then we can go ahead and add that directly to the data schema. We've already got one, or rather two, to play with: the pre-aggregations that we have were based on some of the dashboards that I put together for this. Unfortunately, there's a snafu with Tableau right now.
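The generated roll-up, added back into the cube definition, looks roughly like this in YAML; the member names and granularity are illustrative.

```yaml
    pre_aggregations:
      - name: sightings_by_species_and_month
        measures:
          - CUBE.count
        dimensions:
          - CUBE.species_code
        time_dimension: CUBE.observation_date
        granularity: month
```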
D: You can see our Tableau external status page is all red, so we're going to be missing that part of the demo today, but I can talk through it.
D: We'll give one more refresh here on the login page and see if we get any luck, but yeah: in Cube we've got the four pillars, right? We've got the data modeling, we've got access control, we've got caching and APIs, and all of our APIs share the same caching, the same data model, and the same access control.
D: That means we're able to make sure that all of the data that leaves Cube is the same, or at least very consistent. So Cube can be that single source of truth that the downstream applications and downstream users can rely on. As far as front ends go, unfortunately we're getting a timeout there.
D: One learning I had from this dataset was about the blue jay. I live over in Washington State and I've always called these birds blue jays, but apparently that's not a blue jay. I clicked on the map for blue jay and it was all East Coast, so I was like, what's going on here? So I got to learn that what I had been calling a blue jay was actually a Steller's jay, and I like those guys because they eat the stink bugs around the house.
D: So, big fan of the Steller's jay. Let's see. And then we've got another tool, Delphi. Delphi is an AI chat bot that only connects to semantic layers. It doesn't connect to any SQL data source; it requires a semantic layer for context, for things like descriptions of the fields and when to use which field, and that's all context that we can provide through descriptions in Cube.
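Those descriptions live right on the data model members; here is a small illustrative example of the kind of context an LLM client can read.

```yaml
    measures:
      - name: count
        type: count
        description: >
          Number of individual bird observations reported. Use this for
          "how many sightings" style questions.

    dimensions:
      - name: english_name
        sql: english_name
        type: string
        description: Common English species name, for example "Northern Cardinal".
```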
D: We give it an authorization token so that it's able to reach out and hit the API, and then I can ask questions in plain text: in 2017, which birds had more than 250,000 sightings? It then goes through the process, and it's very transparent here, which is very important for LLMs, especially at this stage; we want to know what process they're going through, and it even shows the queries that it's running. So here's the actual query that it's running in Cube.
D: We could verify those with our playground as well, to make sure that it's actually querying the right fields, and confirm that in 2017 the Dark-eyed Junco really did have 725,634 sightings.
D
275
or
725
634,
sightings,
perfect
and
then
the
percentage
change
side,
and
so
this
is
more
of
a
this.
This
question
isn't
straightforward.
It's
not
just
give
me
this
data
point.
It's
actually
think
about
it
and
do
a
do
a
change
calculation
here.
So
what
percent
of
change
in
sightings
were
the
north
for
the
Northern
Cardinal
were
between
2020
and
2021,
so
it
decided
that
it
had
to
pull
data
for
2020
as
well
as
for
2021.,
and
then
the
sightings
it
calculated
the.
D: It came out as 14% growth from 2020 to 2021. So yeah, there's a lot of business value delivered through the entire pipeline: it lands in Cube and is then extended to the rest of the business apps, and therefore the business users, through the data model, the consistency, and the tight integration of the pipeline. I think that's really one of the keys to how you can deliver business value in an impactful way.
B: Hey, thanks, Tony, and thanks, Pedram, for the earlier demo of Dagster and the talk about where the modern data stack and data engineering are. Just as a reminder, we have about 20 minutes left in the webinar. If you have questions, feel free to drop them in the Q&A and we will get to them, and we have a couple here already.
B: A couple of questions about LLMs, specifically about how semantic layers can help provide data to LLMs, and the benefits over doing native SQL querying in an LLM. I think Tony touched on some of that with our integration with Delphi and his commentary about Delphi's architectural decision not to connect to SQL databases, but instead to connect to semantic layers like Cube and a few others.
B: The reason they do that is to be able to provide the LLM with better context, to have a better shot at more correct and more insightful answers, versus having the LLM do raw SQL generation and attempt to guess at the way your organization would calculate a metric just by using, essentially, its corpus of the different ways metrics have been calculated before. So that's one way this works. And there's another question here.
B: Well, what I can say is, if you think about the cohort of other tools that do something similar to us, there are some of the more legacy incumbents like AtScale, and then there are obviously challengers, or other competitors to us, like dbt with their original semantic layer and whatever they'll potentially re-release as the new version of it at Coalesce, as we all expect. And then there are some more platform-aligned semantic layers, I would say.
B: Things like what Looker is trying to do, what Google is doing with Looker in trying to split out the semantic layer product, and then also some statements from companies like ThoughtSpot and Tableau at their shows about intentions to do this. So that's roughly the landscape.
B: Cube's position is that we are what we call a fully universal semantic layer, meaning that we aren't aligned to any particular data or cloud vendor, or to any BI tool or downstream tool. That's one distinct difference from the Looker, ThoughtSpot, and Tableau ways of approaching it: we're not connected to any vendor.
B: So we're essentially data Switzerland here, in some ways. As far as the things I think we do that some other products don't, I would point to the API connectivity options we provide, which are very easy to use: REST, GraphQL, and the SQL API.
B: Getting those up and running is just as easy as Tony showed in the demo: he creates a data model and then he's got all three API endpoints immediately. He's also got the same security and the same data models in all of them, and then he's got caching with pre-aggregations. So I don't know exactly how the feature comparison goes against every other semantic layer out there, but those are the big things that we do.
B: And they're what we find a lot of people use Cube for: the different components that add up to helping them build their solutions.
B: Okay, there's a question from Emily Jacobs from Gopuff. She asks: is this a demo of the orchestration API? Will we be seeing that functionality? We already have Dagster and we are trying Cube, looking forward to seeing how these two play together; apologies if I missed that part of the demo. Yeah, I think we mentioned this; I think Tony mentioned it, and Pedram also mentioned it.
B: There is a Dagster component that can invoke a pre-aggregation build from Cube. Essentially, the way we do this is through what we call our Orchestration API, and I think Tony mentioned that this doesn't have to be a time-based refresh; it can be an event-driven refresh. The way you would orchestrate that in your pipeline is, after you've completed your last step with the asset that you're working on, you just invoke that pre-aggregation refresh.
B: Pedram, I might ask you to talk about this; I actually don't know what you would call that component in Dagster, but I know it's something somebody has to import, right?
C: Yeah, it would be like a library; it's pretty light. I think it's just one line you would add, maybe two, and then you would just declare a dependency on your upstream assets. The refresh really becomes an asset itself, and that asset's materialization is what you'd have run after everything else is finished.
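As a sketch of that pattern, the final asset in the pipeline can simply call Cube's Orchestration API once its upstream assets have materialized. The endpoint path, payload, and asset names below are assumptions based on Cube's documentation, so check them against your Cube version, or use the packaged Dagster-Cube integration, which wraps this call for you.

```python
import requests
from dagster import asset

CUBE_API_URL = "https://your-deployment.cubecloud.dev/cubejs-api/v1"  # placeholder
CUBE_API_TOKEN = "YOUR_SIGNED_JWT"  # placeholder


# `deps` ties this to the upstream dbt asset, so the refresh only fires after the
# warehouse tables are rebuilt (older Dagster versions use `non_argument_deps`).
@asset(deps=["all_birds"])
def cube_pre_aggregation_refresh():
    # Assumed Orchestration API call: ask Cube to (re)build pre-aggregations.
    resp = requests.post(
        f"{CUBE_API_URL}/pre-aggregations/jobs",
        headers={"Authorization": CUBE_API_TOKEN},
        json={"action": "post", "selector": {"contexts": [{"securityContext": {}}]}},
    )
    resp.raise_for_status()
    return resp.json()
```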
B: Cool, yeah, pretty straightforward. So if you have any more questions about that, and if you're already trying Cube, feel free to reach out to whoever you're working with on our side and we'll help you out, and we'll also connect you to somebody at Dagster if you need help with Dagster. I've got another question here: do you know of folks using Cube with Streamlit, Snowflake data apps, and their LLM capabilities?
B: We do have a Streamlit integration, and Streamlit is not uncommon as a front end for apps built with Cube, typically in the embedded analytics or customer-facing use case. I don't have a specific example of what their LLM capabilities are when interacting with Cube. I will say the most common LLM bridge to Cube is going to be LangChain.
B: That's, I think, the most common tool people are using to build different AI chat bot type experiences.
A: Great, well, it looks like that's all the questions we have today. Thank you for spending your hour with us. If you want to come back and check out the recording again, it will be available on demand on our events page, and here are a few other resources you can check out.
A: They will give you all the details about what we talked about today. So thanks again, and hope to see you soon. Have a good day, everyone. Bye.