From YouTube: Dagster at Rohde and Schwarz Mobile Network Testing
Description
Simon Späti, Lead Data Engineer at Rohde & Schwarz, discusses the production Dagster setup at Rohde & Schwarz Mobile Network Testing.
See the full May 11, 2021 Community Meeting: https://www.youtube.com/watch?v=HRd6rEU33XM
Okay, hi and welcome. Thank you very much for having me. My name is Simon Späti, and I have the pleasure of showing you today how we use Dagster at Rohde & Schwarz Mobile Network Testing.

A little bit about myself: I'm a data engineer, I'm the author of a little blog where I also write about data engineering, and I'm an early user of Dagster.

Rohde & Schwarz is a big company. We have over 12,000 employees around the world, and we specialize in electronic test equipment, broadcast and media, cybersecurity, radio monitoring, and all kinds of radio communications.
I myself am located in Switzerland, where we work on a product called Smart Analytics. It's actionable benchmarking: you can compare different network providers to each other. You get a kind of business intelligence overview to really see mobile network testing quality, so you can see how your network is doing compared to your competitors.
To go into a little more detail about what we do: our main goal is to provide the tools for our customers to improve the quality and performance of their mobile networks. At the bottom of the slide you can see the car; this is mobile network testing in Africa, where there aren't always well-paved streets, but we have the equipment that you can see on top of the car.
We put all our smartphones in there; there's a QualiPoc that our software runs on, and you can either put that in a car or in a backpack. So if you would like to measure the network quality in, for example, a metro or a football stadium, you can use the backpack, put in the smartphones, and then configure what kinds of tests you would like to run.
We can configure YouTube tests, for example; the phones will call each other, make WhatsApp calls, and upload to YouTube. They will run all kinds of different test scenarios that, hopefully, real users will also do. At the end of the day you get a measurement file, and that gets uploaded into our Smart Analytics.
In Smart Analytics we have time-series and statistical data. If you look at the architecture, this is the testing going on: during the day, several cars and backpacks collect these measurement files, and then, typically at night or in the evening when you finish measuring, you upload them to our data warehouse, where our custom ETL runs. This is where all the magic happens: the data is transformed into facts and dimensions.
That's where we use SQL Server; if you want to plot the measurement points on a map, you would typically use SQL Server. Then we use Analysis Services, which is our cube: for statistics, you can see overall how your network is doing compared to your competitors, which tests you actually measured and for how long, and all kinds of other statistical measurements. And then, the motivation for us to use Dagster.
We want to bring this ETL into the cloud and manage all this ETL logic in a central place, and also be big-data-ready with the cloud. As you know, there are a lot of open-source tools that you need to wire together, and we want to be ready to connect all of these tools and also have a state-of-the-art tool. So, as I said, we went from on-premise to the cloud.
We also want to scale out. At the moment we have SQL Server, which can only scale up, meaning we need to buy a lot of expensive hardware in case we have a lot of data. That costs a lot, especially if you have idle time, because you pay the cost anyway. In the cloud we want to use cheaper machines and only scale up if we really need the resources, so we can also scale down and save money.
We have Jupyter notebooks for ad-hoc analytics and machine learning models. The measurement files, the ones we talked about before, get uploaded directly into our S3 storage, and from there we ingest into our data warehouse, which is Apache Druid. Druid is our replacement for our cubes; it's well suited for us because it has sub-second response times, even on large data sets, and its architecture is built so that you can really separate ingestion from query time.
We also use Spark for processing the data and creating Delta tables. We have general services in our Rohde & Schwarz cloud; we use Kubernetes for that, and with it we get monitoring, logging, and resource scaling out of the box. And then, of course, the heart is Dagster, which is where we really put all our code. The import pipeline is mainly based on eventing, so we use a lot of event-driven pipelines.
As mentioned, sensors are a big part of that. If you have never used sensors in Dagster, this is how they look in the Dagit UI. It's very nicely done; you can really see all the dots.
Each dot is when a sensor triggers. You can say how often it should poll; it then checks some Python logic and starts a job. You can see which runs got spawned, and then you can click through all the pipelines and see which pipeline has been started.
From the UI perspective this is very nice, but so is implementing it: the only thing you need to do is put an @sensor annotation on a function and name the pipeline you would like to start. For us it's very handy; we mostly use sensors with S3, and we can build some glue logic into them where we define which files we would like to read.
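For readers who haven't used sensors: a minimal sketch of the S3-driven pattern described here, in the pre-1.0 Dagster API used in this talk. The pipeline name, bucket, and config layout are hypothetical.

```python
import boto3
from dagster import RunRequest, sensor


@sensor(pipeline_name="etl_pipeline", minimum_interval_seconds=30)
def measurement_file_sensor(context):
    # Glue logic: list the uploaded measurement files and decide which
    # ones should trigger a run (bucket and solid names are made up).
    s3 = boto3.client("s3")
    objects = s3.list_objects_v2(Bucket="measurement-uploads").get("Contents", [])
    for obj in objects:
        # run_key deduplicates: Dagster skips run keys it has already
        # seen, so the same file never spawns a second run.
        yield RunRequest(
            run_key=obj["Key"],
            run_config={"solids": {"ingest_file": {"config": {"s3_key": obj["Key"]}}}},
        )
```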
This is our import pipeline, just to give you an overview of how it looks. As mentioned before, we don't have one overall pipeline: when we upload our zipped files, a sensor gets triggered, and it parallelizes per uploaded file. All of the unzipped files get uploaded again, and then a next sensor takes each of these files immediately and runs the ETL pipeline on it.
We really try to scale as much as possible, and that's also why we use sensors: we have different granularities between these sensors and pipelines. The ETL pipeline then does all the logic we had before, so our facts and dimensions come out as Parquet files; based on the fact tables, these get created as Delta tables on our S3 storage, and then we also ingest them into Druid with Python.
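A condensed sketch of that two-stage shape, again in the pre-1.0 solid/pipeline API; all names and the trivial bodies are hypothetical stand-ins for the real unzip/transform/ingest logic.

```python
from dagster import pipeline, solid


@solid(config_schema={"s3_key": str})
def unzip_file(context) -> list:
    # Stage 1: unzip one uploaded measurement file and re-upload the
    # parts; in the real setup a second sensor then fires per part.
    context.log.info(f"unzipping {context.solid_config['s3_key']}")
    return ["part-1", "part-2"]  # placeholder keys


@solid
def transform_to_facts(context, keys: list) -> str:
    # Stage 2: build the fact and dimension tables, written as Parquet.
    return "s3://warehouse/facts/"


@solid
def ingest_into_druid(context, path: str):
    # Final step: submit an ingestion task to Druid for the new data.
    context.log.info(f"ingesting {path} into Druid")


@pipeline
def etl_pipeline():
    ingest_into_druid(transform_to_facts(unzip_file()))
```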
That is the pipeline part, but we actually also use Dagster for provisioning. Whenever we have a new user or a new tenant in our cluster, we use Dagster to create the buckets, upload the secrets to our Vault, and create the Druid account. The more we use Dagster, the more new use cases we find that we can integrate with it. So it's not just for data pipelines.
That is very powerful for us, also for other kinds of administrative tasks that we would like to automate. And then we also use assets. Assets are the part where you link your data to the computation.
It's maybe a rather newer feature, but it's very well suited if you want to add metadata, or if you want to show your persisted data to your customers. We played around with it in the ETL: we added sizes and durations, and then we could immediately see if there was a spike in some ETL jobs.
In the UI you can also click on such a point and really drill down to the actual pipeline to see which file it was and what happened there. This is a very nice feature for us, and we would like to add more: we try to create assets for all our persisted Druid tables and also for our Delta tables.
In the future, we would also like to try the data lineage that just got added to Dagster. Because we have this event-driven pipeline from the zip files all the way to the actual Druid fact table, we sometimes have some wrong data, and then it would be nice to see where that data is actually coming from. We are hoping to use the data lineage feature to document that and make it available to all our customers and also to us as engineers.
If you haven't used assets, it's actually quite easy. In your solid, which is one part of your pipeline, the only thing you need to do, besides yielding your output, is to yield an AssetMaterialization as well. That's exactly what we do here to get the output we just saw before: we just add a key, which is the unique identifier. Here we are still playing around.
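A minimal sketch of that pattern in the pre-1.0 API; the asset key, metadata labels, and placeholder values are hypothetical.

```python
from dagster import AssetMaterialization, EventMetadataEntry, Output, solid


@solid
def build_fact_table(context):
    rows, size_mb, duration_s = 10_000, 42.0, 12.3  # placeholder results

    # Besides the regular output, yield an AssetMaterialization so the
    # table shows up on the Assets page with size/duration metadata.
    yield AssetMaterialization(
        asset_key="fact_table",
        metadata_entries=[
            EventMetadataEntry.float(size_mb, "size (MB)"),
            EventMetadataEntry.float(duration_s, "duration (s)"),
        ],
    )
    yield Output(rows)
```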
So far we've been using it for quite a bit now, and the advantage of using Dagster, in the first place, was that we could replace the custom processing engine we used on-premise. Now we get massive out-of-the-box features: we have restart capabilities, we have backfills, we have dependency management, we can see what's running, we have the UI, and we have different modes, so you can just switch from production to local.
A
Just
you
have
this
modes
that
you
can
just
switch,
so
all
of
these
features
are
tested
and
stable,
so
we
just
get
them
out
of
the
box.
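A sketch of how such mode switching looks in the pre-1.0 API; the resource name and the two hardcoded stub resources are hypothetical.

```python
from dagster import ModeDefinition, ResourceDefinition, pipeline, solid

# Stand-in resources: a local dev warehouse vs. the production one.
local_warehouse = ResourceDefinition.hardcoded_resource("sqlite:///dev.db")
prod_warehouse = ResourceDefinition.hardcoded_resource("mssql://prod/dwh")


@solid(required_resource_keys={"warehouse"})
def load_facts(context):
    context.log.info(f"loading into {context.resources.warehouse}")


@pipeline(
    mode_defs=[
        ModeDefinition(name="local", resource_defs={"warehouse": local_warehouse}),
        ModeDefinition(name="production", resource_defs={"warehouse": prod_warehouse}),
    ]
)
def warehouse_pipeline():
    load_facts()
```

Running with `execute_pipeline(warehouse_pipeline, mode="local")` or `mode="production"` swaps the resources without touching the pipeline code.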
That was a nice implication for us when we started using Dagster. And the beautiful Dagit UI is very nicely done; it shows exactly what you need to know. You always see the state of each pipeline and the currently running jobs, and you get rich metadata. That's also a big plus for us: we can add our own custom metadata and make it available in the UI easily, without doing too much. The problem solving is also a big thing for us; before, we used other methods.
Sometimes it was actually hard to find the error itself, so we had to dig through pipelines to really find the error, and sometimes we couldn't even find it. With Dagster, you have the error straight in your face.
We also have developers that didn't come from the data sphere, and they have told me that it's easy for them to grasp the concepts: the resources, the solids, and the pipelines all made sense to them. So we could also ramp up new developers quite fast on Dagster, and it's actually very pleasant to write pipelines.
When you have such a nice framework around you, there's a lot of thought in it that you don't need to think about yourself; it's just built in. And everything is self-documented, so you don't actually need to draw diagrams anymore, because the pipelines in Dagster are self-documented: you can really see each step.
You can even put SQL statements inside there, to really see what's actually going on, together with the assets, as mentioned. So you spend less time explaining what's going on and can instead really discuss the business transformations and logic, what needs to be done.
One of the biggest things for me was the reusability of code. Before, we had microservices in Python, and we could very easily move them to Dagster: we had classes, we just added them as resources, and then we already had them inside Dagster. That was a very elegant way to reuse existing code.
It also reduced the boilerplate code for us, because when you have microservices, you tend to integrate the logging, the restarts, and so on. You always need these kinds of things, but normally, with different microservices, you re-implement them every time. With Dagster you either get them out of the box already, or you build it yourself, but then you do it exactly once. That was very good for us. And it's also functional by design.
You define a resource exactly once, and then you can use it in your pipelines just by specifying the resource; you can then access it through the context. The context is very powerful: for any resource you specify, just with context.resources.druid you have access to all of that resource's methods. Besides that, the context also has additional functions, for example the run ID.
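A minimal sketch of that resource pattern in the pre-1.0 API; the Druid client class, its config keys, and its query method are hypothetical stand-ins.

```python
from dagster import resource, solid


class DruidClient:
    # Hypothetical stand-in for a real Druid client library.
    def __init__(self, host, user, password):
        self.host = host

    def query(self, sql):
        return f"results of {sql!r} from {self.host}"


@resource(config_schema={"host": str, "user": str, "password": str})
def druid_resource(init_context):
    # Usernames and passwords are configured once, here, not in solids.
    cfg = init_context.resource_config
    return DruidClient(cfg["host"], cfg["user"], cfg["password"])


@solid(required_resource_keys={"druid"})
def query_druid(context):
    # context.resources.druid exposes the client's methods, and the
    # context also carries extras such as the run ID.
    context.log.info(f"run {context.run_id}")
    return context.resources.druid.query("SELECT 1")
```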
You really get a powerful tool there, and you don't need to fiddle around with passwords and usernames, because you do that once in the resource and then you're ready to actually write the pipeline logic. Another very nice feature of Dagster: when we started, there wasn't yet a Kubernetes deployment, but just when we were about to start with Kubernetes, there it was, and we now use Kubernetes heavily.
It's also very handy for us, because we have SQL Server on Linux in one deployment, and in another one we use Spark, so it's easy to separate them with the user code deployments, and we can also scale out easily with new pods. It makes things very easy for us, and understandable as to what's going on. Another big plus for us is that it's Python-based. The language of data nowadays is Python, so it's easy to learn and adopt for engineers, and it also supports SQL statements.
You can have a solid that takes SQL statements and easily integrate that as well, and there are also powerful extensions like dbt and other things that you can easily integrate with it.
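A sketch of a solid that takes a SQL statement as config, in the pre-1.0 API; the in-memory SQLite connection is just a self-contained stand-in for the real warehouse.

```python
import sqlite3

from dagster import solid


@solid(config_schema={"sql": str})
def run_sql(context):
    # The SQL statement arrives as config, so the same solid can be
    # reused across pipelines; here it runs against an in-memory DB.
    with sqlite3.connect(":memory:") as conn:
        rows = conn.execute(context.solid_config["sql"]).fetchall()
    context.log.info(f"fetched {len(rows)} rows")
    return rows
```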
That's it for the advantages; now the next steps. We're not yet doing that much unit testing and smoke testing, so we would like to improve there; at the moment there's a lot of manual testing. We would, of course, like to have some test files that automatically get tested, and documentation. We would also like to use assets more intensively.
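A sketch of what such a smoke test could look like in the pre-1.0 API; the trivial pipeline here is hypothetical, since the talk only names this as a next step.

```python
from dagster import execute_pipeline, pipeline, solid


@solid
def add_one(_) -> int:
    return 1 + 1


@pipeline
def smoke_pipeline():
    add_one()


def test_smoke_pipeline():
    # Execute the whole pipeline in-process and assert on the result,
    # the kind of automated smoke test mentioned as a next step.
    result = execute_pipeline(smoke_pipeline)
    assert result.success
    assert result.result_for_solid("add_one").output_value() == 2
```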
Maybe we'll even integrate the data lineage feature in an automated way. We also started some Dagster pipeline guidelines, just to have some best practices: let's say, when to use assets, what the namings are, how to use resources. It's very basic, but we would like to keep growing it. And specifically, I would also like to try the dynamic orchestration. As I said before, we have many sensors starting low-level pipelines, but I would like to try whether we can instead have one pipeline dynamically spawn all the sub-pipelines. We're also not using partitioning yet, but everything in our environment is based on files, so partitions would make sense there as well.

That's it from my side. Just contact me anywhere if you have some questions, or feel free to ask them later on.