Description
Presented by Rogan Hamby & Jason Ethridge, Equinox
Slides: https://drive.google.com/file/d/1yIb9nDlvEIzLmMASaQdP0SxH2ax3JSsX/view?usp=sharing
Rogan: So just a wee bit of history before we get into the nitty gritty stuff. Mig is a tool set that borrows its feel from git. Now, by that we mean that it is a command line tool, and it is kind of a wrapper for other, more specialized tools. Jason will talk later about things like mig add, mig remove, stuff like that. And mig is simply an abbreviation for migration. It was originally developed as a migration tool set and lives inside a larger migration tools repository.
However, of course, migrations are ultimately just big data projects, right? So it's not terribly surprising that a tool set that's useful for migrations is also useful for other data projects, and we'll talk about that later. But we're going to focus on the original set of use cases that led to the creation of mig first, and that is migrations.
It is a collection of tools that are primarily written in Perl, although you'll find a number of bash scripts, XML files, SQL, and the occasional cat picture or easter egg in there as well: just whatever has been useful as a repeated-use utility in doing migration work. And these live on GitHub; you have the address right there. It is under the Equinox Open Library Initiative organization, migration-tools.git. We track issues there.
Jason: If you're going to use the MARC cleanup utility, then you need that. And if you have trouble (I have trouble installing Text::CSV::Auto sometimes), often I put those directly into the Equinox migration lib directory, and when I do that I need the Perl lib as well, yeah. The other thing about MARC cleanup is it's good for adding sequentially numbered tags on the fly, and you don't often need to do that with Koha migrations.
This is how data can be represented deep under the hood within Koha. So, a table: you can kind of think of a table as a tab in a spreadsheet, where you have, you know, different columns and fields, and each row in a table is kind of analogous to a row in a spreadsheet. And the migration process is to take data that's similar to this and beat it into shape so that it fits within these rows and columns.
Rogan: If you have a tool set for converting data from a certain XML or JSON, or even PDF or whatever, once you get it into a tabular format it'll work with mig. And I mentioned the larger migration tools repo a few minutes ago; there are a number of non-mig tools in there that are specific to certain data sources, as well as, of course, other repositories elsewhere that are sometimes specific to given ILSes or data sources for converting.
Jason: CSV is not always CSV; it's not a rigorous standard there, in my opinion, but we do have tools for dealing with it. So there's, you know, a cleanup step that can kind of fix a lot of CSV for you.
Rogan: Yeah, I think calling CSV a standard might be a stretch. A frequently abused gentleman's agreement might be the best description of what CSV really is.
Jason: It's kind of like SIP in that regard. And what you're seeing there on the screen is actually pipe-separated, pipe-delimited data. In some ways that's preferable to pure CSV, because you don't often see an actual pipe in the data itself, so you don't have to worry about those delimiters being escaped or quoted.
Rogan: So, ultimately, this is your goal: converting your data to get to a line-oriented file, or what I tend to call a tabular file, and then you're going to want to stage it. Like we said, and as I said before, this was first about the use cases of mig, why we came up with mig. So we're going to start with a non-optimal way to do this migration process.
So what's the most non-optimal way you can do it? The most non-optimal way is to do everything by hand: sit down and type out "CREATE TABLE user_data (" and, I'm not going to go through every line, but obviously list out every column of data and list out its data type.
A tab, whatever. And during this process you need to watch out for collation, because it's easy to end up with a table encoding that is not going to play nice when connected via joins to other tables in the system, so you've got to be very careful about that. And so this is the very manual, long, time-consuming way to do it, and I feel like we could probably have two or three slides of gotchas here. What do you think, Jason?
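For illustration, a minimal sketch of the kind of hand-typed staging DDL being described, assuming MariaDB and invented column names; pinning the character set and collation explicitly guards against the join gotcha just mentioned:

    -- Hypothetical hand-written staging table: every column typed out manually.
    -- The explicit charset/collation keeps joins against Koha's tables
    -- (typically utf8mb4 in recent versions) from misbehaving.
    CREATE TABLE user_data (
        user_id         VARCHAR(32),
        user_last_name  VARCHAR(100),
        user_first_name VARCHAR(100),
        birth_date      VARCHAR(20)    -- legacy dates often arrive as plain text
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;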
Rogan: Now, here on the slide we just have a couple of very simple manipulations. Card number maps directly to the user id, or rather user id to cardnumber; user last name maps directly to surname, and we're just trimming the spaces off the end. But some things are going to have much more complicated manipulations. Let's take names, for example: it's not uncommon to not have a last name in its own column.
It's actually pretty common to get data where you have something like smith, comma, space, jane, comma, mary, and you're going to have to take substring commands in SQL with positions and start splitting that stuff up, putting things into surname and first name, maybe other name if you're going to keep the middle name, or combining first and middle into first name, however you want to do it. And you may have to bring in data from other sources, some of which may be several tables of connection removed.
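As a sketch of that splitting, assuming MariaDB and hypothetical l_/x_ columns (those naming conventions are explained a little further on), SUBSTRING_INDEX can do the position arithmetic for you:

    -- Split a legacy 'smith, jane, mary' value into surname/first/middle.
    -- l_name, x_surname, x_firstname, x_middlename are invented column names.
    UPDATE m_borrowers
       SET x_surname    = TRIM(SUBSTRING_INDEX(l_name, ',', 1)),
           x_firstname  = TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(l_name, ',', 2), ',', -1)),
           x_middlename = TRIM(SUBSTRING_INDEX(l_name, ',', -1))
     WHERE l_name LIKE '%,%,%';   -- only rows with at least two commas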
So if you're going to go straight into borrowers, it's going to be a lot of very complicated manipulation. And the more complicated the migration is, of course, if you have to do it all in one single big transaction shot, the more chance there is of some sort of error sneaking in. Any more thoughts on that, Jason?
Jason: I mean, you could comment out that insert and just do the select and see if it blows up on you. But if you're dealing with a lot of data, you're not going to be able to tell at a glance necessarily, or browse through or skim that data and catch edge cases; whereas if you're doing something like this, you know, on the next slide, in a way where you're actually manipulating things within tables before pushing them into production, then it gets easier to catch. That's how it's done.
Rogan: Yeah, and I would argue that even if you have something that's super simple, it's probably a bad habit. Yeah, yeah! So let's talk about better staging tables a little bit. Jason already mentioned there's a next slide, and boom, like magic, here it is. When we talk about better staging tables, I'm going to start talking about quite a few conventions that we follow in our own workflows and that have kind of snuck their way into mig. Now, perhaps in a perfect universe tools are completely agnostic; they don't take on any of your workflow conventions.
So I want to make you aware of them as we talk about them, and why we have them, as well as that they exist. So, a better staging table: instead of manually creating a staging table, we start by creating a table like one that we want to inherit columns from. So, CREATE TABLE m_borrowers (m underscore is our convention that it's a migration-related table) LIKE borrowers. And what this will do, if you're not familiar with it, is basically create a second borrowers table with a new name.
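In MariaDB terms, that step is a one-liner:

    -- Clone the production borrowers structure into a migration staging table.
    -- Column definitions and indexes are copied; data is not, and the
    -- auto-increment counter starts over, which is the caveat discussed next.
    CREATE TABLE m_borrowers LIKE borrowers;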
Now, this is the part where Jason and I both feel a little bit of a pang of wishing that MySQL/MariaDB had child tables like Postgres does, because then you could actually share a number sequence. In MySQL and MariaDB, when you do this, it's not a child table; it's actually completely separate. It just has an identical structure, and the sequence is going to be completely separate.
So there are a few things we have to do in order to deal with that, and we'll talk about that in just a few minutes. Now, what is the use of having a copy of borrowers? Well, the advantage is that all that manipulation I said we don't want to do in a production table yet, we get to do in this m_borrowers instead, and then, when we're ready, just copy all that stuff over to borrowers.
The real value of this comes in when we look at the next couple of lines on the slide: ALTER TABLE, ADD COLUMN. We have two conventions we follow here. One is l_: l underscore means it's legacy data. We are bringing that data in from one of those tabular files that we talked about, and our convention is that we never alter it.
A
That
is
a
pristine
copy
of
the
data
as
we
pulled
it
in
so
that
if
it's
not
what
was
exported
from
the
system,
it's
at
least
what
was
in
the
files
that
were
imported,
and
this
allows
us
to
know
that
if
something
doesn't
match
up
somewhere,
we
need
to
look
for
maybe
an
issue
with
loading,
the
data,
but
one
way
or
another.
This
is
the
migrated
data
x
underscore
is
our
convention
for
something
that
we've
manipulated
or
calculated
in
some
way.
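A sketch of those ALTER TABLE lines under the l_/x_ conventions, with invented column names:

    -- l_* columns hold the legacy file's values verbatim and are never altered;
    -- x_* columns hold values calculated during the migration.
    ALTER TABLE m_borrowers
        ADD COLUMN l_patron_name    TEXT,                          -- pristine legacy value
        ADD COLUMN x_borrowernumber INT,                           -- resolved production id
        ADD COLUMN x_migrate        TINYINT(1) NOT NULL DEFAULT 1; -- should this row migrate?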
We immediately do an update where we take the x_borrowernumber and set it to the borrowernumber: we set the x_borrowernumber of the migration table to the actual borrowernumber in the real production table, based on the cardnumber at that moment. That means that we know from then on that x_borrowernumber represents the actual production row, regardless of what changes on that borrower's account from then on.
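A minimal sketch of that update, assuming the card number was loaded into a hypothetical l_cardnumber column:

    -- Pin each staged row to its production borrower, matched once by card
    -- number; from here on, x_borrowernumber identifies the production row.
    UPDATE m_borrowers m
      JOIN borrowers b ON b.cardnumber = m.l_cardnumber
       SET m.x_borrowernumber = b.borrowernumber;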
Jason: Surname: we'll use your last name. If you had to do more convoluted munging, that's where you would do it as well; you could put, you know, REPLACE and regexp replace and those sorts of things in here too. x_migrate: that's another convention we use during mapping. Libraries will often take the mapping process as an opportunity to clean up their data.
Rogan: Yeah, and I will say that when we talk about legacy columns, the l_ columns, they're usually text, just for convenience. Sometimes we might manually go in and change them to a varchar of a certain length if we want to index them or something, but the x_ columns are often data typed. So, for example, x_migrate is usually a TINYINT, because we only need a one or a zero, and others will often be of a data type that's convenient to move over into a production column.
Jason: There's another reason to do things like this. If you think back on the first bad example of inserting directly into production tables: MySQL has this bad habit of truncating data. And yes, you'll see warnings, and there are things you can do (you can SHOW WARNINGS after you do that), but if you've already pushed it into a production table, that's kind of too late, yeah.
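A quick self-contained demonstration of that hazard (behavior depends on sql_mode; in non-strict mode MariaDB keeps what fits and only warns):

    CREATE TABLE trunc_demo (v VARCHAR(10));
    INSERT INTO trunc_demo VALUES ('this string is too long');
    SHOW WARNINGS;  -- e.g. Warning 1265: Data truncated for column 'v' at row 1

Caught in a staging table, that is an annoyance; caught after the insert into production, it is lost data.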
But this works, I mean. So you have your staging table, you've done all your mapping, and you just insert into the production table the same columns in the same order, and you flag it where you're just doing the ones that are supposed to migrate, so possibly a subset of what's in the staging table. So we've talked about staging tables, but sometimes we do still want to insert directly into things, and those things can be staging tables too; you just don't necessarily have to munge or map them the same way.
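The final push being described might look like this sketch (the column list is abbreviated; a real load names every column explicitly, in matching order):

    -- Copy only the rows flagged for migration from staging into production.
    INSERT INTO borrowers (cardnumber, surname, firstname)
    SELECT m.l_cardnumber, m.x_surname, m.x_firstname
      FROM m_borrowers m
     WHERE m.x_migrate = 1;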
So this, you know, still gives you easy iteration, and you're not polluting, you know, a production table or anything. But sometimes you have derived tables or auxiliary tables where it's going to be pretty simple just to do things like this. In this example, this is a staging table that's eventually going to get pushed into the statistics table in Koha, and this one is specifically for circulations; but you might have another staging table that's, you know, intended for statistics but is for something else, related to fines or whatever.
And finally, other things you might want to stage. So we talked about hard-coded mapping, but, you know, for the migrations we do, we often give the libraries an opportunity to determine exactly how they want things to map. So there's a bit more data-driven mapping, or soft coding, going on here: we usually give them spreadsheets and then they kind of pick and choose how they want to map, especially with consortia.
Rogan: Now, all through this we've talked about bad ways to do things and good ways to do things, but we haven't really talked about how mig plays into it. So we're going to do that now, and as we talk about mig, what we're going to talk about goes back to that better way to do things: better ways to stage, better ways to map, better ways to load, and how mig allows us to take on a migration.
So let's meet the mig tools. I don't think this is all of them; we may be missing one or two here, but here are a bunch of the kmig tools. And you'll hear us talk about kmig: kmig is in contrast to emig. When mig started, it was just mig, and it was for Evergreen. As we've done more and more Koha migrations, and we do a pretty steady stream of them now, we discovered we really missed having these tools we used in Evergreen for Koha, and they weren't really Evergreen tools.
Actually, they were tools for Postgres. So we took those and we split them, so that one set of tools could cater to Evergreen's idiosyncratic needs and Postgres, while we could have the same functionality for Koha and MariaDB. And, as it's turned out, some tools have turned out a little bit different.
Jason: Yeah, but just to back up a little bit about how this is modeled after git: I was really enamored with git's subcommands and how you implement those subcommands, and it also gives you kind of a conceptual framework for doing migrations.
So once you learn this, you just get kind of the same steps every time; it kind of encodes or enshrines the workflow. And one of the first steps is to create the environment, which is what references what database you're dealing with and where the data is actually located on the file system. You can bundle all that up, and so if you're jumping back and forth between migrations, this helps you keep things straight without, you know, cross-contaminating anything.
It actually writes things into the database once you have it configured and know which one you're talking to: it creates a tracking table that gets used by the quick and add commands, and a few other things get added, like some convenience stored procedures that are useful for migrations, for the mapping step.
Mig link: this is actually what associates an incoming source file with an existing production table, and if the staging table doesn't already exist, it will do that whole CREATE TABLE m_borrowers LIKE borrowers type thing when you specify it here. So this makes those more useful staging tables for you, but it's optional; you don't have to associate a staging table with a production table. You want to take over on the other ones, right?
Rogan: Sure. Kmig status is very simple, and, let me say, backing up: we're going to go into more detail on upcoming slides about each of these; this is just to give you a quick overview for context.
Kmig status just tells us what status the environment is in, and it's going to give us that information. Quicksheet: quicksheet is extremely useful. It gives you a statistical overview of what's in a tabular data source. So you have these files and you want to get a quick overview of them: what kind of data is in each column, what kind of value ranges you have, things like that. It is extremely useful if you are working with a project manager or other people on your project who aren't technical and don't know tools like awk and grep.
So when we bring a library in, we will take an initial extract of their data and set them up on a test system after we've scripted the load. They're going to work on that a lot, and then, instead of making them manually recreate every little thing on production that they did on test, we'll often do an export of one or more options from this tool (we'll talk more about those options when we get there) and then just import them into the production system. So, easy peasy.
When we create the environment, it's going to create an env file under our home folder, in a little hidden folder called .kmig, and in that env file will be the information that you'll see in the environment. A lot of it's going to be pretty obvious kind of stuff: what is your MySQL database, what is your MySQL user, your password, your host, all that kind of stuff, by default.
You know, on a box where you have the koha-conf.xml file for it under the Koha sites directory and all that, it can pull a lot of this information automatically from that file and populate it for you. And we have a few conventions that are, again, defaults in kmig, such as our convention of using a migration-work git folder and a data folder. The data folder is where we tend to toss the raw data files, and migration-work is a shared git repository that's internal to us, for our scripts and things like that. But you can change these.
Rogan: Environment use is obvious: after you've created your environment, you want to actually use it, and these are going to be the system environment variables that it creates. Some of these are used by kmig directly (actually, all of them are used by kmig in one way or another), and you see down there a shell process id; that's because it's going to, of course, create a shell for you to do this in.
Jason: So with the mig environment, there's one thing that we do on the Evergreen side that we couldn't propagate on the Koha side. You're using Koha's instance names here, and that's where you do your migration; but on the Postgres side, you're actually going to specify a migration schema, and you could have more than one migration schema for the same migration if you want to partition things a bit more. So there may be some warts with this.
So this is spelling out the tracking column we use. Some other things are a bit more experimental but are still there: base staging tables, which mig link can replicate on the fly, but we go ahead and pre-create the more common ones here. And then we have utility functions, stored procedures, that are useful for manipulating MARC and strings and things you might find in legacy data.
And you only call that once. Mig quick: mig quick is a wrapper around mig add, and there's another mig tool called iconv, which we don't actually use that much; that's kind of legacy. Mig clean (we actually probably should have created a slide for it): mig clean is a wrapper around the clean_csv tool in the migration-tools repository.
And clean_csv will actually parse the CSV file, and if it finds errors, if it can't parse something, it actually brings up an exception for you to handle there on the fly. Then, once you handle it, it remembers how you handled it; and furthermore, it can actually apply that fix, based on matching patterns, to the other rows that remain in that file. So it's a very useful tool. But yeah, mig quick.
Rogan: Yeah, and one of the things to say about the CSV cleaning that is extremely useful, even if you don't learn some of the advanced functionality: by giving it a file of headers, it will know how many columns there should be, and it will check each row for that number of columns and whether it can parse the rows correctly. So, extremely useful.
Mig link, as I mentioned earlier, makes it super easy to say that an incoming file should be associated with a table in the database; this is what does that. So, let's say you have an m_items table.
You have an m_items table because you're going to be putting items in the database, simple enough, right? And then you have an items.tsv from your legacy system that you're bringing over from whatever vendor, it doesn't matter, and you want to bring that into a whole bunch of l_ columns. And let's say this is a very robust system and it's got 60 columns; that's really tedious to do all by hand. But with mig link you can tell mig: hey,
this file is supposed to be associated with this table. So what mig is going to do is go through that list of headers, whether they're in a separate file or on the first row; it is going to remove spaces, put an l_ in front of them, and use those as definitions to add onto the m_items table when it creates the SQL file for staging.
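As a hypothetical fragment of what a generated m_items.stage.sql might contain under those conventions (the real output comes from mig convert and will differ in detail):

    -- One l_ column per header in items.tsv, spaces stripped, then a bulk load.
    ALTER TABLE m_items
        ADD COLUMN l_barcode       TEXT,
        ADD COLUMN l_item_type     TEXT,
        ADD COLUMN l_home_location TEXT;

    LOAD DATA LOCAL INFILE 'items.tsv'
      INTO TABLE m_items
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'
      IGNORE 1 LINES
      (l_barcode, l_item_type, l_home_location);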
Jason: Yeah, everything he said happens, but most of the work is actually done by mig convert; mig link itself is just very simple. It just records a reference, an association, in the tracking table.
A
Yeah-
and
we
probably
should
have
made
mig-
convert
a
separate
slide
here,
but
because
you
can
use
mig
convert
directly
and
by
doing
that,
it'll
just
get
loaded
into
its
own
table
that
you
specify
say
items
underscore
tsv
underscore
from
someone.
I
don't
I'm
not
very
good
at
making.
You
know
clever
table
names,
and,
but
you
do
see
down
here
from
the
output
of
mig
convert
the
writing
m
underscore
items.stage.sql.
This is about analyzing data, but these tools are about data work in general and everything you need for a project. And this is an example of taking a file, this items.tsv, and creating an items.tsv.mapping.xls, because Jason and I both like really robust names that tell you every single thing that went into making a file. It will go through and analyze these columns to give you useful data afterwards, such as: here's the legacy column, and here's how many rows actually have data in them from that file. So in this case there are 19981 rows.
And here's a little bit more information that will be in another part of that XLS file, that Excel spreadsheet that's created: examples of the legacy values that are in there. So here you can see that in this column that's being analyzed, two of them say Book Club, 29 say In Repair, and 8 say To Be Withdrawn. So again, stuff that allows you quick analysis without having to manually write each query.
Next up is kmig reporter. One of the common needs we have as we do things is reports on data, and people not only want reports, they actually want them to be pretty and things like that; so that's where kmig reporter came from. We also want to be able to store reports in something that's friendly to use with git. What kmig reporter does is: you pass it a title; there are some optional arguments you can pass as well. You can give it the name of a data analyst; if not, it'll just say "data analyst".
You can pass it some external files to use as an introductory section and things like that, and then it will go to a stock collection of reports that are in a Koha XML file, run through those, and give you output. And that output is going to look a little bit like this: it is an AsciiDoc file. For those not familiar with AsciiDoc, it's kind of like Markdown.
A
It
is
a
formatting
language.
It
happens
to
be
used
in
the
evergreen
community
for
documentation,
so
that
was
the
purpose
behind
choosing
it.
If
that
hadn't
been
the
case-
and
it
wasn't
historical-
I
probably
would
have
done
it
in
markdown
to
be
honest,
but
it
gives
you
an
s
doc
file
for
your
reports.
That's
very
easy
to
throw
into
get
there's
also
plenty
of
tools
out.
There
that'll
convert
this
into
nice,
html
or
pdf
for
people.
A
There
are
also
the
capability
for
this
to
support
non-stock
xml
report
files.
So
if
you
have
a
custom
data
reporting
need,
you
can
write
a
new
set
of
reports
and
put
them
in
an
xml
file
and
when
you
run
kmig
reporter
just
point
it
at
that
and
run
those
custom
reports,
you
don't
have
to
do
the
stock
ones.
You
don't
have
to
change
the
stock
file.
You
can
point
it
at
some
totally
different
collection.
Next
up
is
bib
stats.
A
As
I
said
before,
this
is
just
a
sort
of
oddball
collection
of
information
when
looking
at
a
mark
file,
so
a
very
standard
way
for
me
to
start
a
data
project
is
to
get
a
mark
file
from
somewhere
and
people
to
say
hey.
This
is
supposed
to
have
this
number
of
bibs
in
it,
and
it's
from
this
source
and
here's
what
we
know
about
it
and
my
starting
position
is
to
trust
but
verify
you
know
so
I'll
run
it
through
this
and
have
it
say
how
many
bibs
are
really
in
it.
A
What
does
the
zero
nine
in
the
leader
say?
Does
it
at
least
think
it's
unicode
well,
of
course,
it'll
depend
on
whether
the
source
system
actually
enforced,
that
or
not
more
often
than
not.
The
statement
in
the
zero
nine
is
more
of
a
hopeful
declaration
than
a
definitive
definition.
What
does
the
leader
say?
It
is
I
obviously
that's
not
a
full
breakdown.
You
really
need
this
zero,
zero,
seven
and
eight
for
more
information,
but
you
know
it's
enough
to
give
you
a
quick
idea
of
what's
in
there.
A
You
know
what
are
in
some
of
the
does
it
have
245
zeros
does
have
100
zeros,
because
I'm
checking
to
see
if
there
are
authorities
are
there
856's
that
might
be
indicative
of
overdrive
and
things
like
that,
and
then
a
little
bit
of
quick
and
dirty
holdings
analysis.
This
does
not
definitively
say
what
the
holdings
are,
but
I
look
for
holdings
that
have
certain
formats.
And then, as I mentioned before: kmig export and import. This is an example of me running kmig export, and here you see some of the stuff that it's sending out (I probably should have a more sophisticated term here than "stuff"): authorized values, booksellers, budgets, borrower attributes, calendar, circ rules, item types. I didn't put it all here; there's some more at the bottom. But what it's exporting is not just a dump of those tables.
A
It's
actually
a
little
bit
more
involved
than
that
and
sensitive
to
what
the
data
is.
So,
for
example,
some
of
these
tables
have
a
sequence
that
is
actually
used
and
if
you're
going
to
pull
those
tables
into
another
system,
you
need
to
make
sure,
before
you
truncate
data
out
of
that
previous
system,
that
it's
going
to
obey
the
new
sequence
rules,
so
that
information
is
in
there
also
for
some
of
these
they
have
to
pull
in
data
from
multiple
tables.
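For example, honoring the sequence rules on the target side amounts to something like this sketch (kmig export and import carry this bookkeeping in their data sets; authorised_values here stands in for any table with an auto-increment id):

    -- Find the highest imported id, then move the counter past it so new
    -- rows created on the target system don't collide.
    SELECT MAX(id) FROM authorised_values;
    ALTER TABLE authorised_values AUTO_INCREMENT = 1001;  -- i.e. MAX(id) + 1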
A
So
these
are
artisanally.
Crafted
data
sets,
if
you
will,
and
then
we
have
kmage
import
and
k-meg
import
is
simply
taking
everything
that
k-mag
export
does
and
following
those
same
rules
to
bring
it
in.
So
if
it
has
to
reset
a
sequence,
it
does
that
if
a
table
needs
to
be
loaded
before
another
table
as
a
prerequisite
for
that
data
set,
it
does
that
all
those
sorts
of
things.
A
So
it
is
sensitive
to
schema
changes,
and
all
of
that
brings
us
to.
Why
do
we
use
it?
Big
picture
wise.
I
mean
it's
nice
to
be
lazy.
That's
a
virtue
of
programming.
According
to
larry
wall.
You
want
to
not
take
unnecessary
effort,
but
there
are
actually
a
lot
of
advantages
beyond
automation
to
why
we
use
the
kmig
toolset,
and
some
of
those
benefits
include
easier
iterative
testing
and
less
churn.
Jason: Yeah, there's a bit of the Unix philosophy here, where we want small tools that each do one thing; we just happened to put them under that big umbrella, yeah. And the other thing is, we didn't want a program that tried to do everything with just the press of a button. So all these tools produce artifacts, and all those artifacts are text: they're easy to put into git repositories, or to diff and manipulate, and you can, you know, interject yourself between these milestones, workflow milestones.
Rogan: Do you want to talk a little bit about, you know, the ETL and all that kind of stuff?
Jason: Yeah. So we call these data migrations, but other industries use the word ETL a lot. ETL has some different connotations to it; it stands for extract, transform, and load, and often that's more for things that recur a lot, like entire pipelines where you're constantly moving from one data source to another. Migrations are like that, but you'll find that even with the same version of a legacy
B
Software
system
that
libraries
will
use
that
system
differently
and
they
will
open
those
fields
and
try
to
work
around
limitations
in
the
system
so
that
we
do
get
code
reused.
There's
this
almost
never
goes
without
some
editing
needed
some
tweaks
and
yeah.
We
have
tools
for
extracting
data
from
these
systems
tools
for
munjukit
and
these
are
useful
and
for
quahog
and
evergreen
context
and
the
ones
we
like.
You
know
some
of
the
other
ones
we
kind
of
wrap
and
make.
Rogan: So our sort of compromise is doing the same thing that we do with tables: we put an m_ in front, so that, while it's not terribly likely, if somebody else creates an upsert datafield function, we're not conflicting with names. We create these and load these through kmig init, like a lot of other stuff. And these are a combination of utility and just quality-of-life things. On the utility side you have things like update leader, update 003, and upsert datafield.
A
These
are
manipulating
bits
of
the
mark
and
are
just
handy
things
to
have
around
for
manipulation.
Sure
you
could
do
these
in
other
ways,
but
it's
just
nice.
Then
you
get
to
things
like
m
split,
string,
m
string
segment
count.
These
are
just
wrappers
around
more
complex
callings
of
things
like
substring
and
substring
position.
You
certainly
don't
have
to
use
these
they're,
not
reinventing
the
world,
but
they're
awfully
convenient
and
make
your
code
way
more
readable
when
you're
doing
a
ton
of
string
manipulations
on
legacy
data.
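To illustrate the readability point (the wrapper's exact signature here is assumed, not quoted from kmig):

    -- Without a helper: pull the third comma-separated segment by hand.
    SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(l_name, ',', 3), ',', -1))
      FROM m_borrowers;

    -- With an m_split_string(value, delimiter, position) style wrapper,
    -- the same intent reads at a glance.
    SELECT m_split_string(l_name, ',', 3) FROM m_borrowers;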
Why have a big, convoluted, nested string of substring and substring-position calls in order to pull out the middle name from a text field when you can just split it easily with a function? After that, I wanted to say: kmig is not a static thing. We're constantly using it, and I'm constantly looking for more things to add to the reporter and bibstats, for example. Some things are fairly static and have been around a long time, but even those receive occasional tweaks and changes.
So this is a tool set that's constantly in use and constantly receiving updates, and I welcome people to contribute to it. We make it available to the larger community because we want our work to be useful to other people, but we're also perfectly willing to take advantage of your labor.
Jason: So there are other tool sets out there. We started this with Evergreen, so this was a natural evolution of the Evergreen tool chain for doing migrations. We're not above cribbing from these other tools when needed, especially for extracting data from legacy systems, but a lot of times they are following a different philosophy and they aren't using staging tables like we are. So they're there, and we're not trying to denigrate them or suffer from not-invented-here syndrome, but...
Rogan: So at this point I want to make sure that everybody has a chance to get hold of us if they want. If you have any questions, if you have feedback, thoughts, anything you want: here are our email addresses, jason at equinoxinitiative.org, and myself, rhamby at equinoxinitiative.org. We'll be glad to chat with you. Do you have any parting words, Jason, before we sign off? Yeah? All right, bye bye.