From YouTube: Data Sets for ML Chip design
B
Okay, perfect, thank you. My name is Aman Arora. I'm a PhD candidate and a graduate fellow at UT Austin. A little bit of background on why I'm presenting this talk here today: I was a leader of the AI FPGA committee in the OSFPGA Foundation, which recently joined hands with CHIPS Alliance.
B
So this project that I'm presenting here was actually something that we had been working on at UT in my research lab, and I was trying to get some traction on it in the committee at OSFPGA. Now that everything is a part of CHIPS Alliance, through this presentation I'm going to give you an overview of this project and what we have done so far, and also do a call for contributions towards the end, to see
B
if people are interested in this kind of work and if there is scope for it. We will be submitting this project as a Sandbox project under CHIPS Alliance and seeing which working group it fits under, or maybe starting a new working group, or something like that. So that's the background. The title of the talk is "Data Set for ML-Guided Chip Design". I'm going to get started now, but before I actually go into the details of this talk,
B
I want to thank Zhigang Wei, who is a PhD student in our lab. He's the main person doing the hands-on work on this project. I also want to thank my advisor, Lizy John, who has been funding all of us so far. All right, so ML, or machine learning, has been used for chip design in a lot of work. I'm showing here a few of the papers that I've come across in the recent past. The works in this area
B
range from applying ML to do placement or floorplanning better, which is the famous paper from Google, to predicting some metric, like predicting resource usage for an FPGA design using an ML model, or trying to predict the power consumption of running some code on an FPGA or a GPU, given the power consumption or the performance counters when that application is run on a CPU. Actually, I come from the FPGA field specifically.
B
That's my main area of research. There are a lot of papers applying ML to perform prediction or to improve the estimates of area, frequency, power consumption, etc., that are generated by high-level synthesis tools, for example. So all of these experimental projects that use ML for chip design need data sets, and generating a data set for these projects is actually pretty time-consuming.
B
It requires tools and licenses that may be proprietary. It requires a person to write scripts to run the tools and parse the results. Of course, it needs a lot of machines
B
to run those tools, and somebody also has to curate the whole thing to make sure that the data set is actually usable. I shouldn't say all, but we looked through many of the projects that I flashed through on the previous slide, and we see that all of them had proprietary data sets that are not available in open source. And for every project,
B
a new data set is created, kind of on an ad hoc basis, and therefore we have a lot of custom data sets that exist right now. And if you look at the contents of the data sets used by the plethora of studies out there, there isn't a large variety. There is a small subset of types of data that keeps getting used in most of these projects, and I'm listing some of them here:
B
graphs of netlists of HDL designs; signal activity along with the graph of the netlist; performance counters of a C application running on hardware; power consumption, either measured from a tool or measured on a board; FPGA resource usage and timing for a particular HDL design; and 2D images of floorplanned and placed-and-routed circuits.
B
So we believe that open-source data sets can be very useful to the research community, and we actually started this project quite a long time ago. We only have one person working on this, and not even full-time, and there are a lot of other things to focus on while we are developing this data set. Until very recently, we believed that the data set we will present now was the first one
B
that would be available. But we recently saw, just in October 2022, a data set called CircuitNet, released by Peking University. I think it was at ICCAD; I don't remember which conference.
B
It's a very small data set that covers mostly floorplanning, placement, and routing type of information for ASIC designs. What we are working on is this thing that we are calling the Chip Design Data Set, CD²S. It has a set of HDL designs and a set of C applications. Rather, I should say it has data collected from a set of HDL designs and a set of C applications. The HDL designs we are sourcing from OpenCores.
B
There is a bunch of designs in VTR (Verilog-to-Routing), there's a benchmark suite called Koios inside VTR, and NVDLA, and some other sources of open designs. We are taking these designs through different flows to generate the kind of data in the data set. Similarly, we are taking C applications from PolyBench, CHStone, MachSuite, etc., and generating similar data for C applications.
B
We first take them through an HLS flow and then generate the data, because eventually we want to implement these C applications on hardware. Now let me give you a quick overview of what kind of data we are collecting. There are features.
B
Every model that you train needs features and some metrics that are typically the target metrics for training the model. The kinds of features that we are collecting are: the number and size of primary inputs and outputs, the number of operators, the number of memory bits, the size of the design, what application the design is from, the number of registers, signals, or FSMs in the HDL designs, and the number of basic blocks,
B
conditionals, etc., in the C applications. Then we have some metrics that we are collecting. For each design, we are collecting how much area it consumes. The metric for that is in terms of resource usage for FPGAs, but just an area number for ASICs. Then we are collecting power
B
consumption numbers, wirelength numbers, operating frequency, etc. We want to do this for multiple FPGA devices from multiple FPGA vendors, because we want this data set to be usable by many people; just collecting data for one device or one vendor is not enough. Similarly, on the ASIC side, we want to be collecting data for multiple ASIC libraries or multiple PDKs, and also for multiple implementation settings. In the case of C designs, that refers to different settings of HLS pragmas; in the ASIC world, it refers to
B
different gate-level synthesis options, and multiple process corners also. So we are trying to make this data set exhaustive enough that it covers a lot of the studies being done by researchers, and so that people don't have to redo a lot of the work that is involved in generating a data set.
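As a rough illustration of the features and metrics listed in this part of the talk, one row of such a data set could be modeled as below. This is a minimal sketch, not the project's actual schema: the class `DesignRecord`, the helper `split_features_metrics`, and every field name are assumptions made up for this example.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DesignRecord:
    """Hypothetical row: features of one design plus metrics for one target."""
    design: str    # design or application name
    source: str    # "hdl" or "c"
    target: str    # FPGA device, or ASIC library/PDK label
    settings: str  # e.g. an HLS pragma set or synthesis options label
    # Features mentioned in the talk
    num_primary_inputs: int
    num_primary_outputs: int
    num_operators: int
    num_memory_bits: int
    num_registers: Optional[int] = None     # HDL designs only
    num_basic_blocks: Optional[int] = None  # C applications only
    # Target metrics mentioned in the talk
    area: Optional[float] = None        # resource usage (FPGA) or area (ASIC)
    power_mw: Optional[float] = None
    wirelength: Optional[float] = None
    fmax_mhz: Optional[float] = None


FEATURES = (
    "num_primary_inputs", "num_primary_outputs", "num_operators",
    "num_memory_bits", "num_registers", "num_basic_blocks",
)
METRICS = ("area", "power_mw", "wirelength", "fmax_mhz")


def split_features_metrics(rec: DesignRecord):
    """Return (features, metrics) dicts, skipping fields that were not collected."""
    row = asdict(rec)
    features = {k: row[k] for k in FEATURES if row[k] is not None}
    metrics = {k: row[k] for k in METRICS if row[k] is not None}
    return features, metrics
```

Keeping the target and the implementation settings on every row is one way to let the same design appear once per device, library, or pragma setting, which is what the talk describes wanting to collect.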
Now, before we actually publish this data set,
B
we also want to run some case studies to make sure that this data set is valuable enough. This chart here is showing the space of case studies that we are thinking of undertaking. We will not be doing all of them; we have some in mind that we are working on right now. The idea is that, let's say, the user wants to train a model,
B
an ML model. The user can give either C code as the input or Verilog code as the input. That is why we are collecting data for C applications as well as HDL designs. As for what the model is trained to do with these data sets, it can be
B
predicting the power consumption of a given C code running on a specific FPGA, or predicting the resource usage, or predicting the operating frequency, for example. We want to target both FPGAs and ASICs. I'm showing here a stack of FPGAs and a stack of ASICs because we want these case studies to do cross prediction also: predicting from one FPGA to another.
B
So right now, the case study that we are working on follows this path in the chart, the part that just got highlighted in yellow. Let me define the problem for which we have designed the case study. We are training a model that will take in a piece of C code, or a C application, from a user. It will be trained on RTL features and HLS outputs.
B
It will predict the power consumption for that C code running on a particular FPGA, and actually, training on one FPGA and predicting on another FPGA. That's the case study we are working on right now. This case study is just the first one. We hope to do more case studies in this space and establish that this data set is useful enough.
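The cross-prediction idea (train on one FPGA, predict on another) can be sketched with a deliberately tiny model. This is only an illustration under stated assumptions: the talk does not say which model is used, so a one-feature least-squares line stands in for it, and all numbers below are made up for the example rather than taken from the data set.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up samples: (number of operators, measured power in watts).
train_device_a = [(100, 1.0), (200, 2.0), (300, 3.0)]  # hypothetical FPGA A
test_device_b = [(150, 1.8), (250, 3.0)]               # hypothetical FPGA B

# Train on device A only.
slope, intercept = fit_line(
    [x for x, _ in train_device_a],
    [y for _, y in train_device_a],
)

# Predict for designs on device B and measure how well the model transfers.
predictions = [slope * x + intercept for x, _ in test_device_b]
error_percent = mape(predictions, [y for _, y in test_device_b])
```

The point of the evaluation step is that the error is measured on a device the model never saw during training, which is what distinguishes cross prediction from an ordinary train/test split.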
B
So where are we right now? This is the link, and I'll just quickly flash this GitHub link. There is the QR code, by the way. This is what the GitHub page looks like. Right now it's under our lab's GitHub project, and there is some documentation.
B
It definitely needs to be improved. At the top level, it is split in two, ASIC and FPGA, and if you, for example, go into ASIC, there is some documentation of where the data is, how it can be used,
B
etc. Right now we have two types of data. One is data in CSV files, and one is data in tarballs, because there was some data that we wanted to make a part of this data set, but it was huge: multiple gigabytes of data that we didn't want to put on GitHub. So we have created tarballs for that kind of data. But some simple data, like information about resource usage or information about timing, etc., we have parsed ourselves and put in CSV files.
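For the CSV side, reading such a file needs nothing beyond the standard library. The sketch below is an assumption-heavy illustration: the column names (`design`, `device`, `luts`, `ffs`, `fmax_mhz`) and the sample rows are invented here and are not the repository's actual layout.

```python
import csv
import io

# Invented sample content; the real CSV files may use different columns.
SAMPLE_CSV = """design,device,luts,ffs,fmax_mhz
gemm,device_a,1200,900,210.5
gemm,device_b,1180,910,180.0
fir,device_a,300,250,305.2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with numeric columns converted."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "design": row["design"],
            "device": row["device"],
            "luts": int(row["luts"]),
            "ffs": int(row["ffs"]),
            "fmax_mhz": float(row["fmax_mhz"]),
        })
    return rows


def by_device(rows, device):
    """Select the rows collected for one target device."""
    return [r for r in rows if r["device"] == device]


rows = load_rows(SAMPLE_CSV)
device_a_rows = by_device(rows, "device_a")
```

For a file on disk, the same `csv.DictReader` call works on an open file object in place of the `io.StringIO` wrapper.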
B
So that's what the data set looks like; that's where we are. The current focus is collecting data for FPGA; we're not focusing on ASIC right now. We have a sufficiently large number of HDL designs and designs generated from C applications, and we have some funding for this project as well. We recently applied for funding with Meta for this project.
B
This is planned to be an open-source project. The next steps for this project are: on the FPGA flow side, we want to continue collecting data for MachSuite and CHStone; right now, we only have data for the PolyBench benchmarks. On the Verilog side,
B
we are currently parsing contents from Yosys reports, but we also want to run VTR and Vivado, and maybe Quartus, parse the reports from those, and generate data for that. The ASIC flow is something we haven't started yet. As I said earlier, we want to bring this project to CHIPS Alliance. We are going to submit this as a Sandbox project soon. The call to contribute is to help us build this data set to be bigger and have more value, and also to
B
use this data set, find issues, find bugs, etc., in the data set while you contribute. The kind of work that will be involved is writing scripts, running these tools, and parsing and collecting the data.
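The script-writing work described here is mostly pulling numbers out of tool reports. A minimal sketch of that step follows; the report text is a hypothetical example written for this illustration (loosely in the style of the statistics a synthesis tool prints), and the function name `parse_counts` is likewise an assumption, so real reports will need their own patterns.

```python
import re

# Hypothetical report text, not copied from any real tool run.
SAMPLE_REPORT = """\
=== top ===

   Number of wires:                 25
   Number of wire bits:            112
   Number of cells:                 40
"""

# Matches lines of the form "Number of <name>: <integer>".
COUNT_LINE = re.compile(
    r"^[ \t]*Number of ([A-Za-z ]+):[ \t]+(\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_counts(report_text):
    """Return {metric_name: count} for every 'Number of ...' line found."""
    return {
        name.strip().replace(" ", "_"): int(value)
        for name, value in COUNT_LINE.findall(report_text)
    }


counts = parse_counts(SAMPLE_REPORT)
```

The resulting dictionary can be written out as one row of a CSV file, one row per design and tool run.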
B
So in summary, ML is being used in chip design processes by so many researchers and so many companies out there, but the data sets that are used by those projects are not open source. We want to build an open-source data set, and that's why we at UT Austin are working on CD²S. We hope that people will be interested in this kind of work and contribute.
B
All right, that is the last slide, so I will stop sharing. Please ask questions if you have any.
A
All right. I don't know if that was mentioned in the presentation, but I think I saw that the repo does not contain a license. I assume it's just a temporary thing, but just make sure that you put an Apache license there so that we can smoothly onboard it into CHIPS when the right time comes. Perfect.
A
I asked this question not just to be smart, but because very often when there's no license, it implies that there is a problem with the license, or that someone doesn't have crystallized plans for what the license is going to be. I think that's not the case here, right? You're actually trying to get it into CHIPS Alliance, which requires Apache.
A
So it's kind of an obvious thing to add, which raises the confidence of people when they look at the repo; they're like, "Oh yeah, it's Apache, it's fine, I can use that." And, of course, the follow-up question is, and I don't want to make things hard for you, but when you generate this data (and I'm not an expert, I'm not a lawyer), just make sure that you can actually license it under Apache, because AI is complicated in that way: the source material can influence the kind of data you parse and the end results of this whole endeavor.
A
And, of course, CHIPS Alliance has a legal committee that potentially could help in figuring that out. I'm not saying that we have a very strong track record of figuring out AI-for-chip-design data sets as such, right?
A
It's a fairly new field, I would say, but certainly there are lawyers involved, so it doesn't have to be just developers talking to developers and trying to figure out if we have a good understanding of the law; we can actually get professional help.
B
I'll reach out. Rob, I hope you can connect me to the legal people.
D
Being recorded now, I'm in trouble. I did want to ask you one question, and I enjoyed your talk. Have you been working at all with Si2? I know there's some initiative. I was meeting with Professor Andrew Kahng here about a month back in San Diego, and Tom Spyrou, and I know there's some initiative to create a standardized API for collecting the metrics that you would need relative to chip design, rather than having to endlessly parse reports.
D
This is under Si2, which is, I don't want to say it's a proprietary organization, that's not quite right, but it is heavily participated in by the EDA industry. It may help resolve some of the questions or concerns, assuming that they make this API publicly available under the Apache 2.0 license, as Michael was correctly sharing. And, of course, that is a concern relative to using data that is generated by any of the proprietary solutions.
C
So I had a question that ties into this as well. I saw you were talking about the FPGA flows that you were using, and you had thought about an ASIC flow. I was wondering if you have looked at using Edalize to target a lot of different FPGA and ASIC targets. Related to that, it could also be a good place to put this kind of report parsing in a centralized place: two birds with one stone.
B
Yeah, we haven't looked at Edalize. I am actually slightly familiar with it; I remember questions about it in the VTR group. But we haven't even looked into it at all for this purpose.