From YouTube: CHIPS Alliance Workshop - October 12, 2021
Description
The CHIPS Alliance held its Fall 2021 workshop to share milestones, progress, updates, and more.
Slides are available at: https://chipsalliance.org/workshops-meetings/
A
Hi everybody. This is Rob Mains, general manager of CHIPS Alliance. I will turn on my video here so that you can see that I am a real person. I want to welcome everyone to our second workshop of the year, an opportunity for us to update the community on some of the different activities that CHIPS has ongoing. Let me just share a brief presentation.
A
Okay, so, as I said, this is our fall workshop, and this is the agenda of the talks that we'll be going through today. I look forward to everyone getting to hear about them, and I want to thank in advance the different contributors for making the time and effort to put together the material, and for taking time out of their schedules to help share this information.
A
So, as I mentioned, I'm the general manager of CHIPS Alliance. I've been in this role for about nine months, and it's been a great opportunity for me to get to work with different folks in the community and learn about the different activities that we're trying to do. Oh, it's not presenting. Thank you.
A
Thank you. Hopefully that is now sharing the screen; I apologize for that. So this is the first slide that's of interest. If we look at what's going on in the landscape: in earlier times, and still in some cases today, we had what I call a soup-to-nuts type of silo. Everything is done in-house, and you're effectively operating in your own little world.
A
We've also moved to more of what I call a supply chain type of management, basically with an overseer coordinating the different pieces, and we have now seen the kinds of supply constraints that model can exhibit, such as all the cargo ships that are backed up at different ports around the world.

What we're moving to, finally, is really an open collaboration type of environment, where we have folks coming from different companies, from universities, and individuals, all working together on hard problems and coming up with solutions to them. That's really what we're trying to do, both in the Linux Foundation generally and, more particularly, in the hardware space. We have three different efforts in the Linux Foundation: CHIPS being one of them, the others being RISC-V and OpenPOWER, and we work together on different things of mutual interest.
A
CHIPS Alliance is now at 38 members and growing. We've had a number of different members join this year, which is really exciting, along with participation in the different work groups that we have. I encourage you to check those out, and I hope that today's conversations are informative to each of you and help encourage participation in CHIPS Alliance.
A
With that, I'd like to introduce our first speakers, Guoyin Chen and Han Mao from Alibaba. They're going to be talking about the porting of Android to RISC-V, which I think is an interesting topic. They are engineers at Alibaba, and I look forward to hearing that. So, gentlemen, I will let you go ahead and take it over from me.
B
Hello, everybody. My name is Han Mao, and I'm a senior engineer at Alibaba T-Head. I'm here for the first time to talk a little bit about porting Android onto RISC-V, together with my colleague Guoyin Chen. We will go through the reasons why we ported Android to RISC-V, the whole project including the hardware platform it runs on and the software toolchain changes we made, and I will show a short demo of how it looks now.
B
Android is the most popular operating system around the world. It takes 42 percent of the market share, much higher than Windows and iOS, and it has 2.5 billion active users spread over 190 countries. It's widely used in mobile devices and automotive, as well as smart TVs, wearable devices, and many more, and the trend is still going up.
B
That is just the hardware part. If we look into the Android ecosystem, we have many categories here: toolsets used for application development; core services provided by Google, like GMS, the Play Store, and OTA; all kinds of third-party libraries for multimedia, computer vision, and so on; SDKs for GUI design, authorization, payment, and live streaming; and uncountable famous applications.
B
So far we have touched more than 100 git repos while adding RISC-V support to Android. More than 2,000 files have been added or changed, and over 100K lines of code have been changed. And the work was not done only to get Android to run on RISC-V: it benefits all operating systems that run on RISC-V, especially applications that require high accuracy and high efficiency and that deal with more complicated workloads.
B
Over three hundred of the packages in Android are maintained by third-party organizations, and they are also widely used in Linux distributions and other operating systems. Some of these technologies are relied on for their security behavior, and some of the third-party libraries are commonly used, for example in video meetings, on Windows, Linux, and macOS. Here we have highlighted the packages we have touched.
B
It has a state-of-the-art 12-stage, out-of-order, multiple-issue, superscalar pipeline, runs at high frequency, and is very power efficient: the CoreMark score of the C910 is 7.1 per MHz with -O3 optimization. These pictures show the internal diagram of the C910 multi-core processor; each cluster can contain up to four homogeneous CPU cores.
B
Each core has separate L1 instruction and data caches, and a configurable L2 cache of up to eight megabytes is shared within a cluster. The memory management unit uses Sv39 virtual memory translation and can handle up to 2048 entries.
B
It has a dual-core XuanTie C910 CPU running at 1.2 GHz, 4 GB of LPDDR4 memory, and a GPU for graphics rendering. You can order this development board from our website and try it out; I will provide the link at the very end. The project was done with Android 10. We enabled the ART Java runtime and most of the native services in the system, and we can now launch some basic applications on the emulator as well as on the board.
B
However, some features are still disabled on our system, including the codec service, RenderScript, and the neural network support, which need more work. Now let me show you a short demo of Android running on a RISC-V platform. Hello there! We've got the Android system running on the ICE EVB here. Let's check the Android version by clicking into Settings.
B
I had created several profiles in the contact list. I can scroll down the screen to check all the profiles, use the search option to narrow down the results, and tap the contact that I want to open; you can see the phone number, email address, company name, and other information. Some pictures have also been uploaded to this device through ADB.
B
I can now log into my email account and send and receive emails through the internet. We have also successfully ported Chrome to the device; let's browse some websites together. Let's try the Android website: this web page has quite some text, and there's lots of JavaScript code running in the background.
B
It's definitely not a good choice to step back to Android 2.3 just for NDK generation. What we did instead was create a simplified toolchain and NDK, with stubbed-out pieces and tweaked build scripts, use it to compile the main part of the Android 10 system, copy the generated RISC-V binaries into a temporary NDK, and then regenerate the toolchain, the Android OS, and the NDK properly. Doing it this way breaks the bootstrap loop while still giving us all the tools we wanted.
B
As of now, we have successfully supported the assembler, the disassembler, the runtime, JNI, and the JIT and AOT compilers for ART. Some of the bytecode handlers are not yet as well optimized as on Arm or x86, and more optimizations are on the way.
B
In the future, we can also support the Android Chrome/WebView browser on RISC-V. Such support has already been added for ANGLE, Blink, FFmpeg, and several other software components. In addition, the performance of V8 is boosted by about 20 percent compared to the upstream version, with around 10K lines of code added or changed. The main changes are implemented in Bionic, OpenGL, the debug tools, and the toolchain, along with compatibility tests based on the existing environments that we set up.
B
The current level of completeness covers only part of the performance story, and the tremendous work behind it revealed some fundamental gaps and a huge margin for optimization. As we make more progress, we can further improve the assembly in the libc library and in ART, and development on the new hardware is also on the way. We will launch our full-featured RISC-V Android 12 platform towards the end of next year.
B
If you are interested in our CPU IP, SoC, or development board, you can visit the Alibaba T-Head Open Chip Community website. Product information, technical support, service channels, and other useful information can be found there.
B
If you are looking for code and binaries, please check the riscv-android collaborative source code repos on GitHub. There is a CI running in the riscv-android organization; please open issues and join us in the collaboration.
B
You can subscribe to the mailing list here, and we need more contributions and ongoing effort for the project in areas including audio, video codecs, neural networks, the Rust compiler, and so on. What Alibaba has done is only the tip of the iceberg; it will definitely require enormous effort from all of you, individuals, enterprises, and the whole community, to eventually make Android a well-supported platform on RISC-V.
A
Thank you so much, Han and Guoyin; I appreciate the excellent presentation that you folks provided. I'll just start with one opening question. I was just curious: as you're porting the software over to RISC-V, do you do any work on an emulation platform, some type of FPGA platform, or do you go directly to the RISC-V processor itself?
E
Let me make sure I understand the question. We do support the emulator, so you can just start the emulator as you would for other architectures like Arm or x86. Okay, yeah.
A
That's it. And, let's see, whoops, sorry. What would you say is your biggest technical challenge so far in doing this porting work? Is it developing out the overall software tools ecosystem? Just curious what the challenges would be.
E
The most challenging part is the toolchain, and the ART part in particular, because ART involves quite a lot of work on the interpreter and on compiler features like JIT and AOT. The other challenge for us is that the toolchain and the Android system haven't had to support a new CPU architecture for a long time, so these challenges are quite new to us, and it is also all new territory for our CPU architecture.
A
That makes sense. So, a quick question from the audience, from Rishiyur Nikhil, and I hope I'm pronouncing that correctly. The question is: are you trying it on any other RISC-V CPUs besides the one mentioned in the talk?
E
A
Okay, well, thank you so much for the presentation. I really enjoyed it, and I hope the audience did too; it was very informative.
E
A
Our next presentation will be Michael Gielda from Antmicro. Michael is very active in CHIPS Alliance and also very involved with Antmicro and its different development activities, in particular open source tools. His talk is going to be about the practical adoption of open source SystemVerilog tools. So, Michael, do you want to take it away?
F
Sure, thank you, Rob. Can you hear me? Yes, perfect. All right, so let me share my desktop, and in a second I'll just present.
F
Perfect. Right, so yes, I'm Michael Gielda, VP of Business Development at Antmicro, but I also have the pleasure of chairing the marketing committee for CHIPS Alliance, and today I'll be talking about open source SystemVerilog tools. CHIPS Alliance tries to enable innovation in many ways.
F
We want to broaden the outreach of different things that are already out there. We want to bridge methodologies between the things that are already established and have been around in the ecosystem for a long time and the new ideas and methodologies that are popping up, and to break down complexity for people. We want to enable collaboration, and on top of that, we want to enable development through scalable compute resources.

That would supercharge what people in ASIC development are doing, just as we are experiencing in the software development world, where anyone can grab a computer and rely on the power of the cloud to be much more productive. We want to get the same kind of productivity in hardware development too, and my company, Antmicro, provides work and services to enable these kinds of things.
F
Why do we need SystemVerilog support in open source tools? We think that SystemVerilog is a great language, but the problem is that a lot of the tools for SystemVerilog are proprietary, which in turn makes it very hard to build scalable systems: if you want to have many developers using those tools, you'll have to purchase a lot of licenses.
F
If you want to put them in the cloud, you'll have to come up with clever solutions to problems that perhaps shouldn't be there in the first place, and especially in multi-organization projects such as OpenTitan and CHIPS Alliance itself, the challenge of deploying proprietary tools becomes almost insurmountable, because it's very hard to share licenses or, generally speaking, to collaborate around things that are closed.
F
It's much easier to use tools that are open, and of course a lot of open source cores exist that use SystemVerilog: for example, CHIPS Alliance's SweRV, Ibex from OpenTitan, BlackParrot, and CORE-V from the OpenHW Group. So there is lots and lots of pre-existing IP, lots and lots of things that we should be reusing. In total, we want to enable a more collaborative ecosystem for ASIC and FPGA design, and many people in this ecosystem are using SystemVerilog today. How do we build this, then?
F
How can we create this open source SystemVerilog ecosystem? That's a mission we got on board with, and we think there are a few elements that you need to satisfy here. First, you need to identify what's missing and see the problems.
F
Secondly, you want to see what you can reuse: what can you improve rather than build from the ground up? Thirdly, you need to document and create transparent projects that have automation and CI built in, so that you can see your status and constantly improve. You also have to collaborate and build things together: it's not a goal for one company, it's rather a goal for an entire ecosystem. And lastly, but perhaps most importantly...
F
Most importantly, you need to provide incremental value, because tackling the entire problem at once is very hard. It's very hard to just magically make SystemVerilog supported; it's easier to identify which areas are low-hanging fruit, tackle those problems, provide incremental value, and encourage people to become part of the effort.
F
CHIPS Alliance has a very good relationship with HDLs in general. We think that we should encourage all kinds of open source ASIC development, and we acknowledge that no single language will really dominate everything. Of course people have preferences and different ideas of how this ecosystem will develop, but we think that, just like in software, there will be many tools to do one job, and so work groups for both Chisel and SystemVerilog can peacefully coexist; in fact, they collaborate within CHIPS Alliance. We have a Technical Steering Committee on which representatives of both the Chisel projects and the SystemVerilog projects sit side by side and think about how to push the ecosystem forward. You can see our workgroups page for a list of our groups and the TSC repository for the current composition of the TSC. Just to put my claims into practice, let's see them at work.
F
First of all, as I mentioned, to actually know what you're doing you need the lay of the land; you need to see what's there and what's missing. So one of the things that we did together with Google was to create a test suite that shows us which features are supported and which features are missing in different kinds of open source tools. Through this exercise, we actually discovered that there are many more tools than we were aware of: there were people who reached out to us and said, "Hey, I have a SystemVerilog tool that you haven't included in your dashboard, can you please include it?" So even through this testing activity, we already discovered that SystemVerilog support in the open source ecosystem is actually a bit better developed than we had hoped for, and of course we also discovered a lot of holes and a lot of problems. We have a report that summarizes the current state of SystemVerilog in the open source ecosystem, available at this link.
F
The test suite covers three different kinds of tests: plain language features, existing third-party suites, and different kinds of open source IP such as SweRV or Ibex. This is also an interesting ecosystem endeavor, because through it we're seeing more people come forward with SystemVerilog IP they'd like to test, and we're discussing what kinds of test cases would be useful and meaningful to have. So already in this aspect it's very, very fruitful.
F
But of course you can't just test and see what's missing; you actually need to improve things. One of the things that we decided early on is that we really need this SystemVerilog support to be a portable feature that we can plug into different kinds of projects, and so we embarked on a project called Surelog/UHDM, originally created by Alain Dargelas, who is also a TSC member of CHIPS Alliance now and who donated the project, with the help of Google, into CHIPS Alliance. CHIPS Alliance now hosts Surelog, which is a parser, and UHDM, which is a framework for modeling the design data universally, so that different kinds of SystemVerilog tools can easily plug in the Surelog/UHDM front end and get SystemVerilog support, so to say, for free. Naturally, that requires some integration work, and our current focus for the integration work is Verilator and Yosys, which are two very well-known tools, and we've had quite a lot of success.
F
On one end, Surelog is our key parser for the advanced stuff, though hopefully in the future you will be able to plug in different parsers, and on the other end you have Yosys and Verilator. Ultimately, you want all open source tools to support UHDM via different kinds of plugins and so on, and this is very promising work.
F
This is moving along nicely, but of course it is a longer-term plan. So, taking a step back: how do we provide incremental value while we're working on solving SystemVerilog for good? Verible is one of the answers. Verible is an open source linter and formatter; it's a Google tool that was first released on GitHub and then onboarded into CHIPS Alliance, so that's also a big success story.
F
CHIPS Alliance took over Verible to provide this kind of ecosystem for it: to provide a neutral ground for building it out, getting it used at more organizations, and so on. Verible is being actively developed by Google and Antmicro, and it's being used in a number of places, including the Ibex CI.
F
Linting is well known, of course, in all kinds of software development, and in hardware as well, but typically linting tools for HDLs haven't been open source. Verible is completely free and completely open source, so you can enforce rules at different levels, not only within your own company but across companies and across projects, because you can deploy it for free on as many development workstations and CI systems as you want.
F
Similarly, the formatter is a complementary tool with which you can also detect formatting issues that don't influence the actual working of the code but do have an impact on readability and on reusability across teams and in cross-company collaboration.
F
The linter and formatter can actually remove all the discussions that people have about how to style code and how to write commonly seen constructs, and if you're working with something like a Git flow on GitHub or elsewhere, you'll be able to vastly improve your work with pull requests by simply removing all the discussions around formatting.
F
This enables your developers to just focus on the technical aspects of their work. Very often people end up leaving a lot of comments in a review that focus solely on: okay, there's a typo here, there's a stray space here.
F
Here's a GitHub link for the linter action, and any open source or closed source project on GitHub can take this action and implement it with basically one click. What you'll get is information about issues detected in your code right in your pull request review: automatically, inside GitHub, the problems will be highlighted for you in the form of an automated review, and of course you can come up with a lot of different applications for this.
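As a concrete sketch of that one-click setup, the workflow below wires the linter action into a pull-request CI run. It assumes the action published in the chipsalliance/verible-linter-action repository; the exact input names (`paths`, `github_token`) are assumptions that should be verified against that action's current README before use.

```yaml
# Hypothetical minimal workflow; check the input names against the
# verible-linter-action README before relying on them.
name: verible-lint
on: [pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run Verible linter
        uses: chipsalliance/verible-linter-action@main
        with:
          paths: ./rtl
          github_token: ${{ secrets.GITHUB_TOKEN }}
```

With something like this in place, lint findings show up as review comments on the pull request itself rather than as a separate log to dig through.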
F
You can use it to make things faster or just to isolate errors; this is pretty neat, pretty incredible, and we've actually described it in a dedicated blog note. We also have a formatter action, and it works just the same as above, except that this time you can generate not only reviews but also suggestions for your code. All you have to do is click and accept the suggestion, and you get automated and instant benefits.
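A similar sketch applies to the formatter action, which posts accept-with-one-click suggestions instead of plain comments. Again, the action name mirrors the chipsalliance/verible-formatter-action repository, and the input names (`files`, `github_token`) are assumptions to check against its README.

```yaml
# Hypothetical minimal workflow; check the input names against the
# verible-formatter-action README before relying on them.
name: verible-format
on: [pull_request]
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Suggest formatting fixes
        uses: chipsalliance/verible-formatter-action@main
        with:
          files: ./rtl/**/*.sv
          github_token: ${{ secrets.GITHUB_TOKEN }}
```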
F
This is really cool stuff, and again you can see it in action on GitHub and just grab it from our repository. On top of all that, there are some interesting things you can do with Verible and Kythe, a general language tooling framework, which enable us to generate an indexed database of code. That lets you browse through code, view patterns, see where different definitions are being used, trace references, and so on.
F
So again, this is a collaboration-enabling tool: you can navigate through the code very easily. We have both a showcase GitHub repository and an example indexed web page, provided as links in this presentation. I won't be showing them right now, but you can go ahead later, click through the Ibex code yourself, and see how it really works in practice, and you can imagine it being deployed in your company.
F
In your private repository, in your proprietary ecosystem, it can serve as a helper tool for your entire team, and we can definitely help you do that.
F
I mentioned that you need to collaborate with others to make it possible to achieve this very far-reaching goal, and of course you have to start somewhere. We started with the OpenTitan project, which is a very nice open source use case, but currently we're also pushing for wider adoption at Google and with the other OpenTitan partners like Western Digital.
F
We know that our work is currently being used by some of the CORE-V users, such as QuickLogic; we've just started a collaboration with Zero ASIC, and I know Andreas is on this presentation; and we're also working with the Ibex, BlackParrot, CORE-V, and SweRV communities. Everyone that's using SystemVerilog IP, open source or closed source, is a target: someone who could immediately get some benefits from what we're doing. So if you're also interested, just reach out. To briefly touch on the longer-term goals, we have an ambition to also enable UVM in the open source domain. Of course this is challenging, and there's a lot that we need to do here, but there's also a lot of benefit: there's a lot of pre-existing IP.
F
There are test benches that people implement in SystemVerilog, and there are a lot of developers out there who are familiar with UVM; we want to enable them too. We want to bridge the existing commercial ecosystem with open source tools and methodologies, so that chip companies can benefit from what they already have but move it into open source, which will ultimately enable an infinitely scalable and reproducible CI-driven workflow for open source UVM. We have focused on Verilator, implementing the things in Verilator that allow you to do UVM.
F
We started with the stratified scheduler, randomized methods, and class support, and you can see one of the already-reached milestones in action at this link: we've implemented dynamic scheduling in Verilator and are currently in the process of upstreaming this feature. And now, getting to the final part of my presentation...
F
As a company, we help our customers scale up their ASIC development, put it in the cloud, and make it easier to collaborate between teams and perhaps between companies, so that, for example, a company can collaborate more easily with its suppliers or with the different IP vendors that work with it.
F
All the tools described in this presentation can and should be used in a cloud context, both on GitHub and in private enterprise installations, and you can mix and match open and closed components as well. Of course, I'm focusing on the open components, because that's what CHIPS Alliance is about, but many flows are still very much focused on closed tools, and we can make the two work together. As one example of what we've done, we've built custom GitHub runners that enable us to provide scalable compute and customized peripherals for our CI runs.
F
We can, for example, plug FPGA boards into our server room and get them running in a GitHub CI build to see whether a specific build, for example a synthesis or placement process, actually completes and works. In this picture you can see some open source hardware from Antmicro, as well as some development boards, connected together and ready to run CI jobs; there's a blog note about that too, linked in my presentation. And we're actually dogfooding this ourselves.
F
We're using this setup to test our UHDM integrations on GitHub, and it gives us the practical ability to do longer runs than GitHub would allow. We can use more compute resources, because we have custom machines that run those jobs; we can generate additional statistics and metrics; and we can see what the resource usage is and gain more insight into the development, especially into the bottlenecks, because detailed resource usage statistics can show us: oh, this is actually not executing in a very parallel way.
F
How do we make it more parallel? How do we enable this scalability to give us more value? Another thing that we're doing with this is enabling flows where perhaps the tools themselves are closed, or some of the components are closed, but you can still share something with the community: you can still show that your development is ongoing and generating concrete results.
F
One of the things that has been enabling us to do this is a dashboard that lets us run private builds but then share sanitized build logs and results with other people. We can upload the results to our own servers and store them for as long as we need, because public services like GitHub will get rid of your logs after a few months, since perhaps they're not useful anymore.
F
So, summarizing: CHIPS Alliance is building an open source SystemVerilog ecosystem, and you can join us to help us do this. There are practical things you can do right now, both for local development and for collaborative cloud-based development, and we're working on more: we're working on adding new features and new use cases. We want to hear about your specific needs and about your projects.
F
What are you doing with SystemVerilog, and how? There's a working group and a mailing list for SystemVerilog that you can join; here's the link, please join. That's all, thank you very much. If you want to use open source development tools, you can reach out to us at contact@antmicro.com.
A
Thanks so much, Michael, that was a great talk. I really enjoyed it, very informative. We have one question from the audience, and this is from Oron Port. The question is: why is a SystemVerilog-to-Verilog compiler, one that removes the missing features by simplifying them away, not a good option to integrate with the existing tool flow?
F
Actually, there already is such a tool, called sv2v, and it has been used with great success before, so this is a viable option for some things: if you really don't care about those features, you can just get rid of them. Except that, of course, this is a code transformation, so you're not really getting the same code, which matters for things like linting, formatting, or tools like Kythe.
F
A transformation is not really very useful there, because what you get is transformed code that will not correspond to the exact line numbers, for example, or the variable names, and your flow will definitely not be as nice as it is with direct SystemVerilog support. Also, for more complicated use cases, if you really need those features, then you have to implement them; you can't just go and do UVM, for example, by getting rid of UVM.
F
I mean, that probably wouldn't be very helpful, right? So yes, limited results can be achieved with tools like sv2v, and have been; the tool has been used in the past. But we are aiming for more: we're aiming to have really complete SystemVerilog support, so that we can treat it as a first-class citizen.
A
F
That is a perfect question, and, by the way, as I said on one of the slides, we encourage similar ecosystems to emerge and to work together with people building things with VHDL, for example. The focus on SystemVerilog is, simply put, based on the interest from our members: we have members developing huge SystemVerilog code bases, Google being an example. And if you want other languages to be equally represented...
F
I know that there are some activities, especially around VHDL, but of course there are many, many flavors and languages being used across the industry. So it's all a matter of interest in the community; it's not that CHIPS Alliance is saying SystemVerilog is better than anything else. It's just popular, in particular amongst our members, hence the working group.
A
And I'll just put in one more question here, from Rishiyur Nikhil: is UHDM just a parsed AST representation, or can it represent IRs after more compiler phases? Is it extensible for such purposes?
F
Honestly, I'm not directly involved in the development, so I'd rather pass this question on to the TSC, because I don't want to give you the wrong answer here. It is certainly aimed to be flexible, so I'm guessing that the answer is probably yes, but I don't want to claim that without checking with the project maintainers first.
A
Okay, our next talk, for which this was a good introduction, is on Chisel, the other design language that Michael had mentioned alongside SystemVerilog, and I know it has a lot of interesting capability. Jack Koenig from SiFive will be providing us with an update on this. So Jack, looking forward to your talk.
G
Thanks a lot, Rob, and thanks for the introduction, and thank you, Michael, for the shout-out; I think that is a pretty good transition. So let me share; hopefully you can see my slides okay. Hi, my name is Jack Koenig, and I'm here to talk a bit about Chisel and FIRRTL and how they can be used.
G
I'll talk about what they are, for people who are new to them, and then give recent advances from the last six months, for people who have seen similar slides in the past. That nice double acronym down at the bottom, Chisel Working Group Technical Advisory Committee, is just saying that I'm a member, maybe a maintainer, of the Chisel project and the collection of projects that make up the working group.
G
Okay, so first, for those who are new, and I hope there are new faces in the crowd: what is the Chisel working group? Of course, to answer that I first have to tell you what Chisel is. Chisel is an acronym for Constructing Hardware In a Scala Embedded Language, and we like to stress the word constructing here, because it is a very different kind of language than most are used to, coming from something like Verilog or VHDL.
G
It is a domain-specific language where the domain is digital design. That's not a particularly insightful comment, because Verilog is also a domain-specific language where the domain is digital design. But the point is that this is really geared toward digital design: we're not going into analog or many of the surrounding important pieces; we're really focusing on the digital aspect.
G
All right, so what's really important to understand is that it is not high-level synthesis, nor is it behavioral synthesis. You're not writing Scala compiled down to Verilog. What really happens is that you're writing a Scala program where you can construct and connect hardware objects, and that in turn gets compiled down to Verilog. So why is Scala the language chosen for Chisel? It's because Scala is a very modern programming language with parameterized types, object-oriented programming, functional programming, and, I think importantly, static typing with powerful type inference.
G
These are things that the software world has adopted in order to write more maintainable software, and we think it's important to learn from software development practice in how we design our hardware. And what's, I think, really important is that Chisel is the platform you use to write reusable hardware generators; the focus of Chisel is reusability.
G
Writing something once isn't that hard in almost any language, but when you write something that you can reuse for lots of different things, that's when you get the real power out of something like Chisel. First I want to note that you can write Chisel very similarly to how you might write Verilog. Let me see if I can get a pointer. Obviously, the syntax is going to look a little funny if you've never seen it before, but the basic idea is that you have some module.
G
It has an input port and an output port. We do have a parameterized bit width, just like you might in Verilog. You have two registers, essentially just delaying two cycles: the input goes to one register, that register goes to the next, and then we sum the current input and the outputs of those two registers. So this is just a moving-sum filter, something most people have probably seen in an undergraduate course in digital design.
G
So in this case, not only are we parameterized by the bit width, we also parameterize by our coefficients, including their size: how many there are and what those coefficients are. Based on how many coefficients we have, that is how many registers we need, how many cycles of time we need to look into the past for our filter.
G
We need to, of course, multiply all those registers by our coefficients, and then we sum the result. With this there's no loss of performance: if you use this to implement the same moving-sum filter, it gives you, if not identical Verilog, certainly something that will synthesize to the same thing.
G
But the important observation here is that this metaprogramming, this ability to write software that is generating the hardware, enables very powerful parameterization. As I mentioned, you can write that same moving-sum-three filter, but you can also create all different kinds of filters from the same source, including a delay filter, which is basically just having a register, but you can express it using this filter, and a triangle filter, all different kinds of things.
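As a rough software analogy (this is plain Python, not the Chisel code on the slides), the parameterization idea can be sketched like this: one generator function, driven only by a coefficient list, yields a moving-sum, delay, or triangle filter from the same source.

```python
def make_fir(coeffs):
    """Return a cycle-accurate software model of an FIR filter.

    Each call to the returned function models one clock cycle: the new
    sample is shifted into a register chain whose length is set by the
    number of coefficients, and the output is the weighted sum.
    """
    regs = [0] * len(coeffs)

    def step(x):
        regs.insert(0, x)   # shift the new sample in
        regs.pop()          # drop the oldest sample
        return sum(c * r for c, r in zip(coeffs, regs))

    return step

# The same "generator" yields different filters from different parameters:
moving_sum3 = make_fir([1, 1, 1])   # 3-tap moving sum
delay1      = make_fir([0, 1])      # one-cycle delay
triangle    = make_fir([1, 2, 1])   # triangle filter
```

In Chisel the same idea plays out at elaboration time: the Scala program loops over the coefficients to instantiate registers and multipliers, so one source describes a whole family of hardware.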
G
Reusing the source is not the hard part most of the time. That is very useful and very important, and we find it very powerful, but a lot of the time what really mucks with your Verilog, and why you end up having to fork or copy-paste the Verilog for every project, is that you end up with platform-specific or application-specific changes.
G
The most common examples of these are your SRAM macros. Obviously, if you're in 45 nanometer or 28 nanometer you have different SRAMs, but there's a lot of other stuff that goes into it, and so this gets very complicated, and very often it results in the same Verilog not being usable for multiple purposes.
G
You have the same issue potentially in Chisel, right? We want to write Chisel that we can use for all these different purposes, and so this led us to the realization that we need a software stack, but for hardware. Just like when you write C++ you don't have to specialize it for compiling to x86 or to Arm or to RISC-V.
G
You want to do the same thing for your hardware. So instead of just Chisel, we have this front-end level, and then we have this compiler called FIRRTL. FIRRTL allows us to specialize the Verilog that we're emitting, whether the target is simulation, FPGA, emulation, and then of course ASICs.
G
Now, this is a gross simplification: the level that FIRRTL is mostly dealing with is still the RTL level, what almost everyone in this meeting would refer to as the RTL level. So I want to give a shout-out to the other projects that do the rest of that heavy lifting. Of course we have the Verilator simulator, and there's OpenFPGA and OpenROAD, great projects for actually taking that Verilog and synthesizing it for those designs.
G
So, as I've basically said, FIRRTL is a hardware compiler framework, and a big focus of it is on custom transformations. This is really important because every company has their own custom flow, every open source project has their own flow, and there are often little customizations that you need to do. A good example is memory built-in self-test, MBIST.
G
If you have your memories, you need the ability to connect them up to your MBIST, and that may differ: the same generator input, depending on how it's used in the overall design, may result in different MBIST connections, and so it's helpful if you can use custom transformations in order to do that wiring.
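As a toy analogy in Python (not FIRRTL's actual API or IR), a custom transformation is just a pass that walks the circuit IR and rewrites only the nodes it cares about, for example attaching MBIST wiring to every memory:

```python
def mbist_pass(ir):
    """Toy custom transformation: walk a circuit IR (here just a list
    of node dicts) and attach a hypothetical 'mbist' port to every
    memory node, leaving all other nodes untouched."""
    out = []
    for node in ir:
        if node["kind"] == "mem":
            node = {**node, "ports": node["ports"] + ["mbist"]}
        out.append(node)
    return out

circuit = [
    {"kind": "mem", "name": "cache_data", "ports": ["rw"]},
    {"kind": "reg", "name": "state", "ports": []},
]
transformed = mbist_pass(circuit)
```

The point of the framework is that such passes can be slotted into the compilation flow per project, so the generator source stays unchanged while the wiring differs per design.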
G
That's really what FIRRTL is all about: taking this compilation flow and allowing you to customize it however you want. So I've introduced the two biggest projects in the Chisel working group, but there are several others. Chisel-test is a really important one: it allows you to write unit tests directly in Scala to test your design, including all the different parameterizations you may have. Treadle is a simulator for FIRRTL, so it allows you to simulate quickly.
G
It runs in the same process. Of course, you can always go to Verilator, but Treadle has a faster startup time, so it can speed things up, especially for short unit tests.
G
The iotesters are our older testing framework, mostly replaced by chisel-test, and I'll touch on that a little bit. DSP tools provides useful tools for doing digital signal processing, and then some of these other projects are just little helper projects, like the bootcamp for learning and the template for starting a new project. And we are a CHIPS Alliance project.
G
I think we're formally a sandbox project, but we really just need to hop over that line to graduate, because we are that in everything but name, I guess. So now, sorry for those who've seen these slides in some form before, but now for highlights from the last six months that I think will interest anyone who's familiar with Chisel. We just recently released Chisel 3.5.0-RC1.
G
Okay, so one thing that people have needed for a long time is Vec literal support. This is basically just making it more convenient to create Vecs, which are basically arrays. We've got support for Scala 2.13 and end-of-life of Scala 2.11; this is just keeping up with advances in Scala that make it faster and easier to write things like that.
G
Very exciting for those who have written a lot of Chisel: there's a decoder and minimizer API that integrates with the new CHIPS Alliance Espresso, which is really useful for doing logic minimization. Everyone's favorite digital logic minimization tool from the '80s is in modern C now, maybe C++ eventually. And source locator compacting is just a useful quality-of-life improvement. But there are far too many things to cover.
G
I will cover some more here in a minute, but I'm going to point people to the actual release notes, at least the FIRRTL ones that are currently published. And I do want to note that the website docs will not reflect all this until we do the final release, instead of just release candidates. But yes, as I mentioned, Vec literals are like literal expressions.
G
So now for some features I want to dive into a bit more. Chisel-test, I mentioned, is our testing framework, so it's really important for writing your designs. There have been a few improvements, like a Verilator simulation performance improvement from using JNA; this allows it to run in the same process instead of using inter-process communication.
G
The Verilator backend now supports dumping FST instead of VCD. That's a really little one, and I think it does this by default now, but anyone who's run a long-running simulation knows that VCD can be not a great format for your disk space.
G
This one is important for longer-term users: the PeekPokeTester, from the old chisel-iotesters, now has a compatibility mode in chisel-test to help you migrate your code, so the same tests you wrote in the past will continue working. Simulation constructs can now be annotated.
G
That is very much a power-user API and it doesn't affect users right away, but it makes it easier to write verification libraries. Assert, assume and cover graduated out of experimental, so they're much more encouraged and just come in by default in the chisel3 package.
G
So, something that I think many people in the hardware industry recognize is that formal verification is incredibly powerful, but it's often assumed to be difficult for users. One of the things that we really focus on in the Chisel world, unlike the more traditional hardware design world, is that the person designing should also be doing at least some chunk of the testing of their own work, right?
G
This is really a focus on unit testing, rather than just writing your code, throwing it over the wall, and making it someone else's problem. But formal verification often seems a bit difficult for people for unit tests. What we find is that good tooling and sensible defaults can help here: if you make it feel similar to a simulation-based flow, then maybe a designer who's used to writing a little simulation test bench can still write a formal verification test bench.
G
This is a subtle point, but you very often need to check against previous values, and this is sometimes really annoying to deal with because of things like reset or unknowns. If you're asking for a value from three cycles ago, you need to make sure that you've been out of reset for three cycles, and things like that.
G
So we make the past function safe by default, to make it easier to use. And of course, I mentioned reset, but automatic reset guarding: generally you're thinking in terms of a correctly reset circuit, and you want your defaults to assume that, but of course you also need the ability to verify your reset. The point here is just making the defaults sensible.
G
It has the same basic API, the same developer environment, tooling integration, and then other useful tools for debugging when it fails, like having your counterexample automatically produce a waveform, so that it's as if you had written the test that gives you that result. We've also added to FIRRTL native emission of SMTLib and btor2 outputs. This can work with any open source solver, but currently it integrates with Z3 and CVC4, and I'm going to point you to Kevin's paper at WOSET.
G
If you read and write from the same address, you should see the value you wrote, and so this little formal assertion here is just saying: when, on the previous cycle, you wrote and you read and the addresses were the same, then the output data should be the previous input data. And this test down here is the actual code that you write to run it: it will run a bounded model check, and it will return success in this case.
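As an illustrative sketch (plain Python, not the chisel-test formal API), the same read-after-write property can be checked by brute force on a tiny model: enumerate every bounded input trace and look for a counterexample, which is the spirit of a bounded model check.

```python
from itertools import product

def simulate(trace, depth):
    """Write-first synchronous memory model: a read result appears one
    cycle later and reflects a same-cycle write to the same address.
    Returns the output seen on each cycle (one extra trailing cycle)."""
    mem = [0] * depth
    outs = []
    pending = 0
    for waddr, wdata, raddr in trace:
        mem[waddr] = wdata        # write happens first
        outs.append(pending)      # registered read from last cycle
        pending = mem[raddr]      # capture read for next cycle
    outs.append(pending)
    return outs

def check_read_after_write(depth=2, data_vals=2, cycles=2):
    """Exhaustive bounded check: if a cycle writes and reads the same
    address, the next cycle's output must equal the written data."""
    inputs = product(range(depth), range(data_vals), range(depth))
    for trace in product(inputs, repeat=cycles):
        outs = simulate(list(trace), depth)
        for t, (waddr, wdata, raddr) in enumerate(trace):
            if waddr == raddr and outs[t + 1] != wdata:
                return False      # counterexample found
    return True
```

A real solver-backed bounded model check explores the same bounded state space symbolically instead of by enumeration, which is what makes it scale.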
G
So I'm going to jump on to Definition/Instance, which is sometimes what we refer to as the, quote, Instance API. Historically, Chisel elaborates every module instance and then deduplicates structurally equivalent modules. This is very in the weeds, I'm sorry for people who aren't as familiar, but this is a new experimental API to allow you to define a module once and instantiate it multiple times.
G
Note the @public annotation: what is in the public API is not just the ports, it can be things that you need for verification as well, things you need to reach in and see. This is a major performance optimization, and it composes with lots of other features, including cross-module reference annotations.
G
This is just a quick example of what it looks like. It is an experimental API; I encourage people to check it out. There are just a couple of extra little annotations you use, and this allows you to create one definition of this AddOne module and then instantiate it twice. Now, this may look a little weird; some people wonder, why can't I, just by virtue of creating two AddOnes, use that API? And so we have a potential alternate API on top of it.
G
That will match that expectation: by virtue of instantiating it, the actual definition is handled under the hood. You can see documentation on the PR. A feature that I'm very excited about is something called DataView, and I want to make sure not to spend too much time on this. It's something that I've been working on for a while, where sometimes users want to manipulate some type of hardware value as if it were of a different type.
G
A good example is that you may have an AXI-style flat bus interface, but you would like to treat it as if it had a more structured hierarchy, while still matching pin compatibility with some interface. Or you may have a one-dimensional array of registers, but you want to manipulate it as if it's two-dimensional. This is something that I like to describe as a super-powered union or cast.
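As a loose software analogy (plain Python, not the DataView API itself), a view reinterprets the same underlying storage under a different shape, without copying, so the flat and structured "types" always agree:

```python
class View2D:
    """Present a flat list as a rows x cols grid.  Reads and writes go
    straight through to the same underlying storage: the spirit of a
    view, as opposed to a copy or a conversion."""
    def __init__(self, flat, cols):
        assert len(flat) % cols == 0
        self.flat, self.cols = flat, cols

    def __getitem__(self, rc):
        r, c = rc
        return self.flat[r * self.cols + c]

    def __setitem__(self, rc, value):
        r, c = rc
        self.flat[r * self.cols + c] = value

regs = [0, 1, 2, 3, 4, 5]       # "one-dimensional array of registers"
grid = View2D(regs, cols=3)     # ...viewed as 2 rows x 3 columns
grid[1, 0] = 42                 # a write through the view lands in regs
```

DataView does the analogous thing for hardware types: the viewed value and the target are the same wires, which is what makes it behave like a cast rather than a conversion.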
G
If anyone is familiar with database programming, it's very similar to a view in SQL. And what I'm so excited about is that this one primitive allows us to implement so many different things, like seamless integration with Scala types, Bundle upcasting, viewing one type as another, and of course user-defined mappings between types. So, just some quick examples using this feature: we're able to take Scala's built-in tuples, and it made it very simple to implement this sort of thing.
G
This little connection operator is a Chisel operator, but we're applying it to what is technically a Scala type, a tuple; those who've written some Chisel will understand that this is a pretty neat thing to be able to do. Bundle upcasting is when you have Bundles, which are kind of like structs, and you have a subclass of another class and you want to connect them together, but their fields don't match exactly.
G
It makes it really easy to cast between their types. I'm not going to dive into this code, but it's all user-extensible: I showed two use cases where we've provided implementations in the standard library, but users are given all the same power that the library has, and they can define their own custom mappings, for example between their own Bundle
G
(like a struct) and a Vec, which is like an array. And then, I mentioned this in the previous workshop, autoclonetype2, but I do just want to touch on it again because it's gotten better. cloneType is an implementation detail that's existed since Chisel 1, long before I was involved in this project; it's useless boilerplate.
G
We had a version of autoclonetype before, but it had a lot of limitations, and now this is basically a non-issue: the compiler plugin will generate it for all your Bundles, and in fact, starting in 3.5 it's mandatory, and you're not even allowed to implement it yourself, because you don't need to. As you can see from this example, the completely useless cloneType boilerplate is now gone, and that's, I think, very exciting
G
for usability and for writing maintainable code. I'm running low on time, but I just want to note the continued growth of the community and give a big shout-out: that is a really nice little spike there in stars, at least on GitHub, and I'm pretty sure it corresponds to the community conference in Shanghai. There are a lot of users in China, so please check out the talks on the YouTube channel from that conference, and then get involved.
A
Thanks, Jack, that was a very informative talk; I really appreciate it. So, just a couple of questions from the audience here. The first is from Rishiyur Nikhil, and of his two questions I'll ask the last one first, which is: are older libraries, such as Rocket Chip, being upgraded to use the new Chisel features?
G
Yes, they are. That's lagging slightly; we just did the release of RC0 maybe two or three weeks ago. But yes, Rocket Chip will be updated to use it, and Chipyard will use it as well.
G
I'm not familiar with AMS, the acronym... oh, analog mixed-signal, got it, sorry. So Chisel itself is not focused on mixed-signal or any analog stuff, but Chisel supports simulating with arbitrary Verilog. So if you can set up a Verilog simulation for your AMS models, you can set up a Chisel simulation as well; you just have to treat them as black boxes in your Chisel.
A
Okay, and one question just from my side. Ever since I learned about the work that you've all been doing on Chisel, I've been curious: what's the reception in the design community relative to object-oriented programming? It was such a productivity boon for software development; I'm just curious how it's being received in the hardware community.
G
I think most designers, when they learn Chisel, enjoy the additional power that they get. I think the biggest struggle with next-gen hardware description languages like Chisel is not designers; designers, I find, tend to be easier to win over. It's more the verification side, which I think will not surprise anyone.
G
A lot of that has to do with the fact that the verification world has a lot of UVM and a lot of things that are harder to bridge to something like Chisel, and I think that's where the real work in the future is going to be: figuring out how to lift UVM as well, while also integrating with all the existing UVM IP out there.
A
Great, thank you for that. Thanks again for the presentation; it was very informative, and I look forward to hearing more about Chisel and the environment you're creating. All right, thanks. So our next talk will be by Professor Mehdi Saligane, who is with the University of Michigan. Mehdi received his BS and MS in 2009 from the engineering school Polytechnique de Grenoble, and also an MS from Aix-Marseille, and he is currently with the University of Michigan, as I had mentioned.
I
Thanks, Rob, and thank you all for your attendance. My name is Mehdi and I'm a research scientist at the University of Michigan's integrated circuits lab. Today I'll be talking about OpenFASOC, automated open source analog generation, and I will walk you through a little history of FASoC and how we started working on OpenFASOC, which is, as the name suggests, an open source version of what we have been doing.
I
First, I think it is worthwhile giving an overview of who we are and what we do in the FASoC project. It is a DARPA-funded program, part of IDEA. This is a multi-university and industry effort led by Professor Dave Wentzloff, and I have added here all the PIs from each institution. FASoC specializes in autonomous SoC synthesis, which includes all the building blocks, such as generators, memories and cores.
I
So, at a higher level, FASoC is short for Fully Autonomous SoC synthesis, which is a set of tools and scripts, typically based on Python and Tcl, that rides on top of existing synthesis and physical design tools. We support the usually adopted commercial tools, which allows us to build complete SoC designs, ensuring full integration of all the blocks.
I
So, over the course of the program, we have successfully taped out a good number of SoCs, as well as test chips containing specific kinds of blocks. Here on the left we can see our first automated SoC in TSMC 65LP, and it was successfully demonstrated at the last in-person DARPA summit in Salt Lake City.
I
One note from this slide is that, thanks to our cell-based approach and the standard-cell layout nature of our auxiliary cells, which I will detail later, we are able to shrink our analog blocks considerably compared to traditional analog layout.
I
So, you have heard that a few times now: we use a cell-based approach to analog design. But what is it exactly?

I
When we add the auxiliary cells, we make sure they are laid out following the standard cell grid; we show a couple of layout examples here at the bottom. It is also worth noting that we are working with Sachin's ALIGN group, who are presenting later this morning, on autonomously generating these layouts using our auxiliary cells.
I
Now that we have a hardware description of our designs available, we then have a shot at running it through the digital tool flow and place-and-route to generate the final physical design.
I
I made a small workflow diagram on the left of what would be a traditional analog layout flow today.
I
But in this talk our focus is really on the open source versions of our tools. OpenFASOC relies heavily on OpenROAD, in addition to a set of other tools like Yosys and ABC for logic synthesis. We recently started adding the Surelog/UHDM plugins, described earlier this morning by Michael Gielda from Antmicro, to the system.
I
So I really do like this diagram, which summarizes where FASoC is today. If you take full custom layout, which is all the way on the left at 100% complexity, and compare it with the initial version, FASoC 1.0, which we started two and a half years ago and which uses minimum constraints without any custom placement or anything in the APR tools, then obviously the complexity goes way down.
I
However, we can see that the performance takes a hit compared to precision analog design. Now, when we swing back into the middle, where we use partial constraints and digital compensation to address non-idealities, this still reduces complexity a lot compared to full custom flows and tools, and we gain back a lot of the performance lost in loosely constrained approaches.
I
In fact, in using our cell-based digital approach, we are in reality avoiding the extreme complexity of trying to take in all the PDK DRC rules that come with full custom automated layout. Also, with the rule of diminishing returns, the time required for porting to a new PDK, or even a new design topology, would essentially make an automated full custom analog generator too costly to carry across PDKs and new test cases.
I
So I think we have here a sweet spot, which I fundamentally believe we could push a little more to the left when using open source tooling, and I will get to that later in this presentation.
I
The DCO then sits right in the middle of the rest of the PLL, which is all placed and routed and surrounded with endcaps and decaps. This block was taped out last October, and I believe it is being tested right now by a student in the group.
I
Another example is for power regulation: a digital LDO, which we taped out in different technologies, including SkyWater 130.
I
You can see the pre-PEX simulation of the max load current versus the array size of the PMOS switches, where everything really looks fine and smooth. But then, after running the design through APR with minimal constraints and random placement, we see from post-PEX simulations that the max load current takes a hit and goes way down, and as we increase the array size we can also clearly see the variation and resistive effects due to the narrow routing and minimal via cuts.
I
But then we came up with a more structured placement of the switch array, as well as fencing some of the cells. We also did automatic PDK parsing of the metal stack info to calculate resistivity over the power stripes mesh.
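As an illustrative sketch (the function names and numbers here are made up, not the actual OpenFASOC script), resistance over a power stripe can be estimated from the metal layer's sheet resistance pulled out of the PDK:

```python
def stripe_resistance(sheet_res_ohm_sq, length_um, width_um):
    """Resistance of a metal stripe from sheet resistance:
    R = R_sheet * (number of squares) = R_sheet * L / W."""
    return sheet_res_ohm_sq * (length_um / width_um)

def mesh_resistance(stripes):
    """Effective resistance of parallel stripes in a power mesh
    (reciprocal of the summed conductances)."""
    return 1.0 / sum(1.0 / r for r in stripes)

# Hypothetical metal layer: 0.1 ohm/sq, 100 um long, 2 um wide stripes
r_one  = stripe_resistance(0.1, 100, 2)   # one stripe
r_mesh = mesh_resistance([r_one] * 4)     # four stripes in parallel
```

Feeding numbers like these back into the placement script is what lets the generator widen stripes or add vias where the IR drop would otherwise hurt the max load current.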
I
At last, a final example of what constraints look like, for our ADC generator: we have implemented a Python script to achieve a common-centroid placement strategy, with a symmetrical placement of the unit caps and switches. This has been taped out as well. So, OpenFASOC has been driven by the opportunity of having a fully open source PDK and free shuttles, where we can frequently test our tools using silicon data.
I
Also, please check out the FOSSi talks from Efabless' Mohamed Kassem and colleagues, who have worked on enabling users such as me.
I
So, back to the tape-outs: OpenFASOC has been an active contributor to the open source community. Since the start of the program, we have been working closely with Efabless, SkyWater and Google to enable users.
I
One of the achievements was to have a fully open source entry and flow for the temperature sensor generator. In fact, since we were working closely with Professor Andrew Kahng and Tom Spyrou, the architect of OpenROAD, we were able to develop the physical design tool features that allowed us to have that, and I'll go into more detail later. The sensors were integrated with the OpenTitan SoC and powered by one of the LDOs.
I
We have integrated four sensors connected to the OpenTitan's Ibex core through TileLink. As you can see, we were able to create voltage domains, and the sensing element of the temperature sensor generator sits on a separate area with its own on-chip generated voltage; this is using OpenROAD.
I
Also, non-default rules have been enabled in OpenROAD. On top of that, we added Python scripting to connect custom nets to power rings, as you can see on the left, which is essentially doing a special routing.
I
We have made ten versions of LDOs using our digital LDO generator in the second shuttle. We have included a highly trimmable voltage reference to address dependencies on temperature variations.
I
The OpenTitan SoC is our current demonstrator, and we plan to tape out many other versions that would improve the overall PPA, especially in open source design in general. Our research lab focuses on low-power IC design, and improving energy efficiency is one of our focal points; to achieve that, we expect this to be an iterative process over tape-outs, automation and tool updates.
I
Speaking of energy efficiency and PPA, we have spent a considerable amount of time on verification and checking timing. I think we were one of the last contributors to the second shuttle, and we barely made it; the reason is that we have given extra care to closing timing. We had to use the latest timing features of OpenROAD, in addition to a newly developed ECO flow, to ensure our fmax isn't impacted by all the margining on clock uncertainty.
I
As you can see in the diagram on the left, we noticed that the tools had a hard time fixing timing after CTS, and after doing many iterations using both clock uncertainty and margining, the fmax was considerably altered, or hold violations even worsened.
I
We have enabled commands such as create_voltage_domain that allowed us to have a fully open source temperature sensor. An example of that is shown in the layout on the right, and I have added the functions we had to add or update to achieve that. So I would really encourage EDA developers to take a stab at the tools and help us improve or add new features, or even just test them.
I
So, basically, we are working tightly with UCSD and Colin Holehouse from Arm to add even more capabilities for the next shuttles and enable new generators, or improve PPA in general.
I
Now, coming back to the performance versus tool complexity trade-off illustration I showed in the introduction: we have noticed that open source tools allow much more automation and, in general, control over the tools, as you can see in the illustration here.
I
This allows us to tremendously reduce the complexity of our flow; in other words, we push the limits of performance a little more to the left. In the case of the temp sensor generator, being able to control open source place-and-route tools to generate exactly what we need, as I have shown in the previous slide, allowed our generator to match the results obtained using closed tools.
I
Regarding the CI effort, our motivation for our CI infrastructure is that we wanted to set up a way to constantly test our generators, because the tools and PDKs used in our flow are moving targets that update frequently. The CI checks whether the generated design is still DRC and LVS clean after any changes or updates.
I
Just as new functionalities or merged PRs land in the tool and PDK repos, or even in the design itself. To do so, we first tested our GitHub Actions CI infrastructure on Microsoft VMs, and then, thanks to the help from Tim Ansell and Ethan from Google, and Antmicro's blog example, we ported it to self-hosted runners on GCP via Terraform. Terraform creates the necessary infrastructure, such as network interfaces and the coordinator instance, for us to register the runner software for GitHub Actions.
I
This is an example of what the CI looked like: after running the flow, Python scripts go and check the flow reports to find DRC errors and LVS mismatches. Additional testing features will be added in the future to check the performance of the generators as well, in addition to more exhaustive regression tests. The CI is initiated when changes are pushed to the repo. The flow in the CI first builds the Docker image that has the necessary tools and PDKs for running OpenFASOC.
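The report-checking step described here can be sketched in Python. This is a minimal illustration, not the actual OpenFASOC CI scripts; the report file names and line formats below are hypothetical stand-ins for whatever the real flow emits.

```python
import re
from pathlib import Path

def check_flow_reports(report_dir):
    """Scan flow report files for DRC errors and LVS mismatches.

    Returns a dict of {check_name: problem_count}. The report file
    names and line formats are hypothetical placeholders.
    """
    results = {}
    drc = Path(report_dir) / "drc.rpt"
    if drc.exists():
        # Count lines that report a DRC violation.
        results["drc"] = sum(
            1 for line in drc.read_text().splitlines()
            if re.search(r"violation", line, re.IGNORECASE)
        )
    lvs = Path(report_dir) / "lvs.rpt"
    if lvs.exists():
        # Count reported netlist mismatches.
        results["lvs"] = sum(
            1 for line in lvs.read_text().splitlines()
            if "mismatch" in line.lower()
        )
    return results

def ci_passed(results):
    """The CI gate: the design must be DRC and LVS clean."""
    return all(count == 0 for count in results.values())
```

A real CI run would fail the job when `ci_passed` returns False, so a regression in a tool or PDK update is caught immediately.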
I
Now, regarding our future plans, we are hoping we can enable more users and collaborators by making a PMU generator; process and voltage sensor generators, and even heaters, are also being considered. And finally, SerDes using our cell-based approach and open-source tools: there has been a lot of work done by Dave Wentzloff's group before in 14 nanometer, so hopefully we are going to leverage that. And we are constantly making test chips and making sure our generators improve, especially using open-source tools.
I
So that's all I had to show you. Thanks for attending this presentation, and please feel free to reach out to us if you have any questions or would like to collaborate on a specific generator.
I
These are a few useful links, so please go and check them, and if you find a bug or have enhancements, please file a GitHub issue. Thanks.
A
Thanks so much, Mehdi, it was an excellent presentation, really enjoyed it. I just did have a couple of questions. I just want to ask you, I don't know if you can comment, but what does one see in terms of productivity gains in the design cycle using FASoC? I mean, as you well know, analog is typically the long pole in the design, particularly with more modern process nodes — or new process nodes, that's a better way to put it.
I
Okay, yeah, thanks Rob for the question and comment. So this data was gathered by Dave during his last presentation, and I think it gives a good sense of how fast we can port over PDKs.
I
So you know, if you pick each of the generators, we can see that it doesn't take long; even for the SRAM, which is a big chunk, it takes 22 days, plus 11 days for the CAD setup.
So now, if we are talking about open-source tools, it is a little harder because of the maturity of the tools, but you know, we noticed with OpenROAD that everything is actually rolling along and moving really fast. So I really feel that it's much easier to port to a new technology — let's say from SkyWater 130 to another open-source technology — and that is basically thanks to the tools being built in an automated way and with a no-human-in-the-loop methodology.
I
So, basically, practically speaking, if I take OpenROAD and I want to make a new design, let's say in GF12, I just have to update the platform, which on my side is already done, and I will have to update the auxiliary cells. So that's really just the work of a few standard cells.
I
So it's pretty efficient, to be honest.
A
No, that's great, thank you. So I have a question from Chandra Ramamoorthy. He says: thanks, Mehdi, really interesting work; which parts of the flow are you not able to use open-source tools for?
I
So if you see here, we try to use the open-source tools as much as possible, but some of the features are actually currently still being built. For the LDO, for example, having multiple power rings — which isn't hard — is still a pull request in progress right now, so that didn't let us have a fully open-source end-to-end flow for the digital LDO. But in general, I think, for our generators,
I
We are mostly ready to port everything to OpenROAD. For instance, here for the temp sensor you can see that we have two voltage domains and even non-default rules. So really, we just need to put in the effort of porting the designs to the OpenROAD flow.
A
Thank you. So I'll just ask one final question. I was excited to see your call for collaboration on SerDes; I think that would be an interesting next step, you know, to take place in this arena, and I'm just curious what your thoughts are on how to help move that forward.
I
Oh, so I'm actually working with Dave Wentzloff — I think he's here — and we are trying to come up with a strategy for porting it to SkyWater 130.
I
So I'm suspecting that we might need to really push the AMS flow of OpenROAD, and you know, collaborators would really be awesome, in terms of EDA tooling or even just testing our tools — filing GitHub issues. Also, please reach out to my email and we can try to organize that.
A
Well, that would be great, and let me know too if I can provide any help on that. I had one question here, a question from Edmund Humenberger: what is the target speed for the SerDes that you're considering?
I
So we are trying to go for five or better gigabits per second; that's for a USB application, 3.0 I think, or 3.1. I think Dave is the expert here — unfortunately he can't talk right now — but I think the limiting factor is the PLL, and we'll try to put an effort into that so we can, you know, have a higher frequency and a higher bandwidth or speed.
I
We hope to achieve that. I think it's an ambitious plan, but based on my discussion with Dave it's feasible; we've done that in 14 nanometer, and we hope to do it in SkyWater 130.
A
All right, and Dave Wentzloff just commented: that's the goal, and ambitious; we will be exploring this over the next month. So thanks, Dave, appreciate the comment and also for tuning in today.
A
Well, thanks again. So our next talk will be from Maciej Kurc from Antmicro; he's going to be talking about FPGA tooling interoperability with the FPGA interchange format. I know this is an exciting area that we've been getting going in CHIPS Alliance, getting more folks on board with FPGA. So with that, Maciej, if.
C
You're available. Yes, actually, can you hear me? Well, yes, you're coming through just fine. Thank you, perfect. So.
J
Okay, thanks. So maybe.
J
Okay, so yes, welcome everybody. My name is Maciej Kurc, I work at Antmicro, and today it's my pleasure to present to you the FPGA interchange format and all the benefits which it can bring to the world of FPGA tooling, both open source and proprietary.
J
So now, maybe, let's begin with a quick summary of what open-source FPGA tools are available and how we can use them. Over the recent years we've seen tremendous development in open-source FPGA tooling, and on the right you can see some names.
J
Most of you are probably familiar with them, and for those who are not, I'll just give a brief description. All those tools currently allow us to assemble a full-fledged HDL-to-bitstream implementation flow, with which we can implement our design using a completely open-source flow.
J
Take, for example, Yosys, which is a versatile HDL synthesis engine supporting multiple architectures. Then there are nextpnr and VPR, which are placement and routing tools. Nextpnr supports multiple architectures, while VPR is very data-driven: it requires its architecture to be defined by a description file. And then there is also Odin II, which together with VPR forms the Verilog-to-Routing project.
J
Where we have synthesis done using one tool and the rest done via another one; the common use case here is to use either an open-source or a closed-source synthesis engine. And what's the problem here? The problem is that we have very limited interoperability between those tools, even though there is a multitude of them, and we could imagine more complex flows in which we use different tools — like, for example, in these diagrams.
J
So until now that hasn't been possible, and that is mostly because of two reasons. These two reasons are that all those tools — most of them, actually — use different data formats for input and output, or they even use the same format but with different so-called flavors, so one tool can write a file but another one cannot read it reliably.
J
Plus, those tools represent the data internally in different ways, hence they require conversion routines, which are either not implemented or simply don't exist. And that's why the FPGA interchange format came to be. Before I tell you all the technical details regarding it, let me focus on some goals of the interchange format.
J
We at Antmicro, together with our partners — which, when it comes to this project, are mostly Google and Xilinx, plus other contributors — aim at enabling interoperability of all the existing tools, so that they can be exchanged freely using a common representation and data format. And we want this format to be strictly defined, so that there is no place for any ambiguities.
J
That would prevent one tool from interoperating with another — yeah, that's pretty much what I said. So, having all that in mind, we began development, and what we came up with is known as the FPGA interchange format. It's made of three major components: the logical netlist, the physical netlist, and the device resources. Now let me explain in detail what each of these components is responsible for and what data it carries.
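To make the three components concrete, here is a deliberately simplified Python data model of the split. This is not the real interchange schema (which is strictly defined by schema files in the project's repository), just an illustration of which component owns which data.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalNetlist:
    """What the design is: cell instances and the nets connecting them."""
    cells: dict = field(default_factory=dict)   # instance name -> cell type
    nets: dict = field(default_factory=dict)    # net name -> list of (cell, port)

@dataclass
class DeviceResources:
    """What the chip offers: placeable sites and routing resources."""
    sites: set = field(default_factory=set)     # placeable locations
    wires: set = field(default_factory=set)     # routing wires / interconnect points

@dataclass
class PhysicalNetlist:
    """How the design maps onto one device: placement and routing.

    Only meaningful together with the DeviceResources it targets;
    placement and routing may each be present independently.
    """
    placements: dict = field(default_factory=dict)  # cell instance -> site
    routes: dict = field(default_factory=dict)      # net name -> list of wires

def placement_is_valid(phys: PhysicalNetlist, dev: DeviceResources) -> bool:
    # Every placed cell must sit on a site the device actually has.
    return all(site in dev.sites for site in phys.placements.values())
```

The point of the split is visible in `placement_is_valid`: a physical netlist can only be checked against the device resources it was built for.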
J
It is the logical structure of the design, expressed in terms of a number of cells connected together. Besides that, it can be used to store cell libraries — like cell definitions — and of course the design part is actually a kind of library which also stores the connections between those cell instances.
J
And, of course, apart from that, there are attributes and parameters, as most of the HDL description languages allow. The structure is very similar to the well-known EDIF format, but in contrast to that, it's strictly defined by the schema, which I'll tell you about later.
J
The physical netlist basically tells us how the logical design is placed and routed on a specific FPGA device defined by the device resources, so the physical netlist and device resources have to go together all the time. And yes, as I said, the physical netlist stores cell placement and net routes.
J
It's not mandatory to have both of those kinds of information stored at the same time — you can have only placement, or only routing — and that's what is actually required when we want to exchange data between various tools. So that's the physical netlist. And finally, we have the device resources, which basically define the whole FPGA fabric, the whole chip. The description is built around an island-based style of FPGA.
J
Wires and interconnect points, to be exact, plus much more supplementary information, like parameter definitions, timing data, and so on. I won't go into more detail now because time is passing. So yeah, the interchange format, as I said before, is strictly specified, and as for the specification:
J
Here are all the schemas which actually define the interchange format, available in the GitHub repository under the link you see below. Please feel free to look into them; if you have some ideas for how to contribute, you can always make a pull request and add something to the format.
J
The first tool that actually took advantage of the interchange format — and it was actually developed together, kind of, with the interchange format — is Xilinx's RapidWright. This tool is written in Java and allows us, first of all, to generate device resources — a sort of database — for most of the Series 7 and UltraScale fabrics from Xilinx, and this is the basis.
J
For any other placement and routing tools to operate on. Plus — and this is the most important feature — it enables conversion of logical and physical netlists back and forth between the interchange format and the DCP format used by Xilinx Vivado. This is, I think, what actually closes the gap and allows the interoperability of Vivado and other tools, because it makes Vivado understand the interchange format. And yeah, so RapidWright comes in two parts.
J
J
And when it comes to fully open-source tools, there is nextpnr. As I told you before, nextpnr supports a multitude of architectures, but currently one of those architectures is a kind of virtual architecture called FPGA interchange, and this architecture allows nextpnr to dynamically read its definition from the device resources of the interchange format.
J
Maybe I didn't tell you about that, but there is also a full-fledged suite of tests with which we have tested extensively. We have tests for a flow in which we synthesize a design with Yosys, place and route it with nextpnr, and then generate the bitstream using Vivado, by taking advantage of RapidWright, which allows us to import the results into Vivado.
J
All right, so coming to the ongoing effort and future plans: currently there is an ongoing effort to add FPGA interchange support to VPR.
A
Thank you so much, Maciej, for a very informative presentation. I really appreciate it, and the work that you're doing as part of Antmicro and the ecosystem around FPGA tooling. I did have a question or two from my side, and we have one here from the audience — now two — so I'll start with that one. This is from Thomas Solar.
J
Well, so maybe a small historical overview: the interchange format started as an effort that emerged from the SymbiFlow project, which basically aims at providing the same flow for multiple FPGA vendors, and we managed to work together on that with Xilinx.
J
Because once everybody sees that, for example, Xilinx FPGAs can benefit from that, and other architectures can benefit from that, others probably will follow. But that's the future.
A
So I did have a somewhat related question. I was just curious as to what the reception has been from the proprietary tool industry in accepting the FPGA interchange format — or exchange format, in some ways. I see a similarity here relative to having an open physical design data representation, or having an open verification data representation. Just curious, you know, whether there have been conversations with some of the different proprietary EDA vendors on this topic, or interest shown by them, if you can comment.
J
Yeah, well, actually, as for now I'm not aware of any of that, and of course I'm not aware of their internal, let's say, feelings about these efforts. My opinion is that ultimately they will actually have to either contribute to it or accept it, because this would be beneficial: once a specific architecture provides tools which support that format, and there is another architecture that doesn't, then it's — it's.
A
I appreciate it. Yeah, Richard Akeel commented that vendors eventually adopted GCC and LLVM, as examples, and by not adopting, they end up being left behind. So I think that's certainly a true sentiment — exactly, yep. So thank you again so much for your talk and for the work that you're doing on this; I think it's very important. So thanks again.
A
Okay, our next presentation will be on OmniXtend milestone updates, and that's going to be provided by Jaco Hoffman of Western Digital. So this is collaboration work that we have ongoing with RISC-V, as an example, trying to expand the overall scalability considerations of cache coherency. So, Jacob, you want to take it away?
K
So that access these remote resources. What OmniXtend tries to do is tackle this problem at a very low level: instead of relying on, for example, RDMA or NVMe over Fabrics, which tackle this problem at a somewhat higher level, OmniXtend directly attaches to the internal coherence switch of, for instance, a RISC-V processor. So in OmniXtend, what we do is take the TileLink protocol, which is already standardized in the RISC-V ecosystem, and serialize it over Ethernet.
K
So we have the advantage of using an already well-established protocol and being able to scale that out over off-the-shelf Ethernet, and from the software side nothing really changes if you want to access remote memory: you have your hardware threads, they are connected to the TileLink interconnect, and then you can of course access all your memory-mapped devices locally, you can access your local DRAM, but you can also access, via OmniXtend, remote memory or remote accelerators, remote MMIO. So for your application itself, nothing really changes!
K
You simply access a different memory range. This works by simply mapping these resources at a certain space in memory, and you can have different kinds of memory.
K
For example, your local RAM is cacheable but not shared with other nodes. You can have global RAM; you can also have global MMIO to access remote sensors, remote accelerators, and so on. And of course you can also have your local MMIO, but you might as well share these resources with other nodes in the system.
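The memory-map idea can be illustrated with a small sketch: given an address, decide which kind of resource it falls into. The address ranges below are invented for illustration; in a real OmniXtend system the map would come from the platform configuration.

```python
# Hypothetical system memory map: (start, end, kind).
# "kind" records whether the range is cacheable and shared.
MEMORY_MAP = [
    (0x0000_0000, 0x0FFF_FFFF, "local MMIO"),   # uncached, local devices
    (0x1000_0000, 0x3FFF_FFFF, "local RAM"),    # cacheable, not shared
    (0x4000_0000, 0x7FFF_FFFF, "global RAM"),   # cacheable, shared via OmniXtend
    (0x8000_0000, 0xFFFF_FFFF, "global MMIO"),  # remote sensors / accelerators
]

def classify(addr: int) -> str:
    """Return what kind of resource an address refers to."""
    for start, end, kind in MEMORY_MAP:
        if start <= addr <= end:
            return kind
    raise ValueError(f"address {addr:#x} is unmapped")
```

The application just issues a load or store; it is the address range alone that decides whether the access stays local or travels over OmniXtend.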
K
So how does OmniXtend do that? I want to talk a bit about OmniXtend 1.0.3, which is the current version of OmniXtend, and what it currently provides, because it's not only the uncached accesses — just reading or writing memory — but also the scaling of the cached accesses.
K
So coherently accessing the same memory and following the whole MESI protocol of TileLink, to ensure that all the nodes have the same view of the memory. And — there's a bit of a typo here — it also provides atomic accesses to the memory region. So, depending on what kind of application you're running, you can flexibly decide what kind of memory access you need.
K
For example, if you just want to read a bunch of data which has been written by someone, you can just use the uncached accesses and you don't have any overhead from the coherent accesses; or if you want atomic accesses on memory, you can do that. And to ensure that the TileLink protocol works over Ethernet, OmniXtend also provides flow-control mechanisms, so a single node.
K
So the single node is not overwhelmed by the incoming packets, and it also detects and handles out-of-sequence or dropped packets, to ensure that TileLink, in the end, receives proper messages in the right order and is not confused. The packet format — the Ethernet format — you can see on the right; it allows multiple TileLink messages to be tightly packed into a single Ethernet frame.
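The packing of several TileLink messages into one Ethernet frame can be sketched as a greedy loop: keep appending serialized messages until the frame's payload budget is exhausted. The 1500-byte MTU is standard Ethernet; the header size and message sizes here are simplified placeholders, since the real OmniXtend frame layout is defined by the specification.

```python
def pack_frames(messages, mtu=1500, header_len=14):
    """Greedily pack serialized TileLink messages (byte strings)
    into Ethernet-sized frames.

    Returns a list of frames, each a list of messages whose total
    size fits in the payload budget (MTU minus a simple header).
    """
    budget = mtu - header_len
    frames, current, used = [], [], 0
    for msg in messages:
        if len(msg) > budget:
            raise ValueError("message larger than one frame payload")
        if used + len(msg) > budget:   # frame full: start a new one
            frames.append(current)
            current, used = [], 0
        current.append(msg)
        used += len(msg)
    if current:
        frames.append(current)
    return frames
```

Packing several small messages per frame is what keeps the per-message Ethernet overhead low.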
K
But one problem we faced when we tried to scale the system out to a larger number of nodes is the state that's required to manage these resend and flow-control mechanisms. You can scale it quite easily to tens of sessions with SRAM, or even to much more with DRAM — with, of course, a bigger latency penalty.
K
But you will always get to some point where the state that's required between the different parties generates issues with the resources. Each of these components here has a maximum number of nodes it can talk to, and the way OmniXtend is currently set up, it requires that this is a static setup. But when you look at a real system, not all of these connections have to be active at every given moment in time, because application server 1 might only access the LPC, or application server.
K
So the first is used to establish a new connection, and as we already have the fault-tolerance mechanism, which simply retries sending the packet until it receives a response, we don't even have to introduce any new mechanisms: we simply send this message type — hey.
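The connection-open handshake described here leans on the same retry mechanism as normal packets: keep resending the open message until a response arrives. A minimal sketch, with `send` and `recv` as stand-ins for the real transport:

```python
def open_connection(send, recv, max_tries=10):
    """Retry an 'open' message until the peer acknowledges it.

    `send` transmits a message; `recv` returns a response or None
    (e.g. on timeout / packet loss). Both are hypothetical hooks
    standing in for the real network transport.
    """
    for attempt in range(1, max_tries + 1):
        send("OPEN_CONNECTION")
        if recv() == "ACK":
            return attempt          # connection established
    raise TimeoutError("peer did not acknowledge connection open")
```

Because lost packets are simply resent, connection establishment needs no extra reliability machinery beyond what the protocol already has.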
K
The termination is a bit more involved, depending on whether you use coherent accesses or not. When you want to terminate the connection, what we decided on is that both parties have to agree that there are no outstanding TileLink transactions, so that there is no situation where one party has some outstanding read or write, or a probe, for instance, and won't receive a response because the connection is down. The same is true for permissions for cache lines which have been transferred; so in a probe-based cache system.
K
These permissions have to be released before terminating the connection; but for a directory-based system — where the system itself knows who the permissions are owned by and who to contact — you can get by with using connection re-establishment for these permission changes. The third message type we introduced is what we call kennels.
K
It's another modern HDL, a bit similar to Chisel, but it comes with a few features which make it very nice to design hardware in. The LPC is designed for 10 Gigabit Ethernet right now; out of the box it works with a variety of Xilinx FPGAs, as we're using the TaPaSCo framework to handle all the board abstraction, but there's nothing fundamentally Xilinx-specific about the LPC, and you could easily port it to Intel FPGAs or even to an ASIC.
K
If you want to. The LPC itself, on the one hand, receives a stream of Ethernet data; on the other end, it has an AXI interface to access memory. And alongside the LPC there are also software implementations of the requester — the party that accesses the memory — so you can also simulate the whole thing in software, to play around with it, to get started with OmniXtend, to see if it's something you want to use, and yeah, to test it.
K
The last thing I want to show today is a little demonstration of that system. I'm switching to a video after this slide, and you can see a simulation where we have three software requesters, connected over an Ethernet switch to the LPC running on an FPGA. In this demo we do reads and writes, and you can see that the nodes receive the same data. So let me switch to the video.
K
We're simply writing "hello world" or "hello OmniXtend" to this address, and you can see all the other nodes receive that. We can read back the memory and receive the answer, and we can also read it from another node and receive the same data we have written to that location.
K
With that, yeah, I thank you for your attendance, and if you have any questions, please come back to me and I'm happy to answer them. Thanks.
A
Thanks so much for your excellent presentation, I really appreciate it. I'll just ask a question or two — oh, we have one that just came in here, so let me start with it. This is from Edmund Humenberger: what is the latency to access memory from another node over 10 Gigabit Ethernet?
K
So the question is a bit difficult, because right now we have the software requester as well, which adds to the latency; but for non-coherent reads and writes we are in a low-microsecond range, so less than 10 microseconds.
K
In general, OmniXtend contains a bit more: it contains a complete coherence protocol, which AXI does not, as far as I know, and it also supports atomics, which are not properly supported by AXI. Okay.
K
What do you mean by provisioning? So basically you can do the same thing as RDMA does.
A
Okay, yeah, I think that's all the questions that we have right now, but yeah, if you could follow up with these folks, that'd be great. Thank you.
A
Okay, our next presentation is on an open-source NVMe IP with AI acceleration, by Anand Kulkarni of Western Digital and also Karol Gugala, who is from Antmicro.
L
Hello, so I will be sharing the screen. Let me do that.
L
That one, okay. I don't think I can — sure, I can start my video, so let's do it like that. Okay, so my name is Karol Gugala and I'm from Antmicro, and today, with Anand Kulkarni from Western Digital, we wanted to present the project we've been recently working on in collaboration: a project on creating a platform for implementing accelerators in NVMe devices.
L
I will talk through the presentation alone, but Anand is here, and after the presentation we both will be answering questions. So let's start from the beginning: what NVMe actually is. You probably all heard about NVMe; if you hear this name, you probably think of hard drives and disks in your computers, and you're right.
L
NVMe, in general, is a family of specifications regarding non-volatile memories, and in most cases those are simply drives that you connect to your PCs, or to machines in data centers. The most common combination of hardware connector, communication interface, and storage is using flash memory with an M.2 connector and transferring the data over PCIe.
L
If you want to learn more about that, you can always visit the NVMe web page; all the specifications that are published and accepted are there, and you can simply grab them and read them if you want to know more. So if we are talking about hard drives, what's the deal with accelerators — why do we actually need accelerators within hard drives?
L
So if you think of processing algorithms — especially AI algorithms, but not only; it can actually be any video or big-data algorithm processing large amounts of data — they often run on dedicated hardware, on some kind of accelerator that makes them quite fast. So the bottleneck in the whole processing pipeline is the data transfers: moving data from some kind of storage to the accelerator itself and then getting it back.
L
We may save a lot of time and energy on spinning up new machines to process the data we read, if we can process it at the place where we store it. So if we can have NVMe storage with an additional accelerator, that may solve a lot of problems and speed up the whole execution of the data processing.
L
Basically, to address this problem, in cooperation between Western Digital and Antmicro we came up with the idea of building an open-source platform that will allow developers to work on implementing that kind of solution — I mean implementing different types of accelerators, testing them, putting them into data centers, and simply seeing what we can do when we have that kind of platform. What one needed to create:
L
First of all, the hardware platform — I will show you the hardware platform in a moment — but also a lot of software around it; by software I mean software running on CPUs, but also FPGA firmware, FPGA gateware, and all the logic that processes the data. The hardware platform is pretty nice, I must say: it fits into a standard small two-and-a-half-inch small-form-factor hard drive, so you can easily mount it in your local PC or in standard cases.
L
As one can see in some kind of data center, or somewhere in a server room. It is based on a Xilinx Zynq UltraScale+ MPSoC, which is quite a big chip with quite a big CPU processing system and an FPGA that allows us to implement some additional dedicated logic. The whole system looks more or less like this.
L
You can see there are two parts here: on the left side you see the programmable logic, which is the FPGA firmware in this case; on the right side you see the processing system, which is all the CPUs that are available in the UltraScale+ MPSoC chip. So basically, the idea of operation here is that we connect over PCIe to a host machine, to a host computer, as a standard NVMe device.
L
Then the host can use that to communicate with some dedicated logic here, marked as NVMe control registers, to basically write and read the data and set up transactions, as these types of devices normally do.
L
We use dedicated data movers to move the data we get from the host machine between our internal memory and, of course, the host machine memory. And in order to handle all the NVMe commands, we implemented firmware running on one of the CPU clusters available in the Zynq UltraScale+ MPSoC: namely, we run firmware on the real-time processing unit, here marked as RPU.
L
This firmware is basically responsible for handling all the standard NVMe commands. All the non-standard NVMe commands are passed to the APU, which is the application processing unit — this is the Cortex-A53 in the UltraScale+ MPSoC chip, where we run Linux — and we pass all the non-standard commands to that piece of the system, where we process them with some kind of application, some kind of software running in the system, in order to build this whole device.
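The split described here — standard commands handled on the RPU, everything else forwarded to Linux on the APU — amounts to dispatching on the command opcode. In the NVMe specification, admin opcodes C0h through FFh are reserved for vendor-specific commands; the handler descriptions below are hypothetical, just to show the shape of the dispatch.

```python
VENDOR_SPECIFIC_START = 0xC0  # NVMe admin opcodes C0h-FFh are vendor specific

def dispatch(opcode: int) -> str:
    """Decide where an incoming NVMe admin command is handled.

    Standard opcodes stay in the real-time firmware; vendor-specific
    opcodes (the custom accelerator commands) go to the Linux side.
    """
    if opcode >= VENDOR_SPECIFIC_START:
        return "forward to APU (Linux application)"
    return "handle on RPU (Zephyr firmware)"
```

For example, a standard Identify command (opcode 06h) would stay on the RPU, while a custom accelerator command in the vendor-specific range would be forwarded.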
L
This whole platform: we used a number of open-source building blocks that are available right now, and I would like to tell you about a few of them — how we used them to build such a complex system, and how we were able to actually utilize all the benefits of open-source blocks.
L
So one of the most important parts of the system is the PCIe core. We use it to connect the whole system and to transfer the data, so it had to be fast, stable, and well-tested. We decided to use a project called verilog-pcie; this is a part of the Corundum project, which is an open-source FPGA-based Ethernet NIC.
L
The next block we created is the NVMe register file. This one we actually had to implement from scratch, but still, we used a few open-source blocks to speed it up. We decided to implement it in Chisel, because Chisel gives you quite a nice abstraction — you can implement things way faster.
L
We can implement logic way faster than you can do it with standard Verilog or some other low-level HDL. And one of the cool things about it is that, for example, all the register definitions — the registers' logic along with their behavior, and all the logic that handles accesses (read, write, or other types of accesses to the registers) — are automatically generated in Chisel by traversing the PDF version of the NVMe specification. So what we do: we grab the document and we run through it.
L
We read the tables with a script and then we generate the logic from them, which is really nice — we don't have to write it from scratch manually, we just generate this part of the code. And along with generating the logic, we also generate some basic software, like basic accessor functions or headers. So that's pretty nice, and this is what languages like Chisel give you.
L
They
just
make
things
easier
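The table-driven generation described above can be sketched in a few lines. Note that the table contents and naming here are invented for illustration; the real flow parses the NVMe specification PDF and emits Chisel register logic plus headers, not the C-style macros shown.

```python
# Minimal sketch of spec-table-driven code generation (illustrative only):
# given a parsed register table, emit accessor macros, the way the real
# flow emits Chisel register logic plus software headers.

# Hypothetical parsed form of a few spec table rows: (field, bit_offset, bit_width)
CAP_FIELDS = [
    ("MQES", 0, 16),   # Maximum Queue Entries Supported
    ("CQR",  16, 1),   # Contiguous Queues Required
    ("TO",   24, 8),   # Timeout
]

def make_header(reg_name, fields):
    """Emit shift/mask accessor macros for every field of a register."""
    lines = []
    for name, off, width in fields:
        mask = (1 << width) - 1
        lines.append(f"#define {reg_name}_{name}_SHIFT {off}")
        lines.append(f"#define {reg_name}_{name}_MASK 0x{mask:X}")
        lines.append(
            f"#define {reg_name}_GET_{name}(r) (((r) >> {off}) & 0x{mask:X})"
        )
    return "\n".join(lines)

header = make_header("CAP", CAP_FIELDS)
print(header)
```

The same parsed table can feed both the hardware generator and the software headers, which is the point being made in the talk.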
On the RPU CPU in the system we run an application handling the basic, standard NVMe commands, and we decided to use the Zephyr real-time operating system to implement this application. Why Zephyr? It's yet another open source block, and Zephyr is a pretty nice real-time operating system. It is POSIX compatible, so implementing an application is not that hard; you basically write it as if you were writing a standard Linux application.
L
So if you want to spawn a thread, you just spawn a thread, and if you want some kind of fancier functionality, Zephyr probably has it. It just saves you a lot of time to use the system instead of implementing everything from scratch. As I said when I was describing the whole system structure, all the standard commands are handled here in this RPU CPU, but all the others are passed to another system, to another CPU, to another piece of software that processes the rest. To let two different operating systems and two different pieces of software communicate, we had to use some kind of asynchronous communication: some piece of software that would allow us to pass the data safely and synchronize both systems, both pieces of software. Here we decided to go with OpenAMP.
L
If you're not familiar with it, OpenAMP is an open source framework allowing you to pass information between two different asynchronous systems and synchronize them. You can still maintain safety, you can avoid race conditions, and you can basically use those two systems as if they were a single one. The applications on Zephyr and Linux both implement drivers for OpenAMP, and you can utilize the synchronization and memory-sharing hardware components available in the UltraScale+.
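As a rough, library-free sketch of what OpenAMP-style communication provides (two asynchronous sides exchanging messages safely through a guarded shared channel), using none of the real OpenAMP/rpmsg API; the command names below are invented:

```python
# Toy model of the idea behind OpenAMP-style communication: two
# asynchronous "processors" (threads here) exchange messages through a
# shared channel guarded by synchronization, so neither side races the
# other. This is a conceptual sketch, not the OpenAMP API.
import queue
import threading

channel = queue.Queue()          # stand-in for an rpmsg ring in shared memory
results = []

def rpu_side():
    """The RPU side forwards non-standard commands to the other processor."""
    for opcode in ("vendor_cmd_1", "vendor_cmd_2"):
        channel.put(opcode)
    channel.put(None)            # sentinel: no more commands

def apu_side():
    """The Linux (APU) side processes whatever the RPU hands over."""
    while True:
        cmd = channel.get()
        if cmd is None:
            break
        results.append(f"processed:{cmd}")

t1 = threading.Thread(target=rpu_side)
t2 = threading.Thread(target=apu_side)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)
```

In the real system the "channel" is shared memory plus inter-processor interrupts, but the safety and ordering guarantees are the same idea.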
L
I will talk about that in a moment. So, having this Berkeley Packet Filter virtual machine allows us to run BPF bytecode programs. BPF was originally designed to handle packet routing within data centers at a very low level. Normally it's integrated within the Linux kernel, but here we used a virtual machine implementation of eBPF which allows you to run BPF code in user space.
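To illustrate the idea of a user-space bytecode VM, here is a toy interpreter; the instruction set below is invented for illustration (real eBPF has eleven 64-bit registers and a verifier), but the principle of executing downloaded bytecode in a sandboxed loop is the same:

```python
# Toy user-space bytecode interpreter, illustrating the concept of running
# BPF-like programs in user space. The instruction set here is invented.
def run(program, acc=0):
    """Execute (op, arg) pairs against a single accumulator and return it."""
    for op, arg in program:
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "MUL":
            acc *= arg
        elif op == "RET":
            return acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return acc

# A "program" downloaded to the device and run by the VM:
prog = [("LOAD", 6), ("MUL", 7), ("ADD", 0), ("RET", 0)]
print(run(prog))
```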
L
So instead of routing packets, we use BPF bytecode programs to control the acceleration, to basically describe how the computation should be accelerated. But since we are targeting AI applications here, it would be good to have support for some of the leading, commonly used AI frameworks, and we decided to go with TensorFlow.
L
The BPF bytecode and the AI model are transferred from the host machine using dedicated custom NVMe commands.
L
We
introduced,
we
extended
the
system
with
them,
so
so
user
can
can
use
those
to
simply
send
send
them
to
the
the
device
and
then
bpf
once
we
send
the
firmware
and
the
model
we
can,
we
can,
let's
say,
run
acceleration
saying
that
grab,
for
example,
this
piece
of
data
from
this
logical
block
of
the
disk
and
process
it
with
this
model
write
the
results
here
in
this
logical
block
of
the
of
the
drive
and
that's
how
we
can.
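The run-acceleration step just described (read one logical block, process it with the model, write the result to another block) might look schematically like this; the block layout and the "model" are invented stand-ins, not the real firmware interface:

```python
# Schematic of the "run acceleration" command: read a logical block,
# process it with a model, write the result to another logical block.
# The disk is modeled as a dict of LBA -> bytes; the "model" is a placeholder.
disk = {0: bytes([1, 2, 3, 4]), 1: b""}

def toy_model(data):
    """Placeholder for TF Lite inference: here, just sum the input bytes."""
    return bytes([sum(data) % 256])

def run_acceleration(src_lba, dst_lba, model):
    """Process data in place on the 'drive', without moving it to the host."""
    disk[dst_lba] = model(disk[src_lba])

run_acceleration(src_lba=0, dst_lba=1, model=toy_model)
print(disk[1])
```

The point of the design is that the data never crosses PCIe to the host; only the command and the result locations do.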
L
But having only this virtual machine and TensorFlow Lite running on the CPU wouldn't give us much acceleration. It would give us the possibility to process data on the drive, but it wouldn't be really fast. So to address this problem, we extended the system with a dedicated accelerator, and here we used the VTA accelerator.
L
This is another open source block that we used in the system. VTA is an accelerator for FPGAs, written in Chisel. It implements just a few additional buses, so it's pretty easy to integrate with the kind of system that we have. In order to handle it from software, we extended the TF Lite library that we use on the device with some additional custom delegates that can delegate certain types of computations to the accelerator.
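The delegate mechanism just described (TF Lite lets a delegate claim the op types it supports and execute them elsewhere) can be caricatured like this; the op names and the "accelerator" are stand-ins, not the actual TF Lite delegate API:

```python
# Caricature of a TF Lite-style delegate: ops the accelerator supports are
# routed to it, everything else falls back to the CPU path.
ACCELERATED_OPS = {"CONV2D", "MATMUL"}   # what the (hypothetical) VTA handles

def run_graph(ops):
    """Assign each op in a model graph to the accelerator or the CPU."""
    placement = []
    for op in ops:
        target = "accelerator" if op in ACCELERATED_OPS else "cpu"
        placement.append((op, target))
    return placement

print(run_graph(["CONV2D", "RELU", "MATMUL"]))
```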
L
I think I lost one of the slides; at the end I wanted to present the next steps. The next steps of the project are, first of all, to clean it up and release it to the community, so stay tuned. I'm pretty sure it's going to happen sooner rather than later. Thank you for your attention, and we're happy to answer your questions.
A
Thanks, Karol, to you both, for an excellent, informative presentation. I just had one question from my side. I'm always interested when I see the increased use of FPGAs in different settings, and it certainly makes sense for machine-learning-type acceleration for what you're doing relative to NVMe-type data systems. I'm just curious: do you think you will continue to stay with using an FPGA for this, or do you think it may be implemented as part of an ASIC at some point?
L
So, for example, you might like to replace the accelerator that we used here. We used VTA, which is dedicated to accelerating certain AI computations, but you could possibly replace it with something like video filters or encoders, or maybe some kind of accelerator that would speed up regular-expression searches within a bunch of text stored on the device.
L
So
it
depends
what
you
put
your
targeting,
where,
if
you,
if
you
know
that
your
device
is
gonna,
be
processing
certain
types
of
types
of
data
and
certain
types
of
algorithms
using
asic
there
is
is
just
faster.
I
mean
they
will
work
faster,
use,
less
power
and
and
will
behave
better.
L
But if you want more flexibility, using an FPGA there is simply better, and here we wanted to prepare a platform that other developers can actually use to develop with, and then check whatever approach to processing data in the drives, in the storage devices, is efficient, and how it can be implemented, and so on. Actually, this flexibility is really useful. Not sure if, Anant, you want to add something.
D
Exactly right, Karol.
D
Right now we are at a stage where we are still just coming up with innovative ideas on how accelerators can be used, so I expect that the applications will be in a wide variety of domains. It is still not mature enough that we can have an ASIC implementation that would satisfy a wider range of applications easily. So I think there will still be a window of time when FPGA-based accelerators will continue to be valuable, and eventually, yes, I agree.
L
That is the plan; we just have to polish it. We just have to finish it, and at some point it will be open sourced.
H
Thanks, Rob. Should I start sharing? Oh.
H
So, as an overall idea: we are a project supported under the DARPA IDEA program for open source software, and we're focusing on generating analog layout. This is a joint university-industry effort: we have the University of Minnesota, Intel Labs, and Texas A&M University collaborating on this, so I'm just the front end who's presenting it. A lot of the work has been done by my colleagues.
H
The motivation, I think, is fairly obvious, and since there was a previous analog presentation I won't belabor the point, but basically there's a lot of analog out there in the real world and, in proportion, very few analog designers. This is a problem because analog design is a major bottleneck in design difficulty. If you look at the design risk, it's actually disproportionately taken up by the analog part of the chip, even though its area is relatively small. So one of the big issues here is: how do you go from an analog design to a layout implementation of that design? The reason this is important is that layout significantly impacts the performance of a design, and it's a critical bottleneck in trying to make sure that your analog design meets specs. This is particularly true at new technology nodes. A typical design flow works something like this.
H
You have a circuit designer who tries to build an analog circuit: creates a netlist, optimizes the netlist, but does that in complete ignorance of any layout considerations. That's important, because your layout dictates your parasitics, which dictate your performance.
H
So typically what happens is that the circuit designer makes a good guesstimate of what the parasitics might be, optimizes the circuit, and hands it over to a layout designer, who takes a fair amount of time (this is traditionally a pretty manual approach) and then basically returns the circuit. The initial guesses may or may not have been correct, so it goes through another optimization loop, another layout step, and so on, and then you're really bottlenecked by this green box over here.
H
The reason the green box is important is that if you look at the results of simulation in the pre-layout versus post-layout step (so basically the red box, after the green box), the max error or the average error can be quite large. It frequently reaches 50 percent, often 100 percent, on the range of circuits shown over here. So if we could somehow speed this green box up, that would help tremendously, and that's what we are trying to do here.
H
If we could complete this task within minutes or hours, then the circuit designer gets a lot more flexibility. They can start their next round of optimization early, and they can actually run more rounds of optimization, and that's key to making sure that you build high-performance circuits in an efficient way. So the ALIGN project, which stands for Analog Layout, Intelligently Generated from Netlists, is a way of trying to perform this layout synthesis as part of the overall analog optimization engine.
H
This is something that people have tried to do for many years, or decades, but there are a few things that help us in the current generation of technologies. The first is that we're looking at a number of emerging FinFET technology nodes, and in these technology nodes we have nice straight-line routes with reserved directions on each layer, and so on. So the kind of artistic routes that analog layout engineers have been using for many years are actually verboten.
H
At this point, however, this is complicated by the fact that you also have complex design rules, and you have FinFET self-heating that you need to worry about. In fact, this is actually a plus for a tool, because a tool may be able to handle this better than a human designer, for whom understanding these complexities of the design rules, for instance, may not be that easy.
H
What we've also tried to do in the ALIGN project is to work across a variety of circuit classes. Traditional approaches to automating analog layout have said: okay, let me automate the layout of an op-amp, or let me automate the layout of a VCO, and there's very little in common between these layout engines.
H
In contrast, what we try to do is a kind of larger framework that handles wide classes of circuits. So we divide circuits into low-frequency analog; wireline, or high-speed I/O, circuits; wireless or RF circuits; and power delivery circuits, for example voltage regulators and so on. For each of these there are different considerations and different constraints that are important, and these impact the things that matter for layout.
H
So we go through an annotation step where we recognize the hierarchy. For example, for something like this, we might recognize that there's a switched-capacitor filter over here, with an OTA (an operational transconductance amplifier) sitting here and a switched-capacitor network up here and down here. When an analog designer looks at this, they see that there's a line of symmetry over here, so the layout should be symmetric, and so on, and as part of this recognition process we also capture those types of constraints. There are other, electrical constraints.
H
If you want to operate at, say, high frequency, you want to limit the total capacitance, and we can use that to determine constraints on the layout. Next, what we do is engage with the process design kit and the design rules, and we start generating layout. So we've got the hierarchies, and we basically go top-down (sorry, bottom-up) and generate the layout. At the bottom level we might have this block.
H
We generate a layout for this; even to generate that, there are some sub-blocks over here for which we generate layout, and we go further up the food chain through this hierarchy until we come to the layout of the overall structure. To be more concrete, if we take this circuit over here, what we first do is recognize its building blocks. In this case we might, for instance, recognize this block and also the switched-capacitor network.
H
Then we go further into the hierarchy until we come to structures with just a small number of transistors, and once we have this hierarchy in place, we can start generating layout, because the layout at this level can be procedural. The recognition process over here might require some amount of machine learning techniques, and we have some techniques built into our approach, but this part is entirely procedural: we take our design rules and then use them, with an approach that I'll talk about later, to start building layout.
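The bottom-up flow just described (generate leaf layouts first, then assemble parents) can be sketched as a recursion over the recognized hierarchy; the block names below are invented for illustration:

```python
# Sketch of bottom-up layout generation over a recognized circuit hierarchy:
# leaves get procedural layouts first, then each parent assembles its children.
hierarchy = {
    "switched_cap_filter": ["ota", "switch_cap_network"],
    "ota": ["diff_pair", "current_mirror"],
    "switch_cap_network": [],
    "diff_pair": [],
    "current_mirror": [],
}

def generate(block, order):
    """Post-order traversal: children are laid out before their parent."""
    for child in hierarchy[block]:
        generate(child, order)
    order.append(block)          # parent assembled only after its children
    return order

order = generate("switched_cap_filter", [])
print(order)
```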
H
The example shown here is a ten-tap FIR equalizer, and you can see that these blue boxes are essentially identical. So what we do over here is recognize that similarity and use it to enforce regularity. The slight catch is that, even though it's shown as identical over here, it's actually slightly different, because the current sources that you see here are five bits in some taps and seven bits in other taps; but for layout purposes they should be recognized as being symmetric.
H
So we use some ML techniques to perform approximate matching, and once we do that, we can feed in those guidelines and constraints, and then we can get a kind of aesthetically pleasing layout, which analog designers always like. So this array of latches down here goes here, the resistors up here are laid out over here, and all of these taps are laid out symmetrically, with the bigger taps in the middle and the smaller taps on the side.
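The placement rule just mentioned (bigger taps in the middle, smaller taps toward the sides) is easy to state as an ordering; this sketch assumes a tap's "size" is simply its bit width, which is an illustrative simplification:

```python
# Order equalizer taps symmetrically: largest near the middle, smaller
# taps alternating outward toward the left and right edges.
def symmetric_order(sizes):
    """Place taps sorted by size, alternating sides so big ones stay central."""
    placed = []
    for i, s in enumerate(sorted(sizes, reverse=True)):
        if i % 2 == 0:
            placed.append(s)        # even picks extend to the right
        else:
            placed.insert(0, s)     # odd picks extend to the left
    return placed

print(symmetric_order([5, 7, 5, 7, 5]))
```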
H
Once we have the recognition, we go to layout generation. I skipped a few steps in jumping from here to here, but basically what we do is first generate the lowest-level layout and then go higher up. The philosophy here is that our design rule manuals are pretty complex, so we simplify them by creating a grid-based abstraction which is specific to every layer, and in fact we've done this over a wide range of technology nodes. Once we have that, we start going upwards, and so we look at the lowest-level structure.
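The per-layer grid abstraction can be pictured as snapping every piece of geometry to that layer's track pitch in its preferred direction; the pitches below are made-up values, not any real PDK's:

```python
# Sketch of a per-layer grid abstraction: each routing layer gets a pitch
# and a preferred direction, and geometry snaps to the nearest track,
# which hides most of the complexity of the raw design rule manual.
LAYER_GRID = {                       # pitches in nm (illustrative values only)
    "M1": {"pitch": 40, "dir": "H"},
    "M2": {"pitch": 44, "dir": "V"},
}

def snap(layer, coord):
    """Snap a coordinate to the nearest track on this layer."""
    pitch = LAYER_GRID[layer]["pitch"]
    return round(coord / pitch) * pitch

print(snap("M1", 95), snap("M2", 95))
```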
H
So this is a differential pair that appears in a number of circuits, as shown over here. This is a common building block, so we can start generating layout for it, except that we need to parameterize that layout. So we parameterize the layout of these primitive cells in a number of ways: they may have a body contact or not; you may want to change the aspect ratio, which gives you more flexibility in placement; you may want to use more parallel wires to reduce resistance, which affects analog performance; you might need a larger gate length.
H
You might need common-centroid or interdigitated formats, and so on. So, based on this, the core primitives are available in a library. These are primitives that are used in a wide range of circuits, so we provide primitive layouts for them in a parameterized way, and we allow users to specify their own primitives.
H
Next we go to the next level. We have the layout of the lowest level; now it's block assembly through multiple levels of hierarchy, and over there we use a place-and-route framework. So there's a macro placement framework that we have over here; it deals with analog constraints like symmetry, alignment, matching, and so on. Routing also needs to work with matched routes, etc. So all of that is put together.
H
Performance constraints might dictate the maximum distance that a route can go, or the number of parallel wires or wider wires that need to be used for a specific route. We have a global router and a detailed router, and we have a second detailed router which works really well for highly constrained situations. So in this case you have a layout in which some routes are pre-specified, and you need to perform routing within these blank areas.
H
When we started developing this, about three years ago or so, we were a university team without that much experience in software development, so our colleagues at Intel actually helped us bring a lot of discipline to this. But because various parts of the flow were developed at different sites, they were actually developed in a modular way, which, as it turned out, happened to be a good thing, with each module developed more or less at one site.
H
As we moved forward, we started trying to put this together, and the current architecture, or the overall structure, looks something like this. Using this is relatively simple; there's a GitHub repo that I'll point to later. This is essentially what you do over here, and there are modules for the compiler, which performs auto-annotation and topology identification, figures out constraints, and so on.
H
There's a primitive generator; like I said, this is the one that's also engaging with the Michigan folks on the FASoC project. There are other utilities that we have: we have an internal DRC engine, a PEX engine, and so on; the LVS is still to be built. Then we have the place-and-route unit over here, and because these are modular, it means you can actually configure them and use them independently, and this turned out to be really good because of the use cases that we came across.
H
Our original flow, which was actually our charter from DARPA, was to start with an unannotated netlist, with no hints from the user, and go all the way to GDS. However, many of our users, especially at Intel, told us that they really did want to specify some of these constraints; they didn't necessarily want to work this way, and so we introduced other entry points. Here the user can provide the constraints and still use the rest of this. In this case, the user can provide place-and-route guidance and still use the rest of this, and sometimes you can actually take the constraints that are generated over here and augment them, so this provides some guidance, but the user has the ability to enhance that guidance. Then, finally, there's one approach, based on more codified placement and routing, where the user has full control, and that works really well when you're laying out large, regular arrays, for example. I'll go through a couple of examples where we've used this.
H
This is a latch comparator, and it has a set of primitives that are identified here. Then you can either build a layout that's fully automated or one with user constraints, and you might get different layouts; in general, this one has a little less white space than this one, and we'll see a few more layouts. For the same circuit, this is the manual layout that was done by an experienced layout designer; you can see it's very compact.
H
So if you look at this, actually all of them were reasonably acceptable solutions. If you look at power, for instance, our worst-case power was within 10 percent of the manual power. The interesting thing was user productivity: this was the amount of time that it took the manual designer to build this layout. If you want this layout, you can get it almost two orders of magnitude faster, and if you want something like this, it's still an order of magnitude faster.
H
These circuits were built and tested, and basically the results of measurement said that, if you look at the input-referred offset, this is the manual design, and the automated ones are very competitive; they're within similar ranges over here. If you look at the comparison delay for the comparator, it's a little worse for the automated designs, but they're still within the range of the manual design.
H
This is another example, shown over here. This is the manual design; this is the automated design using entry points three and four. There are regular arrays over here that used entry point four. Notice that the manual designer used these rectilinear layouts that were not rectangles, but our approach didn't do that, so they get some advantage here in terms of compactness.
H
Again, you can see there's a significant productivity improvement on several of the metrics, for example the offset or the worst-case droop for the analog LDO. This is a voltage regulation engine, so voltage droop is important; they were comparable for these two, with no significant difference. Load regulation was actually slightly better; settling time was slightly worse. But overall, you can see that this shows something about the capabilities. We are available on GitHub; we have a repo that's shown over here, so feel free to check it out.
A
Thanks, Sachin; some great work, and I really appreciate all the details that you shared today about the work done. A couple of questions came to my mind. One was about just how hard it is: you have a good representative set of classes of different circuits, but how hard is it for a designer to add something different if they wanted to?
H
So the key over here, when you talk about adding something different, is that ultimately everything is a bunch of transistors. The difference that comes about is in terms of how you handle the constraints. Because we have multiple entry points, the designer, depending on their level of experience, could actually engage with the tool at any level.
H
And so, in a given scenario, we can actually recognize something as, for example, an RF design, because you might see antennas or inductors or something like that, and we use hints such as those to figure out what our constraints should look like. But at present, the best results will be obtained with some level of designer engagement.
A
Okay, interesting. I'm just curious: I know you've had this on a number of different technology nodes, and I was kind of curious what some of the challenges are that you come across, or what one has to consider, when you want to look at adopting a different technology PDK or a new process node.
H
There's a little bit of an art involved there as well, because several technologies give you multiple options, so which one do you choose in terms of the pitch, etc.? But again, that's something that we've had a little bit of experience with, so we're currently in the process of putting together a kind of how-to manual, or best practices, on how we would do this. Hopefully we'll have that in the not too distant future.
A
Very good. You know, being a tool developer at heart, and from my experience, I was just curious what data representation, or in EDA terms, I guess, what data model, you use for doing the placement and the routing.
H
Okay, so, because the pieces are very modular, it's as we go across modules that the data representation matters the most.
H
We've been using JSON-based representations for engagement across modules. Within modules, it depends on the specific structure; depending on whether we are building layout for primitives or building the block layout, we might use something different. Initially we had worked with LEF/DEF kinds of structures, but later on we found that working with JSON was a lot more efficient.
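A JSON hand-off between flow modules might look like this; the schema below is invented purely for illustration, but it shows why JSON makes cross-module exchange easy (standard library support, human-readable, round-trips cleanly):

```python
# Illustration of a JSON hand-off between flow modules: a placer emits
# block positions, and a router reads them back. The schema is invented.
import json

placement = {
    "block": "comparator",
    "instances": [
        {"name": "diff_pair_0", "x": 0,  "y": 0},
        {"name": "latch_0",     "x": 80, "y": 0},
    ],
}

text = json.dumps(placement)          # what the placer module writes out
loaded = json.loads(text)             # what the router module reads back in
print(loaded["instances"][1]["name"])
```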
A
That makes sense. Yeah, LEF/DEF certainly is fairly dated at this point in time, although of course it's still kind of a standard currently too. And then one final question I wanted to ask you: I saw some of the different utility engines that you have, and I was just curious if you can comment on how you qualify those, either for DRC, or, I know you mentioned building a parasitic extractor, just how you will qualify those.
H
Yeah, so, clearly we are not competing with Calibre here, and we are not competing with anything commercial.
H
What we're trying to do over here is at least capture all the common-sense kinds of violations, because this is also a way of sanity-checking whatever we generate. We actually do surprisingly well, better than you would expect, but we are not perfect. So, at present, it's probably good practice to take this through some kind of better DRC check, for instance, just to be sure that things are completely clean. But we continue to improve it, and the hope is that we'll catch most, if not all, issues as we go down the road.
A
Yeah, well, thank you so much again for your excellent presentation today, Sachin; I really appreciate it. I also want to thank all the presenters for coming today, and the audience: I hope you found the different talks informative, and we look forward to our next event. So thank you again for all your time today; I appreciate it, and everyone have a good rest of the day.