From YouTube: CHIPS Alliance Technology Update July 13 2023
Description
Hear the latest about CHIPS Alliance open hardware collaboration activities in these 6 talks.
A: Okay, hi everybody. Thank you for joining us today for the CHIPS Alliance bi-annual tech update. It's great to see everyone here, and thanks to Google for providing us a beautiful location here in San Francisco — I really appreciate that. Thanks to all of you for coming in person, and also to all those joining us online. I apologize for the logistical challenges that always seem to come with any event or meeting; that just seems to be par for the course no matter what. But again, thank you very much.
A: So, just to provide a couple of introductory comments: we have some great talks this morning, and I look forward to hearing them. What CHIPS Alliance is all about — and what the Linux Foundation more broadly is about — is collaboration: different folks, different teams, different companies working together on hard problems, and the belief that by building a community to collaborate on these hard problems, that's how the best innovation comes forward and how these problems get tackled.
A: It's been a long road, but it's really starting to accelerate now, and that's in large part because of the amount of collaboration. At least in terms of what the Linux Foundation has sponsored, we're up to something like 18,000 developers, and that's been in the past couple of years. So that's really rather remarkable.
A
You
know
we're
trying
to
do
that
in
chip
design
right
and
it's
it's
a
long
journey
and
there's
obviously
different
constraints
in
the
chip
world
compared
to
AI.
But
there
are
also
similar
concerns
right
and
a
lot
of
that
boils
down
to
you,
know
security
or
proprietary
data
right
and
how
to
handle
that
and
treat
that
right.
So
that's
something
that
you
know.
We
continue
to
look
at
relative
to
chips,
but
again
for
all
of
this.
A: It really boils down to collaboration and participation by different companies and organizations to help move this forward, and to build up a trusted catalog of IP. That IP could be actual RTL, it could be firmware, it could be EDA tooling, etc. It's about building confidence, so that if you're part of a company or a university and you pull a piece of RTL or something else from our repository, for example, you can trust it.
A
So
you
know,
we've
been
fortunate.
You
know:
we've
been
gaining
some
members
over
the
past
six
months
or
eight
months
that
we've
joined,
so
it's
it's
great
to
have
AMD
part
of
us
Microsoft,
Nvidia
Marvel,
amongst
others,
exoto
who've
joined
us,
and
so
we
welcome
them
and
particularly
in
participation
on
the
root
of
trust
project
collector,
which
we'll
be
hearing
more
about
later
today.
A: So look forward to hearing about that. As I've already said in so many words, we're really about open collaboration across the entire spectrum of chip design. With that, I will stop chatting and introduce our first speaker, whom I am very pleased to announce: Michael Gielda from Antmicro. Michael is VP of Business Development at Antmicro, and also a great enthusiast, sponsor, and supporter of CHIPS Alliance, as is his brother Peter, who is also here with us — we thank them as well for all their help and contributions to open source. So, Michael.
C: Okay, I think I can do this with the screen — right, change layouts, spotlight. Now.
C: Great, let me see if I can change the slides easily. All right, good. So welcome — it's a pleasure to be here, and great to have everyone attending virtually as well. Today I'm going to talk about Caliptra and VeeR and all the work we're doing. This is all enabled by CHIPS Alliance: it's really taking a lot of prior great work and putting it into practice in a collaborative project.
C: I think that's exactly what CHIPS Alliance is about, what VeeR is about, what we're about, and why we're doing this. We work on commercial adoption of open source, trying to make it useful in practical scenarios — we're an engineering services company, helping people build things.
C: Very often we find ourselves working with tooling — specifically, we work a lot with ASIC- and FPGA-oriented tooling — as well as IP development and all sorts of open source tasks that help people build practical things. And of course you can't do that well without software, without automation, without CI, and that's a major focus. VeeR, which we're going to be talking about today, is a family of 32-bit open source RISC-V CPU cores. It comes in three options, all hosted by CHIPS Alliance.
C: These links here in the slides — we'll share the slides as well, of course — all point to a different version of the same thing. Today, specifically, we'll be talking more about the EL2, the core that's embedded inside the Caliptra root of trust, but of course you're free to explore the entire family and use it for your own designs. This is ASIC-proven RTL.
C: Caliptra, if you are not familiar with the project, is an open source root of trust. It's specifically meant to be integrated into other solutions — bigger SoCs that need root of trust functionality. The idea comes from a collaboration in OCP, the Open Compute Project, where four big players — Nvidia, Google, AMD, and Microsoft — got together because they have the same problem: they need a root of trust, so that if they're buying or developing hardware for their servers, they can actually rely on the security of those blocks being, well, uniform and comparable.
C: The alternative would be, you know, 12 different solutions to the same problem. So it's really well tied into the mission of CHIPS Alliance, which is standardizing — getting people to reuse RTL and tools instead of reinventing the wheel — and it's a great example of a collaborative project, because it started with those four companies, and by now we have probably a dozen others collaborating. Obviously, we need to find ways to work together, and processes to make that collaboration efficient.
C: Overall, the role of CHIPS Alliance here is to host the implementation: the spec is hosted in OCP, but the actual implementation and all the technical work is going on in CHIPS Alliance, and all the repos can be found under the chipsalliance GitHub organization. It's a paragon project in the sense that we're helping establish methodologies to collaborate on open source RTL. Caliptra itself — at least the 1.0 version of it — is focused on very specific features.
C: We're not trying to boil the ocean, but rather to specifically address the need as it is today, and there are plans to integrate that first version into real products in the near future. As I said before, it uses the VeeR EL2 core, and of course it has more than just the core: it has a bunch of different peripherals, many of which actually reuse OpenTitan's peripherals — again, we're not trying to reinvent the wheel.
C
Whatever
great
open
source
building
blocks,
we
can
find
we
try
to
adapt
to
our
use
case,
but,
of
course
like
then
we're
doing
something
differently
right.
So
that's
why
the
culture
project
was
kind
of
born
now.
C
Our
work
here
is
specifically
focusing
so
our
when
I
say
R
now,
I
need
a
micro,
it
focuses
on
making
it
a
collaborative
project
a
little
bit
more
and
collaboration
is
best
done
with
open
source
that
can
be
fully
reproduced
and
easily
scaled
and
the
main
goal
is,
you
know,
to
to
introduce
better
testing
and
more
coverage
and
public
facing
CI
all
sorts
of
things
that,
of
course,
you
can
do
with
proprietary
tools
and
people
do
that
a
lot.
C
But
the
challenge
is
that
it's
kind
of
hard
to
share
that
methodology
easily
to
make
it
completely
reproducible
for
other
people
for
new
partners
during
the
project.
You
know:
how
do
you
actually
get
them
on
board
quickly
and
how
do
you
make
it
easy
for
them
to
reproduce
your
results
right,
so
open
source
is
a
good
answer
to
this,
and
certainly
we're
not
meaning
to
like
replace.
You
know
every
existing
tool
out
there,
but
more
like
augment
exiting
workflows
that
each
and
every
of
those
companies
so
Microsoft
Google
ND
Nvidia,
have
internally
right
internally.
C
They
have,
of
course,
their
own
huge
flows
and
tools
and
everything,
but
we're
trying
to
kind
of
add
an
open
source
layer
of
great
stuff.
On
top
of
that,
which
would
enable
us
to
have
you
know,
smoke
tests
and
the
ability
to
reason
about
the
quality
of
the
code
and
lint
it
and
so
on.
So
so,
basically,
that's
the
focus
of
this
project
and
and
we're
kind
of
executing
on
this
project.
C
So,
let's
start
with
all
the
things
that
we've
done,
starting
with
variable
variable
is
a
tool
that
was
donated
by
Google
internships.
Alliance
is
being
developed
within
Chip's
Alliance,
we're
pretty
involved
in
in
the
variable
development
itself
and
what
it
does
it's
a
system
very
loud,
parser
and
toolkit
for
linting
formatting
and
also
includes
like
a
lexical
diff,
obfuscator
and
indexer,
and
from
interesting
things.
It
also
has
a
language
server,
so
it's
a
pretty
versatile
framework
for
working
with
systemvale
code
and
it's
completely
open
source.
It's
it's
under
active
development.
C
It's
pretty
widely
used,
it's
it's
a
great
tool,
so
so
yeah
we
actually
had
some
questions
yesterday
about
you
know,
is
open
source
limiting
possible
and
the
answer
is
yes,
absolutely
it's
not
even
the
only
tool
that
can
do
it
in
the
open
source
space,
but
certainly
it's
one
of
the
major
ones
and
one
of
the
more
actively
developed
as
I
said.
C
It
also
includes
the
language
server,
which
is
kind
of
makes
it
even
more
useful,
because
it
integrates
very
well
with
a
variety
of
editors,
so,
whatever
tool
you're
using
for
editing
your
code,
you
can
actually
integrate
through
this
LSP
layer
and
there's
documentation.
That
tells
you
how
to
do
that
and
we
support
most
of
the
features
that
are
capable
possible.
With
you
know,
language
servers,
including
Auto
expansion,
there's
also
blog
notes
that
we
wrote
about
this
specific
feature
so
feel
free
to
read
it
for
more
backgrounds,
there's
also
a
plugin
for
vs
code.
C
We
were
aware
that
vs
code
is
kind
of
becoming
the
de
facto
standard
for
IDs.
So
to
make
things
easy
for
people
there's
an
official
plugin
we
have
to
go
through.
You
know
like
registering
our
official
account
in
in
the
vs
code,
extension,
Marketplace
and
so
on
and
so
on,
but
we
have
an
officially
supported
plugins,
but
there's,
of
course
other
ones.
But
this
one
is,
you
know
the
ones
that's
being
maintained
by
chips,
lines
from
interesting
things
with
variable.
C
We
also
have
GitHub
actions
which
make
it
super
easy
to
integrate
with
existing
code
bases
on
GitHub
I.
Believe
it's
pretty
awesome
because
you
can
get
you
know,
feedback
on
your
pull
requests.
You
can
get
linting
suggestions
or
even
kind
of
formatting.
You
know
real
changes
to
your
code
that
you
can
just
accept
right,
so
you
kind
of
submit
a
PR
and
it
sees
that.
Okay,
this
change
is
not
really
great.
C
Perhaps
you
could
do
differently,
so
it's
a
personal
assistant
for
a
code,
very
cool
I,
advise
you
to
to
check
it
out.
If
you,
of
course,
if
you
use
GitHub
actions
at
all,
there's
links
to
the
specific
actions
and
integrating
them.
It's
just
a
few
lines
of
code.
C: So that's the Verible part — as you can see, there's quite a lot of stuff out there. Moving on to the next topic, we also have a lot of work going on around testing.
C
The
verification
with
open
source
tools
is
certainly
possible,
but
then
not
necessarily
the
industry
standard
way
to
do
it
and
Veer
is
kind
of
an
example
where
originally
the
testing
methodology
for
Veer
being
based
on
cloud
Source
tools
up
on
open
source
reason,
the
core,
not
all
of
it,
was
released
right.
C
So
the
core
was
verified
by
the
original
authors
to
a
very
kind
of
deep
extent,
given
that
it
was
shipped
in
like
millions
or
perhaps
even
billions
of
devices,
but
the
the
actual
test
benches
you
know
could
not
be
fully
released,
so
we
only
had
system
level
tests
to
start
with.
So
basically,
one
of
our
goals
is
to
kind
of
increase
the
coverage
that
we
started
out
with
and
we've
been
adding
this
as
like.
C
Pub
click
CI
checks
so
that,
if
there's
any
changes
to
the
core,
they
kind
of
we
continue
to
test
against
all
those
scenarios.
And,
of
course
we
generate
a
report
with
coverage
and
there's
a
summary
web
page
with
links
to
more
specific
details
and
so
on.
There's
a
link
in
this
presentation
as
well
and
now,
naturally
kind
of
taking
it
to
the
next
level.
We
want
to
make
sure
that
you
know
things
continue
to
work.
C
So
one
of
the
ideas
that
we
have
given
that
all
of
the
organizations
involved
in
collector
are
actually
active
users
of
UVM
long
term
and
it's
a
goal
of
chips.
Alliance
that
precedes
this
project,
but
I
think,
like
this
project.
C
Kind
of
shows
that
it's
necessary
is
to
add
your
support
to
verilator,
which
is
a
difficult
task,
and
everyone
admits
it's
hard,
but
on
the
other
hand,
we've
had
quite
a
lot
of
success
and
we've
covered
a
lot
of
grounds
since
we
started
originally
Western
Digital
was
a
big
kind
of
sponsor
of
this
effort.
C
Nowadays,
it's
kind
of
the
funding
is
coming
from
different
sources,
but,
generally
speaking,
we're
pretty
close
to
parsing
the
entire
UVM
library
with
their
later
as
well
as,
of
course,
we'll
need
a
few
more
features
to
actually
go
and
do
proper
UVM.
You
know
large-scale
testing.
So
definitely,
if
you
have
need
for
this,
you're
invited
to
kind
of
talk
to
us-
and
you
know
kind
of
figure
out
a
way
to
accelerate
this
work,
because
we
know
from
many
sources
it
would
be
really
really
great
to
get
it
past.
The
finish
line
so
so
far.
C
We
can't
do
this.
Unfortunately,
we
can't
you
know,
do
actual.
You
know
UVM
end-to-end,
but
also
we
believe
it's
possible
and
it's
not
extremely
far
away.
On
the
other
hand,
of
course,
there's
open
source
methodologies
that
people
might
be
aware
of
Coco
TB
by
UVM
being
two
of
vampire
VMS,
actually
Siemens
tool
cocoa
to
be
as
a
Community
Driven
fossil
foundation-backed
project,
and
they
they
work
together.
C
You
can
use
them
to
test
things
in
a
uvm-like
methodology,
so
if
you're
not
afraid
of
python,
it's
it's
a
great
way
to
you
know,
do
your
stuff,
and
especially
if
you
want
to
keep
to
something,
that's
like
familiar
Pi
UVM
would
be
a
way.
Coco2B
is
more
generic.
You
can,
of
course,
do
kokuti
B
testing
without
Pi
UVM,
and
you
can
trust
the
developers
that
consistently,
you
know
rate
Goku
to
be
as
an
excellent
tool.
C
That
does
a
great
job,
but
we're
also
aware
that
a
lot
of
the
world's
developers
in
the
verification
space
they
they're
just
used
to
system
very
long
they're
used
to
UVM,
so
we'll
have
to
tackle
all
these
ways
of
testing
in
parallel
right
and
eventually,
we
want
to
be
able
to
do
this,
but
in
case
you're
you
want
to
do
open
source
testing
and
verification.
Today
there
is
a
way
and
there
there
are
also
examples
in
the
VR
and
clicker
apples.
How
to
how
to
do
this.
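For readers of the transcript who haven't seen the Python-based methodology mentioned above: the pattern cocotb and pyuvm build on can be sketched, simulator-free, in plain Python. This is a hypothetical illustration of the driver/scoreboard idea — the DUT here is just a Python object standing in for RTL, and all names are made up, not code from the VeeR or Caliptra repos.

```python
# Hypothetical, simulator-free sketch of the driver/scoreboard pattern
# that cocotb- and pyuvm-style benches are built around.

class FakeAdderDut:
    """Stand-in for an RTL adder; a real bench would drive signals instead."""
    def compute(self, a, b):
        return (a + b) & 0xFFFFFFFF  # 32-bit wrap, like hardware would

class Scoreboard:
    """Compares DUT results against a golden reference model."""
    def __init__(self):
        self.mismatches = []

    def check(self, a, b, dut_result):
        expected = (a + b) & 0xFFFFFFFF
        if dut_result != expected:
            self.mismatches.append((a, b, dut_result, expected))

def run_test(dut, scoreboard, stimulus):
    # Driver: apply stimulus; monitor: observe results; scoreboard: compare.
    for a, b in stimulus:
        scoreboard.check(a, b, dut.compute(a, b))

dut = FakeAdderDut()
sb = Scoreboard()
run_test(dut, sb, [(1, 2), (0xFFFFFFFF, 1), (123, 456)])
assert not sb.mismatches  # all transactions matched the reference model
```

In a real cocotb bench the driver and monitor would await clock edges and wiggle DUT signals, but the separation of stimulus, observation, and checking is the same.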
C
One
other
interesting
thing
that
we
added
it's
still
kind
of
experimental
but
I
think
very,
very
useful.
We
do
run
the
entire
VR
core
through
the
open
source
RTL
to
GDs
flow
with
open
road,
and
this
is
part
of
the
the
CI
I
mean
it's.
It's
a
work
in
progress
PR,
but
it's
kind
of
working
now,
so
we
just
need
to
clean
it
up
and
merge
it
now.
This allows us to do a smoke test: for any changes to the RTL, we can certainly see if they compile. And you can experiment with different parts of the toolchain — perhaps with AI, or on a thousand computers — and do research on top of it; even if you get worse results, you might just get them quicker compared to other tools.
So, overall, we believe that having OpenROAD support here is important. We would want to extend it to cover all of Caliptra in the future — we haven't done that yet — and there's also one other limitation.
C: So far, if you want to run things through the OpenROAD flow, by default it supports some SystemVerilog, but not enough to cover real designs. One of the ongoing projects within CHIPS Alliance is Surelog/UHDM and the related Yosys plugin, with which we can actually parse a very significant part of the SystemVerilog spec — and in fact, we're not far from parsing all of VeeR and Caliptra in general. There's ongoing work to enable just this; we can already parse OpenTitan.
C: Okay, moving on to another way of improving the verification of the core: there's a framework called RISCV-DV, which originated at Google but has also now been donated to CHIPS Alliance. This is an open source framework for instruction stream generation and testing — generating random streams of instructions and checking whether the implementation behaves correctly compared to an instruction set simulator.
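The comparison loop described above can be sketched in miniature. This is a hypothetical illustration only — it uses a made-up two-instruction toy ISA, not RISC-V, and the "DUT" is a Python function standing in for a simulated core:

```python
# Miniature sketch of the RISCV-DV idea: generate random instruction
# streams, run them on both the design-under-test and a reference
# instruction set simulator (ISS), and compare architectural state.
import random

def reference_iss(program):
    """Golden model for a toy ISA: ADDI adds an immediate, XORI xors one."""
    regs = [0, 0]
    for op, rd, imm in program:
        if op == "ADDI":
            regs[rd] = (regs[rd] + imm) & 0xFF
        elif op == "XORI":
            regs[rd] = (regs[rd] ^ imm) & 0xFF
    return regs

def dut_model(program):
    """Pretend DUT; in real use this would be the RTL core in simulation."""
    return reference_iss(program)  # a correct DUT matches the ISS

def random_program(n, rng):
    ops = ["ADDI", "XORI"]
    return [(rng.choice(ops), rng.randrange(2), rng.randrange(256))
            for _ in range(n)]

rng = random.Random(1234)
for _ in range(100):
    prog = random_program(20, rng)
    assert dut_model(prog) == reference_iss(prog), f"state mismatch: {prog}"
```

The real framework does the same thing at scale: constrained-random RISC-V programs, co-simulated against an ISS, with any architectural-state divergence flagged as a failure.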
C: We had to add a bunch of things to make it work with the current codebase, but it wasn't extremely hard. One other interesting piece of work we managed to do in the last few months is improving the Python-based generator in RISCV-DV. RISCV-DV itself is based on UVM for the instruction generation part, and that means — if you were listening to what I was saying earlier — we can't run that with open source tools yet. It's something we'd like to do.
C
If
you're
interested
in
this
again
talk
to
us,
we'll
eventually
get
to
do
this,
but
for
now
we're
kind
of
limited
to
different
approaches,
and
one
of
the
other
approaches
that
riska
DV
also
has
experimental
support
for
is
completely
bypassing
the
system
Overlook
part
and
using
python
instead
and
we'd
kind
of
extended
and
improved
this
python
implementation
to
actually
work.
C
It's
still
not
like
on
par
in
terms
of
features
with
the
assistant
vertical
version,
but
nevertheless
it's
kind
of
good
enough
to
do
some
level
of
testing,
and
we
can
run
it
completely
publicly
completely
open
source,
which
I
believe
is,
is
pretty
cool,
and
we
know
that
there's
interest
in
in
kind
of
extending
this
to
cover
you.
F
C
All
of
risk
idv
in
the
future,
which
we'd
also
like
to
do
another
thing
like
from
from
the
technical
work,
that's
gone
into
the
project
recently
and
that
part
will
kind
of
grow.
So right now we'll probably be extending the project to also cover some additions and extensions to the core itself. Now that we have methodologies to test it better, and a framework to run CI and so on, we can start adding features without the fear of making things explode.
C
So
one
of
the
problems
we
had
was
that
you
know
we
needed
a
JTAG
and
collector
as
this
jdeg
interface,
and
we
created
some
testing
to
make
sure
that
the
jdeg
interface
actually
works
again
open
source.
It
can
run
in
CI
and
we
actually
found
a
problem
right.
So
we
we
kind
of
found
that
you
couldn't
access
approval
from
the
debug
interface
and
culture
just
didn't
connect
to
the
bus
in
a
certain
way.
C
You
know
it
was
kind
of
a
simple
and
stupid
bug,
but
certainly
kind
of
proves
that
this
testing
is
needed
because
it
felt
like
a
very
obvious
addition
right,
a
small
change,
but
actually
it
wasn't
fully
correct.
So
that's
why
we
need
this
entire
infrastructure
here
and
now
we
have
tests
to,
of
course
see
that
this
bit
is
working
fine,
so
going
towards
the
end
of
the
presentation.
C
One
of
the
things
that
was
also
pretty
Central
to
the
project
is
trying
to
figure
out
okay,
how
to
actually
make
this
scale
how
to
make
it
collaborative
and,
as
you
know,
in
in
the
Asic
space,
you
know
many
of
the
workloads
are
not
trivial,
they're,
not
something
you
can
just
run
in
five
minutes.
They
might
take
hours
to
complete
and
require
a
lot
of
memory
and
a
lot
of
compute
power.
So
one
of
the
projects
that
we've
been
working
on
in
chips
Alliance
for
years
now
is
so-called
custom.
C
You
run
your
own
machine,
we're
using
gcp
for
this,
obviously,
because
it's
collaboration
with
Google-
and
this
allows
us
to
pretty
much
scale
up
to
whatever
kind
of
machine
we
want
and,
however
many
resources
we
require
run.
You
know
lots
of
things
in
parallel
without
fear
of
running
out
of
resources
and
stalling
everything
and
we're
kind
of
using
it
in
a
bunch
of
projects.
This
is
not
the
only
project
using
those
Runners,
but
specifically
for
cliptra.
C
One
of
the
things
you
were
working
on
I
mean
we
will
recreate
blazed
this
Trail,
but
here
it's
going
to
be
especially
important.
The
ability
to
run
proprietary
tools
on
top
of
the
open
source
stuff
so
that
you
know
we
can
do
it
without
exposing
Secrets
without
kind
of
causing
any
problems
for
everyone.
But
still
we
can
get
the
support
we
need.
You
know
we
can
kind
of
run
the
real
flows
that
those
companies
are
using
in
a
in
a
secret
and
secure
way
only
exposing
the
results
that
we
need
to
expose.
C
So
this
is
one
of
the
ongoing
tasks,
partly
because
it's
kind
of
hard
to
set
it
up.
One
of
the
problems
that
open
source
is
trying
to
to
to
fix
is
that
open
source
is
really
easy
to
set
up
like
everything's
just
there
and
you
get
the
code
and
voila.
But
of
course
this
is
hard
to
do,
but
we
want
to
do
this
so
that
we
can
increase
the
confidence
in
the
code.
So
the
GitHub
Runners
are
a
central
technology
here
and
they're
also
open
source.
C
One
last
thing
I
wanted
to
say
is
that
the
work
is
ongoing,
so
we'll
probably
have
some
more
news
by
September
and
in
September
right
at
the
beginning
of
Oktoberfest,
so
no
suggestions
there,
but
you
can
come
to
Munich
and
enjoy
not
only
the
beer
but
also
a
really
really
awesome
conference.
It's
one
of
the
greatest
open
source
conferences
out
there.
It
it
definitely
kind
of
groups,
a
really
interesting
crowd.
It's
happening
in
September
15th
to
17th.
C
It
might
be
a
little
bit
Troublesome
with
hotels
because
of
Oktoberfest,
but
we
hope
we'll
manage
and
yeah
we're,
certainly
going
to
do
a
clip
there
and
hopefully,
of
course
in
September.
C
Some
of
the
things
I
was
talking
about
that
are
in
progress,
might
already
be
in
place,
so
yeah
you're
very
kindly
invited
if
you're
interested
in
these
things,
I'm
sure
that
you'll
find
Oracle
interesting,
there's
also
an
American
version
of
Alarcon
called
latchap
that
some
of
you
might
have
attended,
and
this
happened
already
this
year,
but
I
assume
that
next
year,
there's
going
to
be
a
new
addition
that
you
should
definitely
join
too
all
right.
Thanks
for
your
attention.
That's
all
I
have
foreign.
D: Okay — of course.

F: [partially inaudible] …the tooling you're using is SystemVerilog — out of curiosity, what made you guys choose SystemVerilog?
C
So
I
would
repeat
the
question
for
for
everyone
to
hear
what
made
that
choose
or
implemented
in
system
verilog.
It's
a
great
question
because
we
also
host
chisel,
right
and-
and
there
are
other
kind
of
languages
and
methodologies
to
build
RTL
in
these
days.
I
guess
system
verlock
is
just
a
conservative
choice
of
you
know,
like
all
those
big
companies
are
huge
silver
luck,
shops,
including
Google,
including
you
know,
Nvidia
and
Microsoft
and
AMD.
C
So
I
guess
you
know
like
most
of
those
developers
being
system,
verlock
people,
the
choice
is
kind
of
obvious,
but
yeah
I
will
be
curious
to
see
how
we
could
you
know
enable
more
cores
in
the
future,
but
certainly
that's
not
a
goal
at
this
point,
because
you
know
we
want
to
have
a
stable
set
of
features.
C
H: [inaudible question]

C: So the question is whether anyone's looking at doing Verilog-AMS, which is, let's say, more open source friendly. We're not looking at it right now, but we're certainly happy to talk about it and see whether we should be.
C
I
I
mean
I've
heard
about
it,
but
I
think
that
the
problem
in
this
space
is,
of
course
everyone
has
a
bit
of
a
different
idea:
how
to
do
things
and
including
like
even
the
language,
you're
writing
in
and
the
testing
methodology,
and
so
on,
so
even
getting
to
like
agreement,
how
we
should
do
things
in
collector.
It's
not
easy,
but
but
yeah
I
mean,
if
you
have
Concepts
on
how
things
should
be,
could
be
done
better
in
a
more
open
source
friendly
way.
Then
we're
happy
to
talk
to
you
so.
C
Okay,
so
so
just
a
like
clarification
that
fairlake
AMS
is
a
standard
that
we
could
do
open
source
and
so
so
yeah
I,
don't
know
enough
to
kind
of
answer.
Your
question
fully,
but
certainly
happy
to
talk
to
you.
Offline
yeah,.
C: What's the main difference between OpenTitan and Caliptra? That's a question we get a lot — and thank you for it, of course. We're friendly towards OpenTitan, and we're actually using a lot of the great stuff that OpenTitan has built. The Caliptra project was born from the need to actually do an integrated root of trust.
C: And that wasn't originally a goal of OpenTitan at all. Recently, I think, they've been saying they can also do an integrated version, but originally it was built as a standalone root of trust chip, to replace the Titan —
C
The
the
you
know
the
root
of
trust
implementation
on
Google,
so
I
think
that's
a
major
difference
now,
of
course,
like
ultimately
I
think
both
projects
could
theoretically
cover
both
use
cases
because
there's
no,
no,
nothing
stopping
you
from
just
taking
calyptra
and
making
it
into
a
chip,
but
it's
just
a
different
focus,
and
also
it's
just
governance
as
well
like
Calypso
wanted
to
be.
You
know,
independent
and
set
up
as
a
Chip's
light
work
group
rather
than
a
standalone
project,
or
part
of
you
know
open
Titan.
C
So
but
ultimately,
of
course,
a
lot
of
people
kind
of
ask
this
question,
and
the
answer
is
obviously
there's
a
vast
amount
of
collaboration
that
we're
hoping
for
here
and
the
aim
was
never
to
you,
know
kind
of
be
competitive
in
the
direct
sense
just
like
in
the
open
source
space
in
other
areas.
You'll
always
get
you
know,
20
different
ways
to
solve
the
same
problem,
which
is
a
good
thing.
C
C: Yes, indeed, there is one main repo — I should have put the links here, but I can fix that later. So there's a main repo, with links to other repos, under the chipsalliance GitHub organization. And the second part of the question: how does its security compare?
C
I
would
say
that
I
mean
I'm,
not
an
expert
again,
but
we're
doing
a
kind
of
subset
of
the
capabilities
of
of
open
Titan
in
a
sense,
there's
two
kind
of
variants
in
which
the
chip
can
operate
more
independently
and
less
independently
of
the
main
sock.
C
They
have
a
pretty
kind
of
weird
name
right,
which
I
never
remember.
Just
give
me
a
second
culture.
There's.
C: I believe they're the integrated and independent variants. But in terms of the actual security model implemented by those two projects, I'm not the expert, for sure — you can talk to Prabhu, you can talk to Matt; I suppose they'll be able to give you more detailed information on how the two compare.
A: Actually, you had a few minutes to spare, but you did very well — thank you so much for an excellent talk, that was really good, I appreciate it. Our next speaker will be Kor Nielsen. He is going to be chatting with us about Caliptra: validating firmware against multiple hardware models in continuous integration. Kor is an embedded engineer working for Google. He's written firmware for Google's in-house Titan root of trust chip and has dabbled in Rust firmware development since 2015; he's now leveraging Rust to develop secure, maintainable firmware for Caliptra. So, Kor —
A
Are
you
able
to
share
your
screen
and
present.
J
Yeah
sorry,
I
I
would
like
to
be
there
in
person,
but
unfortunately,
I
ended
up
with
covet
at
the
last
minute,
so,
okay
whoops,
hopefully
the
presentation
is
working.
Is
that
right?
J
Okay,
all
right!
So
thanks
for
for
the
introduction,
I
added
a
GitHub
link
to
the
last
minute
to
the
clip
draft
firmware
repository,
which
is
where
all
this
stuff
lives,
including
our
actual
RTL
for
the
fpga
test
bench.
Even
though
it's
not
technically
software,
the
software
team
wants
to
own
it.
J
You
can
follow
links
there
to
the
main
repos
too.
So
what
I
want
to
talk
a
little
bit
about
is
testing
The
Collector
firmware.
This
is
going
to
be
a
bit
lower
level
talk
than
the
last
one,
but
so
maybe
just
a
first
little
bit
about
just
software
testing
in
general.
You
know
we
have
unit
tests,
and
these
are
you
know
great
everybody
uses
them
they're
faster
right.
They
fast
execute
it's
really
easy
to
to
do
coverage
guided
fuzzing.
J
So
it
can
like
create
input
for
the
test
to
try
to
find
interesting
branches
and
other
problems
and
just
explore
the
possible
State
space
of
the
code
and
unit
tests
are
a
great
way
to
do
this,
but
in
tests
also
are
not
really
great.
When
you
want
to
say
hey,
is
this
firmware
safe
to,
like
you
know,
put
into
the
you
know
into
put
into
the
ROM?
You
know
because
they
well
one
thing:
they
test
source
code,
not
the
final
compiled
machine
code.
J
So
if
there's
compiler
bugs
or
other
you
know,
issues
like
that,
you
know
the
young
tests
aren't
going
to
really
help.
You
find
those
because
usually
they're
being
compiled
for
a
different
architecture
or
using
you
know,
obviously
different
optimization
passes
and
whatnot
they're
unlocked
like
they've,
noticed
runtime
issues
like
stock
spec
overflows
or
Watchdog
timeouts,
it's
very
difficult
to
write
unit
tests
that
interact
with
Hardware.
You
can
but
oftentimes.
J
You
end
up
kind
of
doing
lackluster
implementation
of
the
hardware
as
part
of
your
test
case
that
doesn't
behave
the
same
as
the
actual
hardware
and
from
my
experiences
tends
to
be
very
error
prone
and
then
oftentimes
to
enable
some
of
these
unit
tests
you
have
to
create
abstractions
and,
depending
on
the
language
you're
using
those
abstractions
may
not
be
zero
cost
rest
can
help
a
bit
here,
but
it's
still,
you
know
this
is
a
an
issue
that
plugs
many
C
code
bases
I've.
J
Seen
And 100% coverage is unrealistic, because there's lots of glue code and the like that's part of the firmware but not part of the unit tests. They're also often brittle, because they test components of the firmware rather than the firmware as a whole — as those internal bits change, you often have to update the test cases. So for validation we prefer to use integration tests, which you can run against the final machine code.
J
They
can
be
executed
against
an
RTL
environment
so
where
the
the
firmware's
running
you
know
inside
the
final
RTL
that
it's
going
to
be
bundled
with,
and
we
can
discover
any
bad
assumptions
that
the
firmer
makes
about
how
the
hardware
behaves.
We
can
sometimes
even
discover
Harbor
bugs
and
100
coverage
as
possible.
Right
like
it's
directly
possible
to
write
your
integration
test
to
You
Know
cover
every
single
branch
that
the
machine
code
does
and
you
can
take
your
assertions
which,
in
unit
tests
are
typically
part
of
the
actual
running
binary.
J
So
you
know
if
you're,
if
you're,
if
your
binary
goes
crazy
and
has
some
kind
of
you
know
crazy
memory
issue
or
not
if
it
goes
into
the
weeds,
you
know
you
might
accidentally
make
the
test
pass
by
accident,
whereas
with
the.
J
Test
because
you've
kind
of
got
your
test
has
two
parts:
the
firmer
running
inside
the
test
environment,
and
then
it
has
the
test
Runner
itself,
which
can
do
assertions
based
on
what
it
observes
the
firmware
do,
and
you
know,
even
if
the
firmware
goes
in
completely
crazy.
You
know
it's
not
returning
the
expected
results.
J
The
test
will
still
fail,
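That two-part structure can be sketched in miniature. This is a hypothetical stand-in — the "firmware" here is just a subprocess printing to stdout, where a real setup would load the binary into an RTL simulator, emulator, or FPGA — but it shows why the runner's external assertions can't be fooled by a firmware that wanders off:

```python
# Miniature of the two-part integration test: firmware runs as a separate
# process, and the runner asserts only on what it observes from outside.
import subprocess
import sys

FAKE_FIRMWARE = r"""
print("BOOT")
print("SELFTEST OK")
"""

def run_firmware():
    # Stand-in for launching the real binary in a simulator or on hardware.
    out = subprocess.run([sys.executable, "-c", FAKE_FIRMWARE],
                         capture_output=True, text=True, timeout=10)
    return out.stdout.splitlines()

# The assertions live *outside* the firmware: silence, a crash, or garbage
# output all fail the test rather than passing by accident.
lines = run_firmware()
assert lines and lines[0] == "BOOT", f"unexpected boot banner: {lines}"
assert "SELFTEST OK" in lines, "firmware never reported a passing selftest"
```

If the firmware process hangs, the timeout fires; if it crashes or prints nonsense, the assertions fail — the runner never relies on the firmware to report its own health.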
But integration tests also have a lot of problems. They're usually pretty difficult to write — so much so that, from my experience, it's actually pretty rare for them to exhaustively cover every case the code can handle — and you oftentimes have to run these tests in different environments.
J
You write one test that runs in some UVM environment and another one that can run against an FPGA or whatever full-system simulator you're using, and trying to use everybody's frameworks as they're meant to be used tends to mean
J
writing the same test three times depending on which environment it's running in, and that's not really very fun or a good use of time. It's also pretty difficult to do coverage-guided fuzzing, because typically the way the open source fuzzing frameworks work is that
J
They
insert
symbols
into
the
into
the
final
instruction
stream
that
basically
keep
track
of
where
execution
is
going,
which
is
great
when
you're
running
on
a
you
know,
big
fancy
host
CPU,
but
your
embedded
firmware
CPU,
probably
doesn't
have
enough
RAM
to
hold
those
extra
instructions
and
doesn't
have
a
place
to
put
the
actual
results
which
oftentimes
are.
J
You
know
multiple
megabytes
of
data,
so
you
can
do
this
with
help
from
the
outside,
where,
if
you're,
if
you
want
to
change
your
RTL
or
your
CPU,
to
keep
track
of
where
execution
is
going
and
write
that
somewhere,
you
can
do
this
stuff
later,
but
it's
actually
pretty
hard
to
do,
and
but
some
of
the
stuff
we
can
fix
when
we
can
make
an
integration
tests
easy
to
write,
we
can
make
them.
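The coverage bookkeeping being described, recording which control-flow edges execution has taken, can be sketched like this. This is purely illustrative and not the Caliptra emulator's actual hook API; the idea is that an emulator can keep this table on the host, outside the constrained embedded core.

```rust
use std::collections::HashMap;

/// Record each (from, to) control-flow edge a program-counter trace takes.
/// On a host CPU a fuzzer keeps a table like this in its own memory; the
/// talk's point is that an embedded firmware core has no room for it, so an
/// emulator can maintain it outside the device instead.
pub fn record_edges(trace: &[u32]) -> HashMap<(u32, u32), u64> {
    let mut edges = HashMap::new();
    for pair in trace.windows(2) {
        *edges.entry((pair[0], pair[1])).or_insert(0) += 1;
    }
    edges
}

fn main() {
    // A pretend PC trace with a loop 0x100 -> 0x104 -> 0x100 -> 0x104 -> 0x108.
    let trace = [0x100, 0x104, 0x100, 0x104, 0x108];
    let edges = record_edges(&trace);
    assert_eq!(edges[&(0x100, 0x104)], 2); // loop edge taken twice
    assert_eq!(edges[&(0x104, 0x108)], 1); // loop exit taken once
    println!("saw {} distinct edges", edges.len());
}
```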
J
So they can be exhaustive, and we can make it so that we can write a single test once that executes in multiple different environments, which is what we're trying to do with Caliptra. Just before we go on, let's talk at a very general, very high level: this block diagram is very simplified, just to point out the parts that I care about for this talk.
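The "write the test once, run it in every environment" idea can be sketched as a trait with interchangeable backends. The trait, type names, and register offset below are hypothetical stand-ins, not the real caliptra_hw_model API; in practice the backend would be selected by an environment variable or cargo feature rather than a string.

```rust
/// A single test body written against a trait; each environment implements it.
trait HwModel {
    fn name(&self) -> &'static str;
    fn apb_read(&mut self, addr: u32) -> u32;
}

struct Emulated;       // software full-system simulator
struct VerilatedStub;  // a real one would drive the RTL via Verilator

impl HwModel for Emulated {
    fn name(&self) -> &'static str { "emulated" }
    fn apb_read(&mut self, _addr: u32) -> u32 { 1 } // pretend "ready" bit set
}
impl HwModel for VerilatedStub {
    fn name(&self) -> &'static str { "verilator" }
    fn apb_read(&mut self, _addr: u32) -> u32 { 1 }
}

/// The shared test body: same code regardless of which backend is active.
fn check_boot_status(model: &mut dyn HwModel) -> bool {
    model.apb_read(0x3002_0000) & 1 == 1 // hypothetical status register
}

fn new_model(kind: &str) -> Box<dyn HwModel> {
    match kind {
        "verilator" => Box::new(VerilatedStub),
        _ => Box::new(Emulated),
    }
}

fn main() {
    for kind in ["emulated", "verilator"] {
        let mut m = new_model(kind);
        assert!(check_boot_status(m.as_mut()), "boot failed on {}", m.name());
    }
    println!("same test passed on both backends");
}
```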
J
So
this
don't
use
this
for
anything
that
matters,
but
in
general,
the
way
that
we
think
about
the
the
hardware
from
the
test
point
of
view
is:
we've
got
basically
the
SSC
management
CPU.
So
this
is
the
thing
that's
talking
to
calypra
and
this
APB
bus
that
goes
the
clifter
exposes.
J
So
if
I
go
over
to
it's,
it's
Clipper
top
RTL,
which
is
a
domain,
a
very
long
module.
It
has
these
APB
signals
which
are
wired
up
usually
to
some
fabric
in
the
soc
that
some
management
controller
will
will
talk
to
so
that
APV
bus
here
is
generally
what
we
want
our
test
case
to
manipulate.
So
we
want
to
be
able
to
basically
generate
any
possible
memory
transaction
on
that
bus
from
the
test
case
and
using
you
know
that
bus
we
can
do
stuff
like
fill
in
the
fuses.
J
So
we
can
we
can.
We
can
write
commands
to
the
mailbox,
which
is
the
main
way
to
communicate
with
the
firmware,
and
we
can
do
a
bunch
of
stuff
like
set
various
configuration
signals
and
whatnot,
and
then
the
viewer
CPU
encycloped
right,
which
is,
is
where
is
running
the
firmware
that
we
actually
want
to
test.
So
this
has
access
to
a
bunch
of
peripherals
cryptographic.
J
Peripherals
has
access
to
the
ROM,
has
access
to
SRAM
all
that
stuff
is
kept
secret
from
the
outside
world
by
the
hardware,
and
we
want
to
test
the
firmware.
That's
running
inside
that
environment,
so
I'll
just
go
again
through
the
RTL.
Here
we
have
I
mentioned
the
APB
bus
signals.
We've
also
got
some
reset
signals
that
you
know
assess
might
want
to
manipulate.
J
We
have
the
obfuscation
key,
which
is
a
silicon
secret,
that
that
clicker
uses
to
obfuscate
a
few
Secrets,
and
then
it
has
a
few
status
bits
that
we
test
want
to
look
at
and
also
a
place
for
for
trng
data
to
enter
the
device
on
the
outside
and
then
as
well
as
the
the
various
Security
State
related
stuff,
which
is
for
whether
debuggings
allow
that
sort
of
thing
and
the
behavior
of
the
firmware
changes
depending
on
some
of
these
signals.
J
So
the
test
cases
need
to
have
a
way
to
to
manipulate
them.
So
I'm
just
again
going
back
to
the
APV
bus
we
have
this.
All
of
our
registers
are
are
using.
J
What's
it
called
system,
RDL
I
think
which
generates
a
bunch
of
great
registered
documentation,
and
also
we
generate
all
of
our
our
code
to
access
these
registers.
Based
on
that
too,
so
the
test
cases
can
easily
access
any
of
these
registers.
The
APB
registers
that
are
exposed
to
SOC
whenever,
whenever
it
wants
to
and
can
do
so
in
a
pretty
a
pretty
nice
way,
which
I'll
show
later
oops,
so
I'll
just
kind
of
scroll
down
I
got.
J
You
know
the
ability
to
you
know
lock
the
mailbox
check
the
mailbox
link,
so
you
know
execute
it,
set
all
the
fuses
which
are
done
only
at
startup,
ideally
being
done
by
a
hardware,
State
machine
and
not
the
soc
CPU.
But
you
know
the
anyways
we'll
talk
a
little
bit
about
the
various
different
horror
models
that
we're
using.
So
the
first
one
is
the
one
that's
basically
not
using
the
real
RTL,
because
at
the
time
we
started
working
on
the
firmware,
the
RPL
didn't
really
exist
at
all.
J
So
we
built
basically
just
a
rust
library
that
has
consists
of
a
risk
risk
five
emulator,
basically
trying
to
emulate
the
viewer
core
or
interpret
like
an
interpreter
for
the
viewer
car
and
then
a
bunch
of
the
various
peripherals,
as
they
were
documented
by
the
harbor
folks
that
we
didn't
quite
get
right.
J
But
you
know
over
time,
as
we've
tested
you
know,
in
this
environment
and
other
barns,
we
were
able
to
converge
the
the
the
full
system
simulator
to
behave
the
same
as
the
RTL,
at
least
in
the
scenarios
we
care
about
I'm,
going
to
test,
compile
to
self-contained
executables.
That
can
run
on
Linux
and
Mac.
Os
and
I
can
probably
run
other
places
too,
but
I
don't
think
anybody's
tried
and
these
tests
are
running
really
fast,
like
we
can
run
our
entire
test
Suite.
J
This
was
I,
don't
know,
50
plus
tests
that
they're
booting
multiple
clipped
or
you
know
many
many
times,
and
we
can
run
that
test
Suite
in
just
a
few
minutes,
but
it's
not
super
high
fidelity.
Obviously
the
timing
is
not
is
not
accurate.
The
real
Hardware
is,
you
know,
takes
a
lot
longer
to
go
to
operations
and
whatnot,
but
these
are
great,
as
kind
of
did.
We
actually
break
anything
really
significant,
and
so
we
run
these
test
cases
for
app
for
every
commit.
J
So
before
you
can,
even
you
know,
merge
your
PR
into
the
repository.
These
tests
have
to
pass
under
this
full
system,
simulator
and
yeah.
They
can.
We
can
run
our
production
firmware
in
this
environment,
so
we
don't
have
to
compile
anything
special
for
it.
J
Oh, and we can log the CPU bus activity, which is pretty great for debugging certain things, and we can even simulate some impossible scenarios like bad hardware. One area that I'm really interested in is glitch resistance. This is where you have an attacker who's manipulating the clock or the reset signals, or maybe the power supply to the CPU or to the whole SoC, and trying to coerce the CPU into skipping some instructions, maybe to get past signature verification or something like that.
J
So
we
can
add
hooks
to
the
to
The
Interpreter
to
basically
simulate
you
know,
skipping
branches
or
corrupting
register
contents
and
stuff
like
that.
So
you
could
you
can
imagine-
and
we
haven't
done
this
yet-
but
I'd
love
to
be
able
to
run.
You
know
millions
of
iterations,
where
it's
randomly
trying
to
you
know
screw
around
with
the
CPU
registers
or
the
CPU
execution,
and
then
CFA
does
this
clifter
ever
let
you
know
non-validated
machine
code
get
into
the
instruction
memory
or
whatever
right.
J
So
that's
the
kind
of
our
you
know
our
our
our
our
main.
You
know
test
environment,
but
not
super
accurate,
and
then
you
know
three
or
four
months
ago
we
started
working
on
a
second
model
which
is
a
varilator
model,
so
this
model
can
is
built
against
the
real
RTL
and
it's
High
Fidelity
single,
accurate.
So
and
you
know
it's
actually
compiles
pretty
fast,
you
can
make
a
change
to
the
RTL
and
then
see
your
your.
The
change
live.
Writing
the
real
firmware.
J
You
know
usually
within
a
few
minutes-
and
you
know
the
in
this
case
the
APB
bus,
reads
and
writes
these
are
basically
the
test
case
itself
or
the
infrastructure
behind
all
this
is
manipulating
the
actual
signals
being
passed
to
Clipper
top
and
because
of
that,
it's
very
cycle
accurate.
You
can
your
test
case.
Can
you
know
be
really
fussy?
What
happens
if
this
signal
happens
at
exactly
this
time?
What
does
the
firmware?
J
Do
you
know
that
kind
of
stuff,
which
is
kind
of
really
nice,
trying
to
dig
for
little
weird
timing,
edge
cases
and
whatnot,
but
the
downside
is
that
execution
is
very,
very
slow
just
to
do
a
single
boot
of
their
later
test
test
takes
you
know,
hours
like
an
hour
and
a
half
I
think
to
just
to
get
Clipper
to
boot,
all
the
way
to
runtime
firmware
right
now.
J
So
you
know
it's
not
really
feasible
to
run.
You
know
100,
plus
tests
on
every
commit.
If
it's
going
to
take.
You
know
seven
days
of
a
relatively
powerful
machine
to
run
those
tests.
So
we
can't
we
don't
actually
want
our
tests,
but
we
do
run
nightly
some.
What
we
call
because
consider
important
tests
that
test
the
the
success
cases
and
stuff
like
that,
and
you
know
it's
also
a
great
environment
for
debugging.
J
You can have the test environment basically parse all the bus activity and log it into a text file, which makes it easy to grep and whatnot, and this is the best place to debug issues where you're not sure whether the hardware or the firmware is wrong. Unfortunately, Verilator is not perfect;
J
It
doesn't,
doesn't
handle
every
single
thing
that
Clippers
RTL
can
do
so
one
case
I
know,
that's
a
problem
is
that
we
we
can't
actually
run
like
put
the
CPU
to
sleep
and
expect
everything
to
work
properly
right
now,
which
I
understand
I,
don't
really
looked
at
this
much
myself.
J
What
I
understand
has
something
to
do
with
the
Clipper,
not
supporting
some
kind
of
clock
or
sorry,
barely
they're,
not
supporting
some
sort
of
clock
eating
or
something,
and
then
this
new
model
which
just
started
working
this
week
is
a
real-time
fpga
model.
So
here
we're
trying
to
Target
the
zcu
104
Dev
board,
which
is
you
know
about
roughly
two
thousand
dollars
and
this
thing
barely
fits
collector.
J
We
actually
had
to
trim
down
some
of
the
the
number
of
key
Vault
registers
and
stuff
like
that
to
make
it
fit,
but
it
was
the
kind
of
most
reasonable
Choice,
given
the
cost
and
size
constraints
that
we
needed,
and
so
this
has
a
this
is
the
zinc
the
zinc
fpgs
have
this
application
processor,
which
can
basically
run
Linux
and
run
rust
binary.
So
we
can
just
compile
our
test
cases
to
run
this
a53
processor
and
then
those
test
binaries
can
just
talk
to
the
APD
bus
using
mmio
directly.
J
So
it's
super
fast
and
and
and
super
great,
you
know
this
environment
here
is
probably
I
mean.
Hopefully
we
can.
We
can
start
running
test
cases
on
pre-submits
with
this,
so
we'll
need
a
little
form
of
fpgas
and
somebody's
basement
somewhere
to
to
execute
these
execute
these
tests,
but
yeah
I'm
I'm
hopeful
that
this
will
become
part
of
our
standard
CI
to
execute
everything
in
this
environment.
J
One
of
the
downsides
to
it
right
now
is
that
we
are
using
the
clock
signal,
that's
being
generated
by
the
oscillator
on
the
on
the
board,
so
the
test
case
itself
doesn't
have
any
control
over.
You
know
single
stepping
or
trying
to
control
the
test.
So
there's
some
test
cases
that
involve
timing,
that
we
can't
really
run
in
this
environment.
J
But
you
can
imagine
that,
with
some
tweaks
to
the
art,
the
test
harness,
we
should
be
able
to
to
be
able
to
give
the
the
test
control
over
the
clock
and
and
do
those
sort
of
timing.
Things
in
this
environment
as
well,
and
the
main
downside
to
the
fpga
is
that
if
you
want
to
change
the
RTL,
it
takes
hours
to
to
recompile
the
Midstream.
And
you
know
it's
not
a
great
prototyping
environment
for
our
talents
for
sure,
but
for
as
an
environment
that
tests
the
firmware.
J
It's
it's
fantastic
and
if
you
want
to
do
now,
maybe
is
just
go
through
a
couple
demos
of
just
the
test
learning
in
different
environments.
If
we
have
time
I'm,
not
quite
sure
what
time
I
started,
it
I'm
sorry
I,
think
I'm,
maybe
20
minutes
in.
J
So,
let's
see
here
so
I'm
going
to
share
a
different
tab
here,
so
this
one
can
here's
kind
of
a
basic
test
case.
That's
that's!
That's
that's
doing
a
basic
scenario
here,
so
here
we're
just
creating
a
the
hormone
itself,
and
so
this,
depending
on
on
basically
the
environment
variables
or
the
configuration
Flags
to
the
test.
They
will
instantiate
various
models.
J
So here I'm basically saying: hey, I want to create a new model, I'm giving it some initialization, and I want it to use this ROM, please. I'm just building our regular ROM, but I could pass in a ROM that tests some specific hardware feature or something like that; it doesn't have to be the actual production firmware. In this case it's also saying I want the security state to be our production security state.
J
So
this
is
where
jtegs
disabled-
and
this
gives
us
access
to
all
the
all
the
the
the
secrets.
If
jtex
enabled
you
can't
use
any
of
the
secrets
and
then
it's
it's
telling
the
model
saying
Hey
I
want
to
run
this
kind
of
special
flow,
which
is
where
it's
setting
a
special
register.
J
Saying
I
want
to
run
this
this
flawless,
throwing
down
to
manufacturing
and
generates
the
the
certificate
signing
request
for
clip
drill
which
is
done
early
in
manufacturing,
and
then
we
can
receive
the
actual
request
from
the
mailbox
and
then
parse
it
in
SSL.
So
you
know
again
because
this
test
case
is
not
running
and
that's
part
of
the
firmer,
but
it's
running
outside
the
firmware
can
use
libraries
like
openssl
and
other
things
to
just
parse
out.
J
The test ran really fast, pretty much as fast as a unit test, even though it actually booted Caliptra inside the emulator. You can see our debug version of the firmware printing out everything it's doing, and then here's the actual CSR that came out, and it matched the golden data. So it's pretty great as a development environment, because you can just write these
J
You
know
tests
as
if
they're
as
if
they're
as,
if
all
right,
right
as
if
they're
unit
tests,
and
so
that's
the
test
running
in
in
that
environment
I'll
show
another
example
here.
J
So
this
is
again
running
the
same
test,
but
I've
added.
This
features
equals
there
later,
which
will
run
it
in
the
very
later
environment.
As
you
can
see,
it's
running
really
slow.
I
already
started
it.
You
know
it's
was
four
thousand
Cycles
in
it
wrote
the
actual
set
the
value
of
the
the
boot
FSM,
but
it
hasn't
even
printed
the
first
line,
yet
so
I
think
it
takes
around
a
minute
to
print
the
first
line,
but
as
you
can
see,
this
isn't
very
fun.
J
I'm
just
going
to
control
C
it
before
it
actually
prints
anything.
But
one
of
the
things
I
did
mention
here
is
hey
I
want
to
write
a
trace
to
this
file
here.
J
So
if
I
control
see
this
and
I'm
going
to
take
a
look
at
this
Trace
file,
I
hold
the
right,
and
this
contains
all
the
activity
that
saw
so
here
we
can
see
the
soc
from
the
test
case,
basically
writing
to
various,
probably
a
fuse
registers
and
various
things
and
and
preparing,
and
then
you
can
see
the
microcontroller,
which
is
what
we
call
the
viewer
core
inside
clifter
and
starts
doing
a
bus
activity,
and
here
you
can
see
it's
it's
clearing
the
main
memory.
It's
clearing.
J
You
know
our
main
SRAM,
which
is
what
it's
been
doing
and
profitable.
Is
it
doing
like
it
when
I
stopped
it
right?
So
you
know
you
can
get
one
of
these
Trace
files
for
the
for
the
whole
Boot
and
try
to
crack
down
what's
going
wrong
or
if
you
want
more
detail
on
that,
you
can
have
it
right
there,
making
all
the
videos
just
change
this
to
the
BCD,
and
then
that
gives
me
a
nice
vcd
file.
That
gives
all
the
signal
and
clicker
which
you
don't
really
want
to
use
with
the
full
firmware.
J
But
if
you
have
an
isolated
test
case,
you
can
usually
get
something
to
send
to
the
hardware
Engineers
with
above
their
their
TL,
or
you
can
look
at
it
yourself
and
then
I'll
take
a
look
at
the
same
test
case
this
time
running
in
the
fpga
environment
and
I
kind
of
forgot
to
test
this
today.
So
hopefully
this
works.
Let's
see.
J
Oh
yeah
there
it
goes
all
right,
so
it's
a
little
bit
slower
than
the
software
emulator
I.
Think
that's
just
because
we
haven't
really
got
the
clock
set.
You
know
to
the
ideal
speed
yet,
but
you
can
see
it
it
through
the
whole
thing.
This
is
still
very
early.
The
clock
cycle
counter
is
not
quite
accurate,
yet
I
think
that's
just
like
real
time
or
something.
J
Right
just
show
me
all
the
all
of
the
the
the
certificates
that
were
generated
by
the
by
the
firmware,
but
yeah.
J
That's
basically
kind
of
a
quick
demo
of
this
and
I
don't
have
a
whole
lot
more
just
to
talk
about
other
than
we
also
provide
C
bindings
to
these
Hardware
models,
which
is
which
is
great
for
vendors,
who
oftentimes
want
to
test
their
integration
logic
against
a
accurate
model
of
cliptra,
and
so
they
can
use
the
emulated
model
if
they
want
something
that
runs
fast
or
or
the
bear
leader
fpga
models
if
they
want
something
more
accurate
and
yeah
I
think
that's
pretty
much
it
any
questions.
C
I actually have two, if I may. If we can go back to the Verilator slide there: you mentioned a bunch of things. First of all, you said that execution is very, very slow. I mean, it's partly expected, but on the other hand, I'm interested in whether you've been investigating the changes we've been doing, both in terms of execution runtime, but also, you mentioned compilation was kind of long, because we have been optimizing this.
J
Population
is
surprisingly
good.
Like
I
said
I
can
I
can
compile
almost
all
of
clifter
on
my
workstation
in,
like
maybe
two
or
three
minutes,
I'm
actually
really
impressed
and
most
of
that
time's
not
very
later
itself.
It's
actually
the
C
compiler
compiling
the
C
code.
C
Okay,
I
see
yeah
because
it
takes
several
minutes.
For
me
it
was
long
but
yeah
yeah.
J
Compared
to
the
fpga
stuff
or
vendor
to
other
criterates,
I've
used
I'm
actually
very
impressed
with
the
performance
very
later.
It
seems
to
be
faster
on
the
same
hardware,
and
the
only
real
issue
is
that
it's
maybe
not
quite
as
accurate
I'm
hopeful
that
will
change
someday.
C
So
but
it
would
be
interesting
to
have
a
conversation
about.
You
know
why.
Why
is
the
execution
so
slow
and
if
that
could
be
improved
because,
like
we
are
looking
at
these
things
in
other
contexts,
and
even
if
it's
you
know
not
anything,
we
can
do
anything
about.
It's
still
useful
to
to
understand
where
the
bottlenecks
are.
C
We'll definitely be in touch. And the other thing is that you mentioned Verilator does not support everything in the RTL; certainly that's another thing that should be fixed.
J
Yeah
and
I
I
have
to
talk
to
our
RTL
folks
to
find
out
more
details
on
what
those
problems
are,
but
in
in
the
case
right
now.
Basically,
everything
except
for
low
power,
stuff
works.
Great
I
haven't
found
a
scenario
that
doesn't
in
Clipper
that
doesn't
work
outside
those
of
your
low
power
things.
B
All great stuff, by the way; I really like the testing methodology. But have you thought about co-simulation? For context, we have a simulator called Renode that can do co-simulation with Verilator: you'd run the peripherals in Verilator and you'd run the core in a fast instruction-set simulator, which could potentially get you pretty good performance and pretty good fidelity.
J
Yeah,
this
is
something
that's
definitely
possible
and
I
would
love
to
do
it.
It's
just
a
matter
of
basically
getting
exposing
the
the
internal
HP
bus
to
all
the
peripherals
to
a
test
harness.
So
it's
it's
really
just
you
know
running
a
bunch
of
RTL
boilerplate
to
kind
of
link
that
stuff
back
and
writing.
The
actual
horror
model
should
be
pretty
easy
to
take
our
risk
five
interpreter
and
hook
it
up
to
hook
it
up
to
the
HP
bust
of
a
violator
instance.
J
In
favor
of
this,
that
said,
I'm
not
and
I
really
want
to
do
this
before,
but
now
that
we
have
the
fpga
working
I'm,
not
quite
as
excited
because
the
fpga
you
know
always
almost
as
fast
and
handles
most
of
the
stuff
we
care
about.
But
you
know
if
I
had
infinite
time.
I
would
absolutely
do
that
and
I
think
it's
a
great
a
great
solution.
F
Hello,
I
actually
also
have
two
questions.
First,
one
is
much
faster,
though,
out
of
curiosity
for
the
fpga,
the
bitstream
creation,
and
also
the
the
simulation
running
are
using
like
a
third-party
tool
chain
like
fire,
Sim
or
something.
J
No,
this
is
just
using
bravado
with
a
TCL
script,
so
we
just
have
a
TCL
script
that
just
generates
all
this
basically
interacts
with
the
Movado
garbage
or
proprietary
stuff
and
then
generates
the
Midstream
at
the
end.
And
then
you
know,
as
far
as
RCI
cares,
it's
really
just
a
black
box
that
takes
in
RTL
and
returns
to
bitstream.
F
Cool
thanks,
second
question:
it
kind
of
comes
from
a
lack
of
understanding,
my
own
lack
of
understanding,
of
how
non-psycho
accurate
simulations
run.
Could
you
go
to
the
the
hardware
model
number
one
slide,
so
it
mentions
low
Fidelity,
and
so
it's
not
clock
cycle
accurate.
So
do
you?
Do
you
ever
encounter
a
situation
where
the
emulator
actually
gives
a
result?
That
is
actually
not
accurate
to
how
the
real
system
would
run
if
it
was
running
cycle,
accurate.
J
Well,
a
great
example
where
this
would
happen
would
be
say
something
like
a
testing
a
watchdog
like
are
we
single
watch
that
timer
correctly
right,
so
the
Watchdog
timer
is
going
to
be
said
in
Cycles,
right
and
so
the
number
of
Cycles
it
takes
to
get
up
to
a
particular
point
in
the
time
you
know
the
on
the
real
Hardware
it's
going
to
be
different
than
what
it
is
similar
I
mean
it's
not
impossible.
We
could.
You
know,
obviously
people
that
are
writing.
You
know
Nintendo
emulators.
J
They
get
the
stuff
right,
but
it's
a
lot
of
effort
and
it
isn't
that
important
for
us.
You
know
you
know
we
can
have
certain
tests
that
are
testing
some
of
these
more
timing,
related
scenarios
which
there's
not
many
like
for
the
most
part.
Clipart
doesn't
care
about
time
at
all.
But
it's
you
know
we
can
we're
we're
happy
that
we
can
test
those
things
in
in
in
in
one
of
the
more
accurate
environments,
and
we
don't
necessarily
have
to
you
know
it
comes
something
like
a
watchdog.
J
J
we just test it in a more accurate environment. Again, most of the time what tends to happen is: working in the Verilator model, I'll notice that a test case doesn't pass under Verilator and find out why, in terms of "oh, actually, there's a bug in a peripheral in our simulator."
J
The
peripheral
doesn't
actually
behave
the
same
way
as
the
as
the
real
hardware,
and
sometimes
that's
because
there's
a
bug
in
the
hardware,
but
more
often
it's
because
we
misunderstood
the
documentation
and
implemented
it
incorrectly
in
the
in
the
emulator
so
over
time,
because
we're
running
the
same
tests
in
all
these
environments
right
whenever
something
goes
wrong
in
one
environment,
not
the
others,
they're
like
okay,
there's
something
wrong
in
one
of
the
environments
and
that's
when
we
we
try
to
figure
out
why
that
was.
A
Thank
you
so
much
core
for
your
great
talk
and
I
do
hope.
You
feel
better
yeah.
Thank
you
all
right.
So
our
next
talk
is
Jocko
Hoffman,
who
was
from
Western
Digital.
He
talked
is
entitled.
Omni,
extend
coherence,
scale
out
confusion
over
commodity
Fabrics.
He
is
a
member
of
the
research
organization
at
Western
Digital.
His
research
interests
include
application,
specific
accelerators
coherence
and
open
source
Hardware
design
tools.
He
holds
a
PhD
in
computer
science
from
TU
darmstadt.
L
Just to give you a quick overview of what I present today: I will also go into detail about what OmniXtend actually is, but first a high-level overview.
L
We have released a memory endpoint for OmniXtend, which you can use to expose any kind of memory, AXI-attached memories like HBM or DDR, over Ethernet, and, that's the important thing, coherently. To use that, we also released an OpenPiton-to-OmniXtend bridge, so you can take your existing CVA6 design, take the coherence traffic it carries over OpenPiton, and translate that transparently to OmniXtend.
L
Both
of
them
are
released
under
Apache
2.0,
completely
available
on
GitHub
and
also
has
been
used
in
on
fpga.
So
far,
and
also
in
simulation,
which
I
will
talk
about
later
so
what's
on
the
extent
for
those
of
you
who
don't
know
it,
it's
an
approach
to
get
a
single
coherence
domain
over
multiple
ethernet
attached
hosts.
L
So
you
can
just
use
your
off
yourself
Hardware,
you
don't
need
any,
for
instance,
Infinity
band
links
or
anything
it's
just
using
ethernet
to
be
as
a
widely
usable
as
possible,
and
the
whole
thing
is
based
on
Open
Standards.
So
for
one
we're
using
Thai
link
as
like
the
base,
bus
on
which
can
be
found
on
older
sci-fi
risk.
Five
trips,
for
instance,
and
then
Omni
extend,
is
a
layer
around
that
which
deals
with
the
whole
nastiness
of
ethernet
being
not
in
order
being
like
dropping
packets.
L
So
how
could
the
system
look
like
the
red
Parts
is
what
I
brought
you
to
you
today,
but
the
gray
parts
are
other
things.
Others
are
working
on
or
others
presented.
So,
as
I
said,
you
have
the
CVA
6,
which
is
connected
to
ethernet,
and
then
it
can
access
some
memory
from
from.
L
The whole thing is available, as I said, on GitHub, so if you want, you can just use these QR codes. The memory endpoint itself is designed to run on FPGAs; the open source release targets 10-gig Ethernet, just because of its availability, and it's really easy: you can buy a hundred-dollar switch on Amazon if you want and get this running with relatively cheap FPGAs.
L
So
it's
very
easy
to
use,
even
for
let's
say,
resource
constraints,
organizations
it
uses
axi
for
memory,
interfaces
which
targets
a
bunch
of
memory
controllers
which
are
available.
So
if
you're
running
sidings
fpga,
you
usually
have
like
hbm
or
ddr4
or
all
other
kinds
of
memories
and
the
endpoint
that
supports
the
whole
stack
of
tiling
operations.
L
The other part, for CVA6, is 100-gig compatible and gives the CVA6 transparent access to the endpoint memory or any other OmniXtend-compatible memory. In addition, I also have a software library, which gives you a requester interface, so you can talk from software to the endpoint, play around with the whole software stack, and also easily debug the whole stack to see where it locks up and so on.
L
It supports burst transfers, so you're not constrained to tiny loads and stores. You can do, let's say, cache-line sizes of 512 bytes, or you can make them much bigger, up to a maximum that is really set by the Ethernet: with jumbo frames, you can do about an 8K read or write.
L
Omniac
sign
gives
you
the
flow
control
mechanisms,
as
I've
said
earlier,
ethernet
is
not
really
suited
well
for
for
these
kinds
of
high
reliable
protocols,
because
what
happens
on
a
CPU
if
a
message
gets
lost,
it
will
just
look
up,
usually
there's
no
recovery,
because
it's
assumed
that's
not
really
happening
or
ethernet.
We
have
to
assume
that
so
omniax10
gives
you
a
way
to
deal
with
the
out
of
order
transfers,
drop
packet
detection
and
how
to
handle
that.
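The two reliability mechanisms just mentioned, credit-based flow control and keeping sent frames around until they're acknowledged, can be sketched like this. Sequence numbers and credit counts here are illustrative; they are not the OmniXtend wire format.

```rust
use std::collections::VecDeque;

/// Toy sender: refuses to send without a credit, and holds a copy of every
/// unacknowledged frame so it can be resent if the network drops it.
pub struct Sender {
    credits: u32,
    next_seq: u64,
    unacked: VecDeque<(u64, Vec<u8>)>,
}

impl Sender {
    pub fn new(credits: u32) -> Self {
        Sender { credits, next_seq: 0, unacked: VecDeque::new() }
    }

    /// Send only if a credit is available; keep a copy for resend.
    pub fn send(&mut self, frame: Vec<u8>) -> Option<u64> {
        if self.credits == 0 {
            return None; // receiver's buffers are full: back off
        }
        self.credits -= 1;
        let seq = self.next_seq;
        self.next_seq += 1;
        self.unacked.push_back((seq, frame));
        Some(seq)
    }

    /// An ack both frees buffered frames and returns credits to the sender.
    pub fn ack(&mut self, up_to: u64) {
        while matches!(self.unacked.front(), Some((s, _)) if *s <= up_to) {
            self.unacked.pop_front();
            self.credits += 1;
        }
    }
}

fn main() {
    let mut tx = Sender::new(2);
    assert_eq!(tx.send(vec![1]), Some(0));
    assert_eq!(tx.send(vec![2]), Some(1));
    assert_eq!(tx.send(vec![3]), None);    // out of credits
    tx.ack(0);                             // frame 0 acknowledged
    assert_eq!(tx.unacked.len(), 1);       // frame 1 still held for resend
    assert_eq!(tx.send(vec![3]), Some(2)); // credit returned, can send again
}
```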
L
The
memory
endpoint
is
compatible
to
103
and
what
I
call
1.1,
which
adds
some
convenience
features
onto
omniax10s
like
dropping
and
creating
connections.
So
the
system
is
less
static.
It's
completely
implemented
in
RTL,
in
this
case
blue
Speck,
which
is
sadly
a
bit
not
well
known,
but
completely
available.
Now
on
the
under
bsd3
on
GitHub
and
it's
a
highly
productive
RTL.
So
it's
not
doing
high
level
synthesis
or
anything,
but
it's
very,
very
productive
way
of
describing
Hardware
down
to
the
lowest
registers
or
whatever
the
endpoint
is
device
independent.
L
So
I've
tested
it
on
a
variety
of
styling
AMD
fpgas,
but
you
can
also
run
it
on
on.
Let's
say,
Intel
fpgas
or
you
can
run
it
on
skywater
130
or
something
similar
when
you
bring
your
own
srams.
Basically,
as
I
said
before,
we
have
a
technical
power
there
and
attaches
to
XA.
L
In there, we have an OmniXtend handler, which deals with all the nasty Ethernet stuff and hands channel-separated, in-order TileLink messages to a TileLink handler. That buffers them into input FIFOs, does the flow control, and then splits them onto the different channel handlers, which deal with stuff like coherent or non-coherent requests, atomics and so on. Then again, for flow control and output credit handling, it's distributed to different output FIFOs, which are then packaged in the sender and, for one, kept by the resend logic for later use if you actually drop a package; if not, it just goes out over Ethernet and you hopefully receive it on the other end.
L
BSVTools is just a little package which provides some makefiles to make the building easier, and you need Rust, which luckily is also open source; Rust is used for the interface to the simulation.
L
If
you
want
to
just
use
excitingx
design,
you
can
use
topasco,
which
packages
the
whole
thing
and
gives
you
a
bit
stream
and
makes
it
available
over
PC
Express,
and
so
that's
all
the
ethernet
stuff
up.
The
GitHub
also
contains
pre-compiled
ipx
SEC,
very
lock
releases,
so
you
can
just
plug
and
play
if
you
want
to.
L
In
addition
to
the
to
the
hardware,
I
also
provide
software
implementation,
which
is
entirely
written
in
Rust
and
gives
you
not
that
high
performance,
but
rather
correct
implementation
of
the
requester
library.
L
So
you
can
do
read
and
write
requests
in,
let's
say
powermic,
let's
say
Echo
area,
hearing
or
non-coherent
forms
and
it's
very
useful
for
debugging
testing
and
you
can
do
the
whole
full
system
simulation
through
that
those
tools
in
Hardware
or,
if
you
want
to
you,
can
run
it
on
your
on
your
PC
or
a
laptop
or
whatever,
to
to
do
a
whole
system
simulation.
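Using a requester-style library might look like the sketch below. The real Rust library's API will differ; this in-memory stand-in (hypothetical `Requester` type, `connect`/`read`/`write` names) only shows the request/response shape of talking to a remote memory from software.

```rust
/// Hypothetical requester: issue reads and writes against a "remote" memory.
/// Here the remote memory is a local buffer so the sketch stays runnable.
pub struct Requester {
    mem: Vec<u8>,
}

impl Requester {
    pub fn connect() -> Self {
        Requester { mem: vec![0; 4096] } // pretend endpoint with 4 KiB
    }
    pub fn write(&mut self, addr: usize, data: &[u8]) {
        self.mem[addr..addr + data.len()].copy_from_slice(data);
    }
    pub fn read(&self, addr: usize, len: usize) -> Vec<u8> {
        self.mem[addr..addr + len].to_vec()
    }
}

fn main() {
    let mut rq = Requester::connect();
    rq.write(0x100, b"hello");
    assert_eq!(rq.read(0x100, 5), b"hello".to_vec());
    println!("remote read/write round-trip ok");
}
```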
L
For instance, there's a terminal-user-interface tool, which lets you really just peek and poke, to see what's in the cache and when it gets invalidated, which is interesting to play around with; bitload, if you want to, say, load a bitstream into the Ethernet-attached memory; and config, for reading status registers to see what's going on in the hardware. The simulation looks pretty close to what I showed earlier with the full-system view.
L
Let's see if this video works. Sadly, those here in the room might have a hard time seeing it because the screens are a bit small, but I will try to stop at the right points. I'm just starting it, so nothing interesting has happened yet; that's the script I provide, which starts the simulation on the top left. Then you get three of those TUI windows, which are basically three clients for this memory, and you can do a variety of reads.
L
So
the
simulation
starts
gives
you
a
bunch
of
messages
and
then
you
connect
now,
for
instance,
on
the
top
right
to
the
end
point
and
the
end
point
hopefully
responds
so
we
get
a
little
green,
active
connection
active
and
then
it's
starting
to
run.
I
connect
some
more
and
I
do
a
cache
tweet.
L
So
the
cache
read
will
be
shown
on
the
bottom
right
of
the
let's
see
if
I
can
get
the
pointer
again
on
the
bottom
right
of
the
window.
L
So
we
now
have
loaded
the
cache
line
and
on
Branch
mode,
because
we
have
only
Reddit,
we
haven't
written
anything,
so
we
don't
have
exclusive
access,
that's
what
entailing
apprentices
and
then
we
go
on
do
a
cache
right
now
we
are
in
trunk
mode,
so
we
have
modified
it
and
when
we
now
go
on
and
read
from
another
that's
happening
on
the
bottom
left
and
then
we
written
again
and
are
now
on
track
mode
down
here
and
the
top
right
went
into
none
mode.
L
So
let's
go
on
exactly
what
I'm
showing
here.
So
when
you
go
on
to
Hardware,
you
can
go.
Yeah
use
the
same
software
and
analyze
your
your
system,
Behavior
just
a
quick
overview.
How
fast
is
it
with
a
lot
of
caveats?
The
protocol
is.
The
hardware
protocol
is
not
where
it's
used
for
software,
it's
also
not
using
stuff
like
dbtk,
which
would
also
improve
performance.
L
But that said, the software can saturate a 10-gigabit link at 4-kilobyte chunks, and it can do a bunch of requests: up to 2 million requests per second with batching. The average time per request, which is a bit tricky (it's total requests divided by runtime), is about 484 nanoseconds; but again, with batching, the average latency is much higher, 69 microseconds in this case, and by tweaking a bit I got it down to 3.8 microseconds. That's in software; in hardware it's a lot faster than that.
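As a back-of-envelope check of those numbers (my arithmetic, not the speaker's): 2 million requests per second works out to 1 s / 2,000,000 = 500 ns per request, close to the quoted 484 ns, and a 4 KiB chunk takes about 3.3 microseconds on a 10 Gb/s link:

```scala
// Back-of-envelope check of the quoted figures; illustrative arithmetic only.
object LinkMath {
  val requestsPerSec = 2.0e6                  // batched request rate
  val nsPerRequest = 1e9 / requestsPerSec     // total runtime / total requests

  val linkBitsPerSec = 10.0e9                 // 10 Gb/s Ethernet link
  val chunkBits = 4096 * 8.0                  // 4 KiB chunk
  val usPerChunk = chunkBits / linkBitsPerSec * 1e6
}
```

The small gap between 500 ns and the quoted 484 ns presumably just reflects the measured run not landing on exactly 2 million requests per second.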
L
The OmniXtend part has been developed by LeWiz Communications and provides transparent access to this memory, for instance the memory endpoint. It's based on OpenPiton and CVA6, and it transparently translates the OpenPiton requests to TileLink. So if you already have OpenPiton in some project, you can try to add this translation layer to see how it would work over OmniXtend.
L
There is a tightly coupled simulation available right now with the OmniXtend endpoint, but the socket-based simulation is not yet available; hopefully that will be coming soon. What does it look like? You have a Xilinx FPGA with the CVA6 in this case, which has multiple levels of caches; here we have level-one data and instruction caches, then a NoC, and the NoC attaches to the TileLink side.
L
The network-on-chip attaches to the TileLink 1.8 interface, which encapsulates those messages into the OmniXtend 1.0.3 format and then sends them out over another open-source tool: LeWiz's 100-gigabit Ethernet MAC, which is also completely open source and available on GitHub if you need a 100-gig MAC, for instance for Xilinx FPGAs. The translation is also pretty straightforward: it takes the TileLink messages from the NoC, puts them into FIFOs, translates them to the format that the MAC needs, and sends them out over the LMAC 3 core.
L
As I said, the main thing that's still missing is integrating the CVA6 into the socket-based simulation to make it even more easily accessible. Apart from that, it's ready to experiment with, and I'm glad we already have some users with interesting questions and fruitful conversations; I hope more will come and play around with this.
L
Apart
from
that,
thank
you
for
your
attention
and
yeah
I
hope
for
some
interest.
Thanks.
H
G
L
There's no IP involved at all; our protocol sits at a much lower layer. We don't use any IP, we don't use any TCP; it's its own protocol.
A
All right, so our next speaker is Jack Koenig, who is going to be chatting about Chisel 3 and beyond. Jack is a senior staff engineer at SiFive, where he works on digital design methodology. He is active in the open-source community as a maintainer of the Chisel hardware description language, and he holds an MS in electrical engineering and computer science from our friends across the bay here, Berkeley.
M
Okay, so hi everyone; thank you for the introduction, Rob. I'll just kind of dive in. This is maybe my second talk in person since before the pandemic, so sorry if I'm still a little rusty. It's nice to see people in person; it is a different experience than just the little boxes on the screen, but I'm glad that all of you are here as well, so good to see everyone.
M
So what I'm going to talk about today is Chisel, which probably most people attending this meeting are at least somewhat familiar with, but there are always new people, so I'll give a little bit of background.
M
I have about four slides explaining what Chisel is, and then most of this is for Chisel users; so sorry if you don't know anything about it and that's not enough of an introduction, but it's mainly going to be about features and what's been going on. So, first of all, what is Chisel? It's an acronym (or, a bit better, a backronym) for Constructing Hardware In a Scala Embedded Language.
M
It
is
a
domain
specific
language
where
the
domain
is
digital
design.
This
is
just
like.
Verilog
is
a
domain-specific
language
where
the
domain
is
digital
design,
just
distinguishing
it
from
general
purpose
languages
like
python,
C,
plus
or
Scala.
It
is
neither
high
level
synthesis
nor
behavioral
synthesis,
so
it
is
not
C
to
Gates.
It
is
not
skeleton
Gates,
it
is
a
program.
M
You write a Scala program, and the execution of that program constructs a hardware graph; that hardware graph is turned into Verilog, which you can then use with your standard tool flows. It's embedded in Scala because Scala, as a general-purpose programming language, has a lot of things that we in the programming-languages community like: parameterized types, object-oriented programming, functional programming, and static typing. These are things that are popular; you know, the previous talk talked about Rust a lot.
M
Rust is another one of these great languages with these great features, and there's a reason why they are growing in popularity. Chisel is intended for writing reusable hardware generators, a.k.a. libraries. The way you get productive writing software is by leveraging other people's software; the goal of all this open-source work is that you can leverage other people's work. In order to do that, it has to be reusable, and our experience has been that in Verilog especially, but VHDL as well,
M
It can be hard to write things that you can reuse. You can reuse at some level, right; this is why we have bus protocols, but that is a very specific and well-defined level. There are a lot of other places where you'd like to get reuse, and it's harder to do, and we hope that different language constructs can make it easier.
M
In digital design we're not talking about anything analog; we're not talking about the weirder, more electrical-engineering aspects here. It's mainly just the functional description of what you do. But at that level, which you often write in Verilog, you can write things in Chisel that look very similar. So let's not worry too much about the syntax here; this is just a basic module that has ports (we've got an input, we've got an output), and it's parameterized by a bit width.
M
We've
got
a
couple
registers
and
we're
just
summing
a
value
over
three
Cycles
where
we
have.
You
know
two
registers
with
the
previous
two
cycles
version
of
it
right,
and
so
this
is
a
three-point
moving
sum,
but
what
would
happen
if
you
wanted
more
than
three
points?
What
if
you
wanted
different
weighting
for
the
values
from
previous
Cycles?
What
we
really
want
is
a
generic
fir
filter
and
so
in
chisel
and
I
will
not
dive
into
this
code.
So
just
trust
me
and
you
can
go
run
this
example.
This
is
an
example
that
is
parameterized.
M
The number of taps you have is parameterizable, as well as their values, and so this lets us make it completely generic in however many cycles we're filtering over and how we're weighting the different delayed values. This allows us to parameterize by those with no loss in the quality of the RTL that's going to come out; you can do the exact same moving-sum filter as before with this more generic version.
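The slide code itself isn't reproduced in this transcript, but the behavior of the generic generator can be modeled in plain Scala (a software model of the generated datapath, not the actual Chisel from the slides): an n-tap FIR output is the dot product of the coefficients with the current and previous inputs, and the three-point moving sum is just the special case with coefficients (1, 1, 1).

```scala
// Software model of the FIR generator's behavior: y[t] = sum_i coeffs(i) * x[t-i],
// with x treated as 0 before time 0 (registers reset to 0). Not the Chisel itself.
object FirModel {
  def fir(coeffs: Seq[Int], xs: Seq[Int]): Seq[Int] =
    xs.indices.map { t =>
      coeffs.zipWithIndex.map { case (c, i) =>
        if (t - i >= 0) c * xs(t - i) else 0
      }.sum
    }

  // The three-point moving sum from the first slide is coeffs = (1, 1, 1).
  def movingSum3(xs: Seq[Int]): Seq[Int] = fir(Seq(1, 1, 1), xs)
}
```

With an impulse input, the output sequence reproduces the coefficients, which is exactly the "different impulse responses" idea the talk mentions.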
M
But what's nice about the generic version is that you can also express, say, a one-cycle delay filter that's just a register. You can do a triangle filter, with different impulse responses. The point of Chisel is allowing you to write this sort of thing, where you can be as parameterized as possible, with no loss in the quality of the design you're building: no QoR impact. Okay, so that was for the people who don't know what Chisel is.
M
The rest of this is mainly for our users, so apologies to the rest of you, but there's been a lot going on. This is probably the most recent content I've ever had in a talk, which is really exciting.
M
So first of all (this has been the longer-term effort), we have a new compiler underneath Chisel. Chisel emits an IR called FIRRTL, and FIRRTL has a compiler that compiles it down to Verilog. We used to have our compiler implemented in Scala, similarly to how Chisel is implemented; now it has been re-implemented with MLIR. MLIR is kind of a generalization of LLVM (it's part of LLVM), and it's for writing custom compilers.
M
Importantly, it's much faster and uses a lot less memory; that's a great thing, but the real goal, with those benefits aside (because those are very nice), is really more rapid feature development. MLIR stands for Multi-Level Intermediate Representation; I think it was originally "Machine Learning", and then there was a very convenient re-backronyming to "Multi-Level". It allows for the coexistence of multiple IRs.
M
Now, as part of this move to a new compiler (this is a common issue when you have large infrastructural changes), in order to make development easier we had to drop some of our older features. The old Chisel 2 compatibility layer is now gone; fixed-point and interval types are gone (I'll talk about fixed point in a second); and the old Scala FIRRTL compiler APIs, which were nice and which I used a lot, are no longer there either.
M
But the benefit of this is that the smaller public API surface is making development a lot easier, and you're going to see that we're finally starting to reap the benefits of some of that pain. I'll note that there's ongoing work on fixed point to make it a library; I recently saw the library work, and I think there are some improvements we should make, but it's not fully gone, it's just gone as a primitive. There are also some other minor organizational changes.
M
We've renamed Chisel from "Chisel 3" to just "Chisel"; I feel like every project goes through this, where you have versions one, two, three, and then you just go back to no numbers. And our artifact is now officially under the Chips Alliance organization, whereas it used to still be under the old Berkeley organization.
M
We've changed our versioning scheme from kind of a Haskell-style PVP, where in 3.6 the whole "3.6" was the major version (the ".6" part was part of the major version), which, as everyone has discovered, confuses people. So we have moved over: now it is Chisel 5, and then Chisel 6. As for Chisel 4, don't worry about it; it's kind of reserved for possible future use. Chisel 5 has been released, and Chisel 6 is on the verge of being released, which has made my life a lot easier for doing releases.
M
Okay, so now, what's new as of 6.0.0-M2? That "M" is "milestone two"; it's just a way of marking things before we're willing to call it 6.0, but I will say that we're very close to the actual 6.0 release, and most of this is from the last three to six months. So: we have much better source locators. The example itself doesn't really matter too much.
M
But what I'm trying to show is that this trait Bar is in a file called File.scala in a certain directory, and class Example is also in a file called File.scala, but in a different directory. It's very common that you end up with the same name; people really like the name "utils", that's a common one, and there are other names that just come up all the time, and when you have a large code base with hundreds or thousands of files,
M
You end up with source locators where you look at them and think: okay, File.scala and File.scala, which one did it come from? I actually came up with nice little grep tricks where you could check whether a given file existed and it would tell you which was the right one, but in reality it's better to just have a source locator that gives you a more descriptive path, and the new source locators do that. You'll also notice we have source locators on ports, which we didn't have in the past.
M
That was true as of Chisel 3.6, which was released in April, I think. But one of the best parts about those new source locators is that, when what something points to is unambiguous, the tooling can use that for error messages. So in this example I have an out-of-bounds index: bar is an 8-bit value and I'm asking for bit eight, but we're computer scientists, we start counting at zero; oops, I messed up, off by one as usual.
M
This is such a small thing, and it wasn't even that hard to implement, but it's amazing how much nicer things feel when you get this kind of error reporting. Also, that was an error message from Chisel itself, but we have a compiler underneath it, and that compiler has its own error messages, because there are certain things we can't check in Chisel. So this is an example where this wire is not fully connected under all circumstances.
M
And so you get an error saying it's not, and then you get this other representation that doesn't look like the Chisel you wrote, because that's what the compiler is seeing. And so now, with firtool and these better source locators, it will point you to your original Chisel to tell you where you messed up, which is very handy.
M
Okay, I mentioned that performance is a lot better now, and that's a big one; people like performance, and it makes easy graphs.
M
This is a synthetic benchmark, because I was more interested in spending time working on features than on making really good and fair benchmarks, but in this case I'm showing a stack of the Chisel and FIRRTL run times: Chisel is in red, FIRRTL is in blue, so it's a combined runtime, obviously. For this synthetic benchmark, going from Chisel 3.5.2, which was roughly a year and a few months ago (April of last year), to 3.6 (April of this year), Chisel got twice as fast; and then there's switching from SFC to the new firtool, CIRCT.
M
Sorry, I use those names interchangeably, but they're the same thing: firtool is a part of the CIRCT project. That was, for this benchmark, 3.2 times faster, for a combined speedup of about two and a half x. Memory use shows similar benefits; now, memory use is more of a max than an additive thing, so you can see that Chisel is using 18% less memory, but in this particular benchmark firtool uses 72% less memory, and so the overall memory use has gone down quite a bit.
M
Now, benchmarking Chisel well is kind of hard because, as it evolves, the designs I would use as benchmarks also evolve, so I have to backport things to do benchmarking, and that's why it's kind of annoying. It's easier to benchmark just FIRRTL, because it has a spec, and so it's much easier to compare across versions.
M
Here are some benchmarks from my friends on the CIRCT project, where for a small microcontroller, a kind of larger in-order CPU, and then a large out-of-order CPU, you can see that the runtime of CIRCT was from 6 to 11x faster, which is huge, right; that's much, much better. And in memory use it's a fairly similar story: 14x on the microcontroller, down to 4.4x on the out-of-order CPU. All this reduced memory use is incredibly helpful.
M
For your CI runs, your servers, and all of that. All right, so performance is great, but that's not the point; the point is all the new features, so that's what I'm going to talk about. Constant types are kind of a small one, but still useful. Prior to Chisel 6 (and again, 6 isn't quite out yet, but you can use the milestone-two release), async reset values had to be literals, and these checks were kind of haphazard. It's somewhat complicated to implement, and when you base your functionality on optimizations,
M
It leads to weird and hard-to-understand behavior. So these constant types give us a more principled way to deal with that, such that it's very predictable, and it also supports new use cases. For example, how do you have an async reset register that is reset by a strap pin? You could, of course, just allow it, but then how do you check that people are doing it right? Well, the way we do that is with these constant types. Here's an example of the old error message from the past.
M
This looks like it should work, but it's not known to the compiler that the value is a literal, because you could assign a non-literal later. Instead, if you use this modifier for your types called const, now the compiler knows it's a constant and it will check it, and so you get much better behavior here. There's also possible future integration with physical design, right: why is it dangerous to reset something asynchronously from a port? Well, because if that value can change at certain times, you can have physical issues.
M
There are power concerns here, and so you need to constrain that port to tell the physical design tool what is allowed. So when you have this information in your type system, you can also potentially communicate it to your backend tools.
M
Another recent feature that is pretty big (bigger, I guess, in its use cases) is what are called probes, which we sometimes call reference types; my brain uses both, so that's why I put both. Basically, this is how things stood prior to Chisel 6.
M
There were ways to do Verilog XMRs (for verification or physical design or whatever) in your Chisel, but it required a lot of custom user extensions to the Scala FIRRTL compiler, and really, as a user, you don't care about that: you just want a guarantee and a mental model for how you can lower from Chisel to Verilog and how you can use it with, for example, your SystemVerilog UVM verification flow. So now, at the Chisel level, we have a first-class feature to capture these use cases.
M
This isn't that interesting of an example; it's just showing that you can declare ports that are what we call probe ports, which is something that you can read from the outside, and rather than lowering to an actual port, it will lower to a Verilog XMR. And these read-write probes are XMRs that can be forced, and that's really important, because for any of you who've done a lot of SystemVerilog verification, the semantics of force can be surprising.
M
There are back-propagation semantics that get people; it's a really easy sharp edge to fall on. Well, a nice thing about having a compiler is that we can lower to something that doesn't have that problem, but you do have to tell us that you're going to do that, so we make you mark it as read-writable, or just writable. Another feature that sounds kind of obvious to any compiler people, but is a good thing, is this:
M
We've added intrinsics. I mentioned that you could extend the Scala FIRRTL compiler with annotations and custom transformations, but this was arguably too flexible: you could do almost anything, and that makes composition difficult. It also didn't really support custom SystemVerilog emission, which was really problematic for a lot of the extensions that people wanted to do. Intrinsics are like reeling that back into something a lot more restricted; but because of the way MLIR works, and therefore how CIRCT and firtool work,
M
It's really easy to do really nice custom SystemVerilog emission, and this gives us a nice platform for taking some of the more ad hoc extensions that you might have seen in Rocket Chip and turning them into first-class, or at least supported, features with a defined API as part of Chisel.
M
Some simple examples that are already there (and there are more coming) are clock gates, which are obviously a really important thing; plusarg readers, or plusargs, which are obviously useful in simulation; and a way to check if something is X, which I'll just kind of leave at that. But a really big one that has already been implemented now, which is very exciting, is temporal properties. This is something people have asked for for years.
M
You might be thinking of this as SVA, or SystemVerilog Assertions; people want the ability to describe more complicated temporal properties, not just immediate ones like "a equals b". They want, and I'll show you, things like "a implies b, followed by c", and so on. This is the industry standard in verification, especially in formal verification, and this API is built on top of those intrinsics, which allows it to happen faster.
M
Eventually we will turn these into first-class IR features, but the intrinsics give us a way to hoist ourselves to that level much faster; this took one of my colleagues only a few weeks to implement on his own, maybe two or three weeks. There are some tweaks that will probably happen before we release 6.0, but this code is code that I ran yesterday; it works. You can generate your SystemVerilog assertions, which is very exciting.
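To make the idea concrete, here is a tiny trace-level checker for a property of the shape "a implies b, followed by c" in plain Scala. This is an illustration of the semantics only, not the Chisel API (which emits SystemVerilog assertions via the intrinsics):

```scala
// Trace semantics for "whenever a holds at cycle t, b must hold at t+1 and
// c at t+2" (roughly a |=> b ##1 c in SVA terms). Obligations that run past
// the end of the trace count as failures in this sketch. Illustration only.
object Temporal {
  def impliesThen(a: Seq[Boolean], b: Seq[Boolean], c: Seq[Boolean]): Boolean =
    a.indices.forall { t =>
      !a(t) || (t + 2 < a.length && b(t + 1) && c(t + 2))
    }
}
```

A formal or simulation-based checker does the same thing over hardware traces, cycle by cycle, instead of over Scala sequences.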
M
Another thing: I originally had seven slides on this, but in the interest of time I turned it into one. I could talk about these type-system semantics all day, and it would bore everyone in this room, so I'll just kind of gloss over it. The connection semantics in Chisel were way too permissive in the past: until two years ago, it didn't care if the types matched at all. Take nominal typing, meaning naming a type, like a class named Foo.
M
It didn't require that you could only connect Foos to Foos. Structural typing, meaning: are the actual fields of this type the same? It didn't require that either, and it didn't require the widths to match; it didn't require much, it just kind of did something. Chisel 3 made this a bit stricter: the fields had to match. But being that rigid is also problematic, because if you have two types you want to connect, and their fields match 99%, but one doesn't match, what do you do?
M
That rigidity then causes you to blast out the connection, and you've thrown away the safety anyway, so it's not actually helpful; and it would also still happily deal with mismatched widths, which is not great. So we have new operators, starting in Chisel 3.6, that check that the nominal types must match, but you can easily opt out; they check that the structural types (the fields) match if you don't have matching nominal types, but you can waive those fields; and they check that the widths must match, but you can opt out of that too.
M
The term is "squeeze"; that's the API. You can say: yes, this width doesn't need to match, but all the rest do. So that gives you the rigidity that, when you connect things, it checks everything for you; but if you have a need to waive just one small part of it, you can do that, and it's really useful. It's been very successful for us at SiFive at least, and we're planning to migrate the older operators in newer versions. All right, so that was everything that is already there.
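The checking described here can be sketched as a small software model (names invented for illustration; the real API is Chisel's Connectable operators with waive and squeeze): a connection succeeds only if the field names and widths match, unless a specific field is waived or a specific width mismatch is squeezed.

```scala
// Toy model of checked connections: fields must match unless waived,
// widths must match unless squeezed. Invented names, not the Chisel API.
object ConnectModel {
  type Bundle = Map[String, Int] // field name -> bit width

  def connectErrors(sink: Bundle, source: Bundle,
                    waived: Set[String] = Set.empty,
                    squeezed: Set[String] = Set.empty): List[String] = {
    val shared = sink.keySet intersect source.keySet
    val fieldMismatch = ((sink.keySet union source.keySet) diff shared) diff waived
    val widthMismatch = shared.filter(f => sink(f) != source(f)) diff squeezed
    fieldMismatch.toList.map(f => s"missing field: $f") ++
      widthMismatch.toList.map(f => s"width mismatch: $f")
  }
}
```

The point of the real operators is the same: everything is an error by default, and each exception has to be named explicitly at the connection site.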
M
You can go use Chisel 6.0.0-M2, and that's all there. So what's up next? Well, I'm going to immediately start with one that's only sort of next, because it's actually already there, but it is aggregate preservation, and this is supported by firtool.
M
This may come as a big surprise, but even these expensive proprietary tools don't always do as good a job as you think. Let me show you the current lowered output: the optimizations we run in firtool on the scalarized output actually do improve the QoR you get out. I think we need to do more of a study on that, so I'm going to hand-wave a bit, but we have noticed that when you turn this on, you can get worse QoR.
M
That being said, this example just shows something very simple: we've got an input vector, an index, and an output, and we're just dynamically indexing it, a common array access, if you will. As existing users know, Chisel will scalarize that vector into individual pieces; and of course this output from firtool is already better than what SFC used to do. SFC used to kind of unroll muxes, which looked really ugly; even though it gave you good results, people didn't like how it looked, and firtool is doing something better.
M
I talked a bit about connections, and there are other aspects of width safety that have been weaker than they should be in Chisel in the past; these are kind of famous warts that we're fixing. So here's an example where I have a dynamic bit selection: I have an 8-bit value, and I can select a bit using four bits, or two bits, both of which are wrong.
M
You only need three bits, and so in both of those cases you'll get a warning that your width is too large or too small, and we're rolling out more of these warnings in order to kind of tighten up the semantics around widths.
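The rule behind this warning: indexing one bit of an n-bit value needs exactly ceil(log2(n)) index bits, so an 8-bit value needs a 3-bit index (4 bits is too wide, 2 is too narrow). A small sketch of that check, mine rather than Chisel's internals:

```scala
// Required index width for a dynamic bit-select of an n-bit value:
// ceil(log2(n)) bits address bit positions 0 .. n-1. Illustrative sketch only.
object IndexWidth {
  def required(n: Int): Int = {
    require(n > 0)
    var bits = 0
    while ((1 << bits) < n) bits += 1
    bits
  }

  // Mirrors the warning: Some(message) if the index width is off.
  def check(valueWidth: Int, indexWidth: Int): Option[String] = {
    val need = required(valueWidth)
    if (indexWidth > need) Some(s"index too wide: $indexWidth > $need")
    else if (indexWidth < need) Some(s"index too narrow: $indexWidth < $need")
    else None
  }
}
```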
Now, one really big problem with rolling out new warnings is that, for certain things,
M
You might get a lot of them. And so this is a little bit of a scary example, because the real numbers that I have internally are over a hundred for some of these new warnings, like 100-plus warnings. And I talked about those great new warnings where you get the warning, the line, and the caret: that's three lines per warning, right. So if you had 241 warnings, that's 700 lines of warnings, which is almost worthless, because you know you're not going to fix this
M
many things in one go; you're going to just try to make more progress, and then one of your co-workers is going to add a new one and no one's going to notice, and that's a huge problem. Really, you want warnings-as-errors, but if you have hundreds of them, you can't; like in GCC, you can't turn on warnings-as-errors
M
when you have too many warnings, and it makes it really difficult to manage that. So there's a feature in Scala itself from which we're basically deriving a similar feature in Chisel, where you can have more focused warning configuration. This is currently an open PR, so the actual syntax is not fully specified yet, but what I'm getting at here is that you can pick by ID.
M
So what I've done here is assume some warning ID 3 (I haven't assigned the IDs yet, so this is made up), and this would allow you to filter all warnings of this category in this directory and silence them, while anything else is an error. And so this gives you nice flexibility where you can say: okay, I have a thousand Scala files in my code base; I'm going to fix the warnings for these 50, turn them on as errors in those files, and then just keep the rest as they are. This allows you to migrate your code in a much more reasonable way.
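Since the syntax and IDs are explicitly not final, here is only a toy model of how such filtering could resolve, loosely patterned on Scala's -Wconf (first matching rule wins, everything else is an error); every name in it is invented:

```scala
// Toy warning-configuration resolver: first matching rule wins; the default
// action is "error" (warnings-as-errors). IDs and paths invented for illustration.
object WarnConf {
  final case class Warning(id: Int, file: String)
  final case class Rule(id: Option[Int], pathPrefix: Option[String], action: String)

  def resolve(w: Warning, rules: List[Rule], default: String = "error"): String =
    rules.find { r =>
      r.id.forall(_ == w.id) && r.pathPrefix.forall(w.file.startsWith)
    }.map(_.action).getOrElse(default)
}
```

This captures the migration story from the talk: silence a warning ID in legacy directories while the same warning is a hard error everywhere else.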
M
Another thing that's pretty exciting (and this is very much half-baked, so don't hold me to the syntax) is groups. This is a way of grouping non-functional statements; you can think about this as like the traditional methodology of having a module that you bind in for your verification.
M
These groups can then be optionally included in simulation. A really nice feature of these groups, as they're currently specified, is that they allow nesting and hierarchical access. If any of you have done a more traditional SystemVerilog verification methodology, where you put your assertions in a module and you bind it in: it is outside of the Verilog spec to have a nested bind, to have a bind that you bind into your bind; that's not allowed.
M
It is allowed by Verilator and by at least one proprietary tool, but rejected by the other one. And so one nice thing about a compiler is that we can support that API and then lower it to something that works according to the Verilog spec. Also, if you have some hierarchy of modules (say here I have Foo instantiating Bar), in SystemVerilog, if you have two bind modules that you're binding into each of those, they can't talk across those modules; there's no way to communicate.
M
You have the sideband collateral, and the sideband collateral can't talk; well, with these groups, it can. Okay, I'm almost done, so mostly okay on timing. Here I'm showing that you can have your design, you can have this optional group that I've called "assertions", and you can do something with it; it doesn't really matter what the logic is (I'm delaying it by a cycle), and then, in the parent module,
M
You get collateral that you can keep separate from the SystemVerilog you emit, which is really important for traditional flows for synthesis or verification. And it lowers it; it generates things such that it just automatically handles the fact that some of these constructs are not directly supported by Verilog, and so you can lower them to cross-module references and such. Another thing that I'll just kind of gloss over, because it is very much a work in progress, is properties, which are a new type system for information about the generated hardware.
M
This is for a lot of things, like: what is the latency of your interconnect, which may be a derived parameter based on how the generator works, or your address map; these are things that you need for software, verification, physical design, whatever. And then, finally, the native dependency on firtool has made Chisel installation a bit more annoying, so we have some work
M
That's almost finished, where we will co-distribute firtool with Chisel. This is kind of obvious, but distributing Scala artifacts has a very convenient process, and distributing native binaries has its own, entirely different process, and trying to mix those worlds can be complicated. So we've come up with a system to make them distribute together, and there are also now Windows binaries for my Windows users; and if you have other platforms that we're not currently supporting, please come help us out.
M
G
M
I don't know how much I can say at this point, but there are ongoing discussions on that. So the simple answer is: today you have the source locator, so if you find an issue with a signal, it's pretty easy to see where it comes from; but for more integrated support with the tools, there are discussions. That's all I can say.
G
F
Hi, could you clarify the new naming scheme? I see that 3.6.0 is remapped to 6.0.0, but what about, you mentioned, 3.5.2 and 3.6? What are those mapped to?
M
Chisel 5 is out, okay, and 6.0.0 is coming soon; and then a 5.1 will also come out, but that's kind of whatever back-patches. Thanks.
M
Well, it's not remapping; it's just new numbering for future versions. And this is a problem with animations: it takes forever to go back.
M
K
M
Yeah, so fortunately FIRRTL already had a spec, and so implementing the spec was pretty easy; as with most things, that first 90% took almost no time. Then I talked about all those custom extensions that we had; that was by far the hardest part, and that's part of the motivation for why we're just not doing it that way anymore.
M
A lot of these features you're seeing are more principled and specified versions of things that we at SiFive kind of already had in ad hoc forms, and the reason that they weren't public was not because they were secret sauce, but because the implementation was bad, and you don't want other people relying on your bad code, because then you have to fix it. So a lot of the effort, the hard part, has been that; the basic spec that supports hardware is really easy.
H
M
So, FIRRTL: FIRRTL is our intermediate representation. There is a FIRRTL spec in the Chips Alliance repo, and whereas in the past all these custom extensions via annotations were not in the spec (because they were custom extensions), all the features I'm talking about are in the FIRRTL spec. So if you want to see it: it was basically stagnant for like three years, and it is evolving pretty fast now, so I recommend checking it out, and you can totally generate it from whatever.
B
M
J
A
Right, so our next speaker will be Mikhail Moiseev, and he is a senior staff engineer with Intel Labs, working on hardware design and verification automation tools. He is a maintainer of the Intel Compiler for SystemC and participates in the SystemC working group. He has a PhD in computer science. So, Mikhail, the floor is yours.
D
So my talk today is about a single-source library for digital design and virtual prototyping. Let me start with a problem statement. In a conventional hardware design flow, besides the hardware design RTL sources, there are other models, like pre-silicon models used for architecture exploration or performance evaluation.
D
There are also virtual prototyping models used for firmware or software development, and usually these models are developed independently from the hardware RTL. That leads to extra resources spent on multiple models, and to problems like parallel modifications of the models and checking equivalence between a model and the RTL design.
D
These SystemC designs can be translated into synthesizable SystemVerilog with the Intel Compiler for SystemC, which is an open-source tool; this SystemVerilog is synthesizable, and finally silicon can be created from it. To be used in pre-silicon models and virtual models, the SystemC design is just taken as C++ source and compiled with a normal C++ compiler.
D
So let me introduce the Single Source Library. It's a library of high-level communication channels which supports two modes. The first mode is cycle-accurate — a normal cycle-accurate mode with clock, reset and all the details in simulation — and the second mode is a fast simulation mode: there is no clock, no reset, and all the simulation is request- or data-driven, which means no extra process activations until the process functionality is really needed.
D
Besides the increase in simulation speed, using this library enforces a new design style: instead of organizing processes around individual signals, registers or ports, there are high-level channels, which reduces the total lines of code and also reduces the risk of synchronization errors.
D
So the library includes a number of channels. Initiator and target are used in a pair and allow two processes to communicate. There are also multi-initiator and multi-target channels, intended to connect multiple processes.
D
There are also some additional channels which are not open-sourced yet, so I skip their detailed descriptions. Each of the channels supports both the cycle-accurate mode and the fast simulation mode, and each channel implements interfaces — for example, the put interface, the get interface and others. These interfaces look very similar to the TLM 1.0 interfaces.
D
Let's briefly discuss the interface details. The put interface allows a process to put a request or some data into a channel, and there is a ready function to know that the channel is ready for the next request. Of course, a channel can be reset and cleared, and there are non-blocking and may-block put functions, to be used in method and thread processes. The get interface looks very similar, but instead of ready there is a request function to know that a request is pending.
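The put/get semantics described above can be sketched at the behavior level. This is an illustrative Python analogue, not the library's actual SystemC API — the names `Channel`, `ready()`, `request()`, `put()` and `get()` are assumptions that only mirror the interface roles just described:

```python
from collections import deque

class Channel:
    """Behavior-level sketch of a single-source-library-style channel.
    Not the real SystemC API; names are illustrative only."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.items = deque()

    def ready(self):
        # put interface: channel can accept the next request
        return len(self.items) < self.capacity

    def put(self, value):
        # non-blocking put: returns False instead of waiting when not ready
        if not self.ready():
            return False
        self.items.append(value)
        return True

    def request(self):
        # get interface: a request is pending in the channel
        return len(self.items) > 0

    def get(self):
        # returns (valid, value); valid is False when nothing is pending
        if not self.request():
            return False, None
        return True, self.items.popleft()

ch = Channel(capacity=1)
assert ch.ready() and not ch.request()
assert ch.put(42)        # accepted
assert not ch.put(7)     # channel full, so not ready
ok, v = ch.get()
assert ok and v == 42
```

A may-block put, as used in thread processes, would simply wait (in SystemC, on an event) until `ready()` becomes true before appending.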
D
So let's discuss the typical use cases. As I said, initiator and target are used in a pair to connect two processes: one process puts some data into the initiator in module one, a process in module two gets the data from the target, and of course this is the same data that was sent by the initiator to the target.
D
A FIFO can be used as a convenient buffer, just to store a number of data chunks. Signal, input and output ports can be used like the normal signals and input/output ports of SystemC, and a FIFO can also be used as a convenient inter-process communication mechanism between two processes inside one module. There is also a base port, used for implementations of AMBA master and subordinate ports, OCP ports, or other ports.
D
There is a memory channel to instantiate a memory — like an SRAM, a register file or a ROM — inside the design, and also a register: a special channel which is used to introduce state for a stateless method process in SystemC.
D
So let me dig a little bit deeper into initiator and target. They allow us to create combinational connections and buffered connections. A combinational connection is used when the target process is always ready to get a request.
D
In this case we don't need any back-pressure — just a request and the request data. A buffered connection is used when the target process can sometimes be not ready; in this case, data is stored in an internal register. There is also an optional register to introduce a pipeline stage, just to achieve the required performance, and an optional FIFO, just to have a convenient buffer inside the target.
D
So here is a very simple piece of code: SystemC code with module A and module B. In module A we have an initiator and a random process, which generates random numbers and puts them into the initiator. In module B we have a target, and this target is used in a check process: the check process gets a request from the target, checks whether it equals 42, and asserts otherwise.
D
Here the random process is a thread process, so there is a reset section, where the initiator is reset, and a while loop, where all the functionality is implemented. Here you can see the may-block put: that means if the initiator is ready there is no delay, but if it's not ready, there is a wait inside this may-block put function until it's ready. In the check process there is also a get-side reset, which just initializes the target outputs, and then the logic.
D
In the top module, both modules are instantiated and the initiator is just bound to the target. Of course, the target could also be bound to the initiator.
D
The next channel is the FIFO, which can be used in three cases: first of all, for communication between two processes; second, as a buffer for one process; and third, inside a target as an internal buffer.
D
Let me give you a simple example. Here we have two processes, a producer and a consumer. Both processes are combinational method processes, and there is one FIFO which is used to communicate between them. As you can see, the size of the FIFO is two, so we can store two requests in it. In the producer we do the put-side reset, and if some condition holds and the FIFO is ready, we produce the next value and put it into the FIFO; and in the consumer process —
D
— which is also combinational, we do the get-side reset and try to get a value, and if there is a real request — this get returns true — we consume the value and do something useful with it.
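The producer/consumer example can be approximated in plain Python. This is a hedged sketch: the FIFO depth of two matches the example, but the loop and function names are stand-ins for the SystemC kernel and the library's real API:

```python
from collections import deque

# Sketch of the producer/consumer example: a FIFO of depth 2 connecting
# two "combinational" processes.  Names are illustrative, not the real API.
DEPTH = 2
fifo = deque()

def producer(next_value):
    # "if the FIFO is ready, produce the next value and put it in"
    if len(fifo) < DEPTH:
        fifo.append(next_value)
        return True
    return False

def consumer(consumed):
    # "try to get a value; if the request is real, consume it"
    if fifo:
        consumed.append(fifo.popleft())
        return True
    return False

consumed = []
pending = list(range(5))
for _ in range(20):              # crude stand-in for scheduler activations
    if pending and producer(pending[0]):
        pending.pop(0)
    consumer(consumed)
```

After the loop, `consumed` holds all five produced values in order, and the FIFO is empty — the back-pressure comes purely from the depth-2 check in `producer`.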
D
So, as I said, signals and input/output ports are also required for designs which use the Single Source Library. That's because a signal is very simple and can be written by one process but read by multiple processes.
D
A FIFO channel cannot be used in such a scenario, so for this special case, signals and input/output ports are also included in the library.
D
A register, as I said, is used to extend a combinational method process with state. Normally, when we need an asynchronous reaction — meaning we need to take and process some input data without latency — we use a method process, and if we need state for this process, we additionally use a thread process. To avoid the extra process, we just add a register. In our case it's a counter, and this counter is used to count the number of input requests taken from the target.
D
Here we call reset for the counter and just increment it for each request taken from the target. Of course, there could be a number of registers used in one method process.
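The register-as-state idea can also be sketched: the combinational process reads the current value and schedules the next one, and the stored value only changes at the clock edge. Again, this is an illustrative Python model with assumed names, not the real channel API:

```python
class Register:
    """Sketch of the 'register' channel idea: gives a stateless
    (combinational) process explicit state.  Illustrative only."""

    def __init__(self, reset_value=0):
        self.reset_value = reset_value
        self.value = reset_value
        self.next_value = reset_value

    def reset(self):
        self.value = self.next_value = self.reset_value

    def write(self, v):
        # schedule the new value for the next clock edge
        self.next_value = v

    def read(self):
        # combinational read of the current state
        return self.value

    def clock_edge(self):
        # applied by the simulator between process activations
        self.value = self.next_value

counter = Register()
counter.reset()
for _ in range(3):               # one "request taken from the target" each
    counter.write(counter.read() + 1)
    counter.clock_edge()
assert counter.read() == 3
```

Splitting `value` from `next_value` is what lets a method process increment the counter without creating the extra thread process the talk mentions.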
D
SystemC has sc_port, which has specializations for a signal — sc_in and sc_out — and a port can also be used for all the channels I mentioned before. So here is an example of a port for a target and an initiator.
D
So here we have a child module with a target and an initiator, and we would like to promote this target and initiator to the parent module, to be connected to some initiator and target in the testbench. To avoid replication of the initiator and target, we just have a port, which is internally implemented as a pointer, and bind the port to the real initiator and target inside the child module.
D
Really, there are lots of features; I'll avoid discussing every one and just mention one of them: C++ structures or classes can be used as a channel payload — for a signal, for an initiator, for a target, for a FIFO. To support that, we need to satisfy some requirements from SystemC: first of all, such a structure should have a default constructor and a comparison operator, and it should also have operator<< for output streams and for sc_trace.
D
As soon as we provide all these operators, we can use it — in our case inside the target — as a normal payload. There is no restriction here.
D
So let me compare this Single Source Library with existing solutions. There is MatchLib, an open-source library from Nvidia, and it's a really good library with lots of channels; it also has initiators, targets, FIFOs and other channels — it's really good.
D
The main difference is that that library is intended to be used with the Catapult HLS tool, and only with it, and this tool is a commercial one. Our library can be used together with the Intel Compiler for SystemC, which is open source. Also, MatchLib has a limitation: it can be used in clocked threads only, and because of that the latency of every module is one or more, so we cannot create a purely combinational module.
D
In our case we don't want to have such a limitation: our channels can be used in method processes and thread processes, and we can create a module with whatever latency is needed.
D
Okay, experience: this Single Source Library was created about half a year ago and open-sourced just a few months ago, but we already have about 100 designs with these channels, and currently we have one big internal project within Intel which is using the Single Source Library. Here I just provide a few examples to show how using the Single Source Library reduces the total number of lines of code, and what the simulation speed-up of the fast simulation mode is in comparison with the cycle-accurate mode.
D
And our internal evaluation shows that it increases design efficiency: design time can be reduced by about two times. The Single Source Library channels are synthesizable and distributed as part of the Intel Compiler for SystemC, and you can find the library and the tool in the GitHub repo which is shown here. So thank you for your attention.
A
Mikhail, this is Rob. I did have one question — I'm just curious: how many developers work on this, either at Intel or in the open-source community?
D
Really, inside Intel we have only two developers. Previously we had more, but the community helps a lot: there are lots of bug reports, feedback and feature requests from the community, and some examples are provided by the community. So yeah, it's really a big —
A
— help. No, that's great. I don't know if you can comment — I'm just curious: does Intel use this for all of its design work, or do they use different design-entry choices?
D
No, it's quite a new technology, and of course I'm promoting this technology inside and outside of Intel. So yes, we have multiple projects with the Intel Compiler for SystemC, and a number of silicon chips have already been developed with the Intel Compiler for SystemC, but most of the projects — some big cores, some other projects — do not use it.
B
D
Oh yes, yes, okay — it's compatible. I have experimented with Verilator: about 1000 unit tests generated by the Intel Compiler for SystemC in Verilog, and all the tests passed after some minor fixes. So yes, generally you can use it with these.
A
Our last speaker this morning is Mehdi Saligane from the University of Michigan. He's a researcher there and works actively in the analog community, in particular OpenFASOC, and I've had the pleasure of working with Mehdi for two-plus years now, I guess — chairing, and Mehdi chairs, excuse me, our analog workgroup. So thank you for making the time to come. Thanks.
I
All right, thanks for the intro, Rob. My talk is about building confidence in open IC design using OpenFASOC. There are so many reasons why open design is a must today: we need more chips, which means more open-source EDA software, more collaboration, and a drastic shift in our mindset as hardware designers.
I
The way to do this, obviously, is by lowering the cost of and barriers to design, but the current status quo in the chip industry is limiting, and we have to go through a bit of history to understand why. Let's take software development in the 90s as an example: this is what it looked like. We had in-house compilers, lots of incompatibilities and a lack of standardization, so everything was always vendor-dependent.
I
"Relics of the 90s" is a common way to refer to the absurd issues we had back then, and there's a whole bunch of funny blogs about it. But more seriously, this reminds us of something today: to me it definitely looks like IC design today, where of course we have access to much more computing power, but in general the semiconductor companies have in-house tools.
I
Now imagine asking new generations of students, with an already software-oriented mindset, who might have less motivation to sit and debug where their tool is crashing after a tedious place-and-route run without actually being able to look into the source code. I'm not sure this still makes chip design exciting, and the usual comment is: we are fighting the tool instead of learning chip design. So at one of the major tech companies with an army of software engineers, like Google —
I
— here the ratio of software to hardware engineers is very imbalanced, which is actually very symptomatic of the overall U.S. workforce situation and of new students' career choices. Based on the Career Explorer website, we have about 70,000 hardware engineers in the United States, growing by five percent in a decade, while we have ten times more software engineers, expected to grow by 30 percent in the same time span.
I
To give you a more tangible example of my idea here: last summer, Ali, who's an undergrad —
I
— a 19-year-old undergrad student from a computer engineering department, so he's more software-friendly — reached out and wanted to do some work around OpenFASOC, and he ended up winning the ISSCC Code-a-Chip notebook competition, presenting his work on an OpenFASOC digital LDO. The reason he was able to do that is because he was working with software rather than working with transistors.
I
That's a great demonstration of having more software-minded students do hardware. At software companies, development happens within hours, which means deployment of new products can be done in weeks, while in IC design it is well accepted to talk about months and years during project planning — and I'm not even mentioning the risk and costs here.
I
In fact, if you talk to our software colleagues, and even to the smaller community of EDA friends, it sounds like hardware development is broken — that's how they feel about it. Is this the case, though? The answer is no: we still make very complicated chips. But can we do better? Yes, and it is about time to fight the windmills and democratize chip design.
I
So that was my last point of motivation, but another angle is cost: chips are very expensive to make. What we really need to understand is that, when it comes to fabricating a new chip, the cost of building a new integrated circuit from scratch at the wafer level can be very high: licensing IP is expensive, especially for custom designs; EDA software can be very costly; and mask sets can vary widely, from a few thousand to millions of dollars.
I
This year at the ISSCC plenary session, it was discussed that moving down from a 65-nanometer node requires about 18 times the design effort, and that building a design at five nanometers cost about half a billion dollars, which is crazy when you think about it. So in 2020, with my co-author here, Tim Ansell, we published this work about enabling open design, with the release of the open-source PDK, as you may have heard, and our work on EDA software such as OpenROAD.
I
So, in summary, FASoC is a DARPA program, part of IDEA. It is a multi-university effort led by the University of Michigan, in partnership with Arm. FASoC specializes in autonomous SoC synthesis, which includes all the building blocks, such as analog generators, memories and cores. Over the course of the program we taped out a good number of chips — we have twelve in TSMC 65 — and we demonstrated all of this at the ERI summits. In the past four years I have been involved in new initiatives that aim to enable chip design.
I
We started with the IDEA program, FASoC and OpenROAD, and then, thanks to SkyWater and Google and the open-source PDK initiative — where I work closely with these guys, along with partners like NIST — we started rethinking some of the hardware design methodologies, which could potentially revolutionize how chip design is done. Now that the IDEA program is over, we are still alive thanks to funding from industry partners such as Google, KLA and NIST, and thanks to the OpenMPW program and open-source tools.
I
So these are some of the projects we've been working on, and the reason I mentioned OpenFASOC here is that it allowed us to crank out these chips, which builds confidence through silicon results. Some of the projects we've been working on are usually just demonstrators for the open-source community, like sensors and digital LDOs, but there is also the nanofabrication accelerator — we're taping out a huge chip this week and next week. We also started working on privacy masks.
I
We started working with Fitbit on rapid prototyping for wearables, and finally, we were funded through SRC for hardware security, to make the first open-source implementation of the OpenTitan root of trust — I'll go through that later in this presentation.
I
So the release of the SkyWater PDK has been a huge success and enabler. It is activating innovation and collaboration in many ways for different semiconductor communities, and as such, NIST and the nanofabrication community are benefiting extensively from it. Now we are building a carrier wafer with integrated CMOS circuits that allows physicists and nanofabrication engineers, such as Dr. Brian Hoskins and his team, to monolithically integrate and characterize emerging novel memory devices.
I
This is exciting and unprecedented for designers: creating new circuits to complement the nanofab research work. We are currently building an automated platform using our OpenFASOC framework to enable this across different nodes and to cater to different specifications defined by the nanofabrication researchers.
I
We are also putting together proposals with other partners to seek NSF funding for this type of effort. We have achieved some initial results, including the release of the cryo models in SkyWater 130. Together with partners such as CoolCAD and NIST, we have developed a test die to enhance these models, thanks to the assistance of Dr. Akin from CoolCAD, and we have also created more complex circuits for the nanofab accelerator capable of operating at 4 Kelvin.
I
One of the most interesting designs we have built was in GF 12-nanometer: a recursive DC-DC-converter-based power management unit generated using our automated framework.
I
The fabricated chip was tested by NIST across a wide temperature range, from 400 Kelvin down to 20 Kelvin, and the setup is shown on the left. Before getting to this point we had to build confidence in these tools, as I said. Our framework heavily uses OpenROAD, which is an open-source tool, and since we were part of that team, we were able to crank out this chip here, which consists of a 64-temperature-sensor array, in the first MPW program.
I
The curves here are silicon results, which I've shown previously in the CHIPS Alliance meetings. Similarly, in MPW2 we made an SoC with an array of digital LDOs: for different load currents we can trade off settling time, output voltage ripple, efficiency, and even overshoot and undershoot.
I
This type of generator, combined with the cloud infrastructure provided by Google, is enabling data-driven optimization, and we're hoping to see more of this work using programmatic layout approaches — using GDSFactory, for instance. As part of the IDEA program, we were able to use the GF12 shuttle to create prototypes using open tools: our team successfully designed the first version of the OpenTitan root of trust, which included temperature sensors that we developed using OpenFASOC.
I
We also made a Bluetooth transmitter using an ADPLL, which used our FASoC framework. Back to the nanofabrication accelerator program: we have put some test apparatus on a silicon test chip, which will be used as a carrier wafer for nanodevice fabrication. This leads to improved measurement quality and testing due to the reduced parasitics.
I
We have also contributed to four shuttles already, each with novel circuits, and my understanding, based on information from my colleagues at NIST, is that it would otherwise have taken a long time to design just one of these test chips.
I
Next was the Fitbit team, who reached out and started a collaboration with my group to help with rapid prototyping and accelerating the custom design process. They typically utilize off-the-shelf components and assemble them on their workbench; however, occasionally they may be missing some components with particular specifications, so OpenFASOC was a great fit here, and we have already taped out two chips, in GF180 and SkyWater 130, which consist of analog front ends that will be used in their prototype signal chain.
I
Finally, Intel has provided us with their 16-nanometer shuttle, which we used to build three fairly complicated test chips. One of them is the complete OpenTitan root of trust with security blocks and noise-injection circuits — I'll go through that in some of my next slides. We did a tape-out on the first of November 2022, and we did another one a month ago, which consists of a 2.8-gigasample-per-second time-interleaved ADC.
I
So now I would like to discuss the OpenTitan work we're doing and all the peripherals we're building around it. This work is funded through SRC: my group was funded by SRC to implement an open root of trust using a fully open flow that is transparent and auditable. Since OpenTitan is on GitHub, it was logical to adapt it and include our analog peripherals, such as the true random number generators and PMUs, to make the root-of-trust diagram on the left fully auditable and transparent.
I
We proposed to use our expertise in open-source EDA and IC design, combined with the OpenFASOC framework. We have developed multiple silicon-validated IPs before, and we plan to extend that work here to add the necessary building blocks for our root of trust, such as the secure PMU and the RRAM-based TRNG — I'll go through that in the next slide.
I
So we are proposing different countermeasures to mitigate side-channel attacks, such as voltage-domain stacking, which we believe could minimize the power signature of our AES accelerator. We are also developing a new secure and programmable PMU, which underscores the flexibility and broad applicability of the OpenFASOC generator approach. The PMU generator includes power-obfuscation circuits for side-channel-attack resilience, and it includes voltage regulation and a novel noise-injection technique.
I
So our approach here is to expand our OpenFASOC-style generator approach to security primitives, such as the RRAM-based TRNGs.
I
To perform the power obfuscation and reduction, the whole system is partitioned into two power domains. The memory instances, such as the instruction, data and boot memories, sit in the higher voltage domain — here it is around 2.4 to 1.2 volts — while the computational logic is actually at a sub-threshold voltage. Since the bottom voltage domain only contains glue digital logic, including the encryption blocks, we plan to use a voltage-scaling engine to further improve the energy efficiency, as well as to reduce the power signature during the AES operation.
I
In a stacked-voltage implementation, the current drawn by the memory is directly reused by the logic in the bottom voltage domain, ideally bypassing the voltage regulator. As a result, the reused current is not subject to power-conversion losses, therefore improving power-delivery efficiency. Two DC-DC-converter-based PMUs, built using our OpenFASOC framework, are being implemented to provide the individual power domains, and we expect that the two similar voltage swings will confuse the attacker and make DPA harder.
I
The second countermeasure is the secure programmable PMU with noise-injection structures. The top-level architecture consists of a six-stage converter and a clocked digital DAC and noise-injection block, which will be implemented using OpenFASOC. In our early-stage exploration we tried to replicate the work in the previous slide: we use the pulse width to make the RRAM cell go from the reset phase to the set phase — there aren't many technologies using RRAM this way.
I
So this is mainly a research project, but we have other options using a fully digital TRNG, which we actually implemented in Intel 16 — as I said, that was the November 2022 tape-out. We already got the chips from Intel — they actually provided us with the socket as well, which was great — and my student is building the PCB so we can start testing this test chip.
I
The second tape-out, which is actually the final version we're planning, uses OpenTitan. We have already taped out this block here: we actually included the TRNGs, the oscillator, the PMU, the PLL and so on, and all of this uses open-source tools, which is really great — except for verification, where we don't have open tools yet at a FinFET technology node.
I
This is the implementation. There is a PLL, as I said, and a TRNG; all of these use OpenROAD for the implementation, since they are digital-friendly. On the next slide I would like to talk about our framework and how we're using Python in OpenFASOC.
I
So I like this slide here from Tim Ansell and Proppy from Google: make custom silicon easier to build, at scale, just like software. Proppy has been doing a great job packaging all these tools, with Antmicro and so on, so that we can bring these tools into our toolchain easily and use them in notebooks and so on.
I
OpenFASOC here is actually mainly code, right? So it's very different from how circuit design usually works — it's kind of confusing for hardware designers — but OpenFASOC is auditable and transparent: no malicious tampering with the generated designs can be done. We also have regression tests, systematic metrics extraction, and dashboards.
I
Automation has been an issue for a long time, and it's really hard to say that analog automation is fixed now, but since we're working in the open, we can combine expertise across EDA, analog circuits and software, so we can fix this problem. So we started exploring new ways of automation, using GDSFactory and so on, for certain blocks — such as the primitive cells we use in our procedural generators — and for analog circuits like transimpedance amplifiers and other circuitry.
I
These require direct control over the layout and are not suitable for the cell-based approach that we usually use in OpenFASOC, so they become the time-limiting step when porting to a new PDK. We want to automate as much as possible while keeping the generators PDK-agnostic. To address this, we are developing programmatic generators employing GDSFactory, which is developed by Joaquin from the Google X photonics team.
I
GDSFactory provides a framework for programmatically building layouts with object-oriented code. The main advantage of this is to keep the cell-generation software independent of the PDK: the PDK in our framework is implemented as a Python class, including the layer stack, design rules, and a library of components such as PCells.
I
The generators are now implemented as Python functions which return a full layout stored in GDSFactory components. These functions can accept arguments for automatic customization, and generators can also call other generators to produce a hierarchical design. The user interfaces with a generator through a Python package: the user can import a PDK package and the target generator, and then, in one line of code, they can actually create a full layout with the programmatic generators.
I
We can achieve quick creation of layouts from Python functions, enabling reuse across different designs with varying design parameters, and here we are automatically generating a current mirror and a comparator.
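The generator pattern described here — a PDK as a Python class, and generators as functions that return components and call other generators — can be sketched in plain Python. This stands in for GDSFactory rather than using its real API: `PDK`, `Component`, `transistor` and `current_mirror` are hypothetical names chosen to illustrate the structure:

```python
from dataclasses import dataclass, field

# Illustrative sketch of a PDK-as-class plus programmatic generators.
# This is NOT the GDSFactory API; all names here are assumptions.

@dataclass
class PDK:
    name: str
    layers: dict        # stand-in for the layer stack
    min_width: float    # stand-in for a design rule

@dataclass
class Component:
    name: str
    polygons: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def add_ref(self, child):
        # reference a sub-component, building a hierarchical design
        self.children.append(child)

def transistor(pdk: PDK, width: float) -> Component:
    # a leaf generator: parameters are clamped against the PDK's rules
    w = max(width, pdk.min_width)
    c = Component(name=f"nmos_w{w}")
    c.polygons.append((pdk.layers["diff"], w))
    return c

def current_mirror(pdk: PDK, width: float) -> Component:
    # generators call other generators to produce hierarchy
    c = Component(name="current_mirror")
    c.add_ref(transistor(pdk, width))
    c.add_ref(transistor(pdk, width))
    return c

# "import a PDK package and the target generator, then one line of code"
sky130 = PDK(name="sky130", layers={"diff": 65}, min_width=0.42)
cm = current_mirror(sky130, width=1.0)
assert len(cm.children) == 2
```

The point of the pattern is that `current_mirror` never hard-codes layer numbers or rules: swapping in a different `PDK` instance re-targets the same generator to another node.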
I
The other work being done here is thanks to Google Summer of Code this year, which allowed us to have three motivated students actively working on implementing features to enable automation and interfacing between tools. Mojito, who made this diagram here, is implementing Python bindings for the front end of the layout-database classes in OpenROAD, as well as the GDS writer and reader, to help it integrate with other tools such as GDSFactory. This is a real enabler for analog automation, since we can build our circuits within OpenROAD rather than outside of it.
I
Now, regarding outreach: earlier this year we were encouraged to submit a proposal to the U.S. Embassy and Consulate in Japan to organize a workshop to train the local high-tech labor force, and last month we were notified that this proposal — which is built around OpenFASOC and uses the open PDK, notebooks and so on — was selected, and they are considering making it a model for the next workshops and training in the Asian region.
I
Next in my talk, I'll discuss activities that utilize SSCS funding. This is our committee here, the IEEE SSCS committee chaired by Professor Boris Murmann, and this is our roster. It is open — we are always adding members, so please reach out if you're interested in joining us. CHIPS Alliance has helped us, through Google and so on, to fund some of these efforts.
I
One of them is the Chipathon, which started in 2021 and which we launched with Professor Boris. We see students from industry and academic communities in the U.S. and all over the world joining us in weekly meetings to learn how to do a tape-out, and they can get travel grants to attend conferences such as ISSCC and so on.
I
The first edition was in 2021. We had about 11 submissions taped out — more submissions, obviously — and as you can see, they are from all over the place, across the globe.
I
The second edition had more tape-outs, around 14, and we had more elaborate circuitry there, and this is really great because it can generate IPs for CHIPS Alliance and our community.
I
But in this year's Chipathon we changed a bit the way we want to run it, by making the teams propose new circuit ideas that could be used to build a lab on a chip. The reason we want to do that is that there are a lot of communities that don't have the equipment to test their chips or their open designs, so having this lab on a chip would allow them to have oscilloscope macros that they can use to test their circuitry.
I
The
other
things
we
started
is
and
to
funding
to
Gypsy
lines
the
coded
chip,
competition,
which
we
had
the
first
version
here
at
iss6203.
We
had
about
seven
winners
from
different
places
across
the
globe,
globe
and
chips.
Alliance
has
funded
it
here,
as
you
can
see
on
the
on
the
picture,
the
next,
the
next
code,
that
your
composition
happened
in
BSI
2023,
we
had
two
winners
from
University
of
Toronto
and
and
and
one
interesting
thing
is.
I
We
started
seeing
analog
circuitry,
combined
with
AI
or
machine
learning,
to
generate
the
circuits
in
a
notebook,
and
that's
thanks
to
the
open
design
and
the
pdks
in
in
the
this
is
an
this
is
the
call
for
the
code
check
competition
we'll
have
a
next
one
in
isscc
2024.
I
The other thing we wanted to do is provide venues for publication, right. Open design is not as leading-edge as closed design, as you know, so it is hard to publish in conferences like ISSCC and VLSI. So with our community we did the first session at ISCAS around open source design, and we did a review of all the silicon results we had in MPW-1 and MPW-2 and so on.
I
The other thing I wanted to mention here is that we had an open source workshop at VLSI which had about 180 attendees. That's unheard of; that's a record at VLSI in the circuits community. It was great to see so many people in Japan join us in our workshop, and I think they are thinking about making this a regular thing that happens every year.
I
So thanks for your time, and please let me know if you have any questions.
A
Thank you so much, Mehdi. Any questions from here in the audience?
F
I
You mean like with the usage, or who's using it?
F
I
Okay, so this is a good question, right. I'm running a research group and I deal with a lot of students. So one thing I'm trying to implement is bringing more software students to hardware, and the way we do that is through abstraction; OpenFASoC allows that through our framework, the cell-based approach. You can check the other talks in CHIPS Alliance that go through that; I don't want to repeat the same talk over again, but the idea is basically: you identify some cells that are basically a block, and if you're a software person, you don't have to understand what's happening in terms of transistors; you can just reuse that block and build the whole topology.
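As a minimal sketch of that cell-based idea (the cell names, pin lists, and dimensions below are made up for illustration; they are not actual OpenFASoC cells or its API), a generator can compose pre-characterized blocks without exposing any transistor-level detail:

```python
# Hypothetical pre-characterized "auxiliary cells"; the names, pins, and
# widths are illustrative only, not real library cells.
AUX_CELLS = {
    "HEADER": {"pins": ["VIN", "VOUT"], "width_um": 12.0},
    "SLC": {"pins": ["IN", "OUT", "CLK"], "width_um": 8.5},
}

def generate_row(cell_name, count):
    """Compose a row of pre-characterized cells into placed instances.

    The caller only chooses a block and a count; everything at the
    transistor level stays hidden inside the cell template.
    """
    template = AUX_CELLS[cell_name]
    return [
        {"name": f"{cell_name.lower()}_{i}",
         "cell": cell_name,
         "x_um": i * template["width_um"]}  # abut cells left to right
        for i in range(count)
    ]

row = generate_row("HEADER", 3)
print([inst["x_um"] for inst in row])  # [0.0, 12.0, 24.0]
```

The point of the design is that changing a parameter regenerates the whole placement, which is what makes the flow feel like software rather than hand layout.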
I
Now, regarding education: it's very good for the students to play with the code and generate, at the end of the day, a circuit that is simulatable. The other thing is we actually work with industry partners, and even government institutions, to create real circuits, so it is being used to generate blocks that are working in silicon, and we publish that through our results. So we address both, I guess.
H
I
To design, so we have plans to make an open SerDes, and actually we just discussed it yesterday with the folks who are working on packaging. We have some students working on that, building everything using open tools now.
I
Making a SerDes using open tools is not easy, but we'll go at it with baby steps, building a PLL and so on, all the building blocks, and when we have it ready I think we're going to use OpenFASoC and port it across technologies. So I think that's in our plans; for now it's taking time and resources.
A
So I had one question. I always enjoy your talks and the work. Oops, we do have a question, but I'll ask my question first. You know, I talked a little bit about this last night: with open source EDA, which is a great effort, it's very similar to what I experienced in my career in terms of developing internal tools within a large semiconductor company, right. It's always hard.
A
The designers hate you, whatever; I mean, it just goes on and on, and I won't go on and on. But in terms of how things are going with industrial adoption of the work that you are doing: do you have anybody who's using this in industry right now, or is that still kind of a challenge?
I
So one thing being used is actually Python scripts and OpenROAD, which is great already. But, you know, SRC is another example: it's an industry-driven thing, and they are interested in our true random number generators and PMUs. And this is the other avenue, right, with the nanofabrication accelerator. So I think this is good adoption; now, you know, we need to disseminate this more, and if there's any interest, please let me know, because we have silicon results and we've shown this across different designs.
A
We have one question online here. Let's see here: do you consider cooperating with other foundries to provide more technologies in this project?
I
I don't quite understand what "provide more technologies" means, but basically supporting new technologies, like what Pragmatic is doing? That's definitely possible, and we ported OpenFASoC to different nodes, so I don't see why that's not going to happen. The other thing is that gdsfactory is actually a really great platform for creating Python scripts to generate layouts. So now we're supporting both, and we're creating these designs that can, you know, support different technologies, actually.
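In the same spirit as the gdsfactory scripting mentioned here (though this sketch is plain Python and deliberately avoids any real gdsfactory API; the device and its numbers are illustrative, not from a real PDK), a parametric generator turns a few numbers into geometry:

```python
def serpentine_resistor(n_turns, seg_len, width, pitch):
    """Return rectangles (x0, y0, x1, y1) for a snake-shaped resistor.

    A parametric layout in plain Python: horizontal bars joined by short
    vertical links on alternating ends, so resistance scales with n_turns.
    """
    rects = []
    for i in range(n_turns):
        y = i * pitch
        rects.append((0, y, seg_len, y + width))  # horizontal bar
        if i + 1 < n_turns:
            x = seg_len - width if i % 2 == 0 else 0  # alternate ends
            rects.append((x, y + width, x + width, y + pitch))  # vertical link
    return rects

shapes = serpentine_resistor(n_turns=4, seg_len=20, width=2, pitch=6)
print(len(shapes))  # 4 bars + 3 links = 7 rectangles
```

Because the layout is just a function of its parameters, retargeting to another technology mostly means feeding in that node's dimensions, which is the porting story described above.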
A
I was also just going to agree, relative to the efforts that you're doing. I know Tim has championed this as well, trying to make hardware design more software-like and approachable to people, right. You know, in my time in this role, chatting with different government entities or regions of the US or the world, it's like we have this global problem of getting people interested in engineering, and I think a lot of that has to do with making it approachable. I was chatting with some folks from Germany last night.
A
G
I
That's a really good point, right, and I think one thing we're trying to address with OpenFASoC is abstraction, because building generators requires expertise, right; like, you can't be a PLL guy and an ADC guy and so on, right. So you have to abstract some of these issues, and that's how we're building OpenFASoC, so people don't have to deal with some of the analog black magic, basically.
A
All right, well, thank you so much for an excellent presentation, Mehdi, and thanks to everyone for presenting today and to everyone who attended, both here in San Francisco at Google and also online. And thanks again to Google for hosting us here today; I appreciate it. We'll work on making the video available, and if the presenters could send me the slides, we will get them out there as well. So thank you again, everybody.