DAC 2017: Machine Learning Powered by EDA
So, what is machine learning? Briefly: given a set of features, the X's, we want to predict an outcome, Y.
I’ll mostly be talking about supervised learning, and at times relating it to characterization prediction.
There are two types of supervised learning.
- Regression, where we want to predict a numerical value. Let's say we have a memory described by its words, bits, and muxing. We might want to predict its timing or power value.
- Then there's classification, where we want to predict one of a few discrete outcomes. Again, let's say we have SPICE models for voltage and temperature. This time, we might want to predict whether a circuit is going to pass or fail.
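To make the two flavors concrete, here's a minimal sketch using scikit-learn. The talk doesn't name a library, and the tiny memory dataset below is invented purely for illustration:

```python
# Sketch: the two flavors of supervised learning, using scikit-learn.
# The "memory compiler" data here is made up for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Features: [words, bits, mux] for a handful of memory instances.
X = np.array([[256, 32, 4], [512, 32, 8], [1024, 64, 8], [2048, 64, 16]])

# Regression: predict a numeric value (e.g., leakage power in uW).
y_power = np.array([1.2, 2.1, 4.3, 8.8])
reg = LinearRegression().fit(X, y_power)
power_est = reg.predict([[768, 32, 8]])     # a continuous prediction

# Classification: predict one of a few outcomes (e.g., pass/fail at a corner).
y_passfail = np.array([1, 1, 1, 0])         # 1 = pass, 0 = fail
clf = LogisticRegression().fit(X, y_passfail)
verdict = clf.predict([[768, 32, 8]])       # a discrete prediction
```

The only structural difference between the two is the target: a continuous value for regression, a discrete label for classification.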
There are several machine learning algorithms that can solve these problems.
Over the next few slides, I'll describe a handful and give some pros and cons. Unfortunately, there's no magic bullet, or as they say, no free lunch. Many of you have probably experienced the use of lookup tables, maybe in characterization estimators or Fast SPICE simulators.
Lookup tables bear a very close resemblance to a machine learning algorithm called k-Nearest Neighbors. But there are other algorithms that can perform even better, particularly when the data is complex.
Where do I see machine learning having the most potential in EDA? I think Solido has already proven it with their variation-aware design and high-sigma analysis; they've been using machine learning there for years.
Fast SPICE simulators: maybe we can replace those lookup tables with a more continuous learner.
The area I'm most familiar with is characterization, where we can use it for power or timing estimation of memories or logic gates. Doing this reduces the uncertainty, the padding, that people like to apply, and hopefully gives you a more competitive product.
I was able to leverage machine learning to create a memory selection tool. If you have a really accurate estimator, you can build what I call an efficient frontier of memory selection. Basically, each dot represents a portfolio of memories for a given area/power trade-off for that chip.
Now, when people select memories for a chip, they usually end up somewhere in the interior of this graph, and they're not going to have the most efficient selection.
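The efficient-frontier idea can be sketched as a simple Pareto filter. The portfolio numbers below are invented; in practice each point would come from the ML-based estimator:

```python
# Sketch: an "efficient frontier" of memory portfolios as a Pareto filter.
# Each candidate portfolio has a total (area, power); keep only points
# that no other point beats on both axes. Data below is invented.
def pareto_frontier(points):
    """Return the (area, power) points not dominated by any other point."""
    frontier = []
    for a, p in points:
        dominated = any(a2 <= a and p2 <= p and (a2, p2) != (a, p)
                        for a2, p2 in points)
        if not dominated:
            frontier.append((a, p))
    return sorted(frontier)

portfolios = [(10, 5.0), (12, 3.0), (11, 4.5), (15, 2.8), (13, 4.0)]
front = pareto_frontier(portfolios)   # (13, 4.0) is dominated by (12, 3.0)
```

Points in the interior of the graph (like the dominated one here) are exactly the inefficient selections mentioned above.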
Let me start the survey of machine learning algorithms by describing one I've already mentioned, k-Nearest Neighbors. The way it works is basically this: you have golden data that you store in a database. When you want to make a new prediction, you query this database to find the closest matches, the nearest neighbors, and return their mean or weighted mean.
Now, while this method is pretty intuitive and easy to understand, it suffers from what is called the curse of dimensionality. What that means is, as the number of input features grows, the amount of golden data you need grows exponentially.
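Here's a minimal sketch of that lookup-table analogy using scikit-learn's k-NN regressor. The golden data is invented; a real table would come from characterization runs:

```python
# Sketch: k-Nearest Neighbors as a "smarter lookup table".
# The golden data below is invented; in practice it comes from SPICE.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# "Golden" characterization data, stored up front like a lookup table.
X_golden = np.array([[256, 32], [512, 32], [1024, 64], [2048, 64], [4096, 128]])
y_golden = np.array([1.0, 1.8, 3.9, 7.5, 16.2])   # e.g., read power

# Query: find the k closest stored points, return a distance-weighted mean.
knn = KNeighborsRegressor(n_neighbors=2, weights="distance")
knn.fit(X_golden, y_golden)
estimate = knn.predict([[800, 48]])   # interpolates between stored neighbors
```

Unlike a fixed-grid table, the "interpolation" here works for any query point, but it still needs dense golden data as the feature count grows.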
Here's a plot of predictions from a k-Nearest Neighbors model on the left, shown as predicted vs. actual, versus an advanced machine learning model on the right. The one on the right is clearly what we want to achieve, where all the points line up on the line.
The next model, which I suspect people are pretty familiar with, is linear regression. It can do a pretty good job if the following conditions are met:
- The feature space is rich enough. What that means is that often we need to derive additional features through interactions or other basis functions to model non-linearities.
- We should standardize our inputs. What that means is make sure the inputs are all of a similar range.
- We should also use regularization for feature selection.
Unfortunately, most of the time these things are not done, and linear regression doesn’t perform very well on new samples.
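As a sketch, here is what linear regression looks like when all three conditions are honored, using a scikit-learn pipeline. The data is synthetic, and the specific basis functions are just one reasonable choice:

```python
# Sketch: linear regression done "properly" per the three conditions above.
# Pipeline: derive interaction features, standardize, regularize (Lasso).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(200, 3))            # three synthetic features
y = 2 * X[:, 0] * X[:, 1] + 0.5 * X[:, 2] ** 2   # non-linear ground truth

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # richer feature space
    StandardScaler(),                                  # similar input ranges
    Lasso(alpha=0.01),                                 # regularization
)
model.fit(X, y)
r2 = model.score(X, y)   # near 1.0: derived features capture the non-linearity
```

Drop any one of the three pipeline stages and the fit on data like this degrades noticeably, which matches the point above about why plain linear regression often disappoints.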
The next one is the decision tree; I think people are pretty familiar with this one. Think of the game 20 Questions. It's a pretty intuitive algorithm, although often it's too simplistic to model complex data. Like linear regression, it can do well on the training data, but not so well on new sample data.
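A quick sketch of the 20-questions analogy, again with invented data; scikit-learn's `export_text` prints the literal questions the tree learned to ask:

```python
# Sketch: a decision tree is a learned game of 20 questions.
# The tiny dataset is invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

X = np.array([[256, 32], [512, 32], [1024, 64], [2048, 64]])
y = np.array([1.0, 1.8, 3.9, 7.5])   # e.g., read power

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["words", "bits"]))  # the "questions"
pred = tree.predict([[300, 32]])     # answer the questions, land in a leaf
```

With only a couple of levels of questions, each leaf is a constant, which is exactly why a lone tree is often too simplistic for complex data.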
Now here's a model that will actually perform well: the ensemble method. It combines many learners into a meta-learner, and it will often improve the accuracy of new-sample prediction, which is our goal after all. Examples of ensemble methods include bagging/random forests and boosting. The downside is that the models can be pretty large, so if you need to persist one, the disk space required can be substantial.
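A sketch of the ensemble trade-off, using a random forest on synthetic data; pickling the fitted model shows the disk-space downside directly:

```python
# Sketch: an ensemble (random forest) averages many decision trees.
# Pickling the model illustrates the size downside: every tree persists.
import pickle
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2]   # synthetic non-linear target

forest = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)
size_bytes = len(pickle.dumps(forest))   # hundreds of trees -> large artifact
```

Accuracy is excellent, but the serialized forest runs to megabytes even on this toy problem, because every one of the 200 trees is stored.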
Of course, I'd be remiss if I didn't mention artificial neural networks, or deep learning. This is everyone's favorite buzzword. These are excellent at finding those undiscovered features, those non-linearities, to model. But a neural network is a black box: it's hard to interpret and can take a long time to train. The upside, though, is that once you have a model, you can persist it pretty compactly on disk.
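And a sketch of the neural network trade-off on synthetic data: the entire model reduces to a few small weight matrices, which is why it persists compactly regardless of how much data it was trained on:

```python
# Sketch: a small neural network. Opaque to interpret, but the whole model
# is just the weight matrices, so it persists compactly. Data is synthetic.
import pickle
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.tanh(X[:, 0]) + X[:, 1] * X[:, 2]   # synthetic non-linear target

net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=2)
net.fit(X, y)                               # training is the slow part
n_weights = sum(w.size for w in net.coefs_) # the model IS these weights
size_bytes = len(pickle.dumps(net))         # small, fixed-size artifact
```

Contrast this with the k-NN approach, where the "model" is the entire golden database, or the forest, where it is every tree.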
So, here's a summary table of a memory leakage prediction using the five models I just talked about, roughly ranked from worst to best performing for this example. The two at the bottom in green, ensembles and neural networks, performed the best by far. You can see that from the metrics in the middle columns.
Now, before I talk about my experience with Solido's tools, I wanted to take a step back and give a little background on why I believe in their approach.
About five years ago I became interested, most would say obsessed, with machine learning. I created a machine learning-based characterization estimator which was much more accurate than our existing kNN estimator. Out of that came two critical tools:
- First was a memory selector: given a physical memory's words and bits requirement, it would return the best candidate out of thousands of choices.
- Following on that, I created a memory optimization tool. You describe a chip in terms of its logical memories, and out of all the memories on the chip, it returns the best combination of physical memories. This resulted in huge area and power savings for the chip teams, which made them happy, as you can imagine.
At about the same time, I had a really good discussion with Jeff Dyck [Solido], who you'll hear from [later], about Solido's FFX regression technology. FFX is a form of regularized linear regression with a bunch of crazy feature engineering. And I mean a bunch.
It was very similar to the first machine learning models I had developed for my estimator. With that inside knowledge, I became 100% sold on their methodology. Here’s some hands-on experience I’ve had with their tools:
1. [Solido's] High Sigma Monte Carlo tool, which I've used for bit cell, sense amp, and critical circuit analysis. I've used both modes:
- A regression mode, for when the output variable is continuous and you want to, say, predict bit cell current or sense amp differential.
- A classification mode, for when you're looking for a failure boundary, such as bit cell stability.
This tool is absolutely required for design signoff.
2. Another must-have tool for memory design is [Solido's] Hierarchical Monte Carlo tool. In memories, we have circuits ranging from 3-4 sigma problems up to 6-7 sigma problems. With this tool, you can basically verify the entire circuit without overdesign. It's much less pessimistic than having to verify the entire critical path to the bit cell's 6-7 sigma requirement.
3. Finally, there's one of their newer tools, called ML Char [ML Characterization Suite].
This is similar to the work I've done in the past, but on a much grander scale. One of the things it can do is corner prediction: given a set of basis corners, it can fill out the corner portfolio, basically saving days and days of simulation time.
Other potential applications for this tool, I think, are slew/load table population, creating a standalone estimator, or filling out that existing lookup-table estimator at a much finer grid.