
Graphcore and the Problem with First-Party Benchmarks

For a time, I seemed to collect degrees, and my first was in Marketing. Before I moved into technology, I'd been both a Marketing Director and a board member of a national marketing association. Over the years, I've found that engineering-driven firms often don't understand marketing, which they tend to dismiss as a soft science or, frankly, as untruthful. In fact, at IBM, one of the most ethical companies globally, I had a conversation with a Director of Marketing who explained that marketing at IBM (at the time) was like selling air: people had no choice but to buy your product, so you could tell them anything you wanted. That conversation resulted from my complaining that we were promising governments products we had no plan to build, just to keep a competitor from gaining share. I thought at the time that this was likely to end badly, and it eventually did (my entire upline was fired a few months later, after I had resigned and become an analyst).

This series of events came to mind when I got a chance to review some benchmarks put forth by Graphcore, a new engineering-driven company bringing to market what looks to be a revolutionary AI (artificial intelligence) training and inference solution. It also reminded me of the story of when Steve Jobs brought NeXT to market and used fabricated videos to convince investors he had working code (which he didn't). I'm still kind of surprised Jobs didn't end up in jail for that.

Let’s talk about benchmarks this week and why you need to be careful when you see benchmarks from any company, let alone a young engineering-driven firm like Graphcore.

Graphcore’s Creative Benchmarks

No one publishes benchmarks that make them look worse than the competition, and the goal of the internal group doing the benchmarking isn't accuracy but demonstrating technology dominance. This practice will always bias the result. Still, large companies often have controls, like legal and internal audit departments, that offset this bias because bad benchmarks can lead to lawsuits from customers and investors.

I only trust third-party benchmarks, where I can both assure myself of the benchmarking firm's independence and review its methodology. But I will consider benchmarks from a large public company because I know the control structure inside that kind of firm, having worked for several of them over the years.

Graphcore isn't yet selling its solution, which is a red flag because it means no third-party buyer has yet been able to verify that the solution even works, let alone that it will massively outperform anything else in the market, as the company alleges.

Looking at their benchmarks, they did several creative things, like comparing a significantly more expensive, higher-performing part to a lower-performing and far cheaper (about one-third the cost) part from a competitor. In addition, they altered the test set on other benchmarks and drastically reduced the batch size (down to a single, impractically small image) to appear faster.
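To make the batch-size and pricing tricks concrete, here is a minimal, hypothetical sketch in Python. The timings, prices, and throughput figures are all invented for illustration and reflect no one's actual tests, Graphcore's or otherwise; the point is only how quoting latency at a batch size of one, or ignoring price, can flatter a part.

```python
import time
import numpy as np

def run_inference(batch):
    """Stand-in for a real accelerator call; it just burns time according to a
    made-up cost model: a fixed per-call overhead plus a per-image cost."""
    time.sleep(0.002 + 0.0005 * len(batch))
    return np.zeros(len(batch))

def measure(batch_size, n_runs=20):
    batch = np.zeros((batch_size, 224, 224, 3), dtype=np.float32)
    run_inference(batch)                      # warm-up run, discarded
    start = time.perf_counter()
    for _ in range(n_runs):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    latency_ms = 1000 * elapsed / n_runs
    throughput = batch_size * n_runs / elapsed
    return latency_ms, throughput

# Quoting only batch-1 latency minimizes the per-image number but leaves the
# accelerator mostly idle; realistic serving and training loads batch work up.
for bs in (1, 32, 128):
    lat, tput = measure(bs)
    print(f"batch={bs:4d}  latency={lat:7.1f} ms  throughput={tput:8.0f} img/s")

# Normalizing by list price (both prices assumed) shows why a raw comparison
# against a competitor costing roughly a third as much is misleading on its own.
price_new, price_competitor = 9_000, 3_000    # hypothetical dollars
tput_new, tput_competitor = 2_400, 1_000      # hypothetical images/second
print("img/s per dollar:", tput_new / price_new, "vs", tput_competitor / price_competitor)
```

The specific numbers don't matter; what matters is that batch-1 latency and raw throughput against a much cheaper part each tell a flattering but incomplete story.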

In this space (AI training), they should be using MLPerf, a peer-reviewed benchmark, and have an independent third party do the work so that the results are defensible. Doing this themselves, even if they had been fair in their approach, would still have produced a result few would believe. But by not using a third party, their team opened themselves up to challenge, and, at least in my review, they failed that challenge.

Now, there are rules I'm sure most of you know when it comes to benchmarking. The first is that you need to do your own benchmarks with your own workloads, because packaged benchmarks often don't reflect how you will use the part (a minimal sketch of such a harness follows below). Second, when considering a solution, any vendor that doesn't have third-party benchmarks that can be validated and repeated should be removed from the RFP list as a waste of your time and effort. And third, particularly with new firms, unless they compensate you for the risk, don't be eager to be the first customer. The saying is that pioneers get the arrows and settlers get the land. Let someone else plow this field unless the firm is compensating you for your risk.
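Along the lines of that first rule, here is a minimal sketch of the kind of harness I mean. It assumes nothing about any particular vendor or benchmark suite; it only assumes you can wrap your own workload in a callable, discard warm-up runs, and report the median and spread so someone else can repeat the measurement.

```python
import statistics
import time

def benchmark(workload, warmup=3, runs=20):
    """Minimal harness: run your own workload, not a vendor's demo.

    `workload` is any zero-argument callable that exercises the part the way
    you actually intend to use it (your models, your data, your batch sizes).
    """
    for _ in range(warmup):          # discard cold-start effects
        workload()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
        "runs": runs,
    }

# Hypothetical usage: wrap whatever your real job is in a callable.
if __name__ == "__main__":
    result = benchmark(lambda: sum(i * i for i in range(1_000_000)))
    print(result)
```

Reporting the run count and spread alongside the median is what makes the result repeatable and, therefore, defensible when a vendor disputes it.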

Wrapping Up

Vendor-sourced benchmarks are biased by nature and, particularly at small companies, lack the oversight that would make them trustworthy. If you are going to use generic benchmarks, make sure they are peer-reviewed and industry-accepted, like MLPerf. New companies come with risk, and you are entitled to a cost offset and incentives for taking that risk and becoming an advocate if the product works. Otherwise, wait until a trusted third party validates the solution, so you get the land and not the arrows.

Finally, and I'm speaking to those who do benchmarks here: faking benchmarks can get you into trouble with organizations like the SEC, which have no sense of humor. Pay the money for unbiased third-party benchmarks, and if your product isn't competitive, fix it. Don't misrepresent it, because fraud can land you in a world of trouble. It is far safer to delay a product release until the product performs competitively than to lie to your first customers. In this age of social media, a customer on the wrong side of an over-promise can destroy your brand far too quickly and easily.
