Where do we stand on benchmarking the D-Wave 2?
Since there’s a fresh round of discussion on the performance of the D-Wave 2 processor, we thought it might be helpful to give an interim report on where we stand with benchmarking.
Let’s quickly review how the chip is programmed. See Figure 1 in the slideshow. You program the connection strengths between variables represented by the qubits to define what mathematicians call a quadratic optimization problem. Then you ask the machine to return the optimal solution.
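The post does not spell out the math. In the standard formulation (our notation), each qubit represents a spin s_i ∈ {−1, +1}. You program local biases h_i and couplings J_ij, and the machine looks for the configuration that minimizes the Ising energy E(s) = Σ_i h_i s_i + Σ_(i,j) J_ij s_i s_j. Substituting x_i = (1 + s_i)/2 gives the equivalent quadratic problem over binary variables. The minimal Python sketch below uses function names of our own. It evaluates this energy and finds the optimum by exhaustive search, which only works for a handful of variables because the number of configurations doubles with every added spin.

```python
import itertools
from typing import Dict, List, Tuple

Couplings = Dict[Tuple[int, int], float]


def ising_energy(h: Dict[int, float], J: Couplings, s: List[int]) -> float:
    """E(s) = sum_i h_i s_i + sum_(i,j) J_ij s_i s_j, with every s_i in {-1, +1}."""
    field_term = sum(h_i * s[i] for i, h_i in h.items())
    coupling_term = sum(J_ij * s[i] * s[j] for (i, j), J_ij in J.items())
    return field_term + coupling_term


def brute_force_ground_state(h: Dict[int, float], J: Couplings, n: int) -> Tuple[List[int], float]:
    """Try all 2**n spin configurations; only practical for small n."""
    best_s, best_e = None, float("inf")
    for config in itertools.product((-1, 1), repeat=n):
        e = ising_energy(h, J, list(config))
        if e < best_e:
            best_s, best_e = list(config), e
    return best_s, best_e


# Example: a frustrated triangle with antiferromagnetic couplings (J > 0).
h = {}
J = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
s, e = brute_force_ground_state(h, J, 3)
print(s, e)  # prints [-1, -1, 1] -1.0 (one of six degenerate ground states)
```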
Since there’s an astronomical number of different problem instances you could program the chip to solve, it’s impossible to check the performance on all of them -- you need to look at subsets. So what are good sets of instances to study? If you don’t really know where to start looking, you might as well just pick instances at random, measure relative performance, and see what happens.
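To make "pick instances at random" concrete, here is a sketch that draws random ±1 couplings on the sparse layout D-Wave calls the Chimera graph. That layout is an m × m grid of unit cells, each a complete bipartite graph K_{4,4}, with neighboring cells linked along rows and columns. The qubit-indexing scheme and function names are illustrative choices of our own. A full 8 × 8 lattice has 512 qubits and 1,472 couplers, and the sketch ignores that a real chip may have a few inactive qubits. No qubit has more than six neighbors, which is the sparse connectivity discussed throughout this post. Even with couplings restricted to ±1, there are 2^1472 distinct instances, which shows why only subsets can ever be tested.

```python
import random
from typing import Dict, List, Optional, Tuple


def chimera_edges(m: int, t: int = 4) -> List[Tuple[int, int]]:
    """Couplers of an m x m Chimera lattice built from K_{t,t} unit cells."""

    def q(row: int, col: int, side: int, k: int) -> int:
        # Index of qubit k on the given side (0 or 1) of unit cell (row, col).
        return ((row * m + col) * 2 + side) * t + k

    edges = []
    for row in range(m):
        for col in range(m):
            # Complete bipartite coupling inside the unit cell.
            for a in range(t):
                for b in range(t):
                    edges.append((q(row, col, 0, a), q(row, col, 1, b)))
            for k in range(t):
                # Side-0 qubits couple to the cell below, side-1 qubits to the cell on the right.
                if row + 1 < m:
                    edges.append((q(row, col, 0, k), q(row + 1, col, 0, k)))
                if col + 1 < m:
                    edges.append((q(row, col, 1, k), q(row, col + 1, 1, k)))
    return edges


def random_instance(edges: List[Tuple[int, int]], seed: Optional[int] = None) -> Dict[Tuple[int, int], int]:
    """Assign each coupler J_ij = +1 or -1 uniformly at random (no local fields)."""
    rng = random.Random(seed)
    return {edge: rng.choice((-1, 1)) for edge in edges}


edges = chimera_edges(8)
J = random_instance(edges, seed=1)
print(len(edges))  # 1472 couplers on 8 * 8 * 8 = 512 qubits
```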
But a more pragmatic approach is to study problems that arise in practical applications. At this stage we’re mostly interested in answering one question: Can we find a set of problems where the hardware outperforms the best known algorithms running on classical hardware? Since quantum optimization processors are still evolving rapidly, we’re less interested in absolute runtimes; rather, we want to see how the runtime scales as the number of variables increases.
The hardware outperforms off-the-shelf solvers by a large margin
In an early test we dialed up random instances and pitted the machine against popular off-the-shelf solvers: Tabu Search, Akmaxsat and CPLEX. At 509 qubits, the machine is about 35,500 times (!) faster than the best of these solvers. (You may have heard about a 3,600-fold speedup earlier, but that was on an older chip with only 439 qubits. We got both numbers using the same protocol.)
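The stated goal is to compare how runtimes scale with the number of variables rather than their absolute values. It helps to see what such a comparison can look like. One simple approach, assumed here purely for illustration, is to model the runtime as T(N) ≈ a·e^{bN}. You then estimate the exponent b by a least-squares fit of log T against N. A solver with a smaller b scales better, whatever its prefactor a. The function name is ours, and the data in the example are synthetic, not measurements from the hardware or from any solver mentioned here.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_scaling(sizes: Sequence[float], runtimes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(T) = log(a) + b * N; returns (a, b)."""
    if len(sizes) != len(runtimes) or len(sizes) < 2:
        raise ValueError("need at least two (size, runtime) pairs")
    logs = [math.log(t) for t in runtimes]
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, logs))
    b = sxy / sxx  # growth rate of log-runtime per added variable
    log_a = mean_y - b * mean_x
    return math.exp(log_a), b


# Synthetic example: runtimes generated from T = 2e-6 * exp(0.05 * N) seconds.
sizes = [100, 200, 300, 400, 500]
runtimes = [2e-6 * math.exp(0.05 * size) for size in sizes]
a, b = fit_exponential_scaling(sizes, runtimes)
print(f"a = {a:.2e}, b = {b:.4f}")
```

In practice, one would fit a robust statistic of the runtime, such as the median over many instances at each size, rather than single runs.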
While this is an interesting baseline, these competitors are general-purpose solvers. You can create much tougher classical competition by writing highly optimized code that accounts for the sparse connectivity structure of the current D-Wave chip.
Two world-class teams have done that. One is a team at ETH Zurich led by Matthias Troyer, considered to be one of the world’s strongest computational physicists. With help from Nvidia, his team managed to write classical simulated annealing code running on GPUs that achieves an incredible 200 spin updates per nanosecond. The other tailor-made classical competitor was written by Alex Selby. You may recall he won £1 million for cracking the Eternity puzzle.
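For readers unfamiliar with simulated annealing, here is a minimal single-spin-flip Metropolis version for the Ising energy E(s) = Σ h_i s_i + Σ J_ij s_i s_j, in plain Python. It sketches the textbook algorithm only and is not the ETH team's GPU code. The linear schedule in inverse temperature β and all parameter defaults are our own assumptions. Each update computes the energy change ΔE = −2 s_i (h_i + Σ_j J_ij s_j) of flipping spin i. It then accepts the flip with probability min(1, e^{−β ΔE}).

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def simulated_annealing(
    h: Dict[int, float],
    J: Dict[Tuple[int, int], float],
    n: int,
    sweeps: int = 1000,
    beta_start: float = 0.1,
    beta_end: float = 5.0,
    seed: Optional[int] = None,
) -> List[int]:
    """Single-spin-flip Metropolis annealing; returns the final spin configuration."""
    rng = random.Random(seed)
    # Adjacency list, so each update only touches the (few) neighbors of spin i.
    neighbors: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (i, j), w in J.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    s = [rng.choice((-1, 1)) for _ in range(n)]
    for sweep in range(sweeps):
        # Linearly increase the inverse temperature from beta_start to beta_end.
        beta = beta_start + (beta_end - beta_start) * sweep / max(1, sweeps - 1)
        for i in range(n):
            local_field = h.get(i, 0.0) + sum(w * s[j] for j, w in neighbors[i])
            delta_e = -2.0 * s[i] * local_field  # energy change if s[i] is flipped
            if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                s[i] = -s[i]
    return s


def ising_energy(h, J, s):
    return sum(v * s[i] for i, v in h.items()) + sum(w * s[i] * s[j] for (i, j), w in J.items())


# Example: a ferromagnetic chain of 10 spins (J < 0 favors aligned neighbors).
chain = {(i, i + 1): -1.0 for i in range(9)}
s = simulated_annealing({}, chain, 10, seed=0)
print(s, ising_energy({}, chain, s))
```

Fast implementations run this same inner loop, with data layouts tuned to the hardware.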
Alex devised a smart large-neighborhood search that improves subsets of the 509-variable string while keeping the complement fixed. The trick is to use only subsets that lie on tree-structured graphs. These tree-structured neighborhoods can be searched in linear time using dynamic programming. Because of the sparse connectivity, these neighborhoods can be very large, covering up to 80% of all variables. This makes the solver very powerful.
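The key step in such a solver works as follows. Once every variable outside the chosen subset is frozen, the frozen neighbors only contribute extra local fields. The remaining problem on a tree can then be minimized exactly by a dynamic-programming pass from the leaves to the root, followed by a pass back down. The sketch below is our own illustration of that idea, not Alex Selby's code (see note 3 for his description), and the function name is hypothetical. It assumes the Ising form E(s) = Σ h_i s_i + Σ J_ij s_i s_j and raises ValueError if the chosen variables do not induce a forest. The work grows linearly with the number of variables and couplers involved.

```python
from typing import Dict, List, Sequence, Tuple


def optimize_tree_neighborhood(
    h: Dict[int, float],
    J: Dict[Tuple[int, int], float],
    s: Sequence[int],
    subset: Sequence[int],
) -> List[int]:
    """Exactly minimize the energy over the spins in `subset` (which must induce a
    forest), keeping all other spins of `s` fixed. Returns a new spin list."""
    T = set(subset)
    adj: Dict[int, List[Tuple[int, float]]] = {i: [] for i in T}
    field = {i: h.get(i, 0.0) for i in T}
    for (i, j), w in J.items():
        if i in T and j in T:
            adj[i].append((j, w))
            adj[j].append((i, w))
        elif i in T:
            field[i] += w * s[j]  # a frozen neighbor acts as an extra local field
        elif j in T:
            field[j] += w * s[i]

    new_s = list(s)
    visited = set()
    for root in subset:
        if root in visited:
            continue
        # Breadth-first search assigns a parent to every node of this tree.
        order, parent = [root], {root: None}
        children: Dict[int, List[Tuple[int, float]]] = {root: []}
        visited.add(root)
        for u in order:  # `order` grows while we iterate over it
            for v, w in adj[u]:
                if v == parent[u]:
                    continue
                if v in visited:
                    raise ValueError("subset does not induce a forest")
                visited.add(v)
                parent[v] = u
                children[v] = []
                children[u].append((v, w))
                order.append(v)

        # Leaves-to-root pass: cost[u][x] = lowest energy of u's subtree given s_u = x.
        cost: Dict[int, Dict[int, float]] = {}
        best_child: Dict[Tuple[int, int], int] = {}
        for u in reversed(order):
            cost[u] = {}
            for x in (-1, 1):
                total = field[u] * x
                for c, w in children[u]:
                    y = min((-1, 1), key=lambda val: cost[c][val] + w * x * val)
                    best_child[(c, x)] = y
                    total += cost[c][y] + w * x * y
                cost[u][x] = total

        # Root-to-leaves pass: read off the optimal assignment.
        new_s[root] = min((-1, 1), key=lambda x: cost[root][x])
        for u in order[1:]:
            new_s[u] = best_child[(u, new_s[parent[u]])]
    return new_s


# Example: optimize the chain 0-1-2 while spin 3 stays frozen at -1.
J = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): -1.0}
print(optimize_tree_neighborhood({}, J, [1, 1, 1, -1], [0, 1, 2]))  # [-1, 1, -1, -1]
```

A full large-neighborhood search would call such a routine repeatedly on different tree-shaped subsets and keep any improvement.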
Both authors were kind enough to share their code with our team. In fact, Matthias’s postdoc Sergei Isakov wrote the fast annealing codes and is now a member of our group.
A portfolio of custom solvers designed to beat the hardware on its own turf is competitive
So what do we get if we pit the hardware against these solvers, which were designed to compete with the D-Wave hardware on its own turf? The following pattern emerges: for each classical solver, there are problems for which it wins or at least achieves similar performance. But the converse is also true: for each classical solver, there are problems for which the hardware does much better.
For example, if you use random problems as a benchmark, the fast simulated annealers take about the same time as the hardware. See Figure 2 in the slideshow.
But importantly, if you move to problems with structure, then the hardware does much better. See Figure 3. This example is intriguing from a physics perspective, since it suggests co-tunneling is helping the hardware figure out that the spins in each unit cell have to be flipped as a block to reach a lower-energy state.
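Since no single classical solver wins everywhere, the natural classical counter-move is to run every solver on each instance and keep the lowest-energy answer. The sketch below shows only that bookkeeping. The solver interface (a function taking h, J and the number of spins and returning a spin list) and all names are hypothetical. The two toy "solvers" in the example merely stand in for real ones such as annealers or a large-neighborhood search. Each solver is timed, so you can see per instance which one won and how long it took.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Couplings = Dict[Tuple[int, int], float]
Solver = Callable[[Dict[int, float], Couplings, int], List[int]]


def ising_energy(h: Dict[int, float], J: Couplings, s: List[int]) -> float:
    return sum(v * s[i] for i, v in h.items()) + sum(w * s[i] * s[j] for (i, j), w in J.items())


def portfolio_solve(
    h: Dict[int, float], J: Couplings, n: int, solvers: Dict[str, Solver]
) -> Tuple[Optional[str], Optional[List[int]], float, Dict[str, float]]:
    """Run every solver on the same instance; return (best_name, best_s, best_e, timings)."""
    best_name, best_s, best_e = None, None, float("inf")
    timings: Dict[str, float] = {}
    for name, solver in solvers.items():
        start = time.perf_counter()
        s = solver(h, J, n)
        timings[name] = time.perf_counter() - start  # wall-clock seconds for this solver
        e = ising_energy(h, J, s)
        if e < best_e:
            best_name, best_s, best_e = name, s, e
    return best_name, best_s, best_e, timings


# Toy example: two trivial "solvers" on an antiferromagnetic pair of spins.
solvers = {
    "all_up": lambda h, J, n: [1] * n,
    "alternating": lambda h, J, n: [1 if i % 2 == 0 else -1 for i in range(n)],
}
name, s, e, timings = portfolio_solve({}, {(0, 1): 1.0}, 2, solvers)
print(name, s, e)  # prints alternating [1, -1] -1.0
```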
But if we form a portfolio of the classical solvers and keep the best solution across all of them, then this portfolio is still competitive with the current version of the hardware. Again, a good example is the structured problem in Figure 3 in the slideshow. It slows down the annealers, but Alex Selby’s code has no problem with it and obtains the solution about as fast as the hardware does.
Sparse connectivity is a major limitation
A principal reason the portfolio solver is still competitive right now is actually rather mundane -- the qubits in the current chip are still only sparsely connected. As the connectivity in future versions of quantum annealing processors gets denser, approaches such as Alex Selby’s will be much less effective.
Further evidence that sparse connectivity is a culprit comes from well-understood examples such as the “Hamming weight function with a barrier” problem: quantum annealing tackles it easily, but classical annealing fails. However, the sparse connectivity has so far prevented us from implementing such examples as benchmark problems.
Future iterations will also need to improve a number of other hardware aspects that still limit performance: reducing control errors, lengthening coherence times, adding error correction, providing richer non-stoquastic couplings between qubits, and more.
A big data approach may lead to new conclusions
So will we have to wait for the next-generation chip with higher connectivity before we can hope to see the hardware outperform the portfolio solver? Until very recently we thought so. But remember that these latest benchmarking results were obtained from relatively small datasets: just 1,000 instances in the studies that got recent attention.
It’s easy to draw premature conclusions from such small sets, because they contain too few instances from the subsets of problems that might show a speedup. Moreover, as several groups independently discovered, such random problems tend to be too easy and don’t challenge either the quantum hardware or the classical solvers.
Ever since the D-Wave 2 machine became operational at NASA Ames, the head of our benchmarking efforts, Sergio Boixo, made sure we used every second of machine time to take data from running optimization problems. Simultaneously we gave the same problems to a portfolio of the best classical solvers we’re aware of. We now have data for 400,000 problem instances. This is the largest set collected to date, and it keeps growing.
Eyeballing this treasure trove of data, we’re now trying to identify a class of problems for which the current quantum hardware might outperform all known classical solvers. But it will take us some time to publish firm conclusions, because, as the recent work of Rønnow et al. shows, you have to carefully exclude a number of factors that can mask or fake a speedup.
So stay tuned!
--
1. Engineers from IBM, the maker of CPLEX, reported that they tuned the CPLEX parameters to perform better on this task, but the solver was still several hundred times slower than the hardware. See the paper here: http://goo.gl/5nVFHH
2. McGeoch and Wang 2013: http://goo.gl/djchcX
3. For a detailed description of Alex Selby’s code, see here: http://goo.gl/bhxp9y
4. You can also enhance the simulated classical annealers with so-called cluster updates. But if you customize the annealers to be faster for structured problems, they’ll be slower on the random instances.
5. The problem we refer to is discussed in section III B of a paper by Ben Reichardt (http://goo.gl/9PwOhu). It is a variant of a problem originally proposed by Edward Farhi.
6. Figure 5 in a reference by Katzgraber and Hamze (http://goo.gl/ZW0b0v) illustrates why this is the case. Random problems tend to have a global minimum that can be reached without having to traverse high energy barriers.