Where do we stand on benchmarking the D-Wave 2?

Since there’s a fresh round of discussion on the performance of the D-Wave 2 processor, we thought it might be helpful to give an interim report on where we stand with benchmarking.

Let’s quickly review how the chip is programmed. See Figure 1 in the slideshow. You program the connection strengths between variables represented by the qubits to define what mathematicians call a quadratic optimization problem. Then you ask the machine to return the optimal solution.
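
To make this concrete, here's a toy sketch in Python (illustrative only -- this is not D-Wave's programming interface) of the kind of objective being set up: you choose local fields and pairwise coupling strengths, and the task is to find the assignment of ±1 spin variables with the lowest energy. The brute-force search below works for a handful of variables but is hopeless at 509, which is exactly why fast solvers and specialized hardware are interesting.

```python
import itertools

# Toy Ising objective: local fields h[i] and couplings J[{i, j}] define the
# problem; the goal is the assignment of spins s[i] in {-1, +1} with the
# lowest energy E(s) = sum_i h[i]*s[i] + sum_{i,j} J[{i,j}]*s[i]*s[j].
def energy(s, h, J):
    e = sum(h[i] * s[i] for i in h)
    for edge, coupling in J.items():
        i, j = tuple(edge)
        e += coupling * s[i] * s[j]
    return e

def brute_force_ground_state(h, J):
    # Exhaustive search over all 2^n assignments -- fine for a toy example,
    # hopeless at 509 variables.
    variables = sorted(h)
    best = None
    for bits in itertools.product((-1, +1), repeat=len(variables)):
        s = dict(zip(variables, bits))
        e = energy(s, h, J)
        if best is None or e < best[0]:
            best = (e, s)
    return best

# Two spins that prefer to anti-align, plus small local fields.
h = {0: 0.1, 1: -0.2}
J = {frozenset((0, 1)): 1.0}
print(brute_force_ground_state(h, J))
```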

Since there’s an astronomical number of different problem instances you could program the chip to solve, it’s impossible to check the performance on all of them -- you need to look at subsets. So what are good sets of instances to study? If you don’t really know where to start looking, you might as well just pick instances at random, measure relative performance, and see what happens.

But a more pragmatic approach is to study problems that arise in practical applications. At this stage we’re mostly interested in answering the question: Can we find a set of problems where the hardware outperforms the best known algorithms running on classical hardware? Since quantum optimization processors are still evolving rapidly, we’re less interested in absolute runtimes; rather, we want to see how the runtime scales as the number of variables increases.

The hardware outperforms off-the-shelf solvers by a large margin

In an early test we dialed up random instances and pitted the machine against popular off-the-shelf solvers -- Tabu Search, Akmaxsat and CPLEX. At 509 qubits, the machine is about 35,500 times (!) faster than the best of these solvers. (You may have heard about a 3,600-fold speedup earlier, but that was on an older chip with only 439 qubits.[1] We got both numbers using the same protocol.[2])

While this is an interesting baseline, these competitors are general-purpose solvers. You can create much tougher classical competition by writing highly optimized code that accounts for the sparse connectivity structure of the current D-Wave chip. 

Two world-class teams have done that. One is a team at ETH Zurich led by Matthias Troyer, considered to be one of the world’s strongest computational physicists. With help from Nvidia, his team managed to write classical simulated annealing code running on GPUs that achieves an incredible 200 spin updates per nanosecond. The other tailor-made classical competitor was written by Alex Selby. You may recall he won £1 million for cracking the Eternity puzzle. 

Alex devised a smart large-neighborhood search that improves subsets of the 509-variable string while keeping the complement fixed. The trick is to use only subsets of variables whose induced connectivity graph is a tree. Such tree-structured neighborhoods can be searched exactly in linear time using dynamic programming. Because of the sparse connectivity, these neighborhoods can be very large -- up to 80% of all variables. This makes the solver very powerful.[3]
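
Here's a toy sketch of that idea (our own illustration in Python, not Alex's actual code): fold the couplings to the fixed complement into effective local fields on the chosen tree-structured subset, then find the exact optimum over the subset with a single leaves-to-root sweep of min-sum dynamic programming, followed by a root-to-leaves readout.

```python
def optimize_tree(tree_adj, h_eff, J, root):
    # tree_adj: adjacency dict of the chosen subset (must form a tree).
    # h_eff: effective field per node, including couplings to the fixed spins
    #        outside the subset. J: couplings keyed by frozenset({u, v}).
    # Returns the energy-minimizing spin assignment of the subset.
    order, parent = [], {root: None}
    stack = [root]
    while stack:                      # iterative traversal, parents first
        u = stack.pop()
        order.append(u)
        for v in tree_adj[u]:
            if v != parent[u]:
                parent[v] = u
                stack.append(v)
    # Upward pass: cost[u][s] = best energy of u's subtree if u has spin s.
    cost = {u: {s: h_eff[u] * s for s in (-1, +1)} for u in order}
    choice = {}                       # best spin of u for each parent spin
    for u in reversed(order):
        p = parent[u]
        if p is None:
            continue
        Jpu = J[frozenset((p, u))]
        for sp in (-1, +1):
            s_best, e_best = min(
                ((s, cost[u][s] + Jpu * sp * s) for s in (-1, +1)),
                key=lambda pair: pair[1])
            cost[p][sp] += e_best
            choice[(u, sp)] = s_best
    # Downward pass: fix the root, then read off the recorded best choices.
    assignment = {root: min((-1, +1), key=lambda s: cost[root][s])}
    for u in order[1:]:
        assignment[u] = choice[(u, assignment[parent[u]])]
    return assignment

# Example: a 3-node path 0-1-2 with antiferromagnetic couplings and a small
# field on node 0 (spins outside the subset would be folded into h_eff).
adj = {0: [1], 1: [0, 2], 2: [1]}
couplings = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0}
print(optimize_tree(adj, {0: -0.5, 1: 0.0, 2: 0.0}, couplings, root=0))
```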

Both authors were kind enough to share the code with our team. In fact, Matthias’s postdoc Sergei Isakov wrote the fast annealing codes and is now a member of our group.

A portfolio of custom solvers designed to beat the hardware on its own turf is competitive

So what do we get if we pit the hardware against these solvers designed to compete with the D-Wave hardware on its own turf? The following pattern emerges: for each classical solver, there are problems for which it wins or at least achieves performance similar to the hardware. But the reverse is also true: for each classical solver, there are problems for which the hardware does much better.

For example, if you use random problems as a benchmark, the fast simulated annealers take about the same time as the hardware. See Figure 2 in the slideshow.
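
For readers who haven't seen simulated annealing, here's a minimal single-spin-flip Metropolis version for the Ising objective sketched above (a toy illustration in plain Python; the ETH GPU code performs this same kind of update, but massively parallelized and enormously faster).

```python
import math
import random

def simulated_annealing(h, J, neighbors, sweeps=1000, beta_start=0.1, beta_end=3.0):
    # h: local fields; J: couplings keyed by frozenset({i, j});
    # neighbors[i]: list of spins coupled to i.
    s = {i: random.choice((-1, +1)) for i in h}
    for t in range(sweeps):
        # Linearly increase the inverse temperature (i.e. cool the system).
        beta = beta_start + (beta_end - beta_start) * t / max(sweeps - 1, 1)
        for i in h:
            # Energy change if spin i were flipped.
            dE = -2 * s[i] * (h[i] + sum(J[frozenset((i, j))] * s[j]
                                         for j in neighbors[i]))
            # Metropolis rule: always take downhill moves, take uphill moves
            # with probability exp(-beta * dE).
            if dE <= 0 or random.random() < math.exp(-beta * dE):
                s[i] = -s[i]
    return s
```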

But importantly, if you move to problems with structure, the hardware does much better. See Figure 3. This example is intriguing from a physics perspective, since it suggests co-tunneling is helping the hardware figure out that the spins in each unit cell have to be flipped as a block to reach a lower energy state.

But if we form a portfolio of the classical solvers and keep the best solution across all of them, then this portfolio is still competitive with the current version of the hardware. Again, a good example is the structured problem in Figure 3 in the slideshow. It slows down the annealers, but Alex Selby’s code has no problem with it and obtains the solution about as fast as the hardware does.[4] 
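
Schematically, the portfolio idea is simply to run every solver on the same instance and keep the lowest-energy answer. A minimal sketch (reusing the energy() helper from the first snippet; the solver entries are placeholders, not the actual codes discussed above):

```python
def portfolio_solve(h, J, solvers):
    # solvers: list of (name, solve_fn) pairs; each solve_fn(h, J) returns a
    # spin assignment. Keep whichever solver found the lowest energy.
    results = [(name, fn(h, J)) for name, fn in solvers]
    name, s = min(results, key=lambda r: energy(r[1], h, J))
    return name, s, energy(s, h, J)
```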

Sparse connectivity is a major limitation

A principal reason the portfolio solver is still competitive right now is actually rather mundane -- the qubits in the current chip are still only sparsely connected. As the connectivity in future versions of quantum annealing processors gets denser, approaches such as Alex Selby’s will be much less effective.

One indication that sparse connectivity is the culprit comes from well-understood examples such as the “Hamming weight function with a barrier” problem -- quantum annealing tackles it easily, but classical annealing fails.[5] However, we haven’t been able to implement such examples as benchmark problems yet, precisely because of the sparse connectivity.
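
To give a flavor of this family of problems, here's a schematic cost function (the parameters below are purely illustrative; the precise variant is defined in section III B of Reichardt's paper). The global minimum is the all-zeros string, but a thin, tall barrier separates it from the typical random starting point, so a classical annealer has to climb over the barrier thermally while a quantum annealer can tunnel through it.

```python
def barrier_cost(bits, barrier_at=None, barrier_height=None):
    # Cost = Hamming weight, plus a narrow spike at one intermediate weight.
    # The all-zeros string is the global minimum; a thermal annealer starting
    # from a random string has to climb the spike to reach it.
    n = len(bits)
    w = sum(bits)
    barrier_at = n // 4 if barrier_at is None else barrier_at
    barrier_height = n if barrier_height is None else barrier_height
    return w + (barrier_height if w == barrier_at else 0)
```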

There’s a list of other hardware limitations that future iterations will need to address -- control errors need to be reduced, coherence times lengthened, error correction added, richer non-stoquastic couplings between qubits implemented, etc.

A big data approach may lead to new conclusions

So will we have to wait for the next generation chip with higher connectivity before we can hope to see the hardware outperform the portfolio solver? Until very recently we thought so. But remember that these latest benchmarking results were obtained from relatively small datasets -- just 1000 instances in the ones that got recent attention. 

It’s easy to draw premature conclusions from such small sets: there are not enough data points to identify subsets of problem instances that might exhibit a speedup. Moreover, as several groups independently discovered, such random problems tend to be too easy and don’t challenge the quantum hardware or the classical solvers.[6]

Ever since the D-Wave 2 machine became operational at NASA Ames, the head of our benchmarking efforts, Sergio Boixo, has made sure we use every second of machine time to collect data by running optimization problems. Simultaneously, we give the same problems to a portfolio of the best classical solvers we’re aware of. We now have data for 400,000 problem instances. This is the largest set collected to date, and it keeps growing.

Eyeballing this treasure trove of data, we’re now trying to identify a class of problems for which the current quantum hardware might outperform all known classical solvers. But it will take us a bit of time to publish firm conclusions because, as Rønnow et al.’s recent work shows, you have to carefully exclude a number of factors that can mask or fake a speedup.

So stay tuned!



--
1. Engineers from IBM, the maker of CPLEX, reported that they tuned the CPLEX parameters to perform better on this task, but the performance was still several hundred times slower than the hardware. See the paper here: http://goo.gl/5nVFHH

2.  McGeoch and Wang 2013: http://goo.gl/djchcX

3. For a detailed description of Alex Selby’s code see here: http://goo.gl/bhxp9y

4. You can also enhance the simulated classical annealers with so-called cluster updates. But if you customize the annealers to be faster for structured problems, they’ll be slower on the random instances.

5. The problem we refer to is discussed in section III B of a paper by Ben Reichardt (http://goo.gl/9PwOhu). It is a variant of a problem originally proposed by Edward Farhi.

6. Figure 5 in a reference by Katzgraber and Hamze (http://goo.gl/ZW0b0v) illustrates why this is the case. Random problems tend to have a global minimum that can be reached without having to traverse high energy barriers.


#quantumcomputing   #quantum   #quantumphysics   #quantumcomputer  
 
What classes are needed to work on projects like this?
 
Why not run a factorization problem? I was just wondering...
 
+Jerry Zhang Yeah, factorization seems like a real world use case that would shake things up a bit. Especially now with all the hooha around NSA...in fact, maybe we should just ask NSA. They probably have a couple of these boxes already crunching away on our crypto keys ;) 
 
Great article, Thank You!
Any hint about the costs of these classical solvers and the portfolio, so that we could get a picture of the cost-effectiveness of current quantum computing?
 
 +Kristian Larsson Because the D-wave box is a quantum annealer designed to solve only a specific class of optimization problems, and not a universal digital quantum computer that is Turing-complete. The standard formulation of the quantum factorization algorithm doesn't fit in its architecture.
 
Shannon: That's incorrect. The architecture is universal for classical computation. You can prove this by constructing a NAND gate using it: http://www.dwavesys.com/en/dev-tutorial-nand.html. It is not (currently) universal for quantum computation, but we know how to do that if a good use for one is ever developed.
 
Geordie is right. Here is an old paper which describes an adiabatic algorithm for factoring using a diagonal final Hamiltonian like the one implemented by the D-Wave 2: http://arxiv.org/abs/0808.1935. It's worth mentioning that this algorithm for factoring has not been shown to run in polynomial time like Shor's algorithm. Shor's algorithm would likely require nonclassical couplings.
 
Sorry, yes that's what I meant -- that it's not currently universal for quantum computation, and by "standard formulation" I was referring to Shor's algorithm. +Ryan Babbush thanks for the reference, I forgot about that one. +Kristian Larsson I'm not involved in the project.
 
+Jerry Zhang It can't do factorization, because this is NOT a universal gate quantum computer BUT an adiabatic quantum computer.
 
Sol: just to repeat what Ryan said above, this architecture can do factorization -- it's universal for classical computation and there are algorithms for factorization designed for this system. What it can't do is run Shor's algorithm, which is explicitly designed for gate model type systems. Note that no actual gate model type system can run Shor's algorithm either -- at least for factoring numbers greater than order(10).
 
+Geordie Rose Thanks Geordie for correcting me. But if you re-designed the computer to do large-scale factoring, wouldn't the US & Canadian governments be all over you? Or at least their intelligence agencies? I don't think it's worth it.
 
Shannon and Sol are 100% right and need not apologize.  The D-wave machine is not known to be able to implement any algorithm with even a provably polynomial speedup over classical computing, let alone an exponential one.  It cannot even run Grover's algorithm.  Using it to implement the number field sieve would not be a good idea.

Yes, the current setup can implement NAND gates, but I am not aware of evidence - either theoretical or empirical - that the gap scales well even when implementing a series of NAND gates.
 
To clarify my comment a bit, yes technically the machine can factor numbers, but with no guarantees on the success probability.  Asking the machine to sort would be similar - yes it could create an objective function that is minimized for sorted lists, but it would probably have an exponentially small probability of getting the answer right.  While technically it "can" sort, it's basically correct to say that it cannot.
 
Hey +aram harrow. I think we are on the same page regarding factoring. I was careful to point out that unlike Shor's algorithm, AQC on the diagonal Hamiltonian embedding of factorization has not been shown to run in polynomial time. However, this rather old (2008) PRL (http://arxiv.org/abs/0808.1935)  claims numerical evidence that the running time may grow quadratically with the number of qubits. I find that pretty surprising though and we need to keep in mind that they are only able to simulate small problem sizes. Another approach using a diagonal Hamiltonian is presented in this PRL: http://prl.aps.org/abstract/PRL/v108/i13/e130501
 
The Hamiltonian in 0808.1935 uses an exponential amount of energy.  You don't need any numerical results or extrapolation to see this, just look at their eq (3).  But if you have an exponentially strong Hamiltonian, then you can factor with a classical algorithm in constant time.  It is sad that PRL published it.

The factoring-143 paper you mention avoids this problem, but I would be extremely surprised if that algorithm was any better than exponential time, or even if it did better than using Grover's speedup, let alone something like the number field sieve.  That paper also "compiles" heavily; for more on this, see
http://arxiv.org/abs/1301.7007
 
Ah, okay, I remember seeing this one on arXiv but never got around to reading it. Thanks for the reference!
 
Ryan: Even if you could run the above 2008 algorithm, they say at the end of their paper that "this maybe very efficient for integers of up to 16-bit"! Doesn't that translate into a number about 5 digits long? Please enlighten me. Thanks.
 
aram: Your comment about Sol and Shannon being correct is False. I think you must have not understood what we were discussing. Algorithmic scaling wasn't what was being discussed. If you have a problem with the statement that the architecture is universal for classical computation then I'd suggest acting like a scientist and providing evidence for why you think that. I'd be interested to hear something useful. Re. the 0808.1935 -- that paper is not the way you'd do it if you wanted to do it properly.
 
OK guys: Here is a concrete question so that everyone can take a crack at it! If you were able to run the most efficient known algorithm on the DW2, can anyone venture to guess how long it would take to crack an RSA 2048-bit number, which is 617 digits long? Dr. John Preskill, in a recent talk at the University of Waterloo, in Ontario, said that a universal gate quantum computer would factor it in 2 seconds! Any takers?
 
Evidence:
The Hamiltonian is a sum of X and ZZ terms and can only be changed slowly (adiabatically).

Thus Shannon is correct to say the D-wave machine is "not a universal digital quantum computer."  While it can indeed encode any objective function, this can incur an exponential cost for general problems.  When one model takes exponential time to simulate another model, then they are not considered equivalent.

Adiabatic annealing, of the sort done by the D-wave hardware, is not known to be able to implement algorithms like quicksort in polynomial time.  As far as I know, the gap for e.g. quicksort has not been rigorously established, but my bet is that it would be exponentially small the same way that random 3-SAT instances are known to have exponentially small gaps.
 
Both pieces of your 'evidence' are incorrect. The Hamiltonian is a sum of X, Z and ZZ, and you can change it as fast as you want. Anyway.

This is getting off track a bit and that's enough from me. I think we would all agree that a quantum annealer is not the tool of choice to do factoring. In addition we don't care about factoring from an applications perspective in the first place. 

I would rather the focus be on the amazing fact that the machine outperforms simulated annealing for some tasks in scaling as well as in wall clock time. And that the machine exhibits entanglement!!
 
As far as I can tell, nobody has even proved that this is an adiabatic quantum computer or that it is exhibiting any quantum behavior whatsoever. All that's happening is specialized HARDWARE is being raced against SOFTWARE. Nobody actually knows what's going on in that box.
 
Evan: Have you read the 75+ peer-reviewed scientific papers written in the last ten years or so by D-Wave, USC, Lockheed and other scientists, and published in Nature, Science and Physical Review Letters, about your assertion? Are you technically competent enough to understand them? If you are, then I recommend that you go to their website and start reading. It should take you ONLY a year to go through all of them!!!
 
I'm a researcher in the field. Most of the papers don't say much. Most of the USC papers are just doing what Google is doing here... racing against a classical computer, which is the wrong approach to figure out quantumness, but maybe the right approach to figure out if the thing is useful in the short term. One recent paper my former advisor was on found no evidence of quantum behavior. I'm siding with him 
 
Evan: Please read just this ONE paper that was uploaded to the arXiv about a week ago and then give me your opinion. That's not asking much, is it? Then give me a brief critique of the paper. Thank you. Here it is, and good luck!
http://arxiv.org/pdf/1401.3500v1.pdf
 
Ok, also Z terms.  That doesn't really change anything.  My understanding was that the Hamiltonian controls were band-limited so that control is slow enough that you can't observe Rabi oscillations or Bell inequalities.

For quicksort, the gap of the adiabatic algorithm is unproven, but I suspect it is very small.

You've achieved a very impressive machine.  But calling it a "universal quantum computer" is clearly BS.
 
Just to be clear, no-one has ever claimed the current system is a universal quantum computer (although making such an architecture appears quite doable, if there were a reason for doing so). 
 
I’m really glad you agree that it is an impressive machine – thanks for that. Really it’s only the second generation of what I’d call ‘real computers’ we’ve built. As a historical analog, I think of it like the superconducting version of Intel’s 4004 in terms of level of integration and technology maturity. Of course the analogy isn’t perfect.
 
There’s a lot of room to get better. But the recent results – beating highly optimized simulated annealing in both wallclock time and scaling, at least for some problems, and the new world record for the number of entangled solid-state qubits – are, I think, showing us that we’re on the right track.
 
Geordie: You say above "and the new world record for number of entangled solid state qubits, are I think showing us that we're on the right track". Where does this record appear? Is it in that recent "benchmarking paper", or in your own recent paper about "entanglement" that you uploaded to the arXiv? I really would appreciate knowing so that I can refer any doubters to it in my arguments with them. Thanks a lot.
 
Sol: it's the work referenced in the previous post on this channel :-)
 
Thank you, Jean-Francois, for pointing this out! The post mentions the optimizations by IBM engineers in the first footnote. Please note that we typically compare to a single core. That is why we spoke of a speedup factor of several hundred, matching the value of 320x in your comment.
 
Hartmut, thank you as well. I respectfully think that comparing a $10M machine (not sure about the figure, you know better) to a single core, when any up-to-date commodity PC has at least 8 cores, is a bit misleading. CPLEX users use multicore machines, and our software is optimized for that. Any comparison between A and B should use A and B in the best possible way.

Not your fault, but we've seen your news transformed into: the D-Wave machine is 10,000 times faster than IBM's fastest supercomputer. This adds to the misleading side of things.

The footnote you cite re IBM engineers points to a paper that has results far worse than what I obtained.  Hence my reaction here.

Last comment, I enjoyed redoing McGeoch et al experiments with CPLEX, thanks for enabling that!
 
Well, it does seem that D-Wave has been announced as a formidable player in quantum computing in the British media. It would be interesting to run some form of application data, say "sentinel GIS data"; even if far removed from quantum algorithms for performance, it would seem they are achieving things.
 
Looks like D-Wave is well positioned and in good shape going into the next round. They have adequately demonstrated that their hardware harnesses quantum processes. This latest post by the A.I. team makes it reasonable to expect a breakout for certain classes of problems in terms of relative performance against known classical benchmarks, in effect trapping the classicists in their own harbors (an innovator's dilemma) and then pivoting, opening the possibility of breaking through on other frontiers (other really hard problems) into "blue water".

It is a good thing that Troyer and others challenged D-Wave, because they defined the Pareto frontier for the sort of classical algorithms that they deem important. This will provide a referent for clear and convincing breakthroughs along this frontier in the next round. As the Google AI team indicates, this could occur with the present generation of D-Wave on larger data sets.

But the recent round of new financing D-Wave received indicates that it is a reasonable bet that a breakout may be imminent, if it has not occurred already. The reactionaries are stretched to the limit; the most recent paper in Science was weak and weakly played. It probably cost D-Wave in the latest round of financing, but D-Wave still has the wind in its sails.

Another round to D-Wave. They took the opposition's best shot in stride.
 
Is there a measurement method to compare this machine's calculations to the activity of a human brain? Could this machine also be capable of some human creativity and ingenuity? Or will its intelligence rest on calculations and optimization problems, with novel intelligence just not being the purpose of this project?
 
Just a quick observation: if you change the Hamiltonian too fast, then the system's evolution isn't adiabatic any more. By definition. Which means you're in an excited state, not the ground state, and so the fundamental premise of adiabatic annealing no longer applies. 
 
Here's the foundation document for annealer as classical computer:
http://arxiv.org/abs/quant-ph/0405098
It appears, according to Theorem 1.3 in this paper, that the new machine under construction at GQAIL is not only shooting for longer coherence and error correction via the surface code, but for a gate-equivalent architecture through annealing, right Geordie? ;]
 
If factorization is not ready, how about finding collisions in hash functions?
 
+Geordie Rose
Would your architecture for a universal quantum computer be capable of performing a Fourier transform in sub-exponential time?