Finally started up a blog after years of intending to get around to it: http://ojanvafai.com/post/mathml-in-webkit
 
Nice article! I enjoyed it. So I guess that means no MathML in Chrome for the foreseeable future.
 
I agree with Frédéric here that you are overemphasising the width issue and in the process preventing the support of scientific and educational documents in Chrome for the foreseeable future (since Google seems reluctant to allocate any resources for that and apparently relies on volunteer programmer support). If you use designed font glyphs for extendible brackets, the maximum width is known in advance and doesn't grow arbitrarily with the height. If you need to scale the glyph due to lack of designed glyphs, you can scale in the same way. The MathML support in Chrome 24 was a massive step forward, and taking it out again with only a partial sketch of a plan for a re-implementation, and no resources allocated to that re-implementation, is a massive step back. There is a vast amount of structured content that could be on the web or in ebooks as soon as webkit has solid mathml support. The current code could easily be patched (as you know, since you were involved in looking at patches that never landed) to address this issue, which would allow Chrome to have really quite reasonable Math rendering now. This would allow the content to be generated, and then, looking at real world content rather than test cases, one could get a much better idea of what parts of the implementation, if any, need to be improved.
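
A rough sketch of the bracket-width point above, assuming a hypothetical font description in which each stretchy bracket has a list of designed size variants plus a glyph assembly (top, repeated extender, bottom). The type and function names are illustrative only and are not WebKit or Gecko code. The upper bound on the width can be answered without knowing the final height, while the exact width is chosen later, once the height is known:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one pre-designed size variant of a bracket.
struct GlyphVariant {
    double height; // tallest content this variant can cover
    double width;  // horizontal advance of this variant
};

// Hypothetical stretchy-glyph data: designed variants plus an assembly
// used when no single variant is tall enough.
struct StretchyGlyph {
    std::vector<GlyphVariant> variants; // sorted by increasing height
    double assemblyWidth;               // width of the widest assembly piece
};

// Upper bound on the width, computable without knowing the target height.
double maxStretchedWidth(const StretchyGlyph& glyph) {
    double width = glyph.assemblyWidth;
    for (const GlyphVariant& v : glyph.variants)
        width = std::max(width, v.width);
    return width;
}

// Actual width, chosen once the target height is known.
double stretchedWidth(const StretchyGlyph& glyph, double targetHeight) {
    for (const GlyphVariant& v : glyph.variants)
        if (v.height >= targetHeight)
            return v.width;
    // The extender repeats vertically, so the width stays fixed.
    return glyph.assemblyWidth;
}
```

Under these assumptions, stretchedWidth never exceeds maxStretchedWidth for any height, which is the property the width computation would rely on.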
 
"A massive step forward", but for a technology that there's basically zero author interest in.  MathML interest on the web is comparable to interest in MNG, if you remember the old days well enough to know that battle.  That's a lot of why we're not allocating major resources toward it.

We all think MathML is implementable, eventually.  But the only way to make that happen is going to be to contribute to a solid implementation that isn't buggy and doesn't introduce security holes.  Until then enabling MathML not only doesn't buy the web terribly much but actively makes Chrome unstable and exploitable, which we will not do for any reason.
 
No one is suggesting that Chrome be shipped with security holes.
The current webkit implementation has been running lots of additional MathML tests for months. Exactly one issue has shown up related to stretching brackets; the question is whether it is really necessary to take down the whole of MathML support when a fix for that is known.
 
In my opinion, the MathML code in WebKit is not up to a standard of quality that is worth shipping. If anything, shipping such a low quality implementation poisons the well for the whole platform because web developers learn that just feature detecting MathML is insufficient to use the browser native implementation.

When I tried to fix the stability issues in MathML, I looked at a lot of different examples of MathML rendering. Many of them were far too poor for any serious document to use. In all these cases, MathJax did a great job of rendering these.

I don't think disabling WebKit's incomplete MathML code in Chromium holds back the scientific community at all. They are already much better off using MathJax than the WebKit native implementation, and, in my opinion, MathJax does a great job. In fact, I have yet to find an equation where MathJax does a poor rendering.

It's not even just rendering issues. For example, try selecting text of a complex math equation in a WebKit nightly (http://nightly.webkit.org/) and then try the same in the version rendered by MathJax. The former does some craziness. The latter works exactly how you'd expect it to.

As you say, I tried to fix the stability issues. After spending a good deal of time on it, I came to the conclusion that any fixes on the current code would just be monkey-patching and putting off a quality, stable implementation.
 
+Peter Kasting MathML is not an authoring language so your comment about zero author interest is true but moot. MathML is often compared to TeX. While this makes sense in the context of representing math within computer applications and files (eg, web pages) it does not make sense from the point of view of a human typing it. While TeX is intended to be typed, MathML never was.

There is a huge interest in MathML but it is not from authors but from publishers of educational, scientific, and engineering content. There is also a huge interest from the accessibility community as MathML is the key to working with screen readers to make math accessible.
 
@Peter, you said "MathML interest on the web is comparable to interest in MNG,".  How much interest does there need to be for the Chrome team to say that MathML is being used a lot?  A million hits/month on  MathJax's servers?  10K equations being accessed/month using assistive technology that can't get it via CSS/SVG that MathJax produces? 
 
Interesting discussion though somewhat depressing.

Firefox is a good example of how partial support can create a positive feedback loop between early adopters and volunteer developers. From a MathJax point of view even more so. We will soon be able to go back to tweaking FF's MathML rendering rather than replacing it (more like jquery perhaps, or a typical polyfill). This means massive speed improvements while providing the same quality typesetting and user experience. Naturally, we would have loved to do that with WebKit eventually.

I was wondering. If Chrome's security concerns are so severe, why is Apple happily shipping the code? [I'm not trying to be rhetorical.]

It's one thing for Google to say it can't spend resources on MathML development. But from your post I get the impression that the volunteer work so far is to be wiped out. That will only alienate third parties who consider providing the developers that Google isn't willing to provide.
 
I don't think Google employees are in a position to answer questions about why Apple does things.  However, I've heard from numerous different people on the team whom I trust quite thoroughly that the current implementation is deeply problematic from a basic design perspective, a buggy output perspective, and a security perspective, so for the arena which we control, which is Chrome, it makes sense to not ship.  Indeed, it was a mistake for us to ever ship this code in the first place.  I suspect if we hadn't gotten people's hopes up, you guys would be less disappointed in us now.  Our fault.

I don't think anyone is unhappy that volunteers have been trying to tackle this.  Perhaps the problem is that the volunteer in question wasn't given the appropriate assistance and oversight to come up with the ideal design earlier.  It's impossible for me to say (as someone totally disconnected from this issue) whether that's the issue or where fault lies.  Regardless, however, we are obligated by virtue of a large userbase to uphold stability, security, and quality standards in what we ship to the larger public.  We can't just choose to ship something for the purpose of hopefully encouraging more developers to come contribute unless those prerequisites are met.

Ojan's post seems like an attempt to sketch out a high-level design here that might work, and in that sense, is an attempt to be constructive.  My more pessimistic post above isn't intended to deter people from wanting to implement MathML -- which we'd probably be happy to ship if the above problems could be solved -- but rather to be explanatory about why, in the universe of a zillion things to do in a web browser, this one isn't getting a ton of priority development time from the internal team.  Perhaps I should have refrained from even posting that, however, since my comments are really paraphrases of what other more knowledgeable folks have said to me when I've asked about MathML support, and I'm not in a position to debate or provide precise statistics myself.
 
Thanks for that long reply. It seems things are royally stuck.
 
I'm a bit confused about the assistive technology argument. If people are writing MathML content and using a library like MathJax to render it, doesn't the assistive technology just need to be able to process MathML content the same way it would if there were native MathML support?

I suppose the problem right now is that MathJax replaces the MathML content with HTML content. That seems like a solvable problem (e.g. MathJax could display:none or visibility:hidden the <math> elements instead of removing them and assistive readers could handle that case appropriately).
 
For advanced technologies such as synchronized highlighting (needed for dyscalculia, dyslexia and ADHD), visual rendering is critical.

Of course, MathJax could be extended to make this possible, but it will add yet another layer of complexity for AT solutions such as Google's ChromeVox (just had to get that in there ;) cc +T.V. Raman).
 
So if Math were implemented more like SVG as you suggest, would it still be able to take part in paragraph line breaking? If I understood your description (which may not  be the case) it would seem that each math element would be committed to a single rectangular area like an image or svg?
 
Very good question. In general, MathML rendering has to be closely integrated into text layout. In particular, it has to baseline align, deal with side-bearings, not affect line-spacing if inline, participate in line-breaking, etc. The MathML renderer has to have access to the ambient text properties (font, size, color, etc.), do its own layout, and then tell the text layout engine how much space it needs. And that's just to do basic layout.
 
Getting line-breaking to work right would require some careful thought, but I don't see why it wouldn't be possible. In that way it is not exactly the same as SVG. That's actually what I had in mind in my post when I said "similar to". Line-breaking happens at layout time, not when computing intrinsic widths. This does mean that a container element that sizes its width to its content (e.g. a width:auto inline-block) would treat the unwrapped equation as its minimum width. That doesn't seem like a big loss to me. Again, the use-cases for such a thing are not terribly compelling.
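
A minimal sketch of that split, assuming a deliberately simplified model in which an equation is a row of unbreakable chunks with a permitted break between consecutive chunks. The Equation type and both function names are hypothetical. The intrinsic width is answered from the unwrapped equation alone, while breaking only happens once layout supplies an available width:

```cpp
#include <vector>

// Hypothetical model: widths of the unbreakable chunks of one equation.
struct Equation {
    std::vector<double> chunkWidths;
};

// Intrinsic minimum width: the equation is treated as unwrapped,
// so no layout is needed to answer the question.
double minIntrinsicWidth(const Equation& eq) {
    double total = 0;
    for (double w : eq.chunkWidths)
        total += w;
    return total;
}

// Layout-time line breaking: greedily fill lines of the available width.
// Returns the width of each resulting line.
std::vector<double> breakLines(const Equation& eq, double availableWidth) {
    std::vector<double> lines;
    double current = 0;
    for (double w : eq.chunkWidths) {
        // Start a new line when the next chunk would overflow this one.
        if (current > 0 && current + w > availableWidth) {
            lines.push_back(current);
            current = 0;
        }
        current += w;
    }
    if (current > 0)
        lines.push_back(current);
    return lines;
}
```

In this model a shrink-to-fit container would be sized from minIntrinsicWidth, which is the trade-off described above, while an equation in a fixed-width paragraph would still wrap through breakLines.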
 
Peter K/Ojan:  I never got a response for what kind of usage numbers you feel are needed for you to feel that MathML really is being used by more than a small niche of people.  Or do you agree that it has widespread use so that the only issue is you don't like how it is implemented in Chrome?
 
I already said I'm the wrong person to answer that.
 
But you said that MathML is a "technology that there's basically zero author interest in.  MathML interest on the web is comparable to interest in MNG, if you remember the old days well enough to know that battle.  That's a lot of why we're not allocating major resources toward it."

The reason I'm pushing you on this is that it seems like you and others in the Chrome group believe your assertion.  I think you are way off base, but I have no idea what evidence will convince you (I'm not asking you to answer it for the group) that in fact MathML support does have lots of interest.  If I can convince you, that's one less nay-sayer in the group.

So, if MathJax had a million users/month, would that convince you that math usage is widespread?  Or does it need to be some higher number?
 
"Again, the use-cases for such a thing [line breaking] are not terribly compelling."

It's a vital feature of any mathematical document, especially so on the web where the author cannot control the page size and so cannot manually control line breaking in advance. Mathematics is not some graphical insert, it is a part of the text and should naturally flow with the text in paragraphs and tables etc.  It is not necessary that all features of mathematical rendering are implemented in an initial release, but you should not try to force the mathematics rendering into an architecture that prevents the reasonable display of mathematics, especially on the mobile platform and other contexts with constrained text width.  The existing webkit implementation and the implementation in firefox both benefit from re-using much of the html/css text layout machinery. The current problem with calculating widths of stretched brackets is soluble in that context (as can be seen in firefox).
 
In terms of whether there is sufficient demand for MathML support, this is largely a chicken-and-egg situation. Many publishers have MathML in their content management systems but have to generate HTML or EPUB with equations as images because of the lack of MathML support in browsers. MathJax is breaking this logjam somewhat but its performance is not as good as native MathML support and it has other issues that result from its JavaScript implementation. I believe many publishers are still not generating MathML in their HTML for lack of browser support.
 
One of the keys to good implementation of MathML in a browser is to recognize that math is fancy text, not a specialized graphic format. Think of it as text layout on steroids. This argues against MathML implementation paralleling that of SVG.
 
+David Carlisle I think you misread my comment there. The use-case I was talking about not being compelling was: "This does mean that a container element that sizes its width to its content (e.g. a width:auto inline-block) would treat the unwrapped equation as its minimum width."

+Frédéric Wang actually pointed out a good use-case for that specific case though. Now that I put a little more thought into it, I think you can make that case work as well. In fact, it would be pretty straightforward.

+Frédéric Wang does Gecko do layout during intrinsic width computation? If so, is David Baron OK with that?

+Frédéric Wang, I think you could find a reasonable subset of CSS that covers nearly everything people would want to do with equations. text-shadow on equations seems totally sensible to me. Importantly, you should be able to put any styling on the top-level <math> block. That way you can get all the fancy things like transforms and animations.
 
+Frédéric Wang also, what are the requests that users have had that MathJax is unable to do without better native support? There might be simple primitives we could add to the platform that would allow MathJax to meet all their use-cases. This is valuable to do independent of whether we also ship a native MathML implementation.
 
Ojan: "So, you have an algorithm that is bottom-up intertwining both above and below it with an algorithm that is top-down. This complexity leads to bugs and crashes." I think your phrasing makes it sound trickier than it is. (I'd quote Eric Seidel instead, who told me at one point that computePreferredLogicalWidths just had to fulfill its contract, however it wanted to do that internally.) More importantly, is there a webkit bug you can reference for this?

Fred: Note David Barton and David Baron are two different people. :) (I'm sure you knew this.) Anyway I think Ojan is asking if mozilla's experts are ok with MathML code there that overlaps with HTML layout code.

FWIW, I think the best description of the current WebKit MathML design is that it uses anonymous boxes, sort of like how line layout does. Instead of just creating extra boxes (render tree nodes) for individual lines of text, to make it easier to deal with multiple lines of wrapped text from a single DOM Text node, WebKit MathML also creates anonymous boxes for things like <msubsup>. This is used for e.g. a variable with both a subscript and a superscript, like x<sub>2</sub><sup>3</sup> except that in mathematics the 3 is supposed to be vertically aligned with the 2. The easiest way to do this is to put the subscript and superscript together in a tall anonymous box, and layout this box to the right of the variable. (In WebKit, this clever idea is due to Alex Milowski I believe; I don't know if other implementations of MathML do the same thing.) Then the actual computed padding or coordinates or whatever in these boxes can be adjusted if necessary during layout by the MathML layout code, like line layout code does for its anonymous boxes.
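
A minimal sketch of the anonymous-box idea for <msubsup>, assuming hypothetical Box and SubSupLayout types. This is not the actual WebKit render tree code, and baseline handling is left out entirely. The subscript and superscript are stacked inside one tall box placed to the right of the base, so their left edges line up:

```cpp
#include <algorithm>

// Hypothetical box: position relative to its parent, plus size.
struct Box {
    double x;
    double y;
    double width;
    double height;
};

// Hypothetical layout result for base + anonymous scripts box.
struct SubSupLayout {
    Box base;        // the variable, e.g. "x"
    Box scripts;     // anonymous box holding both scripts
    Box superscript; // positioned inside `scripts`
    Box subscript;   // positioned inside `scripts`
};

// `gap` is an assumed vertical spacing between the two scripts.
SubSupLayout layoutSubSup(Box base, Box sub, Box sup, double gap) {
    SubSupLayout out{};
    out.base = base;
    out.base.x = 0;

    // The anonymous box is as wide as the wider script and tall enough
    // to stack the superscript above the subscript.
    out.scripts.x = base.width;
    out.scripts.y = 0;
    out.scripts.width = std::max(sub.width, sup.width);
    out.scripts.height = sup.height + gap + sub.height;

    // Both scripts start at x = 0 inside the anonymous box, so the
    // superscript sits directly above the subscript.
    out.superscript = sup;
    out.superscript.x = 0;
    out.superscript.y = 0;
    out.subscript = sub;
    out.subscript.x = 0;
    out.subscript.y = sup.height + gap;
    return out;
}
```

A real implementation would also have to align the scripts box against the base's baseline, which this sketch does not attempt.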
 
I think you understate the complexity this causes. For what it's worth, Dave Hyatt (unquestionably the authority on WebKit's layout code) recently ran into the MathML code causing an infinite loop in the Regions code due to exactly this. From the #webkit IRC channel:

8:58 AM <dhyatt> minPreferredLogicalWidth() is not supposed to depend on layout(), otherwise you can get into cycles
8:58 AM <abucur> ok. when we compute the preferredwidths for RenderMathMLRow
8:58 AM <dhyatt> which is what you are running into
8:58 AM <abucur> well...
8:58 AM <abucur> void RenderMathMLRow::computePreferredLogicalWidths()
8:58 AM <abucur> {
8:58 AM <abucur>     ASSERT(preferredLogicalWidthsDirty() && needsLayout());
8:58 AM <abucur> #ifndef NDEBUG
8:58 AM <abucur>     // FIXME: Remove this once mathml stops modifying the render tree here.
8:58 AM <abucur>     SetLayoutNeededForbiddenScope layoutForbiddenScope(this, false);
8:58 AM <abucur> #endif
8:58 AM <abucur>     computeChildrenPreferredLogicalHeights();
8:58 AM <dhyatt> .....
8:58 AM <abucur> the last line is doing a children layout
8:58 AM <dhyatt> this should never have made it past review.
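
A simplified sketch of the contract dhyatt describes, using a hypothetical Node type rather than WebKit's real render objects. The preferred-width pass takes a const node and reads only intrinsic data, so it cannot trigger layout. The layout pass may consult preferred widths, but never the other way round:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, heavily simplified render node; not WebKit's classes.
struct Node {
    double intrinsicWidth = 0;  // content width of a leaf
    std::vector<Node> children;
    double laidOutWidth = 0;    // written only by layout()
    int layoutCalls = 0;        // counts layout() calls, for illustration
};

// Bottom-up pass: depends only on the subtree's intrinsic data.
// Taking a const reference means it cannot call layout() on anything.
double preferredWidth(const Node& node) {
    if (node.children.empty())
        return node.intrinsicWidth;
    double sum = 0;
    for (const Node& child : node.children)
        sum += preferredWidth(child);
    return sum;
}

// Top-down pass: may read preferred widths, then lays out children.
void layout(Node& node, double availableWidth) {
    ++node.layoutCalls;
    node.laidOutWidth = std::min(preferredWidth(node), availableWidth);
    for (Node& child : node.children)
        layout(child, node.laidOutWidth);
}
```

With the dependency running in one direction only, asking for a preferred width can never re-enter layout, which is the kind of cycle the log above is about.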
 
Ojan: I know Hyatt is the rendering expert. I asked him and others for design feedback many times, and got almost none. I would love to have a conversation with him or others, where I could explain that I don't think it's circular, because computePreferredLogicalWidths is only calling layout on its descendants, and I could ask why they think I am wrong. I asked you for a bugzilla bug page so I could at least read about it. Someone's comment in a chat, no matter how expert, is just not something that tells me very much (it gives his first impression opinion, but not his reasons, or anything I can ask about and understand). Similarly, Fred is an expert on the Gecko code, not WebKit. I understand he's saying that Gecko doesn't use an anonymous box for msubsup, but respectfully not why that means it's a bad idea. WebKit has some embellished operator code and it works fine with that, and anonymous boxes help not hurt with normal line breaking.

Look, I'm not saying I know more than the experts. I'm saying that this isn't the way to do a design discussion: waiting until I have worked for a year and then saying "it looks fishy to me - let's throw away the MathML code". Regions is an experimental feature, so it seems more likely the problem is with that, since there were no bugs due to this before now. Also I might mention that the regions issue with MathML I'm sure came up due to a bug a volunteer filed a few days ago, against all odds since MathML isn't even on in Chrome Canary - the nightly builds - any more. He can only continue to test MathML because he's set his Chrome Canary to not update any more. This just can't be the way to treat volunteers who work on WebKit.

Constructively, I think we're discussing a lot of different issues here. If you want to know more about why MathJax would work much better with native support, several MathJax experts can answer that. If Hyatt or someone wants to delete my computePreferredLogicalWidths code, fine, that's better than turning off MathML. Though I think someone could at least talk to me about it. And if we want to know how Gecko works, Fred is very generous with his help. But we need someone who actually knows both MathML and WebKit, and who wants to work on the code, to weigh the different approaches.
 
The fundamental problem here is that no one from a WebKit supporting company (e.g. Google, Apple, etc.) is working on this in the same way they did for SVG.  Outside volunteers, even paid ones, would be stuck in the same way Dave Barton or I got stuck.  Everyone defers to Dave Hyatt, who is very helpful when you get his time, but getting his time is rare.  Others helped as well, but that is spotty too.

This has to become a priority.  I fail to understand how we can't have one dedicated resource for a while to get this sorted.  If another volunteer, even a paid one, showed up from "outside", they'd still be isolated.

I remain unconvinced based on my experience that the current approach of using the CSS line layout algorithms cannot be made to work properly.   There are some new primitives that are needed.

Dave Hyatt's recommendation to me was that I needed a new line layout algorithm.  I dug into the line layout code extensively, per his recommendation, and came to the conclusion that it would be fraught with errors and bugs to do so.  This isn't SVG but something more akin to text layout with its own quirks.

You can say what you will about the current implementation's design.  I did my best with essentially nothing but the code for other rendering objects as a guide and a whole lot of experimentation. 
 
My post was not at all meant as a slight to those who've worked on MathML in WebKit. Learning how rendering works is difficult. That's part of what inspired me to write this post. It's especially hard when it's in an area that none of the reviewers are terribly excited about. It took me over a week of trying to fix the design flaws in the MathML code to fully understand the tradeoffs involved and come to the conclusions I did in the blog post. Up until that point, the current design seemed (mostly) correct to me.

As a reviewer, it's often hard to step back and see the larger design issues in code you're not directly involved in writing. It's unavoidable that occasionally a project will go too long without getting the reviewer attention that it needs. On a project of WebKit's scale, with reviewers' attention being drawn in a hundred directions all the time, it's even harder. http://blog.bitergia.com/2013/03/01/reviewers-and-companies-in-webkit-project/ gives some more context for this.

This isn't just a problem for volunteers, non-reviewers or people working on lower prioritized features. It happens to me more often than I'd like that I spend a week putting a patch together, put it up for review and in the review process come to believe that the patch is fundamentally broken with a week wasted. 

This makes things sound more bleak than they are. There are dozens of patches that go in every day from volunteers, who are not from one of the big companies working on WebKit, that get the review attention that they need and are a huge boon to the project. It's the exceptional cases that are unusually complicated (like MathML) that have a hard time.
 
Personally, I'm fairly thick skinned (or, at least I try to be) so I didn't take it as a slight.  I'm just trying to point out that working outside of the various dedicated organizations on a complex problem without access to those who might know better may yield strange designs.
 
(Just found this thread.)  I'm not involved in writing code for rendering, but I have several general comments.

1. The issue of importance: If Johnny can't see math in web pages, why should he think it's important?

2. MathJax limitation: For something like an article that formats to 10 pages of PDF, mathjax is too slow.  For this I assume HTML/CSS  with web fonts since most end users don't have fonts.  See, for example, the Funke-Milson article from arXiv at http://www.albany.edu/~hammond/demos/Html5/arXiv/.

3. How to measure "author interest".  For example, cruise university professor websites looking at preprints made from LaTeX and offered as PDF.  PDF is desirable only if the reader wants to download and print.  Most web users will not bother with PDF for mere browsing.

4. In working with a library for SGML translations of articles, I think most article content can be processed more or less linearly, but I think that each math element needs full recursive descent, collecting information both on the way down and on the way back up.  My guess is that optimal rendering of math in html should work the same way.
 
MathJax is no solution. It keeps you from reading math-heavy pages for minutes while you wait for the equations to render and take their final size, because they shift everything else around in the process.

MathML should be created from TeX serverside, delivered with the page’s HTML, and only converted clientside via JS in old browsers that don’t support MathML, limiting the shifting problem to those.