- Well thank you. :)
I'd honestly be interested in your proposed solution for what css3test attempted to do. To recap:
- Raise awareness in authors that CSS3 has much more to it than the fancy stuff (rounded corners, gradients, transforms and the like)
- Show authors that even their favorite engines only support a relatively small portion of CSS3
- Show authors that WebKit isn't such a huge standards champion when it comes to CSS3. Even if it's ahead, it's only by a small bit.
- Show authors that IE(10) and Opera aren't as bad as they think when it comes to CSS3 support.
- Show authors that stuff like -webkit-mask or -webkit-box-reflect is non-standard and thus, NOT CSS3
- Show authors that just because an engine superficially claims to support a CSS3 feature, it doesn't mean it implements everything in the specification for that feature (examples: CSS gradients in non-background properties, border-image longhands, tab-size: <length>, various computations in calc() etc)
- Give people an easy to find and pass around metric to measure the breadth of CSS3 implementations in the various engines
What would you do? I'm really interested in your response.
Also, does your sentiment about css3test extend to http://www.css3.info/selectors-test/ ? It kinda was my inspiration for css3test.
Feb 8, 2012
- Well, I find tests that focus on a specific feature to be much more useful than those that focus on ill-defined targets such as 'CSS3', actually. One foot wide and a mile deep is way more useful - to me at least - than the one-mile-wide and one-foot-deep tests. But while that makes sense for an implementor, I completely understand how the latter model 'sells' better for users. While the desire for a Consumer Reports-like score is perfectly understandable, CR also tests products intended for specific uses, e.g. much of what a range of scores means for a set of vacuum cleaners is implied in what a vacuum cleaner is used for. As a vac is not a platform from which you build any household tool you want, things are relatively easy to score and the result can be made pretty meaningful to potential buyers.
'CSS3' is not such a canned product; if I look at a site like Airbnb and a browser with a score of 90% on css3test, what does that score mean in terms of how easy it will be to author that design for that browser? Or, if I wrote Airbnb for a 100% browser, what happens in a 95% browser? A minor cosmetic issue like the loss of the logo jiggle on hover? Is some layout broken? Both? There is no way to tell from the score, of course, because that's not a goal and it can't be captured that way. But CSS is not some collection of equally valuable spare parts flying in close formation; if a browser scores 90% on every single module tested, the real-world interaction of .9 x .9 x .9 x .9 support could turn out to be significantly less useful for a given design than a very different distribution with a lower average. So is the issue really how many property-value pairs are supported, or what can be achieved? Are all property-value pairs equal in importance for all designs? I don't think so.
Bottom line: sure, your goals are positive. But so what? Are they sufficient? And if the test makes the expected difference, what improves? What is fixed? How do we measure that success?
It's rather hard to tell. Is having such a metric very important? Does it really solve author problems? I think the answer is generally either 'no', 'not really' or at best 'maybe'. And I also think that giving more meaningful answers is hard enough that we keep falling back on what's easier: looping through some small set of asserts and incrementing counters. I'm not at all convinced the resulting stats are anywhere near as valuable as they are popular. Does that make more sense?
Feb 8, 2012
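A rough illustration of the .9 x .9 x .9 x .9 point above. This is only a sketch under a loudly stated assumption: that a design's usability is the product of the support levels of the modules it depends on. The module names, the pass rates and the `combined` helper are all hypothetical and not taken from css3test.

```python
from math import prod  # Python 3.8+


def average(scores):
    """Plain mean of per-module pass rates (the headline-style score)."""
    values = list(scores.values())
    return sum(values) / len(values)


def combined(scores, needed):
    """ASSUMPTION: usability is the product of support for the needed modules."""
    return prod(scores[module] for module in needed)


# Hypothetical per-module pass rates for two made-up browsers.
browser_a = {"layout": 0.9, "fonts": 0.9, "backgrounds": 0.9, "transforms": 0.9}
browser_b = {"layout": 1.0, "fonts": 1.0, "backgrounds": 0.6, "transforms": 0.6}

# A hypothetical design that only relies on two of the four modules.
design_needs = ["layout", "fonts"]

avg_a = average(browser_a)                   # higher average
avg_b = average(browser_b)                   # lower average
use_a = combined(browser_a, design_needs)    # 0.9 * 0.9
use_b = combined(browser_b, design_needs)    # 1.0 * 1.0
all_a = combined(browser_a, list(browser_a)) # 0.9 ** 4, about 0.66
```

Under this crude model, the browser with the lower average is the better target for this particular design, which is the situation the comment describes.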
- Yes, much better, that's way more constructive than sarcasm :)
I completely agree that depth is better. After all, the deeper the tests, the more accurate the score! The reason I didn't do that was purely technical: there doesn't seem to be a way to do deep automated testing in the browser. The W3C testsuite (which is a perfect example of deep testing) requires human interaction in every test, either to tell the result (is this green?), or to even perform the test in some cases ("hover over this, is it green?"). So while this is good for implementors, I can't require that kind of effort from the average author. :(
Regarding CSS3, I'm using it as short for "a set of the most stable and not abandoned Level 3 CSS specs". Yes, not every spec was born equal, but that could be said for every subset. For example, not every feature in a certain spec is equal either. Supporting @font-face is much more important than supporting old-style OpenType numerals, for example. However, this gives me a very good idea: adding weights to each feature wouldn't be hard, and would make the score more valuable. Although I'm afraid that the hyped features would get the highest weights, which is exactly what I wanted to avoid. Any ideas?
Btw, speaking of the W3C testsuite, I'd love to contribute but I can't find anything to guide me about how to do so. Is it even possible, or can only implementors contribute?
Feb 8, 2012
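A minimal sketch of what per-feature weighting could look like. The weights, the pass/fail results and the `weighted_score` function are invented for illustration; css3test's actual scoring is not shown in this thread.

```python
def weighted_score(results, weights):
    """Fraction of total weight earned by the features that passed.

    results: feature name -> True/False (hypothetical pass/fail data)
    weights: feature name -> positive number (hypothetical importance)
    """
    total = sum(weights.values())
    earned = sum(w for feature, w in weights.items() if results.get(feature, False))
    return earned / total


# Hypothetical weights: features that break a page when missing weigh more.
weights = {"@font-face": 5, "flexbox": 5, "box-shadow": 2, "old-style numerals": 1}

# Hypothetical results for one made-up browser.
results = {"@font-face": True, "flexbox": False, "box-shadow": True, "old-style numerals": True}

unweighted = sum(results.values()) / len(results)  # 3 of 4 features pass
weighted = weighted_score(results, weights)        # 8 of 13 weight points
```

With these made-up numbers the weighted score comes out lower than the unweighted one, because the single failing feature carries a large weight. Who picks the weights remains the open question.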
- Actually, W3C testsuites such as the one for CSS2.1 maximize spec coverage, i.e. they mean to 'test the spec'. Each normative statement is tested individually and we check whether two implementations pass each such statement. That's it. Even with 10,000 such testcases, experience shows that is not sufficient for real-world interop because the real world combines all those things together in interesting and sometimes crazy ways. For implementors, they're super helpful. So, quite selfishly, I love them! For authors, they have huge potential as an educational/learning tool. As a proof of real-world interop - where real-world involves messy combos of HTML, CSS and JS - their usefulness is far more limited.
I see css3test as shooting for the comfy compromised middle. It's not so narrow as to be overly specialized, it's not so deep as to find lots of interesting bugs... but it also doesn't combine features in any meaningful real-world sense. Each compromise is individually reasonable but the combination feels like one of those designs where the three core components individually aimed for 70% impact/usefulness and the combo ends up being 0.70^3. So if I had to try and narrow down my frustration with these tests, it is that they don't take any real stand, nor do they express enough of a point of view on what matters to move the needle in any lasting way. I guess I'd rather get a real something for someone than too little for everyone.
The weighting idea is interesting, but I'd assume the weights would depend on your end goal, e.g. maybe you start from a bunch of designs - blog, retail site, social page, news... - and rank features by their importance to achieve that type of pattern? This is starting to sound like some new take on the CSS Zen Garden: instead of one-content/many-stylesheets we have one-feature-set/many-designs.
As far as contributing testcases, start here http://wiki.csswg.org/test. It's rather raw; feedback welcome.
Feb 8, 2012
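One way to picture design-dependent weights, as a sketch only. The design types echo the ones named above, but every number, feature name and browser here is hypothetical, and `design_score` is an illustrative helper rather than anything css3test provides.

```python
def design_score(support, importance):
    """Importance-weighted mean of a browser's support for one design type."""
    total = sum(importance.values())
    return sum(importance[f] * support.get(f, 0.0) for f in importance) / total


# Hypothetical importance rankings per design type (higher = matters more).
designs = {
    "blog": {"web fonts": 5, "multi-column": 3, "flexbox": 1},
    "retail": {"web fonts": 1, "multi-column": 1, "flexbox": 5},
}

# Hypothetical per-feature support levels (0.0 to 1.0) for two made-up browsers.
browsers = {
    "browser_x": {"web fonts": 1.0, "multi-column": 1.0, "flexbox": 0.2},
    "browser_y": {"web fonts": 0.5, "multi-column": 0.5, "flexbox": 1.0},
}

# Best-scoring browser for each design type.
ranking = {
    name: max(browsers, key=lambda b: design_score(browsers[b], importance))
    for name, importance in designs.items()
}
```

With these invented numbers the two design types pick different winners, so no single score ranks the browsers for every design.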
- Thanks for the info about the testsuite! Is there a list of which specs don't have tests, besides this: http://www.w3.org/Style/CSS/Test/Overview.en.html ?
> the combination feels like one of those designs where the three core components individually aimed for 70% impact/usefulness and the combo ends up being 0.70^3.
Ha, maybe. But have you noticed that 0.7^3 > 1/3 ? ;)
> The weighting idea is interesting, but I'd assume the weights would depend on your end goal e.g. maybe you start from a bunch of designs
How so? For example, the absence of layout stuff would break everything more badly than, say, the absence of box-shadow. Even comparing the lightest flexbox cases with the heaviest uses of box-shadow, the impact doesn't even get close. So, I think a crude approximation could be specified per feature. The question is, who will decide and on what basis?
Feb 8, 2012
- It's your test, you decide! Take a stand. It's sure to get some lively attention and interesting feedback.
Feb 8, 2012