[long post ahead]

Much will be said about the recent developments in the ARM server space, and my opinion is just one of many. Still, I have a few things I want to say based on what's happened so far (about 24 hours into it):

1. I find it very, very hard to stay respectful of people who act as technical evangelists yet choose to say demeaning and hurtful things about those who disagree with them. Doing it once can be excused; doing it twice is a sign of a deeper problem.

2. It's hard to keep open-minded discussions with people you disrespect. This means that the controversy that's going on is causing real damage and is making it harder for us to reach the goals we all want to achieve: ARM servers with Linux on them being successful.

3. I have not read the SBSA (ARM's specification for a standardized server system architecture) due to the license terms, but I know in large part what it describes based on posts from others. It is an incredibly important document, because it does away with so many of the variables where ARM vendors in the past have chosen to differentiate themselves from each other for no useful purpose. It allows us to write software that is much simpler, and that has a better chance of working across a large range of hardware with very small changes made over time.

4. It is a complete red herring to mix the SBSA up with ACPI. The fact that there will be a standardized hardware base platform means that any light-weight hardware description method will do to describe where the mandated components are located. ANY.

5. To think that you can do away with any need to change kernel code just because you pick ACPI for your hardware description is misleading and completely false. No architecture has been able to do this in the past, and I don't expect any of them will do it in the future. I fully expect a need for minor updates to drivers for new generations of hardware chipsets: things like PCI IDs, the odd errata workaround, fixes for exposed driver bugs, etc. And of course, feature enablement for new functionality.

6. In spite of this, with any lightweight hardware description (done right, which includes the one we already have in the kernel -- devicetree), newer implementations of the base platform can be used by older code. This hinges on the presumption that hardware vendors don't do crazy stuff such as change the programming model for their PCI-express host controllers for no good reason, shuffle register layouts around for the fun of it, etc. It moves the burden of making the platform easy to maintain from a software perspective back to the hardware architects, where it belongs. The whole premise of having a standardized hardware platform supported by software relies on hardware vendors restraining themselves from doing these kinds of crazy things, so this should be self-fulfilling. Some vendors might have to learn expensive lessons to understand it, but hopefully only very few.

7. Just because future documents will require ACPI implementations by the server vendors doesn't mean the upstream kernel will use it, or even accept code that makes use of it. There is zero connection between the two. Really.

8. What goes into the upstream kernel will be judged on its own merits. Some of it will be patch by patch, but some of the things also need to be looked at as a larger picture since we need to see how deep the slippery slope goes. This means that we might need to pull the brakes sometimes until we see the bigger solution and the fuller implementation, and that's OK. Most of these merits are purely technical, but there are also aspects of long-term maintainability and wider impact of whichever solution ends up being favored that need to be considered.

9. For (8) to happen, the discussion has to be brought out to the wider community and not be done on an island by itself (with or without locked doors). There are no shortcuts to this, unfortunately, and some things will just take time.


[I am of course keeping comments enabled on this post, but I want to keep discussion on-topic and reserve the right to delete trolling comments. Points (1) and (2) above go both ways.]
 
ARM architecture standalone "cortex" is one available option. It appears you want the non-standalone "cortex" that fits kernels with the largest available application. I still have my bid on APU+ARM for both (with optional PnP CPU board).
 
Good post. While I don't agree with all of your conclusions, I think your points 8 & 9 are particularly important and correct. While I think ACPI is absolutely the correct solution for ARM servers, I also think that the process for mainlining support cannot be short circuited. It will take time and be messy in places, but it is necessary.

In defence of the people working on UEFI and ACPI support, I think that is exactly what we are doing right now. The patches have been posted to the LKML and are being discussed, which is exactly what should be happening.
 
+Grant Likely: Yes, some of it is definitely happening now, but mostly on the patch-by-patch basis. The only people who today have the bigger picture are those who have signed NDAs and are talking about things behind locked doors, so it's hard for the rest of us to get a feel for where things are headed -- the horizon is quite close to us right now.
 
I agree about the NDA thing, it's sad to watch from outside of that firewall, but the good thing is that those of us outside it can't really do much, so we can get on with our other work :)
 
+Grant Likely what's the big deal with ACPI? DT is well understood, openly developed, well established in the kernel, and openly available to everybody.
ACPI is baroque, the development is closed, the resulting standard is less than flattering, up-to-date/latest documentation is hard to come by, and even on x86 the kernel support is flakey and has gaps (look for the latest SMP/NUMA stuff for a start, especially the parts about start-up vs. run-time detection).
So why ACPI?
 
+Greg Kroah-Hartman: DT has been done for servers for many many years. It wasn't until the complexities of mobile platforms, and fine-grained power management in particular, came along that things got messy. For servers it will do just fine.
 
+Greg Kroah-Hartman Also, I think the respective development model of both choices is important, as well as the respective level of implementation in the Linux kernel. I find it a bit sad in this concern that +Linus Torvalds in your re-post calls for "any standardization whatsoever", maybe not fully realising that the kernel's ACPI implementation is rather incomplete and unreliable in places. When running into problems with new ACPI-based systems on Linux one still has to refer to the reference implementation - which is Windows - in order to verify whether the kernel is buggy, or in fact the ACPI implementation is broken.
What do you do when you hit such a problem on one of the many ARM server platforms to come, all "standardized" like this?
 
+Greg Kroah-Hartman DT was pretty much invented for servers, it's only since ARM started using it that it got much use outside standard PC/desktop applications. The PowerPC Macs were the first DT systems I used for example. Like +Olof Johansson says the issues have been more things like power management and whatnot, plus greatly increasing the number of people in the ABI definition business.
 
+Greg Kroah-Hartman Realistically, the only reason we care about ACPI is because Microsoft care about ACPI. If Microsoft care about ACPI on ARM in the server market, we should care about ACPI on ARM in the server market. If Microsoft don't care about the ARM server market then we should really not give a shit. So far, we don't have a good idea about that question.
 
+Matthew Garrett we care about ACPI as that is what is shipped on zillions of machines that we want to run Linux on.  Those hardware vendors only care about ACPI because Microsoft told them to.

It will be interesting to see if Microsoft cares about the ARM server market.  I'm guessing not, for a variety of reasons, which is going to make this whole ecosystem a major pain to deal with as there will not be any large "Windows 201x" certification requirement which is the reason why we now have the "unified" PC system architecture we do today.

As hardware vendors don't know what to do on their own, it looks like ARM has tried to create a unified standard for people to follow.  It is going to be fun to see whether anyone really cares about it or just does their own thing.  I'm betting on the hardware companies messing it up; this is going to be a fun ride to watch.
 
wait, that's exactly what you just said, doh, I need more coffee...
 
+Thilo Fromm firmware, be it ACPI, or UEFI, or anything else, is going to have bugs, just like we do in the kernel; our code isn't "perfect" at all.  And Linux should emulate Windows, bugs and all, because that is the reference implementation, which is fine, there's no problem with that.  Without the ability to test against a "reference", the ARM solution, no matter what it is, is going to be rough, and we will have to deal with it in Linux.

Which is also fine, it will keep us busy for years to come, I like job security :)

+Olof Johansson you are right, I forget about PPC device tree stuff all the time, sorry.
 
Whoa, impressive! Do you guys ever sleep? :)
+Greg Kroah-Hartman It's not quite what I meant. Chances are there won't be a reference on ARM servers, just as +Matthew Garrett suggested. Add to that the fact that even now Linux isn't fully able to run on new x86 ACPI systems without having Windows handy to debug stuff.
Now compare this to DT, which is pretty much established in the "cute embedded systems hacks" ARM Linux world. 
 
I'm familiar with what is needed to augment ACPI in order to also provide the sort of arbitrary device properties DT does. What I am not familiar with, is the opposite side of the coin. What is DT lacking that ACPI provides? Someone mentioned power domains.... 
 
Frankly, I don't want ACPI on ARM. It's another heap of indeterminate firmware bugs, hiding poorly-documented hardware, and unlike on x86 there's no reference client to test against.
 
+Darren Hart Well power domains are well defined in ACPI, but DT didn't have support for them in a generic manner until very recently. 

Tomasz Figa just posted patches that add generic power domains to DT in https://lkml.org/lkml/2014/1/11/141

The other big thing that DT lacks is that by design there is no bytecode interpreter to execute firmware specific actions.
 
+Greg Kroah-Hartman Fixing firmware bugs is a living, but not one I'd want to be doing. We had a shot at fixing the firmware problem for once on a major arch, but this latest business is a huge step backwards in the eyes of many in the cute embedded nonsense hacks community.
 
+Greg Kroah-Hartman participating in the ARM server standardization process (let alone creating a reference client) is a little more difficult than your comment implies. Which is actually the core problem here.
 
I honestly don't understand the ACPI vs DT discussions. There will be hardware with EFI. There will be hardware with u-boot. There will be hardware with ACPI and there will be hardware with Device Tree. How those are going to be mixed together in a particular platform I can't predict yet - maybe we'll see all permutations.

Where does that leave us as Linux people? Exactly, we have to support them all. Regardless of some people's wishful thinking. OEMs dictate on us what firmware does, not us ourselves. So what's the point in discussing pros and cons at all?
 
+Alexander Graf I do not think that "OEMs dictate on us what firmware does".
If we tell them that booting Linux with devicetree is easy (they just have to port uboot and abstract the hardware in a DT), while using ACPI will require an order of magnitude more effort and "real" support, we grab them where it matters, i.e. their purse strings.
 
Matthias, for them ACPI is easier. They already have the experience and engineers to work with it from their x86 legacy.

Of course that won't apply for every OEM and some may go with dt. That's a good thing. It means we bring the discussion back to technical levels and leave the decision up to the market as it's supposed to be.