The Sorry State of Open Source in the Age of Cloud Computing
Today we are using more open source software than ever. Web browsers, operating systems, databases, libraries, compilers, editors, ... open source is everywhere and everybody is using it. BUT there are fewer and fewer user-facing open source apps. All those phone apps and fancy web pages are full of open source, but the actual application is not open source. Over time this could become a problem. Seeing is believing. People see Apple and Facebook. They do not see what powers this software.
So what happened? Cloud computing happened.
Let's assume that some idealistic programmers come up with a new productivity application that runs on the web, phones, tablets, etc. Being idealistic, these guys do not aim at becoming the next Instagram. Instead, they want to make their software available to everybody. Back in the good old days when I was among those creating KDE, we just needed some FTP server at the university to upload our tarballs. As the project gained momentum, project members created mirrors across the globe. Eventually distributions started shipping it on CDs. All we had to do back then was ship some static content.
But today users do not download software. They upload their content. The architecture of cloud applications assumes that DATA STORAGE and processing is carried out by some server in the cloud. Data storage AND data processing are provided by the same company. They are tightly coupled. Thus, the users of my imaginary new productivity application will expect that this application will store their data, because this is what they are used to and it is very convenient. No need to do backups. Data is available everywhere and anytime. Sharing data with others is easy, and so on.
My imaginary open source programmers could of course rent some compute instance from Amazon or host some server at their university. But if their application is really successful, eventually three things will happen:
a) People start storing really important data on this server
b) More and more people want to store their data
c) Some f***ing bastards try to store and share data that is legally and ethically not acceptable.
a) means that these developers will start to worry that a loss of data could become devastating as people start to rely on these servers. They fear that unauthorized people could get hold of the private data stored on the servers.
b) means that they need more servers, which will eventually cost real money that they do not have.
c) means that they need to filter and monitor the data hosted on their servers, and there is always the danger that a judge in some country on the other side of the world finds them guilty of serving copyrighted material, or of hosting text that lampoons Jesus Christ or any other god you have never heard of.
What could be done?
A) We need to come back to an architecture where users are responsible for their data again. Storage services such as Dropbox, box.com, SpiderOak, SkyDrive, Google Drive etc. could eventually come to the rescue. Most users have an account at one of these services. Some even pay for additional storage. What is missing is a powerful API that allows open source apps running in the browser or on the phone to save data there. Ideally there is some common API such that users can choose whom they want to trust with storing their valuable data. We are not there yet entirely, but we are getting close. An important building block is cross-domain support in web browsers. This could allow my imaginary open source folks to serve the static HTML, JS and CSS content from servers hosted at some university while storing user data on the likes of Dropbox. Modern browsers with CORS can do this. For many apps this will do the trick, and points a), b) and c) are solved.
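To make the idea of a common storage API concrete, here is a minimal sketch in TypeScript. Everything in it is an assumption made for illustration: the `StorageProvider` interface, the `InMemoryProvider` stand-in and the `NotesApp` class are hypothetical names, and no real storage service exposes exactly this interface. The point is only that the app talks to an interface, so the user picks who holds the data.

```typescript
// Hypothetical common interface; an adapter per storage service would implement it.
interface StorageProvider {
  readonly name: string;
  save(path: string, content: string): Promise<void>;
  load(path: string): Promise<string | undefined>;
}

// In-memory stand-in for a real storage service, for illustration only.
class InMemoryProvider implements StorageProvider {
  readonly name: string;
  private files = new Map<string, string>();

  constructor(name: string) {
    this.name = name;
  }

  async save(path: string, content: string): Promise<void> {
    this.files.set(path, content);
  }

  async load(path: string): Promise<string | undefined> {
    return this.files.get(path);
  }
}

// The app never stores user data itself; it delegates to whatever
// provider the user has chosen to trust.
class NotesApp {
  private storage: StorageProvider;

  constructor(storage: StorageProvider) {
    this.storage = storage;
  }

  async writeNote(id: string, text: string): Promise<void> {
    await this.storage.save(`notes/${id}.txt`, text);
  }

  async readNote(id: string): Promise<string | undefined> {
    return this.storage.load(`notes/${id}.txt`);
  }
}
```

With this shape, switching from one storage service to another would mean passing a different `StorageProvider` to `NotesApp`, while the static HTML, JS and CSS stay the same.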
B) You may argue that cloud servers do more than just store data. They process data and prepare it for queries by storing it in databases. I argue that modern phones have more horsepower than the PC I used to compile the first versions of KDE 15 years ago. Modern web browsers are very powerful. They feature multithreading, and they have integrated databases (IndexedDB or SQLite). They can draw in 2D and 3D. They can process video etc. We need to do more of the heavy lifting on the client and less on the server.
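As a small illustration of doing the query work on the client, the following TypeScript sketch filters and sorts the user's own records locally instead of asking a server-side database. The `Recipe` type and the `queryLocal` function are hypothetical names invented for this example; a real app would probably read the records from a browser database first.

```typescript
// Hypothetical record type for data the user already holds on the device.
interface Recipe {
  title: string;
  tags: string[];
  minutes: number;
}

// Run the "query" on the client: keep recipes that carry the tag and fit
// the time limit, fastest first. The input array is left unchanged.
function queryLocal(recipes: Recipe[], tag: string, maxMinutes: number): Recipe[] {
  return recipes
    .filter((r) => r.tags.indexOf(tag) !== -1 && r.minutes <= maxMinutes)
    .sort((a, b) => a.minutes - b.minutes);
}

const myRecipes: Recipe[] = [
  { title: "Sponge cake", tags: ["cake"], minutes: 45 },
  { title: "Mug cake", tags: ["cake", "quick"], minutes: 5 },
  { title: "Stew", tags: ["dinner"], minutes: 120 },
];

// All cakes that take an hour or less, computed without any server round trip.
const quickCakes = queryLocal(myRecipes, "cake", 60);
console.log(quickCakes.map((r) => r.title));
```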
C) Now you may still argue that some features require knowledge of the entire user data. For example, to search for other people, cake recipes, or song recommendations that others have published, some server must index all data to make it available. I argue that this job can be done by a federation of servers operated by my imaginary open source fellows, because points a), b) and c) do not really apply here. If data can be searched by everybody, then this data is (most of the time) public anyway. Thus, the indexing servers would deal with public data. If the index servers crash, this is awkward, but no data is lost. The real data is still stored by Dropbox, SkyDrive, etc. Because of A) and B), most of the load that cloud servers carry today has been redistributed already. What remains is much less load. A large open source project can easily come up with a federation of servers that can crawl user data and index it. With respect to legal liability: the indexing servers do not store any infringing documents. In the worst case they point to other servers which in turn store the infringing data. While this is still not 100% safe, it is legally much safer.
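The indexing idea can be sketched as a tiny inverted index that keeps only search terms and pointers (URLs), never the documents themselves. This is a minimal TypeScript sketch under assumptions: `PointerIndex` and `tokenize` are hypothetical names, the URLs are made up, and the crawling and federation parts are left out entirely.

```typescript
// Split text into lowercase alphanumeric terms.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Maps each term to the URLs of public documents that contain it.
// Only terms and URLs are kept; the document text is discarded after indexing.
class PointerIndex {
  private postings = new Map<string, Set<string>>();

  add(url: string, publicText: string): void {
    for (const term of tokenize(publicText)) {
      let urls = this.postings.get(term);
      if (!urls) {
        urls = new Set<string>();
        this.postings.set(term, urls);
      }
      urls.add(url);
    }
  }

  // Return the sorted URLs of documents that contained every query term.
  search(query: string): string[] {
    const terms = tokenize(query);
    let result: string[] | undefined;
    for (const term of terms) {
      const urls = this.postings.get(term);
      if (!urls) return [];
      result = result === undefined
        ? Array.from(urls)
        : result.filter((u) => urls.has(u));
    }
    return result ? result.sort() : [];
  }
}

// The URLs below are placeholders for files that live at the user's storage service.
const publicIndex = new PointerIndex();
publicIndex.add("https://storage.example/alice/cake.txt", "Chocolate cake recipe");
publicIndex.add("https://storage.example/bob/pie.txt", "Apple pie recipe");
console.log(publicIndex.search("chocolate recipe"));
```

If such an index is lost, it can be rebuilt by crawling again, because the documents themselves stay with the storage services.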
Finally, search engines might be interested in helping out with searching and indexing, because this is what they are supposed to do.
To conclude, successful user-facing open source apps could still be possible in a world dominated by phone and web apps. We just need to rethink our software architecture and rely less on servers and more on clients. The most difficult issue to solve is enabling users to take care of their data themselves again. Technically this is possible. The user-facing cloud service providers just need to provide proper APIs. There is hope that they will do so eventually.