Shared publicly  - 
Accessing a database - ownCloud continued

( EDIT: continued from )

private static function processQuery( $query ) {

No need to look at the code. Really, don't.

Here is the docblock.

@brief does minor changes to query
@param string $query Query string
@return string corrected query string
This function replaces PREFIX with the value of $CONFIG_DBTABLEPREFIX and replaces the ` with ' or " according to the database driver.

Yes. Fucking with quoting, using str_replace against SQL command strings. What could possibly go wrong?

Hm, maybe I should be looking at the code, too.


                }elseif( $type == 'mssql' ) {
                        $query = preg_replace( "/\`(.*?)`/", "[$1]", $query );
                        $query = str_ireplace( 'NOW()', 'CURRENT_TIMESTAMP', $query );
                        $query = str_replace( 'LENGTH(', 'LEN(', $query );
                        $query = str_replace( 'SUBSTR(', 'SUBSTRING(', $query );
                        $query = str_ireplace( 'UNIX_TIMESTAMP()', 'DATEDIFF(second,{d \'1970-01-01\'},GETDATE())', $query );

                        $query = self::fixLimitClauseForMSSQL($query);

That's not exactly what the docblock says.

We are actually looking at a much more fundamental problem here.

So ownCloud probably used MDB2 as a database access layer and a database modeling layer at first, and later went PDO, because MDB sucks a lot, conceptually, functionally and performance wise.

What MDB2 doesn't do, and PDO doesn't, either, is query building. There are very few frameworks that can pull this off and don't suck, hard.

Nextstep had DBKit as a full scale ORM, and that worked to a good degree for static queries, at least when I tested it in the 90ies against Oracle and Sybase. Ruby has RoR and ActiveRecord, and that does work to some degree, except that Ruby does not really scale. It has at least a solid exit strategy, and that's worth a lot in database abstractions.

MDB did one thing (data modeling), and not very well, and didn't do the other properly (query building). I know because at the time MDB was being discussed and written, I was still active in the PHP community and did try to do the DBKit ORM thing in PHP (Version 3 at that time) and it simply did not pay off in a request-based framework.

Basically, you have to parse a lot of crap, instantiate a lot of objects and shovel a lot of strings in order to build a string, which you send to the database for parsing. At the end of the request you demolish everything and at the next request you start from scratch. The PHP execution model and this kind of abstraction did not go very well together, at all.

In an application server that would change, you'd instantiate all the modelling and query building crap and then just ask for the query (probably cached and prebuilt from a previous request). But app servers scale differently than per-request models (they cost you per session, not per concurrent request, basically their n is factor 100 or more higher for the same number of active users, and their memory footprint probably is larger as well).

In a query compiler you'd model all the crap and generate an SQL dictionary for static queries for the target database as part of the deploy and schema generation.

Anyway, what ownCloud does here:

a. They are using MDB2 for database access: to build a connection to a database, abstract away the actual PHP commands to do that, which are database specific.

b. They are using MDB2 for schema management: To compile their supposedly database agnostic XML into SQL DDL for the target database. Only that it is not: Their str_replace() processing tries to fix stuff that MDB2 does not model, or they try to work around MDB2 not knowing intricacies of certain database server versions.

That is, MDB2 is simply not up to that task, at all.

c. They are pushing literal SQL into their application and push this out to the database, which may be any of MySQL, PostgreSQL, sqlite, Oracle or MS SQL Server.

MDB2 does not help here at all, and PDO does not, either, so the SQL has to be as portable as possible, and then needs to be massaged in a very extremely bad way in order to be able to run on anything that is not MySQL (and probably an old version of MySQL, assuming MyISAM and 4.x TIMESTAMP semantics and other crap).

That's of course doomed. Evil. And a shitload of potential security problems.

The point of using a bindParam interface is that you do not str_replace quote signs, mysql_real_escape exotic unicode characters and other stuff. Yet, ownCloud trying to honor the pwn in its name does this all over the fucking place.

The proper solution would be

a. a query building Fullsize ORM. See my remarks above. This is hard to get right, and a projects in its own right, I but several are available for PHP.

b. application specific, target database specific literal SQL, as static as possible, hidden away in manually written bookkeeper functions that persist your stuff, and a thin layer of support functions making this less painful.

Solution a is not in any of the tools used. Solution b has not been chosen.

Instead this. pwning this is harder than normal, b/c they at least use bindParam in their DML, but the approach would be to go and try to find combinations of dynamic query building that can be tricked into turning their numerous, distributed and uncontrolled str_replaces into a doomsday machine.

When the NSA shit hit the fan, a lot of people stated that they would now host their own stuff.

And installed this.

This is an improvement?  R U sure? What's next? TikiWiki advocacy?

I think I'd rather trust Google and the NSA with my data right now. Seriously.

( *EDIT: continued in )
Florian Haas's profile photoHarald Wagener (oliof)'s profile photoStephan Adig's profile photoKristian Köhntopp's profile photo
class PDOStatementWrapper{
        public function execute($input=array()) {
                        if ($type == 'mssql') {
                                $input = $this->tryFixSubstringLastArgumentDataForMSSQL($input);

        private function tryFixSubstringLastArgumentDataForMSSQL($input) {
                $query = $this->statement->queryString;
                $pos = stripos ($query, 'SUBSTRING');
(Query ohne Parser zerparsen und SUBSTRING aufrufe anpassen oder die Query zerstören, wer kann es schon genau sagen)

Ich habe Kopfschmerzen und will ins Bett.
Wenn die, die es besser könnten, lieber Google nehmen, wird das wohl auf lange Sicht auch nicht besser. Danke trotzdem für den Hinweis. Es schauen viel zu wenige Experten durch open source code.
Das letzte mal als ich ähnlich hässliche Replace Orgien gesehen habe war, als ich in den S9Y Code geschaut habe ...
Yep, da habe ich mich auch nicht rangetraut, aus denselben Gründen. Ein paar ALTER TABLE ADD INDEX an den schlimmsten Stellen mußten reichen.
Wenn ich Schuhe besser nähen könnte als die verfügbaren Schuster, würde ich meine Schuhe selber nähen.
+Arthur S. hast du das gelesen was der alte mann schreibt ?

kannst du da was machen? evtl umschreiben in python oder wenigstens perl?

und ja upstream is worth to ping :-)

Den ein oder anderen Index habe ich manuell in der DB eingefügt. Da mein S9Y auf PostgreSQL läuft (und ich wahrscheinlich der einzige bin der das mit der Plattformunabhängigkeit ausprobiert), ist da jedesmal sehr viel Ärger angesagt.
Just out of curiosity: do you think Google (or the NSA, for that matter) are full of whiz kids and there are never any fuckups like this in their code?
Ich dachte mir schonI, dass Owncloud nicht sauber geschrieben ist nur da ich kein PHP kann konnte ich mir das nie im Detail ansehen. Nur was bräuchte man um die Situation zu verbessern? Mit einem Senior Webentwickler der PHP und Python kann wäre ich ja auch bereit bei einer Django Implementierung zu helfen aber ich vermute, dass die Owncloud Entwickler ein ähnliches Problem haben. 
I do not have to believe, but know for a fact that Google has an engineering culture that relies on the code review. Their corporate ideology despises hacks such as regexing parseable data.

This code would never have seen a production data center at Google, and would have been an issue at the authors or architects next performance review at Google.

Corporate processes break down sometimes, so it might be that isolated examples of such thinking exist somewhere in the more shady places of their codebase, but I would be very surprised if I or anybody were to find sufficient instances of it to make this a rule instead of an isolated flaw that somehow escaped review. 
But they still won't show you, or anyone else, their production code. Just sayin'.

"Can't happen", "would have never gone to production", "would have gotten the person fired" etc. are all generalizations I ain't buying.
I have to agree with +Florian Haas least here in Germany no one will be fired for moronic code. 
But I think in the US it is different, depending on the company and in which department you are coding, and eventually when you have a good OPS Security with brilliant DevOps PPL
Google does not manage people like Darth Vader does.

They do set personal goals for the near and mid-term, and them being Google, they actually track and measure these. In this case, aquistion of a coding style closer to the corporate engineering standards as documented in their style guide, language readability achievements and similar things would be things that I as an outsider would guess are things that would be on the table (Not being a Googler, just reading up about how they manage engineers compared to for example my employer does, and how we manage cases like these internally, this is a complete guess on my side, based on the papers and statements I internalized and my personal experience. That is, I am just making this up as I write it, as usual).

I do know for a fact that they do require reviews of architecture and implementation. See also
Having stringent review and audit processes and a strong engineering culture did the NSA spooks a fat lot of good with Snowden.

Good on 'em, in that case.
+Florian Haas "Just out of curiosity: do you think Google (or the NSA, for that matter) are full of whiz kids and there are never any fuckups like this in their code?"

That is precisely the problem that HR and IT management for a dev environment have to solve. "On the average you will only hire only average people" is an old IBM management saying.

It is a management objective for any IT dev management together with HR to create a culture and environment, in which these people can grow to become better and achieve their personal peak, safely.

That is, you need to have an environment where best practice is discussed, common violations of these best practices are showcased, discussed and contrasted with their fixes, postmortem style (sysadmin version:, community

That is, you need to have process where people that are not brilliant whiz kids have the ability to fail, with their shortcuts being detected, called out and where they are tasked with doing it properly.

That is, you have to create a management environment that values "enginerring it properly" over "ship dat now!", and actually provides ressources in the form of time and people to do stuff properly.

That is you have to have a culture that defines craftsmanship standards for any subfields used in your company, and then actually lives them, demonstrates them among the peers and celebrates and rewards them in their culture and evaluation.
And that style can be applied to internal code reviews as well.
people need to learn and they should make mistakes cause we can learn from mistakes

but if one of the ppl is doing 3 or more of the same mistakes in several releases tjan i think it's allright to rethink the decision of hiring this guy/gal
+Stephan Adig if the same mistake is made in several releases it's not a failure of the person making them, it's a failure of the company in allowing these failures to happen. Edited to clarify: Firing someone usually doesn't help avoid these failures, it's just someone new will fail the first time that hasn't done it before.
+Harald Wagener I think we need to make a diff between Juniors and Seniors here. I expect from a Senior to not make stupid mistakes in the code which violates several code policies (which are documented etc.) A Junior has to learn and has to make mistakes and needs a good lead who helps him to not make mistakes again :) 
+Harald Wagener for sure. But introducing already known bugs, in several releases, and being a Senior? 
Add a comment...