All my (= Alvaro Carballo Garcia = Custom Solvers 2.0 = varocarbas) online activity is meant to provide a reasonably comprehensive picture of myself, focused mainly on professional (programming/engineering) aspects and aimed at potential clients. So, basically everything I do online as Custom Solvers 2.0 or varocarbas is work (the hardest type of work: the unpaid kind, which I don't enjoy), whose outputs aren't immediate and whose point I can only defend as the best way to escape from a nonsensical situation.

I do understand that all this sounds a bit too cold and might even give a false impression of my personality, one which is bad for my own interests, but I cannot do anything else. I am very clear about what I want, and also about the way to accomplish that goal: being patient and focusing on my strongest points.

I usually deal with the slightly more personal stuff in my Slashdot account (CustomSolvers2). It is still pretty work-related and pretty self-promotional, but this is what the internet is for me: a channel to eventually find the clients I want. Actually, I am a very private person who doesn't enjoy group, generic, impersonal activities of any kind; all my past concessions on this front have proven to be tremendously bad ideas.

In summary, if you have a project which might appeal to me, please let me know (via my sites). If you are looking for technical/programming/engineering content, always from an objective and honest perspective, you might like quite a few of my online outputs, perhaps even some of the non-technical ones; but please bear in mind my motivation and conditions. If you are just looking for any kind of social interaction, don't forget that my current tolerance of empty, generic, meaningless conventions and of everything associated with them (e.g., misinterpretations, prejudices, groupthink) is zero.


This is just to clarify that the aforementioned problems with moving the bots from the shared server to a local computer consist almost exclusively of re-tuning their operating conditions (first, just to match the performance achieved so far; then, to make the most of the better hardware). This is a rather long and tiring process.

Bear in mind that by "the bots" I am referring to a complex system involving a large number of interdependent applications (40 to 70, although this number might be notably higher in the short term) performing far-from-simple actions which are very difficult to track. For example, in its current over-optimised state, 1 bot performs the following actions:
- Analyses the robots.txt of the domain (=> network resources).
- Visits a certain number (=> parameter to be tuned) of pages on that site (=> more network resources).
- Looks for links in each page, up to a certain limit (=> another parameter to be tuned).
- All the actions so far involve numerous requests (reads/writes) to over 5 tables of the associated database (=> DB resources).
- If it finds external links, it communicates with another 2-5 tables and performs different actions on them (=> more DB resources).
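The per-bot steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual bot code: `crawl_domain`, `LinkCollector`, `max_pages` and `max_links` are hypothetical names, an in-memory `pages` dictionary stands in for real HTTP requests, and the database reads/writes are only marked with a comment. The `urllib.robotparser` and `html.parser` modules are the real Python standard-library ones.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags, stopping after max_links."""

    def __init__(self, max_links):
        super().__init__()
        self.max_links = max_links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and len(self.links) < self.max_links:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl_domain(domain, robots_txt, pages, max_pages, max_links):
    """One hypothetical bot pass over a single domain.

    robots_txt: text of the domain's robots.txt
    pages: dict URL -> HTML (assumption: replaces real network requests)
    Returns the set of external domains found.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    # Mark robots.txt as fetched: can_fetch() refuses every URL until
    # this timestamp is set.
    robots.modified()

    external = set()
    visited = 0
    for url, html in pages.items():
        if visited >= max_pages:  # tunable: pages per site
            break
        if not robots.can_fetch("*", url):
            continue
        visited += 1
        collector = LinkCollector(max_links)  # tunable: links per page
        collector.feed(html)
        for href in collector.links:
            host = urlparse(urljoin(url, href)).netloc
            if host and host != domain:
                # Hypothetical: the real bots would read/write the
                # associated DB tables at this point.
                external.add(host)
    return external
```

Passing `max_pages=1` stops the pass after the first allowed page, and a small `max_links` ignores everything past the first few links of each page; these are the kind of knobs the list above calls parameters to be tuned.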

Simply running the same bots with exactly the same configuration on different (even better) hardware might cause a serious drop in their performance. Parameters which work fine for a certain CPU/memory setup can be horrible for a different one.

The transition from a shared server to a full computer has been particularly difficult because all the previous tuning was based on restrictions which no longer exist. Just regaining full control of database + CPU + memory + bandwidth to adequately manage the behaviour of the bots hasn't been straightforward at all (I am still struggling with it). Not to mention finding the new ideal bot parameters (e.g., number of sites, number of links, time per execution) under the new conditions, or tuning the database (MySQL). On top of all that, I have been working on some backup subsystems, although this would have been done anyway, even if I had stayed on the shared server.
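One generic way of looking for new parameter values after a hardware change is an exhaustive search over a few candidate values, measuring the throughput of each combination. The sketch below assumes a hypothetical `benchmark(sites, links)` callable which runs the bots with that configuration and returns domains/hour; the toy benchmark is invented purely for illustration, and the source does not say how the bots are actually tuned.

```python
from itertools import product


def tune(benchmark, sites_options, links_options):
    """Try every (sites, links) pair and keep the fastest one.

    benchmark(sites, links) is a hypothetical callable returning the
    measured throughput (e.g. domains/hour) for that configuration.
    Ties keep the first configuration tried.
    """
    best_config, best_rate = None, float("-inf")
    for sites, links in product(sites_options, links_options):
        rate = benchmark(sites, links)
        if rate > best_rate:
            best_config, best_rate = (sites, links), rate
    return best_config, best_rate


def make_toy_machine(capacity):
    """Toy benchmark: throughput grows with load up to `capacity`,
    then drops as the machine gets overloaded (purely illustrative)."""
    def benchmark(sites, links):
        load = sites * links
        return load if load <= capacity else capacity - (load - capacity)
    return benchmark


small, large = make_toy_machine(20), make_toy_machine(80)
print(tune(small, [2, 4, 8], [5, 10]))  # ((2, 10), 20)
print(tune(large, [2, 4, 8], [5, 10]))  # ((8, 10), 80)
```

Even with this crude model, the best configuration on the small machine is a poor one on the large machine and vice versa, which is why the parameters have to be searched again after every hardware change.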

The bots are currently running on a computer which I will most likely replace next week. That is, I will have to re-tune everything again next week, although the required effort will be much lower then.


The web-domain ranking has already reached the 1M-record threshold, despite the problems associated with moving to a local setup (as explained in my last post).

This seems a good moment to highlight the perhaps not-too-evident limitations of the current "everything is connected" approach (i.e., starting the analysis from a single domain). The bots need time not just to cover a higher number of domains, but also to improve the reliability of the ranking. The crawling process started from a US domain, which is why US/English-speaking/Western domains have a head start. Not to mention the numerous problems associated with the first stages of a reasonably complex system, which cause relevant amounts of information to be regularly ignored. That's why this is an iterative (or multi-round, if you wish) approach which needs time to become reliable enough.
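Why starting from a single domain favours its neighbourhood can be illustrated with a breadth-first discovery from one seed. This is a simplified sketch, not the ranking's actual algorithm: `discover_by_round` and the small link graph are hypothetical, and each round only follows links from the domains found in the previous round.

```python
def discover_by_round(link_graph, seed, rounds):
    """Breadth-first discovery starting from a single seed domain.

    link_graph: dict domain -> list of linked domains (hypothetical data).
    Returns a list whose item i holds the domains first found in round i.
    """
    seen = {seed}
    frontier = [seed]
    found = [[seed]]
    for _ in range(rounds):
        next_frontier = []
        for domain in frontier:
            for target in link_graph.get(domain, []):
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        if not next_frontier:  # nothing new left to reach
            break
        found.append(next_frontier)
        frontier = next_frontier
    return found


graph = {
    "a.us": ["b.us", "c.com"],
    "b.us": ["d.de"],
    "c.com": ["e.jp", "b.us"],
    "d.de": ["f.es"],
}
print(discover_by_round(graph, "a.us", 3))
# [['a.us'], ['b.us', 'c.com'], ['d.de', 'e.jp'], ['f.es']]
```

Domains which are only reachable through long link chains from the seed appear in later rounds, so an early snapshot over-represents the seed's surroundings until enough rounds have been completed.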

I am including a simple 1-10 assessment (dependability, currently 2) to help visitors get a quick idea of the status at any given moment. Also bear in mind that the lower positions are always the least reliable ones. Focusing only on the top 25%-50% of positions at each point seems the most reasonable approach.
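Keeping only the top 25%-50% of positions amounts to a simple slice of the ranking. A minimal sketch, assuming the ranking is a list ordered from best to worst position; `reliable_slice` is a hypothetical helper which rounds up, so a non-empty ranking always keeps at least one entry.

```python
import math


def reliable_slice(ranking, fraction=0.25):
    """Return the top `fraction` of a ranking ordered best-first."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    count = math.ceil(len(ranking) * fraction)
    return ranking[:count]


ranking = [f"domain{i}.com" for i in range(1, 11)]
print(reliable_slice(ranking, 0.25))  # ['domain1.com', 'domain2.com', 'domain3.com']
```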

Note that I have also fixed some issues in the search functionality which I hadn't been aware of. Remember that I am the only person dealing with everything (Custom Solvers 2.0 and varocarbas public/private activity). Sometimes it can become a bit too hard, mainly when dealing with unexpected complications like suddenly having to move all the bots to a local setup (still working on this; hopefully, everything will be over by next week). So, if you find any problem on either of my two sites, please let me know, because I am certainly not aware of it (fixing any problem right away is precisely what characterises my work).

Lastly, I want to highlight that all this unplanned (unpaid/not-too-rewarding) extra work on the domain-ranking front has caused delays in some of my pending public outputs. For example, the completion of contents (which will hopefully be finished within the next few days) or the release of the next FlexibleParser part, DateParser (originally planned for March 2017, a deadline which might not be met).

What is wrong with post visibility on G+ lately? I have to post several times to make sure that logged-out users can see my post!! This time was very curious: everyone could see it right after I posted it, but then the same URL suddenly returned an error?!

The attached picture shows the view when I am logged in (in Spanish, but it clearly says "public") on the left-hand side, and the logged-out view on the right-hand side?!


There has been an unplanned change in the domain ranking, whose immediate consequence is that the bots will stop running on the (shared) server.

I am currently preparing a local (= my office) setup to allow the bots to continue retrieving information. Visitors of will still be able to browse through the databases as before. For the time being (at least during the current week), the bots will run more slowly; additionally, the databases will be updated only once per day.

The reasons for this sudden change aren't too relevant, and my opinion of my hosting provider (MDDHosting) is still very good. In any case, this does seem to confirm that running a system like this on a shared server (even after over-optimising every detail) isn't possible, for various reasons.

The new local hardware conditions aren't clear yet, although they will certainly be quite restricted too. Note that the whole point of this attempt (and of all the public activity of Custom Solvers 2.0 and varocarbas) is to prove to potential clients my skills in programming and in building complex, optimised pieces of software. That's why dealing with restricted hardware isn't exactly a negative aspect; it is actually almost a requirement.

I will write a post explaining all the changes as soon as the new setup is ready, most likely by next week.

Attached to this post, I am showing one last pathetically stupid bot. This little piece of whatever has been visiting several times per day for quite a few months now(!!). Out of all the pathetically stupid bots which I have mentioned here, this is undoubtedly the most persistent one (= the most stupid one).

The page this dumb bot is so obsessed with is an RSS feed containing minor information about the formal projects: the numbered ones shown in the LHS menu of its front page.

At the moment, I am not planning to add any new projects. Note that they involve a considerable amount of effort (for which I don't get a penny); additionally, the intended goal (i.e., showing more comprehensive samples of my work) has pretty much been fulfilled already.

I will certainly continue updating (currently working on the domain ranking), but by relying on formats which are less time-consuming than whole projects.

The nice graph attached to this post shows the current behaviour of the CPU & databases on the varocarbas server (I mean... the portion of the shared server which it uses). Note that, without the domain-ranking crawling bots, this graph would be almost zero, since is very efficient ( too).

At the moment, the bots are retrieving around 3500 new domains/hour (going above 4000 under good enough conditions), which is notably better than what I was expecting just 1 week ago. So, my over-optimisation work (the way in which the bots retrieve and store information, coordination between bots, database configuration, etc.) seems to have paid off.
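To put these rates into perspective, here is the plain arithmetic of how long 1M domains would take, under the simplifying assumption that the rate stays constant (it will not, because bigger databases make every operation more expensive).

```python
HOURS_PER_DAY = 24


def daily_throughput(domains_per_hour):
    """Domains retrieved per day at a constant hourly rate."""
    return domains_per_hour * HOURS_PER_DAY


def days_to_reach(target_domains, domains_per_hour):
    """Days needed to retrieve target_domains at a constant rate."""
    return target_domains / daily_throughput(domains_per_hour)


for rate in (3500, 4000):
    print(rate, daily_throughput(rate), round(days_to_reach(1_000_000, rate), 1))
# 3500 84000 11.9
# 4000 96000 10.4
```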

There are other graphs (memory, bandwidth, etc.), but this is the most relevant one because it shows the only hardware feature which isn't automatically restricted: DB connections (the blue line). For example, if the bots consume more than 100% CPU (= 1 core), the system automatically restricts them. But if the database graph goes above 100% (something which has recently happened), the system doesn't restrict the bots; other domains on the shared server are affected and my hosting provider gets back to me (as they recently did :)).
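Since nothing caps DB connections automatically, the bots have to limit themselves. One generic way of doing this (not necessarily what the bots actually do) is a bounded semaphore which every bot must acquire before touching the database. In the sketch below, `ConnectionLimiter` and `run_bots` are hypothetical names and `time.sleep` stands in for the real queries. A `threading` semaphore only works within one process; with many separate applications, a cross-process mechanism would be needed, and on a server you control, MySQL's `max_user_connections` setting offers a server-side cap.

```python
import threading
import time


class ConnectionLimiter:
    """Caps how many bots may hold a database connection at once."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self.active = 0  # connections currently held
        self.peak = 0    # highest number held at the same time

    def __enter__(self):
        self._slots.acquire()  # blocks while all slots are taken
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self.active -= 1
        self._slots.release()
        return False  # never swallow exceptions


def run_bots(limiter, n_bots, work_seconds=0.01):
    """Start n_bots threads which each 'use' the DB for work_seconds."""
    def bot():
        with limiter:
            time.sleep(work_seconds)  # stands in for DB reads/writes

    threads = [threading.Thread(target=bot) for _ in range(n_bots)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


limiter = ConnectionLimiter(max_connections=3)
run_bots(limiter, n_bots=12)
print(limiter.peak <= 3)  # True
```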

It is very important to bear in mind that everything in (including the search functionality and the custom URLs for browsing the ranking) uses the same resources. So, consuming too many resources might also translate into a slower site or a difficult (even impossible) domain-ranking search. Logically, things will keep getting worse, since the bigger the databases, the more resources are required to perform the same actions.

Anyway, I am really happy with how things are going and with how the server (+ hosting provider) has behaved so far; I am just looking forward to getting a big and reliable enough domain ranking.
