Custom Solvers 2.0
68 followers -
Custom Software Development & Numerical Modelling

Custom Solvers's posts

GOOGLE-RELATED (NEW) WEIRDNESS

I have just noticed a new, difficult-to-justify behaviour involving a Google product and the varocarbas.com domain ranking (read my previous posts about the apparently arbitrary visibility restrictions on this account, which curiously show up when I talk about that ranking).

In the attached picture you can see what Google's Chrome shows when visiting a search-related page of that domain ranking. Apparently, it thinks that the page is written in Portuguese (?!) and offers the option of translating it into English (the default language I have in that browser). Bear in mind the following:
- There isn't a single Portuguese word in any of those pages (other than, possibly, the names of some of the listed domains). Most of the varocarbas.com content is written in English; it also includes some parts in Spanish, but not in the domain-ranking pages. I cannot speak Portuguese (and have never tried to learn it or written a single word in that language). I have never worked with Portuguese-speaking people. As a curious note, my name (Alvaro) is Spanish, but might also be Portuguese (or Italian or...).
- This weird warning disappears when navigating to any other varocarbas.com page which isn't part of the HTML versions of the domain ranking search functionality (i.e., any URL not including https://varocarbas.com/domain_ranking_search/keyword_to_search or https://varocarbas.com/domain_ranking_search/raw/keyword_to_search). You can find more information about how to maximise the URL-friendliness of this search functionality in https://varocarbas.com/url_friendly/.

Just in case anyone has even a single doubt: I am not a blind supporter/critic of any company, site, product, etc. In most cases, I am not even a concerned consumer keen on letting everyone know about corporate bullshit. On the other hand, I am trying to differentiate myself (as a programmer, as a company, even as a person) from the every-day-more-common crap which shares my main communication channel (= internet) with me. Not to mention the ignorant, incompetent (or beyond: idiots who are extremely clueless but unaware of that fact), unfair, invasive and dishonest individuals who seriously think that they have even a chance when someone like me is around.

Note that I am not sure whether these "issues" between Google products and http://varocarbas.com (domain ranking) are related at all (nor the other curious "phenomena" which I have been seeing lately; curiously, always affecting http://varocarbas.com and never customsolvers.com, even though both sites are part of exactly the same reality: me at work). In fact, I don't even care (beyond the few minutes which I have spent writing this post), because this is what G+ has become to me: a part of my online self-promotion where I share small bits of my personality/knowledge, but quite a secondary one. It is the place where I store curious, weird, secondary, not-too-logical events, Google-related weirdness included.

Lastly, I want to remind you that, for the time being, all the domain-ranking related updates will be included in the corresponding http://researchgate.net page (https://www.researchgate.net/project/Web-Domain-Ranking).


ABOUT MY ONLINE ACTIVITY

All my (= Alvaro Carballo Garcia = Custom Solvers 2.0 = varocarbas) online activity is meant to provide a comprehensive enough picture of myself, mainly focused on professional (programming/engineering) aspects and aimed at potential clients. So, basically everything I do online as Custom Solvers 2.0 or varocarbas is work (the hardest type of work: the unpaid kind which I don't like), whose outputs aren't immediate and whose point I cannot even defend other than as the best way to escape from a nonsensical situation.

I do understand that all this sounds a bit too cold and might even give an idea about my personality which isn't true and is even bad for my own interests, but I cannot do anything else. My ideas regarding what I want and how to get it are very clear; and so is the way to accomplish such a goal: being patient and focusing on my strongest points.

I usually deal with all the slightly-more-personal stuff in my Slashdot account (CustomSolvers2). It is still pretty work-related and still pretty self-promotional, but this is what the internet is for me: a channel to eventually find the clients I want. Actually, I am a very intimacy-prone person who doesn't enjoy group, generic, unpersonalised activities of any kind; and all my past concessions on this front have proven to be tremendously bad ideas.

In summary, if you have an appealing-to-me project, please let me know (via my sites). If you are looking for technical/programming/engineering stuff, always from an objective and honest perspective, you might like quite a few of my online outputs, perhaps even some of the non-technical stuff; but, please, bear in mind my motivation and conditions. If you are just looking for any kind of social interaction, you had better not forget that my current tolerance for empty, generic, meaningless, etc. conventions or for everything associated with them (e.g., misinterpretations, prejudices, group-thinking, etc.) is zero.

ABOUT THE DIFFICULTIES OF MOVING THE BOTS

This is just to clarify that the problems I mentioned with moving the bots from the http://varocarbas.com shared server to a local computer consist almost exclusively of re-tuning their operating conditions (firstly, just to match the performance achieved so far; once there, to make the most of the better hardware). This is a rather long and tiring process.

Bear in mind that by "the bots" I am referring to a complex reality involving a large number of interdependent applications (40 to 70; although this number might be notably higher in the short term) performing not precisely simple actions, which are very difficult to track. For example, in its current over-optimised stage, a single bot performs the following actions:
- Analyses the robots.txt of the domain (=> network resources).
- Visits a certain number (=> parameter to be tuned) of pages of that site (=> more network resources).
- Looks for links in each page up to a certain point (=> another parameter to be tuned).
- All the actions up to this point involve numerous requests (reading/writing) to over 5 tables of the associated database (=> DB resources).
- In case of finding external links, it communicates with another 2-5 tables and performs different actions on them (=> more DB resources).
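
The first three steps of that list can be pictured with a minimal Python sketch. This is not the actual bot code (which isn't shown in this post): the names is_allowed, split_links and max_links, as well as the "example-bot" user agent, are purely illustrative assumptions, and the database steps are left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href=...> tags, up to max_links."""

    def __init__(self, base_url, max_links):
        super().__init__()
        self.base_url = base_url
        self.max_links = max_links  # hypothetical tunable parameter
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or len(self.links) >= self.max_links:
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def is_allowed(robots_txt, url, agent="example-bot"):
    """Checks a URL against the text of an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def split_links(html, page_url, max_links=50):
    """Returns (internal, external) http(s) links found in one page."""
    collector = LinkCollector(page_url, max_links)
    collector.feed(html)
    domain = urlparse(page_url).netloc
    internal, external = [], []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        (internal if parts.netloc == domain else external).append(link)
    return internal, external
```

In a sketch like this, the internal links would feed the "pages to visit" step and the external ones would trigger the extra database work; both limits (pages per site, links per page) are the kind of parameters which need re-tuning on new hardware.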

Just running the same bots with exactly the same configuration on different (even better) hardware might cause a serious drop in their performance. Parameters which work fine for a certain CPU/memory combination can be horrible for a different one.

The transition from a shared server to a full computer has been particularly difficult because all the previous tuning was based upon restrictions which no longer exist. Just re-taking full control of database + CPU + memory + bandwidth to adequately manage the behaviour of the bots hasn't been straightforward at all (I am still struggling with it). Not to mention finding the new ideal bot parameters (e.g., number of sites, number of links, time per execution, etc.) under the new conditions or tuning up the database (MySQL). On top of all that, I have been working on some backup sub-systems, although this would have been done anyway even if the bots had stayed on the shared server.
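
One generic way to picture this kind of re-tuning is a plain search over parameter combinations, re-run on each new machine. The Python sketch below is only an illustration under that assumption, not the procedure actually followed here: pick_best, the parameter names and the measure callback (which would run the bots with the given parameters and return some throughput figure) are all hypothetical.

```python
import itertools


def pick_best(grid, measure):
    """Tries every combination in grid and returns (params, rate) for the best one.

    grid:    dict mapping a parameter name to the list of values to try.
    measure: callable taking a params dict and returning a throughput figure
             (higher is better), e.g. pages processed per second.
    """
    keys = sorted(grid)
    best = None
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        rate = measure(params)
        if best is None or rate > best[1]:
            best = (params, rate)
    return best


# Hypothetical usage: the values below are placeholders, not real settings.
grid = {
    "sites_per_run": [10, 20, 40],
    "links_per_page": [25, 50, 100],
}
```

Because the measured rate depends on CPU, memory, database and bandwidth together, the best combination found on one machine says little about the next one, which is why the whole search has to be repeated after each hardware change.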

The bots are currently running from a computer which I will most likely change next week. That is, I will have to re-tune everything next week, although the required effort will be much lower then.

DOMAIN RANKING 1M & SOME CLARIFICATIONS (ATTEMPT 3)

The varocarbas.com web-domain ranking has already reached the 1M records threshold despite the problems associated with moving to a local setup (as explained in my last post).

This seems a good moment to highlight the perhaps-not-too-evident limitations associated with the current "everything is connected" approach (i.e., starting the analysis from a single domain). The bots need some time not just to account for a higher number of domains, but also to improve the reliability of the ranking. The crawling process started from a US domain and that's why US/English-speaking/western domains have a starting advantage. Not to mention the numerous problems associated with the first stages of a reasonably complex system, which cause significant amounts of information to be regularly ignored. That's why this is an iterative (or multi-round if you wish) approach which needs time to become reliable enough.

I am including a clear 1-10 assessment (dependability, whose current value is 2) to help visitors get a quick idea about the status at any given moment. Also bear in mind that the lower positions are always the least reliable ones. Focusing just on the top 25%-50% of positions at each point seems the most reasonable approach.
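
As a tiny illustration of that last recommendation, and assuming the ranking is available as a list already sorted from best to worst position (the function name and the sample data are made up), keeping only the top fraction could look like this in Python:

```python
def top_fraction(ranked_domains, fraction=0.25):
    """Returns the first `fraction` of an already-ranked list (best first)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not ranked_domains:
        return []
    cutoff = max(1, int(len(ranked_domains) * fraction))
    return ranked_domains[:cutoff]


# Made-up sample: 8 ranked entries, keep the top 25% and the top 50%.
sample = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
top_quarter = top_fraction(sample, 0.25)
top_half = top_fraction(sample, 0.5)
```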

Note that I have also fixed some issues in the search functionality of which I wasn't aware. Remember that I am the only person dealing with everything (Custom Solvers 2.0 and varocarbas public/private activity). Sometimes, it might become a bit too hard, mainly when dealing with unexpected complications like having to suddenly move all the bots to a local setup (still working on this; hopefully, everything will be over by next week). So, if you find any problem on either of my two sites, please let me know because I am certainly not aware of it (fixing any problem right away is precisely what characterises my work).

Lastly, I want to highlight that all this unplanned (not-getting-a-penny/not-too-rewarding) over-work on the domain-ranking front has caused delays in some of my pending public outputs. For example, the completion of the customsolvers.com contents (which will hopefully be finished within the next few days) or the release of the next FlexibleParser part, DateParser (originally planned for March 2017, a deadline which might not be met).


What is wrong with post visibility in G+ lately? I have to post several times to make sure that logged-out users see my post!! This time it was very curious: everyone could see it after I posted it, but suddenly the same URL returned an error?!

The attached picture shows the view when I am logged in (in Spanish, but it clearly says "public") on the left-hand side and the view when I am logged out on the right-hand side?!

DOMAIN RANKING BOTS GONE LOCAL

There has been an unplanned change in the http://varocarbas.com domain ranking, whose immediate consequence is that the bots will stop running on the http://varocarbas.com (shared) server.

I am currently preparing a local (= my office) setup to allow the bots to continue retrieving information. Visitors of http://varocarbas.com will still be able to browse through the databases as before. For the time being (at least, during the current week), the bots will be running slower; additionally, the http://varocarbas.com databases will be updated only once per day.

The reasons for this sudden change aren't too relevant and my opinion about my hosting provider (MDDHosting) is still very good. In any case, this fact does seem to confirm that running a system like this on a shared server (even after having over-optimised every detail) isn't possible, for various reasons.

The new local hardware conditions aren't clear yet, although they will certainly be quite restricted too. Note that the whole point of this attempt (and all the public activity of Custom Solvers 2.0 and varocarbas) is to prove my programming/building-complex-&-optimised-pieces-of-software skills to potential clients. That's why dealing with restricted hardware isn't precisely a negative aspect, but actually almost a requirement.

I will write a post explaining all the changes as soon as the new setup is ready, most likely by next week.