Peter Hoffmann
95 followers
Software Engineer, Python Developer

Absolute Limits to Scalability

How many machines should my Hadoop Cluster have? For storage, that is an easy calculation. But how many cores or CPUs do I need?

I do believe that the HDFS block size and the data size can be used to establish an upper limit: if I have a block size of 128 MB and a data set of, say, 100 GB to process, I will have 800 blocks' worth of data. Assuming each block is handled by one task on one core, at most 800 cores can be busy at once. With 10 cores per box, I cannot use more than 80 boxes to run my processing, and that is a hard upper limit.
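
The arithmetic above can be checked with a short Python sketch. It assumes binary units (1 GB = 1024 MB), one task per block, and one task per core; the function name `max_useful_nodes` is made up for illustration and is not part of Hadoop.

```python
def max_useful_nodes(data_bytes, block_bytes, cores_per_node):
    """Return (blocks, nodes): an upper bound on usable nodes.

    Assumes one task per HDFS block and one task per core,
    so no more cores than blocks can be kept busy.
    """
    # Integer ceiling division: a partial last block still needs a task.
    blocks = -(-data_bytes // block_bytes)
    nodes = -(-blocks // cores_per_node)
    return blocks, nodes


MB = 1024 ** 2
GB = 1024 ** 3

blocks, nodes = max_useful_nodes(100 * GB, 128 * MB, 10)
print(blocks, nodes)  # 800 80
```

Other side conditions (replication, data locality, memory per task) are not modeled here and could only lower the bound further.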

There may be other side conditions that limit my maximum number of nodes even further.

Is this correct reasoning, or am I heading in a completely wrong direction?

Looking for a #php #developer in #karlsruhe. If you know one, please contact me. #job

"""Dear BDFL,

I'm writing my talk for a local PyCon (it is on Saturday - and I'm late as ever), and one of the questions I'm trying to answer is why Python doesn't have real information hiding in the way of the C++ (or Ruby) ideas of public, protected and private.

Everything I've found online just mentions the state of what it is now, with leading underscore being considered private, and double underscore getting mangled with the classname, but these are still public (and we are consenting adults). Was there a particular driving force behind the access methods in Python? Or was it a collection of smaller things?

--Procrastinating in the Southern Hemisphere
"""

Dear Procrastinating,

There is actually some information hiding possible -- but only by writing C extensions. :-)

The main reason for making (nearly) everything discoverable was debugging: when debugging you often need to break through the abstractions (since bugs don't confine themselves to the nice abstractions you've created for your program :-) so I thought it would be handy to be able to see anything from the debugger. And since the debugger is written in Python itself (for flexibility and a number of other reasons) I figured the same would apply to other forms of programming -- after all, sometimes debugging doesn't imply using a debugger, it may just imply printing a certain value. Again, too much data hiding would make things more complicated here.

The other observation was that even in C++, there are usually ways around the data hiding (e.g. questionable casts). Which made me realize that apparently other languages could live just fine with less-than-perfect hiding, and that hiding was an advisory mechanism, not an enforcement mechanism. So Python could probably be just fine with even-less-than-perfect hiding. :-)

Good luck with your talk, and enjoy the event!
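
The conventions mentioned in the question (a leading underscore as "private by convention", a double underscore mangled with the class name) can be seen in a few lines of Python. This is only an illustrative sketch; the `Account` class and its attributes are invented for the example.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # public attribute
        self._balance = balance   # single underscore: private by convention only
        self.__pin = "1234"       # double underscore: stored as _Account__pin


acct = Account("alice", 100)

# Nothing is enforced: the "private" attribute is readable from outside.
print(acct._balance)

# The mangled name is still reachable if you spell it out.
print(acct._Account__pin)

# The unmangled name does not exist on the instance.
print(hasattr(acct, "__pin"))

# All three attributes are visible to introspection, e.g. from a debugger.
print(sorted(vars(acct)))
```

Name mangling mainly avoids accidental clashes between a class and its subclasses; as the reply says, it is advisory rather than an enforcement mechanism.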

Creating an RSS feed for a Google+ profile in Google Reader:

1. Find the person's G+ profile ID in the URL of their G+ home page.
2. In Google Reader: "Add a subscription"
3. Enter there: http://plusfeed.appspot.com/ followed by the profile ID number
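
The steps above can be sketched in Python. This assumes the profile ID is the long run of digits in the profile URL path; the helper name `plusfeed_url` and the example ID are placeholders, not a real profile.

```python
import re

PLUSFEED_BASE = "http://plusfeed.appspot.com/"


def plusfeed_url(profile_url):
    """Build the feed URL from a G+ profile URL.

    Assumption: the profile ID is the first run of 10 or more digits
    that directly follows a slash in the URL.
    """
    match = re.search(r"/(\d{10,})", profile_url)
    if match is None:
        raise ValueError("no numeric profile ID found in URL")
    return PLUSFEED_BASE + match.group(1)


# Placeholder ID, for illustration only.
print(plusfeed_url("https://plus.google.com/123456789012345678901/posts"))
```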

Nice!
Fellow Identi.ca users: I created a utility to display a dent as an image. If you want to reference a dent, compose the URL by appending the notice number to http://dentimage.com/ and adding ".png" to the end of it. Use the "Add link" button in G+. Here is +Evan Prodromou's first post on Identi.ca. Here is the URL I used: http://dentimage.com/1.png. Let me know if you have any thoughts or suggestions.