Profile

Tom Klimovski
Works at EPAM Systems
Attended RMIT
Lives in Melbourne
610 followers | 341,094 views

Stream

Tom Klimovski

Shared publicly  - 
 
Learn how to machine learn
Drive
How to Machine Learn
step 0: Decide on a project that will force you to put everything together, this was mine
step 1: Install Anaconda (2.7) and Learn Pandas: pandas cookbook, things I wish I knew about pandas early on, 10 min to pandas
step 2: Get Comfortable with ML Theory & PyData https://w...

Tom Klimovski
owner

Learning/Webinars  - 
 
Hortonworks webinars - can be quite useful. Here's the next one coming up:

Create a Smarter Data Lake with HP Haven and Apache Hadoop
1:00 PM Eastern / 10:00 AM Pacific
The Smart Content Hub solution from HP and Hortonworks enables a shared content infrastructure that transparently synchronizes information with existing systems and offers an open standards-based platform for deep analysis and data monetization. Join this webinar and learn how you can:
1/ Leverage 100% of your data: text, images, audio, video, and many more data types can be automatically consumed and enriched using HP Haven and Hortonworks Data Platform.
2/ Democratize and enable multi-dimensional content analysis, and empower your analysts, business users, and data scientists to search and analyze Hadoop data with ease.
3/ Extend the enterprise data warehouse to synchronize and manage content from content management systems, and crack open the files in whatever format they happen to be in.
4/ Dramatically reduce complexity with an enterprise-ready SQL engine.

Tom Klimovski
owner

Hadoop  - 
 
Solid article, Learning How to Learn Hadoop:

Architecturally, Hadoop is just the combination of two technologies: the Hadoop Distributed File System (HDFS) that provides storage, and the MapReduce programming model, which provides processing[1] [2]. HDFS exists to split, distribute, and manage chunks of the overall data set, which could be a single file or a directory full of files. These chunks of data are pre-loaded onto the worker nodes, which later process them in the MapReduce phase. By having the data local at process time, HDFS saves all of the headache and inefficiency of shuffling data back and forth across the network.
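
A minimal sketch of that split / process / combine flow, in plain Python rather than Hadoop itself. The function names and the chunk size are made up for illustration; the chunks stand in for HDFS blocks and each `map_chunk` call stands in for a worker processing its local data. Note that the grouping step between map and reduce still moves intermediate results around, even when the input chunks are read locally.

```python
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Split the data set into fixed-size chunks (a loose stand-in for HDFS blocks)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map phase: emit (word, 1) pairs for one chunk, as a worker would for its local data."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the intermediate values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, chunk_size=2):
    """Run the whole toy pipeline: split, map each chunk, shuffle, reduce."""
    chunks = split_into_chunks(lines, chunk_size)
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["a b", "b c", "c c"]))
```

In a real cluster the chunks live on different machines and the map calls run in parallel; here everything runs in one process just to show the shape of the model.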

Tom Klimovski
owner

Hadoop  - 
 
 
Learn How Hadoop Fits in Modern Data Architecture at Strata http://bit.ly/1sgrkJL #hadoop
Learn How Hadoop Fits in Modern Data Architecture at Strata with our partners and discover how they integrate with Hortonworks Data Platform (HDP)

Tom Klimovski

Shared publicly  - 
Few technology leaders have seen the forces of digital disruption as repeatedly, or at such close quarters, as Nigel Dalton, CIO of the REA Group.

Tom Klimovski
owner

Hadoop  - 
 
Join us in 2015 – Mark your Calendar

Mark your calendar for April 15-16, 2015 for Hadoop Summit Europe in Brussels, Belgium or June 9-11, 2015 for Hadoop Summit North America in San Jose, CA. Or block the whole week and attend the pre-conference activities. Call for abstracts for Hadoop Summit Europe will open in the next few days.

Tom Klimovski
owner

Hadoop  - 
 
Why use Apache Spark?

Apache Hadoop has revolutionized big data processing, enabling users to store and process huge amounts of data at very low costs. MapReduce has proven to be an ideal platform to implement complex batch applications as diverse as sifting through system logs, running ETL, computing web indexes, and powering personal recommendation systems. However, its reliance on persistent storage to provide fault tolerance and its one-pass computation model make MapReduce a poor fit for low-latency applications and iterative computations, such as machine learning and graph algorithms.

Apache Spark addresses these limitations by generalizing the MapReduce computation model, while dramatically improving performance and ease of use.

Fast and Easy Big Data Processing with Spark

At its core, Spark provides a general programming model that enables developers to write applications by composing arbitrary operators, such as mappers, reducers, joins, group-bys, and filters. This composition makes it easy to express a wide array of computations, including iterative machine learning, streaming, complex queries, and batch.

In addition, Spark keeps track of the data that each of the operators produces, and enables applications to reliably store this data in memory. This is the key to Spark’s performance, as it allows applications to avoid costly disk accesses. As illustrated in the figure below, this feature enables:

Low-latency computations by caching the working dataset in memory and then performing computations at memory speeds, and
Efficient iterative algorithms by having subsequent iterations share data through memory, or repeatedly accessing the same dataset
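
A rough illustration of why in-memory caching helps iterative work, in plain Python. This is not the Spark API: `LazyDataset`, `slow_loader`, and `iterative_mean` are hypothetical names invented for this sketch, and the loader stands in for an expensive read from disk. The only point is that the uncached dataset is reloaded on every pass, while the cached one is loaded once and then served from memory.

```python
class LazyDataset:
    """Hypothetical stand-in for a distributed dataset; not a Spark class."""

    def __init__(self, loader):
        self._loader = loader      # function that produces the data (the "disk read")
        self._cached = None        # in-memory copy, filled on first use if caching is on
        self._use_cache = False
        self.load_count = 0        # how many times the loader actually ran

    def cache(self):
        """Ask for the data to be kept in memory after the first load."""
        self._use_cache = True
        return self

    def collect(self):
        """Return the data, reloading it unless a cached copy is available."""
        if self._use_cache and self._cached is not None:
            return self._cached
        self.load_count += 1
        data = list(self._loader())
        if self._use_cache:
            self._cached = data
        return data


def iterative_mean(dataset, iterations):
    """Toy iterative algorithm: move an estimate halfway toward the mean on each pass."""
    estimate = 0.0
    for _ in range(iterations):
        values = dataset.collect()   # every pass touches the same dataset
        mean = sum(values) / len(values)
        estimate += 0.5 * (mean - estimate)
    return estimate


def slow_loader():
    """Pretend this is an expensive read from persistent storage."""
    return [1.0, 2.0, 3.0, 4.0]


uncached = LazyDataset(slow_loader)
cached = LazyDataset(slow_loader).cache()

uncached_result = iterative_mean(uncached, 5)
cached_result = iterative_mean(cached, 5)

print("loads without cache:", uncached.load_count)
print("loads with cache:", cached.load_count)
```

Both runs compute the same estimate; only the number of loads differs, which is the cost the post says Spark avoids by keeping the working dataset in memory.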

Tom Klimovski
owner

Hadoop  - 
 
How-to: Process Time-Series Data Using Apache Crunch: by Jeremy Beard, May

Did you know that using the Crunch API is a powerful option for doing time-series analysis? Apache Crunch is a Java library for building data pipelines on top of Apache Hadoop. (The Crunch project was originally founded by Cloudera data scientist Josh Wills.) Developers can spend more time focused on their use case by using the Crunch API to handle common tasks such as joining data sets and chaining jobs together in a pipeline. At Cloudera, we are so enthusiastic about Crunch that we have included it in CDH 5! (You can get started with Apache Crunch here and here.)

Furthermore, Crunch is a really good option for transforming and analyzing time-series data. In this post, I will provide a simple example for bootstrapping with Crunch for that use case.

Tom Klimovski
owner

Hadoop  - 
 
In the previous article we described how to collect WiFi router logs with Flume to store in HDFS. This article will describe how we did the transformation, parsing, filtering and finally loading into Hive’s data warehouse.

Let’s start by looking at the raw data sample on HDFS.


Jean-Pierre König
Head of Big Data & Analytics - YMC AG

Tom Klimovski

Shared publicly  - 
 
Cello

Tom Klimovski
owner

Hadoop  - 
 
 
Discover HDP 2.1: Apache Solr for Hadoop Search w/ @jcsears @Rohit2b @paulcodding http://bit.ly/1vtcRJG #hadoop
People
Have him in circles
610 people
firefly xu's profile photo
Sumeru Chatterjee's profile photo
Andrew Kurinnyi's profile photo
Benjamin Hogg's profile photo
Simon Colmer's profile photo
Jason Novinger's profile photo
Michael Chaudhary's profile photo
Sam Lin's profile photo
Baron King's profile photo
Communities
9 communities
Places
Map of the places this user has lived
Currently
Melbourne
Previously
Toronto
Work
Occupation
I.T. Consulting
Employment
  • EPAM Systems
    Consultant, 2012 - present
  • ThoughtCorp
    Consultant, 2012
  • C3
    I.T. Consulting, 2012
  • Accenture
  • Agilent
Education
  • RMIT
Basic Information
Gender
Male
Birthday
June 23
Relationship
Married
Tom Klimovski's +1's are the things they like, agree with, or want to recommend.
Bumblebee - Ubuntu Wiki
wiki.ubuntu.com

Bumblebee Project. Bumblebee aims to provide support for NVIDIA Optimus laptops for GNU/Linux distributions. Using Bumblebee, you can use yo

Home
www.proofhub.com

ProofHub is an easy to use online project management software for effective planning and online collaboration over projects for their faster

Rubular: a Ruby regular expression editor and tester
www.rubular.com

Rubular is a Ruby-based regular expression editor and tester. It's a handy way to test regular expressions as you write them. Rubular is an

Oscilloscope Watch
www.kickstarter.com

The watch for the electronic geek. All the features of a watch combined with an oscilloscope and a waveform generator.

Minimal Sets
www.minimal-sets.com

Dj promotion sets, Electronic videos

An Absolute Beginner's Guide to Node.js
blog.modulus.io

Tutorial that covers what Node.js is and how to use it to build your first application.

face to gif
hdragomir.github.io

What is this? face to gif is a simple webapp that lets you record yourself and gives you an infinitely looping animated gif; What is the out

The UI Grail – jQuery UI Sliders on a Crosstab « Nic Bertino.
nicbertino.com

You'll have to stay with me and really be a sport until I can get a virtual machine going at home with a COGNOS training install on it. Unti

Summary — Shark 3.0a documentation
image.diku.dk

SHARK is a fast, modular, feature-rich open-source C++ machine learning library. It provides methods for linear and nonlinear optimization,

Google+
market.android.com

Real-life sharing rethought for the web, wherever you are. Google+ for mobile makes sharing the right things with the right people a lot sim

Send from Gmail (by Google) - Chrome Web Store
chrome.google.com

Makes Gmail your default email application and provides a button to compose a Gmail message to quickly share a link via email

Plex - A Complete Media Solution
www.plexapp.com

Plex on Your Desktop. Experience your media on a visually stunning, easy to use interface on your computer or Home Theater PC. Your media ha

Hacker News
news.ycombinator.com

Hacker News new | comments | ask | jobs | submit, login. 1. Remembering a relationship, one chat at a time (good.is). 101 points by danso 1

The pioneer
www.googleartproject.com

All; acrylic; bronze; ceramic; chalk; clay; cloth; copper; etching; glass; gold; ink; ivory; limestone; lithograph; marble; oil painting; pa

SpaceX
plus.google.com

SpaceX designs, manufactures and launches the world’s most advanced rockets and spacecraft.

Liquibase | Database Refactoring | home
www.liquibase.org

sql-database.jpg You never develop code without version control, why do you develop your database without it? Liquibase is an open source (A

Model X | Tesla Motors
www.teslamotors.com

Model X: Utility Meets Performance.

love the beer list
Food: Good, Decor: Very Good, Service: Very Good
Public - 2 years ago
reviewed 2 years ago
Public - 4 years ago
reviewed 4 years ago
Work is awesome
Public - 4 years ago
reviewed 4 years ago
Michelle is there providing top-notch service. Duck always goes down especially well with the 1lt Asahi cans
Public - 4 years ago
reviewed 4 years ago
9 reviews
I'm yet to find a better steak-house
Public - 4 years ago
reviewed 4 years ago