Xinh Huynh
87 followers
Software developer, big data, data science. San Jose, CA.

Xinh's posts

Overview of Spark 2.0 Dataset / DataFrame API, Part 2
Introduction: In Part 1 of this series, we examined type-safe operations with Datasets. In Part 2, we will cover untyped operations with DataFrames. Being untyped, DataFrames are well-suited for data exploration, ad-hoc queries, and data munging. DataFrame D...

Overview of Spark 2.0 Dataset / DataFrame API, Part 1
Introduction: Spark 2.0 features a new Dataset API. Now that Datasets support a full range of operations, you can avoid working with low-level RDDs in most cases. In 2.0, DataFrames no longer exist as a separate class; instead, DataFrame is defined as a spe...

Overview of Spark DataFrame API
Introduction: Spark DataFrames were introduced in early 2015, in Spark 1.3. Since then, a lot of new functionality has been added in Spark 1.4, 1.5, and 1.6. More than a year later, Spark's DataFrame API provides a rich set of operations for data munging, SQ...

Reading JSON Nested Array in Spark DataFrames
In a previous post on JSON data, I showed how to read nested JSON arrays with Spark DataFrames. Now that I am more familiar with the API, I can describe an easier way to access such data, using the explode() function. All of the example code is in Scala, on...

Spark Window Functions for DataFrames and SQL
Introduced in Spark 1.4, Spark window functions improved the expressiveness of Spark DataFrames and Spark SQL. With window functions, you can easily calculate a moving average or cumulative sum, or reference a value in a previous row of a table. Window func...