This training workshop provides an introduction to developing and running applications on Hadoop and Spark clusters, along with practical guidance. The course is offered to in-person attendees on UT’s main campus in POB room 2.402 and to remote attendees via webcast. Due to support considerations, access to Wrangler for hands-on exercises will be restricted to confirmed in-person attendees.
The workshop consists of four sessions. The first session introduces the MapReduce programming model and shows how to develop Java applications with the MapReduce library for use on a Hadoop cluster. The second session demonstrates how to run Hadoop applications, how to use the Hadoop Streaming interface to work with other programming languages, and how to use other Hadoop-based libraries. The third session introduces Spark application development with Java. The fourth session demonstrates different ways to use Spark clusters, including running Spark applications with spark-shell and other packages.
This training will primarily use the Java programming language. Participants are advised to have prior knowledge of, and experience with, Java application development. During the training sessions, in-person participants will have opportunities to practice with prepared exercises and examples on the Wrangler cluster. In-person attendees who would like to participate in class exercises should also have basic knowledge of working with the Wrangler cluster, or should review our previous training materials on this topic before the workshop starts: https://portal.tacc.utexas.edu/training#/session/18