Hey all,
If you want to keep up with all developments, please join the Slack channel, as I won't have time to keep copying info from there over to here.
Join us in the
channel. See you there!

For people just joining this community, this is the index of the posts where we discussed every lab. Hope you find those posts useful!

Getting back in after VM shutdown:
1) Restart the VM in VirtualBox (ignore its log-in prompt).
2) Go into the bigdata-bootcamp/vm folder.
3) Run vagrant up.
4) Once it is ready, run vagrant ssh.
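
The steps above can be wrapped in a small shell helper, if you'd rather not retype them each time. This is only a sketch: the function name resume_bootcamp_vm and the BOOTCAMP_DIR variable are made up for illustration, and the default location ($HOME/bigdata-bootcamp) is an assumption about where you cloned the repo. You still need to have the VM restarted in VirtualBox first.

```shell
# Sketch only: resume_bootcamp_vm and BOOTCAMP_DIR are hypothetical names,
# and $HOME/bigdata-bootcamp is an assumed clone location.
resume_bootcamp_vm() (
    # Per the steps above, vagrant is run from the vm folder.
    # The function body is a subshell, so your own working directory is untouched.
    cd "${BOOTCAMP_DIR:-$HOME/bigdata-bootcamp}/vm" || exit 1
    # Bring the VM up, and open an SSH session only if that succeeded.
    vagrant up && vagrant ssh
)
```

After sourcing this in your shell, running resume_bootcamp_vm (optionally with BOOTCAMP_DIR set to your own path) should drop you into the VM.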

Not exactly labs, but close

From +Romeo Cabrera:
Try to complete the labs that are available on the prof's website.
Beyond that, I'd say you need:
1) experience with ML
2) comfort with the Linux CLI
3) experience with data (transformations, joins, SQL, etc.)

Big Data Labs:
G+ Big Data Labs Prep community:
Big Data Bootcamp - Bigdata Bootcamp
GT CSE big data bootcamp
Gatech OMSCS CSE8803 Big Data for Health Prep
Spring 2017 (21kB)

This is a good Scala crash course:
This one was also a popular tutorial:
Scala Tutorial by Derek Banas (YouTube)

Tbh I think the best way to learn Scala is to get your hands dirty. Not sure watching 20 or 30 hours of a MOOC would be worth the time.

I learned it by hacking my way through the assignments (which were very similar to the labs) and googling when needed ("how outer join spark scala", "how get top rows spark scala", "how aggregate two fields spark scala")... you get the idea.

YMMV, of course :)

Hi all,
Someone recommended the Scala Cookbook, and I just found out we have free Safari access to it.
Go to
then scroll down, click on Safari Books Online, and search for Scala Cookbook.

Welcome to all students prepping for Fall 2017!

I'm about halfway through the week 1 material for Functional Programming Principles in Scala by École Polytechnique Fédérale de Lausanne on Coursera.

The class is in week 2 right now, but there's time to catch up!

My own plan is to learn Scala first and then do the Sun labs. Some of us may want to start with the Sun labs instead. Feel free to post info about your progress, questions, tips, etc., regardless of which you decide to do first.

The posts from Romeo Cabrera are from last fall's prep, so pay the dates on the lab schedule no mind.

Glad you're all here!

LAB 9: Spark MLlib

Discussion for the Spark MLlib lab: "Objectives: Understand input to MLlib. Learn to run basic classification algorithms. Learn to export/load trained models. Develop models using Python machine learning module."

LAB 8: Spark SQL

Discussion for the Spark SQL lab: "Learning Objectives: Load data into Spark SQL as DataFrame. Manipulate data with built-in functions. Define a User Defined Function (UDF)."

+Romeo Cabrera any idea if this class counts as an elective for the Machine Learning specialization? I'm asking because it does not show up as an elective in DegreeWorks.
