      2 Day Hadoop Ecosystem fundamentals October 5-6, 2013 in Sunnyvale

      October 5, 2013 - October 6, 2013

      Saturday   9:00 AM - Sunday 5:00 PM

      1085 E El Camino Real
      Sunnyvale, California 94087

      EVENT DETAILS

       

      Course Description

      DatumFora (an ExpoNential Inc. company) is offering this extensive weekend class on Hadoop platforms. This is a fast-paced, vendor-agnostic, technical overview of the Hadoop landscape. No prior knowledge of databases or programming is assumed. This survey course is aimed at both technical and non-technical people who want to understand the emerging world of Big Data, with a specific focus on Hadoop.

      Students will work with real Hadoop clusters and the latest Hadoop distributions. We will discuss vendor offerings for Hadoop including Cloudera, Hortonworks, and MapR. The lab work will be conducted on Cloudera-based deployments to facilitate hands-on experience.


      Duration

      October 5-6, 2013 (Sat-Sun)

      9am - 5pm (lunch will be provided)


      Location

      Domain Hotel

      1085 E El Camino Real  

      Sunnyvale, CA 94087

       

      Audience

      Engineers, Programmers, Networking specialists, Managers, Executives


      Ecosystem Components Covered

       HDFS, MapReduce, HBase, Hive, Sqoop


      Objectives

       

      - Introduce students to the core concepts of Hadoop

      - Deep dive into the critical architecture paths of HDFS, MapReduce and HBase

      - Teach the basics of how to effectively write Hive scripts

      - Explain how to choose the correct use cases for Hadoop

      - Give each student access to an individual 1-node Hadoop cluster in Rackspace to run through the hands-on labs

      - Provide links to the best books, blog posts and videos for students to learn more about Hadoop on their own

       

      Course Outline

      Day 1:    Big Data, HDFS and MapReduce Primer

       

                      Hadoop

      - Parallel Computing vs. Distributed Computing

      - Brief history of Hadoop

      - Scaling with Hadoop

      - RDBMS/SQL vs. Hadoop

      - Hadoop Daemons introduction: NameNode, DataNode, JobTracker, TaskTracker

      - Intro to the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper

      - Vendor Comparison - Hardware + Software recommendations for Hadoop

                      LAB #1: Hadoop Installation, Hadoop cluster specific operations and sample job execution

                     

                      HDFS 

      - Linux File system options

      - Sample HDFS commands

      - HDFS sample architecture at Yahoo!

      - Data Locality

      - Rack Awareness

      - Write Pipeline

      - Read Pipeline

      - NameNode architecture (EditLog, FsImage, location of replicas, safe mode)

      - Secondary NameNode architecture

      - DataNode architecture

      - Heartbeats

      - Block Scanner

      - Fsck Health Check + file breakdown

      - Balancer

                      LAB #2: Various HDFS specific operations

       

                      MapReduce 

      - MapReduce Architecture

      - JobTracker/TaskTracker

      - Combiner

      - Partitioner (shuffle)

      - Counters

      - Speculative Execution

      - Distributed Cache

      - Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)

                      LAB #3: Understanding MapReduce jobs through execution

       

      Day 2:  Hadoop Ecosystem

                      Real-time I/O with HBase 

      - HBase versions and origins

      - HBase architecture

      - HBase core concepts

      - HBase vs. RDBMS

      - HBase Master and Region Servers

      - Data Modeling

      - Column Families and Regions

      - HBase Internals: Bloom Filters and Block Indexes

      - Write Pipeline / Read Pipeline

      - Compactions

                      LAB #4: Exploring HBase commands

       

                      Hive

      - Hive philosophy and architecture

      - Hive vs. RDBMS

      - HiveQL and Hive Shell

      - Managing tables

      - Data types and schemas

      - Querying data

      - HiveODBC

                      LAB #5: Analyzing real-world data using Hive

       

                      Sqoop

      - Data processing through Sqoop

      - Sqoop connectivity model with RDBMS

      - Sqoop examples with real-time data applications

                      LAB #6: Using Sqoop with Excel and PowerPivot to Perform Data Analysis

       

                      Next-gen Hadoop  

      - HDFS improvements: HDFS Federation, NameNode HA, Snapshots

      - MapReduce improvements: YARN, Performance

      - HBase GeoRedundancy, DR, and Snapshots

      - Brief introduction to Mahout, Oozie, Pig, Avro, etc.


       

      Cancellation Policy

      Cancellations made more than 15 days before the class are entitled to an 85% refund. No refund will be issued for cancellations within 15 days of the class. We will, however, transfer the registration to a future class or to another person. If you have specific questions, please contact us at info@datumfora.com.



