Master Big Data with Hadoop Essentials
Prerequisites for the Course
- OOP Concepts (Polymorphism, Inheritance, Encapsulation, etc.)
- Java Basics such as Interfaces, Classes, and Abstract Classes
- Collections
- File I/O
- SQL
- Linux Basic Commands
Course Content:
Introduction
- An Overview of Hadoop
- Comparing with SQL Databases
- The Hadoop Distributed File System
- MapReduce Programming Model
- Hadoop Common Utilities
- Hadoop Ecosystem Components
- Hadoop Architecture
Components of Hadoop
- Hadoop Distributed File System
- MapReduce Programming Model
- Hadoop Common Utilities
The Hadoop Distributed File System (HDFS)
- HDFS Design & Concepts
- Blocks, Replication
- Hadoop dfs and dfsadmin Command-Line Interfaces
- Basic File System Operations
- Reading Data with the HDFS Java Client API
- Distributed Cache
- DistCp: Loading Data into HDFS in Parallel
MapReduce Program
- Building Blocks of MapReduce
- The MapReduce Program flow (MR Skeleton)
- Sample MapReduce Program
- MapReduce API Concepts
- The Mapper
- The Reducer
- The Combiner
- The Partitioner
- The Shuffle
- Hadoop Data Types
- Hadoop Serialization
- Hadoop Streaming API (for use with any programming language)
- Integrating Hadoop with R Language
- Some MapReduce Program Examples
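As a rough preview of the map, shuffle, and reduce stages listed above, the following is a minimal sketch in plain Java. It does not use the Hadoop API; the class name `WordCountSketch` and its methods are hypothetical and only imitate the program flow in memory on a single machine.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of the MapReduce flow, not Hadoop code.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, keeping keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped values for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each key.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(run(Arrays.asList("Hadoop stores data", "Hadoop processes data")));
    }
}
```

In a real Hadoop job, the framework performs the shuffle across the cluster, and the map and reduce logic live in Mapper and Reducer classes, which the course covers.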
Common MapReduce Algorithms
- Sorting and Searching
- Indexing
- Crawling
- Log Processing
- Machine Learning
- Data Aggregation
- Term Frequency–Inverse Document Frequency (TF-IDF)
- Word Co-Occurrence
- Predictive Analytics
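To make the term frequency and inverse document frequency item above concrete, here is a small single-machine sketch in plain Java. It assumes one common definition (tf = term count divided by document length, idf = ln(N / df)); other weightings exist, and the class name `TfIdfSketch` is hypothetical. A MapReduce version would spread these counts over several jobs.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: TF-IDF on an in-memory corpus, not a MapReduce job.
public class TfIdfSketch {

    // Term frequency: share of the document's terms that equal the given term.
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long hits = doc.stream().filter(term::equals).count();
        return (double) hits / doc.size();
    }

    // Inverse document frequency: ln(N / df), where df is the number of
    // documents containing the term (assumed definition; returns 0 if df is 0).
    static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(doc -> doc.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("hadoop", "hdfs"),
                Arrays.asList("hadoop", "pig"));
        // "hadoop" occurs in every document, so its idf (and score) is 0.0.
        System.out.println(tfIdf("hadoop", corpus.get(0), corpus));
        // "hdfs" occurs in only one of the two documents, so it scores higher.
        System.out.println(tfIdf("hdfs", corpus.get(0), corpus));
    }
}
```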
Programming Practices
- Develop MapReduce Programs
- Monitoring Clusters
- Performance Tuning
- Passing Job-Specific Parameters
- Partitioning into Multiple Output Files
- Using Distributed Cache
Apache Pig
- Introduction
- Grunt
- Pig Data Model
- Comparison with RDBMS
- Pig Functions
- Pig User Defined Functions (UDF)
- Joins
- Sorting
- Pig Latin
Apache Hive
- Introduction to Hive
- Hive in the Hadoop Ecosystem
- Hive Installation
- CLI Options
- Data Types and File Formats
- Text File Encoding
HiveQL – Queries
- Queries Using Clauses such as WHERE and GROUP BY
- Joins
- Casting
- Views
- Indexes
- Hive Schema Design
- Standard Functions and User Defined Functions
- Hive Thrift Service
HBase (NoSQL)
- Introduction to HBase
- Installation of HBase
- CRUD Operations in HBase with Examples
- REST API with HBase and Examples
- MapReduce Integration with HBase and Examples
Sqoop
- Introduction to Sqoop
- Import data from RDBMS to HDFS
- Export data from HDFS to RDBMS
Flume
- Introduction to Flume
- Flume Architecture
- Flume Configurations
- Flume Agent, Flume Collector
- Running Flume in Various Distributed Modes
- Examples
© 2024 Unisoft Technologies - Nagpur