Data Science Course at Unisoft Technologies, Dharampeth, Nagpur
What is Data Science?
Data Science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Why Learn Data Science?
Data Science is one of the most sought-after skills in the tech industry. With the amount of data generated every day, businesses need data scientists to analyze it and derive meaningful insights.
Why Learn Data Science from Unisoft?
Unisoft Technologies provides a comprehensive Data Science course that covers all the key concepts and skills needed for a successful career in Data Science. Our experienced faculty, state-of-the-art facilities, and strong industry connections make us the best choice for Data Science training in Nagpur.
Job Opportunities in Data Science
Data Science opens up opportunities in sectors such as healthcare, finance, retail, and e-commerce. As a data scientist, you can work in roles such as Data Analyst, Data Engineer, Machine Learning Engineer, and more.
Prerequisite:
Modules & Certifications:
- Statistical Analysis with R
- Programming with Python
- Fundamentals of Linux
- Programming Java with IoT Developer
- Cloud Computing & HPC Applications
- Data Collection & SQL
- Big Data & Hadoop Technologies
- Data Visualization
- Advanced Analytics
- Practical Machine Learning
Certifications:
- SQL 12c (1Z0-071)
- RHEL 7 (RHCSA)
- Python (PCAP)
- CCP Data Engineer
Job placement assured after certification.
Statistical Analysis with R
Course Content:
Probability & Statistics
- Introduction to Statistics – Descriptive Statistics, Summary Statistics, Basic Probability Theory
- Statistical Concepts (univariate and bivariate sampling, distributions, resampling, statistical inference, prediction error)
- Probability Distributions (continuous and discrete: Normal, Bernoulli, Binomial, Negative Binomial, Geometric and Poisson)
- Bayes’ Theorem
- Central Limit Theorem
- Data Exploration & Preparation: Concepts of Correlation
- Regression
- Covariance
- Outliers etc.
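To make the Bayes' Theorem topic above concrete, here is a minimal sketch of a worked calculation. The scenario (a screening test) and all three input probabilities are made-up illustrative values, and the function name bayes_posterior is hypothetical.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    prior               -- P(H), probability of the hypothesis before seeing evidence
    likelihood          -- P(E | H), probability of the evidence if H is true
    false_positive_rate -- P(E | not H), probability of the evidence if H is false
    """
    # Total probability of the evidence, P(E), by the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Illustrative numbers only: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small, which is the usual point of this exercise.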
R Programming
- Introduction & Installation of R
- R Basics
- Finding Help
- Code Editors for R
- Command Packages
- Manipulating and Processing Data in R
- Reading and Getting Data into R
- Exporting Data from R
- Data Objects- Data Types & Data Structure
- Viewing Named Objects
- Structure of Data Items
- Manipulating and Processing Data in R ( Creating, Accessing, Sorting Data Frames, Extracting, Combining, Merging, Reshaping Data Frames)
- Control Structures, Functions in R (numeric, character, statistical)
- Working with objects
- Viewing Objects within Objects
- Constructing Data Objects
- Building R Packages
- Running and Manipulating Packages
- Hypothesis Tests (Parametric and Non-Parametric) – ANOVA
- Chi-Square Test
- t-Test
- U-Test
- Introduction to Graphical Analysis
- Using Plots (Box Plots, Scatter Plots, Pie Charts, Bar Charts, Line Charts)
- Plotting variables
- Designing Special Plots
- Simple Linear Regression
- Multiple Regression
PYTHON
What is Python?
Python is a clear and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, or Java.
Some of Python's notable features:
- Uses an elegant syntax, making the programs you write easier to read.
- Is an easy-to-use language that makes it simple to get your program working. This makes Python ideal for prototype development and other ad-hoc programming tasks, without compromising maintainability.
- Comes with a large standard library that supports many common programming tasks such as connecting to web servers, searching text with regular expressions, reading and modifying files.
- Python's interactive mode makes it easy to test short snippets of code. There's also a bundled development environment called IDLE.
- Is easily extended by adding new modules implemented in a compiled language such as C or C++.
- Can also be embedded into an application to provide a programmable interface.
- Runs anywhere, including Mac OS X, Windows, Linux, and Unix.
- Is free software in two senses. It doesn't cost anything to download or use Python, or to include it in your application. Python can also be freely modified and re-distributed, because while the language is copyrighted it's available under an open source license.
Some programming-language features of Python are:
- A variety of basic data types are available: numbers (floating point, complex, and unlimited-length long integers), strings (both ASCII and Unicode), lists, and dictionaries.
- Python supports object-oriented programming with classes and multiple inheritance.
- Code can be grouped into modules and packages.
- The language supports raising and catching exceptions, resulting in cleaner error handling.
- Data types are strongly and dynamically typed. Mixing incompatible types (e.g. attempting to add a string and a number) causes an exception to be raised, so errors are caught sooner.
- Python contains advanced programming features such as generators and list comprehensions.
- Python's automatic memory management frees you from having to manually allocate and free memory in your code.
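A few of the language features listed above can be seen in a short sketch. The names used here (countdown, squares, value) are chosen only for illustration.

```python
# Strong typing: adding a string and a number raises an exception
# instead of silently converting one of them.
try:
    result = "3" + 4
except TypeError as exc:
    result = f"TypeError: {exc}"

# Dynamic typing: the same name can be rebound to a value of another type.
value = 10
value = "ten"

# Integers are not limited to a fixed width.
big = 2 ** 100

# List comprehension: build a list from an expression and a loop in one line.
squares = [n * n for n in range(5)]


# Generator: values are produced lazily, one per iteration.
def countdown(start):
    while start > 0:
        yield start
        start -= 1


print(result)
print(squares)
print(list(countdown(3)))
```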
See the SimplePrograms collection of short programs, gradually increasing in length, which show off Python's syntax and readability.
Course Content:
1. Python Introduction [6 hours]
- Getting started [1 hour theory/practical]
- Keywords and Identifier [1 hour theory/practical]
- Statements & Comments [1 hour theory/practical]
- Python Data types [1 hour theory/practical]
- Python I/O and Import [1 hour theory/practical]
- Python Operators [1 hour theory/practical]
- Practical
2. Python Flow Control [5 hours]
- Python if...else [1 hour theory/practical]
- Python for Loop [1 hour theory/practical]
- Python while Loop [1 hour theory/practical]
- Python break and continue [1 hour theory/practical]
- Python Pass [1 hour theory/practical]
- Practical
3. Python Functions [6 hours]
- Python Function [1 hour theory/practical]
- Function Argument [1 hour theory/practical]
- Python Recursion [1 hour theory/practical]
- Anonymous Function [1 hour theory/practical]
- Python Modules [1 hour theory/practical]
- Python Package [1 hour theory/practical]
- Practical
4. Python Native Data types [6 hours]
- Python Numbers [1 hour theory/practical]
- Python List [1 hour theory/practical]
- Python Tuple [1 hour theory/practical]
- Python String [1 hour theory/practical]
- Python Set [1 hour theory/practical]
- Python Dictionary [1 hour theory/practical]
- Practical
5. Python Files [5 hours]
- Python File Operation [1 hour theory/practical]
- Python Directory [1 hour theory/practical]
- Python Exception [1 hour theory/practical]
- Exception handling [1 hour theory/practical]
- User-defined Exception [1 hour theory/practical]
- Practical
6. Python Object & Class [5 hours]
- Python Namespace [1 hour theory/practical]
- Python Class [1 hour theory/practical]
- Python Inheritance [1 hour theory/practical]
- Multiple Inheritance [1 hour theory/practical]
- Operator overloading [1 hour theory/practical]
- Practical
7. Advanced Topics [6 hours]
- Python Iterator [1 hour theory/practical]
- Python Generator [1 hour theory/practical]
- Python Closure [1 hour theory/practical]
- Python Decorators [1 hour theory/practical]
- Python Property [1 hour theory/practical]
- Python Examples [1 hour theory/practical]
- Practical
8. Python Programming Examples
- Introduction
- Python Program to Print Hello world!
- Python Program to Add Two Numbers.
- Python Program to Find the Square Root.
- Python Program to Calculate the Area of a Triangle.
- Python Program to Solve Quadratic Equation.
- Python Program to Swap Two Variables.
- Python Program to Generate a Random Number.
- Python Program to Convert Kilometers to Miles.
- Python Program to Convert Celsius To Fahrenheit.
- Decision Making and Loops
- Python Program to Check if a Number is Positive, Negative or Zero.
- Python Program to Check if a Number is Odd or Even.
- Python Program to Check Leap Year.
- Python Program to Find the Largest Among Three Numbers.
- Python Program to Check Prime Number.
- Python Program to Print all Prime Numbers in an Interval.
- Python Program to Find the Factorial of a Number.
- Python Program to Display the multiplication Table.
- Python Program to Print the Fibonacci sequence.
- Python Program to Check Armstrong Number.
- Python Program to Find Armstrong Number in an Interval.
- Python Program to Find the Sum of Natural Numbers.
- Function
- Python Program To Display Powers of 2 Using Anonymous Function.
- Python Program to Find Numbers Divisible by Another Number.
- Python Program to Convert Decimal to Binary, Octal and Hexadecimal.
- Python Program to Find ASCII Value of Character.
- Python Program to Find HCF or GCD.
- Python Program to Find LCM.
- Python Program to Find Factors of Number.
- Python Program to Make a Simple Calculator.
- Python Program to Shuffle Deck of Cards.
- Python Program to Display Calendar.
- Python Program to Display Fibonacci Sequence Using Recursion.
- Python Program to Find Sum of Natural Numbers Using Recursion.
- Native Data type
- Python Program to Add Two Matrices.
- Python Program to Transpose a Matrix.
- Python Program to Multiply Two Matrices.
- Python Program to Check Whether a String is a Palindrome or Not.
- Python Program to Remove Punctuations From a String.
- Python Program to Sort Words in Alphabetic Order.
- Python Program to Illustrate Different Set Operations.
- Python Program to Count the Number of Each Vowel.
- File
- Python Program to Merge Mails.
- Python Program to Find the Size (Resolution) of an Image.
- Python Program to Find Hash of File.
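As a taste of the exercises in this module, here is one possible way to write two of the listed programs (the leap year check and the Armstrong number search). This is a sketch, not the course's reference solution, and the function names are chosen for illustration.

```python
def is_leap_year(year):
    """A year is a leap year if divisible by 4, except century years,
    which must also be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_armstrong(number):
    """An Armstrong number equals the sum of its digits, each raised
    to the power of the number of digits (e.g. 153 = 1^3 + 5^3 + 3^3)."""
    digits = str(number)
    power = len(digits)
    return number == sum(int(d) ** power for d in digits)


def armstrong_in_interval(low, high):
    """Return all Armstrong numbers in the closed interval [low, high]."""
    return [n for n in range(low, high + 1) if is_armstrong(n)]


print(is_leap_year(2024), is_leap_year(1900))  # prints True False
print(armstrong_in_interval(100, 999))         # prints [153, 370, 371, 407]
```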
9. Python MySQL Connectivity
- Getting started with MySQL python connector
- Python connecting to MySQL databases
- Python MySQL query
- Python MySQL insert data
- Python MySQL update data
- Python MySQL delete data
Fundamentals of LINUX
Administration I - RH124
Overview
Red Hat System Administration I (RH124) is an entry-level course for those who want to become full-time Red Hat Enterprise Linux system administrators. It introduces key command-line concepts and other enterprise-level tools, and uses lab-based knowledge checks and facilitated discussions to ensure maximum skill transfer and retention. After taking this course, students will be able to perform installation, establish network connectivity, manage physical storage, and perform basic security administration.
Audience
This course is designed for IT professionals who are new to Linux Administration and require core Red Hat Enterprise Linux skills to perform administration tasks that will be encountered in the workplace.
Course Content:
Installation of Red Hat
- Required Hardware For Installation
- Obtaining Red Hat Enterprise Linux
- Booting the Installer
- Configuring Language and Installation Source
- Creating Partitions
- Creating User
Accessing the Command Line
- Accessing command line using Local Console
- Accessing command line using the Desktop
- Introduction to Shell
- Executing Commands Using the Bash Shell
Basic Commands I
- Bash Concepts
- tty, whoami, who, w, users
- Linux user accounts: Super, System, Normal
- Each user has a UID and a primary group; each group has a GID
- id, groups, lsusb, lspci, cal (cal 9 1752), date, echo
Managing Files From Command Line
- The Linux File System Hierarchy
- Managing Files Using Command-Line Tools
- Finding command executable with which.
- Important directories like /bin/, /boot/, /dev/, /etc/, /home/, /lib/, /mnt/, /opt/, /proc/, /root/, /sbin/, /tmp/, /usr/, /var/
Getting Help from System
- Manual Pages : man
- Information Pages : info, pinfo
- Whatis Database : whatis, makewhatis, apropos
Basic Commands II
- Creating Directories : mkdir
- Copying Files and dirs : cp
- Moving Files and dirs : mv
- Removing Files and dirs : rm
- Reading Files using cat, head, tail, less, more
- I/O redirectors and pipe
VIM Editor
- Editing Files in Command Line with vi-clone VIM
- Creating encrypted file with vim
Substituting User Identities
- su command to substitute user identity
- /etc/sudoers file and sudo (substitute user and do) command
- BASH login-shell and non-login shell
Users and Groups Administration
- Creating Users using useradd command and setting their password with passwd command
- Important files that store user account information: /etc/passwd, /etc/shadow
- Modifying User Properties using usermod command
- Password age using chage command
- Deleting Users using userdel command
- Creating Groups using groupadd command and assigning passwords to them with gpasswd command
- Important files that store group information: /etc/group, /etc/gshadow
- Modifying Group Properties using groupmod command
- Deleting Groups using groupdel command
- Creating new user and setting its password without using useradd and passwd command
Permissions I
- ls -l output explained
- Linux File Types (files, directories, device files [character and block], sockets, pipe, links)
- Default permissions on a file and directory (concept of UMASK).
- Finding files in Linux file system using find command
- Changing permissions (alphabetical and numerical) using chmod
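The umask concept and numeric permissions listed above come down to a little bit arithmetic. This is a minimal sketch, assuming the common base modes of 0666 for new files and 0777 for new directories and a umask of 022; the helper name default_mode is made up for illustration.

```python
import stat


def default_mode(base, umask):
    """Permissions a new object receives: the base mode with the umask bits cleared."""
    return base & ~umask


umask = 0o022                          # assumed umask, a common default
file_mode = default_mode(0o666, umask)  # new regular file
dir_mode = default_mode(0o777, umask)   # new directory

# stat.filemode renders a mode the way the first column of `ls -l` does.
print(oct(file_mode), stat.filemode(stat.S_IFREG | file_mode))  # prints 0o644 -rw-r--r--
print(oct(dir_mode), stat.filemode(stat.S_IFDIR | dir_mode))    # prints 0o755 drwxr-xr-x
```

The numeric form accepted by chmod (for example 644) is the same octal value shown here.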
Permissions II
- Finding command executable using which command
- Special permissions: SUID, SGID, Sticky Bit
Networking
- Introduction to networking and IP addressing
- Setting up an IP address to the machine using nmcli and nmtui utility
- Setting up an IP address to the machine directly in file
- Introduction to Gateway, Name resolution (local and DNS)
- Commands: ifconfig, ip, route, traceroute, dig, host, nslookup, hostnamectl, nmcli, nmtui
Controlling Services & Daemons
- Introduction to Booting Process
- Identifying Automatically Started System Processes
- Controlling System Services
- Using systemctl to Manage Services
Package Management
- Installing Packages using RPM.
- Configuring YUM client repositories.
- Configuring YUM Server
- Configuration of EPEL, REPOFORGE, RPMFUSION repositories
Remote Shell Access
- Secure Remote shell access using ssh
- Passwordless SSH Authentication with SSH Keys
- Securely Copy files to and from remote machines using scp.
Backup and Restore
- Archive and Compress files using tar.
- Remote Incremental backup using rsync.
Analyze and Store Logs
- Locate and accurately interpret relevant system log files for troubleshooting purposes.
Accessing Linux File Systems
- Identify Filesystem and Devices
- Mounting and Unmounting File Systems
- Making Links Between Files
- Locating Files using find and locate command
NTP server
- Configuration of NTP server
- Configuration of NTP client
Programming JAVA with IoT Developer
Course Content:
Cloud Computing & HPC Applications
Course Content:
Introduction to Cloud Computing
- Definition
- Characteristics
- Components
- Cloud provider
- SaaS, PaaS, IaaS and other Organizational scenarios of clouds
- Administering & Monitoring cloud services
- Benefits and Limitations
- Deploy application over cloud
- Comparison among SaaS, PaaS, IaaS
- Cloud Computing Platforms
- Infrastructure as a Service: Amazon EC2
- Platform as a Service: Google App Engine
- Microsoft Azure
- Utility Computing
- Elastic Computing
- SLA
- Clusters
- Cloud Analytics
- Challenges of Cloud Environment
- HPC in the cloud
Parallel Processing Concepts
- Physical Organization and building blocks of High Performance Computing Systems
- Processors and Multi-Core Architectures
- Vector Processing
- Super-Scalar
- In-order execution
- Instruction-Level Parallelism etc.
- FMA, 32 and 64 bit types
- ISA
- Accelerators such as GPGPUs and Xeon Phi.
- Threads and Processes
- Multi-Processing OS
- Parallel I/O
- General Concepts
Parallel Programming Models and Parallel Algorithms Design:
- Application domains of HPC
- Decomposition Techniques: Data parallelism
- Functional Parallelism
- Divide and Conquer etc.
- Characteristics of Tasks and Interactions
- Mapping Techniques for Load Balancing
- Methods for Containing Interaction Overheads
- Granularity of parallelism
- Programming OpenMP
Oracle Database SQL - Exam Code : 1Z0-071
Overview
Become an Oracle Database SQL Certified Associate and demonstrate understanding of the fundamental SQL concepts needed to undertake any database project. Passing the exam illustrates depth of knowledge of SQL and its use when working with the Oracle Database server. Gain a working knowledge of queries; INSERT, UPDATE and DELETE statements; some Data Definition Language and Data Control Language; the optimizer; tables and indexes; and data modeling and normalization. By passing this exam, a certified individual proves fluency in and a solid understanding of the SQL language, data modeling, and using SQL to create and manipulate tables in an Oracle Database.
Qualified candidates have knowledge of general computing concepts, knowledge of command line interfaces and experience working in command line.
Exam Details
Exam Title: Oracle Database SQL
Exam Number: 1Z0-071
Exam Price: ₹20,379 + 18% GST
Format: Multiple Choice
Duration: 120 minutes
Number of Questions: 63
Passing Score: 63%
Validated Against: Exam has been validated for product version Oracle Database 11.2.0.1.0 and up to 19c
Course Content:
Overview of various Oracle Versions and Their Differences
- Oracle Database Versions
- Oracle Database Version V2 to 8,8i,9i
- What is 'i' in Oracle Database 8i or Oracle Database 9i
- Oracle Database Version 10g to 11g
- What is 'g' in Oracle Database 10g or Oracle Database 11g
- Oracle Database Version 12c
- What is 'c' in Oracle Database 12c
Oracle Database Installation
- Oracle Database 12c installation on Windows.
- Oracle Database 12c installation on Linux is covered in Oracle Administration 12c Paper.
- Overview of Various Oracle Tools
- SQLPLUS (CLI)
- How to Open, Connect, Use, Disconnect, Choose Database to Connect, etc.
- SQL Developer (GUI)
- Overview of Using SQL Developer - What's New in the Latest Version, etc.
- SQLcl
- A lightweight, recently launched Java-based command-line tool with added features.
- SQLcl provides in-line editing, statement completion, and command recall for a feature-rich experience, all while also supporting your previously written SQL*Plus scripts.
Overview of SQLPLUS Commands
- CONNECT
- DISCONNECT
- DESCRIBE
- SET commands
- set linesize
- set pagesize
- set underline
- set timing {on|off}
- set time {on|off}
- set wrap {on|off}
- set sqlprompt { | _USER _DATE }
- set serveroutput {on|off}
- etc.
- SPOOL
- START
- @
- @@
- INPUT
- LIST
- CHANGE
- APPEND
- TTITLE
- BTITLE
- BREAK
- etc.
Overview of SQL Statements
- Data Manipulation Language - DML
- Data Definition Language - DDL
- Data Control Language - DCL
- Transaction Control - TC
Retrieving Data Using the SQL SELECT Statement (Projection)
- SELECT clause
- Capabilities of SQL SELECT Statements
- Basic SELECT Statement
- Selecting All Columns
- Selecting Specific Columns
- Writing SQL Statements
- Column Heading Defaults
- Arithmetic Operators or Expressions
- Addition (+)
- Subtraction (-)
- Multiplication (*)
- Division (/)
- Operator Precedence
- Defining a Null Value
- Null Values in Arithmetic Expressions
- Defining a Column Alias
- Using Column Aliases
- Concatenation
- Double Pipe or Vertical Bars (||)
- Literals
- Alternative Quote (q) Operator
- Displaying the Table Structure
- Using the DESCRIBE Command
- Duplicate Rows - Distinct or Unique Keyword
- FROM clause
- Use of Special Table Called – Dual
Restricting and Sorting Data (Selection)
- WHERE clause
- Comparison Operators
- Logical Operators
- Rules of Precedence
- ORDER BY clause
- Ascending Orders
- Descending Orders
- Columns, Numeric Position, Alias, Expression, Nulls First & Nulls Last
- ROW LIMITING clause (aka OFFSET FETCH clause)
- Substitution Variables (&, &&)
- Define
- Undefine
- SET Verify
Using Single Row Functions to Customize Output
- Character Functions
- Case Conversion
- Character Manipulation
- SUBSTR
- INSTR
- LENGTH
- REPLACE
- TRANSLATE
- TRIM
- LTRIM
- RTRIM
- RPAD
- LPAD
- SOUNDEX
- CHR
- ASCII
- Number Functions
- ROUND(number)
- TRUNC(number)
- MOD
- FLOOR
- CEIL
- SIGN
- ABS
- SQRT
- POWER
- Date Functions
- SYSDATE
- CURRENT_DATE
- SYSTIMESTAMP
- CURRENT_TIMESTAMP
- EXTRACT
- MONTHS_BETWEEN
- ADD_MONTHS
- NEXT_DAY
- LAST_DAY
- ROUND(date)
- TRUNC(date)
Analytic Functions or Windowing Functions
- ANALYTIC FUNCTIONS
- LEAD
- LAG
- RANK
- DENSE_RANK
- LISTAGG
- FIRST
- LAST
- FIRST_VALUE(*)
- LAST_VALUE(*)
- PERCENTILE_CONT(*)
- SUM(*)
- COUNT(*)
- MAX(*)
- MIN(*)
- AVG(*)
In the listing of analytic functions shown above, the functions that allow the windowing_clause are followed by an asterisk (*).
- QUERY_PARTITION_CLAUSE
- ORDER_BY_CLAUSE
- WINDOWING_CLAUSE
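As a rough illustration of how RANK and DENSE_RANK differ on tied values, the sketch below runs a query through Python's built-in sqlite3 module rather than Oracle. SQLite (version 3.25 or later, which added window functions) is only a stand-in here; the emp table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for an Oracle schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp (name, salary) VALUES (?, ?)",
    [("Asha", 300), ("Bala", 200), ("Chitra", 200), ("Dev", 100)],
)

# RANK leaves a gap after ties; DENSE_RANK does not.
rows = conn.execute(
    """
    SELECT name,
           salary,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM emp
    ORDER BY salary DESC, name
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

The two employees tied on salary both get rank 2; the next row gets 4 under RANK but 3 under DENSE_RANK.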
Using Advanced Subqueries
- Correlated Subqueries
- Correlated Update & Delete.
- Subqueries using EXISTS and NOT EXISTS Operators
- WITH clause (Containing a Set of Subqueries) & Recursive WITH clause
- Subqueries in FROM clause (Inline Views)
Data Definition Language - DDL
- Database Objects
- Naming Rules
- CREATE TABLE Statement
- Referencing Another User’s Tables
- DEFAULT Option
- Creating Tables
- Data Types
- Numeric
- Binary
- Character
- Date Time
- LOBs
- Including Constraints
- Constraint Guidelines
- Defining Constraints
- NOT NULL Constraint
- UNIQUE Constraint
- PRIMARY KEY Constraint
- FOREIGN KEY Constraint
- FOREIGN KEY Constraint: Keywords
- CHECK Constraint
- Deferrable Constraints
- Enable & Disable Constraints.
- CREATE TABLE: Example
- Violating Constraints
- Creating a Table Using a Subquery
- Read-Only Tables
- DROP TABLE Statement
- TRUNCATE TABLE Statement
- ALTER TABLE Statement
- Renaming an Object
- COMMENT Statement
- E-R Diagrams or Models & Introduction to Normalization or Normal Forms
- Creating Table Using Subqueries (CTAS) & Creating Virtual Columns.
Data Control Language (DCL) & Controlling User Access
- Creating a User
- Default Tablespace & Quota
- Privilege Management
- Granting Privileges to User or Role (System or Object Privileges)
- Revoking Privileges from User or Role (System or Object Privileges)
- Managing Roles
Managing Tables Using DML & TC Statements, Locks, Deadlocks & For Update Clause.
- Data Manipulation Language
- Adding a New Row to a Table
- INSERT Statement Syntax
- Inserting New Rows
- Inserting Rows with Null Values
- Inserting Special Values
- Inserting Specific Date and Time Values
- Creating a Script
- Copying Rows from Another Table
- Changing Data in a Table
- UPDATE Statement Syntax
- Updating Rows in a Table
- Updating Two Columns with a Subquery
- Updating Rows Based on Another Table
- Removing a Row from a Table
- DELETE Statement
- Deleting Rows from a Table
- Deleting Rows Based on Another Table
- Database Transactions
- Database Transactions: Start and End
- Advantages of COMMIT and ROLLBACK Statements
- Explicit Transaction Control Statements
- Rolling Back Changes to a Marker
- Implicit Transaction Processing
- State of the Data Before COMMIT or ROLLBACK
- State of the Data After COMMIT
- Committing Data
- State of the Data After ROLLBACK
- State of the Data After ROLLBACK: Example
- Statement-Level Rollback
- Read Consistency
- Implementing Read Consistency
- FOR UPDATE Clause in a SELECT Statement
- FOR UPDATE Clause: Examples
Managing Objects with Data Dictionary Views
- USER_CONSTRAINTS
- USER_CONS_COLUMNS
- USER_CATALOG
- USER_TABLES
- DBA_OBJECTS
- USER_OBJECTS
- ALL_OBJECTS
- USER_TAB_COLUMNS
- USER_VIEWS
- USER_INDEXES
- USER_IND_COLUMNS
- USER_SYNONYMS
- USER_SEQUENCES
- USER_TAB_PRIVS
- USER_TAB_PRIVS_MADE
- USER_TAB_PRIVS_RECD
- SESSION_PRIVS
- USER_SYS_PRIVS
- USER_COL_PRIVS
- USER_COL_PRIVS_MADE
- USER_COL_PRIVS_RECD
- SESSION_ROLES
- USER_ROLE_PRIVS
- ROLE_SYS_PRIVS
- ROLE_TAB_PRIVS
- ROLE_ROLE_PRIVS
- DBA_DIRECTORIES
- ALL_DIRECTORIES
- USER_EXTERNAL_TABLES
- USER_EXTERNAL_LOCATIONS
- USER_TAB_COMMENTS
- USER_COL_COMMENTS
- USER_UNUSED_COL_TABS
Manipulating Large Data Sets – (Multi-table INSERT Statements)
- Using Subqueries to Manipulate Data
- Retrieving Data by Using a Subquery as Source
- Inserting by Using a Subquery as a Target
- Using the WITH CHECK OPTION Keyword on DML Statements
- Overview of the Explicit Default Feature
- Using Explicit Default Values
- Copying Rows from Another Table
- Overview of Multitable INSERT Statements
- Types of Multitable INSERT Statements
- Unconditional INSERT
- Conditional INSERT ALL
- Conditional INSERT FIRST
- Pivoting INSERT
- MERGE Statement
- MERGE Statement Syntax
- Merging Rows: Example
Examination Score Report & Certificate
Big Data & Hadoop Technologies
Prerequisites for the Course
- OOP Concepts (Polymorphism, Inheritance, Encapsulation, etc.)
- Java Basics like Interfaces, Classes and Abstract Classes etc.
- Collections
- File I/O
- SQL
- Linux Basic Commands
Course Content:
Introduction
- An Overview of Hadoop
- Comparing with SQL Databases
- The Hadoop Distributed File System
- MapReduce Programming Model
- Hadoop Common Utilities
- Hadoop Ecosystem Components
- Hadoop Architecture
Components of Hadoop
- Hadoop Distributed File System
- MapReduce Programming Model
- Hadoop Common Utilities
The Hadoop Distributed File System (HDFS)
- HDFS Design & Concepts
- Blocks, Replication
- Hadoop dfs and dfsadmin Command-Line Interfaces
- Basic File System Operations
- Reading Data by HDFS Java Client API
- Distributed Cache
- DistCp – Parallel data loading into HDFS
MapReduce Program
- Building Blocks of MapReduce
- The MapReduce Program flow (MR Skeleton)
- Sample MapReduce Program
- MapReduce API Concepts
- The Mapper
- The Reducer
- The Combiner
- The Partitioner
- The Shuffle
- Hadoop Data Types
- Hadoop Serialization
- Hadoop Streaming API (any programming language)
- Integrating Hadoop with R Language
- Some MapReduce Program Examples
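The mapper, shuffle, and reducer stages listed above can be imitated in a few lines of plain Python. This is an in-memory sketch of the programming model only, not the Hadoop API: it runs in a single process, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would do
    between the map and reduce stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big insight", "data science"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different nodes, and a combiner may pre-aggregate the mapper output before the shuffle.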
Common MapReduce Algorithms
- Sorting and Searching
- Indexing
- Crawling
- Logs Processing
- Machine Learning
- Data Aggregation
- Term Frequency – Inverse Document Frequency (TF-IDF)
- Word Co-Occurrence
- Predictive Analytics
Programming Practices
- Develop MapReduce Programs
- Monitoring Clusters
- Performance Tuning
- Sending Job Specific Parameters
- Partitioning into multiple output files
- Using Distributed Cache
Apache PIG
- Introduction
- Grunt
- Pig Data Model
- Comparison with RDBMS
- Pig Functions
- Pig User Defined Functions (UDF)
- Joins
- Sorting
- PIG LATIN
Apache Hive
- Introduction to Hive
- Hive in the Hadoop Ecosystem
- Hive installation
- CLI Options
- Data Types and File Formats
- Text File Encoding
HiveQL – Queries
- Queries using various Clauses like WHERE, GROUP BY etc.
- JOINS
- Casting
- Views
- Indexes
- Hive Schema Design
- Standard Functions and User Defined Functions
- Hive Thrift Service
HBASE (NOSQL)
- Introduction to HBase
- Installation of HBase
- CRUD Operations in HBase with Examples
- REST API with HBase and Examples
- MapReduce Integration with HBase and Examples
Sqoop
- Introduction to Sqoop
- Import data from RDBMS to HDFS
- Export data from HDFS to RDBMS
FLUME
- Introduction to FLUME
- Flume Architecture
- Flume Configurations
- Flume Agent, Flume Collector
- Running Flume in various distributed modes
- Sample Examples
Data Visualization – Analysis and Reporting
- Information Visualization
- Data Analytics Life Cycle
- Analytic Processes and Tools
- Analysis Vs. Reporting
- Modern Data Analytic Tools
- Visualization Techniques
- Visual Encoding
- Visualization Algorithms
- Data Collection and binding
- Cognitive Issues
- Interactive Visualization
- Visualizing big data – structured Vs. unstructured
- Visual Analytics
- Geo mapping
- Dashboard Design
Advanced Analytics
- Introduction to Business Analytics using some case studies
- Making right Business Decisions based on data
- Exploratory Data Analysis – Visualization and Exploring Data
- Descriptive Statistical Measures
- Probability Distribution and Data
- Sampling and Estimation
- Statistical Inference
- Predictive modeling and analysis
- Regression Analysis
- Forecasting Techniques
- Simulation and Risk Analysis
- Optimization – Linear, Non-Linear & Integer
- Decision Analysis
- Strategy and Analytics
- Overview of Factor Analysis
- Directional Data Analytics
- Functional Data Analytics
Practical Machine Learning
- Supervised and Unsupervised Learning
- Uses of Machine Learning
- Clustering
- K means
- Hierarchical Clustering
- Decision Trees
- Oblique Trees
- Classification Problems
- Bayesian Analysis and Naïve Bayes Classifier
- Random Forest
- Gradient Boosting Machines
- Association Rule Learning
- Apriori and FP-growth Algorithms
- Support Vector Machines
- Linear and Non Linear classification
- ARIMA
- ML in real time
- Neural Networks and their Applications
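To give a feel for the clustering topics above, here is a minimal sketch of the K-means idea on one-dimensional data. It assumes fixed starting centroids and a fixed number of iterations, and the function name kmeans_1d and the sample points are made up for illustration; real projects would normally use a library implementation.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
print(centroids)  # prints [2.0, 11.0]
print(clusters)   # prints [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```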
Examination Score Report & Certificate
Conclusion
With the rise of big data, Data Science has become a key skill. Learning Data Science from Unisoft Technologies can open up numerous job opportunities and help you build a successful career.