Brainmatics

Big Data with Hadoop

Today, more and more private companies and government institutions struggle to manage data whose volume keeps growing and whose types become ever more complex. Conventional data-processing applications are no longer sufficient to manage and analyze such large and complex data sets, which is why the term Big Data emerged. In this training, participants will learn the concepts and characteristics of Big Data, and will be taught to manage it using Hadoop, an open-source framework. Hadoop is widely used by major companies such as Facebook, Yahoo, IBM, Intel, Amazon, and eBay.


OBJECTIVE

1. Participants understand the theory and characteristics of Big Data
2. Participants understand Google's GFS, MapReduce, and Bigtable technologies
3. Participants understand the concepts and technologies used for Big Data with the open-source Apache Hadoop framework
4. Participants understand the concepts and use of Hadoop and HBase


AUDIENCE


PREREQUISITES

1. Working knowledge of the Java programming language


CONTENT

1. Introduction to Big Data with Hadoop

1.1. Installing Single-node Hadoop Cluster
1.2. Installing a Multi-node Hadoop Cluster
1.3. Adding New Nodes to Existing Hadoop Clusters
1.4. Executing Balancer Command for Uniform Data Distribution
1.5. Entering and Exiting From the Safe Mode in a Hadoop Cluster
1.6. Decommissioning DataNodes
1.7. Performing Benchmarking on a Hadoop Cluster

2. Exploring HDFS

2.1. Loading Data from a Local Machine to HDFS
2.2. Exporting HDFS Data to a Local Machine
2.3. Changing the Replication Factor of an Existing File in HDFS
2.4. Setting the HDFS Block Size For All the Files in a Cluster
2.5. Setting the HDFS Block Size For a Specific File in a Cluster
2.6. Enabling Transparent Encryption for HDFS
2.7. Importing Data From Another Hadoop Cluster
2.8. Recycling Deleted Data From Trash to HDFS
2.9. Saving Compressed Data in HDFS
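Modules 2.3 through 2.5 adjust the replication factor and the block size. As a quick way to see what those settings mean, here is a minimal, dependency-free Java sketch (class and method names are illustrative, not Hadoop API) that estimates block count and raw storage, assuming Hadoop's usual defaults of 128 MB blocks and a replication factor of 3:

```java
// Back-of-envelope sketch for HDFS sizing: a file is stored as
// fixed-size blocks, and each block is replicated across DataNodes,
// so raw cluster usage is a multiple of the logical file size.
// 128 MB blocks and replication factor 3 are standard Hadoop defaults.
public class HdfsSizingSketch {

    // Number of HDFS blocks needed for a file of the given size.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    // Raw bytes consumed across the cluster, including all replicas.
    static long rawBytes(long fileBytes, short replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 300 * mb;                            // a 300 MB file
        System.out.println(blockCount(fileSize, 128 * mb));  // 3 blocks
        System.out.println(rawBytes(fileSize, (short) 3) / mb); // 900 MB raw
    }
}
```

Changing the replication factor of an existing file (recipe 2.3) changes only the `rawBytes` side of this arithmetic; the block layout is fixed when the file is written.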

3. Mastering MapReduce Programs

3.1. Writing a MapReduce Program in Java to Analyze Web Log Data
3.2. Executing the MapReduce Program in a Hadoop Cluster
3.3. Adding Support for a New Writable Data Type in Hadoop
3.4. Implementing a User-defined Counter in a MapReduce Program
3.5. MapReduce Program to Find the Top X
3.6. MapReduce Program to Find Distinct Values
3.7. MapReduce Program to Partition Data Using a Custom Partitioner
3.8. Writing MapReduce Results to Multiple Output Files
3.9. Performing Reduce-Side Joins Using MapReduce
3.10. Unit Testing the MapReduce Code Using MRUnit
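The recipes above are written against Hadoop's MapReduce API. As a dependency-free illustration of the programming model itself, the following plain-Java sketch (all names hypothetical, not Hadoop's Mapper/Reducer classes) runs the classic word count in two phases:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A minimal, dependency-free sketch of the MapReduce model:
// the "map" step emits (word, 1) pairs and the "reduce" step sums the
// counts per word. Real Hadoop jobs express the same idea through
// Mapper and Reducer classes; this version only illustrates the flow.
public class WordCountSketch {

    // Map phase: split each input line into words and emit (word, 1).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"big data with hadoop", "big data"};
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {big=2, data=2, hadoop=1, with=1}
    }
}
```

In a real cluster, the map calls run in parallel on HDFS blocks and the shuffle step routes each key to one reducer; the logic per record is the same as above.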

4. Data Analysis Using Hive, Pig, and HBase

4.1. Storing and Processing Hive Data in a Sequential File Format
4.2. Storing and Processing Hive Data in the ORC File Format
4.3. Storing and Processing Hive Data in the Parquet File Format
4.4. Performing FILTER By Queries in Pig
4.5. Performing Group By Queries in Pig
4.6. Performing Order By Queries in Pig
4.7. Performing JOINS in Pig
4.8. Writing a User-defined Function in Pig
4.9. Analyzing Web Log Data Using Pig
4.10. Performing HBase Operations in the CLI
4.11. Performing HBase Operations in Java
4.12. Executing MapReduce Programs with an HBase Table


5. Advanced Data Analysis Using Hive

5.1. Processing JSON Data in Hive Using JSON SerDe
5.2. Processing XML Data in Hive Using XML SerDe
5.3. Processing Hive Data in the Avro format
5.4. Writing a User-Defined Function in Hive
5.5. Performing Table Joins in Hive
5.6. Executing Map-Side Joins in Hive
5.7. Performing Context Ngram in Hive
5.8. Call Data Record Analytics Using Hive
5.9. Twitter Sentiment Analysis Using Hive
5.10. Implementing Change Data Capture Using Hive
5.11. Multiple Table Inserting Using Hive


6. Data Import/Export Using Sqoop and Flume

6.1. Importing Data From RDBMS to HDFS Using Sqoop
6.2. Exporting Data From HDFS to RDBMS
6.3. Using Query Operator in Sqoop Import
6.4. Importing Data Using Sqoop in Compressed Format
6.5. Performing Atomic Export Using Sqoop
6.6. Importing Data Into Hive Tables Using Sqoop
6.7. Importing Data Into HDFS From Mainframes
6.8. Incremental Import Using Sqoop
6.9. Creating and Executing Sqoop Job
6.10. Importing Data From RDBMS to Hbase Using Sqoop
6.11. Importing Twitter Data Into HDFS Using Flume
6.12. Importing Data From Kafka Into HDFS Using Flume
6.13. Importing Web Logs Data Into HDFS Using Flume

7. Automation of Hadoop Tasks Using Oozie

7.1. Implementing a Sqoop Action Job Using Oozie
7.2. Implementing a Map Reduce Action Job Using Oozie
7.3. Implementing a Java Action Job Using Oozie
7.4. Implementing a Hive Action Job Using Oozie
7.5. Implementing a Pig Action Job Using Oozie
7.6. Implementing an E-mail Action Job Using Oozie
7.7. Executing Parallel Jobs Using Oozie (fork)
7.8. Scheduling a Job in Oozie


8. Machine Learning and Predictive Analytics

8.1. Using Mahout and R
8.2. Setting up the Mahout Development Environment
8.3. Creating an Item-based Recommendation Engine Using Mahout
8.4. Creating a User-based Recommendation Engine Using Mahout
8.5. Using Predictive Analytics on Bank Data Using Mahout
8.6. Clustering Text Data Using K-Means
8.7. Performing Population Data Analytics Using R
8.8. Performing Twitter Sentiment Analytics Using R
8.9. Performing Predictive Analytics Using R
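Recipe 8.6 clusters text with k-means. The algorithm Mahout runs at scale can be sketched in a few lines of plain Java on one-dimensional points (names illustrative, initial centroids fixed so the run is deterministic):

```java
import java.util.Arrays;

// A minimal sketch of the k-means idea behind Mahout's clustering,
// on 1-D points with fixed initial centroids. Production Mahout jobs
// run the same assign/recompute loop as distributed jobs over
// vectorized text stored in HDFS.
public class KMeansSketch {

    // Assign each point to its nearest centroid, then recompute each
    // centroid as its cluster's mean; repeat for a fixed number of rounds.
    static double[] fit(double[] points, double[] centroids, int rounds) {
        double[] c = centroids.clone();
        for (int r = 0; r < rounds; r++) {
            double[] sum = new double[c.length];
            int[] count = new int[c.length];
            for (double p : points) {
                int best = 0;
                for (int i = 1; i < c.length; i++) {
                    if (Math.abs(p - c[i]) < Math.abs(p - c[best])) best = i;
                }
                sum[best] += p;
                count[best]++;
            }
            for (int i = 0; i < c.length; i++) {
                if (count[i] > 0) c[i] = sum[i] / count[i];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.5, 0.5, 9.0, 9.5, 8.5};
        double[] centroids = fit(points, new double[]{0.0, 10.0}, 5);
        System.out.println(Arrays.toString(centroids)); // [1.0, 9.0]
    }
}
```

For text, each document is first turned into a numeric vector (e.g. TF-IDF) and the same loop runs over those vectors instead of scalars.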

9. Integration with Apache Spark

9.1. Running Spark Standalone
9.2. Running Spark on YARN
9.3. Olympic Athlete Data Analytics Using Spark Shell
9.4. Creating Twitter Trending Topics Using Spark Streaming
9.5. Analyzing Parquet Files Using Spark
9.6. Analyzing JSON Data Using Spark
9.7. Processing Graphs Using GraphX
9.8. Conducting Predictive Analytics Using Spark MLlib

10. Hadoop Use Cases

10.1. Call Data Record Analytics
10.2. Web Log Analytics
10.3. Sensitive Data Masking and Encryption Using Hadoop


INSTRUCTOR


Satya Sanjaya completed his degree in mechanical engineering at Universitas Indonesia, Jakarta. He is active as a consultant and has completed many projects as a system programmer, administrator, and database administrator, including deployments of Microsoft Windows NT Server, Microsoft SQL Server, Microsoft SharePoint, and ASP.NET MVC. He is currently active as an instructor/trainer at various training centers in Jakarta, teaching Microsoft-based material such as Visual Basic, Visual InterDev, SQL Server, Visual Studio .NET, Microsoft Windows Server, Microsoft SharePoint, Microsoft Project, and Crystal Reports.