Introduction to Map-Reduce Programming model
(Assuming you have basic working knowledge of Java)
The MapReduce programming paradigm is based on the concept of key-value pairs. It also provides powerful paradigms for...
Learn How To Use Partitioning In Hive To Improve Query Performance
In previous Hive tutorials we have looked at Hive as the Hadoop project that offers data warehousing features. Installing and configuring Hive was...
How Data Science Can Evolve Over the Next Decade?
Today, we live in a world where gadgets have gained popularity and, with the passage of time, are becoming capable of transmitting...
Pandas Library In Data Science
Pandas is the most widely-used open-source Python package in the field of data science and data analysis. Its name is an abbreviation for the...
Why R is important for data science professionals
R is a programming environment and language made specifically for statistical computing and graphics. It is licensed under the GNU General Public License, just...
Learn How To Process Data Interactively And In Batch Using Apache Tez Framework
Within Hadoop, MapReduce has been the most widely used approach to processing data. In this approach, data processing happens in batch mode, which can take...
7 Predictive Analysis Tips for Hadoop
Introduction to predictive analysis
In today’s technical era, it’s hard to find a good analysis tool that fits our business requirements. Predictive analysis...
Passing Multiple Files for Same Input in Hadoop
Introduction
Hadoop is well known for its data processing capabilities, such as searching and sorting, and can also be used for batch processing analysis. In order to...
A step by step guide to install Hadoop cluster on Amazon EC2
This is a step-by-step guide to installing a Hadoop cluster on Amazon EC2. I have my AWS EC2 instance ec2-54-169-106-215.ap-southeast-1.compute.amazonaws.com ready on...
R Programming Series: Clustering using FactoExtra Package
In this series, we have learned about Dynamic Map creation using ggmap and R, creating dynamic maps using ggplot2, 3D Visualization in R, Data...