Learn How To Import Data From MySQL Into Hadoop Using Sqoop
Sqoop is a tool in the Apache ecosystem that was designed to solve the problem of importing data from relational databases and exporting data...
Learn to Install and Configure a Single Node Hadoop Cluster on Ubuntu
Hadoop as a production platform is supported on Linux, but Windows and other flavors of UNIX such as Mac OS X can be used...
Running Hadoop on Apache Mesos: A Distributed Kernel System
Apache Mesos – An overview
Apache Mesos is an open-source, kernel-based cluster management system. It is built on the same principles as the Linux kernel...
Learn How to Query, Summarize and Analyze Data using Apache Hive
Apache Hive is a project within the Hadoop ecosystem that provides data warehouse capabilities. It was not designed for processing OLTP workloads. It has features...
Learn How to Develop Effective Data Models in Hive
Within the Hadoop ecosystem, Hive is considered a data warehouse. This could be true or false depending on how you look at it....
Learn how to stream data into Hadoop using Apache Flume
Apache Flume is a tool in the Hadoop ecosystem that provides capabilities for efficiently collecting, aggregating, and moving large amounts of data into...
MapReduce Program In Detail
In our previous guides, we saw how to run the wordcount MapReduce program on a single-node Hadoop cluster. Now we will understand the MapReduce...
Learn to Create Input Splits on Incoming Data with MapReduce Programming
Introduction
MapReduce is the core technology of Hadoop and the backbone of big data and the Hadoop framework. This technology works in conjunction with...
Importance Of Exploratory Data Analysis Before ML Modelling
Exploratory Data Analysis (EDA) is the crucial process of using summary statistics and graphical representations to perform preliminary investigations on data in order to...
Learn How To Write Advanced Queries To Manipulate Data Using Hive
In previous Hive tutorials, we looked at installing and configuring Hive, data modeling, and the use of partitions to improve query response time. For...