Introduction to Plots with R-Language


Introduction to Visualization and Plots

When working with statistics and models, plots are among the most widely used tools for presenting data to both technical and non-technical audiences, and they are a real aid to decision-making. The R language provides a wide range of built-in functions, along with libraries such as “ggplot2”, for visualization. In this tutorial, we will go through the built-in plotting methods in R. If you are new to the language, the free course R Programming for Absolute Beginners covers the basics, including an introduction to R, handling data, flow control, handling packages, and generating graphs.

Categories of Presentation of Plots

Plots are commonly grouped into four categories, according to what they are meant to show:

  1. Comparison: Plots that compare variables. Examples: line, column, and bar charts.
  2. Composition: Plots that show how some quantity is composed or built up. Examples: stacked charts and pie charts.
  3. Distribution: Plots that show how the given data is distributed over some criterion, such as classes. Examples: histograms, scatter plots, and area plots.
  4. Relationship: Plots that show the relationship between variables. Examples: scatter plots for 2 variables and bubble charts.

Types of Plots

Mentioned below are the most common types of plots.

  • Histogram

Also called a frequency plot, a histogram breaks continuous data down into classes or “bins”. The X-axis represents the bins, and for any bin the value on the Y-axis indicates the “class frequency”, that is, the number of observations falling in that bin. In R, the hist() function [Documentation] is used for plotting histograms, with two main arguments: x (the data) and breaks (the number of bins you wish to create, which R treats as a suggestion and may adjust to produce tidy bin edges). An example for the built-in “islands” dataset:
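A minimal sketch of such a call, using only base R. The value `breaks = 12`, the title, and the axis label are illustrative assumptions rather than the settings behind the figure below.

```r
# islands: built-in named vector of landmass areas (thousands of square miles)
data(islands)

# breaks = 12 is an assumed value; R may adjust the actual number of bins
h <- hist(islands,
          breaks = 12,
          main = "Areas of the world's major landmasses",
          xlab = "Area (thousand square miles)",
          col = "lightblue")

# hist() invisibly returns the bin edges and the counts it actually used
h$breaks
h$counts
```

Inspecting `h$breaks` is a quick way to see how many bins R really created, since the requested number is only a hint.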

Histogram Plot

  • Bar/Line Chart

Line charts and bar charts are familiar to most of us. A line chart uses a line to connect a series of data points, while a bar chart draws one bar per value. Line charts are among the most used plots in statistics and analysis, for example to plot the accuracy of a neural network or to track the value of a stock in the stock market. Let’s plot some sales data as given:
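As a hedged illustration, the sketch below uses made-up monthly figures. The `sales` vector and its values are hypothetical, not the article's data; the plotting calls are base R.

```r
# Hypothetical monthly sales figures, invented purely for illustration
sales <- c(Jan = 120, Feb = 135, Mar = 128, Apr = 150, May = 165, Jun = 158)

# type = "o" draws both the points and the line connecting them;
# xaxt = "n" suppresses the default numeric x-axis so we can label it ourselves
plot(sales, type = "o", col = "blue", xaxt = "n",
     main = "Monthly sales", xlab = "Month", ylab = "Units sold")

# Label the x-axis with the month names instead of the positions 1..6
axis(1, at = seq_along(sales), labels = names(sales))
```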

The resultant line chart:

Line Chart

You can also use bar charts:
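A bar chart of the same kind of data might look like the sketch below. Again, the `sales` values are hypothetical and only serve to make the example runnable.

```r
# Hypothetical monthly sales figures, invented purely for illustration
sales <- c(Jan = 120, Feb = 135, Mar = 128, Apr = 150, May = 165, Jun = 158)

# barplot() uses the vector's names as bar labels and
# returns the x-coordinates of the bar midpoints
mids <- barplot(sales, col = "steelblue",
                main = "Monthly sales", xlab = "Month", ylab = "Units sold")
```

The returned midpoints are handy if you later want to add text or points above each bar.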

Bar Chart

  • Box plot

A box plot displays a five-number summary of the data: minimum, 25th percentile, median, 75th percentile, and maximum. It is useful for representing the spread or variance of data. For example, consider the IRIS dataset, which has 4 numeric features and a species label for each of its 150 samples. The following creates a box plot for two variables in R [Documentation]:
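A minimal sketch, assuming the two variables are sepal length and petal length (the article does not say which two were used):

```r
data(iris)

# Two of the four numeric features; this particular pair is an assumption
b <- boxplot(iris[, c("Sepal.Length", "Petal.Length")],
             ylab = "Length (cm)")

# b$stats has one column per variable and five rows:
# lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats
```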

Box Plot

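For clarity, I chose not to print any text in the plot besides the axis labels. The top and bottom lines (the whiskers) indicate the maximum and minimum values; by default, R draws any point lying more than 1.5 times the box height beyond the box as a separate outlier, in which case the whisker stops at the most extreme remaining value. The thick black line in the middle indicates the median, and the borders of the box surrounding it are the 1st and 3rd quartiles.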

  • Heatmap

A heatmap is commonly used to look for hotspots in two dimensions. You can think of a heatmap as a two-dimensional counterpart to the histogram, where color is used instead of bar height to show where the data is concentrated. For instance, heatmaps can be used to see which areas of a webpage are most clicked or viewed. Pass a numeric matrix to the heatmap() function in R to draw one. An example is given below:
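A hedged sketch with randomly generated data. The `clicks` matrix is hypothetical and stands in for something like per-region click counts; it is not the data behind the figure below.

```r
set.seed(1)  # make the random example reproducible

# Hypothetical 10 x 10 matrix of counts, invented for illustration
clicks <- matrix(rpois(100, lambda = 5), nrow = 10,
                 dimnames = list(paste0("row", 1:10), paste0("col", 1:10)))

# Rowv = NA and Colv = NA switch off the clustering dendrograms;
# scale = "none" colors the raw values instead of row-standardized ones
heatmap(clicks, Rowv = NA, Colv = NA, scale = "none")
```

By default heatmap() clusters and reorders rows and columns and scales each row, which is often what you want for exploratory work but can be surprising when the positions themselves carry meaning.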

The resultant heatmap:

Heat Map

  • Scatterplot

A scatter plot draws each observation as a point. Generally, in 2 or 3 dimensions, a point has the form (x1, x2, …, xN) and is represented by a shape and/or color at its coordinates. Scatter plots are also used to represent neural network accuracy in prediction or classification. As an example, below is a simple scatter plot of a single variable from the IRIS data. The X-axis is the index of the sample (1…150, since R indexes from 1) and the Y-axis is the petal length of the sample.
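A minimal base R sketch of this plot. Coloring the points by species and adding a legend are additions made here for readability, not something the text above describes.

```r
data(iris)

# With a single vector, plot() puts the index (1..150) on the X-axis
plot(iris$Petal.Length,
     col = as.integer(iris$Species),  # one color per species (an added touch)
     pch = 19,
     xlab = "Sample index", ylab = "Petal length (cm)")

legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)
```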

Scatter Plot

  • Correlogram

Correlograms help to visualize data in terms of correlation matrices. Since corrgram() is not a built-in function, we first need to install the corrgram package. The following commands will install and use corrgram on the IRIS dataset:
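A hedged sketch of those commands. The choice of panel functions and `order = TRUE` is an assumption about how the correlogram might be styled; the `requireNamespace()` guard simply lets the snippet run even where the package is not installed.

```r
# install.packages("corrgram")  # run once; corrgram is a CRAN package

data(iris)

# Correlations are only defined for the four numeric columns
num <- iris[, 1:4]
cm <- cor(num)
round(cm, 2)

if (requireNamespace("corrgram", quietly = TRUE)) {
  # Shaded cells below the diagonal, pie glyphs above it (assumed styling)
  corrgram::corrgram(num, order = TRUE,
                     lower.panel = corrgram::panel.shade,
                     upper.panel = corrgram::panel.pie)
}
```

Printing the plain correlation matrix alongside the plot makes it easy to check what each shaded cell represents.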


Neural Networks with R

I will assume you know enough about neural networks and the syntax of the R language to follow this demonstration of creating, training, and visualizing a neural network with R.

The neuralnet library for R provides a framework for creating NNs. The following code creates a neural network for the IRIS dataset.
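A minimal sketch of what such code could look like. The one-column-per-species encoding, the single hidden layer of 4 units, and the seed are all assumptions made for illustration; the article's actual architecture is not known.

```r
# install.packages("neuralnet")  # run once; neuralnet is a CRAN package

data(iris)
set.seed(42)

# One 0/1 indicator column per species (an assumed way to encode the target)
train <- iris[, 1:4]
for (s in levels(iris$Species)) train[[s]] <- as.numeric(iris$Species == s)

if (requireNamespace("neuralnet", quietly = TRUE)) {
  nn <- neuralnet::neuralnet(
    setosa + versicolor + virginica ~
      Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
    data = train,
    hidden = 4,             # one hidden layer with 4 units (arbitrary choice)
    linear.output = FALSE   # logistic outputs, suited to 0/1 targets
  )
}
```

Training starts from random weights, so results vary between runs unless a seed is set.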

To visualize this neural network, pass the fitted object to plot(). Because the object comes from the neuralnet library, plot() dispatches to that library's plotting method and displays the network's connections, weights, and summary statistics.
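The dispatch can be seen in a small, self-contained sketch. The regression network fitted here is a stand-in chosen only to keep the example short, not the article's model.

```r
data(iris)
set.seed(42)

if (requireNamespace("neuralnet", quietly = TRUE)) {
  # Small stand-in network, fitted only so that there is something to plot
  fit <- neuralnet::neuralnet(Petal.Width ~ Petal.Length + Sepal.Length,
                              data = iris, hidden = 2)

  # The fitted object has class "nn", which is what plot() dispatches on
  print(class(fit))

  # Weights are missing if training did not converge, so check before plotting;
  # rep = "best" draws the repetition with the smallest error
  if (!is.null(fit$weights)) plot(fit, rep = "best")
}
```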

Neural Network with R

It is also possible to use plots to compare the accuracy of the NN for different sizes of training dataset. For instance, a network trained on 50% of the data and tested on the remaining 50% may differ noticeably in accuracy from one trained on 90% and tested on 10%. Thus, you can write a for loop that varies the size of the training set, store the NN accuracy at every iteration, and use a plot to see which training set size suits you. An example of this is given below:
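A hedged sketch of such a loop. The target variable (petal width), the grid of training fractions, the network size, and the `rmse` helper are assumptions introduced here; only the overall idea of looping over training sizes and plotting the error comes from the text.

```r
data(iris)
set.seed(42)

# Root-mean-square error between predictions and observed values
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

fractions <- seq(0.5, 0.9, by = 0.1)   # assumed grid of training fractions
errors <- rep(NA_real_, length(fractions))

if (requireNamespace("neuralnet", quietly = TRUE)) {
  for (i in seq_along(fractions)) {
    # Random split: the sampled rows train the network, the rest test it
    idx <- sample(nrow(iris), size = round(fractions[i] * nrow(iris)))
    fit <- neuralnet::neuralnet(
      Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
      data = iris[idx, ], hidden = 3)
    if (is.null(fit$weights)) next     # skip fits that did not converge
    pred <- neuralnet::compute(fit, iris[-idx, 1:3])$net.result
    errors[i] <- rmse(as.vector(pred), iris$Petal.Width[-idx])
  }

  # Plot only if at least one fit produced an error value
  if (any(is.finite(errors))) {
    plot(fractions * nrow(iris), errors, type = "b",
         xlab = "Length of training set", ylab = "RMSE")
  }
}
```

A single random split per size gives a noisy curve; averaging several splits per size would give a steadier picture, at the cost of more training time.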

Variation of RMSE with Length of Training Set


In this article, we looked at the categories and types of plots and at the essentials of data visualization in R. As an example, we also looked at visualizing a neural network in R. I hope you now have a basic idea of the different types of plots available in the R programming language. For a more comprehensive treatment, you can follow the online tutorial Introduction To Data Science Using R Programming, whose sections cover various R tools, data visualization, Leaflet maps, statistics, and data manipulation in detail.


