"Data visualization is the representation of data through use of common graphics, such as charts, plots, infographics, and even animations to translate the data/information into a visual context."
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files
uploaded = files.upload()
import io
my_data = pd.read_csv(io.BytesIO(uploaded['Iris.csv']))
The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.
It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
The columns in this dataset are:
sepal_length, sepal_width, petal_length, petal_width, & class_label
class_label = ['Iris_setosa', 'Iris_virginica', 'Iris_versicolor']
my_data.info()
my_data.describe()
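A line chart shows how one variable (Y) changes with another (X), with each variable on its own axis. The examples below draw line charts in Python. When several rows share the same x value, sns.lineplot plots their mean and shades a confidence interval around it.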
sns.lineplot(data=my_data, x="class_label", y="sepal_length")
my_data2 = my_data.query("sepal_length > 5")
sns.lineplot(data=my_data2, x="class_label", y="sepal_length")
####LINE GRAPH
#sns.lineplot(data=my_data, x="class_label", y="sepal_length")
sns.lineplot(data=my_data, x="petal_length", y="sepal_length", hue = 'class_label', legend = 'auto')
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.
category_order = ['Iris_setosa',
'Iris_virginica',
'Iris_versicolor']
sns.catplot(x='class_label',
data=my_data,
kind='count',
order=category_order)
plt.show()
""" Count plot: It gives you the count of the instances of variable under each category"""
sns.barplot(data=my_data, x="class_label", y="sepal_length")
plt.show()
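Count plots and bar plots differ only in what the bar height encodes: the number of rows in each category versus the mean of a numeric column within each category. The sketch below reproduces both quantities by hand. The rows are hypothetical values made up for illustration, not taken from Iris.csv.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (class_label, sepal_length) rows, made up for illustration only
rows = [
    ("Iris_setosa", 5.0),
    ("Iris_setosa", 5.2),
    ("Iris_versicolor", 6.0),
    ("Iris_versicolor", 6.4),
    ("Iris_virginica", 6.5),
    ("Iris_virginica", 7.1),
    ("Iris_virginica", 6.8),
]

# Group the measurements by category
groups = defaultdict(list)
for label, value in rows:
    groups[label].append(value)

# Count plot: bar height = number of rows in the category
counts = {label: len(values) for label, values in groups.items()}

# Bar plot: bar height = mean of the numeric values in the category
means = {label: mean(values) for label, values in groups.items()}

print(counts)
print(means)
```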
""" Bar plots look similar to count plots, but instead of the count of observations in each category, they show the mean of a quantitative variable among observations in each category."""
A pie chart is a circular statistical plot that displays a single data series. The whole circle represents 100% of the data, and each slice, called a wedge, represents one part's share of the total. A wedge's area is proportional to the length of its arc (equivalently, to its central angle). Pie charts are common in business presentations (sales, operations, survey results, resources, and so on) because they give a quick summary.
#PIE CHART
import matplotlib.pyplot as plt
import seaborn as sns
#define data: the dataset has 50 samples per species
data = [50, 50, 50]
class_label = ['Iris_setosa',
'Iris_virginica',
'Iris_versicolor']
#define Seaborn color palette to use
colors = sns.color_palette('pastel')[0:3]
#create pie chart
plt.pie(data, labels = class_label, colors = colors, autopct='%.0f%%')
plt.show()
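The wedge sizes follow directly from the counts: each wedge's percentage is its count divided by the total, and its central angle is the same fraction of 360 degrees. A minimal sketch, using the 50 samples per species stated in the dataset description:

```python
# Sample counts per species, as stated in the dataset description (50 each)
counts = {"Iris_setosa": 50, "Iris_virginica": 50, "Iris_versicolor": 50}

total = sum(counts.values())

# Each wedge's share of the whole, as a percentage
percentages = {label: 100 * n / total for label, n in counts.items()}

# Each wedge's central angle in degrees; the angles sum to 360
angles = {label: 360 * n / total for label, n in counts.items()}

for label in counts:
    print(f"{label}: {percentages[label]:.0f}% -> {angles[label]:.0f} degrees")
```

Because the three species are equally represented, every wedge covers a third of the circle.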
A donut (doughnut) chart is a pie chart with a blank circle at its center. The whole ring represents the full data series, and each segment of the ring represents that part's proportion of the total, with the whole ring standing for 100% of the data. The chart takes its name from the ring-shaped donut, which has a hole at its center.
# Create a donut chart
# Define the data: the dataset has 50 samples per species
data = [50, 50, 50]
class_label = ['Iris_setosa',
'Iris_virginica',
'Iris_versicolor']
# Change the label text color (this setting applies to later plots too)
plt.rcParams['text.color'] = 'red'
# Draw the pie once, with data labels and explicit wedge colors
plt.pie(data, labels=class_label, colors=['red','green','blue'])
# Add a white circle at the center to turn the pie into a donut chart
my_circle = plt.Circle((0, 0), 0.7, color='white')
plt.gca().add_artist(my_circle)
# Show the graph
plt.show()
A scatter plot uses Cartesian coordinates to display the values of (typically) two variables for a set of data. Each observation in the data set is drawn as a dot, which makes the plot useful for understanding the relationship between the two variables.
my_data.plot(kind ="scatter",
x ='sepal_length',
y ='petal_length')
plt.grid()
sns.set_style("whitegrid")
# sepal_length and petal_length are the iris
# features plotted on the axes. height sets the
# height of the facet (in inches), and hue colors
# the points by iris class.
sns.FacetGrid(my_data, hue ="class_label",
height = 6).map(plt.scatter,
'sepal_length',
'petal_length').add_legend()
Pair Plot: A “pairs plot” is also known as a scatterplot matrix. A single scatter plot matches one variable's value against another variable's value for each data row. A pairs plot elaborates on this by showing every variable paired with every other variable.
###PAIR PLOT
#sns.pairplot(data=my_data,kind='scatter')
sns.pairplot(my_data,hue='class_label')
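The layout of a pairs plot can be worked out from the column list alone. For n numeric variables the grid has n × n panels: the off-diagonal panels are scatter plots of every ordered pair of distinct variables, and in sns.pairplot the n diagonal panels show each variable's own distribution. A small sketch using the four feature columns listed earlier:

```python
from itertools import permutations

# The four numeric feature columns of the dataset
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Off-diagonal panels: every ordered (x, y) pair of distinct variables
off_diagonal = list(permutations(columns, 2))

# Diagonal panels: one per variable, showing that variable on its own
diagonal = [(name, name) for name in columns]

n_panels = len(off_diagonal) + len(diagonal)

print(f"{len(off_diagonal)} scatter panels + {len(diagonal)} diagonal panels = {n_panels}")
```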
The box plot was first introduced in 1969 by the mathematician John Tukey. A box plot gives a statistical summary of the feature being plotted. The top edge of the box is the third quartile, the line inside the box is the median, and the bottom edge is the first quartile. The height of the box is the interquartile range (IQR). The top and bottom lines (the whiskers) mark the maximum and minimum values of the feature; in seaborn they extend by default only to the most extreme points within 1.5 × IQR of the box. Points beyond the whiskers are drawn individually and represent the outlier values in the data.
### BOX PLOT
#sns.boxplot(x=my_data["petal_length"])
sns.boxplot(x='class_label',y='petal_length',data=my_data)
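The numbers behind a box plot can be computed directly. The sketch below derives the quartiles, the IQR, the whisker ends, and the outliers using the common 1.5 × IQR rule (the default whisker setting in seaborn). The values are hypothetical, and the quartile interpolation method is an assumption chosen for illustration; plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles

# Hypothetical petal lengths for one species, made up for illustration
values = [1.0, 1.3, 1.4, 1.5, 1.5, 1.6, 1.7, 1.9, 4.5]

# Quartiles with linear interpolation between data points (assumed method)
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Points outside these fences are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points that are still inside the fences
inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"box: {q1} to {q3}, median {median}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Note that the upper whisker stops at 1.9 rather than at the maximum 4.5, because 4.5 lies beyond the upper fence.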
A heatmap is a two-dimensional graphical representation of data where the individual values that are contained in a matrix are represented as colours. The Seaborn package allows the creation of annotated heatmaps which can be tweaked using Matplotlib tools as per the creator's requirement.
df = my_data.iloc[0:150,0:4]
plt.figure(figsize=(7,4))
sns.heatmap(df.corr(),annot=True,cmap='summer')
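Each cell of this heatmap is a Pearson correlation coefficient, which is what DataFrame.corr() computes by default. As a rough sketch of what one cell contains, the function below computes the coefficient by hand. The helper name pearson and the sample values are hypothetical and not part of the notebook.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical measurements, made up for illustration
sepal_length = [5.1, 4.9, 6.3, 5.8, 7.1]
petal_length = [1.4, 1.4, 4.7, 5.1, 5.9]

r = pearson(sepal_length, petal_length)
print(f"correlation: {r:.2f}")
```

The coefficient ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship), and a variable's correlation with itself is 1, which is why the heatmap's diagonal is all ones.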
A histogram is a graph showing frequency distributions. It is a graph showing the number of observations within each given interval.
sns.histplot(data=my_data, x="sepal_length")
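The bar heights of a histogram are simply counts per interval. The sketch below bins values into a fixed number of equal-width intervals. The helper name histogram_counts, the choice of four bins, and the sample values are assumptions for illustration; sns.histplot chooses its bins automatically unless told otherwise.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical sepal lengths, made up for illustration
values = [4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0]

edges, counts = histogram_counts(values, n_bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left}, {right}): {count}")
```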
A density plot is a variation of the histogram that uses ‘kernel smoothing’ when plotting the values. It is a continuous, smooth version of a histogram estimated from the data.
sns.kdeplot(data=my_data, x="sepal_length")
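Kernel smoothing can be illustrated with a minimal Gaussian kernel density estimate: a small bell curve is centered on every observation and the curves are averaged. This is a sketch under stated assumptions, not seaborn's implementation. The helper name gaussian_kde, the bandwidth of 0.3, and the sample values are all chosen for illustration, whereas sns.kdeplot selects a bandwidth automatically.

```python
from math import exp, pi, sqrt

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density at x with Gaussian kernels."""
    n = len(samples)
    norm = 1 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        # Sum of bell curves, one centered on each observation
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical sepal lengths, made up for illustration
samples = [4.9, 5.0, 5.1, 5.8, 6.0, 6.3, 6.4]
density = gaussian_kde(samples, bandwidth=0.3)

# Approximate the area under the curve on a grid from 3.0 to 9.0
step = 0.01
grid = [3.0 + step * i for i in range(601)]
area = sum(density(x) for x in grid) * step

print(f"density at 5.0: {density(5.0):.3f}")
print(f"area under the curve: {area:.3f}")
```

The area under the curve is approximately 1, as expected for a probability density. A larger bandwidth gives a smoother curve, and a smaller one follows the individual observations more closely.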