8 Best Seaborn Visualizations

Hands-on statistical plots with Seaborn using the penguin dataset.

Tirendaz AI
Geek Culture


Before you start a data science project, you first need to understand your data, and data visualization is one of the best ways to do that. In Python, Matplotlib and Seaborn are the libraries most commonly used for this purpose.

In this blog post, I’m going to cover the following topics:

  • What is Seaborn?
  • Scatter plot
  • Histogram
  • Bar plot
  • Box plot
  • Violin plot
  • Facet grid
  • Pair plot
  • Heatmap

Let’s dive in!

What is Seaborn?

Seaborn is a Python library for data visualization built on Matplotlib. Matplotlib is used to plot 2D and 3D graphs, while Seaborn is used to plot statistical graphs. Because Seaborn builds on Matplotlib, you can use these two libraries together to create very powerful visualizations.

You can install Seaborn with the following command:

pip install seaborn

If you use the Anaconda distribution, Seaborn is already included. After installing Seaborn, we need to import the library to use it. Let’s import Seaborn:

import seaborn as sns

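With Seaborn, you can easily load some well-known datasets used in data science. In this post, I’m going to use the Palmer penguins dataset, which is also available on Kaggle and is often used as an alternative to the iris dataset. Note that load_dataset downloads the data from an online repository, so it needs an internet connection the first time you call it.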

Penguin Dataset

Let’s load the penguin dataset with Seaborn.

data = sns.load_dataset("penguins")

Let me show the first five rows of the dataset.

data[:5]
The first five rows of the penguin dataset

Let’s see the shape of the dataset, that is, the number of rows and columns.

data.shape

#Output:
(344, 7)

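Seaborn has some themes you can use. You can apply a theme with the set_theme function and override individual Matplotlib settings through its rc parameter. Let’s apply the default theme together with the figure resolution and size.

# Apply the default theme and override two Matplotlib rc settings:
# figure.dpi controls the image resolution, figure.figsize the size in inches.
sns.set_theme(rc={"figure.dpi": 300, "figure.figsize": (6, 3)})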

Now, let’s go deep into the statistical plots.

1. Scatter Plot

The scatter plot is one of the most useful tools for understanding data because it displays the relationship between two numerical variables. Let’s see the scatter plot of bill (culmen) length and depth by penguin species.

sns.scatterplot(x="bill_length_mm",
                y="bill_depth_mm",
                data=data,
                hue="species")
Scatter plot for penguins species

As you can see, the bill (culmen) length is on the x-axis, and the bill depth is on the y-axis. The hue parameter colors each point by species, so you can see how the species differ from each other in this scatter plot.
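Because Seaborn draws on Matplotlib, the two libraries can be mixed in one figure. Here is a minimal sketch of that idea. To keep it runnable without downloading anything, it uses a small made-up DataFrame (the name toy and its values are my own assumptions and only mimic the penguin column names); the titles and axis labels are also just illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy rows that only mimic the penguin column names.
toy = pd.DataFrame({
    "bill_length_mm": [39.1, 46.5, 50.0, 38.2, 47.3, 49.8],
    "bill_depth_mm": [18.7, 17.9, 15.2, 18.1, 18.3, 15.9],
    "species": ["Adelie", "Chinstrap", "Gentoo", "Adelie", "Chinstrap", "Gentoo"],
})

# Create the figure with Matplotlib and let Seaborn draw on its Axes.
fig, ax = plt.subplots(figsize=(6, 3))
sns.scatterplot(data=toy, x="bill_length_mm", y="bill_depth_mm", hue="species", ax=ax)

# Plain Matplotlib calls on the same Axes customize the Seaborn plot.
ax.set_title("Bill length vs. bill depth")
ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Bill depth (mm)")
fig.tight_layout()
```

With the real dataset, you would pass data instead of toy and keep the rest unchanged.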

2. Histogram

The second type of plot I’m going to show is the histogram. A histogram shows the distribution of one or more variables. Now let’s see the histogram of the flipper length using the histplot function.

sns.histplot(x = "flipper_length_mm", data = data)
Histogram plot for flipper length

Note that the histogram counts the number of observations that fall within each interval (bin). You can also flip the plot by using the y parameter instead of x.

sns.histplot(data=data, y="flipper_length_mm")
Flipped histogram plot

You can control the width of the bins in the histogram with the binwidth parameter. Let me show this:

sns.histplot(data=data, x="flipper_length_mm", binwidth=3)
Histogram plot by controlling the width of bins
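To make the counting idea concrete, here is a rough sketch of what a histogram with a bin width of 3 computes, written with NumPy only. The flipper values are made up for illustration, and Seaborn may place its bin edges slightly differently, so treat this as an approximation of the idea rather than Seaborn's exact algorithm.

```python
import numpy as np

# Hypothetical flipper lengths in mm (toy values, not taken from the dataset).
flipper = np.array([181, 186, 195, 193, 190, 181, 195, 182, 191, 198])

binwidth = 3
# Bin edges every 3 mm, starting at the minimum and covering the maximum.
edges = np.arange(flipper.min(), flipper.max() + binwidth, binwidth)

# Count how many observations fall into each bin.
counts, _ = np.histogram(flipper, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right} mm: {n} observations")
```

Each printed count corresponds to the height of one rectangle in the histogram.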

You can also add a kernel density estimate (KDE), a smooth curve that estimates the probability distribution of the data, to the histogram plot. Let me show that.

sns.histplot(data=data, x="flipper_length_mm", kde=True)
Histogram plot with kde

You can use the hue parameter to see the histograms of categories.

sns.histplot(data=data, x="flipper_length_mm", hue="species")
Histogram plot for penguin species

In this plot, you can see the histograms of the categories that show the penguin species.

3. Bar Plot

A bar plot represents an estimate of the central tendency for a numeric variable with the height of each rectangle. Let’s see the bar plot showing the flipper lengths of penguin species.

sns.barplot(x = "species", y = "flipper_length_mm", data = data)
Bar plot for penguin species

By default, the height of each bar is the mean of the values, and the line on top of each bar is an error bar showing a 95% confidence interval. You can use another statistic instead of the mean with the estimator parameter. Let me use the hue parameter to see the flipper lengths of the species by sex.

sns.barplot(x="species",
            y="flipper_length_mm",
            data=data,
            hue="sex")
Bar plot for penguin species by sex
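The estimator parameter mentioned above can be sketched as follows. This example swaps the mean for the median by passing np.median, and then compares the bar heights with a plain pandas groupby. The DataFrame toy and its values are made up for illustration so that the sketch runs without downloading the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data that only mimics the penguin column names.
toy = pd.DataFrame({
    "species": ["Adelie"] * 3 + ["Chinstrap"] * 3 + ["Gentoo"] * 3,
    "flipper_length_mm": [181, 186, 195, 192, 196, 201, 211, 217, 230],
})

fig, ax = plt.subplots()
# Use the median instead of the default mean for the bar heights.
sns.barplot(data=toy, x="species", y="flipper_length_mm",
            estimator=np.median, ax=ax)

# The same statistic computed directly with pandas, for comparison.
medians = toy.groupby("species")["flipper_length_mm"].median()
print(medians)
```

The median is less sensitive to extreme values than the mean, which can make it a better choice for skewed data.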

4. Box Plot

The box plot is used to compare the distribution of numerical data between levels of a categorical variable. Let’s see the distribution of flipper length by species.

sns.boxplot(x = "species", y = "flipper_length_mm", data = data)
Box plot for penguin species

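Here, each box spans the first to third quartiles of the data, and the line inside the box marks the median. By default, the whiskers extend to the most extreme values that lie within 1.5 times the interquartile range from the box, and points beyond the whiskers are drawn individually as outliers. You can use the hue parameter to see a box plot of the flipper lengths of the species by sex.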

sns.boxplot(x="species",
            y="flipper_length_mm",
            data=data,
            hue="sex")
Box plot for penguin species by sex
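The numbers behind a box plot can be worked out by hand. Below is a small sketch with pandas that computes the quartiles, the whisker limits based on the common 1.5 times interquartile range rule, and the resulting outliers. The values are made up for illustration, and the variable names are my own.

```python
import pandas as pd

# Hypothetical flipper lengths in mm (toy values, not taken from the dataset).
flipper = pd.Series([184, 186, 190, 191, 193, 195, 195, 198, 230],
                    name="flipper_length_mm")

# The box: first quartile, median, third quartile.
q1, median, q3 = flipper.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range, the height of the box

# Limits for the whiskers: 1.5 * IQR beyond the box on each side.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values inside the fences.
inside = flipper[(flipper >= lower_fence) & (flipper <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual outlier point.
outliers = flipper[(flipper < lower_fence) | (flipper > upper_fence)]
print(q1, median, q3, whisker_low, whisker_high, outliers.tolist())
```

In this toy example only the value 230 falls outside the fences, so it is the only point that would be drawn as an outlier.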

5. Violin Plot

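You can think of the violin plot as a box plot combined with a kernel density estimate: the width of each violin shows how the values are distributed. This plot is used to compare the distribution of a numerical variable across the levels of a categorical variable. Let’s see the violin plot of flipper length by species.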

sns.violinplot(x = "species", y = "flipper_length_mm", data = data)
Violin plot for penguin species

You can also use the hue parameter to see the violin plot of the flipper lengths by sex.

sns.violinplot(x="species",
               y="flipper_length_mm",
               data=data,
               hue="sex")
Violin plot for penguin species by sex

Thus, a separate violin is drawn for each sex within each species. Isn’t it great? You can draw excellent plots with Seaborn. Let’s see how to draw multiple plots in one figure.

6. Facet Grid

You can use a facet grid to see a grid graph of the different subsets in your dataset. For example, let me draw the histogram plot of the penguins’ flipper length according to the island and sex variables. Let’s assign column and row variables to add more subplots to the figure. First, I’m going to specify the variables that will be in the rows and columns.

sns.FacetGrid(data, col="island", row="sex")
Facet grid

When you run this command, 6 empty subplots are created because the island variable has 3 categories and the sex variable has 2 categories (2*3 = 6). Let’s draw a plot on every facet using the map method. For example, let’s see the histograms of flipper length.

sns.FacetGrid(data, col="island", row="sex").map(sns.histplot, "flipper_length_mm")
Facet Grid with Histogram

You can also draw a different kind of plot on every facet. For example, let’s see the kernel density estimate of flipper length. (The older distplot function is deprecated, so I use kdeplot here.)

sns.FacetGrid(data, col="island", row="sex").map(sns.kdeplot, "flipper_length_mm")
Facet grid with KDE plot

Awesome! You can easily draw subplots with Seaborn.

7. Pair Plot

Seeing the pair relationship between the variables in the dataset is one of the important steps of data analysis. You can use the pairplot method to see the pair relations of the variables. This function creates cross-plots of each numeric variable in the dataset. Let’s see the pairs of numerical variables according to penguin species in the dataset.

sns.pairplot(data, hue="species", height=3)
Pair plot with kde

Because the hue parameter is set, a kernel density estimate of each variable is drawn on the diagonal of the grid by default. You can use the diag_kind parameter to draw histograms on the diagonal instead.

sns.pairplot(data, hue="species", diag_kind="hist")
Pair plot with histograms

8. Heatmap

Finally, let’s look at the heatmap. The heatmap is a very useful visualization technique. You can use it to see the correlations between numerical variables. Let’s compute the correlation matrix with the corr method and pass it to the heatmap function.

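# numeric_only=True skips the text columns (species, island, sex); it requires pandas 1.5 or newer.
sns.heatmap(data.corr(numeric_only=True))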
Heatmap

You can see the strength of the relationship between each pair of numerical variables in this graph. You can also use the annot parameter to print the correlation value in each cell. Let me show you this.

sns.heatmap(data.corr(numeric_only=True), annot=True)
Heatmap with numerical values
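Here is a slightly more explicit sketch of the same idea. It selects the numeric columns with select_dtypes before calling corr, which also works on older pandas versions, and it fixes the color scale to the range from -1 to 1 so that colors are comparable between plots. The DataFrame toy, its values, and the choice of the coolwarm colormap are my own assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data that only mimics the penguin column names.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 38.2, 46.5, 47.3, 50.0, 49.8],
    "flipper_length_mm": [181, 186, 192, 196, 217, 230],
    "body_mass_g": [3750, 3800, 3500, 3900, 5700, 5400],
})

# Keep only the numeric columns, then compute pairwise Pearson correlations.
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()

fig, ax = plt.subplots()
# vmin/vmax pin the color scale to the full correlation range.
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm", ax=ax)
fig.tight_layout()
```

A diverging colormap such as coolwarm makes it easy to tell positive correlations from negative ones at a glance.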

So, the correlation value is now written in each cell.

Conclusion

Data visualization is one of the most important steps in a data science project because you need to explore the data before you analyze it. In this blog post, I talked about data visualization with Seaborn, one of Python’s most important libraries for plotting statistical graphs. You can find the notebook and dataset here. Thank you for reading. I hope you enjoyed it.
