aster.cloud aster.cloud
  • Automation
  • Data

Can We Make Data Tidy?

  • Aelia Vita
  • December 7, 2021
  • 3 minute read

Imagine you are about to sit down with a newly fetched dataset, excited about the insights it will bring, and then you find out it is unusable. If you’ve been there, you know exactly what an untidy dataset is.

A statistician from New Zealand once said: “Tidy datasets are all alike, but every messy dataset is messy in its own way.” Indeed, data comes in so many shapes and formats that we can easily end up inundated with it.



As a result, data science teams lose sight of the bigger picture and become disillusioned by mountains of unworkable data. Data specialists can only make analysis possible by keeping data clean and organized.

What is Tidy Data?

The term tidy data was coined by Hadley Wickham in his paper Tidy Data (remember that statistician from NZ?). He defined tidy data as data organized in a standard structure that is ready for analysis.

This way of organizing data lets you easily produce charts, diagrams, and summary statistics. Not all data comes out of the database clean, so cleaning it is essential if you want to analyze it efficiently.

Without further ado, let us break down the principles that allow you to keep your data nice and clean.

Tidy Data Principles

1. Each Row is an Observation

We’ll start with one of the basic principles. When you are giving your data the once-over, make sure each row contains exactly one observation. By definition, an observation is the individual unit under study.

If we look at the example table, the observational unit could be called ‘people’. Each person has their own row, and all of the information for that person is in that row. Observations are in rows, variables are in columns, and there is only one type of observational unit per table. Now THIS is tidy data.
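As a minimal sketch of this layout in plain Python, the table can be held as a list of dictionaries, one per person. The names and values below are invented for illustration and are not taken from the article's table, and `is_rectangular` is a hypothetical helper rather than part of any library.

```python
# Hypothetical example data: one row (dict) per person, one key per variable.
people = [
    {"name": "Ann", "age": 34, "hair_color": "brown", "height": 170},
    {"name": "Ben", "age": 27, "hair_color": "black", "height": 182},
    {"name": "Cleo", "age": 45, "hair_color": "red", "height": 165},
]


def is_rectangular(rows):
    """Return True if every row records exactly the same set of variables."""
    if not rows:
        return True
    expected = set(rows[0])
    return all(set(row) == expected for row in rows)


# Every person has the same four variables, so the check passes.
print(is_rectangular(people))
```

A row that is missing a variable, or that carries an extra one, makes the check fail, which is a quick signal that a row does not describe a single complete observation.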


2. Each Column is a Variable

A variable is a characteristic you measure or record for each observation. In our example table, age, hair_color and height are all variables.

In tidy data, each variable is represented in a separate column.
Okay, now a one-second quiz: What is wrong with this dataset?

Yep, you guessed it right. Never put multiple variables in one column; otherwise your data analysis is doomed.
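One common version of this problem is a column that packs two measurements into a single string. The sketch below assumes a hypothetical `age_height` column with values such as "34/170"; the column name, the separator, and the values are assumptions made for illustration, not the dataset from the quiz.

```python
# Hypothetical untidy rows: age and height share a single column.
untidy = [
    {"name": "Ann", "age_height": "34/170"},
    {"name": "Ben", "age_height": "27/182"},
]


def split_age_height(row):
    """Return a copy of the row with age and height in separate columns."""
    age, height = row["age_height"].split("/")
    # Keep every other column as it is and drop the combined one.
    tidy_row = {key: value for key, value in row.items() if key != "age_height"}
    tidy_row["age"] = int(age)
    tidy_row["height"] = int(height)
    return tidy_row


tidy = [split_age_height(row) for row in untidy]
print(tidy[0])
```

Once the values live in their own columns, each one can be filtered, sorted, or averaged without string parsing at analysis time.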

3. Each Cell is a Value

If you have grasped the first two principles, this one should already be a no-brainer.

Anyway, we’ll make an extra effort to lay it all out. Each cell should contain only one value. It is also important that all values in the same column are formatted the same way.

In this dataset, you can see that we have a table with four variables and three observations.

Each cell contains one piece of information and the values in each column share a format. All of our age values are numbers, hair color values are whole words, and so on. Therefore, this dataset is tidy and almost fit for analysis.
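A rough way to spot columns whose values do not match is to check whether every cell in a column has the same Python type. This is only a sketch under simple assumptions: the rows are invented, `inconsistent_columns` is a hypothetical helper, and a type check will not catch every formatting difference (for example, "Brown" versus "brown").

```python
# Hypothetical rows; the last age is a string, which breaks consistency.
rows = [
    {"name": "Ann", "age": 34, "hair_color": "brown"},
    {"name": "Ben", "age": 27, "hair_color": "black"},
    {"name": "Cleo", "age": "45 years", "hair_color": "red"},
]


def inconsistent_columns(rows):
    """Return the sorted names of columns whose cells mix Python types."""
    types_seen = {}
    for row in rows:
        for column, value in row.items():
            types_seen.setdefault(column, set()).add(type(value))
    return sorted(col for col, types in types_seen.items() if len(types) > 1)


# Only the age column mixes integers and strings.
print(inconsistent_columns(rows))
```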

4. Each Column has a Unique Name

In an ideal dataset, columns should have specific and descriptive names. Let us show you an example of this principle.

The third column is labeled hair_color. This is a more specific heading than if we had simply called it hair.

The word ‘hair’ can refer to anything from hair length to hairstyle. This level of specificity will help you speed up the analysis process.
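Both halves of this principle, uniqueness and descriptiveness, can be checked on a header row before any analysis starts. The header, the `RENAMES` mapping, and both helper functions below are hypothetical and only meant as a sketch.

```python
from collections import Counter

# Hypothetical header row with one repeated and one vague column name.
header = ["name", "age", "hair", "height", "age"]

# Hypothetical mapping from vague names to more descriptive ones.
RENAMES = {"hair": "hair_color"}


def duplicate_names(columns):
    """Return the column names that appear more than once, sorted."""
    counts = Counter(columns)
    return sorted(name for name, count in counts.items() if count > 1)


def rename_columns(columns, renames):
    """Return the columns with any vague names replaced by descriptive ones."""
    return [renames.get(name, name) for name in columns]


print(duplicate_names(header))
print(rename_columns(header, RENAMES))
```

Renaming does not resolve the repeated `age` column; a duplicate like that needs a decision about which column to keep or how to tell the two apart.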

The Final Word

Tidy data is an essential part of realizing the full potential of your data. Once your data is tidy, it can be used as input for a wide range of other functions.

