aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Automation
  • Data

Can We Make Data Tidy?

  • Aelia Vita
  • December 7, 2021
  • 3 minute read

Imagine: You are about to sit down with a newly-fetched data set, excited about the insights it will bring you and then you find out it is no use. If you’ve been there, then you know for sure what an untidy dataset is.

A statistician from New Zealand once said: Tidy datasets are all alike, but every messy dataset is messy in its own way. Indeed, as data may come in various forms and shapes, sometimes we are inundated with it.


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

As a result, our data science team becomes shortsighted and oops.. disillusioned by mountains of unworkable data. The only way data specialists can facilitate analysis is by keeping data clean and organized.

What is Tidy Data?

Essentially, tidy data is a term coined by Hadley Wickham in his Tidy Data paper (remember that statistician from NZ?). He defined tidy data as data that is neatly organized and all set for analysis.

This way of organizing allows you to easily produce charts, diagrams, and summary statistics. As it often happens, not all data comes out of the database clean, therefore cleansing it is essential to efficiently analyze it.

Without further ado, let us break down the principles that allow you keep your data nice and clean.

Tidy Data Principles

1. Each row is an Observational Unit

We’ll start with one of the basic principles. When you are giving your data the once-over, you should make sure each row contains an observation.By definition, observation is the individual unit under question.

If we look at the table above, an observational unit could be called ‘people’. You can see that each person has an individual row on the table and all of the information for that person is in the same row. Observations are included in rows, variables are represented as columns and there is only one observational unit per table. Now THIS is tidy data.

READ MORE: [button style=’accent’ url=’https://aster.cloud/2021/09/23/how-and-why-we-choose-to-clone-all-data-on-github/’ target=’_blank’ arrow=’true’ fullwidth=’true’]HOW AND WHY WE CHOOSE TO CLONE ALL DATA ON GITHUB[/button]
[button style=’accent’ url=’https://aster.cloud/2021/12/02/data-cleaning-in-python-the-ultimate-guide/’ target=’_blank’ arrow=’true’ fullwidth=’true’]DATA CLEANING IN PYTHON: THE ULTIMATE GUIDE[/button]

2. Each Column is a Variable

A variable is the unit you are assessing. Again, if we turn to our table above, age, hair_color and height fall within the category of variables.

Read More  Huawei Calls For Network Evolution At COP27 To Enable Green Development

In tidy data each variable is represented in a separate column.
Okay, now a one-second quiz: What is wrong with this dataset?

Yep, you guessed it right. Never put multiple variables in one column, otherwise your data analysis is doomed.

 

3. Each Cell is a Value

If you have got hold of the first two principles, this one should already be a no-brainer.

Anyway, we’ll make an extra effort to lay it all out. Each cell should contain only one value. It is also important that all values in the same column are formatted the same way.

On this data set, you can see that we have a table with four variables and three observations.

Each cell contains one piece of information and our values all match. All of our age values are digits, hair color values are whole words – you get the idea. Therefore, this dataset is tidy and almost fit for analysis.

4. Each Column has a Unique Name

In an ideal dataset, columns should have specific and descriptive names. Let us demonstrate you an example of this principle.

The third column is labeled hair_color. This is a more specific heading that if we were simply to call it – hair.

The word ‘hair’ can refer to anything from hair length to hairstyle. This level of specificity will help you speed up the analysis process.

The Final Word

Tidy data is an essential part of realizing the full data potential that exists. Once your data is tidy, it can be used as input into a wide range of other functions.

Read More  3 Technologies Poised To Change Food And The Planet

For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

Aelia Vita

Related Topics
  • Cleaning Data
  • Data
  • Datasets
  • Technology
You May Also Like
Getting things done makes her feel amazing
View Post
  • Computing
  • Data
  • Featured
  • Learning
  • Tech
  • Technology

Nurturing Minds in the Digital Revolution

  • April 25, 2025
View Post
  • Data
  • Engineering

Hiding in Plain Site: Attackers Sneaking Malware into Images on Websites

  • January 16, 2025
IBM and Ferrari Premium Partner
View Post
  • Data
  • Engineering

IBM Selected as Official Fan Engagement and Data Analytics Partner for Scuderia Ferrari HP

  • November 7, 2024
dotlah-smartnation-singapore-lawrence-wong
View Post
  • Data
  • Enterprise
  • Technology

Growth, community and trust the ‘building blocks’ as Singapore refreshes Smart Nation strategies: PM Wong

  • October 8, 2024
nobel-prize-popular-physics-prize-2024-figure1
View Post
  • Data
  • Featured
  • Technology

They Used Physics To Find Patterns In Information

  • October 8, 2024
goswifties_number-crunching_202405_wm
View Post
  • Data
  • Featured

Of Nuggets And Tenders. To Know Or Not To Know, Is Not The Question. How To Become, Is.

  • May 25, 2024
View Post
  • Data

Generative AI Could Offer A Faster Way To Test Theories Of How The Universe Works

  • March 17, 2024
Chess
View Post
  • Computing
  • Data
  • Platforms

Chess.com Boosts Performance, Cuts Response Times By 71% With Cloud SQL Enterprise Plus

  • March 12, 2024

Stay Connected!
LATEST
  • 1
    Pure Accelerate 2025: All the news and updates live from Las Vegas
    • June 18, 2025
  • 2
    ‘This was a very purposeful strategy’: Pure Storage unveils Enterprise Data Cloud in bid to unify data storage, management
    • June 18, 2025
  • What is cloud bursting?
    • June 18, 2025
  • 4
    There’s a ‘cloud reset’ underway, and VMware Cloud Foundation 9.0 is a chance for Broadcom to pounce on it
    • June 17, 2025
  • What is confidential computing?
    • June 17, 2025
  • Oracle adds xAI Grok models to OCI
    • June 17, 2025
  • Fine-tune your storage-as-a-service approach
    • June 16, 2025
  • 8
    Advanced audio dialog and generation with Gemini 2.5
    • June 15, 2025
  • 9
    A Father’s Day Gift for Every Pop and Papa
    • June 13, 2025
  • 10
    Global cloud spending might be booming, but AWS is trailing Microsoft and Google
    • June 13, 2025
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • Google Cloud, Cloudflare struck by widespread outages
    • June 12, 2025
  • What is PC as a service (PCaaS)?
    • June 12, 2025
  • 3
    Crayon targets mid-market gains with expanded Google Cloud partnership
    • June 10, 2025
  • By the numbers: Use AI to fill the IT skills gap
    • June 11, 2025
  • 5
    Apple services deliver powerful features and intelligent updates to users this autumn
    • June 11, 2025
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.