Rarely do the data you wish to visualize come packaged in a way that will allow you to do so without preparation. Review the sections below for important considerations and resources that will help "make your data useful" and ready for visualization and storytelling.
Sometimes you will be the person who collects the data you work with; other times the data may be handed to you. If you were not part of the data collection process, you will need to understand how the data were collected, who collected them, what the data consist of, and the purpose for initially collecting the data (especially if they were collected for a different, unrelated project). It may be helpful to take a tour of your dataset to understand how it is organized and the types of data that are included.
When working with survey data, it may be useful to have a copy of the originating survey instrument handy, including answer choices.
The assessment professional should engage with the relevant data stewards or custodians of shared data to understand the meaning and context of the data included in the dataset. You may need to refer to data dictionaries to decode categorized data. Data stewards and custodians can also provide context when you join data across sources, helping to ensure that the sources are compatible.
Consider creating a list of demographic variables in your dataset that you may wish to aggregate or disaggregate. It may be helpful to run frequency distributions on the demographic variables to understand your data. Common demographic categories include:
Age
Gender/Gender Identity
Parental Status
Marital Status
Race/Ethnicity
Sexual Orientation
Socio-economic Status
Undocumented Status
Veteran Status
Admission Type (e.g., Freshman, Transfer Student, Graduate Student)
College of Enrollment
First Generation Status
Year in School
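A frequency distribution for demographic variables like those above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the records, field names (`year_in_school`, `first_generation`), and the choice to label blank answers as "Missing" are all assumptions made for the example, not part of any particular dataset.

```python
from collections import Counter

# Hypothetical records; the field names and values are invented for illustration.
records = [
    {"year_in_school": "Sophomore", "first_generation": "Yes"},
    {"year_in_school": "Senior", "first_generation": "No"},
    {"year_in_school": "Sophomore", "first_generation": ""},
    {"year_in_school": "Junior", "first_generation": "Yes"},
]


def frequency_distribution(rows, variable):
    """Count how often each value of `variable` appears, labelling blanks as 'Missing'."""
    counts = Counter((row.get(variable) or "").strip() or "Missing" for row in rows)
    total = sum(counts.values())
    # Return (value, count, percent) tuples, most common value first.
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]


for variable in ("year_in_school", "first_generation"):
    print(variable)
    for value, n, pct in frequency_distribution(records, variable):
        print(f"  {value}: {n} ({pct}%)")
```

Counting blanks as their own category, rather than dropping them, makes gaps in the data visible before you decide how to aggregate or disaggregate.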
"Visualization-friendly" data can be categorized, summarized, or calculated. Getting your data ready for analysis and visualization will often require some cleaning, organizing, or transforming.
Data cleaning involves fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
Data organizing often involves:
Connecting two or more datasets.
Creating new categories to combine like items.
Adjusting the date and time information into a visualization-friendly format.
Combining data from two columns into one.
Data transformation involves converting data from one format to another (e.g., words to numbers).
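The cleaning, organizing, and transforming steps described above might look something like the following in Python. This is only a sketch under stated assumptions: the survey rows, the `colleges` lookup, the `MM/DD/YYYY` date format, and the five-point `SCALE` coding are all hypothetical, and a real dataset would need its own rules.

```python
from datetime import datetime

# Hypothetical survey rows; every field name and value is invented for illustration.
responses = [
    {"id": "1", "first": "Ana", "last": "Lopez", "submitted": "03/02/2015", "satisfaction": "Agree"},
    {"id": "1", "first": "Ana", "last": "Lopez", "submitted": "03/02/2015", "satisfaction": "Agree"},
    {"id": "2", "first": "Ben", "last": "Okoro", "submitted": "03/05/2015", "satisfaction": ""},
    {"id": "3", "first": "Chi", "last": "Tran", "submitted": "03/09/2015", "satisfaction": "Strongly agree"},
]

# A second hypothetical dataset, keyed on the same id.
colleges = {"1": "Engineering", "2": "Nursing", "3": "Fine Arts"}

# Assumed coding scheme for converting words to numbers.
SCALE = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3, "Agree": 4, "Strongly agree": 5}


def prepare(rows, college_lookup):
    """Return cleaned, organized, and transformed copies of the input rows."""
    seen, prepared = set(), []
    for row in rows:
        answer = row["satisfaction"].strip()
        # Cleaning: skip duplicate ids and rows with no answer.
        if row["id"] in seen or not answer:
            continue
        seen.add(row["id"])
        prepared.append({
            "id": row["id"],
            # Organizing: combine data from two columns into one.
            "name": f'{row["first"]} {row["last"]}',
            # Organizing: convert MM/DD/YYYY text into an ISO date string.
            "submitted": datetime.strptime(row["submitted"], "%m/%d/%Y").date().isoformat(),
            # Organizing: connect the second dataset by id.
            "college": college_lookup.get(row["id"], "Unknown"),
            # Transforming: convert words to numbers.
            "satisfaction_score": SCALE[answer],
        })
    return prepared


clean = prepare(responses, colleges)
for row in clean:
    print(row)
```

Whether an incomplete row should be removed, as it is here, or kept and flagged depends on the questions you need to answer, so treat that choice as one to record and revisit.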
In an ideal situation, you will collect your data in a way that is intentionally designed to answer your stakeholder questions. For example, if your stakeholders wanted to know X, then you might create a series of survey questions that would enable you to directly answer that question. However, if a dataset is handed to you "after the fact," you need to carefully examine the data to see if you can answer your stakeholders' questions as asked. If not, or if you can only answer a part of a question, then it is recommended that you return to the stakeholders for further discussion about what you can or cannot do, and if additional data need to be collected or identified before continuing with your assessment and subsequent visualizations.
Image Source: Data Therapy - Getting Data to Answer Your Questions (https://datatherapy.org/2015/03/02/getting-data-to-answer-your-questions/)
Documenting your thought process, assumptions, and steps taken during data cleaning and analysis will not only provide you with reminders of your processes but will make your work transparent. Transparency helps foster trust among collaborators and stakeholders. Others can understand how you arrived at your conclusions and replicate your process and results.
Association of Research Libraries: Research & Assessment Cycle Toolkit - Organize & Analyze
Data Therapy: Getting Data to Answer Your Questions (https://datatherapy.org/2015/03/02/getting-data-to-answer-your-questions/)
Microsoft Support: Top Ten Ways to Clean Your Data [using Excel]
Tableau: Guide to Data Cleaning: Definition, Benefits, Components, and How to Clean Your Data