
PSC 530: Quantitative Analysis: Datasets

This LibGuide is in support of Dr. Liebertz's PSC 530 course.

Federal Datasets

Not every dataset collected by the government stays accessible online.

There are some alternative ways to find federal datasets that have been removed from government websites and databases.

USA Marx Library's Government Documents section contains many print copies of historic federal datasets. Several LibGuides explain how to search our government documents by topic. You can start with the FDLP Collection: Home.

For digital datasets, there are different alternative databases and websites that you can check:

Some Places to Search for Datasets

General Dataset Use Guidelines

Datasets may need to be handled differently depending on their subject and the kind of data collected.

Generally, keep these things in mind when working with other people's data:

  • Evaluate the quality of the data. If you can't find the following information about a dataset, consider whether it is reliable enough for your research project.
    • Who created the data?
    • Where did they get it?
    • How did they measure and record the data?
    • When is the data from?
  • Keep a text document where you track important changes:
    • File names
    • Dataset column names
    • How you combined datasets (if applicable)
  • Always make a new copy of a dataset to edit. Don't directly edit your only copy.
  • Keep citations of the datasets that you use.
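
As an illustration of the copy-and-track advice above, here is a minimal Python sketch. The function names `make_working_copy` and `log_change` and the `_working` file suffix are assumptions made for this example, not a required convention; a change log kept by hand in a text document works just as well.

```python
import shutil
from datetime import date
from pathlib import Path


def make_working_copy(original: Path) -> Path:
    """Copy a dataset so the original file is never edited directly."""
    # Hypothetical naming rule: survey.csv -> survey_working.csv
    working = original.with_name(f"{original.stem}_working{original.suffix}")
    if working.exists():
        # Refuse to overwrite an earlier working copy by accident.
        raise FileExistsError(f"{working} already exists; pick another name")
    shutil.copy2(original, working)
    return working


def log_change(log_path: Path, note: str) -> None:
    """Append one dated line to a plain-text change log."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{date.today().isoformat()}  {note}\n")
```

For a hypothetical file named `survey.csv`, calling `make_working_copy(Path("survey.csv"))` would create `survey_working.csv` next to it, and `log_change(Path("changes.txt"), "Renamed column 'age' to 'age_years'")` would add a dated note about the edit.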

Creating Dataset Guidelines

Datasets can vary widely in content.

Whenever you collect data and build a dataset, you should also keep a text document (README file) that includes all the context a person would need to understand your dataset. This will also help you if you ever need to look back on the dataset in the future.

Because data can take so many different forms, check your assignment guidelines and ask your professor about requirements.

These are some details about your data that you might want to keep track of:

  • The authors of the dataset: you and any collaborators
  • When was the data collected?
  • Where was the data collected?
  • What group of people was studied?
  • What quality were you studying? (number of kids, social anxiety test scores, incarcerated men's feelings about isolation...)
  • How did you measure that quality? (interviews, counting, assessments...)
  • What units of measurement did you use? (percentages, inches, gallons...)
  • How did you mark missing data?
    • Example: You want to keep track of who attends class every day. One day you are sick and cannot go to class, so you cannot record who attended that day. How do you mark that skipped day in your data sheet?
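
One possible way to handle the attendance example above is sketched below in Python. The marker `NA`, the column names, and the dates are all made up for this illustration; any marker can work as long as it is used consistently and explained in your README file. The key idea is that a day with no record is marked as missing rather than as 0, because 0 would mean that nobody came to class.

```python
from __future__ import annotations

import csv
import io

MISSING = "NA"  # assumed marker; document whichever marker you choose in the README

# Hypothetical attendance sheet: one row per class day.
# Nobody recorded attendance on 2024-09-04, so that day is marked as missing.
SHEET = """date,students_present
2024-09-02,18
2024-09-04,NA
2024-09-06,20
"""


def read_attendance(text: str) -> list[tuple[str, int | None]]:
    """Parse the sheet, turning the missing marker into None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["students_present"]
        rows.append((row["date"], None if value == MISSING else int(value)))
    return rows


def average_attendance(rows: list[tuple[str, int | None]]) -> float:
    """Average over recorded days only; missing days are skipped."""
    recorded = [count for _, count in rows if count is not None]
    if not recorded:
        raise ValueError("no recorded days to average")
    return sum(recorded) / len(recorded)
```

With this sample sheet, `average_attendance(read_attendance(SHEET))` gives 19.0, the average of the two recorded days. Had the skipped day been entered as 0, the average would have dropped to about 12.7 and misrepresented attendance.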