
How to Start a Social Sciences Research Project: Data

This guide suggests some ways to start a research project.

General Dataset Use Guidelines

Datasets may need to be handled differently depending on their subject and the kind of data collected.

Generally, keep these things in mind when working with other people's data:

  • Evaluate the quality of the data. If you can't find the following information about your dataset, reconsider whether it is suitable for your research project.
    • Who created the data?
    • Where did they get it?
    • How did they measure and record the data?
    • When is the data from?
  • Keep a text document where you track important changes:
    • File names
    • Dataset column names
    • How you combined datasets (if applicable)
  • Always make a new copy of a dataset to edit. Don't directly edit your only copy.
  • Keep citations of the datasets that you use.
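If you work with datasets in a scripting language, you can automate the copy-and-log habits above. The following is a minimal Python sketch, not a required workflow. The helper names (`make_working_copy`, `log_change`), the `_working` file suffix, and the file names in the usage comment are hypothetical choices.

```python
import shutil
from datetime import date
from pathlib import Path


def make_working_copy(original: Path) -> Path:
    """Copy a dataset so that edits never touch the original file.

    The "_working" suffix is a hypothetical naming choice,
    e.g. survey.csv -> survey_working.csv.
    """
    working = original.with_name(f"{original.stem}_working{original.suffix}")
    # Refuse to overwrite an existing working copy that may already hold edits.
    if working.exists():
        raise FileExistsError(f"{working} already exists; not overwriting it")
    shutil.copy2(original, working)  # copies file contents and metadata
    return working


def log_change(log_file: Path, note: str) -> None:
    """Append a dated note (renamed files or columns, merges) to a text log."""
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"{date.today().isoformat()}: {note}\n")


# Example usage (hypothetical file names):
# working = make_working_copy(Path("survey.csv"))
# log_change(Path("changes.txt"), "Copied survey.csv to survey_working.csv")
```

The log is plain text so it stays readable without any special software, which matches the advice to keep a simple text document of changes.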

Federal Datasets

Datasets collected by the government do not always stay accessible online.

There are other ways to find federal datasets that have been removed from government websites and databases.

The USA Marx Library's Government Documents section holds print copies of many historic federal datasets. Several LibGuides explain how to search our government documents by topic. You can start with the FDLP Collection: Home guide.

For digital datasets, you can check these alternative databases and websites:

Data Repositories

The following data repositories are either general, covering a wide range of topics, or focused on the social sciences.

You can search many data repositories the same way you search databases for academic articles: build a search string by combining search terms with Boolean operators such as AND and OR.

Example: I want data on college students answering questions about their mental health.

I can use this search string: (College OR university) AND student AND (questionnaire OR survey OR interview OR "focus group" OR qualitative) AND ("mental health" OR depression OR anxiety OR "mental well-being" OR "mental illness")
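You can assemble a search string like this one in a consistent way: list the synonyms for each concept as a group, join the terms within a group with OR, and join the groups with AND. The following is a minimal Python sketch of that pattern. It assumes the repository accepts standard AND/OR syntax with parentheses and double-quoted phrases. `build_search_string` is a hypothetical helper, and some repositories use different syntax, so check each one's search help page.

```python
def build_search_string(concepts: list[list[str]]) -> str:
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote multi-word phrases so they are searched as exact phrases.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        group = " OR ".join(terms)
        # Parentheses keep each OR group together when combined with AND.
        if len(terms) > 1:
            group = f"({group})"
        groups.append(group)
    return " AND ".join(groups)


concepts = [
    ["College", "university"],
    ["student"],
    ["questionnaire", "survey", "interview", "focus group", "qualitative"],
    ["mental health", "depression", "anxiety", "mental well-being", "mental illness"],
]
print(build_search_string(concepts))
```

This reproduces the example search string above. To broaden or narrow a search, add or remove a synonym in one group, or add or remove a whole concept group.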

Statistics

Statistics are numbers calculated from datasets, such as counts, percentages, or averages.

Example: A dataset would be a spreadsheet listing everyone in a county and their race. A statistic would be that 45% of that county's residents are Black.
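The step from a dataset to a statistic like this one is a simple calculation: count the rows in each category and divide by the total number of rows. The following is a minimal Python sketch using made-up toy data for a hypothetical county of 20 residents. The helper name `percent_by_category` and the `"race"` column name are illustrative assumptions.

```python
from collections import Counter


def percent_by_category(records: list[dict], column: str) -> dict[str, float]:
    """Turn raw rows (the dataset) into percentages (the statistic)."""
    counts = Counter(row[column] for row in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Share of all rows in each category, rounded to one decimal place.
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Made-up toy data: 20 residents of a hypothetical county.
residents = [{"race": "Black"}] * 9 + [{"race": "White"}] * 11
print(percent_by_category(residents, "race"))  # {'Black': 45.0, 'White': 55.0}
```

The percentage alone hides how many people were counted and how race was recorded. This is one reason to ask how the original data was collected.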

When finding and using statistics, first check for these qualities:

  • Currency - 
    • How recently was this created?
    • How old is the dataset used to create these statistics?
    • Has it been updated?
  • Relevance - 
    • What do other sources on the same topic say?
    • Am I choosing this source only because I agree with it?
  • Authority - 
    • Who is the author?
    • Were these statistics created or published by a person or group that has a financial or other stake in the subject?
  • Accuracy - 
    • Where did the creators get their information from?
    • How was the original data collected?
  • Purpose - 
    • Why was this statistic created?