There are two categories of data that you might use in your research: original data that you create yourself, and data created by someone else. You may also use both kinds. For example, you might create your own dataset from interviews, but use statistical data from Stats NZ to place those interviews in context.
When creating your own data, regardless of the method used, you will need to work out how it will be structured, recorded, filed, stored, documented, tracked and traced, validated and potentially anonymised. You can save a lot of time, effort, and frustration later on by working out how you will do this at the start, and then following through with it during your research.
You will also need to determine the same things when working with data from someone else. Often, data sets obtained from another source contain data that you do not intend to use, so you will need to re-organise the data in a way that makes sense for your purposes. Any time you analyse the data, you should also save backups. Backups protect you against data loss or corruption, and let you trace back to where research insights or divergent paths emerged.
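As one illustration of re-organising a dataset obtained from someone else, the sketch below writes a working copy of a CSV file that contains only the columns you intend to use, and leaves the original file untouched. It assumes the data is a UTF-8 CSV file with a header row. The function name `extract_columns` and the column names in the usage note are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable


def extract_columns(source: Path, target: Path, columns: Iterable[str]) -> int:
    """Write a copy of `source` to `target`, keeping only `columns`.

    Returns the number of data rows written. The original file is only read.
    """
    if source.resolve() == target.resolve():
        raise ValueError("Target must differ from source so the original is kept.")
    wanted = list(columns)
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        # Stop early if a requested column is not in the header row.
        missing = [c for c in wanted if c not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"Columns not found in {source.name}: {missing}")
        with target.open("w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" drops the columns we did not ask for.
            writer = csv.DictWriter(dst, fieldnames=wanted, extrasaction="ignore")
            writer.writeheader()
            count = 0
            for row in reader:
                writer.writerow(row)
                count += 1
    return count
```

A call such as `extract_columns(Path("downloaded.csv"), Path("working_copy.csv"), ["region", "population"])` would produce a smaller file to analyse, while the downloaded file stays available as your reference copy.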
If you have selected, or are following, a metadata scheme, then that may require you to structure, file, document and validate the data in a specific way. Consistency is important not just for yourself and possible collaborators, but also if you publish or share your data in the future.
If you are going to be using data created by someone else, you will first need to find it. While it is a good idea to consider this before your research begins, the need or opportunity for new or additional data may also arise during your research.
You can find data in many different ways. The most common option is to look in repositories, but older data is often kept in archives and may not be accessible electronically. Ask the Library, or the archive itself, for help searching or accessing these archives.
The links below will take you to some repositories, collections and dataset searches which may be of interest:
There are many other datasets in repositories and databases around the world. Many of them are open, while some require an access application. Some repositories have specific, curated data, while others are more general. Search these databases much as you would search for journal articles in a literature review. Be patient, but be specific, or you may find yourself getting off track as you find other interesting data.
How you store and back up your data may require thinking about your ethics application, who owns the data you are using, who could or should have access to it, what security is in place around your data, how long you will need to keep the data (both during your research and after it ends), and how you are going to name and arrange your backups.
Some of these, and other aspects, may change during your research. It is important to remember that what you have in your data management plan (DMP) may need to be adjusted as your research happens. Be sure to update your DMP if and when this happens.
Your storage needs largely depend on two factors. The first is file size. This can be wildly different depending on the data types. If you are using large or detailed images in your research, that may require more storage space. Good quality video and audio can also take up a lot of space. Spreadsheets and field notes may not take as much space. But consider splitting spreadsheets up into different sheets, both as a way of arranging your data and as a way of making sure that more detailed spreadsheets do not become cumbersome.
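If you already have some sample files, a quick way to see which data types dominate your storage is to total the file sizes by extension. The following is a minimal sketch that assumes your files sit under a single project folder. The function names `storage_by_type` and `human_readable` are illustrative and not part of any university tool.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict


def storage_by_type(root: Path) -> Dict[str, int]:
    """Total the size in bytes of every file under `root`, grouped by extension."""
    totals: Dict[str, int] = defaultdict(int)
    for path in root.rglob("*"):
        if path.is_file():
            # Treat ".CSV" and ".csv" as the same type.
            key = path.suffix.lower() or "(no extension)"
            totals[key] += path.stat().st_size
    return dict(totals)


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"  # not reached; keeps the return type explicit
```

Running `storage_by_type` on a pilot folder and multiplying the totals by the number of interviews, images or recordings you expect gives a rough figure to discuss when planning storage.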
The second factor to consider is access. This involves access for collaborators, sharing your data, and security. It is also important to consider the data owner, your funder's requirements and any legal issues. Some data, like medical data, will have extra restrictions or requirements. If you are getting your data from someone else, they should be able to inform you about the legal requirements for storing and sharing it. Seek advice from the university and the Library if you are unsure.
Discuss with ITS how they can help you store and access your data:
Analysing your data comes with some risk. Every time you sort or rearrange your data, you should make a backup. This backup should be named according to your metadata scheme if you have one, or the naming convention established in your data management plan. This will help you to keep track of different versions of your data without losing important analysis. Some research is quick, and some research takes many years to complete. Having inadequate data management during your analysis can be costly in time, money and frustration.
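One way to make this a habit is to script the backup step so that it runs before each sort or rearrangement. The sketch below copies a data file into a backup folder under a dated, step-labelled name and checks that the copy matches the original. The naming pattern `<name>_<date-time>_<step>` is only an assumed example; substitute the convention from your own metadata scheme or data management plan. The function names are hypothetical.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_before_analysis(source: Path, backup_dir: Path, step: str) -> Path:
    """Copy `source` into `backup_dir` before an analysis step.

    Assumed naming convention: <stem>_<YYYYMMDD-HHMMSS>_<step><suffix>
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}_{step}{source.suffix}"
    if target.exists():
        # Never silently replace an earlier version.
        raise FileExistsError(f"Refusing to overwrite existing backup: {target}")
    shutil.copy2(source, target)  # copy2 also keeps file timestamps
    # Confirm the copy is byte-identical to the original.
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Backup does not match the original: {target}")
    return target
```

For example, calling `backup_before_analysis(Path("interviews.csv"), Path("backups"), "pre-sort")` before sorting leaves a dated copy in the `backups` folder that you can return to if the analysis takes a wrong turn.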
Consider what your software needs will be during your analysis. Remember that your data may change form over time, too. What starts out as an audio interview could eventually end up as a graph of common phrases, or a photograph could end up as statistical data. Geospatial data could end up as tables, maps or graphs. Different software will allow you to do different things. Some research areas need specific software, or the data you are accessing might require older software that is no longer easily available.
Your hardware needs may also change over time. Consider the project as a whole, as well as the little parts that make up the whole. Cameras or sensors are just as important to take note of as GPUs, CPUs and hard drives.
Discuss your options with ITS (Kuhukuhu) to see what support they can offer with both software and hardware:
For assistance, reach out to the Open Research Team at library@waikato.ac.nz.