Historical Research and Data Protection

I have spent the past few days re-familiarising myself with the Data Protection Act 1998 (DPA), partly because of forthcoming blog-posts I want to write, and partly because of recent debates over confidentiality and historical research (for example, see here or here). My research relies quite heavily on archival sources relating to living (or assumed to be living) subjects, and this places particular legal responsibilities on how I secure and disseminate my research, in accordance with the DPA.

This week, therefore, I want to post a summary, mediated by my own experience, of how the DPA affects historical research. Though I am not in any way an expert on information law, I have reviewed several thousand sensitive patient-files, medico-legal reports and court records during my PhD, and offer the following overview in the hope that it might act as a useful starting-point for historians wishing to learn more about the DPA.


A Brief Introduction.

The DPA came into effect in 2000, and regulates the use of ‘personal data’ (that is, information relating to a living subject, which may be used to identify that subject, and which is being processed or recorded in an accessible, organised way and/or with a view to its later systematisation). The Act defines ‘sensitive’ personal data as that relating to a living subject’s race/ethnicity, religion, political views, trade-union membership, physical/mental health, sexual life, criminal record or any pending or concluded criminal allegations. Sensitive data is to be handled more strictly than other forms of personal data, and there are further restrictions on the circumstances in which it can be collected. Consent is usually required if an organisation or individual wishes to collect or hold personal data or sensitive personal data.

The DPA is relevant to any historian ‘processing’ – that is, obtaining, recording or holding – personal data on a living subject or a subject assumed to still be alive. In retaining personal data on a living subject, researchers become its ‘controller’, meaning that they are responsible for that data and have to ensure that they comply with the terms of the DPA. Failure to do so may result in investigation and/or sanction from the Information Commissioner’s Office (ICO) or legal action from the data-subject.

Historians are likely to process or obtain personal data in two ways:

  1. In an archival setting (any notes/photographs/photocopies you take/make may contain personal data); and
  2. In an oral history interview (your recording of the interview will almost always be classed as personal data, as will any consent-form that you ask the interviewee to sign; if you have a list of interviewees’ names/contact-details then this is personal data; written transcripts that are derived from the recording will contain personal data unless they are anonymised).

In my experience, the DPA affects the research-process at three stages – when negotiating access to the archive, when in the field and when in the later retention/dissemination of research-materials.


The DPA and Archival Access.

The DPA is built around eight data protection principles. I paraphrase them as follows:

  1. That data is processed fairly and lawfully;
  2. That data shall be processed/retained only for specific, lawful purposes;
  3. That data should be processed/retained in a way that is adequate for the data-controller’s purposes (i.e., not excessive);
  4. That all data held should, as far as possible, be accurate;
  5. That data should not be retained for longer than is necessary;
  6. That subjects have certain rights with respect to their data (e.g., a right to claim compensation if the data-guardian breaks the DPA);
  7. That data should be appropriately secured;
  8. That data should not be transferred outside of the European Economic Area unless the host country or jurisdiction ensures adequate protections.

None of these principles offer much succour to the historian interested in the recent past (e.g., Principle 5 would pose all sorts of issues for storing materials in archives long-term). But consider the following qualifications to the DPA.

Firstly, note that the Act only applies to living subjects: the DPA does not apply to subjects who have already died, so long as this can be proven. The Act does not stipulate what happens if it is not known whether the subject has died. But there is a provision (specifically, section 51(4)) that allows the Information Commissioner to liaise with various trade associations to establish best practice. This is precisely what the Commissioner did with various archival bodies over a decade ago, and together they agreed on using the 100-year rule when in doubt. Thus, if a subject would now be over 100 years old, then it is assumed that they are no longer living, and the DPA does not apply. (What happens when the subject’s age cannot be calculated is explained here.) If you are using personal data relating to a subject born before 1916, therefore, then the DPA will not impact on your research.
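If you keep your own database or spreadsheet of subjects, the 100-year rule of thumb is simple enough to apply mechanically. What follows is a minimal sketch, assuming that you know only each subject’s birth year. The function name `presumed_deceased` is my own, hypothetical invention. It works at year-level precision, so it mirrors the ‘born before 1916’ cut-off above rather than exact birthdays. It also returns `None` when the birth year is unknown, leaving that case to the archival guidance.

```python
from datetime import date
from typing import Optional


def presumed_deceased(birth_year: Optional[int],
                      reference_year: Optional[int] = None) -> Optional[bool]:
    """Hypothetical helper applying the 100-year rule at year-level precision.

    Returns True if the subject was born before the cut-off year
    (reference_year - 100) and so is presumed dead; False if they may still
    be living; None if the birth year is unknown.
    """
    if birth_year is None:
        # Age cannot be calculated: the rule cannot be applied directly.
        return None
    if reference_year is None:
        reference_year = date.today().year
    # e.g. checked in 2016, anyone born before 1916 is presumed deceased.
    return birth_year < reference_year - 100


print(presumed_deceased(1915, 2016))  # True
print(presumed_deceased(1950, 2016))  # False
print(presumed_deceased(None, 2016))  # None
```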

I have heard some contemporary historians bemoan how the DPA stymies access to twentieth-century records. Certainly, the Act does impose some limitations, but it is incorrect to assume that it denies all researchers access to personal data on living subjects. Note this – even if you are working with living subjects, the DPA need not apply in full. There is a second exemption from the Act, and it’s a fairly big one.

Section 33 offers a qualified exemption for the purposes of historical or statistical research, provided that two conditions are met. These are:

  1. That the data are not processed to support measures or decisions with respect to particular individuals; and
  2. That the data are not processed in such a way that substantial damage or substantial distress is, or is likely to be, caused to any data subject.

I would suggest that these conditions are easy for practising historians to meet, so long as care is taken in how research is disseminated (historians, furthermore, are unlikely to be taking decisions on living subjects). So long as both of these conditions are met, Section 33 grants exemption from certain aspects of the DPA, specifically:

(2) For the purposes of the second data protection principle, the further processing of personal data only for research purposes in compliance with the relevant conditions is not to be regarded as incompatible with the purposes for which they were obtained.

A very roundabout, obtuse way of saying that Principle 2 need not always apply to researchers: data can be re-used for purposes for which it was never originally intended. Note the circumspect use of language, however – this exemption does not completely invalidate Principle 2; it states only that re-processing data is ‘not to be regarded as incompatible’ with it.

(3) Personal data which are processed only for research purposes in compliance with the relevant conditions may, notwithstanding the fifth data protection principle, be kept indefinitely.

A much clearer point. You know how Principle 5 states that data must be held for no longer than necessary? Well, that doesn’t apply to historical research. Archives can keep personal data on living subjects for as long as they like; oral historians can do the same.

(4) Personal data which are processed only for research purposes are exempt from section 7 if —

(a) they are processed in compliance with the relevant conditions; and

(b) the results of the research or any resulting statistics are not made available in a form which identifies data subjects or any of them.

The ‘relevant conditions’ are those that I identified earlier (that data is not processed to make decisions about living subjects, and that data is not processed in such a way that causes substantial damage or distress). Researchers are exempt from section 7, which gives data-subjects the right to see the personal data held about them (a ‘subject access request’), provided that they (a) meet the ‘relevant conditions’ and (b) ensure that data-subjects cannot be identified from the published results of the research.

(5) For the purposes of subsections (2) to (4) personal data are not to be treated as processed otherwise than for research purposes merely because the data are disclosed —

(a) to any person, for research purposes only;

(b) to the data subject or a person acting on his behalf;

(c) at the request, or with the consent, of the data subject or a person acting on his behalf; or

(d) in circumstances in which the person making the disclosure has reasonable grounds for believing that the disclosure falls within paragraph (a), (b) or (c).

This sub-section qualifies the preceding points, and states that personal data does not cease to be processed ‘for research purposes’ simply because it has been disclosed for reasons a, b, c or d. In other words, passing data to another bona fide researcher, to the data-subject, or with the data-subject’s consent does not by itself forfeit the protection of Section 33.
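It can help to see how the conditions in subsections (2) to (4) nest. The sketch below encodes my reading of those subsections as quoted above. The class and function names (`ResearchProcessing`, `section33_exemptions`) are hypothetical, and each field reduces to a yes/no answer what is, in practice, a matter of judgement. It illustrates the structure of the exemption, and is not a legal test.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResearchProcessing:
    """Hypothetical yes/no description of how a researcher uses personal data."""
    research_purposes_only: bool
    supports_decisions_about_individuals: bool
    likely_substantial_damage_or_distress: bool
    results_identify_subjects: bool


def section33_exemptions(p: ResearchProcessing) -> List[str]:
    """List the Section 33 exemptions that would apply, on my reading."""
    # The two 'relevant conditions' must both be satisfied.
    relevant_conditions = (not p.supports_decisions_about_individuals
                           and not p.likely_substantial_damage_or_distress)
    if not (p.research_purposes_only and relevant_conditions):
        return []  # no Section 33 exemption at all
    exemptions = [
        "s33(2): re-use not incompatible with Principle 2",
        "s33(3): may be kept indefinitely despite Principle 5",
    ]
    # s33(4) additionally requires results that do not identify data-subjects.
    if not p.results_identify_subjects:
        exemptions.append("s33(4): exempt from section 7 (subject access)")
    return exemptions


published_with_names = ResearchProcessing(
    research_purposes_only=True,
    supports_decisions_about_individuals=False,
    likely_substantial_damage_or_distress=False,
    results_identify_subjects=True,
)
print(section33_exemptions(published_with_names))  # the s33(2) and s33(3) entries only
```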


Now, Section 33 does not guarantee access to particular collections of personal data. Discretion over granting access remains with the data-guardian (e.g., the archivist(s)), who must be satisfied that the data disclosed (i) will not be used to make decisions about a living subject and (ii) will not be used in a way that causes them substantial damage or distress.

In my experience, most archives are willing to grant access to personal data provided that the researcher accepts these undertakings. In other words, just because an archive holds personal data on living subjects does not mean that you are automatically barred from access. Section 33 allows data-holders to exercise their discretion, provided that you can convince them of your scholarly intentions.


The DPA in the Field.

Adherence to the DPA is an ongoing process, and research in situ – either in an archive or when conducting oral history interviews – raises certain challenges. I can think of two.

Firstly, be mindful of what anonymity actually requires. Principle 7, which relates to security, applies to any personal data you hold; data only falls outside the DPA altogether once it is genuinely impossible to identify the subject from it. Section 33(4), likewise, only exempts you from subject access requests if your results do not identify data-subjects. But anonymisation is perhaps easier said than done. For example, if you record an oral history interview, and the interviewee divulges identifying information (which they will), then the recording remains personal data and therefore needs appropriate safeguarding. (Indeed, I would suggest that anonymity is never fully possible to achieve in interviews anyway, as those with privileged insight, such as family members, will always be able to identify the data-subject if given access to a recording or transcript.)

It is a little easier to anonymise personal data in an archival setting, as you can choose not to record a subject’s identifying features in your written notes. But how many historians take written notes nowadays? I often take photographs for reasons of speed, without bothering to conceal personal data in my images (sometimes it’s just not possible, or there is too much data, etc.). These photographs, just like the oral history recording, therefore contain personal data that must be appropriately secured under Principle 7.

Furthermore, even if you can anonymise your notes, you might not actually want to. For instance, you may need to record personal data to construct an identifier-key (discussed here) if a collection is incompletely or imperfectly catalogued (oral historians will require an identifier-key anyway, to connect their recordings with the pseudonymous transcripts later derived from them).
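An identifier-key need not be elaborate. It can simply map each name to a pseudonym. Note that the key itself is personal data, so it must be stored and secured separately from the pseudonymous notes or transcripts. What follows is a minimal sketch, assuming a plain list of subject names. The function `build_identifier_key`, the `Subject-001` pseudonym format and the sample names are all hypothetical placeholders.

```python
from typing import Dict, Iterable


def build_identifier_key(names: Iterable[str]) -> Dict[str, str]:
    """Map each distinct name to a pseudonym, in order of first appearance."""
    key: Dict[str, str] = {}
    for name in names:
        if name not in key:
            # Sequential pseudonyms; repeated names reuse their existing entry.
            key[name] = f"Subject-{len(key) + 1:03d}"
    return key


key = build_identifier_key(["Jane Smith", "John Brown", "Jane Smith"])
print(key)  # {'Jane Smith': 'Subject-001', 'John Brown': 'Subject-002'}
```

Sequential numbering keeps the pseudonyms readable in footnotes. The trade-off is that it reflects the order in which you encountered subjects, which may matter if that order is itself revealing.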

Unless you are a scrupulous note-taker, and eschew the use of photography, I suspect that you will have recorded some personal data whilst working in an archive. This is not a problem under the DPA; it just means that you are now the controller of that data, and need to have a system in place to adequately secure it. Oral historians will record lots of personal data, and their need to secure it is just as pressing.

The DPA, Dissemination and Retention.

Some final issues of compliance arise once you have obtained (via interviews) or processed (via archives) personal data. You have obligations under the DPA in terms of how you disseminate this research and how you retain it.

With respect to dissemination, I would advise against putting any personal data into the public domain (via presentations, publications, exhibitions, etc.). There are exceptions countenanced by the DPA (e.g., an interviewee may consent to being named). But you are on much safer ground if you anonymise all of the personal data that you put into the public domain. This is for two reasons. Firstly, remember that Section 33 of the DPA grants historical researchers an exemption to process personal data, so long as, in so doing, they do not cause substantial damage or distress to the data-subject. Yet the Act leaves this threshold ill-defined, and many data-guardians therefore encourage a cautious approach to dissemination. Best practice is to anonymise everything you disseminate; it just works out safer this way.

There is a second reason why I encourage researchers to anonymise data they place into the public domain: Section 33(4) only exempts researchers from subject access requests (section 7, which gives effect to the rights under Principle 6) if their results do not identify data-subjects. Data-subjects who can identify themselves in your work may therefore request to see all the personal data that you hold on them. Now, this is unlikely to happen to oral historians (as I assume there would be some relationship of trust between interviewer and interviewee). But for archival research, there is a real risk that data-subjects could recognise themselves in your publications, exhibitions, etc., and they are then entitled to request what information you hold about them, even if you have simply taken that information from somewhere else. Be very, very mindful, therefore, of how you disseminate your research.

(The ICO’s website offers some helpful guidance on how to anonymise personal data to prevent it being used to identify data-subjects. Again, I would suggest that it is never possible to fully anonymise what you place into the public domain, as those with privileged knowledge will always be able to guess who data-subjects are. Nevertheless, you should aim to make it as difficult as possible to work out to whom any personal data you hold belongs.)
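One practical step is to swap names in a transcript or set of notes for the pseudonyms in your identifier-key before anything leaves your hands. The sketch below uses only Python’s standard `re` module. The function name `pseudonymise` and the sample names are hypothetical. It does simple whole-word substitution only: nicknames, places, dates and other indirect identifiers will slip straight through. This is exactly why such substitution, on its own, falls short of full anonymisation.

```python
import re
from typing import Dict


def pseudonymise(text: str, identifier_key: Dict[str, str]) -> str:
    """Replace each whole-word occurrence of a name with its pseudonym."""
    # Longest names first, so "Jane Smith" is replaced before a bare "Jane".
    for name in sorted(identifier_key, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        replacement = identifier_key[name]
        # A function replacement stops re interpreting backslashes in the pseudonym.
        text = re.sub(pattern, lambda _match: replacement, text)
    return text


sample_key = {"Jane Smith": "Subject-001", "John Brown": "Subject-002"}
print(pseudonymise("Jane Smith told John Brown about the hearing.", sample_key))
# Subject-001 told Subject-002 about the hearing.
```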

There are other challenges associated with data retention and security, too. Remember that section 33(3) of the DPA obviates the need to destroy personal data immediately upon the conclusion of your research. However, if you have not anonymised the personal data you hold, then you need to keep it secure, in keeping with Principle 7. (I suspect that you will not want, or be able, to anonymise all of the personal data you hold, as you may want to return to it later in your career or whatever.)

The main issue with long-term retention of personal data is security. Written notes or hard copies of personal data should be handled with common sense (i.e., not leaving your notes lying around for others to read). I know of one archive that requested notes be transported by the researcher in a locked briefcase, but this is fairly exceptional.

If the data is held electronically (oral history recordings or photographs/notes), then best practice is to encrypt or password-protect the individual files (e.g., .mp4 or .jpeg), or the media on which they are stored (e.g., your computer’s hard drive). This is the most effective way of limiting the fall-out if data is stolen, lost or accessed inappropriately. Also remember that you will need to encrypt any back-ups/copies you make of the personal data, and protect the data whenever it is transmitted (e.g., if you migrate your data to a new computer). The ICO has some useful advice on encryption (and it’s fairly easy to activate on either Windows or Mac). If you wish to make hard copies of any electronically-held personal data, then common sense applies.

Furthermore, you should limit access to the personal data you hold as far as possible, preferably just to yourself (or to one researcher if you work as part of a team). You can share personal data with other researchers, but they must handle it appropriately (they become responsible under the DPA, just as you are). I would strongly discourage this practice, however, and would personally refuse if another academic requested that I show them the personal data on which my research is based (they can go and visit the archive where I found it if they’re that bothered).

A final comment on cloud storage, whether commercial services like Dropbox or internal university networks. Note Principle 8. This need not apply if the data-subject consents to the transfer of their personal data. However, even if the subject refuses consent (or consent is not sought), you should still be alright. Although most cloud-storage providers are based in the US, the EU has recently struck a deal with American authorities to ensure that firms based there are compliant with legislation like the DPA. You should therefore not be in violation of Principle 8 if you decide to use a cloud-based service based in the US, at least for the foreseeable future. However, I would still be careful about what you upload to cloud-based services (pick a strong password, etc.), or someone may gain access to the personal data you hold. I would also recommend that you encrypt anything that you upload to a cloud platform. I’d do the same with any personal data that I store on an internal university network, and I’d think carefully before relying on one (you need to be confident that no-one else will have access to the files you upload).

Guide to Further Reading.

You can consult the DPA here (or as a PDF here).

Oral historians are spoilt for choice when it comes to issues of data protection and ethics, especially from the Oral History Society (e.g., see here).

The UK Data Archive also has some useful guidance on anonymisation and the DPA.

Many archives offer advice to researchers, too, on compliance with the DPA. In particular, see the advice from the National Archives of Scotland (here) and the Modern Records Centre (here). The National Archives have also produced a very detailed guide for archivists and other data-guardians, which I recommend that historians peruse as it’s very comprehensive.