De-identification of Personal Information

Subsection 7(9) of the ARA requires PSOs to de-identify the collected personal information as required by the Standards. This will require the PSO to have or engage appropriate expertise in de-identification.

Standard 33. De-Identification for Public Release of Data

Before releasing any data, PSOs must de-identify personal information.

When de-identifying data for public release, PSOs must seek to preserve as much utility in the data as possible, particularly with regard to Indigenous identity and race information, while protecting the privacy of individuals. PSOs should apply de-identification processes in such a way that the Indigenous identity and race categories are kept intact as much as possible.

Rationale

De-identification protects the privacy of individuals. Once de-identified, information or data no longer contains information about, or that can be attributed to, identifiable individuals.

Data about Indigenous identity and race should be maintained as intact as possible to support the public’s access to this information and promote public transparency and accountability.

Guidance

“De-identification” is the process of removing or transforming personal information in a record or data set so that there is no reasonable expectation in the circumstances that the information could be used, either alone or with other information, to identify an individual. PSOs should take into account other available information and data sets that might be used with the de-identified data to re-identify individuals.

Before publicly disclosing data, PSOs should take steps to reduce the risk of re-identification to a level appropriate for public release. The IPC’s “De-identification Guidelines for Structured Data” (June 2016) sets out a nine-step process for de-identification, including information on how to assess re-identification risks and risks to groups of individuals. Model practices in the de-identification process involve the following considerations:

  1. Analyze the data, user needs, and data environment to understand the data set and the context for release, including legal obligations.
  2. Assess re-identification risks: Re-identification is any process that re-establishes the link between data and an individual. Re-identification risk analysis can be complex and results will differ for each data release.
  3. De-identify data to minimize risk and maximize utility: Remove, mask, or transform variables so that identifiable information is removed to the extent necessary to reasonably protect individual privacy while providing useful data.
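
As an illustration of step 2, one common way to gauge re-identification risk is to count how many records share each combination of indirectly identifying variables (often called quasi-identifiers). A combination shared by very few records is easier to link back to an individual. The sketch below is a minimal example under stated assumptions: the field names (age_band, region, race), the sample records, and the helper names are hypothetical, and the Standards do not prescribe this particular measure.

```python
from collections import Counter

def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def smallest_class_size(records, quasi_identifiers):
    """Return the size of the rarest combination; smaller means higher risk."""
    return min(equivalence_class_sizes(records, quasi_identifiers).values())

# Hypothetical records; real data sets would have many more rows and variables.
records = [
    {"age_band": "20-29", "region": "East", "race": "Group A"},
    {"age_band": "20-29", "region": "East", "race": "Group A"},
    {"age_band": "20-29", "region": "East", "race": "Group B"},
    {"age_band": "30-39", "region": "West", "race": "Group B"},
]

sizes = equivalence_class_sizes(records, ["age_band", "region"])
for combination, size in sizes.items():
    print(combination, size)
```

Here the combination ("30-39", "West") is held by a single record, which flags it for further treatment (for example, widening the age band or region) before the Indigenous identity and race variables themselves are touched.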

PSOs should apply de-identification processes so that Indigenous identity and race variables remain intact to the extent possible. To protect individual privacy, PSOs should first apply de-identification techniques (see Appendix C) as appropriate to the other variables in the data set. In some cases, it may still be necessary to suppress or modify categories of Indigenous identity and race in order to protect individual privacy, for example where the number of individuals included within a category is small. Nevertheless, maintaining the Indigenous identity and race variables in the de-identified data set in a format as close as possible to the original helps to support public transparency and accountability with respect to race-based analysis.

Considering community interests before releasing potentially sensitive information

In addition to individual privacy, risks to groups of individuals should be considered. The de-identification process could also include considerations of community interests, such as the need to prevent the release of potentially sensitive information that could be linked to specific communities (e.g. First Nations communities or specific neighbourhoods).

Although de-identification techniques protect against the disclosure of individuals’ identities, they do not specifically protect against the disclosure of potentially stigmatizing attributes relating to groups of individuals. Preventing this may require additional measures such as the removal of geographic information at the census subdivision level or below. For example, census subdivisions can be used to identify First Nations communities and census tracts can be used to identify specific communities in defined neighbourhoods within cities.

Any decision to release or withhold potentially sensitive information that could be linked to a specific community should be made in consultation with the affected community. The de-identification process should also be carried out in consultation with the PSO’s legal counsel, privacy officer or FIPPA/MFIPPA coordinator.

Managing potential impacts of public release

The release of de-identified data and analyses should be conducted in accordance with established governance and management policies and procedures set out in Standard 2. PSOs should develop and implement a plan to reduce and manage potential re-identification risks and mitigate potential negative impacts on communities:

  • Maintain a record of all data released, including descriptions of release model, data types, and properties, as well as processes used for de-identification;
  • Perform regular and ongoing re-identification risk assessment of released data by examining it against disclosures of new or overlapping data sets; and
  • Identify and communicate with stakeholders, communities, and partners that could be negatively impacted by re-identification and have a plan to mitigate those impacts, including through community outreach, employee training, etc.

Standard 34. De-Identification of Results of Analyses

PSOs must de-identify the results of analyses prior to public reporting.

Rationale

The results of analyses may present re-identification risks depending on the types of data, analyses, and tables to be released. Re-identification risks also depend on the specific circumstances, such as other analyses and data that have already been released and sample sizes (including cell sizes).

Guidance

To minimize re-identification risks when reporting results, the PSO should ensure that it has appropriate expertise in de-identification of analyses and consider:

  • Restricting tables to two or three dimensions (i.e. two-way or three-way tables);
  • Suppressing results based on small cell sizes;
  • Exercising caution when using and reporting results based on small samples; and
  • Taking into account other results and data sets that are publicly available or have been previously released.
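
Small-cell suppression, as listed above, can be sketched as follows. This is a minimal illustration, not a prescribed method: the threshold of 5, the group and outcome labels, and the function name suppress_small_cells are all assumptions chosen for the example, and each PSO would set its own threshold based on its risk assessment.

```python
def suppress_small_cells(table, threshold=5):
    """Replace any count below the threshold with a marker such as "<5".

    `table` maps a cell (for example, a (group, outcome) pair from a
    two-way table) to its count. The threshold is an assumed value.
    """
    marker = f"<{threshold}"
    return {
        cell: (count if count >= threshold else marker)
        for cell, count in table.items()
    }

# Hypothetical two-way table of counts by group and outcome.
counts = {
    ("Group A", "Outcome 1"): 42,
    ("Group A", "Outcome 2"): 3,
    ("Group B", "Outcome 1"): 120,
    ("Group B", "Outcome 2"): 17,
}

published = suppress_small_cells(counts)
for cell, value in published.items():
    print(cell, value)
```

Suppressing a single cell is not always enough: if row or column totals are also published, the hidden value may be recoverable by subtraction, so additional cells or totals may need to be suppressed or rounded as well.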

If other similar analyses are publicly available, PSOs should assess how they may be used to re-identify individuals and take appropriate precautions to address the risk of residual disclosure.

Organizations should consult with their legal counsel, privacy specialists and practitioners within their organizations when preparing to release de-identified analyses to ensure that personal information is not inadvertently published or otherwise disclosed to the public.

Public Release of Data

Standard 35. Open Data

PSOs must publish de-identified data that they collected and used in reported analyses in a manner that is:

  • Open by default (unless there are compelling privacy, security or legal reasons not to do so);
  • Available in original, unmodified form, to the fullest extent possible;
  • Timely, accurate, and in machine-readable format; and
  • Accessible, permanently available (except where published in error), and offered at no charge to the user.

Data sets must be released on or before the day that the PSO’s public report is released.

The data set must be publicly released on the PSO’s website or the Ontario Data Catalogue (where applicable), together with metadata containing the relevant key words: Anti-Racism Act, Indigenous identity, race, and where relevant, religion, and/or ethnic origin. Metadata must not include any personal information.

Rationale

Open data helps to ensure transparency and public accountability in identifying and monitoring systemic racism and racial disparities in Ontario’s PSOs. It also supports evidence-based public dialogue and debate.

Guidance

Open data is published proactively in free, accessible, and machine-readable formats. Its use by the public as well as within PSOs is encouraged.

Open by default means that data should be open and available unless there are compelling privacy, security, or legal reasons not to do so. Where open by default is not possible for such reasons, the data should not be publicly released.

Metadata is information that describes the characteristics of data. It can be used to help organize, communicate, and exchange information about data. PSOs should take care that metadata does not contain personal information, such as IP addresses, names, or other information that can be used to identify a specific individual.

Where possible, PSOs should undertake the public disclosure of data in consultation with the organization’s Open Government staff, privacy officer or FIPPA/MFIPPA coordinator, legal counsel, and parties to data sharing agreements, as applicable.

Organizations should consider the risk of residual disclosure, which can occur when confidential data can be inferred from what is released, or where the information could be used to re-identify individuals. It can also occur by cross-referencing the information released with other accessible information, including previous releases.

PSOs subject to Ontario’s Open Data Directive must comply with those rules and submit data sets to the Ontario Data Catalogue (see Open Data Guidebook for more information). Other PSOs should determine whether they are required to follow any Open Government policies, practices and standards.

Organizations should use an open licence, and consider including terms of agreement specifying that the data set is not to be used in a manner that contravenes the ARA, the Code, or any relevant privacy legislation. The Open Government Licence is an example of an open licence that PSOs can use.

A number of steps are necessary before data can be converted into open and machine-readable format. They include identifying and prioritizing data for release, assessing data quality, reviewing data for accuracy and for legal, confidentiality, privacy, and security implications, making data accessible and compliant with any French language requirements, and ensuring that specific technical requirements are met (see Sharing Government Data for more information).

Organizations are encouraged to contact potentially affected communities regarding data sets that may include sensitive information about their communities. Data sharing agreements, if in place, may guide the use and type of release model for public release of data. Under such circumstances, organizations should consider releasing data under different release models. The release model chosen is based on an assessment of the following:

  • The purpose and context of the release;
  • The sensitivity of the data and re-identification risks;
  • Legislative or other requirements to release data; and
  • The public interest in access to data.

Public Reporting of Results of Analyses

Standard 36. Public Reporting of Results

On a regular and timely basis, PSOs must develop and make publicly available on their websites a report that includes:

  1. Results of analyses:
    • Descriptive statistics of all variables used in the analyses;
    • Description of benchmarks and/or reference groups; and
    • The racial disproportionality and/or disparity indices;
  2. Thresholds set to identify notable differences, including the rationale for them; and
  3. Information about collection method and data quality (accuracy, validity, completeness of data collected).

Rationale

Reporting the results of analyses demonstrates transparency and accountability to the public.

Guidance

The report should include a description of methods used to collect the data and relevant information about the population in the data set, including sample sizes, the period over which the data was collected, and any significant limitations in the data.

Descriptive statistics should include information about the data, wherever relevant:

  • Frequency (the number of times an observation occurs);
  • Mean (arithmetic, such as averages, or geometric means, such as rate of growth);
  • Median (the value at which half the observations are below and half are above);
  • Range (the minimum and maximum values); and
  • Standard deviation (a value that indicates the amount of variation in the data).
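
The descriptive statistics listed above can be computed with Python's standard library, as in the minimal sketch below. The sample values and the helper names describe and average_growth_factor are hypothetical. Note that statistics.stdev returns the sample standard deviation; statistics.pstdev would be the choice if the data cover the entire population.

```python
import statistics
from collections import Counter

def describe(values):
    """Return the descriptive statistics named in the guidance for one variable."""
    return {
        "frequency": Counter(values),                    # times each value occurs
        "mean": statistics.mean(values),                 # arithmetic mean
        "median": statistics.median(values),             # middle value
        "range": (min(values), max(values)),             # minimum and maximum
        "standard_deviation": statistics.stdev(values),  # sample standard deviation
    }

def average_growth_factor(growth_factors):
    """Geometric mean, suited to averaging rates of growth (Python 3.8+)."""
    return statistics.geometric_mean(growth_factors)

# Hypothetical observations for a single numeric variable.
summary = describe([2, 4, 4, 5, 10])
for name, value in summary.items():
    print(name, value)
```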

Information about the accuracy of the data and statistics helps the public understand any limitations in the results of analysis and the appropriate level of confidence to place in the findings. Accuracy means the degree to which the data and results correctly describe the phenomena they were designed to measure. This is usually evaluated by identifying the potential sources of error. For example, the report should address the common sources of error that may be present:

  • Measurement error: Does the data collected reflect the “true” value of the measurement as it was designed to do (is the data valid)?
  • Coverage error: Is the population measured adequately covered in the data (to what extent are persons excluded or double-counted)?
  • Non-response error: What is the rate of non-response, and does the population with non-response differ from the responding population in some relevant aspect (is there bias in the responses)?
  • Sampling error: Does the sample represent the underlying population in relevant ways?

In addition to publishing disparity and/or disproportionality indices, organizations may also report on the results of any other analyses, such as intersectional and multivariate analyses.

Organizations may also include findings from other sources of information to help provide context and additional perspectives to better understand the results.

Reporting on interpretations of results

Where possible, reports should include interpretations of results focussing on any potential systemic factors. They should be based on evidence, and informed by community and stakeholder engagement.

Evidence used to inform the interpretation of results may include qualitative information, such as historical accounts, descriptions of processes and practices, a systematic review of documents, focus groups, interviews, literature reviews, etc.

When reporting findings, organizations should provide sufficient context to avoid stigmatizing groups (e.g. highlight underlying social and historical disadvantages and marginalization faced by communities with poorer outcomes). Wherever possible, context and narrative should be informed by input from affected communities, stakeholders, partners, and subject matter experts.

Organizations should also be sensitive to histories of mistrust among marginalized communities about how government and PSOs have used data. Care should be taken to communicate clearly about the purpose, uses, and disclosure of the information collected, to respond to inquiries from the public and to engage with the communities affected.

Organizations should anticipate, manage, and mitigate any potential unintended negative impacts on affected communities, stakeholders, and partners. This may include preparing and training employees, communications planning, and outreach to potentially affected parties as soon as possible.

Notifying the Minister Responsible for Anti-Racism

Standard 37. Notify the Minister Responsible for Anti-Racism

On the date of public release of open data and/or reporting of analyses, or within a reasonable time shortly thereafter, PSOs must provide the Minister Responsible for Anti-Racism with notice.

This notice must include:

  • The name of the PSO and a brief description of the program, service, or function;
  • Metadata, including date published, location posted (URL); and
  • The PSO’s contact information.

Rationale

Notice of public releases to the Minister Responsible for Anti-Racism supports transparency.

Guidance

PSOs should provide the notice to the Anti-Racism Directorate. The notice should include the name, title, division or branch, and contact information (telephone number and email) of an employee who can answer questions about the open data or public report.

Metadata includes information describing how, when and by whom a particular data set was collected and how the data set was formatted. Metadata provided in the notice should include the following, where relevant:

  • Coverage: The time period (date range) to which the data set or report applies, and the geographic area or jurisdiction to which the data or report relates;
  • Date Created/Modified: The date on which the data set or report was finalized or modified for public release;
  • Date Released: The date on which the data set or report was released;
  • Update Frequency: The frequency with which the data or report is to be updated;
  • Key words: Terms to describe the major themes covered by the data set or report, including types of information and sector;
  • Format: The media type or dimensions of the resource, such as comma-separated values (CSV), PDF, etc.; and
  • Identifier: A reference to the data set or report using formal identification systems such as the Uniform Resource Locator (URL), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).
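
The metadata elements listed above could be assembled into a machine-readable record along the following lines. This is only a sketch: the Standards do not prescribe a file format or field names, so the keys, the JSON serialization, the helper missing_fields, and every value shown (dates, coverage, URL) are hypothetical placeholders.

```python
import json

# Hypothetical field names mirroring the metadata elements in the guidance.
REQUIRED_FIELDS = [
    "coverage",
    "date_created_modified",
    "date_released",
    "update_frequency",
    "key_words",
    "format",
    "identifier",
]

# Placeholder values only; none of these describe a real data set.
notice_metadata = {
    "coverage": {"period": "2020-01-01 to 2020-12-31", "jurisdiction": "Ontario"},
    "date_created_modified": "2021-03-01",
    "date_released": "2021-03-15",
    "update_frequency": "Annual",
    "key_words": ["Anti-Racism Act", "Indigenous identity", "race"],
    "format": "CSV",
    "identifier": "https://example.org/data/example-data-set",
}

def missing_fields(metadata, required=REQUIRED_FIELDS):
    """List required metadata elements that are absent or empty."""
    return [field for field in required if not metadata.get(field)]

print(json.dumps(notice_metadata, indent=2))
print("Missing:", missing_fields(notice_metadata))
```

A check like missing_fields can be run before sending the notice, but it does not replace a review confirming that the metadata contains no personal information.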