… a case dealt with by the Information Commissioner in which a pupil was away from home at boarding school. Her parents received a letter from the local hospital informing them that their daughter had been involved in a road accident. In fact, there had been no accident, but the hospital had been using live patient data to test a system for sending out letters to patients.
(see “System Testing with Live Data May Breach Data Protection Act — Pinsentmasons.com”)
Production data provides a more accurate representation of real-world scenarios than synthetic data. This allows testers to identify bugs and edge cases that might not be apparent in simulations.
In tests, known use cases are examples of scenarios that give accuracy during testing.
By using production data, testers can create a testing environment that closely resembles the actual production environment. This helps ensure that the system behaves as expected under realistic conditions.
In tests, the upgraded software is tested against production data, giving assurance that the upgrade will succeed in production.
Using production data for testing can be more cost-effective than creating artificial data. It eliminates the need to generate large amounts of data manually, which can be time-consuming and expensive.
In tests, using production data is straightforward and inexpensive.
Production data can help speed up the testing process, and reduce data friction, by providing a pre-existing dataset that testers can use immediately. This can reduce the time and effort required to set up a testing environment.
In tests, testing is accelerated by using production data.
Production data can provide valuable insights into how users interact with the system in the real world. This information can be used to improve the user experience and identify areas for optimisation.
In tests, using real data gives a user experience closest to the one expected in production.
Using production data for testing can expose sensitive user information, such as PII.
This is the case in system testing, with risk of exposure mitigated by conducting tests within a prescribed time window, after which the database clone is removed.
Production data is more valuable than simulated data, making it a potential cybercrime target. Using production data for testing can increase the risk of a security breach, data loss, and theft.
This risk is mitigated by the production data only being in test systems for a prescribed time-window.
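The time-window mitigation can be checked with a query along the following lines. This is a minimal sketch rather than the procedure actually in use: the clone name `TestClone` and the 7-day window are hypothetical, and it assumes the clone was created on the test server, so that `create_date` in `sys.databases` approximates when it arrived.

```sql
-- Hypothetical values: the clone name and the window length are assumptions.
DECLARE @CloneName sysname = N'TestClone';
DECLARE @WindowDays int = 7;

-- Return the clone only if it has outlived the agreed window.
SELECT name, create_date
FROM sys.databases
WHERE name = @CloneName
  AND create_date < DATEADD(DAY, -@WindowDays, GETDATE());

-- If a row is returned, the clone is overdue for removal, for example:
-- DROP DATABASE [TestClone];
```

The query only reports; the removal itself is left as a deliberate manual step so that a database is never dropped by accident.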
Production data may contain errors or inconsistencies. This can lead to false positives or false negatives in test results.
This is mitigated by error correction transformations used (e.g. checking emails) in everyday production.
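A simple email check of the kind mentioned above might look like the sketch below. The pattern and the sample rows are illustrative assumptions, not the transformation actually used in production; a `LIKE` pattern only catches obviously malformed addresses.

```sql
-- Hypothetical sample rows; in practice these would come from the source table.
WITH SampleEmails AS (
    SELECT v.EmailAddress
    FROM (VALUES
        ('jane.doe@example.org'),
        ('no-at-sign.example.org'),
        ('spaces in@example.org'),
        ('missing-domain@')
    ) AS v (EmailAddress)
)
SELECT
    EmailAddress,
    CASE
        -- At least one character before '@', and a dot with text either side after it.
        WHEN EmailAddress LIKE '_%@_%._%'
         AND EmailAddress NOT LIKE '% %'     -- no spaces
         AND EmailAddress NOT LIKE '%@%@%'   -- only one '@'
        THEN 'plausible'
        ELSE 'needs review'
    END AS EmailCheck
FROM SampleEmails;
```

Only the first sample row is marked `plausible`; the other three are flagged for review.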
Using production data for testing may be challenged on data protection grounds.
You may use production data to test your products if a Data Protection Impact Assessment finds usage to be compliant with data protection law. (see “Using Live Data for Testing Purposes - Security Guidance — Security-Guidance.service.justice.gov.uk”)
The most commonly used way of working with live data is through anonymisation.
Anonymisation: a process by which PII is irreversibly altered in such a way that a PII principal can no longer be identified directly or indirectly, either by the PII controller alone or in collaboration with any other party.
(see “Iso.org”)
A view for anonymising fields extracted from Mosaic into enterprise data:
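CREATE VIEW [rpt].[dimPerson] AS
-- Mask identifying fields for any person flagged as restricted.
SELECT
    [Title],
    CASE WHEN [Restricted] = 'Y' THEN '********'
         ELSE [Forename]
    END AS [Forename],
    CASE WHEN [Restricted] = 'Y' THEN '********'
         ELSE [Surname]
    END AS [Surname],
    -- Assumes [DateOfBirth] is a date type: it is converted to text so that
    -- both branches of the CASE return the same type as the mask.
    CASE WHEN [Restricted] = 'Y' THEN '********'
         ELSE CONVERT(varchar(10), [DateOfBirth], 23)
    END AS [DateOfBirth],
    CASE WHEN [Restricted] = 'Y' THEN '********'
         ELSE [EmailAddress]
    END AS [EmailAddress]
FROM [prs].[dimPerson];
GO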
5/10 rule: … 10 and rounds those that are 10 and above to the nearest 5
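The rounding half of this rule can be sketched in T-SQL as follows. The sample counts and column names are hypothetical. How counts below 10 are treated is not recoverable from the text above, so the sketch leaves them out rather than guessing.

```sql
-- Hypothetical sample counts; names and values are illustrative only.
WITH SampleCounts AS (
    SELECT v.GroupName, v.PersonCount
    FROM (VALUES ('A', 10), ('B', 12), ('C', 13), ('D', 27)) AS v (GroupName, PersonCount)
)
SELECT
    GroupName,
    PersonCount,
    -- Divide by 5.0 to force decimal arithmetic, round to a whole
    -- number of fives, then scale back up.
    CAST(ROUND(PersonCount / 5.0, 0) * 5 AS int) AS RoundedCount
FROM SampleCounts
WHERE PersonCount >= 10;  -- counts below 10 are out of scope for this sketch
```

With these sample rows the rounded counts are 10, 10, 15 and 25.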
“Support product teams managing the full data management lifecycle” (see GDS 2023)