Article | by Claudia Webb
If your company does business within the European Union or collects data on European citizens, you’ve likely worked to comply with GDPR.
GDPR aims to minimise the potential for malicious theft of information.
Given the advances made in technology over the last 20+ years, the new regulation replaces the current Data Protection Act, which was adopted in 1998. With a lot of scaremongering around the subject, Elizabeth Denham, the UK’s information commissioner in charge of data protection enforcement, has reassured businesses that the new regulation is manageable for those already complying with existing data protection laws. Nevertheless, the evolved version of the current Data Protection Act comes with much higher non-compliance fines (up to €20 million or 4% of global annual turnover, whichever is higher) that are enough to make any business quake in their boots.
Statistics show that human error (damn homo sapiens!) is a leading cause of data breaches, with 24% of UK employees admitting to intentionally sharing confidential business information outside their organisation, and 50% receiving an email by mistake that leaked sensitive attachments (such as bank details or customer information).
So, with humans so often responsible for breaches in data privacy, businesses are starting to turn their attention to sophisticated technology and data-driven solutions to make the transition to GDPR more manageable. This raises the question:
“Could artificial intelligence (AI) be the answer?”
AI is a broad notion that refers to systems able to perform tasks that normally require human-like intelligence; machine learning, in which a system learns from data rather than following hand-written rules, is its best-known branch. As we know, AI capabilities are almost everywhere. Think about Alexa answering any question that pops into your head, or Facebook identifying the faces in your photos. It may still be in its infancy, but AI has the ability to change businesses forever. 72% of businesses have already said that AI will be a fundamental business advantage, alleviating repetitive tasks such as admin, scheduling, and timesheets.
And with regard to GDPR, AI will become indispensable in circumstances where fast detection, analysis and action are required to prevent or contain breaches. GDPR requires that personal data breaches are reported to the supervisory authority without undue delay and, where feasible, within 72 hours of the organisation becoming aware of them. It’s no longer enough to rely on a firewall to send suspicious behaviour alerts to an admin. AI can detect and act on an issue before admins come back from their tea break.
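To make the idea of automated detection concrete, here is a minimal sketch in Python. It is not a real intrusion detection system: it simply flags an observation that sits far above a historical baseline, then works out the latest time by which a report would be due. The traffic figures, the `is_anomalous` and `notification_deadline` names and the threshold of three standard deviations are all assumptions made for illustration; the 72-hour window reflects the GDPR breach notification target.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev

# GDPR breach notification target: without undue delay and, where feasible, within 72 hours.
NOTIFICATION_WINDOW = timedelta(hours=72)


def is_anomalous(baseline, observed, threshold=3.0):
    """Return True if `observed` is more than `threshold` standard
    deviations above the mean of `baseline` (a simple z-score test)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Flat baseline: anything above it counts as unusual.
        return observed > mu
    return (observed - mu) / sigma > threshold


def notification_deadline(detected_at):
    """Latest time by which the breach should be reported."""
    return detected_at + NOTIFICATION_WINDOW


# Hypothetical daily outbound data volumes in MB (illustrative numbers only).
baseline_mb = [120, 135, 110, 128, 140, 125, 131]

if is_anomalous(baseline_mb, 900):
    detected = datetime.now(timezone.utc)
    print("Possible data exfiltration, report by", notification_deadline(detected))
```

A production system would use far richer signals than a single volume figure, but the shape is the same: detect automatically, then start the reporting clock straight away.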
Although AI will be invaluable for businesses when it comes to the rapid reporting of data breaches, it also poses some challenges. One is the issue of complying with an individual’s right to explanation. The problem with many machine learning models is that they are something of a “black box”: it is hard to know in advance what answer they will come out with, or the exact reasoning behind it.
Let’s look at a simple example: A bank uses a machine learning system to determine whether an individual is creditworthy enough to receive a loan. Based on data from previous borrowers, the system learns how to predict new applicants’ prospects for a loan. Let’s say in this instance, someone is declined a loan. The reasoning behind this decision lies within a complex web of millions of data processing steps, which are difficult to trace back to a clear answer as to why the customer’s loan application was denied. This creates a particularly perplexing issue for the customer, who can’t fix the problem because they don’t know where it lies in the first place.
AI’s apparent unpredictability, deep-rooted in its complex mathematical foundations, causes problems when it comes to adhering to GDPR.
Unless companies processing an individual’s data fully understand the reasoning behind AI decision-making, it is difficult to honour the “right to explanation.” Being unable to explain a decision not only risks non-compliance, but also frustrates customers who are left confused by the process.
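One way to see what an explainable decision might look like is to swap the black box for a deliberately simple model whose reasoning can be read off directly. The sketch below is not how any particular bank scores applicants: the feature names, weights and bias are hypothetical values chosen for illustration. It uses a small logistic scoring function where each feature’s contribution to the result is visible, so a declined applicant can be told which factors counted against them.

```python
import math

# Hypothetical weights and bias, for illustration only (not learned from real data).
WEIGHTS = {"income_k": 0.04, "missed_payments": -0.9, "years_at_address": 0.15}
BIAS = -1.5


def score(applicant):
    """Return (approval probability, per-feature contributions)."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))
    return probability, contributions


def explain_decline(contributions, top_n=2):
    """List the features that pushed the score down, most harmful first."""
    negative = sorted((value, name) for name, value in contributions.items() if value < 0)
    return [name for _, name in negative[:top_n]]


# Hypothetical applicant: income in thousands, missed payments, years at address.
applicant = {"income_k": 30, "missed_payments": 3, "years_at_address": 1}
probability, contributions = score(applicant)
if probability < 0.5:
    print("Declined. Main factors:", explain_decline(contributions))
```

The trade-off is the one the article describes: a model this simple is easy to explain but may be less accurate than a complex one, which is why explainability for black-box models remains an open problem.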
GDPR also gives citizens the right to human judgement in the event of a contested result. Of course not all people will contest their results, but every decision that is contested has to go back to a person, which undercuts part of the case for automating the decision in the first place.
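In practice, this suggests keeping a route from every automated decision to a human reviewer. The following is a minimal sketch of that idea, assuming a hypothetical `Decision` record and an in-memory `ReviewQueue`; a real system would persist the queue and record an audit trail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Decision:
    applicant_id: str
    approved: bool
    contested: bool = False
    reviewed_by: Optional[str] = None  # set once a human has ruled on it


@dataclass
class ReviewQueue:
    pending: List[Decision] = field(default_factory=list)

    def contest(self, decision):
        """Mark an automated decision as contested and queue it for a human."""
        decision.contested = True
        self.pending.append(decision)

    def resolve(self, decision, reviewer, approved):
        """Record the human reviewer's outcome and clear the item from the queue."""
        decision.approved = approved
        decision.reviewed_by = reviewer
        self.pending.remove(decision)


queue = ReviewQueue()
automated = Decision(applicant_id="A-1001", approved=False)
queue.contest(automated)  # the customer disputes the result
queue.resolve(automated, reviewer="analyst-7", approved=True)
```

Uncontested decisions never enter the queue, so automation still handles the bulk of cases while people deal with the exceptions.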
When it comes to GDPR compliance, the capabilities and limited transparency of AI and other machine learning algorithms make them a double-edged sword. On the one hand, AI provides rapid detection of data intrusions and reduces human error. On the other hand, there are still issues around the right to explanation and the opacity of AI decisions.
AI processes need to become transparent for companies to become compliant under GDPR, an issue which machine learning scientists are already looking into so that AI is less of a “black box”. However, the more resources that are spent on uncloaking machine learning models, the fewer are left for making those models more effective, as Juraj Jánošík points out. The latter is arguably far more important when it comes to protecting data, particularly when opponents could be taking advantage by using more thoroughly understood machine learning technology.
Although AI has the potential to be a brilliant solution to data security issues, it only takes into consideration the data it is fed. This means that machine learning will not magically comply with GDPR unless it is explicitly designed to.
There are certainly some advantages to using AI, but companies should approach it with caution. Regulatory issues around data collection and use mean that companies will find themselves treading a fine line when it comes to privacy concerns, and they will need to be mindful of this when developing their data collection strategies.
Companies will need to ensure that there are efficient procedures in place for machine learning to take charge of data, and handle instances of malicious intent, so that they can safely and confidently put the responsibility of data privacy in the hands of a machine.