Decades-old files. Incorrect customer information. Over-retained personally identifiable information (PII). Many organizations have toxic data spread throughout their systems without realizing the extent of the associated costs and risks.

Without solid, reliable and trustworthy internal data, organizations can't confidently act on their information. Improperly managed data also raises the stakes when a data breach occurs.

All of this means that organizations on the wrong side of the toxic data divide are paying a steep price and losing their competitive edge.

The Costs of Toxic Data – And How to Avoid Them - Infosecurity Magazine

Gaps in Data Integrity

The expenses that result from data-clogged systems and practices, including hosting fees and the burden of e-discovery review, have been well documented. Beyond the money organizations spend to maintain this useless data, it also hampers efforts to implement artificial intelligence effectively and to generate data-driven revenue opportunities.

According to a recent study by FT Longitude, the average organization lost $389,000 over a 12-month period due to data integrity flaws, amounting to $14bn in losses globally. Clean data practices pay off: 90% of the study's respondents reported revenue and profitability growth as a direct result of their organization's information management practices. Yet most organizations still have significant gaps in their ability to maintain these data capabilities.

A majority of organizations (60%) said they are concerned about their data integrity and their ability to source information, and nearly two-thirds of respondents said they cannot consistently generate value from their information management and AI-readiness strategies.

Part of the challenge in getting these issues under control is that toxic data can take many forms and exist in numerous places. For many organizations, it stems in part from inadequate or unenforced data deletion policies.

For example, if employees keep tens of thousands of emails dating back years, using their inboxes as a kind of virtual filing cabinet, the enterprise email system likely holds a large store of toxic data.

Additionally, when organizations don’t have a process for tracking final versions of documents and records, old and incorrect versions of those files may be mistaken for accurate, up-to-date ones.

Emerging data sources, such as instant messaging and chat tools, can clutter systems with terabytes of irrelevant and trivial data, all while the organization overlooks these tools as data sources that require enforceable retention policies.

Many organizations stumble onto the depth of this problem when they attempt to implement new technologies, launch AI-readiness programs, face legal issues or otherwise find themselves in the midst of data initiatives or challenges.

Sending Toxic Data to the Dump

Given these issues, how can organizations find this toxic data? And what do they do once they’ve uncovered it?

Conducting a data inventory is an important first step in identifying sources of toxic data, and AI tools can help here. Companies can also review their data retention policies to determine what may have slipped through the cracks and which data sources may never have been covered by those policies.
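At its simplest, a data inventory can start by walking a file share and recording what is there and how old it is. The Python sketch below is a minimal illustration under stated assumptions, not a method the article or FTI Technology describes: the names `inventory_files` and `InventoryEntry` are hypothetical, the retention period is a single caller-supplied number of days, and a file's last-modified time stands in for the age of the record it holds.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class InventoryEntry:
    """One row of a (hypothetical) file-level data inventory."""
    path: Path
    size_bytes: int
    age_days: float
    past_retention: bool


def inventory_files(root, retention_days, now=None):
    """Walk `root` and return one InventoryEntry per file found.

    Assumption: a file is flagged as past retention when its last-modified
    time is older than `retention_days`. Modification time is only a proxy;
    real retention schedules usually depend on the type of record.
    """
    now = time.time() if now is None else now
    cutoff_seconds = retention_days * 86400
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                stat = path.stat()
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
            age_seconds = now - stat.st_mtime
            entries.append(
                InventoryEntry(
                    path=path,
                    size_bytes=stat.st_size,
                    age_days=age_seconds / 86400,
                    past_retention=age_seconds > cutoff_seconds,
                )
            )
    return entries
```

For example, `inventory_files("/mnt/shared", retention_days=7 * 365)` (a hypothetical share and a hypothetical seven-year policy) would return every file with its size and age, and filtering on `past_retention` gives a first list of candidates for legal and information governance review rather than automatic deletion.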

Discovery exercises during or following a dispute or investigation offer further opportunities to find and address data that is outdated or otherwise should no longer be part of the current data environment. Due diligence for a merger or acquisition can also turn up sources of over-retained information.

Once a data inventory is complete, organizations can turn rigorous attention to updating and enforcing their retention and data entry processes. The old adage "garbage in, garbage out" remains as relevant as ever. Some organizations may find it necessary to start by creating a clean environment where only current, clean data is stored.

In one client engagement, FTI Technology's team advised the company to do exactly this after the company uncovered a data warehouse full of outdated, poor-quality information rife with customer PII.

While sorting through that information, the company created a separate data warehouse to store all newly created data going forward, isolating the good data while it worked through the bad.

When creating a sound data environment, companies should involve multiple stakeholders, including the legal department, chief technology officer, IT, information governance, compliance and other business units that interact with sensitive data as part of their regular business activities.

Since data crosses departments, it’s critical to bring in a variety of viewpoints to allow for thoughtful planning, buy-in and prioritization of how the healthy data should be used.

These different perspectives also help to strike the right balance between the risks and benefits of different data management approaches.

Conclusion

For many companies, toxic data is an issue that gets pushed off to be dealt with later. However, when it lurks beneath the surface, it can quickly become a significant source of risk or hinder AI strategy and revenue generation.

By understanding the scope of the problem and taking a proactive approach to solving it, companies can turn their toxic information into healthy, sound data that can drive profitability and growth.