Stale & cloned data hindering generative AI adoption

Tue, 19th Mar 2024


A recent survey reveals that while 8 in 10 business leaders view generative AI as a competitive opportunity, only 9% have implemented the technology extensively. A new study has identified what is holding these companies back from adopting AI effectively: large volumes of "cloned" and "stale" data. The investigation by Aiimi, a leading AI organisation, examined over 500 million company documents and files, uncovering some startling figures.

According to the analysis, 96% of the data housed by the companies studied hasn't been accessed in the last 90 days, which equates to 164,897,136 documents and files per company analysed. The study also found that 75% of the data hasn't been accessed in two years. This vast quantity of stale data could obstruct effective AI adoption, because AI models that draw on obsolete information or lack a full picture of the data may produce unreliable or irrelevant results.

Furthermore, there is a risk that sensitive content contained within this data could be exposed when processed by AI models. The longer the data has remained untouched, the more significant the issue becomes. Around 81% of the data stored by these companies hasn't been accessed in the last year, and three quarters of it has been untouched for the last two years.

The study suggests that smaller corporates are more likely to have data untouched for extended periods. An astonishing 99% of the small-scale corporate data examined hadn't been accessed in the last year, and 96% had not been touched in the last two years. Large-scale corporates also leaned towards leaving data untouched for longer periods, compared to medium-scale corporates.

In addition to stale data, "cloned" or duplicated data is complicating matters. The study found that 11% of the documents stored by large and medium corporates are copies. These duplicate copies of the same document could cause inaccuracies in the results generated by AI models. Moreover, storing this unnecessary data incurs sizeable costs for businesses, particularly when using cloud services.

Steve Salvin, Founder and CEO of Aiimi, said, "The issue of stale data and copies of documents is huge. Our findings show that the very largest and smallest of companies are particularly at risk." He explained that these enduring problems can result in legal issues, such as outdated company handbooks leading to regulatory violations, or accidental retention of documents that should have been deleted. This stale and copied data can also distort the results generated by AI.

Salvin added that addressing this issue should be the first order of business for companies seeking to make informed decisions and benefit from AI. What is needed is well-governed, high-quality data, free from duplicates and correctly stored and cited, which can be efficiently processed by AI.
