Data centers contain 90% crap data

We need to talk about data. Crap data. We’re destroying our environment to create and store trillions of blurred images, half-baked videos, rip-off AI ‘songs’, rip-off AI animations, videos and images, emails with mega attachments, never-to-be-watched-again presentations, never-to-be-read-again reports, files and drawings from cancelled projects, drafts of drafts of drafts, out-of-date, inaccurate and plain wrong information, and gigabytes and gigabytes of poorly written, meandering content.

We destroy our environment by storing copies of copies of copies that we never intend to look at again. We destroy our environment by taking 1.9 trillion pictures every year. In the 2020s, more photos are taken each year than were taken in the entire 20th century. That is more than 200 pictures for every man, woman, and child alive. Every year. The Cloud now contains 12 trillion photos, and the number is growing. The vast majority will never be seen again. It’s mind-blowing, and it’s exactly what Big Tech wants.

I have spent nearly 30 years working with the largest organizations in the world, in over 40 countries, to help them better manage their data and content. Here’s what I’ve learned: 90% of commercial and government data is total crap. Period. It should never have been created. It should never have existed. Crap data production exploded with the rise of digital. Content management systems gave staff diesel-fueled diggers where before they only had data spoons. Around 2010, I was in conversation with a Microsoft manager who estimated that Microsoft.com had 14 million pages, of which four million had never been visited. Four million, I thought. That is the equivalent of the population of Ireland in pages that no one has ever visited. Why were they created? Think of all the wasted time, effort, energy and resources that went into creating those pages. We are destroying the environment to create and store garbage. Nobody cares.

It was the same story everywhere I went. Crap data everywhere. Distributed publishing allows anyone to publish whatever they want on the intranet. Nobody maintains anything. When IBM spun off Kyndryl, a world-leading provider of IT infrastructure, they discovered that their data was scattered across 100 different data warehouses. The same data was duplicated by multiple teams. After cleaning, 90% of the data was gone. There are 10 million stories like this.

Scottish Enterprise’s website had 753 pages, but just 47 of them received 80% of the visits. A large organization where I worked had 100 million visitors a year on its website. 5% of the pages received 80% of those visits. 100,000 of its pages had not been viewed in the previous 10 years. Jordan Tigani explains that a large percentage of the data that gets processed is less than a day old. By the time data is a week old, it’s probably 20 times less likely to be queried. After a month, the data mostly just sits there. Southampton University found that only 0.2% of its website’s pages received 90% of all visits. Only 4% of the pages were ever accessed. The other 96% of its four million pages have never been visited. One organization I knew had 1,500 terabytes of data. Less than 2% of it was ever accessed. There are at least 20 million other stories like this.

Most organizations don’t know what they have. It gets worse. Most organizations don’t know where their data is stored. It gets even worse. Most organizations do not even know how many computers they have. Management doesn’t care if 50% of the data in an organization is on a server. The average organization has hundreds of unapproved third-party apps that are paid for with managers’ credit cards. These apps store everything from project chats and draft reports to product prototypes.

The Cloud made the crap-data problem infinitely worse. The Cloud is what you get when it costs less to store data than to figure out what you should do with it. According to a study, the amount of data stored by UK firms in the engineering and construction industries increased from an average of three terabytes per firm in 2018 to 26 terabytes per firm in 2023. This is a compound annual increase of more than 50%! That sort of crap data explosion happened, and is happening, everywhere. Because it’s ‘cheap to store data’, nobody cares. AI is trained on this. We wonder why AI is so wrong? Crap data in. Crap data out. Nobody cares, especially at the senior management level. Senior management is bursting with Big Tech groupies chanting about how the latest tech miracle will magically transform their careers. Dealing with senior management has always been a very unpleasant part of my work. They are a bunch of narcissists and stupid egotists who only care about themselves.
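The growth figure for UK engineering and construction firms is easy to check. What follows is a minimal sketch, not the study's own method: it assumes the three and 26 terabyte averages are comparable year-end figures and that growth compounds once a year over the five years from 2018 to 2023. The function name and variable names are illustrative only.

```python
def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.5 means 50%)."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("start_value, end_value and years must all be positive")
    # Solve start * (1 + r) ** years == end for r.
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text: average terabytes stored per firm.
tb_2018 = 3.0
tb_2023 = 26.0
years_elapsed = 2023 - 2018  # assumption: five annual compounding periods

rate = compound_annual_growth_rate(tb_2018, tb_2023, years_elapsed)
print(f"Compound annual growth: {rate:.0%}")
```

Under those assumptions the rate comes out at about 54% a year, so a figure of 50% is, if anything, on the low side. At that pace the stored data more than doubles every two years.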
