AI doesn't need perfect data: The misconception that costs companies years – End the migration myth

Konrad Wolfenstein

3 months ago


The fatal IT misconception: Why data warehouses alone are preventing the AI breakthrough

The end of endless preparation: How AI is finally delivering real added value

Artificial intelligence holds enormous potential, yet in business practice it often degenerates into an expensive illusion. The reason is as simple as it is fatal: companies unwittingly transform their ambitious AI initiatives into gigantic, resource-intensive data migration projects. The original goal of achieving fast and measurable business results becomes a protracted struggle for the perfect data infrastructure and seamless consolidation in central data warehouses. While billions are poured into preparation, two-thirds of companies remain stuck in the pilot phase – and the actual value creation falls by the wayside.

This article reveals why rigidly adhering to an "infrastructure-first" strategy regularly leads to failure and why a complete data migration isn't necessarily required for AI success. It outlines a much-needed paradigm shift: those who plan backward from concrete business results and rely on federated data access don't have to wait for the completion of years-long IT megaprojects. Learn how to keep data where it is, provide AI with only the specific context it needs, and achieve measurable success through targeted "quick wins" in a very short time. It's time to shift the focus away from pure data perfection and toward pragmatic AI value creation.

Related to this:

UNFRAME.AI: Data Migration Was Never the Point. AI Results Were.

Escaping the data trap: Thinking about AI from the perspective of the outcome

The biggest AI killer is data migration

AI projects usually fail not because of the technology itself, but because they degenerate into mere IT infrastructure projects. The consolidation of all data is mistakenly considered a mandatory requirement.

Thinking from the result (reverse engineering)

Instead of asking how to prepare all the data for AI, the essential question is: What specific data context does AI need in the here and now to deliver a concrete business result?

Context instead of copy (Federated Access)

AI doesn't need the entire data warehouse. Technologies like federated data access, data virtualization, and RAG (Retrieval-Augmented Generation) make it possible to keep data in its source systems and only assemble the context at the moment of querying. This saves immense time and costs.

Parallel operation instead of standstill

Long-term data migration (ETL processes for reporting, history, etc.) can continue. The AI initiative does not have to wait for it, however, and can access the existing, distributed data in parallel.

Agility beats perfectionism

Attempting to build a comprehensive data schema is inefficient. Domain-oriented, use-case-specific context models (similar to the data mesh approach) are significantly more promising.

The power of “quick wins”

To regain the often eroded trust of stakeholders, AI projects must quickly demonstrate a return on investment (ROI). An ideal initial use case (high frequency, measurable baseline, existing data) delivers tangible results within a few weeks, thus justifying further investment.

Why companies sink billions into infrastructure instead of finally delivering added value

Digital transformation in recent years has produced a paradoxical pattern that cuts across all industries. Companies are investing significant sums in artificial intelligence, yet in most cases, the actual value creation falls short of expectations. The reason rarely lies in the technology itself. It lies in the way organizations approach the path to AI. Instead of focusing on measurable business results, AI initiatives gradually transform into massive data infrastructure projects that develop a life of their own and lose sight of their original purpose. What began as a strategic initiative to leverage AI often ends as years of data migration without any visible return on investment.

According to Gartner's December 2025 forecast, global spending on artificial intelligence will reach approximately $1.8 trillion in 2025 and is expected to grow to $4.7 trillion by 2029. At the same time, the McKinsey Global Survey 2025 on the state of AI shows that 88 percent of the surveyed companies are already using AI in at least one business function, but nearly two-thirds are still in the experimental or pilot phase. Only about six percent of companies qualify as so-called AI high performers, where more than five percent of EBIT is attributable to AI. These figures illustrate a fundamental discrepancy between the money flowing into AI and the value ultimately generated. Analyzing this discrepancy reveals a structural problem that extends far beyond technical issues.

How the infrastructure project swallowed up the AI initiative

The chain of logic that leads companies into this situation seems plausible at first glance. AI needs data. The data is fragmented across numerous systems. So it needs to be consolidated. Consolidation requires migration. Migration requires transformation. Transformation requires governance. Governance requires data quality programs. Each individual decision in this chain is reasonable on its own. But taken together, they transform an AI initiative into a data infrastructure program that takes years before a single AI result becomes visible.

This phenomenon is strikingly evident in the data. According to Caylent's 2025 Data Migration Report, only six percent of surveyed companies reported completing their most complex migration projects on schedule. Nearly half of respondents experienced more than five hours of downtime during critical migrations, resulting in customer experience issues, revenue losses, and operational delays. An analysis of over 500 company reviews reveals that approximately 73 percent of data migration projects fail due to inadequate planning, governance gaps, and a lack of platform-specific expertise. Time overruns averaging 150 percent are not the exception, but the rule.

These migration projects develop a dynamic of their own. They attract dedicated teams, generate their own key performance indicators (KPIs), and gain their own sponsors at the board level, who stake their reputation on the project's completion. The original AI use cases are postponed to the next phase, then to the post-migration period, and finally, they quietly disappear from planning discussions. No one plans for this outcome. It arises from a thousand small decisions, each justifiable on its own, but which, taken together, result in a strategic misallocation of resources and attention.

A typical scenario illustrates the problem. The quarterly business review begins as it has for the past two years. The data transformation team presents its progress. The migration is 73 percent complete. Data quality metrics have improved across six domains. The data warehouse architecture has passed its latest audit. The executive sponsor nods approvingly at the milestone charts. Then someone asks the question everyone has been avoiding: When will the AI go live? Silence ensues. Someone mentions phase two. Someone else points to dependencies. The original timeline, which promised AI-powered insights within eighteen months, has become a footnote in a data infrastructure project that has taken on a life of its own.

The billion-dollar boondoggle of unfinished preparations

The economic dimension of this problem is significant. Gartner predicts that by the end of 2026, organizations without AI-ready data will see over 60 percent of their AI projects fail or be abandoned. The Harvard Business Review puts the overall failure rate for AI projects at 80 percent, nearly double the failure rate for IT projects that don't involve AI. According to a 2025 survey by S&P Global Market Intelligence, 42 percent of companies had abandoned the majority of their AI initiatives, a dramatic increase from just 17 percent the previous year. The average organization discarded 46 percent of its AI proofs of concept before they even reached production.

Gartner also predicts that at least 30 percent of generative AI projects will be abandoned after the proof of concept phase due to poor data quality, inadequate risk controls, escalating costs, or unclear business value. The Informatica CDO Insights Survey 2025 clearly identifies the biggest obstacles to AI success: data quality and maturity (43 percent), lack of technical maturity (also 43 percent), and a shortage of skilled personnel (35 percent).

These figures highlight a fundamental misunderstanding prevalent in many organizations. The problem isn't that AI use cases are failing. The problem is that migration has become the task itself, rather than the means to an end. Consolidating all data into a central data warehouse has become an end in itself, while the original business value fades into the background. Meanwhile, investment in AI-ready data is exploding. Gartner forecasts that the market for AI data will grow from $134 million in 2024 to $14.6 billion by 2029, representing a compound annual growth rate of 155 percent. The money is flowing, but it's going in the wrong direction if data provisioning is approached as a monolithic, preparatory project rather than an iterative process.
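The growth rate quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the forecast spans the five years from 2024 to 2029; the function name `cagr` is chosen here for illustration and does not come from Gartner.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (1.0 means 100 percent)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the text: $134 million (2024) to $14.6 billion (2029).
# Assumption: five compounding periods between the two years.
ai_data_cagr = cagr(134e6, 14.6e9, years=5)
print(f"Implied CAGR: {ai_data_cagr:.1%}")
```

Under that assumption the implied rate lands at roughly 155 percent per year, which is consistent with the figure cited.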

Think in terms of the result, rather than planning from the perspective of the infrastructure

The alternative approach begins with a fundamentally different question. Instead of asking how to prepare data for AI, one should ask what context AI needs to deliver a specific business outcome. This reversal of perspective changes the entire project architecture.

Most AI use cases require context from three to five systems, not a fully migrated data portfolio. Context requirements are specific. An AI for contract analysis needs contracts, amendments, parties, and obligations. It doesn't need the entire data warehouse. An AI for customer service needs interaction histories, product data, and case management records. It doesn't need every table in every source system.

The minimum required data path is almost always narrower than the scope of the migration project. Migration is optimized for every conceivable future query. AI needs the right context for specific use cases in the here and now. These two requirements are fundamentally different, and treating them as equivalent is precisely the mechanism by which infrastructure projects devour AI initiatives.

Working backward from the AI result, one often finds that the necessary data is already accessible. It doesn't need to be moved. It needs to be connected, organized for the use case, and made available at runtime. Effective AI data management begins with this realization: first define the result, then find the simplest path to the context that enables that result.

From data perfectionism to AI pragmatism: The cognitive bias that's blocking your ROI

Federated data access as an architectural alternative model

AI without data migration is not a shortcut. It's a different architecture that reflects how AI actually works in production environments. Three fundamental principles characterize this approach.

First, federated access connects AI to the source systems where the data resides without requiring prior centralization. CRM data remains in the CRM. Documents remain in the document repository. Operational data remains in the ERP. The AI layer can access all of this without waiting for synchronization. Federated data access keeps data in its original location, leverages virtualization techniques to provide a unified view, and enables real-time insights on demand. Unlike data warehousing, where data is physically moved to a central location, federated access eliminates the risks and costs associated with data duplication and improves operational efficiency.

Second, use-case-specific context models define what each AI application specifically needs. Instead of building a universal schema that attempts to cover everything, the system defines the specific entities, relationships, and signals relevant to each individual use case. This principle aligns with the concept of data mesh architecture, where domain-oriented teams independently manage their respective data and maintain tailored governance standards that reflect specific business requirements.

Third, runtime assembly builds the context at the moment of decision rather than in advance through batch pipelines. When the AI needs to answer a question, it compiles the relevant context from all sources, wherever that context resides. No synchronization delay. No outdated snapshots. Up-to-date data, assembled on demand. This principle has matured technologically with the spread of Retrieval-Augmented Generation (RAG). RAG architectures enable AI systems to retrieve relevant external information at the moment of querying and embed it into the context, instead of relying solely on pre-trained knowledge. By mid-2026, over 66 percent of enterprise generative AI implementations are expected to use RAG architectures.
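The three principles can be illustrated with a small sketch. Everything in it is hypothetical: the in-memory dictionaries `CRM`, `DOCUMENTS`, and `ERP` stand in for live source systems, and `ContextModel` and `assemble_context` are names invented for this example, not part of any product mentioned in the article.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for source systems. In practice these would be
# API or query calls against the live CRM, document repository, and ERP.
CRM = {"C-1001": {"name": "Acme GmbH", "tier": "gold"}}
DOCUMENTS = {"C-1001": ["Master agreement 2023", "Amendment 1"]}
ERP = {"C-1001": {"open_invoices": 2}}

Fetcher = Callable[[str], object]

# Federated access: each source is read in place through its own fetcher.
SOURCES: dict[str, Fetcher] = {
    "crm": lambda cid: CRM.get(cid),
    "documents": lambda cid: DOCUMENTS.get(cid, []),
    "erp": lambda cid: ERP.get(cid),
}


@dataclass(frozen=True)
class ContextModel:
    """Use-case-specific list of the sources an AI application needs."""
    use_case: str
    required_sources: tuple[str, ...]


def assemble_context(model: ContextModel, customer_id: str) -> dict[str, object]:
    """Fetch only the required sources at query time; nothing is copied ahead."""
    return {name: SOURCES[name](customer_id) for name in model.required_sources}


# Runtime assembly for one use case: the ERP is never touched.
contract_analysis = ContextModel("contract_analysis", ("crm", "documents"))
context = assemble_context(contract_analysis, "C-1001")
print(context)
```

The design point is that the context model, not a universal schema, decides which systems are queried, so adding a use case means adding a model rather than extending a migration.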

The practical implementation of this architecture is evident in real-world enterprise environments. SAP's Federated Machine Learning Library, for example, leverages SAP Datasphere's data federation architecture to intelligently expose SAP and non-SAP data for machine learning without requiring replication or data movement. Companies like Downer, one of Australia's largest integrated service providers, have implemented a federated data and AI platform that combines decentralized agility with centralized governance, enabling business units to innovate independently while seamlessly and securely sharing enterprise data.

Data virtualization and batch processing compared

The choice between federated access through data virtualization and traditional ETL-based consolidation is not binary; it depends on the requirements of the respective workload. Data virtualization delivers faster response times when querying smaller, distributed datasets. However, with increasing data volumes and complex transformation requirements, ETL can be more efficient due to its ability to process large datasets using predefined transformation rules.

The fundamental trade-off is that data virtualization exchanges physical consolidation for logical integration. You gain fresher data, as queries access the source systems directly, and you avoid the cost and complexity of copying all data into a single warehouse. At the same time, you become dependent on the availability and performance of each underlying system. For heavy analytical queries in the petabyte range, warehouses with pre-computed aggregates and columnar storage outperform federated queries across networks by a factor of ten or more.

The smart solution is to use both approaches in a complementary way. ETL handles the processing of structured, historical data for reporting and ensures consistency. Data virtualization enables agile access to live or distributed data for time-critical queries. When integrating a new data source, modifying ETL workflows can take days or weeks. Data virtualization allows for the immediate integration of temporary or experimental data sources. This hybrid approach optimizes performance, cost, and flexibility equally.
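The hybrid approach amounts to a routing decision per workload. The sketch below is an illustrative rule of thumb only: the `Workload` fields and the `FEDERATED_MAX_GB` cut-off are assumptions made for this example, and a real threshold would depend on the source systems and the network between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    scanned_gb: float            # approximate data volume touched per query
    heavy_transformations: bool  # complex, predefined transformation rules


# Hypothetical cut-off, not a benchmark result.
FEDERATED_MAX_GB = 50.0


def choose_access_path(w: Workload) -> str:
    """Return 'warehouse' (ETL) or 'federated' (virtualization) for a workload."""
    if w.heavy_transformations or w.scanned_gb > FEDERATED_MAX_GB:
        return "warehouse"
    return "federated"


workloads = [
    Workload("regulatory_reporting", scanned_gb=4000.0, heavy_transformations=True),
    Workload("service_chat_lookup", scanned_gb=0.2, heavy_transformations=False),
]
routes = {w.name: choose_access_path(w) for w in workloads}
print(routes)
```

Large historical reporting stays on the warehouse path, while small, time-critical lookups go directly to the source systems.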

The shortest path to measurable AI results

The economic logic behind the results-oriented approach is compelling. The average AI project duration follows a familiar pattern: three months of planning, six months of development, six months of testing, and three months of deployment, totaling eighteen months until ROI. According to Gartner, on average only 48 percent of AI projects make it to production, and the path from AI prototype to production takes eight months. Other estimates are lower still, putting the share of AI projects that reach production readiness at only 35 percent.

But there is another way. According to an IDC study, 92 percent of successful AI implementations deliver a positive return on investment within twelve months. 40 percent of companies report a positive return within six months. The key lies in choosing the right initial use case and avoiding overly ambitious infrastructure preparations.

The framework for rapid AI return on investment rests on four criteria for the ideal first use case. It is high-frequency: the task is performed daily or weekly. It has a clear baseline, so current performance can be measured. The required data already exists. And it has limited dependencies on other systems. If these criteria are met, measurable results can be achieved within a few weeks.
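These criteria lend themselves to a simple screening checklist. The following sketch is one possible encoding; the `UseCase` fields, the example candidates, and the default limit of five dependent systems (borrowed from the "three to five systems" observation earlier in the article) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    name: str
    runs_per_week: int      # how often the task is performed
    has_baseline: bool      # current performance can be measured
    data_exists: bool       # required data is already available
    dependent_systems: int  # number of other systems involved


def quick_win_score(uc: UseCase, max_dependencies: int = 5) -> int:
    """Count how many of the four criteria a use case meets (0 to 4)."""
    return sum([
        uc.runs_per_week >= 1,                     # daily or weekly task
        uc.has_baseline,
        uc.data_exists,
        uc.dependent_systems <= max_dependencies,  # limited dependencies
    ])


# Hypothetical candidates.
candidates = [
    UseCase("annual_demand_forecast", 0, False, True, 9),
    UseCase("billing_chatbot", 500, True, True, 3),
]
ranked = sorted(candidates, key=quick_win_score, reverse=True)
print([(uc.name, quick_win_score(uc)) for uc in ranked])
```

A use case that meets all four criteria is a natural first pilot; anything scoring lower deserves a closer look at which criterion is missing.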

The impact of such quick wins extends far beyond the immediate financial return. A telecommunications provider implemented an AI chatbot for the five most frequent customer inquiries regarding billing. Within 60 days, the solution resolved 35 percent of inquiries without human intervention, reduced the average resolution time from 24 hours to 10 minutes, and improved customer satisfaction scores by 22 percent. A mid-sized manufacturer implemented AI-powered predictive maintenance on a critical production line. The 45-day pilot project delivered a 62 percent reduction in unplanned downtime, $157,000 in avoided production losses, and a 28 percent reduction in maintenance costs. Klarna's AI assistant resolved two-thirds of all customer chat inquiries in the first month and reduced the average resolution time from eleven minutes to under two minutes.

Why stakeholder trust is the hardest currency

These quick wins serve a function that goes beyond mere cost savings. They restore stakeholder trust, which has eroded over years of infrastructure projects without visible results. Rapid successes provide quick, tangible proof that AI creates business value. This builds the confidence of decision-makers, reduces resistance to adoption, and paves the way for larger AI investments.

Successful quick wins create positive feedback loops that accelerate AI adoption. Initial success generates enthusiasm and resources for wider implementation. Widening the implementation creates additional value and organizational learning. This learning enables more sophisticated applications and greater benefits. The greater benefits justify increased investment in AI capabilities.

McKinsey's data underscores this mechanism. AI high performers, the six percent of companies with a measurable EBIT contribution from AI, are three times more likely than others to report that their organization intends to use AI for transformative change. They are also almost three times more likely to fundamentally redesign workflows, and this intentional redesign is one of the strongest contributors to measurable business impact. High performers regularly deploy AI across more business functions than their peers and are three times more likely to expand the use of AI agents.

Parallel operation instead of sequential dependency

The migration project doesn't need to be stopped. It may serve purposes beyond AI. Regulatory reporting, historical analyses, or executive dashboards on the internal roadmap may indeed require consolidated data. The investment in building this foundation is not wasted for these purposes.

But AI doesn't have to wait for the migration to be completed. The two can run in parallel. Migration continues on its own schedule for its intended purposes. AI delivers results now, against the data that exists today.

The pragmatic approach has four steps. First, identify two to three AI use cases that would deliver measurable business value. Second, map the specific data context each use case requires. Third, check whether this context is directly accessible without migration. Finally, pilot the AI on the narrowest feasible data path.
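The mapping and accessibility check can be expressed as a small set comparison. This is a minimal sketch with a hypothetical inventory: the use-case names, system names, and the helper `pilot_candidates` are invented for illustration.

```python
# Hypothetical inventory: which systems each use case needs context from.
REQUIRED_CONTEXT: dict[str, set[str]] = {
    "contract_analysis": {"contracts", "crm"},
    "customer_service": {"crm", "product_catalog", "case_management"},
    "demand_forecast": {"erp", "pos_history", "weather_feed", "promotions"},
}

# Systems that can be queried today without waiting for the migration.
ACCESSIBLE_NOW: set[str] = {
    "contracts", "crm", "product_catalog", "case_management", "erp",
}


def pilot_candidates(required: dict[str, set[str]], accessible: set[str]) -> list[str]:
    """Use cases whose full context is reachable now, narrowest data path first."""
    ready = [name for name, systems in required.items() if systems <= accessible]
    return sorted(ready, key=lambda name: len(required[name]))


candidates = pilot_candidates(REQUIRED_CONTEXT, ACCESSIBLE_NOW)
print(candidates)
```

Use cases that depend on systems not yet reachable simply drop out of the first round; they can be revisited once those systems are connected or migrated.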

This approach aligns with the findings of Gartner analyst Haritha Khandabattu, who describes a gradual shift from generative AI as the central focus to the fundamental enablers that support sustainable AI deployment, including AI-ready data and AI agents. Investments are moving from an infrastructure-first strategy to a data-and-capabilities-first architecture. Organizations that treat data readiness as an afterthought are the ones most likely to remain among the 94 percent that do not qualify as AI high performers.

The reorganization of investment logic

Gartner's spending data reveals a tectonic shift in investment logic. While AI infrastructure remains the largest spending category by far, at $965 billion in 2025, its growth rate is a comparatively moderate 29 percent per year. The acceleration is happening elsewhere: AI data is growing at 155 percent annually, AI cybersecurity at 74 percent, and AI models at 68 percent. The money follows the bottlenecks, not the headlines.

Within the AI data market, the growth drivers are even clearer. Synthetic data generation is growing at an annual rate of 178 percent, from $41 million to $6.8 billion by 2029. AI-ready datasets—that is, pre-curated data structured for AI workflows—are growing at 136 percent annually. Companies are willing to pay for shortcuts to production. This is a clear signal that the market values rapid data readiness over slow, comprehensive migration.

The winning organizations, those that truly reap the value from this transformation, invest in the capabilities that make AI systems work at enterprise scale: data readiness, governance, integration, and security. They reverse the typical spending ratios, dedicating 50 to 70 percent of their time and budget to data readiness—that is, extraction, normalization, governance metadata, quality dashboards, and retention controls. However, this data readiness is not understood as a monolithic migration project, but rather as an iterative, use-case-driven process.

From data perfectionism to AI pragmatism

The central finding of this analysis can be summarized in one principle: The goal was never a perfect infrastructure. The goal was to achieve results from AI, and fortunately, this does not require complete data consolidation. The teams that recognize this stop treating migration as a prerequisite and begin to view AI results as the metric that truly matters.

The data speaks for itself. 88 percent of companies are using AI, but only about a third have begun to scale it. 73 percent of migration projects fail because of planning, governance, and expertise gaps, not the technology itself. 42 percent of companies reported in 2025 that they had abandoned the majority of their AI initiatives. At the same time, the top six percent demonstrate that the path to success lies in ambitious goals, redesigned workflows, and rapid scaling, not in completing migration projects.

This presents a clear call to action for CIOs and CTOs. The question is no longer how to consolidate all data before AI can be implemented. The question is what specific data context is needed for the next AI use case and how this context can be provided most quickly and cost-effectively. Federated access, use-case-specific context models, and runtime assembly are the architectural tools that enable this approach. They replace the paradigm of complete preparation with the paradigm of iterative value creation.

Companies that view AI not as a secondary beneficiary of infrastructure projects, but as a driving force determining data requirements, will be the ones that progress most quickly from the pilot to the scaling phase. The migration project can continue, but the AI doesn't have to wait.
