AI in Chemical R&D: What's Holding Back Real-World Impact
Artificial intelligence (AI) has been widely promoted as a transformative force in research and development. In chemical R&D, the expectation was clear: automate workflows, accelerate discovery and reduce development costs. In response, organizations invested heavily. They hired data scientists, licensed advanced platforms and launched pilot programs to integrate AI into core scientific processes. Despite these efforts, early results have been inconsistent.
A recent survey found that 67% of R&D decision makers at leading companies are dissatisfied with the pace of AI implementation. When asked whether AI tools accurately answer scientific questions most of the time, fewer than half of the scientists surveyed agreed. This disconnect between anticipated and actual outcomes has led to frustration among researchers, strained budgets and growing skepticism among leadership.
These challenges reflect more than just slow deployment or missing features. They point to a deeper issue: a misalignment between the complexity of chemical R&D and the assumptions built into generic AI strategies.
Performance Divergence in Early Chemical AI Deployments
Recent applications of AI in chemical R&D have produced promising results and notable limitations. In one study, ChemCrow successfully planned and carried out the synthesis of an insect repellent and three organocatalysts, marking one of the first documented instances of a chemistry-focused large language model agent interacting with the physical world.
Another model, trained on a custom-curated dataset, identified optimal conditions for Suzuki coupling reactions within just a few hours, significantly faster than an experienced chemist who required more than a week for similar selections.
In a separate demonstration, the A-Lab, an autonomous laboratory for the solid-state synthesis of inorganic powders, synthesized 41 compounds from a planned set of 58 targets over 17 days. These examples highlight the potential of AI-driven chemistry, though they represent controlled proofs of concept rather than comprehensive performance benchmarks.
Several investigations have revealed persistent challenges. One researcher observed that scientific machine learning in the physical sciences suffers from widespread problems: models often memorize existing data but fail to extrapolate into novel chemical spaces, and in some cases AI systems default to familiar or intuitive outputs even when those responses omit critical variables. Notably, a single AI misinterpretation was linked to nearly two dozen questionable studies, raising concerns about reproducibility and scientific integrity.
This pattern has prompted many organizations to reassess their approach. While early results generated enthusiasm, limitations have led to more deliberate integration of domain expertise and a shift toward scientifically grounded strategies.
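One concrete, scientifically grounded practice is to evaluate models on a scaffold split rather than a random split, so that test molecules share no core structure with the training set. The code below is a minimal sketch of that idea, assuming the open-source RDKit toolkit; it is an illustration, not a method drawn from any of the studies cited above.

```python
# Minimal sketch of a scaffold split: molecules sharing a Bemis-Murcko
# scaffold stay on the same side of the train/test boundary, so the test
# set probes extrapolation to unseen core structures rather than recall.
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, test_fraction=0.2):
    """Group molecules by scaffold, then assign whole groups to train
    or test so no scaffold appears on both sides of the split."""
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparseable structures
        groups[MurckoScaffold.MurckoScaffoldSmiles(mol=mol)].append(smi)

    # Fill the training set with the most common scaffolds first; the
    # long tail of rarer scaffolds spills into the test set, which
    # forces the model to generalize instead of memorize.
    train, test = [], []
    train_cutoff = (1.0 - test_fraction) * len(smiles_list)
    for scaffold in sorted(groups, key=lambda s: len(groups[s]), reverse=True):
        if len(train) + len(groups[scaffold]) <= train_cutoff:
            train.extend(groups[scaffold])
        else:
            test.extend(groups[scaffold])
    return train, test
```

A model that scores well on a random split but collapses on a scaffold split is memorizing local chemistry rather than learning transferable structure-property relationships.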
Key Barriers to AI Deployment
Understanding why artificial intelligence often underperforms in chemical research requires a closer look at the structural, scientific and operational challenges that shape its implementation.
These barriers are not incidental. They reflect the fundamental nature of chemical data and the realities of applying computational tools in a domain where precision, reproducibility and scientific rigor are essential.
Chemical data is structurally and semantically complex
Representing chemical information accurately requires more than capturing text or numerical values. Molecular structures, reaction conditions and analytical outputs must be encoded in formats that preserve stereochemistry, spatial orientation and reactivity.
The difference between a wedge, a dash and a plain line may appear trivial in other contexts, but in a chemical structure each symbol conveys a distinct spatial meaning: a bond projecting toward the viewer, away from the viewer or within the plane of the page. Generic AI models, which often rely on simplified representations, struggle to interpret these nuances. As a result, predictions may appear plausible but fail to reflect the underlying chemistry.
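To make this concrete, here is a minimal sketch, using the open-source RDKit toolkit, of how a simplified representation silently discards stereochemistry: two mirror-image forms of alanine become indistinguishable once their stereo markers are stripped.

```python
# Minimal sketch: stripping stereochemical markers from SMILES strings
# collapses two distinct enantiomers into one identical representation.
from rdkit import Chem

# Two mirror-image (enantiomeric) forms of alanine.
mol_a = Chem.MolFromSmiles("C[C@H](N)C(=O)O")
mol_b = Chem.MolFromSmiles("C[C@@H](N)C(=O)O")

# With stereochemistry preserved, the canonical SMILES differ.
print(Chem.MolToSmiles(mol_a))
print(Chem.MolToSmiles(mol_b))

# With stereochemistry stripped, both collapse to the same string, so any
# model consuming this representation cannot tell the enantiomers apart.
print(Chem.MolToSmiles(mol_a, isomericSmiles=False))
print(Chem.MolToSmiles(mol_b, isomericSmiles=False))
```

Because enantiomers can show very different biological activity, a model that cannot distinguish them may treat a therapeutic and an inactive, or even harmful, compound as identical.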
Scientific knowledge evolves faster than most models can adapt
The pace of discovery in chemical sciences is accelerating. New synthetic pathways, reaction mechanisms and compound classes are published regularly. AI models trained on static or outdated datasets quickly lose relevance.
Consider CRISPR: its characteristic repeat sequences were first observed in bacterial genomes in 1987, but their function remained unknown for nearly two decades. In 2012, researchers demonstrated the use of CRISPR-Cas9 as a programmable gene-editing tool, and by 2023 the first CRISPR-based gene therapy had received FDA approval. This represents more than incremental progress.
CRISPR fundamentally transformed what is scientifically possible, from theoretical understanding of bacterial immunity to approved treatments for genetic diseases. An AI model trained on pre-2012 biological data would have no knowledge of CRISPR as a gene-editing tool. A model trained in 2020 would lack information about FDA-approved CRISPR therapeutics.
Without continuous updates and validation against current literature and experimental results, these models risk reinforcing obsolete assumptions or missing emerging trends. This dynamic environment demands systems that can evolve in parallel with the science they aim to support.
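At minimum, such a system needs to know when a model's training data predates the literature relevant to a question. The sketch below is a hypothetical illustration of that kind of staleness check; the cutoff date, data shapes and example values are assumptions for illustration, not a description of any existing product.

```python
# Hypothetical staleness check for a deployed model: compare the model's
# training-data cutoff against the publication dates of the sources a
# query retrieves, and flag answers that may rest on outdated knowledge.
from datetime import date

MODEL_TRAINING_CUTOFF = date(2020, 1, 1)  # illustrative cutoff

def stale_source_fraction(publication_dates, cutoff=MODEL_TRAINING_CUTOFF):
    """Return the fraction of retrieved sources the model has never seen.
    A high fraction signals the answer should be re-validated against
    current literature or routed through a retrieval-augmented pipeline."""
    if not publication_dates:
        return 0.0
    unseen = [d for d in publication_dates if d > cutoff]
    return len(unseen) / len(publication_dates)

# Example: a question about CRISPR therapeutics retrieves mostly
# post-2020 papers, so parametric knowledge alone is unlikely to be current.
dates = [date(2023, 11, 16), date(2022, 6, 1), date(2019, 3, 12)]
print(f"{stale_source_fraction(dates):.0%} of sources postdate the cutoff")
```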
Organizational capabilities often do not align with AI requirements
Many chemical companies are built around deep expertise in synthesis, formulation and process engineering. However, deploying AI at scale requires different competencies. Data engineering, model development and information architecture are not typically core strengths within traditional R&D teams. Even when companies hire data scientists, the absence of foundational systems and collaborative frameworks can prevent meaningful integration. More importantly, organizations must foster a culture that recognizes the strategic value of AI-ready data and understands how to support its use. Without that cultural shift, technical investments alone will not deliver sustainable impact.
Scientific skepticism can limit the adoption of AI tools
Chemists and researchers are trained to question results, seek reproducibility and challenge assumptions. This mindset, although essential to scientific progress, presents a significant barrier to the adoption of AI. Chemists hold AI tools to exacting standards that tolerate little imperfection. Even models with strong benchmark performance occasionally produce outputs that conflict with established principles, and these errors can erode trust across the entire system. If a model suggests a reaction pathway that violates known thermodynamic constraints, it is unlikely to gain traction, regardless of its statistical confidence. Building trust requires more than accuracy; it requires interpretability and alignment with scientific reasoning.
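One practical response is to wrap model outputs in guardrails that encode constraints chemists already trust. The sketch below is a hypothetical illustration: a post-hoc screen that rejects suggested reactions whose estimated Gibbs free-energy change (ΔG = ΔH - TΔS) makes them implausible, no matter how confident the model is. The data structure, values and threshold are all illustrative assumptions.

```python
# Hypothetical post-hoc sanity check: screen model-suggested reactions
# against a basic thermodynamic feasibility test before surfacing them.
from dataclasses import dataclass

@dataclass
class SuggestedReaction:
    name: str
    delta_h: float     # estimated reaction enthalpy, kJ/mol
    delta_s: float     # estimated reaction entropy, kJ/(mol*K)
    confidence: float  # the model's own statistical confidence, 0 to 1

def is_thermodynamically_plausible(rxn, temperature_k=298.15, max_delta_g=50.0):
    """Reject reactions whose Gibbs free-energy change (dG = dH - T*dS)
    is so positive they are unlikely to proceed as written. The
    50 kJ/mol cutoff is an arbitrary illustrative threshold."""
    delta_g = rxn.delta_h - temperature_k * rxn.delta_s
    return delta_g <= max_delta_g

suggestions = [
    SuggestedReaction("pathway A", delta_h=-120.0, delta_s=-0.05, confidence=0.91),
    SuggestedReaction("pathway B", delta_h=310.0, delta_s=0.02, confidence=0.97),
]

# High statistical confidence does not override the physical check:
# pathway B fails the screen despite its 0.97 confidence score.
for rxn in suggestions:
    verdict = "passes" if is_thermodynamically_plausible(rxn) else "fails"
    print(f"{rxn.name} {verdict} the screen (confidence {rxn.confidence})")
```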
Expectations often exceed what current systems can deliver
Early narratives around AI in science emphasized speed, automation and breakthrough potential. These messages, while compelling, often overlooked the foundational work required to achieve meaningful results. Clean, connected data; clearly defined scientific questions; and sustained collaboration across disciplines are prerequisites for success. When these elements are missing, even the most advanced models will struggle to produce outcomes that matter in real-world R&D settings.
Why Generic AI Strategies Fail
Chemical data differs fundamentally from the types of information used in consumer applications. In chemical R&D, precision is not optional. A streaming platform may tolerate imprecise recommendations, but a formulation model cannot afford similar margins of error. The consequences of incorrect predictions in a laboratory or manufacturing setting are far more significant.
To build AI systems that perform reliably in chemical environments, organizations must address challenges that fall into two categories: those common to all AI projects and those uniquely complex in scientific contexts.
As in any enterprise, foundational technical enablers must be in place for AI initiatives to function. These are the unglamorous requirements that apply across industries: security, unified technical infrastructure and data accessibility. They do not require scientific specificity, yet they frequently become the root cause of failure in scientific AI projects.
The information landscape at scientific companies is inherently complex. Years of disparate data projects, mergers and acquisitions, and technology migrations create environments with multiple backend systems that must work in concert. Scientific information must be retrievable across laboratory notebooks, analytical instruments and process control platforms. Security policies must protect proprietary formulations and experimental results without becoming so restrictive that they prevent access when researchers need data. Documentation must be clear enough that information is findable. Systems must support data ingestion and feedback loops that enable continuous improvement.
These barriers are not scientific challenges. They are operational ones. However, their presence or absence determines whether scientifically sophisticated AI solutions can be deployed effectively.
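As a concrete sketch of what working in concert can look like, the hypothetical code below fans a single query out across thin adapters for different backend systems behind one shared interface. Every class and method name is illustrative rather than a reference to any particular platform.

```python
# Hypothetical unified access layer: one query interface, with a thin
# adapter over each backend system (lab notebooks, instruments, etc.).
from typing import Protocol

class ScientificDataSource(Protocol):
    def search(self, query: str) -> list[dict]:
        """Return matching records as dicts with a shared schema."""
        ...

class ElnAdapter:
    """Adapter over an electronic lab notebook backend (stubbed)."""
    def search(self, query: str) -> list[dict]:
        return [{"source": "eln", "query": query}]  # placeholder result

class InstrumentAdapter:
    """Adapter over analytical instrument data (stubbed)."""
    def search(self, query: str) -> list[dict]:
        return [{"source": "instrument", "query": query}]  # placeholder result

def federated_search(query: str, sources: list[ScientificDataSource]) -> list[dict]:
    """Fan one query out across every registered backend and merge results."""
    results: list[dict] = []
    for source in sources:
        results.extend(source.search(query))
    return results

print(federated_search("Suzuki coupling yield", [ElnAdapter(), InstrumentAdapter()]))
```

The value of such a layer is that security rules, documentation and ingestion logic live behind one interface instead of being reimplemented for every tool that needs the data.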
This concludes Part 1 of our exploration of AI in chemical R&D and the critical role of data quality. In Part 2, we will discuss how scientific problem definition is essential for AI success, examining why clearly defined scientific problems must guide AI implementation, how expert guidance ensures proper data interpretation throughout the development process, and what organizations can do to bridge the gap between AI capabilities and real-world chemical R&D outcomes.
About the Author

Andrea Jacobs
senior manager, CAS Product Management
Andrea Jacobs is director of data analytics at CAS, a division of the ACS specializing in scientific knowledge management. In her current role, Andrea leads a team of data scientists tasked with pioneering science-smart AI solutions to accelerate R&D workflows; many of the team also have an educational background in a natural science discipline such as chemistry, biology or pharmacology. In her 15-year tenure with CAS, she has held scientific, technical and business leadership roles spanning the organization’s end-to-end operations, including enterprise strategy, product development, partnerships, content licensing and data curation operations and infrastructure. Andrea earned her bachelor’s degree in chemistry and computer science from Wellesley College and an MBA from The Ohio State University.
