Document Digitisation and Knowledge Centralisation Using OCR and AI in Construction and Water & Wastewater Companies

This case study describes a project to digitise technical documentation and build a central knowledge base at a construction and water & wastewater company. The objective was to transform thousands of scanned PDF documents, which IT systems could read only as images, into a searchable knowledge asset that supports daily operational decisions, service processes and further digital transformation. The project demonstrated that a properly designed combination of OCR, semantic search and a central information structure can significantly reduce information retrieval time and serve as a foundation for future innovation.

Client

The client is a company operating in the construction and water & wastewater sector, delivering infrastructure projects covering the tender, design, execution and warranty phases. Over many years, the organisation accumulated a vast repository of project, technical and operational documentation. The majority of these materials existed as scanned PDF files stored across multiple locations, without a consistent structure and without effective search capabilities.

As part of its efforts to modernise operations and increase competitiveness, the company required a solution that would recover knowledge hidden in documents and make it operationally usable. The key requirement was not merely document archiving, but transforming documentation into a living, searchable resource supporting both operational and management decision-making.

Client challenge

The primary challenge was managing a very large volume of technical and project documentation that, although digital in form, was effectively unreadable for IT systems due to the lack of a text layer. As a result, finding specific information such as serial numbers, component names, warranty dates or scopes of work required manual review of hundreds or thousands of documents.

This challenge was compounded by the complexity of the documentation, which included technical drawings, charts, tables, handwritten notes, stamps and signatures, as well as inconsistent scan quality. Knowledge was fragmented and difficult to retrieve, leading to high operational costs and an increased risk of errors. The absence of a central knowledge base also blocked further initiatives such as ticketing systems, warranty digitisation or advanced AI-driven analytics. An additional challenge was organisational change, requiring employees to adopt new tools and ways of working.

Our solution

To address these challenges, we adopted a structured, phased approach starting with a Proof of Concept. The purpose of the PoC was to validate key assumptions in real operational conditions and prepare the organisation for a full-scale implementation. The pilot covered the digitisation and tagging of approximately two thousand documents from a single selected project.

The core of the solution was a SharePoint-based document repository enriched with a consistent taxonomy, metadata and version control. A critical component was the implementation of an advanced hybrid OCR solution capable of handling complex technical documentation. The system was designed not to guess in cases of low recognition confidence, but to flag specific fields requiring human verification.
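
The "flag rather than guess" behaviour can be illustrated with a minimal Python sketch. It assumes the OCR layer reports a confidence score between 0 and 1 for each extracted field. The `OcrField` type, the `split_for_review` helper, the sample values and the 0.85 threshold are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the case study does not state the value actually used.
REVIEW_THRESHOLD = 0.85


@dataclass
class OcrField:
    name: str          # e.g. "serial_number"
    text: str          # text as recognised by OCR
    confidence: float  # 0.0 to 1.0, assumed to come from the OCR engine


def split_for_review(fields, threshold=REVIEW_THRESHOLD):
    """Separate fields that can be accepted from those needing a human check."""
    accepted = [f for f in fields if f.confidence >= threshold]
    flagged = [f for f in fields if f.confidence < threshold]
    return accepted, flagged


# Made-up sample output for a single scanned page.
fields = [
    OcrField("serial_number", "SN-48213", 0.97),
    OcrField("warranty_date", "2O24-03-I5", 0.41),  # poor scan, low confidence
    OcrField("component_name", "Pump P-101", 0.91),
]

accepted, flagged = split_for_review(fields)
for field in flagged:
    print(f"Needs human verification: {field.name} ({field.confidence:.2f})")
```

In this sketch the low-confidence field is never written to the repository as if it were reliable; it is routed to a reviewer, while the remaining fields pass through automatically.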

Semantic search complemented the OCR layer, allowing users to locate information based on meaning rather than keywords alone. The PoC was deployed on real data and tested in everyday operations, supported by short user guidelines and micro-training sessions to ensure adoption.
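
One common way to search by meaning is to compare embedding vectors using cosine similarity. The sketch below shows only that ranking step and is not a description of the engine used in the project. The three-dimensional vectors are hand-made stand-ins for the output of a real embedding model, and the file names and the `rank_documents` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors; a real system would obtain them from an embedding model.
doc_vecs = {
    "warranty_certificate.pdf": [0.9, 0.2, 0.0],
    "pump_station_drawing.pdf": [0.1, 0.7, 0.8],
    "site_handover_protocol.pdf": [0.4, 0.1, 0.1],
}

# Stand-in for the embedding of a query such as "guarantee period for the pump".
query_vec = [0.8, 0.3, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{score:.3f}  {doc_id}")
```

Because the comparison runs on vectors rather than on literal words, a query phrased as "guarantee period" can still surface a document that only uses the word "warranty", provided the embedding model places the two close together.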

Challenge

  • Managing a large volume of technical and project documentation
  • The complexity of documentation, including technical drawings, charts, tables, handwritten annotations, stamps, and signatures
  • Inconsistent scan quality
  • Scattered and hard-to-reconstruct knowledge

Solution

  • A phased approach starting with a Proof of Concept
  • A document repository based on SharePoint
  • Semantic search, enabling users to find information based on meaning rather than keywords alone

Results

The Proof of Concept delivered clear and measurable outcomes. In more than ninety percent of cases, information retrieval time fell to under two minutes, sharply cutting the manual effort required to search through documents. Document versioning was standardised, eliminating the risk of working with outdated files, while the hybrid AI–human approach ensured a high level of data reliability.

The project successfully prepared the organisation for a change in working practices, increasing user acceptance and readiness for further digital transformation. The central knowledge base became the foundation for subsequent initiatives such as ticketing systems, warranty digitisation and bid automation. For full deployment, the return on investment was estimated at 180–200% over an 18–24 month period, confirming the financial viability of the initiative.
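
To make the ROI range concrete, the sketch below applies the conventional definition, ROI = (total return − cost) / cost. The case study does not say which definition was used, so this is an assumption, and the cost figure is purely hypothetical.

```python
def net_benefit_from_roi(cost, roi_percent):
    """Net benefit implied by an ROI percentage under ROI = net benefit / cost."""
    return cost * roi_percent / 100


def total_return_from_roi(cost, roi_percent):
    """Total return: the original cost plus the implied net benefit."""
    return cost + net_benefit_from_roi(cost, roi_percent)


# Hypothetical deployment cost in any currency; not a figure from the project.
cost = 100_000

low_net = net_benefit_from_roi(cost, 180)   # lower bound of the estimate
high_net = net_benefit_from_roi(cost, 200)  # upper bound of the estimate

print(f"Net benefit range: {low_net:,.0f} to {high_net:,.0f}")
print(f"Total return at 200% ROI: {total_return_from_roi(cost, 200):,.0f}")
```

Under these assumptions, every 100,000 invested would bring back 180,000 to 200,000 in net benefit over the 18–24 month period.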

Most importantly, the project transformed previously “silent” documents into an active operational asset that directly supports business decisions and long-term organisational growth.
