Piotr Krosniak
5 min read · May 8, 2019

AI Supported ECM Content Migration

For the longest time, organizations and companies have relied on storing data through conventional means. It is a slow and painful manual process that consists of compiling huge amounts of information into a manageable system by scanning various documents, saving digital files, renaming them according to company needs or transferring said data to different physical or digital locations. The process in itself is a headache.

Mostly, this information is housed within what is called a “legacy system”, meaning it could be something as simple as a filing system, an obsolete document management system, or a combination of both. Retrieving information from such a system is time-consuming and prone to human error.

The alternative is to shift large amounts of information onto a platform where the data can be found later much more accurately, bringing it into a single repository that holds all the core information. That platform is called ECM, or Enterprise Content Management.

Data migration to an ECM can be supported by machine learning, but it still involves people directing the software on what to migrate and when. Letting artificial intelligence take over more of that task is convenient and saves many of the man-hours otherwise spent arranging information for migration.

Automating the migration and classification process means that operations that usually took weeks or months to achieve can be done within hours.

Enterprise Content Management

Enterprise Content Management is a platform that allows organizations to maintain and streamline their company data and to keep better control over content in line with company guidelines and regulations.

Through a set of defined processes and strategies, it allows organizations to obtain, store and deliver critical information to their employees and customers. It brings all the information together in a useful way that can increase productivity, save time and improve efficiency.

It is vital for any organization with large volumes of content to have a defined ECM plan to eliminate inefficiencies and reduce costs as well as to adhere to compliance mandates.

ECM itself can be broken down into five major components:

Capture: Gathering information by converting physical or digital documents to electronic formats and organizing them.

Manage: Connecting, modifying and using information through document management, software and records management.

Store: Backing up frequently changing information in flexible subfolders that users can view and edit.

Preserve: Backing up infrequently changing information, such as company regulations and mandates, for the long term.

Deliver: Providing clients and users with the information they request.

ECM has various benefits because it is a centralized platform in which content can be held and disseminated in a manner that meets guidelines and requirements. In short, it is used to control workflows, enforce security mechanisms and maintain data efficiently.

ECM is evolving at a rapid pace to meet the requirements of organizations and to encompass what they need.

Artificial Intelligence/Machine Learning

While the terms AI and ML are often used interchangeably, they are not quite the same.

Artificial Intelligence is loosely interpreted to mean incorporating human intelligence in machines.

For example, if a machine completes a task such as moving objects or recognizing movement based on a set of algorithms, that behavior can be termed artificial intelligence.

It is usually classified into two groups: general and narrow. General AI would be able to handle a broad range of tasks the way a human can, whereas narrow AI performs specific tasks, sometimes better than humans, but only within a limited scope, such as classifying images on Pinterest.

Machine Learning, as the name suggests, can be interpreted to mean empowering machines with the ability to learn.

Machine learning enables machines to learn by themselves from the information they are given. It is a subcategory of AI: given the right tools and data, the machine can make decisions on its own.

Successful ECM Migration

There are key aspects to keep in mind when undertaking a data migration to an ECM or from one ECM to another.

Start planning early. Organizations often underestimate the amount of planning required to shift complete data systems. It is more than just moving information from one place to another. There is also data mapping, security and other related items to carry over from the source system.

Keep detailed track of everything that is to be transferred. Creating an inventory often helps uncover data that would otherwise go unnoticed.

Take time to perform test migrations to understand the duration involved. Sample batches will assist greatly in mapping the whole process.

Break the migration down into manageable batches. This helps you stay organized and keep track of important data being migrated. It also helps in creating the necessary folders for similar kinds of data in the system.
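The inventory, sample batch and batching tips above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the source is assumed to be a plain folder tree, and `migrate_one` stands in for whatever call your ECM platform actually provides for uploading a single document.

```python
import os
import time
from itertools import islice


def build_inventory(root):
    """Walk the source tree and record every file with its size in bytes."""
    inventory = []
    for folder, subfolders, filenames in os.walk(root):
        subfolders.sort()  # visit subfolders in a stable order
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            inventory.append({"path": path, "size": os.path.getsize(path)})
    return inventory


def make_batches(items, batch_size):
    """Yield consecutive batches of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def estimate_total_seconds(sample_batch, total_bytes, migrate_one):
    """Time one sample batch and extrapolate linearly by data volume.

    migrate_one is a placeholder (assumption) for the real upload call.
    """
    started = time.perf_counter()
    for item in sample_batch:
        migrate_one(item)
    elapsed = time.perf_counter() - started
    sample_bytes = sum(item["size"] for item in sample_batch)
    if sample_bytes == 0:
        return 0.0
    return elapsed * total_bytes / sample_bytes
```

A linear extrapolation by volume is a rough guide at best, since many small files usually migrate more slowly than a few large ones, so it is worth timing more than one sample batch.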

Have a migration verification plan. It will help in determining how much data was migrated and how much failed, and it leaves time to investigate the failures.

Large migrations rarely run perfectly without failing at some points. When you are migrating from a legacy system that has outlived its purpose, there are bound to be some chinks in the armor. Reconciling and resolving should always be planned for to avoid any delays.
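One possible way to verify and reconcile a batch is to compare checksums of the documents on both sides. The sketch below assumes you can build a manifest (a dictionary mapping a document ID to its SHA-256 checksum) for both the legacy system and the ECM; the function names and sample documents are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's content."""
    return hashlib.sha256(data).hexdigest()


def reconcile(source_manifest, target_manifest):
    """Compare {document_id: checksum} manifests from source and target."""
    missing = sorted(
        doc for doc in source_manifest if doc not in target_manifest
    )
    corrupted = sorted(
        doc
        for doc in source_manifest
        if doc in target_manifest
        and source_manifest[doc] != target_manifest[doc]
    )
    migrated = len(source_manifest) - len(missing) - len(corrupted)
    return {"migrated": migrated, "missing": missing, "corrupted": corrupted}


# Hypothetical sample data: three source documents, one lost, one altered.
source = {
    "doc-1": checksum(b"contract"),
    "doc-2": checksum(b"invoice"),
    "doc-3": checksum(b"report"),
}
target = {
    "doc-1": checksum(b"contract"),
    "doc-2": checksum(b"invoice (truncated)"),
}
report = reconcile(source, target)
print(report)
```

The `missing` and `corrupted` lists give you a concrete retry queue for the next run, which is what makes the reconciliation step plannable.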

Look out for these few things and you will have a successful migration with minimal drawbacks.

Solution

During my time working on the ECM project, I developed custom document classification algorithms that support the migration of legacy content to the ECM platform. In my tests, accuracy was around 78% when only the document title was used to assign classes. It is worth mentioning that the number of classes also influences accuracy, and in my case there were over 200 different document classes. Full code available on GitHub below

While working with the data I tested two classifiers; the results are shown above.
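To give a feel for what classifying documents by title can look like, here is a minimal sketch. It is not the code from the project: the post does not name the two classifiers that were tested, so this example swaps in a small multinomial Naive Bayes model written with the standard library only, and the titles and class names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


class TitleClassifier:
    """Multinomial Naive Bayes over title words with add-one smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for title, label in zip(titles, labels):
            words = tokenize(title)
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, title):
        words = [w for w in tokenize(title) if w in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log(
                    (count + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, titles, labels):
    """Share of titles whose predicted class matches the true class."""
    correct = sum(model.predict(t) == y for t, y in zip(titles, labels))
    return correct / len(labels)


# Hypothetical training data; a real project would use thousands of titles.
train_titles = [
    "Invoice March 2019",
    "Invoice for office supplies",
    "Employment contract J. Smith",
    "Service contract renewal",
    "Annual financial report 2018",
    "Quarterly report Q1",
]
train_labels = ["invoice", "invoice", "contract", "contract", "report", "report"]

model = TitleClassifier().fit(train_titles, train_labels)
print(model.predict("Invoice April 2019"))
```

With more than 200 classes, as in the project described here, accuracy should be measured with `accuracy` on titles held out from training rather than on the training titles themselves.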

Conclusion

Data migration is an important part of every intranet redevelopment. While it is a challenging process and can become quite tedious, with proper planning and care it can be done relatively painlessly. Many organizations have already switched to an ECM system. The tools and technology required to make migration a manageable risk are already out there, and there are several teams with experience in migrating terabytes of information onto new systems. Regardless of your specifications, there are tools and processes that can automate the migration.

Piotr Krosniak

DataScience and #GIS specialist. Love #triathlon #dataviz and #opensource tech. Dad of two work in UNICEF