Robotic Operations Center for RPA monitoring & maintenance
(Originally published as a whitepaper on Jun 02, 2019)
Introduction

It is important for large enterprises to start thinking about a Robotic Operations Center (ROC) from day one, so that their RPA program does not crash and burn once it reaches 50+ bots.
Since every enterprise starts with a few bots and eventually grows, this whitepaper can serve as a handbook for anyone implementing RPA in their organization.
Understanding Automation Factory & Robotic Operations Center
A typical RPA implementation initiative for a large enterprise can be divided into two major areas:
Automation Factory

The Automation Factory’s job is to churn out bots. It mostly consists of development teams made up of Business Analysts, RPA developers, ML experts, and Quality Analysts, who work with the business to automate tasks and processes. Large enterprises with multiple, geographically dispersed divisions typically have more than one Automation Factory. In fact, each Automation Factory can have its own RPA tool and implementation vendors. This adds complexity to the ecosystem, and the governance team struggles to enforce compliance across these multiple Automation Factories. Hence, strong RPA governance has to be set up early on.
Robotic Operations Center (ROC)

Once bot development is complete and the bot is live in production, it moves to the maintenance phase. It is important to maintain control of automation activities in order to eliminate possible conflicts and potential downtime. The ROC takes responsibility for the bots and ensures that the automation is performed correctly and within the task scope. The ROC can be organized into different teams, depending on the complexity of the organizational structure.
Here is a typical RPA operating model

As part of the bot’s hyper care in the ROC Command Center, the following activities are performed

Before the ROC Command Center takes over the bot maintenance, the teams must go through the ROC handover process and the bot in-take checklist.
ROC handover process and bot in-take process
Each step in the bot handover process involves two or more teams. The diagram below illustrates the multiple stages, along with the teams involved and list of actions taken in every step:

In a typical enterprise, multiple parties will be involved in the bot in-take process:
The development team, who has developed the bot
The ROC team, who performs the bot maintenance
The Infrastructure support team, who takes care of the servers and upgrades for the bot machines
The business stakeholders
The stakeholders of the applications that are integrated with the bots
All these parties have to be involved in the handover process for a successful transition. For a smooth handover, the ROC team will have to be involved at least from the UAT phase.
Bot in-take checklist
Large enterprises usually have multiple Automation Factories and multiple implementation vendors. For the sake of effective governance and uniformity, there needs to be an entry criteria checklist that the ROC team must enforce. This is to ensure that the bot is in accordance with the standards and is easily maintainable. The Bot Support In-take checklist is as follows:

The above checklist is more than just a set of entry criteria for bot maintenance. Many of its points span various phases of bot development itself and have to be considered even before bot development starts. It is a checklist for the Automation Factories, and it has to be part of the implementation vendor checklist for bot handover. Each item in the checklist should be carefully considered and documented according to your company’s needs.
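To make the idea of an enforceable entry gate concrete, the sketch below models the checklist as data and reports which items are still open for a given bot. The four category names come from the sections that follow. The individual item names, the `BotSubmission` type, and the function names are hypothetical placeholders, not the actual checklist.

```python
from dataclasses import dataclass, field

# Category names follow this whitepaper's sections. The item names are
# hypothetical examples and should be replaced with your own checklist.
INTAKE_CHECKLIST = {
    "Documentation": ["as_is_process", "to_be_process", "technical_design", "signoffs"],
    "Bot Standard": ["scalability_check"],
    "Standard Operating Procedure": ["roc_sla", "ola", "escalation_matrix"],
    "Business": ["roi_reports", "mitigation_plan"],
}


@dataclass
class BotSubmission:
    """A bot handed to the ROC, with the checklist items completed so far."""
    name: str
    completed: set = field(default_factory=set)


def open_items(bot):
    """Return {category: [missing items]} for every category that has gaps."""
    gaps = {}
    for category, items in INTAKE_CHECKLIST.items():
        missing = [item for item in items if item not in bot.completed]
        if missing:
            gaps[category] = missing
    return gaps


def ready_for_handover(bot):
    """A bot passes the entry gate only when no checklist item is open."""
    return not open_items(bot)
```

Calling `open_items` on a submission gives the ROC team a per-category list of what the Automation Factory still owes before handover, which is easier to act on than a single pass/fail flag.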
Documentation
Documentation is a vital aspect of the bot development process (in fact, of any development). The documentation must cover business aspects such as the As-Is process, the To-Be process, and the changes that the process will undergo after automation, as well as the handover process, detailed technical design, architecture, and signoffs from both the business owners and application owners. In this context, application owners are the ones who own any application that the bot interacts with as part of the automation. This signoff is important, as it helps ensure that no performance issues arise with the integrated applications at a later point. The documentation should also cover any reusable components that are built as part of the process, along with their user manuals, so that other processes and teams can use them to reduce development cycles.
Bot Standard
This checklist must be part of the Automation Factory’s development standards. It is vital that all the different automation teams adhere to the standards, so that the ROC team can easily maintain bots from different implementation vendors.
For example, the Bot Scalability Check indicates whether the bot is developed to scale readily. This is a very important aspect: if a bot is not coded for scalability, adding one more bot to the same process later will require a lot of rework, which defeats the whole purpose of implementing RPA for the process.
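One common way to code for scalability, offered here as an illustrative assumption rather than the approach this whitepaper prescribes, is to have every bot pull work items from a shared queue instead of hard-coding its own input. Adding a bot then means starting one more worker. In the sketch below, `process_item`, `bot_worker`, and `run_bots` are hypothetical names, and threads stand in for bot machines.

```python
import queue
import threading


def process_item(item):
    # Hypothetical stand-in for the automated business step.
    return f"processed {item}"


def bot_worker(bot_name, work_queue, results, lock):
    """Pull items from the shared queue until it is empty."""
    while True:
        try:
            item = work_queue.get_nowait()
        except queue.Empty:
            return
        outcome = process_item(item)
        with lock:
            results.append((bot_name, item, outcome))
        work_queue.task_done()


def run_bots(items, bot_count):
    """Run `bot_count` workers against one queue; scaling is a parameter change."""
    work_queue = queue.Queue()
    for item in items:
        work_queue.put(item)
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=bot_worker, args=(f"bot-{n}", work_queue, results, lock))
        for n in range(1, bot_count + 1)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Because no worker owns a fixed slice of the input, `run_bots(items, 2)` and `run_bots(items, 5)` process the same items without any change to the bot logic. Which bot handles which item is not deterministic.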
Standard Operating Procedure
In a typical RPA implementation, multiple teams are involved in maintenance. The bot development team takes care of any enhancements and L3 support. The infrastructure team (mostly in-house) takes care of the bot infrastructure, patches, upgrades, etc. The ROC is the support team, and the NOC (Network Operations Center) team monitors the network and the servers that the bots run on. Each of these teams has its own SLAs (Service Level Agreements).
When an incident occurs, it will involve one or more of these teams. So, it is important to set up well-defined ROC SLAs and an Operation Level Agreement (OLA), along with an Escalation and Communication Matrix between these teams. Having these SOPs in place facilitates a smooth transition of the incident between the teams and leads to a faster resolution.
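An Escalation and Communication Matrix can be kept as plain data that the ticketing workflow looks up. The sketch below is a minimal illustration under stated assumptions: the team names come from the text above, while the incident categories, time thresholds, contact names, and the `current_owner` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # escalate once the incident has been open this long
    team: str
    contact: str


# Hypothetical matrix: categories, thresholds, and contacts are placeholders.
# Steps in each list must be ordered by ascending `after_minutes`.
ESCALATION_MATRIX = {
    "bot_failure": [
        EscalationStep(0, "ROC", "roc-support"),
        EscalationStep(60, "Development (L3)", "dev-lead"),
    ],
    "infrastructure": [
        EscalationStep(0, "ROC", "roc-support"),
        EscalationStep(30, "NOC", "noc-on-call"),
        EscalationStep(120, "Infrastructure", "infra-manager"),
    ],
}


def current_owner(category, minutes_open):
    """Return the last escalation step whose threshold has been reached."""
    steps = ESCALATION_MATRIX[category]
    reached = [step for step in steps if minutes_open >= step.after_minutes]
    return reached[-1]
```

Keeping the matrix as data rather than as branching logic lets the teams renegotiate thresholds in the OLA without touching the routing code.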
Business
The business is the glue that holds all of these teams together, and it carries overall responsibility for the outcome of the RPA project. In this case, the outcome is the realization of the Cost-Benefit Analysis that would have been performed at the beginning of the initiative. Securing approved funding, defining reports for analyzing ROI, defining SLAs and monitoring SLA breaches, and devising a mitigation plan in case of bot maintenance or failures all fall within the purview of the business stakeholders of the process.
After the handover process, the ROC Command Center takes ownership of the bot maintenance and interacts with all the involved teams to ensure that the bots are adding value to the business.
Incident management process
When the ROC team takes over the bot maintenance, every request to the ROC team needs to be routed via a ticketing system.
A typical flow of incident management is as follows:

It is important to monitor and record all conversations that occur during an incident, so that improvements can be made both to the bot and to the way the maintenance teams work.
Problem management process
When an incident is determined to be a problem that needs to be analyzed further, it follows the problem management process to find a resolution. The resolution can be as simple as working with an application owner to get a password reset, or it can be a change in the process that needs development and has to go through the entire Software Development Lifecycle. In any case, constant communication with the stakeholders involved is key to the ROC team’s effectiveness.

ROC governance
To have an effective ROC team, good ROC governance needs to be set up. From daily interactions of the ROC team to the quarterly reporting, it is important for the business to keep a finger on the pulse of the bots and make strategic decisions, when it comes to process improvements.

Here is a snapshot of reports that are sent to various stakeholders for decision making:
Daily: Log analysis and reporting of critical or recurring errors in the logs to the development teams.
Weekly: Utilization reports help the business decide whether to scale the bots or to schedule more processes on underutilized bots. It can also lead to making shift changes in case of human interaction in the process. Accuracy reports and license reports (if the process involves data extraction) can be used as a feedback loop to update extraction templates and update relevant licenses.
Monthly: Summary of the number of bots, critical incidents, lessons learned, monthly SLA reports, bot utilization, changes made to the process, and bot license reports (in case renewal is around the corner).
Quarterly: Executive summary of the overall trends, operation-level risks that need a change in the SLAs or OLAs, reports on upcoming bots for the next quarter, and ROC capacity planning.
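As an illustration of how the weekly utilization report can feed the scale-or-schedule decision, the sketch below computes utilization per bot and attaches a suggested action. The 40% and 85% thresholds, the function names, and the input shape are assumptions made for this example; each organization would set its own.

```python
def utilization(busy_hours, available_hours):
    """Fraction of the available time that the bot spent running processes."""
    if available_hours <= 0:
        raise ValueError("available_hours must be positive")
    return busy_hours / available_hours


def weekly_recommendations(usage, low=0.40, high=0.85):
    """Map each bot to (utilization, suggested action).

    `usage` is {bot_name: (busy_hours, available_hours)}.
    `low` and `high` are hypothetical thresholds, not recommended values.
    """
    report = {}
    for bot, (busy, available) in usage.items():
        rate = utilization(busy, available)
        if rate < low:
            action = "schedule more processes"
        elif rate > high:
            action = "consider scaling"
        else:
            action = "no change"
        report[bot] = (round(rate, 2), action)
    return report
```

For example, a bot that was busy for 30 of its 120 available hours comes out at 0.25 and is flagged as a candidate for additional scheduled processes.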
Conclusion
Most companies fail at the bot maintenance stage because bot monitoring is treated as an afterthought rather than a meticulously thought-out strategy. Devising a strategy for bot maintenance and governance early in the bot development lifecycle allows an organization to maintain its bots effectively, irrespective of the scale of the initiative, even with 500+ bots.