Published on 2023-01-25

Sky - Support & Service Management: Onboarding

Quickly supporting a major client

Looking down a data centre row of racks towards the lit door

Background

Sky engaged Nexteam to provide incident management support across their Sky.com website. The work focused on handling incidents raised by their monitoring platforms and other teams, as well as proactively improving their monitoring systems and wider support processes.

The initial goal was to transfer the service over while maintaining the same level of service, so that engineering teams could focus on new products. This transition was under substantial time pressure. Additionally, we were to review their existing processes and make recommendations with the aim of transitioning to an SRE model.

Discovery

When we start anything new we always follow our Product Factory process. We believe it’s important that we take a little time to understand existing systems, processes, people and culture.

We allocated two weeks for this, which we spent with the people who were most familiar with the existing processes. It was important for us not to jump straight into the low-level detail and start looking at incidents. We stuck to our principles, starting at a high level and drilling down layer by layer. This gave us a good foundation.

The output of this discovery phase was mostly simple visual documentation that quickly explained complex processes, systems and teams. This made the information easy to digest, and we regularly referred to it during the initial handover.

Onboarding

We formulated a six-week plan to onboard our team: two weeks of handover, two weeks of shadowing the existing team and finally two weeks with the existing team shadowing us. We grew our team in a staggered manner as part of this process, so that we could bring in the best people possible while balancing availability and keeping costs to a minimum during the handover period.

To ensure knowledge was captured and available for future reference, all of the handover sessions were held as recorded video calls, which proved useful later on. We also produced a skeleton runbook for each service we were going to support, containing the basic information required to support it.

We also documented the prioritisation matrix and mapped the relationship from monitored services in Dynatrace, through ServiceNow, to the engineering squads. This helped us handle incidents efficiently and engage with the engineering squads with the least disturbance possible.
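As a rough illustration of what such a mapping can look like when written down as data, here is a minimal sketch. Everything in it is an assumption made for the example: the impact and urgency levels, the P1 to P4 labels, the service names, the assignment groups and the squad names are all hypothetical and are not taken from Sky's actual matrix, Dynatrace setup or ServiceNow configuration.

```python
from dataclasses import dataclass

# Hypothetical impact x urgency prioritisation matrix.
# The real matrix used on the engagement is not described in this article.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("high", "low"): "P3",
    ("medium", "high"): "P2",
    ("medium", "medium"): "P3",
    ("medium", "low"): "P4",
    ("low", "high"): "P3",
    ("low", "medium"): "P4",
    ("low", "low"): "P4",
}


@dataclass(frozen=True)
class ServiceRoute:
    assignment_group: str  # hypothetical ticketing assignment group
    squad: str             # hypothetical owning engineering squad


# Hypothetical mapping: monitored service -> assignment group -> squad.
SERVICE_ROUTES = {
    "checkout-web": ServiceRoute("web-checkout-support", "Checkout squad"),
    "account-login": ServiceRoute("web-identity-support", "Identity squad"),
}


def triage(service: str, impact: str, urgency: str) -> dict:
    """Return the priority and routing for an alert on a monitored service."""
    route = SERVICE_ROUTES.get(service)
    return {
        "priority": PRIORITY_MATRIX[(impact, urgency)],
        # Unmapped services are flagged so the gap can be documented.
        "assignment_group": route.assignment_group if route else "unmapped",
        "squad": route.squad if route else "unmapped",
    }


result = triage("checkout-web", "high", "medium")
print(result["priority"], result["squad"])
```

Keeping the matrix and the routes as plain data means a gap, such as a monitored service with no owning squad, shows up as an explicit "unmapped" result rather than being discovered in the middle of an incident.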

Growing pains

When you go into a new organisation there is a very steep learning curve. There was a lot of domain knowledge, along with ad hoc processes and knowledge held only in people’s heads, and all of this takes time to transfer. This meant the team was managing major incidents while still learning new systems and processes.

When we started handling incidents on our own, with no shadowing from the existing team, a few processes weren’t followed as they should have been. This also exposed previously unknown sub-processes, gaps in our knowledge and a lack of documentation.

To bridge the gap and ensure all incidents were managed to the expected level of quality, we identified areas where we needed to strengthen our knowledge, documented undocumented processes and created process maps. As we went through this we quickly understood that there were a few unwritten processes, combined with a lot of knowledge concentrated in a small group of people. Oddly enough, some of the existing documentation was not up to date.

As an output of this process we mapped out roles and responsibilities, the service map, customer journeys, the incident management process and communication plans. As we built up our domain knowledge, and whenever something was missed, we updated our documents and processes to reflect the current state. This ensured we were continuously improving our service while also bridging the domain knowledge gap.

Lessons Learnt

At the start of this journey we were overly optimistic about the time it would take us to fully transition the service. There were two key lessons for us from this onboarding experience, both of which we would approach differently next time.

People

When people are coming into a new organisation with no domain knowledge and no continuity of people with domain knowledge, you need to give them time and space to absorb the information and organise it into their own context.

Process

Rather than a gradual transition, it's better to transition everything over in a short period of time and to onboard the team in one go, rather than in a staggered manner. This means there is one steep learning curve, and the rest is learnt by doing.