When you work at a hypergrowth company, any system you put in place gets outdated really fast. Sometimes it doesn't even last a full year. New business use-cases need to be supported, systems aren't scaling fast enough, on-call is a nightmare, downtime is costing money: these are just a few indicators that the old system is insufficient. Solution?
Build a new system, and migrate to it.
Note: My experience (and this post) is limited to migrating companies' internal platforms, services, and infrastructure, such as migrating from Hive to BigQuery, VMs to k8s, Wavefront to M3, and similar migrations of things used by internal users at the company. And, thankfully, not the IT system migrations that governments continuously outsource.
Migraine-tion 🤕
If you have been involved in any big system migration, you may have already started cringing when reading the word migration. I've done so many that I've started calling them 'Migraine-tion'.
It doesn't matter how you migrate; customers (internal users) are never happy. Rarely because they hate the new system. Often because they hate the change.
A new system means new interfaces, new APIs, new SLAs, new dependencies, and most importantly, new knowledge that the customers need to learn when interacting with and operating the new system. Migrating to a new system is additional work that the users did not plan to do but are now forced to do. And if the migration news comes as a surprise, they will hate you even more.
As I reflect on every migration I've done or been part of (as a customer), the ones I consider successful from a migraine POV had one thing in common. The migration plan was over-communicated. I honestly think the plan may have been communicated to us more than 7 times.
In addition to the frequency of communication, the content of the communication mattered as well. The content, in this case, is the migration plan.
The Migration Plan 📝
Create one and share it. Duh!!
I know it's obvious, but trust me when I say this: rarely have I seen a migration plan from other teams. Have you? Let me know in the comments.
Create a migration plan. Write one for the customers and one for yourself. The content remains the same, but the level of detail varies. Here are a few things to think about and to include in the plan:
🆕 Reason for the new system (Optional): Even though this is optional, I prefer to include it. Tell the customers why the new system is required. Highlight the advantages, but please keep this brief (1-2 lines max).
❓ Why the customer should migrate: Highlighting the new system's advantages doesn't automatically explain why the customer should migrate. Also, the wording of the "why" should be carefully crafted. I've seen better adoption when I've framed it as what customers will miss if they don't migrate rather than what they will get when they do.
⏰ Migration timelines: Specific dates, please. They can change. But add dates. "Sometime this quarter" or "the last week of the month" is not a date. Don't keep customers guessing.
🌏 Whole system vs. Subsystem migration: Decide if you will migrate the entire system at once or one subsystem at a time. Depending upon the system, both are valid migration strategies. Also highlight the possible impact of each type of migration, such as downtime.
⏳ Migration duration: How long is the entire migration going to take? Short (~1 week) or longer (3-4 weeks)? If the migration takes longer than 1 week, break it down into subsystem migrations.
🧤 White glove vs. self migration: Some systems will need white-glove migration, while others can be self-migrated by customers or be hybrid. Whichever it is, communicate that as well. If there are a few critical customers, do white-glove migration for them. You don't want them to fail.
🙇‍♂️ Migration support: Migration will likely cause issues. Even if it doesn't (it will), customers need a specific place AND a specific person to reach out to. Create a migration room or a dedicated Slack channel, and assign a person to be the POC during migration. The only thing more frustrating than migration is not knowing who or where to reach out to when (not if) a problem arises.
👨‍🏫 New system learnings/education: New systems will change the customer's experience in some way. Hopefully for the better. Identify if there is a need to do a brown-bag session or office hours or knowledge transfer session that enables easier onboarding for customers. Documentation is your friend and will save you so much support time.
🧯 Contingency Plan: Murphy’s law and migrations got married a long time back. If you don’t have a contingency plan, don’t migrate. Be prepared to either roll back or roll forward. Decide what, when and how you will do it because you will need to do it. The lack of a contingency plan causes customers massive anxiety and, surprisingly, is the most overlooked thing during migration.
💀 Old system deprecation: As and when you migrate to the new system, you need to deprecate the old one. Some people will still stick to the old system. Not because it's better but because it's familiar. Specify the plan and timelines when the old system gets deprecated.
🧠 Migration retrospective: Migration is complete, and it's time to get a pat on the back. While you are doing that, perform a retrospective while the information is still fresh. I 100% guarantee that there will be specific learnings from the migration that will come in handy for future migrations. (This is how I formed this list.)
Depending on the system's size, complexity, usage, upstream and downstream dependencies, your migration plan may vary. Thus a migration plan must be updated, shared, and communicated every step of the way. Always err on the side of over-communication.
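If you like checklists in code, the plan items above could be sketched as a small Python structure that tells you what is still missing before you share the plan. This is only an illustration: the class name, the field names, and the two helper functions are my own assumptions, not part of any tool, and the 7-day threshold simply mirrors the "longer than 1 week" rule of thumb from the list.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class MigrationPlan:
    """Hypothetical checklist mirroring the plan items listed above."""
    why_migrate: str = ""
    start_date: Optional[date] = None       # specific dates, not "this quarter"
    end_date: Optional[date] = None
    scope: str = ""                         # "whole system" or "subsystem"
    mode: str = ""                          # "white glove", "self", or "hybrid"
    support_channel: str = ""               # the specific place
    support_poc: str = ""                   # the specific person
    training: str = ""                      # brown bags, office hours, docs
    contingency: str = ""                   # roll back or roll forward
    deprecation_date: Optional[date] = None
    reason_for_new_system: str = ""         # the only optional item


OPTIONAL_FIELDS = {"reason_for_new_system"}


def missing_items(plan: MigrationPlan) -> List[str]:
    """Return the names of required plan items that are still empty."""
    return [
        f.name
        for f in fields(plan)
        if f.name not in OPTIONAL_FIELDS and not getattr(plan, f.name)
    ]


def needs_subsystem_split(plan: MigrationPlan) -> bool:
    """True when the planned migration runs longer than one week."""
    if plan.start_date is None or plan.end_date is None:
        return False
    return (plan.end_date - plan.start_date).days > 7


# Example with made-up values: a draft plan that is not ready to share yet.
draft = MigrationPlan(
    why_migrate="Jobs on the old cluster stop getting support",
    start_date=date(2024, 1, 8),
    end_date=date(2024, 1, 29),
)
print(missing_items(draft))
print(needs_subsystem_split(draft))
```

The point is not the code itself but the habit: if any required item is empty, the plan is not ready to be communicated.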
The golden words
"We migrated the entire system for all customers in one day and have not found any issues. Please continue business as usual".
Migration Plan Change 🙀
Now, are you wondering what happens if the migration plan changes? Won't that give a bad impression? What will my customers think?
In business, things change all the time. If the plan changes by an order of magnitude, yes, people may wonder why. But a change of a few weeks doesn't bother anyone. From the users' POV, it is better to have a changing plan than to have no plan. Customers are not worried about the plan changing; they are worried that there may not be a plan. This will sound cheeky, but this adage is true:
"If you fail to plan, you are planning to fail" - Benjamin Franklin
When the plan changes, share the updates with your customers. Remember to OVER-COMMUNICATE 📢 📢 📢 📢 📢 📢 📢
Cost of Migration 💸
If you are the tech lead/manager/exec who is leading or involved in the migration, you want to be wary of the migration cost. Migration cost gets overlooked all the time. And it shouldn’t. In fact, once you know the migration cost, you can decide what’s the best time to migrate for your organization.
The new systems are often better and sometimes cheaper to run and scale. But if we are not careful, the cost of migration can get huge. In my view, the actual cost of the new system is
Cost(new system) = Cost-to-run(new system) + Cost-to-migrate-from(old system)
This means that a longer migration leads to higher costs. To understand the migration cost, let’s look at the various costs associated with it.
💸 Dual infrastructure cost: Most migrations at scale involve some form of gradual load transfer from the old system to the new one. During the entire period, both systems will be running at roughly full scale. Longer migrations on big systems can rack up this cost into the millions. And no one wants to double spend millions of dollars.
👨‍💻 Migration support cost: Support is hard, and during migration, it's a nightmare. If you disagree, volunteer to be the POC for the next migration. Support people are soldiers taking fire. Longer migrations lead to burnout and fatigue.
👨‍🏫 New system training cost: Big system changes require lots of training. The cost is usually spread over several weeks/months during and following migration. It's tough to track. Good documentation can help manage this cost.
💥 Loss of productivity cost: Engineers who are migrating are not building new things. Longer migration means less output.
😞 Unhappy customers cost: Even though we can't control how sad they will be, we can control how long they will be sad. Prolonged unhappiness can lead to degraded trust in the new system and a lack of adoption.
😰 Customer migration fatigue and productivity cost: How many migrations can customers handle every quarter/half/year? In my experience, it's 2 per half. The bigger the migration, the lower the number. Also when customers are migrating, they are not building. There is a loss of customer productivity there as well.
In short, long migrations are bad for everyone. They should be done like ripping off a bandaid.
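To make the cost formula above a bit more concrete, here is a rough Python sketch of how the migration cost grows with duration. It only covers the costs that are easy to put a number on (dual infrastructure, the engineers doing the migration, and training); the function names, parameters, and all dollar figures are made-up assumptions for illustration, not real data.

```python
def migration_cost(
    weeks: int,
    old_system_weekly_cost: int,
    migration_engineers: int,
    engineer_weekly_cost: int,
    training_cost: int = 0,
) -> int:
    """Rough cost-to-migrate estimate under simplified assumptions."""
    # Dual infrastructure: the old system keeps running next to the new one.
    dual_infra = weeks * old_system_weekly_cost
    # Engineers who are migrating (and supporting) are not building new things.
    people = weeks * migration_engineers * engineer_weekly_cost
    return dual_infra + people + training_cost


def new_system_cost(cost_to_run: int, cost_to_migrate: int) -> int:
    """Cost(new system) = cost-to-run + cost-to-migrate-from the old system."""
    return cost_to_run + cost_to_migrate


# Hypothetical numbers: same team and same systems, 1 week vs. 4 weeks.
short = migration_cost(1, 50_000, 4, 5_000)
long_ = migration_cost(4, 50_000, 4, 5_000)
print(short, long_)
```

With these invented numbers, the four-week migration costs four times as much as the one-week one, and that is before counting unhappy customers and migration fatigue, which are much harder to price.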
Migration Trip (Analogy)
I think of migration as a trip you take from A-to-B. You are sad to leave place A and somewhat excited to get to B. But if the flight takes more time than expected, especially 2-5x, then you can guarantee the customer will be sad for a longer time. Get them out of the misery. Migrate fast.
Shortening Long Migrations
Here are some tips on shortening the migration process:
Create the migration plan to identify the gaps that could result in a long migration. Fix them and over-communicate as well.
Build the necessary tooling to expedite and automate as much of the migration as possible. Over-communicate this as well.
Don't peanut-butter-spread resources on migration. Migration is a project in itself. Treat it like one and staff it properly. It will prevent burnout. Over-communicate about the staffing.
And finally, did I mention OVER-COMMUNICATE!!!!
Migration battle scars
Migrations always leave battle scars. There is no way to avoid them. So enjoy them. Have fun when migrating big systems. Migrations are some of the best and worst times I've had at work. But after every migration, I've built a stronger bond with my colleagues as we all survived the storm together as a team.
Every migration, successful or not, yields a great story for you to share over a happy hour!!! 🍻
If you know a colleague who is going through a system migration, please share this post with them. It may just save them some migraine.