Technical debt and life cycle of the OS

Half of 2019, for me, was spent trying to move applications and servers to supported versions. A company launches a piece of software and supports it on different operating systems, but as time goes by, those operating systems reach their end of life and the software has to be moved to newer ones. You can still run your software on an unsupported operating system, but the problem arrives once you no longer receive patches for vulnerabilities that will be exploited. As the quote goes: “There are two types of companies: those who have been hacked, and those who don’t yet know they have been hacked.”

So in line with trying not to be hacked, I’ll share with you some of the pains, aha moments, how-tos and troubles that we experienced inside the Visma ITC - ASP squad when moving from one operating system to another. The goal of this document is to share knowledge and to help others who might be facing this issue. There are always better and sometimes faster ways of doing things.

At the beginning of 2019 we informed our teams, which run different software in different environments, that Windows Server 2008 and 2008 R2 would no longer be supported by Microsoft after 14th January 2020 and that they should move to a supported OS. Along with Windows Server 2008 and 2008 R2 there is also the case of Windows 7. Most companies have a few of these machines left that they never managed to upgrade, or there is the eternal story: “it will no longer be in use after month x of this year”. Usually this is not the case: the machine is needed for a few more months, then a few more, and there goes the year.

It is hard to identify who owns the machines, since some are only logged on to with local accounts. A lot of time was spent identifying the right team to talk to about each machine. Some teams use different OS versions to test their applications and don’t want to upgrade: they still support that version of Internet Explorer or of the OS and want to keep it until after the End of Support, as their customers will still use it. Some machines you just inherited from ages past, and there is little to no documentation about them. All you are told is that it is a critical machine and it has to run.

A project emerged with a generous time span to finish the task (around 9 months): upgrade your soon-to-be-obsolete Windows Server 2008/R2 and Windows 7 machines.

The task ahead looked manageable: hold discussions with the different teams that have software installed on the servers, see what can be done to get servers decommissioned, and consolidate the rest (some were very badly scaled while others were oversized and underused).

Problems appeared quite fast, as we kept getting one of the following answers:

1. We can’t do this right now, we are busy with a release, let’s talk in x months;
2. That server is critical, the person that installed the software has left the company, we don’t know how to reinstall it and it will be decommissioned in x months;
3. The version that we need isn’t compatible with Windows Server 2012 or higher;
4. Sure, let’s do this!

Except for the obvious one (the last), all of them took a lot of trial and error, and I’d like to share some of the experience gained during the journey from 800 Windows Server 2008 servers to 20 in about 13 months.

Let’s take the 1st answer: we can’t do this right now, let’s talk in x months

X months always turned out to be x+n, and the team was either busy, on vacation, or nobody had time to handle the request. This pushed us into the closing months of support with a few servers that proved to be critical for the customers. In order to get the teams to spend some time on the task at hand (fixing a potential security issue a few months away, replacing an obsolete operating system and fixing the lack of documentation) we had to introduce a “premium” for keeping 2008 servers past the due date set by Microsoft. They say nothing makes you assign people to a task like the idea of paying penalties.

The 2nd answer: That server is critical, the person that installed the software has left the company, we don't know how to reinstall it and it will be decommissioned in x months

We had to come up with odd working hours during which we were allowed to take the server down and try the upgrade, until we figured out that we can always clone the server and try on the clone; more about that later.

The 3rd answer: The version that we need isn’t compatible with Windows Server 2012 or higher

Well, that turned out to be the most interesting one, as we turned from system administrators into secretaries, project managers and mediators among different teams, and offered ways to upgrade or move their solutions to other tools. This usually had to do with Team Foundation Server or EPI solutions.

A deep dive into the actual process. If you are not a technical person, you can stop reading here.

What we found that worked best for us:

  • initiate contact with the team;
  • do an inventory of what is installed on the server;
  • verify (and save) the IP addresses (check for persistent routes also);
  • check whether the IP or hostname is hardcoded anywhere and therefore needs to stay the same after the upgrade/rebuild;
  • agree when a test can be performed (in-place upgrade) with a downtime of 1-7 hours (yes, we have had cases where the server took 7 hours to get the task done; this happened on web servers with a lot of installed programs);
  • get a test person from the team to verify once the server is on 2012 R2 (preferably before you have to spend ~3h patching it with 150+ patches) - sadly this didn’t really work, as without patches .NET and other apps didn’t work properly;
  • ask if we can go on from 2012 R2 to 2019, since upgrading and patching takes somewhere in the vicinity of 2h and you then have ~10 years of support left;
  • if you upgrade a clone and it works well, the prod one will still take longer or fail in some cases; this was interesting when we had asked for a small maintenance window and realized we were near the end of it with the server not yet on 2012 R2;
  • if you skip point 3 (saving the IP addresses), you may find that the server on 2012 R2 or 2019 has somehow lost its configured IP, you are in the dark and you need to dig some more to find the addresses;
  • if you work with VMware, sometimes VMware Tools (even if installed) will give you the same experience as not having them, so an uninstall and reinstall is in order; and then, to make it better, RDP to the server didn’t work, so learning to tab a lot in VMware is priceless;
  • change the network card to the latest type offered (VMXNET3);
  • update the VM compatibility (VM version) to the latest available;
  • change the Guest OS Version in VMware;
  • verify that SCCM (for those who use it) works OK, or re-install it;
  • make sure access is granted via groups and not via users added to the local admin/remote desktop users.
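
The IP-related steps in the list above lend themselves to a small script. What follows is only a minimal sketch, not the tooling we used: it assumes Python is available on the server and saves the output of `ipconfig /all` and `route print` (both standard Windows commands; `route print` also lists the persistent routes) to text files. The names `snapshot_network_config` and `run_command`, the file labels and the output folder are made up for illustration.

```python
import subprocess
from pathlib import Path

# Standard Windows commands; the labels are only used as file names (assumption).
COMMANDS = {
    "ipconfig": ["ipconfig", "/all"],
    "routes": ["route", "print"],  # output includes the "Persistent Routes" section
}


def run_command(args):
    """Run a command and return its text output (raises if the command fails)."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def snapshot_network_config(out_dir, runner=run_command):
    """Save the output of each command in COMMANDS to <out_dir>/<label>.txt.

    `runner` can be swapped out, e.g. to try the function away from the server.
    Returns the list of files written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for label, args in COMMANDS.items():
        target = out_path / f"{label}.txt"
        target.write_text(runner(args), encoding="utf-8")
        written.append(target)
    return written
```

On the server this could be called as `snapshot_network_config(r"D:\pre-upgrade")` (a hypothetical path) before the snapshot is taken; it is probably wise to keep a copy of the files somewhere off the server as well.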

Other fun things we have had happening in the course of the upgrades:

  • we take a snapshot before we start, and after the 2012 R2 upgrade there is not enough space left to upgrade to 2019; a disk cleanup is in order and that takes its sweet time, so make sure you have enough disk space, or consider shrinking/moving the paging file (for some extra disk space);
  • not cleaning up the users on the server and leaving an “Account Unknown” profile present will fail the upgrade; you find this out only after a lot of digging in the “Panther” folder, in case you didn’t already roll back to the snapshot and start again. More on the folders created here;
  • the users should not be deleted from the C:\Users folder but via Advanced System Settings > User Profiles;
  • for web servers, or servers that also have the IIS role installed, some configuration may need to be changed (the wording has changed between IIS versions). In the Event Viewer you will find:

 - The Windows Process Activation Service encountered an error trying to read configuration data from file '\\?\C:\WINDOWS\system32\inetsrv\config\applicationHost.config', line number 'xxxx'. The error message is: 'The configuration section 'system.web' cannot be read because it is missing a section declaration'.

 - The data field contains the error number.

[Screenshot: error in IIS]

The fix is to ask a developer to check the line indicated in the error message above (in the Event Viewer) and fix the wording (you might also comment it out, but do discuss that with the developer).
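
To hand the developer something concrete, it can help to print the lines around the one named in the event log message. This is a rough sketch under a few assumptions: Python is available, the config file is UTF-8 (possibly with a BOM), and the helper names `lines_around` and `format_context` are invented for this example.

```python
from pathlib import Path


def lines_around(path, line_number, radius=3):
    """Return (number, text) pairs for the lines surrounding `line_number`.

    Line numbers are 1-based, as in the event log message.
    """
    # utf-8-sig drops a leading BOM if the file has one (encoding is an assumption)
    lines = Path(path).read_text(encoding="utf-8-sig").splitlines()
    start = max(1, line_number - radius)
    end = min(len(lines), line_number + radius)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


def format_context(path, line_number, radius=3):
    """Render the context as text, marking the reported line with '>'."""
    rows = []
    for n, text in lines_around(path, line_number, radius):
        marker = ">" if n == line_number else " "
        rows.append(f"{marker} {n:5d}: {text}")
    return "\n".join(rows)
```

Calling `print(format_context(path, n))` with the path from the error message and `n` set to the reported line number shows the offending section together with its neighbours, which is usually enough to start the discussion.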

  • make sure you change the “Configure for:” field of any scheduled tasks to the OS your server now runs; you might also need to export and import them again (re-create them);
  • check what you have in “Log on as a batch job” and make sure the same accounts are there after the upgrade;
  • for VMware only: with the SCSI controller set to VMware Paravirtual or LSI Logic Parallel, the upgrade will start but will keep on rebooting, or give you a BSOD and loop there. The solution is to change the SCSI controller to LSI Logic SAS, but that will not work straight away: first add a 2nd SCSI controller, press OK, then add a 1 GB HDD (I am not sure this step is needed, but I always did it) and initialize the disk. Shut down the server, change the Paravirtual or Parallel controller to SAS and boot the machine. After it boots successfully, you can remove the 1 GB HDD and the 2nd SCSI controller. As an added bonus, if you do this while the Windows installation is looping, it will continue where it left off and most likely succeed with the upgrade.
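
For the disk space problem in the first item of the list above, a quick check before each upgrade step can save a failed attempt. The sketch below uses Python's standard `shutil.disk_usage`; the function names are invented here and the threshold is one you pick yourself, not a figure published by Microsoft.

```python
import shutil

GIB = 1024 ** 3  # bytes in one GiB


def free_gib(path):
    """Free space on the volume that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def enough_space_for_upgrade(path, required_gib):
    """True if the volume has at least `required_gib` GiB free.

    `required_gib` is a threshold you choose (an assumption, not an official number).
    """
    return free_gib(path) >= required_gib
```

For example, `enough_space_for_upgrade("C:\\", 30)` would tell you whether the system drive has 30 GiB free; the value 30 is only a placeholder to replace with whatever your own upgrades have needed.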

There is probably a lot more to say about this, but I didn’t want to bore you with it all. What I want to highlight is the amount of work needed for one machine upgrade (1-7 hours + 3 hours patching + testing + rollback if failed + investigation from the team on what happened + writing documentation + the unforeseen). If you multiply this by the ~800 machines that we had to do, you can see how a few months of your life will go into this task alone.
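
To put rough numbers on that multiplication, here is a back-of-the-envelope calculation. It is a sketch with loud assumptions: it counts only the upgrade and patching hours quoted above, treats every one of the ~800 machines as an in-place upgrade done once, and uses an 8-hour working day. Testing, rollbacks, investigation and documentation come on top.

```python
MACHINES = 800            # approximate number of machines from the text
UPGRADE_HOURS = (1, 7)    # in-place upgrade, best and worst case
PATCHING_HOURS = 3        # patching after the upgrade
HOURS_PER_DAY = 8         # assumption: one working day


def total_hours(machines, upgrade_hours, patching_hours):
    """Hours for upgrade plus patching across all machines."""
    return machines * (upgrade_hours + patching_hours)


low = total_hours(MACHINES, UPGRADE_HOURS[0], PATCHING_HOURS)
high = total_hours(MACHINES, UPGRADE_HOURS[1], PATCHING_HOURS)
print(f"{low} to {high} hours")  # 3200 to 8000 hours
print(f"{low // HOURS_PER_DAY} to {high // HOURS_PER_DAY} working days")  # 400 to 1000 working days
```

Even the low end is far more than one person can do in a few months, which is why this kind of project needs several people and an early start.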

On 10 October 2023, Windows Server 2012 R2 will reach the end of its extended support (more about this: https://support.microsoft.com/en-us/lifecycle/search/1163) and then, if you still have machines of this type in use, the above process has to start again. The good news is that going from 2012 R2 to 2019 usually takes 1-2 hours (except for some servers that have a lot of tools installed) and, thanks to Microsoft (which changed the way patching is done), you only have to install the current month’s patches. No more 150+ patches: just 2-4 updates and you are done.

The idea is to start upgrading to 2019 well before 2023, so you stay ahead of potential stress for you or your team and will not have to spend nights working on this when you could enjoy a good night’s sleep. I know I’d rather spend my nights sleeping than upgrading Windows. I spent 100+ hours on weekends on these upgrades. While it is fun, it is also demanding, and burnout can occur when you work Monday to Monday without time off from work.