The mysterious case of Azure Backup Agent not running its schedule

This blog post covers a real-life issue that I ran into while migrating virtual servers. To set the scene, I will start with some background information.

Background information

The case starts with a migration of an existing virtual environment. The goal of the customer was to leave their current solutions provider and transfer server management to us.

Due to the time constraint for this migration, we chose to migrate the servers as-is and work from there.

We received the exported machines from the solutions provider and successfully activated them on a physical virtualization platform. Some of the virtual servers still ran Windows Server 2008 R2 and Windows Server 2012 R2.

This meant that the virtual servers were not built from scratch. We had no idea what the history of the systems was, or whether they had had problems in the past with updates, features or other software.

Enter backups

One of our first priorities after we successfully migrated and activated the servers and their services was to set up the backups.

We started with a brief inventory of the installed applications and requirements. Based on the applications, we did not have the need to make stateful backups of SQL Server databases, Exchange Database Availability Groups (DAGs) or other specific applications or application data. We concluded we only needed file/folder and system state backups.

The backups needed to be stored off-site. We also needed to be able to restore the systems on the physical virtualization platform.

Azure Backup to the rescue!

Based on the above information we chose to use the Azure Backup agent, also known as the Microsoft Azure Recovery Services (MARS) agent, without installing Azure Backup Server. This way, the backups are stored directly in a Recovery Services vault in Microsoft Azure.

What we did

We followed the procedure documented by Microsoft: we created the Microsoft Azure Recovery Services vault and a vault key to be used during the installation.

The installation of the agent went without a problem and the server had already been configured with the prerequisite software. We provided the registration information and that worked without any errors or problems, too.

After the successful registration, it was time to configure the backups: one schedule for the files and folders, and a separate schedule for the System State backup.

So far so good. We had a backup solution and multiple backup schedules.

What happened next…

After a national holiday, we checked the servers for errors and whether the backup schedule had run.

On one of the Windows Server 2008 R2 servers, there was no reference to a backup/recovery point. It looked like the schedule wasn’t activated or hadn’t run. We found no errors in the Event Viewer or in the application log.

What I did notice was that there were no references at all to the backup jobs in the Event Viewer log. To verify that the application worked correctly on the server, we chose to start a manual backup. This backup completed successfully without any errors.

We decided to wait one more night for the backup schedule to pick up its routine. The next day we checked the backup logs again, and no luck. The backup job still didn’t run at its scheduled time.

“Why won’t it just work?”

During the initial part of my investigation, I focused on the configuration of the job schedule itself. I examined the two configured jobs, and I thought I had found the issue. The action configured for each job is a PowerShell command that kicks off the backup job, based on its job GUID.

An example is shown below:

BLOG-MARS-001

BLOG-MARS-002
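For readers who prefer the command line to the Task Scheduler GUI, the action of a scheduled task can also be dumped as XML. The following is a minimal sketch, not the exact steps taken in this case. The task name is an assumption and should be verified in Task Scheduler on the server in question.

```powershell
# Assumed task name; verify it in Task Scheduler before running this.
$taskName = '\Microsoft\OnlineBackup\Microsoft-OnlineBackup'

# schtasks.exe ships with Windows, so this also works on Server 2008 R2,
# where the ScheduledTasks PowerShell module is not available.
$lines = schtasks.exe /Query /TN $taskName /XML
$task  = [xml]($lines -join "`n")

# Show the program and the argument line that the task starts.
$task.Task.Actions.Exec | Select-Object Command, Arguments
```

The Arguments value is the parameter line discussed in this post, so it can be inspected or compared between servers without opening the GUI.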

The first thing I noticed was that the parameter line didn’t end with a closing quotation mark. In normal PowerShell, if you start a string with a quotation mark, you need to close it with one as well.

This was not the case. I manually added the closing quotation mark to the parameter line and started the backup through the Task Scheduler interface. But, same result… The job wasn’t started, nor was it shown in the GUI as failed.

Getting to the bottom of things

So, I changed the line back to its original state and decided to create some test VMs. This way I could check the functionality on different operating systems. On every test VM, the action line looked the same, missing the closing quotation mark, but the schedules were starting and performing their configured tasks. So, the first conclusion was that the missing quotation mark wasn’t the cause of the issue.

The second conclusion from this was that the Task Scheduler input isn’t affected by the missing quotation mark. If you run the command line yourself in PowerShell, you do need to add the closing quotation mark to start the job.

My next step was to run the PowerShell command manually, in my administrator session and in a newly opened PowerShell console. With the closing quotation mark, of course. And to my surprise, the job actually started.
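A manual start from an elevated PowerShell console can be sketched as follows. This is not the exact line from the scheduled task, which refers to the job by its GUID. It is a variant that starts a backup from the policy configured on the server, using cmdlets from the agent's MSOnlineBackup module.

```powershell
# Load the module that ships with the Azure Backup agent.
Import-Module MSOnlineBackup

# Start a backup based on the backup policy configured on this server.
# Note: the scheduled task itself refers to the job by its GUID instead.
Get-OBPolicy | Start-OBBackup
```

If Import-Module fails here with a "module not found" style error, the session cannot see the agent's module directory, which is the same symptom the scheduled task turned out to have.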

I cancelled the backup job and began focusing on the PowerShell module that the command line loads first: Import-Module MSOnlineBackup;

BLOG-MARS-002

I looked up the location of the PowerShell module on the server. It is stored in: C:\Program Files\Microsoft Azure Recovery Services Agent\bin\Modules

BLOG-MARS-003

I chose to copy the MSOnlineBackup folder to the following location: C:\Windows\System32\WindowsPowerShell\v1.0\Modules

BLOG-MARS-004

The reason for this is that PowerShell searches a set of predefined folders for the modules named in an Import-Module command. In addition, PowerShell 3.0 and later, as found on an up-to-date Windows Server 2012 R2 or higher, automatically loads modules from these locations the first time one of their commands is used.
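The folders PowerShell searches are listed in the PSModulePath environment variable, so the lookup can be checked directly. Below is a minimal sketch. Test-ModulePathEntry is a hypothetical helper written for this example, not part of PowerShell or the agent.

```powershell
# Returns $true when $ModuleDir is one of the entries in a
# PSModulePath-style string (entries are separated by semicolons on Windows).
function Test-ModulePathEntry {
    param(
        [string]$PathValue,
        [string]$ModuleDir
    )
    $wanted = $ModuleDir.TrimEnd('\')
    foreach ($entry in ($PathValue -split ';')) {
        # -eq compares strings case-insensitively, like Windows paths.
        if ($entry.TrimEnd('\') -eq $wanted) { return $true }
    }
    return $false
}

$agentModules = 'C:\Program Files\Microsoft Azure Recovery Services Agent\bin\Modules'

# Directories PowerShell searches in the current session.
$env:PSModulePath -split ';'

# Is the agent's module directory among them?
Test-ModulePathEntry -PathValue $env:PSModulePath -ModuleDir $agentModules

# Can PowerShell find the module by name from this session?
Get-Module -ListAvailable -Name MSOnlineBackup
```

Keep in mind that this shows the value for the account running the console. A scheduled task running as another account can see a different value.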

When the folder was copied, I tried the predefined schedule again. The result was that the backup job was started and visible in the GUI. After this result, we waited two days and the scheduled backups started and completed successfully.

The Root Cause

The root cause of the problem was that the SYSTEM account couldn’t load/import the MSOnlineBackup module when the command ran from the Task Scheduler. After I copied the module to one of the default module folders, it could. The failure wasn’t reported in any log on the system.

Double-checking my assumptions

To check this assumption, I created my own scheduled task, running under the NT AUTHORITY\SYSTEM account, to export the value of its $Env:PSModulePath to a text file.
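Such a test task can be sketched with schtasks.exe, which is also available on Windows Server 2008 R2. The task name, the C:\Temp folder and the file names below are hypothetical and were chosen for this example only. Run it from an elevated console.

```powershell
# Hypothetical names and paths, chosen for this example only.
$taskName   = 'Dump-PSModulePath'
$workDir    = 'C:\Temp'
$scriptPath = Join-Path $workDir 'Dump-PSModulePath.ps1'

# Create the working folder if needed (a path without spaces keeps the quoting simple).
New-Item -ItemType Directory -Path $workDir -Force | Out-Null

# The script the task will run: write one module directory per line to a text file.
$script = '$env:PSModulePath -split '';'' | Out-File -FilePath C:\Temp\psmodulepath-system.txt'
Set-Content -Path $scriptPath -Value $script

# Register a one-off task that runs as SYSTEM, start it right away, then remove it.
# schtasks may warn that the start time lies in the past; the task is started manually anyway.
$action = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $scriptPath"
schtasks.exe /Create /TN $taskName /RU SYSTEM /SC ONCE /ST 00:00 /TR $action /F
schtasks.exe /Run /TN $taskName
Start-Sleep -Seconds 10
schtasks.exe /Delete /TN $taskName /F

# Show what the SYSTEM account saw.
Get-Content C:\Temp\psmodulepath-system.txt
```

Writing the command to a script file and passing it with -File avoids nested quotation marks in the /TR argument.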

BLOG-MARS-005

The text file showed that only C:\Windows\System32\WindowsPowerShell\v1.0\Modules was listed as a source directory, while for the administrator account multiple folders were specified, including the C:\Program Files\Microsoft Azure Recovery Services Agent\bin\Modules folder.
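The difference between the two accounts can also be computed instead of read by eye. The sketch below assumes the SYSTEM account's value was saved to a text file. The file path and the Get-MissingPathEntry helper are hypothetical names used for illustration.

```powershell
# Returns the entries of $Reference that do not occur in $Other.
# Both arguments are semicolon-separated path lists, as in PSModulePath.
function Get-MissingPathEntry {
    param(
        [string]$Reference,
        [string]$Other
    )
    $otherEntries = @($Other -split ';' |
        Where-Object { $_ } |
        ForEach-Object { $_.TrimEnd('\') })

    $Reference -split ';' |
        Where-Object { $_ } |
        ForEach-Object { $_.TrimEnd('\') } |
        Where-Object { $otherEntries -notcontains $_ }
}

# Hypothetical dump file holding the SYSTEM account's value,
# either as the raw value on one line or as one entry per line.
$systemDump = 'C:\Temp\psmodulepath-system.txt'
if (Test-Path $systemDump) {
    $systemValue = (Get-Content $systemDump) -join ';'

    # Folders the current session searches that SYSTEM does not.
    Get-MissingPathEntry -Reference $env:PSModulePath -Other $systemValue
}
```

In the situation described here, the agent's Modules folder would be expected to appear in that output when run from the administrator session.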

Concluding

In this case, the root cause of the problem was the absence of the C:\Program Files\Microsoft Azure Recovery Services Agent\bin\Modules directory from $Env:PSModulePath in the SYSTEM account context. I ran the same scheduled task on my test virtual machine, and there the result listed multiple locations, including the one for the backup agent.

I hope this was useful and educational for future problem analysis.