Everything you need to know about Windows logons in one blog series continues here!
I have threatened on several occasions now to do a follow-up to my previous article on Windows logon times which incorporates the findings from my “logon times masterclass” that I have presented at a few events. The time has come for me to turn these threats into reality, so this series of articles and accompanying videos will explore every trick we know on how to improve your Windows logon times. As many of you know, I work predominantly in Remote Desktop Session Host (RDSH) environments such as Citrix Virtual Apps and Desktops, VMware Horizon, Windows Virtual Desktop, Amazon Workspaces, Parallels RAS, and the like, so a lot of the optimizations discussed here will be aligned to those sorts of end-user computing areas…but even if you are managing a purely physical Windows estate, there should be plenty of material here for you to use. The aim of this is to provide a proper statistical breakdown of what differences optimizations can make to your key performance indicators such as logon time.
This series of articles is being sponsored by uberAgent, because the most important point to make about logon times is that if you can’t measure them effectively, then you will never be able to improve them! uberAgent is my current tool of choice for measuring not just logons (which it breaks down into handy sections that we are going to use widely during this series) but every other aspect of the user’s experience. All of the measurements in this series are going to be done via uberAgent, and as it comes with free, fully-featured community and consultants’ editions, there’s absolutely no reason that you can’t download it and start using it straight away to assess your own performance metrics. I’ve written plenty about uberAgent on this blog before, and I stand by it as the best monitoring tool out there for creating customized, granular, bespoke consoles that can be used right across the business. I’ve recently deployed it into my largest current client, so you can be sure I am putting my money where my mouth is – if it didn’t do the job, I wouldn’t have used it for my customers, simple as. It now features full Citrix Cloud integration and a “user experience” score to tell you where your users are having issues, so go and try uberAgent right now – you won’t regret it!
Part #10 – antivirus and security software
Now, let’s give you the penultimate part of this series by delving into the effects of antivirus and other security software on our logon time KPIs. There are stacks of these suites out there now, covering not just anti-malware and anti-spyware, but agents that can cover vulnerabilities, data loss prevention, screen captures, heuristic detection – you name it, you probably have to deploy it. Now’s not the time to get into another of my rants about how reactive security software is dead and the market is there for something different to provide security without the UX impact, and neither is it the time to start comparing the aggregate effects of different suites of security software. The simple question is – how much of an effect do security agents have on our logon times?
I’ve continued with a Server 2019 RDSH desktop and a Windows 10 21H1 VDI as my test beds, despite the fact that I’m finally moving towards Server 2022 because of the better MSIX integrations. With regard to the security software – I opted not to go for Windows Defender, even though it’s pretty common, because it has some auto-configuration features now. I had to grab some free ones, so I ended up with AVG on the Windows 10 instance and something called Immunet (which looks like some crazy fusion of Cisco AMP and ClamAV) on the RDSH instance (as finding free AV for servers is pretty tricky!). Bear in mind, we’re not here to bake off security software combinations against each other – just to get an idea of what kind of impact can be had.
I also used Citrix User Profile Management as the profile tool in a traditional configuration, because the performance of security software is often markedly worse against file-based profile management tools. Whatever security software and profile management you use, the guidelines are the same – take baselines, gauge the impact, and assess what remediation you need to take.
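To make the "take baselines, gauge the impact" step a bit more concrete, here is a minimal Python sketch of how two sets of logon-time samples might be compared. It assumes you have exported per-logon durations (in seconds) from your monitoring tool into plain lists; the function names are my own invention for illustration, and the numbers in the usage note are made-up placeholders, not measurements from these tests.

```python
from statistics import mean, median


def summarize(samples):
    """Summarize one scenario's logon durations (seconds)."""
    return {"runs": len(samples), "mean": mean(samples), "median": median(samples)}


def impact(baseline, candidate):
    """Change in mean logon time of `candidate` relative to `baseline`."""
    base = mean(baseline)
    delta = mean(candidate) - base
    # delta_s is the absolute change in seconds, delta_pct the relative change.
    return {"delta_s": delta, "delta_pct": 100.0 * delta / base}
```

For example, `impact([20.0, 22.0, 21.0], [30.0, 33.0, 31.5])` reports an increase of 10.5 seconds, or 50%, over the baseline mean. The median is included in `summarize` because a single slow logon can skew the mean quite badly on a small number of runs.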
Baselining
Let’s start off with no antivirus at all and take a baseline (yep, even Windows Defender is disabled). An interesting aside – Windows Defender will now actually delete any files or scripts that try to disable it, seeing them as a threat. I’ve had to put exclusions in place so that the scripts which turn off Windows Defender during my automated builds can run properly.
Here’s the data for Server 2019
And here we have it for Windows 10
Pretty bog-standard for a moderately-optimized build using Citrix UPM, I’m sure you’ll agree.
Antivirus installed
Now let’s crack on and install Immunet and AVG. Immunet actually does a bit of self-configuration to try and avoid conflicts and performance impact, but AVG seems to just scan everything by default.
Once we’ve got it running and have restarted to ensure it’s fully operational, let’s take a new set of baselines. Here are the results for Server 2019
And here is the data for Windows 10
Quite a marked increase – over 40 seconds for Server 2019, and nearly 32 seconds more for Windows 10. It’s interesting that despite Immunet doing some default exclusions, it still had a much more noticeable impact on the logon time than AVG did.
Antivirus installed – everywhere
Now, let’s further emulate an enterprise and also install our security software onto not just the Citrix Virtual Delivery Agents, but also the domain controllers and the file servers. A bit crazy to do this without any default exclusions (and I did get very worried about my domain controllers!), but let’s crack on and get the stats.
With antivirus active on all the components, here is the data for Server 2019
And here we have the stats for Windows 10
That’s quite brutal. You can see that the profile load time has more than doubled (more than trebled, on Windows 10!), and this is all because we are now scanning that file-based profile on both ends as it is loaded. So we can easily see that installing security software on the VDA has a big knock-on effect, but once you send it out to all the infrastructure components, then it becomes really awful in terms of logon times. In fact – not just logon times, we also saw both the session performance and logoff times take a great big hit too.
Antivirus exclusions
Of course, we need to dial back and go with some best practices, not just for the virtual delivery components but for all the infrastructure. Let’s start putting some exclusions in place.
Once we’ve appropriately configured the exclusions in line with best practices, we can now repeat the testing and see what difference this has made. Here’s the data for Server 2019
And here we see the final data for Windows 10 21H1
So, correctly configured security software does have an impact – which is more noticeable on Server 2019 than Windows 10, although this may just be because we had to use different vendors, to be fair – but nowhere near as bad as it can get if you don’t get your exclusions right. The object lesson here is that whatever security software you deploy, you need to baseline it, measure the impact, define the correct exclusions, and then test again.
For posterity, here’s a graph of the results showing all of the tests we did
The results are pretty uniform, which is good – we are seeing comparable effects from different products on different operating systems.
Summary
Security software is a necessary evil, but you need to get it configured correctly, particularly for virtualized environments. It will *always* have an impact – it’s up to us as administrators and architects to measure that impact, to mitigate it, and to ensure that we aren’t layering security products into environments just to tick boxes.
The main takeaway is that absolutely, utterly, you must get your exclusions right. Security software is generally intrusive, insisting on locking files and reading not just their contents but their capabilities, sometimes comparing their hashes and components to many different datasets. But on the flip side, you’ve got to be careful not to open yourself up to threats by being too lax (I once saw a blanket exclusion for c:\Users – I kid you not). Measuring exclusions against the results of pentesting should be a regular process. There are stacks of vendor guides out there with exclusion configurations that you will need to reference – Citrix’s guide is particularly comprehensive, and contains links to others you may also find useful.
Absolutely, totally, unmitigatedly – you need to get exclusions done properly.
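As a rough illustration of checking for the "too lax" end of the scale, here is a small Python sketch that flags exclusions covering a whole drive or top-level folder, like that blanket c:\Users one. The list of "too broad" roots and the function names are assumptions of mine for the example, not taken from any vendor guide, and it expects you to have exported your exclusion paths as plain strings.

```python
import ntpath

# Hypothetical "never exclude wholesale" roots; adjust for your own estate.
_BROAD_ROOTS = ["C:\\", "C:\\Users", "C:\\Windows", "C:\\Program Files", "C:\\ProgramData"]


def normalize(path):
    """Lower-case a Windows path and drop trailing wildcards and separators."""
    p = ntpath.normpath(path.strip()).lower()
    # "c:\users\*" excludes everything under c:\users, so treat them alike.
    while ntpath.basename(p) in ("*", "*.*"):
        p = ntpath.dirname(p)
    return p.rstrip("\\")


BROAD = {normalize(root) for root in _BROAD_ROOTS}


def flag_broad(exclusions):
    """Return the exclusions that cover an entire top-level folder or drive."""
    return [e for e in exclusions if normalize(e) in BROAD]
```

Passing in a list containing `C:\Users`, `c:\users\*` and a specific executable path would return only the first two. This only catches exact matches against the broad roots, so treat it as a first-pass sanity check rather than a replacement for reviewing exclusions against pentest results.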
As I said already, different configurations may see different results, although the general trend should be the same. Container-based profile management tools do not suffer as badly, so if you’re in an environment where you have to deal with lots of agents, you would probably want to adopt these to offset the impact. On the other hand, moving to lower-level or agentless security tooling may also allow you to offset these UX hits somewhat – and don’t forget, detection and logging are often more important than simply intercepting everything in-session.
Well, we are ten episodes into this series, and as I originally intended to spend about six months on it and we’re now up to a year and a half, it might be time to start bringing it to a close. The next (and hopefully final!) instalment will cover how much we can bring that logon time down if we optimize heavily, and also a full summary of the most important things you can do to try and generate good logon times for your users. Thanks for reading!