Everything you need to know about Windows logons in one blog series continues here!
I have threatened on several occasions now to do a follow-up to my previous article on Windows logon times which incorporates the findings from my “logon times masterclass” that I have presented at a few events. The time has come for me to turn these threats into reality, so this series of articles and accompanying videos will explore every trick we know on how to improve your Windows logon times. As many of you know, I work predominantly in Remote Desktop Session Host (RDSH) environments such as Citrix Virtual Apps and Desktops, VMware Horizon, Windows Virtual Desktop, Amazon Workspaces, Parallels RAS, and the like, so a lot of the optimizations discussed here will be aligned to those sorts of end-user computing areas…but even if you are managing a purely physical Windows estate, there should be plenty of material here for you to use. The aim of this is to provide a proper statistical breakdown of what differences optimizations can make to your key performance indicators such as logon time.
This series of articles is being sponsored by uberAgent, because the most important point to make about logon times is that if you can’t measure them effectively, then you will never be able to improve them! uberAgent is my current tool of choice for measuring not just logons (which it breaks down into handy sections that we are going to use widely during this series) but every other aspect of the user’s experience. All of the measurements in this series are going to be done via uberAgent, and as it comes with free, fully-featured community and consultants’ editions, there’s absolutely no reason that you can’t download it and start using it straight away to assess your own performance metrics. I’ve written plenty about uberAgent on this blog before, and I stand by it as the best monitoring tool out there for creating customized, granular, bespoke consoles that can be used right across the business. I’ve recently deployed it into my largest current client, so you can be sure I am putting my money where my mouth is – if it didn’t do the job, I wouldn’t have used it for my customers, simple as. It now features full Citrix Cloud integration and a “user experience” score to tell you where your users are having issues, so go and try uberAgent right now – you won’t regret it!
Part #5 – user profiles
I’ve said many times that in my experience, the two biggest drags on user logon times are Group Policy and user profiles. We’ve already covered off Group Policy, so today we are going to enter the murky world of profiles.
We all know what a user profile is, right? Well if you don’t, go and read this article for some background first. The profile is loaded pretty early on in the logon process, so getting it done promptly is very beneficial.
We are testing today on some “fully-loaded” machines to give a proper impression of what a user profile is like. We’ve run Microsoft Teams to create the user cache, we’ve run up Outlook to synchronize the local OST file into the profile (three months’ cache as standard), and we’ve been browsing to a number of busy internet sites to spew out a load of browser garbage as well. There is a full set of applications (both natively installed and AppVentiX-delivered) on the target devices, and we’re testing on Server 2019 and Windows 10 21H1, both fully patched (because we build them like that). Given that I actually have a life outside of tech, we’ve only done five logons for each test instead of ten as previously, so please forgive me.
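With only five logons per test, it helps to look at the spread as well as the average. Here is a minimal Python sketch of that summary. The function name `summarize` and the sample values are purely illustrative assumptions; the numbers are invented placeholders and are not measurements from these tests.

```python
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, sample standard deviation), both rounded to 2 decimals."""
    return round(mean(samples), 2), round(stdev(samples), 2)


# Invented placeholder durations in seconds, NOT results from this article.
example_logons = [10.0, 12.0, 11.0, 13.0, 14.0]

average, spread = summarize(example_logons)
print(f"mean={average}s stdev={spread}s over {len(example_logons)} logons")
```

A small standard deviation relative to the mean suggests five runs are probably enough to compare solutions; a large one is a hint to run more logons before drawing conclusions.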
Let’s start with local profiles – so we will simply give our test users a local profile on the machines and measure the performance.
Obviously, local profiles aren’t very good for RDSH/Citrix/non-persistent environments because the profile only exists on the specific machine the user has accessed. But given that there are a lot of persistent environments out there, it makes sense to include them into the scope of our testing.
Because local profiles are exactly what they say on the tin – local to the device the user is using – it means there are very few dependencies when it comes to loading the profile. And not surprisingly, this is borne out by the data.
So we can see straight away that both the logon time and the profile load time for a local profile are blindingly fast: under 0.3 seconds for the profile load and 7-11 seconds for the actual logon itself. That’s pretty impressive.
It’s interesting that Windows 10 is noticeably faster to log on than Server 2019. However, as I’ve already mentioned, a lot of us simply can’t use local profiles because we’re not in persistent environments. If you are doing fully persistent, then you should be able to achieve stunning logon times using a local profile, but if you’re not – then you need to choose a different way.
The “old” Microsoft roaming profile was very popular back in the day, but now is very long in the tooth and suffers from many limitations. The main drawback is that a roaming profile only captures the %APPDATA% folder from the user profile, completely ignoring the %LOCALAPPDATA% folder and anything else that may be necessary. This can be overridden by using the ExcludeProfileDirs Registry value, but this is a messy and probably unsupported configuration to use. But in the interests of thoroughness, we’ve done the testing on an old-style roaming profile hosted on an SMB file share.
Wow. We can see a huge jump here, with Windows 10 jumping to a 65-second logon time and nearly 40 seconds to load the profile, and Server 2019 jumping to an average 50-second logon time and almost 25 seconds to load the profile. On top of this, using a basic roaming profile gave us awful problems with authentication to Office 365 and reset a whole host of user settings at every logon. Now what people always ask here is “would Folder Redirection make a difference to this?” Unfortunately, the answer is no, because in our testing we didn’t have files in any folders that you would traditionally redirect – there were no files on the users’ Desktop, for instance.
It’s clear that loading user data by copying it down to the endpoint in this fashion is grossly inefficient, so let’s not dwell on it too much and move on.
Citrix User Profile Management
I have to give a run out to Citrix UPM because obviously it’s very popular in environments that I work in. Let’s remember, also, that there are now two ways you can use UPM to manage a user profile – the old-school way, by storing the UPM profiles on a file share and loading them up (kind of like a roaming profile on steroids), but there’s also the UPM Profile Container feature that I talked about previously, where you can mount the entire profile into a VHD file. We’re going to test both.
Citrix UPM (file-based)
We set up UPM in a pretty standard fashion, without any streaming features or folder redirection. The profiles were hosted on the same SMB file share we used with the roaming profiles. Here are the results.
Again, a big uptick in both profile load time and overall logon time – the profile load averages 19 seconds on Windows 10 and 29 seconds on Server 2019, and the overall logon takes 48-50 seconds. Interestingly, Windows 10 is a lot quicker to load the profile, but only marginally quicker for the overall logon, which is pretty poor, all things considered. It seems that using UPM out-of-the-box in its old file-based format is not very good.
Citrix UPM – container-based
Let’s switch now to capturing the entire UPM profile as a VHD instead and repeat the same tests. The VHD is stored on the same file shares as the “traditional” UPM profile.
Now we can see a drastic improvement in both the profile load and logon time KPIs. Profile load is 6-8 seconds for both OSes, and total duration again is pretty close, but this time it has been reduced to 27-28 seconds, which is a marked improvement and proof that using containers is much better for profiles than a file-based solution.
FSLogix Profile Containers and/or Office Containers
Now that we’ve ascertained containers make a big difference, let’s switch to Microsoft’s FSLogix Profile Containers to do the tests. We will test a single Profile Container capturing everything, and a Profile Container with a second Office Container as well to capture the Teams and Outlook data separately. Again, we used the same SMB file share.
Profile Containers standalone
On their own, Profile Containers perform as below
Now here’s some interesting data – FSLogix Profile Containers load the user profile faster than a local profile (around 0.1 seconds on average). That’s incredible, and they also mount their container far faster than the equivalent Citrix UPM technology. However – they don’t log on as quickly as a local profile, so even though the initial load is done at a breathtaking pace, pulling the expanded data from the VHD on the file share seems to eat up some time. On Server 2019 we are averaging around 17 seconds, on Windows 10 around 23. That’s pretty passable for a non-persistent environment, but we’re still way off the speeds you would get with a local profile.
FSLogix Profile Containers and Office Containers
Let’s repeat this test with the Profile Container loading the Office cache data for Outlook and Teams into a separate Office Container. As before, both containers are stored on the same SMB file share.
Using dual containers makes the profile load take longer – but not really much that a user would notice, to be fair, increasing to about half a second. However – and particularly on Windows 10 – switching to dual containers puts a bit of a dent in the overall logon time, going up to 23 seconds for Server 2019 and a quite dramatic average of 36 seconds for Windows 10.
User Profile Disks
Whilst measuring the performance of container-based solutions, it wouldn’t be fair not to include Microsoft’s old User Profile Disks technology. Whilst this was primarily a Server feature, you can enable it on Windows 10 as well – as discussed in this article.
Again, we’ve simply stored the UPDs on the same SMB file share as previous tests.
User Profile Disks are a little erratic – occasionally they load the profile and log on in a similar time to FSLogix Profile Containers, but on the whole they are slower than the FSLogix option.
Now, I generally don’t recommend using mandatory profiles any more – see here for a discussion of how to recreate the benefits of a mandatory profile without using one – but I am aware that there are people who still do. Just for posterity, let’s take a Server 2019 and Windows 10 mandatory profile we have lying around and run them through the same tests. In these cases, we stored the mandatory profile local to the server, which is generally the way I would do it if I had to use it in the enterprise.
You can see that despite being stored local to the device, the profile load time of 8-10 seconds is still much higher than that of a container. Also, for some reason the total logon time is particularly bad on Windows 10 – over a minute, compared to 38 seconds for Server 2019. Now, this may also indicate that you would need to regenerate a mandatory profile every time a Windows 10 update is released – this mandatory profile was created on 1909 and run on 21H1, so that may be the reason for the dramatic uptick in overall logon time. However, I think these stats reinforce my feeling that using a mandatory profile is probably a legacy way of operation.
Ivanti User Workspace Manager with a mandatory profile (hybrid profile)
In my AppSense-blogging days, using a mandatory profile alongside the product (now called Ivanti UWM) was the preferred setup. Ivanti referred to this as “hybrid” profiles, with the mandatory profile providing the base and Ivanti Personalization Server layering in the settings. There are other technologies you can achieve the same ends with, such as Microsoft UE-V, but I chose to test Ivanti as I am quite familiar with it.
I set up Ivanti’s Personalization Server using the templated groups for applications and Windows settings, and ran the web and database services on a single server as you would in a PoC. The results are below.
Now it is unclear if uberAgent can understand whether the Ivanti Personalization Server loading settings counts as part of the “profile load” – I doubt very much that it would. However, it will add the time taken to the overall logon duration. The results aren’t great – averaging 40 seconds for a logon and 10 seconds to load the profile. Also, I didn’t test on Windows 10 as at this point I was seriously running short on time, but I would expect the results to be pretty much the same.
Now, what can we extrapolate from this data?
Firstly, local profiles are by far the fastest way to get your profile loaded and this contributes to a lightning-fast logon time, in general. But this assumes that the local profile is already on the device the user is accessing, and in non-persistent environments this is practically impossible. Naturally, this also means that roaming between devices or sessions is also not possible, so for many, local profiles aren’t a viable solution. If they are though – look no further.
Mandatory profiles I would avoid, for reasons which are better discussed in the article I linked in the mandatory profiles section. They’re not particularly efficient in terms of logons and have many limitations besides that.
Roaming profiles are to all intents and purposes dead for modern environments – they need to be hacked to even capture all of the required user data, and the file-based engine that they use means that they are very poor in terms of both profile load time and logon time.
Citrix UPM in file-based mode also performs pretty badly. Now I know there are many people who will suggest that you can tune file-based UPM to perform much better, and I agree – there are a lot of features and tricks within UPM that can speed things up greatly. However, we are simply looking at out-of-the-box performance here, and UPM does not offer much better than a roaming profile in terms of speed – although it does successfully capture all user settings, whereas a roaming profile does not.
Hybrid profiles, particularly when used with Ivanti UWM, again seem to perform pretty badly, and again many who use UWM will suggest that aggressive tuning can make that performance considerably better. I also agree with this, but stick to my insistence that we are going to measure it on out-of-the-box performance. I’m sure that using the UWM container features together with very streamlined EM PS configurations would produce a fantastic set of results, and I may see if we can do a study on it within this series, but for the here and now – out-of-the-box, it doesn’t give a great profile load or logon time.
Containers, right across the board, perform much better than old file-based profile management.
Citrix UPM Container seems to take the longest to load out of all the container-based methods tested. It also, interestingly, pumped out a 3.5GB container in total size.
User Profile Disks performed well on profile load times, but were quite inconsistent, and the overall logon times were a little poor. They also threw out a 3.5GB container.
FSLogix Profile Containers seemed to perform the best when used standalone – the load time was actually faster than a local profile, and logon time was on average between 17 and 23 seconds, which is pretty good for a networked profile solution. Again, very interestingly, the profile data that was taking up 3.5GB in both UPM and UPD was reduced to about 800MB when using Profile Containers.
Using FSLogix with dual containers had a slight uptick on the KPIs, and also a slight increase (up to 1.2GB) in the overall data consumed within the containers.
So in conclusion, local profiles perform by far the best, but if you need a centrally managed roaming profile solution, then FSLogix Profile Containers seem to be the most sensible choice.
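To put the comparison side by side, here is a small Python sketch that ranks the figures quoted in the text. It only includes the solutions where an explicit Server 2019 logon average was stated (the others were given as ranges across both operating systems), plus the container sizes from the summary. The dictionary names and the `rank` helper are illustrative assumptions, and the sizes are the approximate values quoted above.

```python
# Average Server 2019 logon times in seconds, as quoted in the article text.
# Solutions reported only as a range across both OSes are left out.
server_2019_logon_seconds = {
    "Roaming profile": 50,
    "FSLogix Profile Container": 17,
    "FSLogix Profile + Office Containers": 23,
    "Mandatory profile": 38,
    "Ivanti UWM hybrid profile": 40,
}

# Approximate container sizes in GB, as quoted in the summary.
container_size_gb = {
    "Citrix UPM container": 3.5,
    "User Profile Disk": 3.5,
    "FSLogix Profile Container": 0.8,
    "FSLogix Profile + Office Containers": 1.2,
}


def rank(values):
    """Return (name, value) pairs sorted from smallest to largest value."""
    return sorted(values.items(), key=lambda item: item[1])


print("Logon time on Server 2019 (fastest first):")
for name, seconds in rank(server_2019_logon_seconds):
    print(f"  {seconds:>3}s  {name}")

print("Container size (smallest first):")
for name, size in rank(container_size_gb):
    print(f"  {size:>4} GB  {name}")
```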
Improving performance further
One piece of data that was consistent was that when using an FSLogix Profile Container, even though the profile load time was incredibly quick, there was still a delay that increased the logon time. In uberAgent this showed mostly as “Shell Start”, and seemed to be caused by the expansion of the user’s files into their profile from the network location they were stored in.
It stands to reason that when storing profiles on the network, increasing the performance of the storage and the file server presenting the storage is a crucial way to improve that logon KPI. I consulted both the internet and fellow CTP Leee Jeffries, and made the following tweaks to the Registry on my Windows file server to try to improve the overall performance (all the below values are DWORD and decimal):
Higher concurrent limit to allow for higher bandwidth transfer
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\Smb2CreditsMax = 12288
Allow more I/O to queue in the storage subsystem – 2 additional threads
HKLM\System\CurrentControlSet\Control\Session Manager\Executive\AdditionalCriticalWorkerThreads = 20
Double the standard asynchronous commands
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\AsynchronousCredits = 1024
Allow more threads per I/O queue
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\MaxThreadsPerQueue = 40
Improve client latency
HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters\TreatHostAsStableStorage = 1
Also on the clients, I added this Registry value
HKLM\system\CurrentControlSet\Services\lanmanworkstation\parameters\DisableBandwidthThrottling = 1
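If you would rather import these values than type them in by hand, the following Python sketch renders them as .reg file text. It is only a sketch under a few assumptions: the names `SERVER_TWEAKS`, `CLIENT_TWEAKS` and `to_reg_file` are made up for illustration, and the script only produces text for you to review, it does not touch the Registry itself. Note that .reg files write DWORD values in hexadecimal, whereas the values listed above are decimal, so the conversion is done for you.

```python
# Server-side values from the list above: (key, value name, decimal DWORD).
SERVER_TWEAKS = [
    (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters",
     "Smb2CreditsMax", 12288),
    (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Executive",
     "AdditionalCriticalWorkerThreads", 20),
    (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters",
     "AsynchronousCredits", 1024),
    (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters",
     "MaxThreadsPerQueue", 40),
    (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters",
     "TreatHostAsStableStorage", 1),
]

# Client-side value from the list above.
CLIENT_TWEAKS = [
    (r"HKEY_LOCAL_MACHINE\system\CurrentControlSet\Services\lanmanworkstation\parameters",
     "DisableBandwidthThrottling", 1),
]


def to_reg_file(tweaks):
    """Render (key, name, decimal value) tuples as .reg file text."""
    by_key = {}
    for key, name, value in tweaks:
        # Group values under their key, keeping first-seen key order.
        by_key.setdefault(key, []).append((name, value))

    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in by_key.items():
        lines.append(f"[{key}]")
        for name, value in values:
            # .reg DWORDs are eight hexadecimal digits.
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\r\n".join(lines)


print(to_reg_file(SERVER_TWEAKS))
print(to_reg_file(CLIENT_TWEAKS))
```

Keeping the server and client lists separate mirrors where each value is applied. As with any Registry change, it is worth exporting the existing keys first and testing on a non-production file server.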
After applying these I re-ran the tests against FSLogix Profile Container on Server 2019, and the results are shown below
This is quite a marked improvement – profile load time is now an average of 0.08 seconds rather than 0.12 seconds, and average logon time is down from 17 seconds to 12 seconds. It is clear that when using a container-based solution from a networked file share like you would with FSLogix Profile Containers, tuning of your storage and the device that presents the file share is absolutely crucial. When done well – and I have no doubt I could do better on my own lab – we are already seeing logon times averaging 12 seconds, which for a networked profile solution is very good indeed. Especially when we consider that we aren’t applying any of the optimizations from other parts of this series – this is based on out-of-the-box profile management performance alone.
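As a quick sanity check on how much the tuning bought us, the drop in average logon time from 17 to 12 seconds can be expressed as a percentage. The helper name `percent_reduction` is just an illustrative assumption.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# Average Server 2019 logon time with FSLogix Profile Containers,
# before and after the file server tweaks (seconds, from the text above).
print(f"Logon time reduced by {percent_reduction(17, 12):.1f}%")
```

That works out to a little under a third off the logon time from file server tuning alone.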
So in summary, for those of you working in non-persistent environments, a container-based profile solution combined with fast, optimized file server storage is the best way to maintain good logon times. As I said previously, you can probably do much better with UPM or Ivanti than I did, and I may produce an article specifically dedicated to tuning those profile solutions within the later parts of this series. But for now – if you can’t use a local profile, then FSLogix Profile Containers looks like your best bet for good logon performance.
Stay tuned for part #6 of this series, where we will be taking a dive into the effects of Windows 10 UWP applications.
All I have to say is WOW! This took a significant amount of time to create, I really appreciate you creating this. Right now we are using redirect folders on RDS. I started looking into FSLogix last week, looks like I made the right choice!
James, nice blog man. Also impressive.
I am deploying FSLogix on VMware Horizon 7.13.1 with instant clones.
My login time is around 70 seconds. There is no folder redirection. I have a mapped drive H:. The Desktop, Documents, Downloads etc. locations are defined in the registry to go to the mapped H: drive instead of the user profile.
VHD profile size is 4 GB.
Below is the breakdown of each stage of the 70-second login.
Please wait for the FSLogix Apps Services: 14 seconds
Applying user settings: 5 seconds
Applying registry settings: 5 seconds
Applying Group Policy Registry policies: 6 seconds
Applying Group Policy Internet Settings policy: 6 seconds
Preparing Windows: 35 seconds
The last step, Preparing Windows, is taking very long, and even the first FSLogix step is taking almost 15 seconds.
Any clue how we can reduce the login time to 30 seconds at least?
Remove things step-by-step and see what happens. You say you’re not using Folder Redirection, yet you say Desktop, Documents and Downloads are redirected to the H: drive – this is classic Folder Redirection, CSE or not. If you take that redirection out, what is the logon time like?
Also, asynchronous policy processing would help here – your GPOs would go from 22 seconds down to about 5 seconds.
Same problem: without the default profile the login time was around 20 seconds, but now I have “Preparing Windows” for 30-35 seconds.
Hi James, I enabled asynchronous processing and now FSLogix fails with the error “The user profile failed to attach.” and I get signed out. It looks like FSLogix doesn’t work if Group Policy is asynchronous.
Any other suggestions?
I have done classic Folder Redirection, CSE, by changing ntuser.dat in the default profile and changing all the profile folders (e.g. Desktop, Documents etc.) to the mapped drive, so my desktop is now H:\Desktop and so on.
Am I doing something wrong from an architectural point of view?
I followed your YouTube tutorial for asynchronous Group Policy.
Ah – do I remember somewhere that the latest version of FSLogix forces synchronous mode? I think it did – not sure on the reason but I think that might be it.
I installed the latest version of FSLogix and the error is gone. There is not much improvement.
Now, from the FSLogix policy setting through the other Group Policies takes 20 seconds, and then Preparing Windows takes 30 seconds, so altogether it takes 50 seconds until the desktop shows.
Any other suggestions?
I can attach a video if it makes sense.
It looks like asynchronous mode is not getting applied and it is still synchronous. Unfortunately, in the master image the Event Viewer logs where we would see the Group Policy status were empty.
When I open the registry as a normal user I don’t see the subkey stat, but when I open the registry as administrator I do see the subkey stat and the corresponding values. I added these values to the default ntuser.dat in the master image.
Is there any other way we can see whether my policies are asynchronous? I still see during login that all registries and policies are getting applied.
Unfortunately the event viewer is the only place I know. Any idea why the event log is empty? That doesn’t sound good…
I have applied the VMware OS Optimization Tool, which is very aggressive, and I don’t know which setting has disabled it. But to me it looks like FSLogix forces it to go synchronous.
Maybe someone else has FSLogix on Windows 21H1.
I will get a look at it hopefully over the next few days and see what gives.
I really appreciate your efforts James.
Facing the same issue here. After everything is loaded, Windows 10 shows a blank screen for 30 seconds; without this the login time is sub-30 seconds. Were you able to find a solution for this?
Personally I now only recommend using FSLogix version 2.9.7349.30108. I don’t install anything newer. I’d give that a try and see if it makes a difference. That is the last version that could actually handle asynchronous mode.