There has been a lot of noise about this issue lately, so hopefully this will help some people out!
So, a few weeks ago I remember seeing a lot of chatter in Slack and on the Microsoft forums about FSLogix Profile Containers customers having issues with intermittent application of user group policies. A lot of people seemed to report that the problem existed on the latest version, and moving down to an older version (7217 if I recall correctly) fixed it for some. Others reported seeing the problem whichever version they used, and on whichever Windows operating system they ran.
Naturally, as this problem didn’t appear to be affecting me, I put it down to something in their environments that I didn’t have in mine (selfish, I know, but we’ve all been a bit busy recently). However, when I was doing testing for a new build of our Citrix Virtual Apps image, I noticed that it was happening to me too.
The first logon, when a user creates their profile in the VHD file store, always seems to be fine. However, at the second logon I noticed that our Start Menu layout (partially locked, applied via GPO) was missing and, more worryingly, restrictions like command prompt and registry editing had been lifted. Knowing that this would create a security issue straight away, we did some digging, and found the following in the event logs:-
A number of client-side extensions were reporting the same error. What was noticeable was that user GPO processing was completing in 2 seconds, as opposed to a more normal time of 30 seconds (yes, this environment hasn’t had my optimization skills fully let loose on it yet!).
The odd thing was, though, that the system appeared to be contacting the domain controller absolutely perfectly. It got a response, it wasn’t on a slow link, it knew the GPOs that it had to apply for the user, yet when it came to process the CSEs – it couldn’t find them. We checked SYSVOL, we checked domain controllers for errors – everything appeared to be working fine.
Initially we were thrown a red herring when we uninstalled FSLogix and the problem persisted. Using Citrix UPM gave the same results. So we were convinced it was something within the new build – the fact that it didn’t seem to happen in our production environment only added to this feeling.
However, when we took the existing production build and added it to the testing environment and saw the same errors again, we knew there was something really odd going on – so at this point we engaged with Microsoft to do some advanced troubleshooting.
Group Policy caching
It turns out that the issue has its roots in Group Policy caching. Introduced in Windows 8.1, caching is a way to make foreground synchronous processing complete more quickly. It doesn’t allow GPO processing to be done when domain controllers are unavailable and/or the machine is off the network; the only way to do that, as far as I know, is to adopt something like PolicyPak, which has an “offline reinforcement” setting (there you go Jeremy, saves you emailing me ;-)). For more info about synchronous and asynchronous processing, read this article.
Group Policy caching is normally enabled by default, but only for clients. To enable it for server operating systems (such as we are often using in Citrix Virtual Apps environments), you need to enable the following policy – Computer Config | Admin Templates | System | Group Policy | Enable Group Policy caching for servers
There is also a setting in the same folder called Configure Group Policy Caching, but this seems to only apply to clients where it should be on by default anyway. Don’t get the two mixed up.
As I said, this only applies to foreground synchronous processing, which on a server operating system would be done at every logon unless you have enabled “Allow asynchronous user Group Policy processing when logging on through Remote Desktop Services” (obviously in the Citrix world I’m assuming all of your servers are RDSH!). But even if you have enabled this, if you use a client-side extension such as Folder Redirection, you will be forced into foreground synchronous processing at logon if the CSE in question has changed. Or, if you have the “Always wait for the network at computer startup and logon” policy set to Enabled, then every logon will always be foreground synchronous. As I said, more information about the mysteries of processing modes is in the article linked earlier.
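Purely to make those rules easier to follow, here is a deliberately simplified Python model of the three conditions above. It is my summary of the logic, not the actual Windows decision process (which has more inputs, such as slow-link detection), and the function and parameter names are hypothetical.

```python
def logon_is_foreground_sync(
    always_wait_for_network: bool,
    async_rds_logon_enabled: bool,
    sync_cse_changed: bool,
) -> bool:
    """Simplified model of whether a server logon processes user GPOs
    in foreground synchronous mode, following the rules described above.

    Hypothetical helper: the real Windows logic has more inputs than this.
    """
    # "Always wait for the network at computer startup and logon" = Enabled
    # makes every logon foreground synchronous.
    if always_wait_for_network:
        return True
    # On a server OS, logons are synchronous unless the RDS asynchronous
    # user Group Policy processing setting has been enabled.
    if not async_rds_logon_enabled:
        return True
    # Even with async enabled, a changed CSE that needs synchronous
    # processing (e.g. Folder Redirection) forces a synchronous logon.
    return sync_cse_changed
```

In the typical RDSH case where the asynchronous setting is left alone, every logon comes out as foreground synchronous, which is exactly the scenario caching is meant to speed up.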
So once a user has logged on, as soon as a background refresh completes, GPOs are cached down onto the device (for Computer Config settings) or into the user profile (for User Config settings). The next time the user logs on in foreground synchronous mode, the cached GPOs will be used if possible, potentially making the logon faster (which everyone knows I am keen on), because having to fetch hundreds of GPOs from the domain controller could otherwise become a bottleneck.
When the user policies are cached, they are written to %LOCALAPPDATA%\GroupPolicy\DataStore\0\SysVol\domainname\Policies
There is, however, also a Registry key that backs up this cache and tells the system where to go to fetch these user policies. This sits in HKLM, which was a bit of a surprise. The key is HKLM\Software\Microsoft\Windows\CurrentVersion\Group Policy\DataStore\[USERSID]\0, and contains a number of numbered subkeys. Each of these subkeys has a value called FileSysPath which tells it where to find the cached policies.
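As a quick reference, the Python sketch below builds both locations for a given user: the cache folder inside the profile and the HKLM key whose numbered subkeys hold the FileSysPath values. The helper name is my own, and the SID, domain and profile path in the example are made-up values.

```python
import ntpath

# Registry location described above; the user's SID is appended per user.
DATASTORE_KEY = r"HKLM\Software\Microsoft\Windows\CurrentVersion\Group Policy\DataStore"


def gp_cache_locations(local_appdata: str, domain: str, sid: str) -> tuple[str, str]:
    """Return (cache folder, registry key) for a user's cached user GPOs.

    local_appdata is the expanded %LOCALAPPDATA% of the user's profile.
    Hypothetical helper for illustration only.
    """
    # ntpath always joins with backslashes, whatever OS runs this script.
    cache_folder = ntpath.join(
        local_appdata, "GroupPolicy", "DataStore", "0", "SysVol", domain, "Policies"
    )
    # The numbered subkeys (each with a FileSysPath value) live under \0.
    registry_key = f"{DATASTORE_KEY}\\{sid}\\0"
    return cache_folder, registry_key


folder, key = gp_cache_locations(
    r"C:\Users\jdoe\AppData\Local", "corp.example.com", "S-1-5-21-111-222-333-1001"
)
print(folder)
print(key)
```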
All sounds pretty straightforward: this local cache is used to speed up your logon a bit. The weird bit is the use of an HKLM entry to point to the files. The user profile is normally loaded before user GPOs are processed, so I don’t know why this can’t be read from HKCU.
However, most of us use the FSLogix redirections.xml in conjunction with Profile Containers to avoid filling our profiles full of bloated crap and wasting storage. And most of us use the community redirections.xml files as a starting point to take advantage of the hard work done by others in discovering files and folders that are extraneous. And (you can see where this is going), those community XML files often contain the following entry:-
That’s the kicker – when the user logs in for the second time, Group Policy caching believes it has a local cached copy of the GPO files (because of the FileSysPath entry in HKLM). It then tries to access them, but because they’re excluded from the FSLogix Profile Container, they can’t be found (hence the error of “cannot find the path specified”). The GPOs can’t be processed, the logon finishes, no Registry values exist for your GPOs, and you’re left in a bad state.
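To make the failure mode concrete, here is a small Python sketch of the check that effectively goes wrong: for each cached entry recorded in HKLM, does the folder its FileSysPath points at still exist? The function name and sample paths are hypothetical, and this models the symptom rather than how the Group Policy service is actually implemented.

```python
from typing import Callable, Dict, List


def missing_cached_gpos(
    filesyspaths: Dict[str, str], path_exists: Callable[[str], bool]
) -> List[str]:
    """Return the numbered subkeys whose FileSysPath no longer exists.

    filesyspaths maps each numbered subkey under the user's DataStore key
    to its FileSysPath value. path_exists is injected so the check can be
    exercised without a real profile (on a live server, os.path.isdir).
    """
    # Each result is a GPO the registry says is cached locally, but whose
    # files were not carried in the profile container.
    return sorted(k for k, p in filesyspaths.items() if not path_exists(p))
```

With the whole GroupPolicy folder excluded from the container, you would expect every entry to come back as missing on the second logon, which lines up with the “cannot find the path specified” errors.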
So the best way to resolve this is simply to remove the exclusion from the FSLogix redirections.xml file. It’s not a huge folder (8KB in my lab, 3MB in a large enterprise), so leave it in the container.
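If you maintain the redirections file by script, a minimal Python sketch of that fix might look like this. It assumes the usual FSLogix layout (Exclude elements inside an Excludes block) and that the exclusion is written as AppData\Local\GroupPolicy; community versions vary, so check your own file. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Compared case-insensitively, as Windows paths are.
TARGET = "appdata\\local\\grouppolicy"


def remove_gp_cache_exclusions(xml_text: str) -> tuple[str, int]:
    """Drop Exclude entries covering AppData\\Local\\GroupPolicy.

    Returns the rewritten XML and the number of entries removed.
    Hypothetical helper; assumes <Exclude> elements sit in <Excludes>.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    for excludes in root.iter("Excludes"):
        # Iterate over a copy so removing elements is safe.
        for exclude in list(excludes.findall("Exclude")):
            path = (exclude.text or "").strip().rstrip("\\").lower()
            # Match the folder itself or anything beneath it.
            if path == TARGET or path.startswith(TARGET + "\\"):
                excludes.remove(exclude)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Note that ET.tostring drops the XML declaration; if you write the result back to disk, ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True) keeps it.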
The problem is intermittent because users have to hit the same server they logged on to before in order to see it. When we went back through our SIEM, we found instances going back a long time, but clearly users had not noticed. It was only when we were using a limited number of test servers that we actively spotted the problem, which is what initially made us think it was a build issue.
Some people have asked “why not turn off Group Policy caching via GPO?” Well, you can, but I would sooner have it on and take the potential benefit of a slightly faster logon, especially if you are tied to foreground synchronous mode.
You could also delete the HKLM [USERSID] keys at logoff, but again, you’d be fixing the issue by effectively disabling caching. In cloud environments particularly, reducing LDAP calls is arguably a good thing.
It’s possible that this problem affects Citrix UPM too (UPM also excludes the GPO cache folder), and it may affect other profile management products as well. We certainly saw it with UPM in testing, but have not tried it at scale (because we’re not keen on UPM any more).
If you do get users with this issue, you can take the following actions:-
- Remove the APPDATA\Local\GroupPolicy exclusions from your redirections file and deploy it
- Remove the HKLM entries for the affected users from any servers you find them on
- If you don’t want to give the user a new profile (who does?), have them log back on, run a gpupdate (so that their local cache is created), then log back in again and everything should be golden
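For the second step, something like the Python sketch below could generate the commands to run on each affected server. It only builds the command strings using the standard reg delete command with /f (which skips the confirmation prompt and removes the key with its subkeys); how you push them out is up to you. The helper name is mine, and restricting input to S-1-5-21 domain user SIDs is a safety assumption.

```python
import re

DATASTORE_KEY = r"HKLM\Software\Microsoft\Windows\CurrentVersion\Group Policy\DataStore"
# Domain user SIDs look like S-1-5-21-<domain id>-<RID>; anything else is
# rejected so a typo can't turn into deleting the wrong key.
SID_PATTERN = re.compile(r"^S-1-5-21(-\d+)+$")


def datastore_delete_commands(sids: list[str]) -> list[str]:
    """Build one reg delete command per affected user SID (hypothetical helper)."""
    commands = []
    for sid in sids:
        if not SID_PATTERN.match(sid):
            raise ValueError(f"not a domain user SID: {sid!r}")
        commands.append(f'reg delete "{DATASTORE_KEY}\\{sid}" /f')
    return commands


for cmd in datastore_delete_commands(["S-1-5-21-111-222-333-1001"]):
    print(cmd)
```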
This should sort out the issue, and in our testing it does not recur.
An interesting observation about this, however, is that if a user logs on to a machine and creates the HKLM SID entry, and their profile is then deleted, the next time they log on their GPOs will fail to apply. The system will look for the cached files in the local profile, and they will not be there because the profile has only just been created. This is rather annoying, and leads me to believe that in environments where a profile management tool is used there should be some way of clearing these entries up. If a user logs on to 500 different XenApp servers and you then have to delete the central copy of their profile, you would need to remove the Registry key from 500 individual servers before the user could log on to them successfully and get all their GPOs applied. I think it would be very prudent to clear this key at boot, and/or clear it within your golden image.
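If you would rather target only the entries that can actually cause this, instead of clearing everything, a rough Python sketch of the selection logic might look like the one below. It assumes you have already enumerated the subkey names under the DataStore key and the SIDs that currently have a local profile on the server (for example from ProfileList); the function name is hypothetical.

```python
def orphaned_datastore_sids(
    datastore_sids: list[str], local_profile_sids: set[str]
) -> list[str]:
    """SIDs that have a Group Policy DataStore entry but no local profile.

    Hypothetical helper: both inputs are assumed to have been gathered
    from the server's registry beforehand.
    """
    # Only consider domain user SIDs; anything else is left alone.
    users = {s for s in datastore_sids if s.startswith("S-1-5-21-")}
    # An entry with no matching local profile points at cache files that
    # a freshly created profile won't have.
    return sorted(users - local_profile_sids)
```

On servers where the local profile is removed at logoff, as it typically is with Profile Containers, this will usually be every user who isn’t currently logged on, which is really the argument for simply clearing the key at boot.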
Thinking about the issue, we already know that there is a Registry key in HKLM that is user-specific (HKLM\Software\Microsoft\Windows NT\CurrentVersion\ProfileList\[SID]). FSLogix deals with this by using .reg files from %LOCALAPPDATA%\FSLogix to insert the data here on the fly (see image below).
I think that FSLogix needs to be changed slightly to accommodate the behaviour of Group Policy caching, possibly by extending this .reg fix to encompass it.
If you want to see a video demonstration of this issue, here it is:-
Big thanks are due to Sargur Ravi Kiran from Microsoft, who helped us greatly by pinpointing the issue.