Note – the issue discussed here has now been fixed in FSLogix version 2105 (see here for details). However, if you are simply wanting to use FSLogix object-specific settings in general, they are discussed as the solution to this now-rectified issue.
You may be aware that not so long ago I put together a blog post on how to spread your users across multiple file shares using FSLogix and some excellent scripting from fellow CTP Ryan Revord. The idea was that every time the workers were rebooted, the VHDLocations Registry value would re-order itself based on the highest amount of free disk space, so that new users were always directed to the least-loaded file share in the list.
This has worked really well for us, and an added boon of the feature was that if a file share in the list was unavailable, the user would simply create a new profile on the next available one. Obviously if you are using Cloud Cache or other methods for resiliency the process would be different, but in my particular use case, where resiliency wasn’t really required, we were safe in the knowledge that if we lost a file share or an availability zone, users would simply create new vanilla profiles on one of the other file shares and continue.
Unfortunately, a Microsoft blog post was brought to my attention recently that seemed to indicate that using multiple entries in the VHDLocations Registry value would not provide failover in this fashion. Conscious that I have many thousands of users in an environment with a large number of file shares, this got my attention pretty quickly. Joining up with others in the community and at Microsoft, I did some testing and eventually ascertained that this appeared to be a bug. Using an older version of the FSLogix software showed that it originally behaved as intended. Below are the results of my rudimentary testing on an old version of FSLogix, which showed the “failover” style behaviour when a file share was unavailable.
- If a profile exists in \\FS\Share1 and \\FS\Share2 and \\FS\Share2 is offline, the profile from \\FS\Share1 is used
- If a profile exists in \\FS\Share1 and \\FS\Share2 and \\FS\Share1 is offline, the profile from \\FS\Share2 is used
- If no profile exists in \\FS\Share1 and \\FS\Share2 and \\FS\Share1 is offline, a new profile is created in \\FS\Share2
- If no profile exists in \\FS\Share1 and \\FS\Share2 and \\FS\Share2 is offline, a new profile is created in \\FS\Share1
- If a profile exists in \\FS\Share1 only, and \\FS\Share1 is offline, a new profile is created in \\FS\Share2
- If a profile exists in \\FS\Share2 only, and \\FS\Share2 is offline, a new profile is created in \\FS\Share1
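To make the pattern in those six results easier to see, here is a minimal PowerShell sketch that models them. It is only my reading of the test results, not FSLogix code: the function name Resolve-ProfileShare and its parameters are made up for illustration.

```powershell
# Illustrative model of the OLD (pre-bug) behaviour seen in the tests above.
# This is not FSLogix code; the function and parameter names are hypothetical.
function Resolve-ProfileShare {
    param(
        [string[]]$VhdLocations,      # ordered list, as in the VHDLocations value
        [string[]]$OnlineShares,      # shares that are currently reachable
        [string[]]$SharesWithProfile  # shares that already hold this user's profile
    )
    # Offline shares are skipped entirely.
    $reachable = @($VhdLocations | Where-Object { $OnlineShares -contains $_ })
    # An existing profile on a reachable share wins, in list order.
    $existing = @($reachable | Where-Object { $SharesWithProfile -contains $_ })
    if ($existing.Count -gt 0) {
        return [pscustomobject]@{ Share = $existing[0]; Action = 'UseExisting' }
    }
    # Otherwise a new profile is created on the first reachable share.
    if ($reachable.Count -gt 0) {
        return [pscustomobject]@{ Share = $reachable[0]; Action = 'CreateNew' }
    }
    return [pscustomobject]@{ Share = $null; Action = 'NoShareAvailable' }
}

# Example: profile exists on Share1 only, and Share1 is offline.
$shares = '\\FS\Share1', '\\FS\Share2'
Resolve-ProfileShare -VhdLocations $shares -OnlineShares '\\FS\Share2' -SharesWithProfile '\\FS\Share1'
```

Under this model the example at the end creates a new profile on \\FS\Share2, which matches the fifth test result.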
However, repeating the same tests on the latest version of FSLogix showed different results. If any of the file shares in the VHDLocations list were offline, users with profiles in that file share or any of the file shares after it in the list would be unable to log on, and new users would also be unable to log on if any file share in the list was offline. Testing in the lab showed that when the first file server in the list was offline, users with profiles on the second file share, and users with no pre-existing profile, found themselves blocked from logging on with an error.
Unfortunately, we’ve had confirmation from Microsoft that this bug was introduced back in July of 2020. (Yes, my environment has been hanging by a thread for this long!) To pour salt on the wound, the original code changes were required to fix some critical logon issues, so it isn’t simply a case of Microsoft being able to roll it back. If you did decide to roll back, then going back seven months or so is going to remove a lot of enhancements and bug fixes (particularly around OneDrive). Microsoft are going to address this, but the point I’m trying to make is that it isn’t going to happen quickly, as they will have to scope out the required changes so as not to reintroduce the problems they were attempting to fix in the first place.
So, first of all, if you’ve used the method we put together in your environments, then I apologize for bringing you this bad news. What you need to do now is take stock of your options. This also applies if you’re using multiple VHDLocations entries for failover in any way, not just if you’re using the scripted method, so if you’re doing it this way please read on!
Firstly, take note of the fact that the potential scope of failure has increased, possibly greatly. If you had users spread over ten file shares before the bug was introduced, you would be able to lose one file share and the users homed there would simply receive a new, vanilla profile. But now, if you lost the sixth file share in that list, the users on file shares 6-10 would all be unable to log on, and neither would any users who didn’t already have a profile. If you’ve got FSLogix configured to allow logon when it can’t mount a profile, then potentially those users could all log on with local profiles – but that obviously puts a new strain on your local storage and also, in many Citrix/RDSH environments, means that the users would get a new profile every time they log on as they hit different servers or VMs. In summary – your failure domain has just potentially broadened quite dramatically.
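If you want to reason about your own exposure, the buggy behaviour can be modelled in a few lines of PowerShell. This is a sketch based purely on the symptoms described above, not on FSLogix internals; Test-LogonBlocked and the share names are hypothetical.

```powershell
# Illustrative model of the BUGGY behaviour described above; not FSLogix code.
function Test-LogonBlocked {
    param(
        [string[]]$VhdLocations,   # ordered VHDLocations list
        [string[]]$OfflineShares,  # shares that are currently down
        [string]$ProfileShare      # share holding the user's profile; empty for a new user
    )
    foreach ($share in $VhdLocations) {
        # Reaching an offline share before finding the profile blocks the logon.
        if ($OfflineShares -contains $share) { return $true }
        # The profile was found before any offline share was reached.
        if ($share -eq $ProfileShare) { return $false }
    }
    # Every share is online and no profile exists: a new one can be created.
    return $false
}

# Ten shares, the sixth one is offline.
$shares  = 1..10 | ForEach-Object { "\\FS\Share$_" }
$blocked = $shares | Where-Object {
    Test-LogonBlocked -VhdLocations $shares -OfflineShares '\\FS\Share6' -ProfileShare $_
}
$blocked   # lists the shares whose users cannot log on
```

Under this model the users homed on shares 6 to 10 are blocked, and calling the function with an empty -ProfileShare shows that new users are blocked as well.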
Secondly, think about how you would normally respond to and recover from the failure of a file share or a group of file shares (such as in an AZ loss). If your monitoring and recovery capability is swift enough to restore file shares to service before a major interruption occurs, then you may not be as exposed as you think. Users can be configured to use local profiles until the file share is restored, as long as you can prove that your response and recovery would deliver a quick fix.
Thirdly – what to do until such time as Microsoft can offer a fix? You can a) roll back to the pre-July 2020 version, which means you’d lose a lot of functionality and fixes, b) trust your monitoring and recovery to alert and repair a broken file share, c) hope your file shares all stay online (hey, it’s worked for me for this long!), or d) use object-specific Registry values. You could also try editing the script to possibly run more regularly and re-order itself if an offline file share is detected – this is more challenging but I believe James Kindon has done some work that may help you in this case (see here for his efforts).
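For the "edit the script" idea, the core of it would be dropping unreachable shares from the list before writing VHDLocations. The sketch below is my own rough illustration, not Ryan's script or James's work. Get-ReachableShare is a hypothetical helper, and using Test-Path as the reachability check is an assumption (it can take a while to time out against a dead server).

```powershell
# Sketch only: keeps the original order and drops shares that fail a reachability check.
# The function name is hypothetical; Test-Path is an assumed, simplistic check.
function Get-ReachableShare {
    param(
        [string[]]$Shares,
        # Reachability test; defaults to Test-Path but can be swapped out.
        [scriptblock]$IsReachable = { param($Path) Test-Path -Path $Path }
    )
    @($Shares | Where-Object { & $IsReachable $_ })
}

# Example of applying the result (run elevated on the session host):
# $online = @(Get-ReachableShare -Shares '\\FS\Share1', '\\FS\Share2')
# if ($online.Count -gt 0) {
#     Set-ItemProperty -Path 'HKLM:\SOFTWARE\FSLogix\Profiles' -Name VHDLocations `
#         -Value ([string[]]$online) -Type MultiString
# }
```

Bear in mind that users homed on a share that has been dropped from the list would get a new, empty profile elsewhere, so this only suits environments where that is acceptable. I have not tested how quickly FSLogix picks up a changed value.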
Well, the obvious option, and most likely the official line you will get out of Microsoft, is to use object-specific Registry values to split out your file shares to dedicated groups of users. Now this doesn’t achieve what we were previously achieving with Ryan’s script – which was spreading our users dynamically across the file shares in our estate without having to subdivide them into groups – but it does reduce the scope of interruption when a single file share goes down. If you lose a file share in this configuration, then you will simply affect the users which are homed on that location, rather than users on subsequent file shares and new users as well.
I’m not generally a fan of using the object-specific Registry values – it means I have to split users into groups, manage the groups, and some of those groups may fill file shares up faster than others – but in this situation the only other choice I would have is to accept the fact that I’m horribly exposed. So in short – it’s better than what you’ve got until Microsoft issue a fix!
Configuring object-specific settings
These settings work by reading subkeys in the Registry under the keys that normally hold your configuration values. You can do this for either Profile Containers or Office Containers. You create subkeys under them named for the SIDs of the AD group or the AD user object you want to apply them to (for obvious reasons, I’d recommend using an AD group rather than a user). When FSLogix looks for settings to apply to a user logging on, it checks in this order (the two paths in each step refer to Profile Containers and Office Containers, respectively):
1. If a key exists for HKLM\Software\FSLogix\Profiles\ObjectSpecific\[USERSID] or HKLM\Software\Policies\FSLogix\ODFC\ObjectSpecific\[USERSID], the configuration values are read and applied from here
2. If a key exists for HKLM\Software\FSLogix\Profiles\ObjectSpecific\[GROUPSID] or HKLM\Software\Policies\FSLogix\ODFC\ObjectSpecific\[GROUPSID], the configuration values are read and applied from here
3. If a key exists for HKLM\Software\FSLogix\Profiles or HKLM\Software\Policies\FSLogix\ODFC, the configuration values are read and applied from here
So in summary, user-specific first, group-specific second, machine-specific third.
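As a rough illustration of that order, the sketch below works out which key would win for a given user. It only mirrors the order as I have described it, for Profile Containers. Get-FSLogixSettingsKey is a hypothetical helper rather than anything FSLogix ships, and the order in which multiple matching groups are tried is my assumption.

```powershell
# Sketch of the lookup order described above (Profile Containers path by default).
# Hypothetical helper; not part of FSLogix.
function Get-FSLogixSettingsKey {
    param(
        [string]$UserSid,
        [string[]]$GroupSids,
        [string]$BaseKey = 'HKLM:\SOFTWARE\FSLogix\Profiles',
        # Existence check; defaults to Test-Path against the real Registry.
        [scriptblock]$KeyExists = { param($Path) Test-Path -Path $Path }
    )
    # 1. user-specific, 2. group-specific, 3. machine-wide
    $candidates = @("$BaseKey\ObjectSpecific\$UserSid")
    foreach ($sid in $GroupSids) {
        $candidates += "$BaseKey\ObjectSpecific\$sid"
    }
    $candidates += $BaseKey
    foreach ($key in $candidates) {
        if (& $KeyExists $key) { return $key }
    }
}

# Example (the SIDs are placeholders):
# Get-FSLogixSettingsKey -UserSid 'S-1-5-21-1-2-3-1001' -GroupSids 'S-1-5-21-1-2-3-1105'
```

For Office Containers you would pass -BaseKey 'HKLM:\SOFTWARE\Policies\FSLogix\ODFC' instead.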
As I said, it doesn’t make much sense to use user-specific, so let’s just quickly demonstrate what would be needed to divide users into two AD groups and assign them a different VHDLocations value based on that group. Obviously, you can do this with any of the FSLogix configuration settings, you’re not simply limited to that one.
Firstly, split your users into groups as required. You may have some logic around this (closest file share), or you may simply want to do it at random. If you’ve got an existing environment using the method that’s now gone wrong, then you will need to split them into groups based on the file share they’re currently located on. New users will have to be assigned a group, which ideally should be the file share which is currently least loaded. You can see why I preferred using the script to doing it this way 🙂
Next, you need to get the SIDs for the group names, which you can easily do with PowerShell:
# Replace ADGroupNameHere with the name of your AD group
$AdObj = New-Object System.Security.Principal.NTAccount("ADGroupNameHere")
# Translate the account name into its security identifier (SID)
$strSID = $AdObj.Translate([System.Security.Principal.SecurityIdentifier])
$strSID.Value
Repeat as many times as necessary, noting the SIDs.
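If you have more than a couple of groups, the same lookup can be wrapped in a loop. This is a small sketch using the same NTAccount translation as above. Get-GroupSid is a hypothetical helper, the group names in the example are placeholders, and a name that cannot be resolved simply comes back with an empty SID.

```powershell
# Sketch: resolve several group names to SIDs in one go (hypothetical helper).
function Get-GroupSid {
    param([string[]]$GroupName)
    foreach ($name in $GroupName) {
        try {
            $account = New-Object System.Security.Principal.NTAccount($name)
            $sid = $account.Translate([System.Security.Principal.SecurityIdentifier]).Value
        } catch {
            $sid = $null   # the name could not be resolved
        }
        [pscustomobject]@{ Group = $name; SID = $sid }
    }
}

# Example (placeholder group names):
# Get-GroupSid -GroupName 'FSLogix-Share1-Users', 'FSLogix-Share2-Users'
```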
Once you’ve noted the SIDs, you then need to create Registry keys on your target machines. Computer Config | Preferences | Windows Settings | Registry is the obvious way to do this. Bear in mind that if you have previously configured FSLogix GPOs to set these settings, you don’t actually need to set them to Not Configured for this to work – as mentioned above, the object-specific entries will take priority, so you don’t need to edit the GPOs at all. Add the SID to the key path (for example, HKLM\Software\FSLogix\Profiles\ObjectSpecific\[GROUPSID]) and create your different entries in the respective values.
As I said before, you can configure as many settings as you wish in this way – I’m just doing VHDLocations because of its obvious relevance to the problem you may be facing!
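If you would rather script the keys than use Group Policy Preferences (for a lab test, say), something along these lines should do it. This is a sketch, not a tested deployment method. The SIDs and share paths are placeholders, both function names are hypothetical, and writing VHDLocations as a multi-string value is my assumption.

```powershell
# Placeholder SIDs and share paths; substitute the values you noted earlier.
$shareBySid = @{
    'S-1-5-21-1111111111-2222222222-3333333333-1105' = '\\FS\Share1'
    'S-1-5-21-1111111111-2222222222-3333333333-1106' = '\\FS\Share2'
}

# Builds the object-specific key path for a SID (Profile Containers by default).
function Get-ObjectSpecificKeyPath {
    param(
        [string]$Sid,
        [string]$BaseKey = 'HKLM:\SOFTWARE\FSLogix\Profiles'
    )
    "$BaseKey\ObjectSpecific\$Sid"
}

# Creates each key if needed and writes VHDLocations into it.
function Set-ObjectSpecificVhdLocation {
    param([hashtable]$ShareBySid)
    foreach ($sid in $ShareBySid.Keys) {
        $key = Get-ObjectSpecificKeyPath -Sid $sid
        if (-not (Test-Path -Path $key)) {
            New-Item -Path $key -Force | Out-Null   # -Force also creates ObjectSpecific
        }
        New-ItemProperty -Path $key -Name 'VHDLocations' -PropertyType MultiString `
            -Value ([string[]]$ShareBySid[$sid]) -Force | Out-Null
    }
}

# Run elevated on a session host:
# Set-ObjectSpecificVhdLocation -ShareBySid $shareBySid
```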
Once you’ve done this, you can wait for replication to proceed, and you should then be in a position where you aren’t left terribly exposed by the loss of a single file share. Once Microsoft get around to fixing the code, however, you can go back to the previous method, and hopefully they will now be aware that despite what they thought, people were using VHDLocations for rudimentary failover purposes.
Big thanks due to Jim Moyle for getting the feedback from Microsoft and letting me know about this, and obviously to James Richards for alerting us all to it in the first place.
Hello James, I read all the articles on your blog, it’s very interesting. I am a technician for an MSP company and I use FSLOGIX in all my RDSH installations. This week I have a GPO problem, as soon as I configure the redirection of profiles to a network location I have a black screen of about 5 minutes. I also try to use gpo regedit to change the documents, desktop etc location but have the same problem. Have you ever had this problem?
Have you set the GroupPolicyState Registry value and tried with that? It goes under HKLM\SOFTWARE\FSLogix\Profiles\ and should be set to 0.
Thanks for the tip. I created the registry key on the server, restarted and tested, but I have the same problem. The folder redirection takes years to create the folders. I tried with the built-in folder redirection GPO and also with a registry key, and I have the same problem. I can show you a video if you want to see.
I’m not sure but I think the registry key path is incorrect on your last screenshot. The “objectspecific” registry key is missing between Profiles and the Group SID for the VHDLocations reference.
You are correct, thanks for pointing that out!
There was a new version posted today that may fix the issue
To participate in the public preview please sign up here:
Will upgrade over any previous release.
Various updates were made to improve login time.
Fixed an issue where users could fail to login if a VHD network location was unavailable.
You can now increase the size of an existing VHD(x) by updating the SizeinMB setting.
The RefreshUserPolicy setting can now be managed via group policy template (ADMX).
The Installed Version of FSLogix is now written to the registry (HKLM:Software\FSLogix\Apps:InstallVersion).
Fixed an issue where Type4 printer drivers worked after initial configuration, but not subsequent sessions.
Orphaned/corrupt NST files are now cleaned up along with OST files.
Resolved a Cloud Cache bug where a second machine accessing the RW disk gets permanently locked out.
Fixed an issue compiling AppMasking rules with a destination in HKCU.
Fixed an issue where FSLogix could cause a deadlock and prevent user connections.
Fixed various issues that could crash the FSLogix service.
Resolved an issue causing frxshell to not launch on non-English systems.
A point which I think ought to have some extended thought put into it:
1. Customers know that they have to have their Profile storage shares up all the time. That is more or less proven by the fact that this error existed for quite some time and may have only surfaced in testing (or a very limited number of customers). So just going to the next location and creating a new VHD if your particular share is down, isn’t necessarily a good idea for the following reasons:
a. The data in the new VHD needs to be merged into the original VHD. That is a very manual process and if there happened to be many users in the share which is down, a lot of work. I think it would be better to just get the share fixed.
b. Preventing users who are further down the list from logging in, I agree, is not good. That is now fixed. But I would not give users permissions to more than one share, otherwise issue a. above would be the new problem.
So in general, to stay away from this problem, I would recommend that users only have permissions to create VHDs in one location. And if that location is down, there will be failures to logon. That is when IT needs to figure out what the problem is with the share and fix it.
Just for clarification, if this happens and two VHDs exist for the user in two different VHDLocation shares, when both shares are online, does one VHD merge into the other or does it simply use the first one it finds? Or is this dependent on the FSLogix configuration for “profile type”?
The first location where the VHD is found will be the VHD which is used for that user’s session. There is no automated merging of the data between the VHDs. That is why I indicated that users should only have permission to one share.
Does FSLogix Support DFS Namespaces, if you had a non-persistent VDI Multi-Site environment using AD Sites and Services? Only looking at the Office Container. Cheers.
I’ve seen it done, yes. However I’m not sure I would recommend it as I hate DFS. YMMV