I've been failing to solve a really basic problem for two years now.
The main features I have been trying to get:
1. I need to maintain a set of files being edited that I can query quickly, both to see what is loaded and to get the associated buffer data structures. I need to be able to add and remove entries from this set as the user loads files and kills buffers.
2. I need to get a notification any time one of the files in the set is edited and saved by an application other than my own.
3. I do not care whether the system can tell that two different file names refer to the same file because of links or subst'ed directories, but I do not want feature 1 or 2 to fail because of such a case.
The various approaches I have taken that have all failed:
Approach 1.
At first I just stored a hash table of the loaded files, hashing them by name. In order to get file changes I used ReadDirectoryChangesW on a separate thread.
This fails because ReadDirectoryChangesW reports files by name, but that name does not necessarily match the name used as a key in the hash table. I set up a canonical representation by lower-casing every name except the drive letter, but it still fails whenever files are loaded from a subst'ed directory, which I and many of my users rely on heavily.
Approach 2.
I tried getting hashes for files by combining nFileIndexHigh and nFileIndexLow to make unique identifiers for each file in the table.
This turned out to have a lot of problems. In particular, the file index can change because applications often save by writing a temp file, deleting the original, and renaming the temp over it; according to some documentation, the index can change for other reasons too. A table hashed this way becomes unusable as soon as an index changes.
Maybe this could be improved by fixing up the table whenever an event changes a file's index, but to make that fix the application needs the original id, which means it needs a way to look file ids up by name anyway. That brings me right back to the problem in Approach 1: I have no reliable way to key a table by file name, because the names can be inconsistent.
Approach 3.
I tried abandoning the hash table and instead fetching the file index every time I want to check two names for equality. This gives perfectly correct behavior, but the set is queried often enough that it is far too slow.
Approach 4.
I tried opening file handles and keeping them open with read/write sharing settings, hoping to skip the repeated CreateFile calls and recover the speed. That fails because the sharing settings apparently don't do exactly what I expected.
Approach 5.
Finally I tried building a system to handle subst and simply ignoring the problem with linked files. This lets me compare files by name instead of by index, which skips all the slow file opening and let me go back to a hash table. At startup I queried all the drive letters to find how they were subst'ed, and in the routine for building canonical file names I replaced subst'ed directories with their C:\* equivalents. This worked, except that for some users querying the drives crashed at startup for reasons I was never able to pin down.
Thanks to anyone who can clear this mess up for me!