From time to time people ask me questions about AD and related topics. I don’t know why, but they think I might know the answer. Sometimes this is not true 😉 but I try to do my best (I’ve just learned today that there is a “Geek network”, so probably those persons can answer all the questions I can’t 😉 ). Some time ago, through such a “question channel”, I received the following question:

What will be better from an LDAP query performance standpoint:

  • a single OU with 20k objects
  • an OU tree with 20 OUs and 1k objects in each.

(this is my rough translation from Polish).

Well, I thought that others might have the same dilemma as well, so maybe it is a good topic for a blog entry.


Disclaimer: of course this is based on my knowledge, so maybe somebody with deeper ESE knowledge will have some more input on possible implications ;).

So the short answer to this question is that the OU layout will not matter for LDAP query performance. Now let’s check it a bit.

 

Let’s run some queries in a lab environment.

Lab setup …

To perform this test I’ve created two OU structures: a single OU with 20k users in it, and a nested OU structure with 20 OUs and 1k users in each of them.

A little digression – just for fun I’ve created the first set using admod.exe and the second set using the PowerShell AD cmdlets from Windows 2008 R2. From a usability standpoint the preparation was almost the same experience; the admod command was a bit shorter, and the PoSH script was about 4 lines. Next time I will gather execution times for comparison.
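Neither command is reproduced here, but to give an idea of the shape of the test data, here is a minimal Python sketch (the `dc=lab,dc=local` domain, the OU names, and the `NNNN_testusr`/`NNNN_user` naming pattern are my assumptions, loosely modeled on the usernames appearing in this post) that emits LDIF entries which the standard `ldifde -i` tool could import:

```python
# Sketch only: emit LDIF for the two test layouts (names/domain are made up).
def user_ldif(name, ou_dn):
    """One LDIF entry that creates a user object under the given OU."""
    return (f"dn: cn={name},{ou_dn}\n"
            "changetype: add\n"
            "objectClass: user\n"
            f"sAMAccountName: {name}\n")

def flat_layout(count=20000, ou_dn="ou=Flat,dc=lab,dc=local"):
    """Case #1: a single OU holding all the users."""
    return [user_ldif(f"{i}_testusr", ou_dn) for i in range(count)]

def nested_layout(ous=20, per_ou=1000, base="dc=lab,dc=local"):
    """Case #2: 20 OUs with 1k users in each."""
    return [user_ldif(f"{ou * per_ou + i}_user", f"ou=OU{ou},ou=Nested,{base}")
            for ou in range(ous) for i in range(per_ou)]

# Write everything to a file that `ldifde -i -f users.ldf` could import:
with open("users.ldf", "w") as f:
    f.write("\n".join(flat_layout() + nested_layout()))
```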

The short exercise performed on this data was to look up a single user by sAMAccountName, using ADFIND with the following query:

adfind -b <OU> -s subtree -f "(samaccountname=<user name>)" -stats+only

The -stats+only switch was used as I was not really interested in the results, only in the LDAP query statistics – it is a nifty feature of ADFIND that it can show you this information (for SQL guys … it is something like an execution plan 😉 ).
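As an aside, the -stats+only output is plain text, so it is easy to post-process if you run many queries. Below is a small sketch (plain Python; the field names and formats are taken from the statistics output pasted in this post, and the helper name is mine) that pulls out the counters I care about:

```python
import re

def parse_adfind_stats(text):
    """Extract the interesting counters from adfind -stats+only output."""
    fields = {
        "elapsed_ms": r"Elapsed Time:\s*(\d+)\s*\(ms\)",
        "pages_referenced": r"Pages Referenced\s*:\s*(\d+)",
        "pages_from_disk": r"Pages Read From Disk\s*:\s*(\d+)",
    }
    # Each counter appears once per query; missing fields stay as None.
    return {name: (int(m.group(1)) if (m := re.search(pat, text)) else None)
            for name, pat in fields.items()}

sample = """\
Elapsed Time: 47 (ms)
Pages Referenced        : 31
Pages Read From Disk    : 1
"""
print(parse_adfind_stats(sample))
# {'elapsed_ms': 47, 'pages_referenced': 31, 'pages_from_disk': 1}
```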

Case #1 – search in single OU with 20k of users

Statistics
=================================
Elapsed Time: 47 (ms)
Returned 1 entries of 1 visited – (100.00%)

Used Filter:
(sAMAccountName=1038_testusr)

Used Indices:
idx_sAMAccountName:1:N

Pages Referenced        : 31
Pages Read From Disk    : 1
Pages Pre-read From Disk: 0

Analysis
———————————
Hit Rate of 100.00% is Efficient

Indices used:

Index Name  : idx_sAMAccountName
Record Count: 1  (estimate)
Index Type  : Normal Attribute Index

Case #2 – search in OU structure with 20 OUs, 1k in each

Statistics
=================================
Elapsed Time: 0 (ms)
Returned 1 entries of 1 visited – (100.00%)

Used Filter:
(sAMAccountName=11520_user)

Used Indices:
idx_sAMAccountName:1:N

Pages Referenced        : 33
Pages Read From Disk    : 0
Pages Pre-read From Disk: 0

Analysis
———————————
Hit Rate of 100.00% is Efficient

Indices used:

Index Name  : idx_sAMAccountName
Record Count: 1  (estimate)
Index Type  : Normal Attribute Index

 

So at first glance both results look similar in terms of efficiency, indexes used, etc. In the first case, however, you may notice that the query time was much longer – it took 47 ms to complete the query. But was that related to the fact that the query was performed on a single OU with 20k objects? No. If you look at the entire statistics output you will spot that in this case not all data was available in the memory cache and one page had to be read from disk:

Pages Read From Disk    : 1

Once the data was cached we can re-run this query and see if the results change:

Statistics
=================================
Elapsed Time: 0 (ms)
Returned 1 entries of 1 visited – (100.00%)

Used Filter:
(sAMAccountName=1038_testusr)

Used Indices:
idx_sAMAccountName:1:N

Pages Referenced        : 27
Pages Read From Disk    : 0
Pages Pre-read From Disk: 0

Analysis
———————————
Hit Rate of 100.00% is Efficient

Indices used:

Index Name  : idx_sAMAccountName
Record Count: 1  (estimate)
Index Type  : Normal Attribute Index

So as you can see, when the data was cached in memory the results are the same, which also proves that reading data from disk is an expensive operation (which was obvious 😉 ).

Just a side note – the ability to see the number of pages read from disk was, AFAIR, introduced in Windows 2008. Courtesy of joe, ADFIND takes advantage of this and shows this information, which might be handy.

Conclusions …

Of course this was a very simple exercise, and if we really want to test ESE database engine behavior in different scenarios we should run many queries in different setups, with and without indexes. But I doubt the results would show any dependency, as ESE, which sits under the hood of AD, is just a database, and the object hierarchy is just information stored in that database.
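That last point can be illustrated outside AD entirely. The sketch below is a rough analogy only (plain Python, not ESE): an attribute index maps a key straight to its record, so the work needed to answer a lookup does not depend on how the records are grouped into OUs:

```python
# Rough analogy only: an index makes lookup cost independent of OU grouping.
def build_index(layout):
    """layout: dict of OU name -> list of sAMAccountName values."""
    index = {}
    for ou, users in layout.items():
        for user in users:
            index[user] = ou          # key -> where the record lives
    return index

# The two layouts from the lab: one big OU vs 20 OUs of 1k users each.
flat = {"BigOU": [f"{i}_testusr" for i in range(20000)]}
nested = {f"OU{o}": [f"{o * 1000 + i}_user" for i in range(1000)]
          for o in range(20)}

flat_idx, nested_idx = build_index(flat), build_index(nested)
# One probe answers either query; grouping only changes what the answer is.
print(flat_idx["1038_testusr"], nested_idx["11520_user"])   # BigOU OU11
```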

So if you consider how your query is constructed in terms of scope and filter – whether it is efficient, whether it uses indexes – it will run efficiently against AD without any problems caused by the particular logical OU structure of the directory.

So conclusions at this point are:

  • Always analyze your queries for efficiency in terms of scope, filter, indexes, etc. Of course, if this is a one-time query you can run it even if it is not the best one you can build, but if it is an application or script which will be used more often, remember to check its statistics and efficiency.
  • Try to avoid extensive data reads from disk, which translated into the domain controller world means “try to get as much memory as you can so that your DIT data fits in the cache”. If your DIT size is getting close to 3GB it is time to think about switching to the x64 architecture (with W2008R2 it is time to switch anyway).
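To make the first bullet a bit more concrete, here is a toy illustration (plain Python, nothing AD-specific; the “entries visited” idea mirrors what the adfind statistics report) of why a filter that can use an index beats one that forces a walk of every record:

```python
# Toy model: count how many records a query has to "visit" to find a match.
users = [f"{i}_testusr" for i in range(20000)]

def indexed_lookup(index, name):
    """With an index, the server jumps straight to the matching record."""
    return 1 if name in index else 0      # records visited

def unindexed_lookup(records, name):
    """Without an index, records are walked one by one until a match."""
    visited = 0
    for rec in records:
        visited += 1
        if rec == name:
            break
    return visited

index = set(users)
print(indexed_lookup(index, "19999_testusr"))    # 1 record visited
print(unindexed_lookup(users, "19999_testusr"))  # 20000 records visited
```

The ratio here (1 vs 20000) is exactly the kind of difference the “Returned 1 entries of 1 visited” line in the statistics output makes visible.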

And with this I will end this post, as it is already much longer than it should be on this topic ;).

PS. If somebody with greater experience has something to add, especially in terms of building efficient AD queries etc. … the comments are here for you :).