SP2010 Scalability (2 of 4): SharePoint Search

Over the last several years, I've worked on a number of projects that stretch the recommended limits on how much content SharePoint can handle.  Back in December of 2007 I started on an interesting scalability journey with a couple of awesome guys at Microsoft.  The first, Paul Learning, is a quality MCS SharePoint guy out of Detroit.  The second, Andy Hopkins, served as our red-tape bulldozer.  The three of us worked to put a small server room full of Fujitsu blades and storage arrays to good use in order to prove that SharePoint could do 50 million documents.

The result of our efforts was a very lengthy whitepaper.  I'll sum it up in two points.  The first is that SharePoint could fairly easily be architected to handle 50+ million documents consisting of over 5 terabytes of data.  The second is that search configuration and crawl processing were BY FAR our greatest challenges.  We were successful, but it took far longer to crawl 50 million documents than we would have liked.
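To put those numbers in perspective, here is a quick back-of-the-envelope sketch.  The document count and data size are the round figures quoted above; the crawl rates are purely hypothetical values picked for illustration and are NOT measurements from the whitepaper.

```python
# Back-of-the-envelope numbers for a 50 million document corpus.
# DOCS and TOTAL_BYTES are the round figures quoted in the post;
# the crawl rates below are hypothetical, chosen only for illustration.

SECONDS_PER_DAY = 86_400


def average_doc_size_kb(total_bytes: float, doc_count: int) -> float:
    """Average document size in kilobytes (1 KB = 1000 bytes)."""
    return total_bytes / doc_count / 1000


def crawl_days(doc_count: int, docs_per_second: float) -> float:
    """Days needed for a full crawl at a sustained rate."""
    return doc_count / docs_per_second / SECONDS_PER_DAY


DOCS = 50_000_000
TOTAL_BYTES = 5e12  # 5 terabytes, decimal

if __name__ == "__main__":
    print(f"average document size: {average_doc_size_kb(TOTAL_BYTES, DOCS):.0f} KB")
    for rate in (25, 50, 100):  # hypothetical sustained docs/second
        print(f"{rate:>4} docs/sec -> {crawl_days(DOCS, rate):.1f} days")
```

The takeaway is simply that at this scale a full crawl is measured in days, not hours, unless the sustained crawl rate is very high.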

So without further lame commentary, I'll document right now that I believe the most important new feature of SP2010 from an ECM perspective is a highly scalable search subsystem!  Check out some of these new capabilities:

  1. We can now have multiple Index Servers!   Sweet!  No more single point of failure and the scale out story gets WAY better!
  2. We can now divide the content index into multiple index partitions.  When implemented with multiple query servers we get benefits of redundancy and parallel performance.
  3. The crawl management and the property store data tables have been split into separate databases.  But they took it further! We can have several of each!  This opens the door to scaling out even further with respect to I/O and storage, and possibly to using multiple SQL Servers to host the different search subsystem databases.
  4. Each index server can be configured to run multiple crawlers.  Multiple crawlers can crawl content in parallel!  So the process of spinning through the entire corpus is no longer a linear operation!
  5. Index servers are now STATELESS.  The crawlers build the content index and propagate directly to the query servers.  So guess what...  If an Index Server bombs, no big deal.  Just stand up a new one and pick up where you left off!
  6. All of these improvements result in Microsoft's new target number of being able to crawl 100 million content items! 
  7. We can go well beyond 100 million with FAST for SharePoint! 
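To make the ideas in items 2 and 4 a little more concrete, here is a toy sketch of parallel crawlers feeding a partitioned index.  This is NOT how SharePoint implements it; the function names, the CRC32-based partition assignment, and the fake crawl step are all my own assumptions, chosen only to illustrate the general technique.

```python
# Toy model: several crawlers work through the corpus in parallel and
# each crawled item lands in exactly one index partition.
# Everything here (names, hashing scheme, fake "crawl") is hypothetical.
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_for(doc_id: str, partition_count: int) -> int:
    """Pick a partition deterministically from a hash of the document ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % partition_count


def crawl(doc_ids):
    """Pretend crawler: returns (doc_id, index_entry) pairs for its batch."""
    return [(doc_id, f"indexed:{doc_id}") for doc_id in doc_ids]


def split_round_robin(items, n):
    """Deal the items out into n batches, one batch per crawler."""
    return [items[i::n] for i in range(n)]


def build_index(doc_ids, crawler_count, partition_count):
    """Crawl the batches in parallel, then file each entry into its partition."""
    partitions = [dict() for _ in range(partition_count)]
    batches = split_round_robin(doc_ids, crawler_count)
    with ThreadPoolExecutor(max_workers=crawler_count) as pool:
        for results in pool.map(crawl, batches):
            for doc_id, entry in results:
                partitions[partition_for(doc_id, partition_count)][doc_id] = entry
    return partitions


if __name__ == "__main__":
    corpus = [f"doc-{i}" for i in range(1000)]
    index = build_index(corpus, crawler_count=4, partition_count=3)
    for number, partition in enumerate(index):
        print(f"partition {number}: {len(partition)} items")
```

Because the partition is derived from the document ID alone, any crawler can process any document and the entry still ends up in the same place, which is what lets the crawl work be split up freely.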

Based on what I've seen so far, the search subsystem improvements are very promising. I believe the 100 million number is totally legit and possibly even on the conservative side!  Time will tell.

One thing is clear already. The FAST Search team has definitely had a positive impact on the already excellent Enterprise Search team at Microsoft.  In fact, some of the architecture in the new SharePoint search platform is remarkably similar to how FAST is designed!

In Part 3 of my SP2010 Scalability series, I'll talk about how Remote BLOB Storage is going to be a HUGE game changer in the SharePoint Enterprise Document Management story.

 

posted @ Saturday, October 24, 2009 12:10 AM
