SharePoint Saturday Denver and RBS Updates

Wow.  Long break from blog posting!  Sorry about that.  Once again, duty calls.  I was booked solid from March until mid-summer.  Then it was vacation time.  So there was little time to generate meaningful blog post content.

What happened in the meantime?  Well, Microsoft released this tiny little server product called SharePoint Server 2010 and a tiny little client application called Office 2010!  Yeah, yeah... I know.  I'm a bit behind.  Well, I have been playing with SP2010 and over the coming months, I'll be posting on some interesting things around Enterprise Search in SP2010.  Our scalable architecture options have grown considerably and I'd like to cover some of that.

I will be presenting on a couple of topics at SharePoint Saturday in Denver this Saturday (August 7th).  "Architecting for Scale in SP2010" will cover some of those new scale out topology options we have in SP2010.  "Remote BLOB Storage Deep Dive" will get into some of the nuts and bolts of RBS.  Both of these were queued up for the SharePoint Evolution / Portugal TechDays conferences earlier this year.  Unfortunately, the Iceland volcano had other plans for me.  Fortunately, SharePoint Saturday has provided a new forum for me to deliver this content.

In preparing for SharePoint Saturday this weekend, I realized that the RTM version of SharePoint 2010 had a few minor changes to the PowerShell commands for enabling RBS.  I've updated my earlier post to reflect the proper commands.

Hope to see all of you SharePoint types in Denver this weekend!

Otherwise, I'm back in the saddle with a laundry list of research topics.  So stay tuned...

A Guide to Enabling the CodePlex RBS Provider on SharePoint 2010

Over the holiday break I spent quite a bit of time trying to get a real good handle on exactly how SharePoint 2010 and RBS interact in order to externalize content to a BLOB store.  As part of that effort, I implemented the CodePlex RBS Provider as an exercise to learn more about the roles of SQL Server 2008, the RBS Framework, and SharePoint 2010.

So why would you want to try out this exercise?  Well, there are some limitations to the FILESTREAM provider.  First, it only works with local volumes on the SQL Server.  So if you want to store your binaries out on a file share that consists of cheap storage, you’re out of luck.  Also, by default, the FILESTREAM provider seems to put all binaries in a single folder.  NTFS used to have problems when you had too many files in a single folder.  That may have been mitigated in newer platforms, so I can’t testify to this specifically.  But in general, I’m not a big fan of hundreds of thousands of files in a single folder.

I will be doing an RBS Deep Dive session at the SharePoint 2010 Evolution conference on April 21st and there are going to be some real juicy tidbits in that session.  But in the meantime, I thought I would provide a guide to installing the CodePlex RBS Provider in SharePoint 2010 (beta).  Note that some of these steps may be subject to change as I am currently working with the beta bits.

In order for the CodePlex provider to work, it has to be tweaked a bit because SharePoint 2010 requires the provider to properly dispose of the BlobStore object in the framework.  So I’ve recompiled the provider appropriately.  You’ll find the link in the steps below.

Also, before we get started, the steps below assume you have a single WFE in your farm.  If you have more than one WFE, steps 2 - 9 will need to be executed on each WFE.

So here we go:

  1. Prepare your content database.  This script must be executed in SQL Server Management Studio in the context of the content database that will have RBS installed:
    IF NOT EXISTS (SELECT * FROM sys.symmetric_keys WHERE NAME = N'##MS_DatabaseMasterKey##')
        CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'Admin Key Password !2#4'
  2. Download rbs_x64.msi from here and save it to your WFE.
  3. Execute rbs_x64.msi on your WFE to begin installation of the RBS Framework.
    • Buzz through the standard agreement and registration forms.
    • All features should be installed except the FILESTREAM Provider, which is not necessary.  So your feature options should look like this:
      Feature Selection
    • Configure the connection to your content database.  Use “default” for the Filegroup name.  Test the connection to ensure you’re good before proceeding.  It should look something like this:
      Database Connection
    • On the database configuration form, no need to check the “Request connecting clients…” box.
    • On the Maintainer Task form, scheduling the Maintainer is optional.  I typically do because it’s easier to let the installer help you set up the task.  It’s important to configure the “Run As” account as well as the task schedule which is disabled by default.
      Maintainer Task
    • On the Client Configuration form, I typically set all log settings to “Warning”.
      Client Configuration
    • You should be ready to “Install” at this point.  Once you kick it off, the installer will go to town setting up the framework binaries and running RBS SQL scripts on your content database.  If you told the installer to configure a scheduled task for the Maintainer, the Schedule Task configuration box will pop up for you.
    • Once the installer has completed, your content database should have some new tables in it that look like this:
      RBS Tables
  4. Download my updated build of the CodePlex RBS Provider here.
  5. For the sake of this guide, create a directory to permanently contain the RBS Provider binaries at c:\CodePlex_RBS_Provider
  6. Copy the contents of the “Binaries” directory in the rbs.zip file to c:\CodePlex_RBS_Provider
  7. Now you need to use Notepad to edit c:\CodePlex_RBS_Provider\InstallProvider.cmd
    • The “RootDir” parameter needs to be c:\CodePlex_RBS_Provider
      • If you choose a different location for the RBS Provider binaries, this value needs to reflect that location.
    • The “ProviderName” parameter needs to be FileStoreProvider_1 for the purposes of this example.
    • The “DataLocation” parameter contains the location of your BLOB store.  This will typically be a network share somewhere but it could be a local directory.
      • Note that the RBS provider will be invoked in the context of the SharePoint content access (Application Pool) account specified for the Web Application that the content database is associated with.  So that service account needs to have FULL Permissions to this folder.
      • I typically create one subdirectory in the file share per BLOB Store, with one BLOB Store for each Content Database that I enable for RBS.
    • The “ServerName” parameter contains the name of your SQL Server instance.
    • The “DatabaseName” parameter contains the name of the content database that has had the RBS resources already installed (new RBS tables, etc).
    • Save the InstallProvider.cmd, which should now look something like this:
      InstallProvider.cmd

      NOTE!!! The application pool account for the SharePoint web application that contains the content database that will have RBS enabled, must have full permission to the final BLOB store folder as well as read access to the location of the RBS provider DLL (if it's not installed to the GAC).
  8. Launch a command prompt with Run As Administrator:
    Run cmd as Administrator
  9. Run the InstallProvider command:

    C:\CodePlex_RBS_Provider\InstallProvider.cmd
    • When it runs correctly, the result looks like this:
      RBS Provider Install Result
  10. Run the SharePoint 2010 Management Shell as Administrator.
  11. Execute the following commands to enable the CodePlex RBS Provider for the content database:

    $site = Get-SPSite http://siteurl
    $rbss = $site.ContentDatabase.RemoteBlobStorageSettings
    $rbss.Enable()
    $rbss.SetActiveProviderName($rbss.GetProviderNames()[0])
    $rbss
    • When you execute “$rbss” you should see a result that looks like this:
      Enable RBS PowerShell Result
    • Congratulations, you just enabled RBS!  If all goes well, all future binaries added to your content database will be sent to the BLOB Store.
    • In order to externalize existing BLOB content, you need to run one more PowerShell command.

      $rbss.Migrate()
      • This command will send all binaries currently in the database to the active RBS Provider BLOB Store.
    • What if I want to pull all my BLOBs back into the database?  Simple.  Just execute:

      $rbss.SetActiveProviderName("")
      $rbss.Migrate()
      • This will pull all the BLOBs back in-line in the content database.  This is handy when you need to move your BLOB store or if you need to change your RBS Provider altogether.
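If you would rather not rely on the provider being first in the list, step 11 can be wrapped in a small guard.  This is only a sketch: Enable-RbsProvider is a hypothetical helper name of my own (it is not a SharePoint cmdlet), and it assumes the provider was registered as FileStoreProvider_1 in step 7.  It only calls Installed(), Enable(), GetProviderNames() and SetActiveProviderName() on the RemoteBlobStorageSettings object.

```powershell
# Hypothetical helper (not part of SharePoint): enables RBS on a
# RemoteBlobStorageSettings object and activates a provider by name.
function Enable-RbsProvider {
    param($RbsSettings, [string]$ProviderName)

    # Installed() reports whether the RBS framework has been installed
    # against this content database (step 3).
    if (-not $RbsSettings.Installed()) {
        throw "The RBS framework is not installed for this content database."
    }

    $RbsSettings.Enable()

    # Pick the provider by name instead of by position in the list.
    if (@($RbsSettings.GetProviderNames()) -notcontains $ProviderName) {
        throw "RBS provider '$ProviderName' is not registered."
    }
    $RbsSettings.SetActiveProviderName($ProviderName)
}

# Usage in the SharePoint 2010 Management Shell (the URL is a placeholder):
#   $site = Get-SPSite http://siteurl
#   Enable-RbsProvider $site.ContentDatabase.RemoteBlobStorageSettings "FileStoreProvider_1"
#   $site.Dispose()
```

Failing early with a readable message is the whole point here.  If the installer or InstallProvider.cmd did not finish cleanly, you find out before anything gets migrated.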

OK, so that was all pretty cool.  But how do I know that it worked?  Well, before you enable the RBS Provider, the AllDocStreams table in your content database will show values in the “Content” varbinary(max) column like this:
AllDocStreams Content In-Line

After you enable the RBS Provider, the AllDocStreams table in your content database will show NULL values in the “Content” varbinary(max) column and instead show values in the “RbsId” varbinary(64) column like this:
AllDocStreams Content Externalized

Also, if all goes well, then the BLOB Store directory that you specified in the InstallProvider.cmd configuration file will have BLOB files showing up like this:
Active BLOB Store
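If you want a quick number instead of eyeballing the folder, something like the following can summarize the BLOB store.  Treat it as a rough sketch: Get-BlobStoreSummary is a hypothetical helper name, and the path you pass in is whatever you set for DataLocation in InstallProvider.cmd.

```powershell
# Hypothetical helper: counts the files under a BLOB store folder and
# totals their size.
function Get-BlobStoreSummary {
    param([string]$Path)

    # Filtering on PSIsContainer keeps this compatible with PowerShell 2.0,
    # which does not have Get-ChildItem -File.
    $files = @(Get-ChildItem -Path $Path -Recurse |
        Where-Object { -not $_.PSIsContainer })

    $bytes = ($files | Measure-Object -Property Length -Sum).Sum
    if ($null -eq $bytes) { $bytes = 0 }   # empty store

    New-Object PSObject -Property @{
        FileCount = $files.Count
        TotalMB   = [math]::Round($bytes / 1MB, 2)
    }
}

# Example (placeholder path):
#   Get-BlobStoreSummary -Path "\\fileserver\BlobStores\ContentDb1"
```

Run it before and after uploading a few documents (or before and after Migrate()) and the file count should climb.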

Finally, I’ll say that while this is a great exercise to show you what’s possible with an RBS Provider other than FILESTREAM, the CodePlex provider should probably be used at your own risk in production.  In my mind, this is an extremely useful but elaborate teaching example.  It should be thoroughly tested before being deployed in mission critical systems.

It's also important to know that Microsoft ISV Partners are working on several different RBS Providers that will provide feature support for file shares, BLOB encryption, BLOB compression, REST, and Cloud Storage as well as support for specific storage platforms such as Fujitsu Eternus, EMC Centera, and Hitachi HCAP.

I’m convinced the flexible RBS Provider options in the works will squash any remaining FUD out there regarding SharePoint storing binaries in the database.  The scalability story was already great, and it is so much better for SP2010 that it makes my job as a SharePoint evangelist that much easier!

Cheers.

SharePoint 2010 Evolution Conference - London

So I've been spending the last month or so dissecting the RBS framework in SharePoint 2010.  I've already succeeded in enabling the CodePlex RBS example provider in SP2010.  I'm sure I'll be posting a "how to" on that in the coming weeks.

But I'm also cooking something else.  The CodePlex RBS SP2010 example is the foundation for an RBS Deep Dive session that I'm assembling for the SharePoint 2010 Evolution Conference in London.  I'm currently scheduled to speak on Wednesday (4/21/2010).

The session will have just a few overview slides on the concepts of RBS and how RBS affects scalable architecture and then I'll dive right in to demos and show you how all the pieces and parts of the RBS subsystems in SharePoint 2010 and SQL Server 2008 R2 work together!

Come check it out!

SharePoint 2010 Evolution Conference - London

Slides from SharePoint Saturday KC

I'm finally getting around to posting the slide deck that I presented at SharePoint Saturday KC.  The topic was "Architecting for Scale in SharePoint 2010".

It's a pretty juicy deck with lots of detail that will help you with storage architecture and SQL tuning.  It also provides good background on Remote BLOB Storage (RBS) and the improvements to the search subsystem in SP2010.

You'll find the slides here:

Architecting for Scale in SharePoint 2010

SharePoint Saturday Kansas City, MO - 12.12.2009

So this weekend I'm off to present at SharePoint Saturday in Kansas City.  It is shaping up to be a stellar event and it looks like there are still a few (free) seats left!

I'll be presenting a session on "Architecting for Scale in SharePoint 2010".  Here's the session description:

"Scaling SharePoint has presented many challenges in the past.  Those days are gone.  Learn how new features like Remote  BLOB Storage (RBS), a dramatically improved search subsystem, improved database efficiency, and resource throttling in SharePoint 2010 can be leveraged to scale SharePoint to heights we’ve not seen before."

I'll be covering topics such as Storage Architecture, SQL Tuning, Scalable Taxonomy, Throttling, RBS, and Search Topology in SP2010!  I know, it's a lot right!  Well, I'm going to try to squeeze it all in along with 2 demos.  The slide deck is jam packed with juicy tidbits!

Is it possible?  Could we potentially see 1 BILLION documents in SharePoint???  Well, it's pretty much at the top of my professional bucket list!

Hope to see you there!

SP2010 Single Server Complete Install - An Extra Note

So I set out to fire up a fresh SP2010 install with the new BETA bits today...

Since I wanted to do a "Complete" install using SQL 2008 (not Express), I promptly referenced Neil Hodgkinson's "From the Field" blog post on "Single Server Complete Install of SharePoint 2010 using local accounts."  Excellent post, Neil!  It was VERY helpful.

But being the control freak that I am, I wasn't content going back to the world of GUID-laden Administration databases.  So I did a little digging in the New-SPConfigurationDatabase PowerShell cmdlet and I found a handy extra parameter.

Instead of using just "New-SPConfigurationDatabase", try this...

PS C:\Users\spadmin>  New-SPConfigurationDatabase -AdministrationContentDatabaseName SP2010_CentralAdmin_DB

New-SPConfigurationDatabase Command

It works as advertised:
Central Admin database with no GUID
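For reference, here is roughly what the call looks like with the other parameters spelled out instead of answered at the prompts.  This is a sketch, not what I ran above: the server name, config database name, and farm account are placeholders I made up for illustration.

```powershell
# Sketch only: SQLSERVER01, SP2010_Config_DB and DOMAIN\spfarm are placeholders.
$farmAccount = Get-Credential "DOMAIN\spfarm"
$passphrase  = Read-Host "Farm passphrase" -AsSecureString

New-SPConfigurationDatabase `
    -DatabaseName "SP2010_Config_DB" `
    -DatabaseServer "SQLSERVER01" `
    -AdministrationContentDatabaseName "SP2010_CentralAdmin_DB" `
    -Passphrase $passphrase `
    -FarmCredentials $farmAccount
```

Naming both databases up front means neither the config database nor the Central Admin content database ends up with a name you didn't choose.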

Yeah, OK.  It's a pretty minor little feature.  But that's what blog posts are for, right?  Obscure little features hidden in the corners of the world?

Now... If I could only figure out how to get the GUIDs out of all those crazy SP2010 Services databases!  There's always tomorrow.

SP2010 Scalability (4 of 4): In-Place Records Management

OK, so let's do a little math here.

In SharePoint 2007, you can define only 1 Records Center at a time for official files.  A single site collection (including a Records Center) in its own content database might contain somewhere around 1 million documents at a meager average of 55KB per document.  That keeps us under the 100GB content database recommended size limit.
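To make that arithmetic concrete, here is a back-of-the-envelope sketch.  Get-EstimatedContentSizeGB is a hypothetical helper name, and it only counts raw document bytes, so versions, metadata, and index overhead are ignored.

```powershell
# Hypothetical helper: rough size of the binaries for a given document count.
function Get-EstimatedContentSizeGB {
    param([long]$DocumentCount, [double]$AverageSizeKB)

    # 1KB and 1GB are PowerShell's built-in binary multipliers
    # (1024 and 1024^3 bytes).
    [math]::Round(($DocumentCount * $AverageSizeKB * 1KB) / 1GB, 1)
}

# 1 million documents at an average of 55KB each
Get-EstimatedContentSizeGB -DocumentCount 1000000 -AverageSizeKB 55   # 52.5
```

So a million small documents already eats up a bit more than half of the 100GB recommendation before anything else is counted.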

1 million documents?  Seriously?  I suppose you could try to manage multiple records centers but this would get interesting if you actually needed to put something in legal hold in multiple records centers.  So for all practical purposes, we were severely limited as to the number of official records that could be supported in the SharePoint 2007 farm.

Enter In-Place Records Management.  Excellent!  With SharePoint 2010, we will soon have the ability to enable a feature on a site collection that allows us to declare a document as a record IN PLACE!

Suddenly, records center scalability isn't such an issue anymore.  Oh yeah, and if you still want to use Records Centers... in SP2010 we will have the ability to define more than just a single Records Center.  Also, we gain the option to "Move" a document instead of just copying it to the Records Center.

So there you have it.  Three HUGE improvements to the scalability story of SharePoint 2010!

Remote Blob Storage takes us to a whole new order of magnitude with respect to the number of documents that can be supported in a single content database.

The new SharePoint Search story just got a whole lot better with scale-out for redundancy and performance.  Plus we now have FAST for SharePoint which can support BILLIONS of documents in SharePoint.

Finally, In-Place Records Management will support Enterprise Class Records Management solutions that go far beyond the significant limitation in SP2007!

In weeks to come, I'll be dialing in on a few of these technologies.  Particularly, I'll be diving in to the deep end on RBS in the very near future!

 

SP2010 Scalability (3 of 4): Remote BLOB Storage

Binary Large Objects, or BLOBs as the SQL types like to call them, are the byte arrays that represent documents and other files in SharePoint.  Typically, they are stored in the SharePoint content database.  The reality is, the ECM industry has known for decades that an RDBMS is not the best place to store BLOBs.  SQL database storage needs to be high IOPS and low latency... translated... EXPENSIVE storage.  It's much more efficient if we are able to store the BLOBs on lower cost, possibly even archival-class storage while we continue to invest in high performance storage for the structured content metadata.

As of SP2007 SP1, it was possible to take advantage of an External BLOB Storage (EBS) API to get the BLOBs out of SQL Server.  Unfortunately, this method is not transactionally consistent and it results in a high number of orphaned BLOBs in the BLOB store because new BLOBs are stored (not replaced) when a document is updated.  We must then rely on event receivers and lazy garbage collection to clean up the orphans.  In short, EBS was a temporary solution from Microsoft.  The future is SQL RBS support in SharePoint 2010.  Fortunately, Microsoft will also provide a PowerShell based solution for migrating from EBS to RBS!

So I can just see the SP2010 box cover art... "Now with RBS!"  When we enable RBS this is what we get:

  1. Transactional consistency ensures that when we get a BLOB ID back from the RBS provider, we are guaranteed storage.  It also allows for traditional update capabilities.
  2. Transactional consistency also allows Write Once Read Many (WORM) mode devices to "VETO" a delete or modify operation.  This is clutch for financial institutions who are legally not allowed to delete financial records.  So they might use a storage platform such as EMC Centera or Hitachi HCAP in a sort of "create, but don't delete" mode.  If these vendors choose to write an RBS provider for their devices (and they probably will), then the actual storage subsystem itself can prevent SharePoint from allowing a document to be deleted.
  3. While orphan cleanup is much less of a concern with RBS it still needs to be managed.  The good news is that because RBS is managed through SQL tables, RBS can take advantage of indexes to actually "query" the difference between what is in the BLOB store and what is in SharePoint content databases.  This is a HUGE improvement compared to spinning through the object model of an extremely large content database.
  4. RBS is completely transparent to the SharePoint API.  Nothing changes.  So existing custom and 3rd Party code will continue to function as expected.

So now we have the facility to get the binary data out of the content database.  That means that we're looking at having only metadata present.  SWEET!  That whole 50GB, 100GB, recommended maximum database size discussion becomes a whole lot less important!  Instead we'll be concentrating more on the recommendations of list/library sizes and that's a good thing!

Now that Microsoft is targeting 50 million items in a library, site taxonomy can look a little more like what we're used to seeing in your standard collaboration implementation instead of having to split up similar documents into multiple content databases.

Just a quick last note... RBS will require SQL 2008 Enterprise Edition.  So keep that in mind if SP2010 and RBS might possibly be in the cards for your organization.

In Part 4 of my SP2010 Scalability series, I will look into the scalability benefits of In Place Records Management.  Microsoft has removed a HUGE scalability bottleneck here! 

SP2010 Scalability (2 of 4): SharePoint Search

For the last several years, I've worked on several projects that stretch the recommended limits regarding the amount of content that SharePoint can handle.  Back in December of 2007 I started on an interesting scalability journey with a couple of awesome guys at Microsoft.  The first, Paul Learning, is a quality MCS SharePoint guy out of Detroit.  The second, Andy Hopkins, served as our red-tape bulldozer.  The three of us worked to put a small server room full of Fujitsu blades and storage arrays to good use in order to prove that SharePoint could do 50 million documents.

The result of our efforts was a very lengthy whitepaper.  I'll sum it up in the following two sentences.  The first is that SharePoint could fairly easily be architected to handle 50+ million documents consisting of over 5 terabytes of data.  The second is that search configuration and crawl processing were BY FAR our greatest challenges.  We were successful, but it took far longer to crawl 50 million documents than we would have liked.

So without further lame commentary, I'll document right now that I believe the most important new feature of SP2010 from an ECM perspective is a highly scalable search subsystem!  Check out some of these new capabilities:

  1. We can now have multiple Index Servers!   Sweet!  No more single point of failure and the scale out story gets WAY better!
  2. We can now divide the content index into multiple index partitions.  When implemented with multiple query servers we get benefits of redundancy and parallel performance.
  3. The crawl management and the property store data tables have been split into separate databases.  But they took it further! We can have multiple of each!  This opens doors to scale out even further with respect to I/O and storage as well as possibly multiple SQL Servers to handle different search subsystem component databases.
  4. Each index server can be configured to run multiple crawlers.  Multiple crawlers can crawl content in parallel!  So the process of spinning through the entire corpus is no longer a linear style operation!
  5. Index servers are now STATELESS.  The crawlers build the content index and propagate directly to the query servers.  So guess what...  If an Index Server bombs, no big deal.  Just stand up a new one and pick up where you left off!
  6. All of these improvements result in Microsoft's new target number of being able to crawl 100 million content items! 
  7. We can go well beyond 100 million with FAST for SharePoint! 

Based on what I've seen so far, the search subsystem improvements are very promising. I believe that number is totally legit and possibly even on the conservative side!  Time will tell.

One thing is clear already. The FAST Search team has definitely had a positive impact on the already excellent Enterprise Search team at Microsoft.  In fact, some of the architecture in the new SharePoint search platform is remarkably similar to how FAST is designed!

In Part 3 of my SP2010 Scalability series, I'll talk about how Remote BLOB Storage is going to be a HUGE game changer in the SharePoint Enterprise Document Management story.

 

SP2010 Scalability (1 of 4): Introduction

I have been very fortunate over the last several years in that I've had many opportunities to architect extremely high scale SharePoint systems.  Everything from your standard 3 million document Imaging Repository to systems with tens of millions and even more than 100 million documents (thanks to FAST ESP!).

As I look back on SharePoint 2003 and even to existing SharePoint 2007 solutions, there have definitely been several challenges as we design systems that can handle the millions of documents we throw at them.  So it is with great pleasure that I am able to present my 4 favorite improvements in SP2010 that handle virtually ALL of my previous challenges.

Oh yeah... For anyone who might have read this series introduction when I initially released it, I was originally going to include a post regarding the Managed Metadata service.  Then it occurred to me that while this is an important feature for ECM in SharePoint 2010, it doesn't really improve performance.  So... I yanked it.  I'll do a separate post on Managed Metadata at some point.

So without further delay, my four part (ok... three if you discount the introduction!) series on SP2010 Scalability:

  1. Introduction
  2. SharePoint Search
  3. Remote BLOB Storage (RBS)
  4. In-Place Records Management

I hope this provides hope for those of you who dream of massive SharePoint 2010 implementations like I do!