Binary Large Objects, or BLOBs as the SQL types like to call them, are the byte arrays that represent documents and other files in SharePoint. Typically, they are stored in the SharePoint content database. The reality is, the ECM industry has known for decades that an RDBMS is not the best place to store BLOBs. SQL database storage needs to be high IOPS and low latency... translated... EXPENSIVE storage. It's much more efficient if we are able to store the BLOBs on lower cost, possibly even archival-class storage while we continue to invest in high performance storage for the structured content metadata.
As of SP2007 SP1, it was possible to take advantage of an External BLOB Storage (EBS) API to get the BLOBs out of SQL Server. Unfortunately, this method is not transactionally consistent, and it results in a high number of orphaned BLOBs in the BLOB store because a new BLOB is stored (the old one is not replaced) every time a document is updated. We must then rely on event receivers and lazy garbage collection to clean up the orphans. In short, EBS was a temporary solution from Microsoft. The future is SQL Remote BLOB Storage (RBS) support in SharePoint 2010. Fortunately, Microsoft will also provide a PowerShell-based solution for migrating from EBS to RBS!
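To make the orphan problem concrete, here is a toy Python model of the "store a new BLOB, never replace" behavior. It is only a sketch: ToyExternalBlobStore, ToyContentDb and their methods are made-up names for illustration and have nothing to do with the real EBS interfaces.

```python
import uuid


class ToyExternalBlobStore:
    """Hypothetical stand-in for an external BLOB store (not the real EBS API)."""

    def __init__(self):
        self.blobs = {}

    def save(self, data: bytes) -> str:
        # Every save creates a brand new BLOB; nothing is ever overwritten.
        blob_id = uuid.uuid4().hex
        self.blobs[blob_id] = data
        return blob_id


class ToyContentDb:
    """Hypothetical stand-in for the content database's document-to-BLOB map."""

    def __init__(self, store: ToyExternalBlobStore):
        self.store = store
        self.doc_to_blob = {}

    def save_document(self, doc_id: str, data: bytes) -> None:
        # An update just points the document at a new BLOB.
        # The previous BLOB stays in the store with nothing referencing it.
        self.doc_to_blob[doc_id] = self.store.save(data)

    def orphaned_blob_ids(self) -> set:
        # Orphans are BLOBs in the store that no document points to.
        return set(self.store.blobs) - set(self.doc_to_blob.values())

    def collect_garbage(self) -> int:
        # The "lazy" cleanup pass that has to run after the fact.
        orphans = self.orphaned_blob_ids()
        for blob_id in orphans:
            del self.store.blobs[blob_id]
        return len(orphans)
```

Save the same document twice and the store holds two BLOBs while only one is referenced. The other one sits there until something like collect_garbage() comes along.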
So I can just see the SP2010 box cover art... "Now with RBS!" When we enable RBS this is what we get:
- Transactional consistency ensures that when we get a BLOB ID back from the RBS provider, we are guaranteed storage. It also allows for traditional update capabilities.
- Transactional consistency also allows Write Once Read Many (WORM) mode devices to "VETO" a delete or modify operation. This is clutch for financial institutions who are legally not allowed to delete financial records. So they might use a storage platform such as EMC Centera or Hitachi HCAP in a sort of "create, but don't delete" mode. If these vendors choose to write an RBS provider for their devices (and they probably will), then the actual storage subsystem itself can prevent SharePoint from allowing a document to be deleted.
- While orphan cleanup is much less of a concern with RBS, it still needs to be managed. The good news is that because RBS is managed through SQL tables, RBS can take advantage of indexes to actually "query" the difference between what is in the BLOB store and what is in the SharePoint content databases. This is a HUGE improvement compared to spinning through the object model of an extremely large content database.
- RBS is completely transparent to the SharePoint API. Nothing changes. So existing custom and 3rd Party code will continue to function as expected.
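The "query the difference" idea from the orphan cleanup point can be sketched with plain SQL. The example below uses Python's built-in sqlite3 module so it runs anywhere. The table and column names (blob_store, doc_blob_refs, blob_id) are assumptions made up for illustration; they are not the actual RBS or SharePoint schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- What the BLOB store says it is holding (hypothetical table).
        CREATE TABLE blob_store (blob_id TEXT PRIMARY KEY);
        -- What the content database still references (hypothetical table).
        CREATE TABLE doc_blob_refs (
            doc_id  INTEGER PRIMARY KEY,
            blob_id TEXT NOT NULL
        );
        -- The index lets the join below avoid scanning every reference row.
        CREATE INDEX ix_doc_blob_refs_blob_id ON doc_blob_refs (blob_id);
        """
    )
    conn.executemany(
        "INSERT INTO blob_store (blob_id) VALUES (?)",
        [("b1",), ("b2",), ("b3",), ("b4",)],
    )
    conn.executemany(
        "INSERT INTO doc_blob_refs (doc_id, blob_id) VALUES (?, ?)",
        [(1, "b1"), (2, "b3")],
    )
    return conn


def find_orphaned_blobs(conn: sqlite3.Connection) -> list:
    """Return BLOB IDs that are in the store but referenced by no document."""
    rows = conn.execute(
        """
        SELECT s.blob_id
        FROM blob_store AS s
        LEFT JOIN doc_blob_refs AS r ON r.blob_id = s.blob_id
        WHERE r.blob_id IS NULL
        ORDER BY s.blob_id
        """
    ).fetchall()
    return [row[0] for row in rows]
```

The point of the sketch is that one set-based query finds the orphans, instead of walking every item through the object model and checking each BLOB one at a time.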
So now we have the facility to get the binary data out of the content database. That means that we're looking at having only metadata present. SWEET! That whole 50GB, 100GB, recommended maximum database size discussion becomes a whole lot less important! Instead we'll be concentrating more on the recommendations for list/library sizes, and that's a good thing!
Now that Microsoft is targeting 50 million items in a library, site taxonomy can look a little more like what we're used to seeing in your standard collaboration implementation instead of having to split up similar documents into multiple content databases.
Just a quick last note... RBS will require SQL 2008 Enterprise Edition. So keep that in mind if SP2010 and RBS might possibly be in the cards for your organization.
In Part 4 of my SP2010 Scalability series, I will look into the scalability benefits of In Place Records Management. Microsoft has removed a HUGE scalability bottleneck here!