If Your Hybrid Search Index Lives in the Cloud, Is This a Compliance Concern?

One of the new features in SharePoint 2016 is the ability to integrate content from on premise SharePoint farms with Office 365.

The Old Model

In the original SharePoint 2013 model, Hybrid Search used a federated search approach: the SharePoint 2013 farm hosted the search index for on premise content, and Office 365 hosted the search index for SharePoint Online.

While the search query executes within Office 365, the on premise index stays on premise, and its results are federated with those from the SharePoint Online search index.  The content within the on premise index never leaves the on premise farm.
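To make the federated idea concrete, here is a minimal Python sketch. It is a toy model, not SharePoint's implementation: the two dictionaries, the `Hit` type, and the function names are all hypothetical. It only illustrates that one query is sent to two separate indexes and that each index answers from its own data.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    source: str   # which index produced the hit
    score: float  # only comparable with scores from the same index


# Hypothetical stand-ins for the two separate indexes (title -> score).
ON_PREM_INDEX = {"discharge summary": 0.9, "ward schedule": 0.4}
ONLINE_INDEX = {"project plan": 0.8, "ward schedule template": 0.5}


def query_index(index, source, text):
    """Return hits from one index whose title contains the query text."""
    hits = [Hit(title, source, score)
            for title, score in index.items() if text in title]
    return sorted(hits, key=lambda h: h.score, reverse=True)


def federated_search(text):
    """Send the same query to both indexes and keep the result sets separate."""
    return {
        "on_premise": query_index(ON_PREM_INDEX, "on_premise", text),
        "online": query_index(ONLINE_INDEX, "online", text),
    }


results = federated_search("ward")
for source, hits in results.items():
    print(source, [h.title for h in hits])
```

In this sketch neither index ever holds the other's entries, which is the property that matters for the compliance discussion that follows.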

The New SharePoint 2016 Model

With the new SharePoint 2016 model, or with SharePoint 2013 running the latest cumulative update, there are no longer separate on premise and cloud indexes.  There is only a single integrated index running in the cloud.

The key point here is that the indexed content is now in the cloud, at least in the form of the stored index, even if it originated from an on premise SharePoint farm.

There are clear advantages to this approach in that SharePoint Online now has a single index to search and can use ranking, relevance, Delve, activity feeds, etc. against a consolidated index that covers content both on premise and in the cloud.

The Key Compliance Question: If Your Content is Indexed in the Cloud, is it in the Cloud?

Many regulated organizations have made it a policy that certain types of documents cannot be hosted in the cloud.  They implement SharePoint 2013 on premise to avoid the security and compliance risks associated with moving to the cloud.  A typical hybrid scenario is a hospital that allows collaboration on non-patient records in Office 365 but keeps all patient records on premise for compliance and security reasons.

In this scenario, any document on premise would be crawled and indexed by SharePoint Online using SharePoint 2016.  Crawled document content and metadata are now in the cloud: for example, all metadata attributes would be in the index, as well as all words/terms from the content of the document.  If your document contains patient identifiers as metadata or within the content of the document, those now presumably live in some form within the index.  (The original document stays on premise; when the user clicks on the link, it sends them back to the on premise SharePoint farm.)
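A small Python sketch can show why this matters. It builds a generic inverted index, which is an assumption for illustration only and not SharePoint's actual index format; the document, URL, field names, and `build_cloud_index` function are all hypothetical. The index keeps terms, metadata, and a link back to the source, but not the file itself.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9-]+")


def build_cloud_index(documents):
    """Build a toy inverted index: term -> set of document ids.

    Only terms, metadata, and a link back to the source are kept;
    the original document body is not stored in the index entry.
    """
    terms = defaultdict(set)
    entries = {}
    for doc in documents:
        tokens = TOKEN.findall(doc["body"].lower())
        for value in doc["metadata"].values():
            tokens += TOKEN.findall(str(value).lower())
        for token in tokens:
            terms[token].add(doc["id"])
        entries[doc["id"]] = {
            "metadata": dict(doc["metadata"]),
            "url": doc["url"],  # points back to the on premise farm
        }
    return terms, entries


# Hypothetical on premise document containing patient identifiers.
on_prem_docs = [{
    "id": "doc-1",
    "url": "https://sharepoint.hospital.example/records/doc-1",
    "metadata": {"PatientId": "MRN-00123", "Department": "Cardiology"},
    "body": "Jane Doe was admitted with chest pain.",
}]

terms, entries = build_cloud_index(on_prem_docs)
print("mrn-00123" in terms)        # the identifier is searchable in the index
print("body" in entries["doc-1"])  # the file itself is not part of the entry
```

Even in this simplified form, the patient identifier and the patient's name are present in the index as searchable terms, although the document itself never left its source.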

Is this a compliance violation if the policy is that no patient records can go to the cloud?

  • Ronny Lewandowski

    Dear Christopher, I just came across this page while investigating how local (on premise) documents are indexed by SharePoint Online. What I learned from this video: https://channel9.msdn.com/events/Ignite/2015/BRK3134 is that the parsed content is also copied to the cloud, because SharePoint Online needs it for things like rebalancing and so on (around 34:30 in the video). As I understand it, there is no way to prevent local documents from being indexed by SharePoint Online in a hybrid environment. So external users could never collaborate on confidential documents if the stakeholder doesn’t allow the index to be in the cloud, right? Is there any technical possibility I don’t see? With kind regards, Ronny