Integrating WordPress with Azure Search Service

This blog runs on WordPress using the Brandoo WordPress Plugin.  One of the key challenges with the Brandoo plugin is that the default search service doesn’t work.  I decided to build my own using Azure WebJobs, Azure Search Service and the WordPress REST JSON API.  Here are my lessons learned from developing an Azure Search Solution. 

Note: you can find all the code from the sample solution in GitHub here.

Getting Started

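To integrate WordPress and Azure Search, the basic data flow is: a WebJob pulls posts from WordPress through the JSON REST API, pushes them into an Azure Search index, and a search page then queries that index.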


In order to pull posts from WordPress, install the JSON REST API plugin found here (or in the plugin gallery). 

To create a custom WebJob, use the latest Azure SDK and Visual Studio 2013.  Once you have installed the Azure SDK, you’ll see a project template for Azure WebJobs. 

To use the Azure Search service, you need to create a search service in Azure.  See this article for directions on how to do this through the Azure Portal.

To access the Azure Search API, you can call the REST API directly or use the RedDog.Search C# client.  To install the client into your WebJob, open the NuGet Package Manager Console and run “Install-Package RedDog.Search”.  This also installs the Newtonsoft Json.NET library, which we can use for interacting with the WordPress REST API.

WebJobs Architecture

When you create a WebJob in Visual Studio, it provides the ability to deploy straight to your Azure Web Site.  This works really well.  Alternatively, you can upload it manually as an .exe through the portal.  You can also run your WebJob locally in debug mode which in this case works perfectly because we have no real dependencies on Azure Web Sites to run the job.

The basic components of the architecture are:

  • Program: the main web job console app.
  • WordPressJSONLoader: service class responsible for pulling posts from WordPress
  • WordPressPosts and WordPressPost: value objects representing the loaded collection of WordPress posts and each individual post.
  • AzureSearchIndexer: service class responsible for pushing posts into Azure Search.

Runtime configuration is done through the App.config and/or the Azure Web Sites configuration.  As part of the Azure SDK, you can use the CloudConfigurationManager to get environment settings.  It gives values in the Azure Web Sites configuration priority over any settings found locally in the App.config.  If you are running locally, it falls back automatically to looking in your App.config for configuration values.

// load configuration attributes
webSiteURL = CloudConfigurationManager.GetSetting("WebSiteURL");
searchServiceName = CloudConfigurationManager.GetSetting("ServiceName");
searchServiceKey = CloudConfigurationManager.GetSetting("ServiceKey");
indexName = CloudConfigurationManager.GetSetting("IndexName");

Retrieving Posts from WordPress

With the JSON REST API plugin installed, retrieving posts from WordPress is easy: just call your site URL with ?json=get_posts appended.  By default this retrieves the last 10 posts, but you can use filtering parameters and paging to change how many posts you retrieve.

Using the Json.NET library, you can parse the returned JSON into a JObject, which gives you an easy way to pull entities such as posts and comments out of it.

When the JSON REST API is called, it provides 10 posts and the number of “pages”.  Based on this number of pages, we can pull all the posts 10 posts at a time.

using System.Net;
using Newtonsoft.Json.Linq;

public static WordPressPosts LoadAllPosts(string URL)
{
    WordPressPosts wordPressPosts = new WordPressPosts();
    using (WebClient client = new WebClient())
    {
        // the first request returns page 1 and the total number of pages.
        // DownloadString reads the whole response body, not just its first line.
        string query = "?json=get_posts";
        var results = JObject.Parse(client.DownloadString(URL + query));
        var JsonPosts = results["posts"];
        if (JsonPosts != null)
        {
            foreach (var JsonPost in JsonPosts)
            {
                wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost));
            }
        }

        // fetch any remaining pages, 10 posts at a time
        int pages = results["pages"] != null ? (int)results["pages"] : 1;
        for (int i = 2; i <= pages; i++)
        {
            query = "?json=get_posts&page=" + i;
            results = JObject.Parse(client.DownloadString(URL + query));
            JsonPosts = results["posts"];
            if (JsonPosts != null)
            {
                foreach (var JsonPost in JsonPosts)
                {
                    wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost));
                }
            }
        }
    }
    return wordPressPosts;
}

In this method, we simply pull out the posts and deserialize these to a collection of WordPressPost objects. 

Running Async Tasks in Console Apps

The RedDog.Search library exposes only async methods, following the .NET 4.5 async/await pattern.  You need to wrap these calls carefully so that your console app doesn’t start them and then exit before they complete.  The way to achieve this is to create an async method that you execute from your main program and wait for it using the Wait() method.

You then call this method from Main() and call Wait() on the Task it returns, so the process stays alive until the work completes.

In addition, make sure that all your async methods return Task instead of void.  An async void method cannot be awaited, so your console app can exit before it finishes.

Checking for Errors

In the RedDog.Search library, you call all its methods like this:

public async Task CreateIndex()
{
    // check to see if index exists. If not, then create it.
    var result = await managementClient.GetIndexAsync(Index);
    if (!result.IsSuccess)
    {
        result = await managementClient.CreateIndexAsync(new Index(Index)
            .WithStringField("Id", f => f.IsKey().IsRetrievable())
            .WithStringField("Title", f => f.IsRetrievable().IsSearchable())
            .WithStringField("Content", f => f.IsSearchable().IsRetrievable())
            .WithStringField("Excerpt", f => f.IsRetrievable())
            .WithDateTimeField("CreateDate", f => f.IsRetrievable().IsSortable().IsFilterable().IsFacetable())
            .WithDateTimeField("ModifiedDate", f => f.IsRetrievable().IsSortable().IsFilterable().IsFacetable())
            .WithStringField("CreateDateAsString", f => f.IsSearchable().IsRetrievable().IsFilterable())
            .WithStringField("ModifiedDateAsString", f => f.IsSearchable().IsRetrievable().IsFilterable())
            .WithStringField("Author", f => f.IsSearchable().IsRetrievable().IsFilterable())
            .WithStringField("Categories", f => f.IsSearchable().IsRetrievable())
            .WithStringField("Tags", f => f.IsSearchable().IsRetrievable())
            .WithStringField("Slug", f => f.IsRetrievable())
            .WithIntegerField("CommentCount", f => f.IsRetrievable())
            .WithStringField("CommentContent", f => f.IsSearchable().IsRetrievable())
        );
        if (!result.IsSuccess)
        {
            Console.Out.WriteLine(result.Error.Message);
        }
    }
}

The result will provide a status of success and in the case of an error, some important error details.   Anything that is written to the Console is redirected into the Azure Web Sites log for the WebJob.

Creating an Index

Creating an index is reasonably easy but I found a few gotchas along the way:

  • The key field MUST be a string (I originally tried to use an integer field).
  • Searchable fields MUST be of type string (I originally tried to make a date field searchable). 

If you violate either rule, index creation fails and the returned result contains an error.

Adding Posts to an Index

Now that we have our index, we can push posts into the index.

foreach (WordPressPost post in WordPressPosts.Posts)
{
    IndexOperation indexOperation = new IndexOperation(IndexOperationType.MergeOrUpload, "Id", post.Id.ToString())
        .WithProperty("Title", post.Title)
        .WithProperty("Content", post.Content)
        .WithProperty("Excerpt", post.Excerpt)
        .WithProperty("CreateDate", post.CreateDate.ToUniversalTime())
        .WithProperty("ModifiedDate", post.ModifiedDate.ToUniversalTime())
        .WithProperty("CreateDateAsString", post.CreateDate.ToLongDateString())
        .WithProperty("ModifiedDateAsString", post.ModifiedDate.ToLongDateString());
    IndexOperationList.Add(indexOperation);
}
var result = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray());
if (!result.IsSuccess)
    Console.Out.WriteLine(result.Error.Message);

One key gotcha on adding items to the index – the date field must be in UniversalTime or you’ll get an error message.   For example, instead of supplying post.ModifiedDate as a DateTime attribute you need to call post.ModifiedDate.ToUniversalTime() or the index operation will generate an error.

The RedDog.Search PopulateAsync method accepts multiple IndexOperation objects, so you can send your documents to the index as a batch.  A single batch is limited to 1,000 operations or about 16 MB.  In our method, we limit each batch to 100 posts to stay well under this limit.

public async Task AddPosts()
{
    // if not previously connected, make a connection
    if (!connected)
        Connect();

    // create the index if it hasn't already been created.
    await CreateIndex();

    // run index population in batches. The RedDog.Search client maxes out at 1000 operations
    // or about 16 MB of data transfer, so we have set the maximum to 100 posts in a batch to be conservative.
    int batchCount = 0;
    List<IndexOperation> IndexOperationList = new List<IndexOperation>(maximumNumberOfDocumentsPerBatch);
    foreach (WordPressPost post in WordPressPosts.Posts)
    {
        batchCount++;

        // create an IndexOperation with the appropriate metadata and supply it with the incoming WordPress post
        IndexOperation indexOperation = new IndexOperation(IndexOperationType.MergeOrUpload, "Id", post.Id.ToString())
            .WithProperty("Title", post.Title)
            .WithProperty("Content", post.Content)
            .WithProperty("Excerpt", post.Excerpt)
            .WithProperty("CreateDate", post.CreateDate.ToUniversalTime())
            .WithProperty("ModifiedDate", post.ModifiedDate.ToUniversalTime())
            .WithProperty("CreateDateAsString", post.CreateDate.ToLongDateString())
            .WithProperty("ModifiedDateAsString", post.ModifiedDate.ToLongDateString())
            .WithProperty("Author", post.Author)
            .WithProperty("Categories", post.Categories)
            .WithProperty("Tags", post.Tags)
            .WithProperty("Slug", post.Slug)
            .WithProperty("CommentCount", post.CommentCount)
            .WithProperty("CommentContent", post.CommentContent);

        // add the index operation to the collection
        IndexOperationList.Add(indexOperation);

        // if we have added the maximum number of documents per batch, push the batch
        // to the index and then reset the collection to start a new batch.
        if (batchCount >= maximumNumberOfDocumentsPerBatch)
        {
            var result = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray());
            if (!result.IsSuccess)
                Console.Out.WriteLine(result.Error.Message);
            batchCount = 0;
            IndexOperationList = new List<IndexOperation>(maximumNumberOfDocumentsPerBatch);
        }
    }

    // push any remaining items that have not yet been added to the index,
    // skipping the call when the last batch came out exactly full.
    if (IndexOperationList.Count > 0)
    {
        var remainingResult = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray());
        if (!remainingResult.IsSuccess)
            Console.Out.WriteLine(remainingResult.Error.Message);
    }
}


Checking our Index in the Portal

We can verify that we have content in the index by going to the portal and checking out our index:


As shown, we have a newly created index with 291 items in it.

Building a Search Portal

Now that we have some content, let’s build a simple search interface using just HTML and JavaScript.  We’ll use the REST APIs to fetch data from the index and display the search results using AngularJS as a framework.

Publishing to Azure Web Sites into a Virtual Application

Our WordPress site has been installed into the root of the Azure Web Site.  When we publish our search pages and JavaScript code, we don’t want them clobbering our existing WordPress site or getting deleted or mangled by mistake if there is an upgrade to WordPress.

Azure Web Sites supports the addition of virtual applications that run in their own sub-directory.  To create one, go into the Configure tab of the Azure Web Site and go to the bottom of the page.  You will see a section called “virtual applications and directories”.  In here, we can create a completely separate application that runs in its own directory, with its own web.config and publishing profile.


In Visual Studio, you can configure the publishing profile to publish to this new virtual application.


Specify the subdirectory in both the Site Name and Destination URL fields.

Fetching the Search Results With AngularJS

Building a search form using AngularJS is ideal for pulling in data from Azure Search because Azure Search returns JSON data by default.  We can simply assign the results to an AngularJS variable and then use the AngularJS framework to display the results dynamically.

We start with a basic search form styled using Bootstrap.  I use the Sparkling Theme for my WordPress blog, and this theme already uses Bootstrap as its core CSS framework, so adding in some custom HTML using the same Bootstrap CSS elements works really well.


The nice thing with using Bootstrap is that if you switch your WordPress theme, as long as it uses Bootstrap (most of them do these days) your search form and results will take on the style of your blog.

If you perform a search with no keywords specified, Azure Search will return ALL documents.  This isn’t something we would want so we have made keyword a required field and check to ensure it isn’t blank before submitting.
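The submit method in this post calls an isEmpty helper whose code is not listed here.  A minimal sketch of what such a check could look like, assuming it only needs to treat missing or whitespace-only input as empty (the body below is my guess at a reasonable implementation, not the original helper):

```javascript
// Hypothetical helper: returns true when the keyword input is missing or blank.
function isEmpty(value) {
    return value === undefined || value === null || String(value).trim() === "";
}

// A whitespace-only keyword counts as empty, so no request would be sent for it.
console.log(isEmpty("   "), isEmpty("azure search"));
```

Trimming before comparing means a user who types only spaces does not accidentally trigger the "return ALL documents" behaviour described above.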

The submit method is the key to pulling in the results from Azure Search.  In building this method, I found a few gotchas to share:

  • Make sure you include the api-version in the request or Azure Search will return an error.
  • The default order by is relevance.  In our case, we have also added an option to sort by Create Date (e.g. $orderby=CreateDate desc).
  • You have to include the api-key in the HTTP header when you send in the request.  Create a query key in the Azure portal for this instead of exposing the admin key publicly.
  • The search results are in the “value” property of the returned JSON, so that is what you assign to your results variable.

vm.submit = function (item, event) {
    if (isEmpty(vm.keywords)) {
        vm.showSearchResults = false;
        vm.results = [];
        return;
    }

    // URL-encode the keywords so spaces and symbols survive the query string
    var URLstring = vm.URL + "?search=" + encodeURIComponent(vm.keywords);
    if (vm.orderby != "Relevance")
        URLstring += "&$orderby=CreateDate desc";
    URLstring += "&api-version=" + vm.APIVersion;

    // config is expected to carry the api-key header.
    // .success/.error are available on $http promises in AngularJS versions before 1.6.
    var responsePromise = $http.get(URLstring, config);
    responsePromise.success(function (dataFromServer, status, headers, config) {
        vm.results = dataFromServer.value;
        vm.showSearchResults = true;
    });
    responsePromise.error(function (data, status, headers, config) {
        alert("Submitting form failed!");
    });
};
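To make the gotchas above concrete, here is a small sketch that assembles the query URL and the request config separately from the controller.  The function name buildSearchRequest, its parameters, the service and index names, and the api-version value are all placeholders I made up for illustration; substitute the values for your own search service.

```javascript
// Sketch of assembling the query URL and request config for Azure Search.
// buildSearchRequest and its parameter names are illustrative, not part of any library.
function buildSearchRequest(baseUrl, keywords, orderBy, apiVersion, queryKey) {
    // encode the keywords so spaces, ampersands, etc. do not break the query string
    var params = ["search=" + encodeURIComponent(keywords)];
    if (orderBy !== "Relevance") {
        // relevance is the default ordering, so only add $orderby for the date sort
        params.push("$orderby=" + encodeURIComponent("CreateDate desc"));
    }
    // api-version is required on every request
    params.push("api-version=" + encodeURIComponent(apiVersion));
    return {
        url: baseUrl + "?" + params.join("&"),
        // the key travels in an HTTP header; use a query key, not the admin key
        config: { headers: { "api-key": queryKey } }
    };
}

// Example with placeholder values (service name, index name and key are assumptions).
var request = buildSearchRequest(
    "https://myservice.search.windows.net/indexes/posts/docs",
    "azure & wordpress", "CreateDate", "2014-07-31-Preview", "<query-key>");
console.log(request.url);
```

In the AngularJS controller, the two parts would then be passed along as $http.get(request.url, request.config).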

Displaying the Results

Once we have a JSON object with the search results, displaying them is pretty easy – just use the AngularJS ng-repeat attribute to iterate through the results returned.

<div ng-repeat="result in search.results">
    <a class="h1" href="{{result.Id}}">{{result.Title}}</a>
    <div class="h6" ng-bind-html="result.CreateDateAsString | unsafe"></div>
    <div ng-bind-html="result.Excerpt | unsafe"></div>
</div>

One key note is the use of a filter to treat the returned HTML as HTML.  By default AngularJS escapes any markup in bound values instead of letting it through raw.  To change this behaviour, you can add this filter:

angular.module('app').filter('unsafe', function ($sce) {
    return function (val) {
        return $sce.trustAsHtml(val);
    };
});

Using this filter, you can pipe a value through unsafe in a binding and it will be allowed through as raw HTML.

Adding a link to the original post is easy: just create an anchor link with the ID of the post.  (You could also use the indexed slug field, if permalinks are turned on, for friendlier URLs.)
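As a rough sketch of both options, the helper below builds a link from a search result.  The function postUrl and its parameters are hypothetical, and it assumes either WordPress's default "?p=<id>" links or a "/<slug>/" permalink structure; if your permalink settings differ, the slug branch will need adjusting.

```javascript
// Hypothetical helper: builds a link to a post from an indexed search result.
// Assumes WordPress's default "?p=<id>" links, or a "/<slug>/" permalink structure.
function postUrl(siteRoot, result, usePermalinks) {
    var root = siteRoot.replace(/\/+$/, ""); // drop trailing slashes
    if (usePermalinks && result.Slug) {
        return root + "/" + encodeURIComponent(result.Slug) + "/";
    }
    return root + "/?p=" + encodeURIComponent(result.Id);
}

// Example with a made-up site root and search result.
var sample = { Id: "42", Slug: "hello-world" };
console.log(postUrl("https://example.com/", sample, false));
console.log(postUrl("https://example.com/", sample, true));
```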

Integrating into WordPress

With the solution published to Azure Web Sites into a Search subdirectory, we can use the published JavaScript files and embed them into our WordPress site.  While a proper WordPress plugin would be ideal, we just added the search.html code into a WordPress page using the out of the box content editor.

Note: when adding HTML into a page using the text editor in WordPress, if you leave any line feeds WordPress converts them into <p> tags.  This isn’t what we want with all our JavaScript and AngularJS code.  If you delete all the line feeds and keep all the HTML together, you can mitigate this problem.
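If you would rather not delete the line feeds by hand, a few lines of JavaScript can do it before you paste.  This is only a sketch with a made-up function name, and it assumes any inline script in the markup ends its statements with semicolons and has no // comments, since joining lines would otherwise change what the script does.

```javascript
// Hypothetical helper: joins markup onto one line so WordPress has no line feeds to turn into <p> tags.
// Each line break (plus the indentation around it) is replaced by a single space.
function collapseLineFeeds(html) {
    return html.replace(/\s*\r?\n\s*/g, " ").trim();
}

// Example with a small made-up snippet.
var snippet = "<div>\n  <p>a</p>\n</div>";
console.log(collapseLineFeeds(snippet));
```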


The Final Result – Search Results!

Here is the final result – a fully functioning search page that pulls WordPress posts from Azure Search and searches against keywords with the results sorted by either relevance or create date.