Integrating WordPress and Azure Search with the new Microsoft Azure Search SDK

As previously posted, Azure Search has been promoted to General Availability.  In February, I posted a detailed article on how to integrate WordPress with Azure Search using the Azure Search Preview APIs.  This article describes the same approach but with updated code using the new Azure Search SDK.  The latest code is committed to GitHub here.

In addition, I have now fully deployed the code to this blog so you can try it out.  Let me know what you think!

Getting Started

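To integrate WordPress and Azure Search, the basic flow of data is: a custom WebJob pulls posts from WordPress through the JSON REST API plugin and pushes them into an Azure Search index.

[Image: data flow diagram]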

In order to pull posts from WordPress, install the JSON REST API plugin found here (or in the plugin gallery). 

To create a custom WebJob, use the latest Azure SDK and Visual Studio 2013.  Once you have installed the Azure SDK, you’ll see a project template for Azure WebJobs. 

To use the Azure Search service, you need to create a search service in Azure.  See this article for directions on how to do this through the Azure Portal.

To access the Azure Search API, you can go through the REST API directly, or you can use the Microsoft Azure Search SDK.  To install the client into your WebJob, open the NuGet Package Manager Console and run “Install-Package Microsoft.Azure.Search -Pre”.  This also installs the Newtonsoft Json.NET library, which we can also use for interacting with the WordPress REST API.

WebJobs Architecture

When you create a WebJob in Visual Studio, it provides the ability to deploy straight to your Azure Web Site.  This works really well.  Alternatively, you can upload it manually as an .exe through the portal.  You can also run your WebJob locally in debug mode which in this case works perfectly because we have no real dependencies on Azure Web Sites to run the job.

The basic components of the architecture are:

  • Program: the main WebJob console app.
  • WordPressJSONLoader: service class responsible for pulling posts from WordPress.
  • WordPressPosts and WordPressPost: value objects representing the loaded collection of WordPress posts and each individual post.
  • AzureSearchIndexer: service class responsible for pushing posts into Azure Search.

Runtime configuration is done through the App.config and/or the Azure Web Sites configuration.  As part of the Azure SDK, you can use the CloudConfigurationManager to get environment settings; it gives values in the Azure Web Sites configuration priority over any settings found locally in the App.config.  If you are running locally, it automatically falls back to looking in your App.config for configuration values.

    // load configuration attributes
    webSiteURL = CloudConfigurationManager.GetSetting("WebSiteURL");
    searchServiceName = CloudConfigurationManager.GetSetting("ServiceName");
    searchServiceKey = CloudConfigurationManager.GetSetting("ServiceKey");
    indexName = CloudConfigurationManager.GetSetting("IndexName");

Retrieving Posts from WordPress

With the JSON REST API plugin installed, retrieving posts from WordPress is easy – just call the URL www.yourwebsite.com/?json=get_posts.  This will by default retrieve the last 10 posts but you can use filtering parameters and paging to change how many posts you retrieve.

Using the Json.NET library, you can deserialize your JSON into a JObject, which gives you an easy way to pull entities such as posts, comments, etc. out of the returned JSON.

When the JSON REST API is called, it provides 10 posts and the number of “pages”.  Based on this number of pages, we can pull all the posts 10 posts at a time.

In this method, we simply pull out the posts and deserialize these to a collection of WordPressPost objects. 

One of the key changes in the Microsoft Azure Search SDK compared with the previously available RedDog.Search client is that both asynchronous and synchronous methods are provided, which makes the code a little simpler in a console application.

Note: One bug in the JSON API I found is that the excerpt field contains the JetPack plugin’s share button HTML if you have it activated.  In my code, I strip these out to only take the first paragraph representing the excerpt text.

    using System.Net;
    using Newtonsoft.Json.Linq;

    /// <summary>
    /// Loads all posts from a WordPress blog, one page of results at a time.
    /// </summary>
    /// <param name="URL">WordPress blog URL</param>
    /// <returns>The collection of loaded posts</returns>
    public static WordPressPosts LoadAllPosts(string URL)
    {
        WordPressPosts wordPressPosts = new WordPressPosts();
        using (WebClient client = new WebClient())
        {
            // First request: returns the first page of posts and the total page count.
            // DownloadString reads the whole response rather than only its first line.
            var results = JObject.Parse(client.DownloadString(URL + "?json=get_posts"));
            var JsonPosts = results["posts"];
            if (JsonPosts != null)
            {
                foreach (var JsonPost in JsonPosts)
                {
                    wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost));
                }
            }

            // Remaining pages, if any.
            if (results["pages"] != null)
            {
                int pages = (int)results["pages"];
                for (int i = 2; i <= pages; i++)
                {
                    results = JObject.Parse(client.DownloadString(URL + "?json=get_posts&page=" + i));
                    JsonPosts = results["posts"];
                    if (JsonPosts != null)
                    {
                        foreach (var JsonPost in JsonPosts)
                        {
                            wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost));
                        }
                    }
                }
            }
        }
        return wordPressPosts;
    }
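The excerpt clean-up mentioned in the note above is not part of this listing.  As a rough illustration of the idea, here is a minimal JavaScript sketch (the WebJob itself does this in C#).  It assumes the excerpt arrives as HTML whose first <p> element holds the excerpt text; the function name firstParagraphText is hypothetical.

```javascript
// Hypothetical helper: keep only the text of the first <p>...</p> in an excerpt.
// Assumes any share-button markup comes after the first paragraph.
function firstParagraphText(excerptHtml) {
    if (!excerptHtml) return "";
    // Capture the contents of the first paragraph, across line breaks.
    var match = /<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/i.exec(excerptHtml);
    var body = match ? match[1] : excerptHtml;
    // Drop any remaining inline tags and collapse whitespace.
    return body.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
}
```

HTML entities are left as-is in this sketch, so they would still need decoding if you want plain text in the index.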

Creating an Index

Creating an index is reasonably easy but I found a few gotchas along the way:

  • The key field MUST be a string (I originally tried to use an integer field).
  • Searchable fields MUST be of type string (I originally tried to make a date field searchable).

If you try to violate the rules, the index creation process fails and the result returned will be an error.

The new create index method, built on the SDK, is included in the code on GitHub.
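The two rules above can be checked before an index definition is ever sent to the service.  The sketch below is in JavaScript rather than the C# used by the WebJob, and it uses the JSON shape accepted by the Azure Search REST API rather than the SDK classes.  The index name and field list are assumptions for illustration, and validateIndexDefinition is a hypothetical helper, not part of any SDK.

```javascript
// Illustrative index definition in the JSON shape used by the Azure Search REST API.
// The index name and the field list are assumptions made for this sketch.
var indexDefinition = {
    name: "wordpress-posts",
    fields: [
        { name: "Id", type: "Edm.String", key: true, searchable: false },
        { name: "Title", type: "Edm.String", searchable: true },
        { name: "Content", type: "Edm.String", searchable: true },
        { name: "CreateDate", type: "Edm.DateTimeOffset", searchable: false, sortable: true }
    ]
};

// Hypothetical helper: returns a list of rule violations (empty when none are found).
// Only the two rules from the text are checked; string collections are not handled.
function validateIndexDefinition(definition) {
    var problems = [];
    definition.fields.forEach(function (field) {
        if (field.key && field.type !== "Edm.String") {
            problems.push(field.name + ": the key field must be a string");
        }
        if (field.searchable && field.type !== "Edm.String") {
            problems.push(field.name + ": searchable fields must be strings");
        }
    });
    return problems;
}
```

Calling validateIndexDefinition(indexDefinition) returns an empty array for the definition above; switching the Id field to Edm.Int32 or marking CreateDate as searchable would each add one entry.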

Adding Posts to an Index

Now that we have our index, we can push posts into the index.  One of the new features of the Azure Search SDK is that you can pass rows in as objects and it will use reflection to convert the properties into field values. 

We have a class called WordPressPost that represents each post with its appropriate fields.

    /// <summary>
    /// Value object representing a single WordPress post.
    /// </summary>
    public class WordPressPost
    {
        public string Id { get; set; }
        public string Status { get; set; }
        public string Title { get; set; }
        public string Content { get; set; }
        public string Excerpt { get; set; }
        public DateTime CreateDate { get; set; }
        public DateTime ModifiedDate { get; set; }
        public string CreateDateAsString { get; set; }
        public string ModifiedDateAsString { get; set; }
        public string Author { get; set; }
        public string Categories { get; set; }
        public string Slug { get; set; }
        public string Tags { get; set; }
    }

To add the post, we add the objects as an array and create an IndexBatch object like this:

    try
    {
        DocumentIndexResponse response = indexClient.Documents.Index(
            IndexBatch.Create(BatchOfWordPressPosts.Select(doc => IndexAction.Create(doc))));
    }
    catch (IndexBatchException e)
    {
        Console.WriteLine(
            "Failed to index some of the documents: {0}",
            String.Join(", ", e.IndexResponse.Results.Where(r => !r.Succeeded).Select(r => r.Key)));
    }

In the previous RedDog Azure Search library, there was a maximum of 1000 items per batch.  I haven’t yet found a per-batch limit for the new SDK, but I left in the code that limits each batch to 100 items.
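The batching itself is simple to sketch.  The example below is in JavaScript rather than the WebJob's C#, and the helper names toBatches and toIndexPayload are hypothetical.  Instead of calling the SDK, it builds the request body that the REST API's document indexing endpoint accepts, where each document carries an "@search.action" property.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
    var batches = [];
    for (var i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// Build the body for one indexing request: every post becomes an "upload" action.
function toIndexPayload(posts) {
    return {
        value: posts.map(function (post) {
            return Object.assign({ "@search.action": "upload" }, post);
        })
    };
}
```

With 285 posts and a batch size of 100, toBatches produces three batches of 100, 100 and 85 posts, and each one can be passed through toIndexPayload before being sent.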

Checking our Index in the Portal

We can verify that we have content in the index by going to the portal and checking out our index:

[Screenshot: the index in the Azure Portal]

As shown, we have a newly created index with 285 items.

Building a Search Portal

Now that we have some content, let’s build a simple search interface using just HTML and JavaScript.  We’ll use the REST APIs to fetch data from the index and display the search results using Angular.JS as a framework.

Publishing to Azure Web Sites into a Virtual Application

Our WordPress site has been installed into the root of the Azure Web Site.  When we publish our search pages and JavaScript code, we don’t want them clobbering our existing WordPress site or getting deleted or mangled by mistake if there is an upgrade to WordPress.

Azure Web Sites supports the addition of virtual applications that run in their own sub-directory.  To create one, go into the Configure tab of the Azure Web Site and go to the bottom of the page.  You will see a section called “virtual applications and directories”.  In here, we can create a completely separate application that runs in its own directory, with its own web.config and publishing profile.

[Screenshot: the “virtual applications and directories” section of the Configure tab]

In Visual Studio, you can configure the publishing profile to publish to this new virtual application.

[Screenshot: the publishing profile in Visual Studio]

Specify the subdirectory in both the Site Name and Destination URL fields.

Fetching the Search Results With AngularJS

Building a search form using AngularJS is ideal for pulling in data from Azure Search because Azure Search returns JSON data by default.  We can simply assign the results to an AngularJS variable and then use the AngularJS framework to display the results dynamically.

We start with a basic search form styled using Bootstrap.  I use the Sparkling Theme for my WordPress blog, and this theme already uses Bootstrap as its core CSS framework, so adding some custom HTML using the same Bootstrap CSS elements works really well.

[Screenshot: the search form]

The nice thing with using Bootstrap is that if you switch your WordPress theme, as long as it uses Bootstrap (most of them do these days) your search form and results will take on the style of your blog.

If you perform a search with no keywords specified, Azure Search will return ALL documents.  This isn’t something we would want so we have made keyword a required field and check to ensure it isn’t blank before submitting.

The submit method for fetching the Azure Search results is the key for pulling in the results from Azure Search.  In building this method, I found a few gotchas to share:

  • Make sure you include the api-version in the request or Azure Search will return an error.
  • The default sort order is relevance.  In our case, we have also added an option to sort by create date (e.g. $orderby=CreateDate desc).
  • You have to include the api-key in the HTTP header when you send the request.  You can create a query key in the Azure Portal instead of exposing the admin key publicly.
  • The search results are in the “value” property of the returned JSON object; assign this to the variable that holds your results.

    vm.submit = function (item, event) {
        if (isEmpty(vm.keywords)) {
            // No keywords: clear any previous results instead of returning every document.
            vm.showSearchResults = false;
            vm.results = [];
            return;
        }

        // Encode the keywords so characters such as & or # do not break the query string.
        var URLstring = vm.URL + "?search=" + encodeURIComponent(vm.keywords);
        if (vm.orderby != "Relevance") {
            URLstring += "&$orderby=CreateDate desc";
        }
        URLstring += "&api-version=" + vm.APIVersion;

        // config is defined elsewhere and is expected to carry the api-key header.
        var responsePromise = $http.get(URLstring, config);
        responsePromise.success(function (dataFromServer, status, headers, config) {
            vm.results = dataFromServer.value;
            vm.showSearchResults = true;
        });
        responsePromise.error(function (data, status, headers, config) {
            alert("Submitting form failed!");
        });
    };
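The listing relies on a config object and a URL that are set up elsewhere.  As a sketch of how the gotchas above fit together, the hypothetical helper below builds both the request URL and the $http config with the api-key header.  The service URL, index name, api-version value and key placeholder are all assumptions for illustration.

```javascript
// Hypothetical helper: builds the URL and the $http config for a search request.
function buildSearchRequest(options) {
    var params = [
        "search=" + encodeURIComponent(options.keywords),
        "api-version=" + encodeURIComponent(options.apiVersion)
    ];
    // Relevance is the service default, so only send $orderby for other sort orders.
    if (options.orderBy && options.orderBy !== "Relevance") {
        params.push("$orderby=" + encodeURIComponent(options.orderBy));
    }
    return {
        url: options.serviceUrl + "?" + params.join("&"),
        // Use a query key here, never the admin key, since this runs in the browser.
        config: { headers: { "api-key": options.queryKey } }
    };
}

var request = buildSearchRequest({
    serviceUrl: "https://example.search.windows.net/indexes/posts/docs", // assumed URL
    keywords: "azure & search",
    orderBy: "CreateDate desc",
    apiVersion: "2015-02-28",  // assumed version string
    queryKey: "<query-key>"    // placeholder
});
// request.url ends with:
// ?search=azure%20%26%20search&api-version=2015-02-28&$orderby=CreateDate%20desc
```

The result can then be passed to AngularJS as $http.get(request.url, request.config).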

Displaying the Results

Once we have a JSON object with the search results, displaying them is pretty easy – just use the AngularJS ng-repeat attribute to iterate through the results returned.

One key note is the use of a filter to treat the HTML returned as HTML – by default AngularJS will HTML encode the HTML instead of letting it through raw.  In order to change this behaviour, you can add this function:

    angular.module('app').filter('unsafe', function ($sce) {
        return function (val) {
            return $sce.trustAsHtml(val);
        };
    });

Using this filter, you can pipe a bound value through unsafe (for example with ng-bind-html) and it will be rendered as raw HTML.

Adding a link to the original post is easy – just create an anchor link with the ID of the post.  (You could also use the indexed slug variable for friendlier URLs if permalinks are turned on.)
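A small sketch of both link styles follows.  The helper name postLink is hypothetical, the ?p= form is WordPress's default URL for a post, and the slug form assumes the “post name” permalink structure, so adjust it if your site uses a different one.

```javascript
// Hypothetical helper: builds a link back to the original post from an indexed document.
function postLink(siteUrl, post, usePermalinks) {
    var base = siteUrl.replace(/\/+$/, ""); // drop any trailing slash
    if (usePermalinks && post.Slug) {
        // Assumes the "post name" permalink structure, i.e. /<slug>/.
        return base + "/" + encodeURIComponent(post.Slug) + "/";
    }
    // Default WordPress URL form, which works regardless of permalink settings.
    return base + "/?p=" + encodeURIComponent(post.Id);
}
```

In the results template, the return value would be bound to the anchor's href for each item in the ng-repeat.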

Integrating into WordPress

With the solution published to Azure Web Sites into a Search subdirectory, we can use the published JavaScript files and embed them into our WordPress site.  While a proper WordPress plugin would be ideal, we just added the search.html code into a WordPress page using the out of the box content editor.

Note: when adding HTML into a page using the text editor in WordPress, if you leave any line feeds in, WordPress converts them into <p> tags.  This isn’t what we want with all our JavaScript and AngularJS code.  If you delete all the line feeds and keep all the HTML together, you can mitigate this problem.


Adding a Search Form on the Home Page

In addition to the search results page, we can add a widget to include a basic search form on the home page.  You can embed the HTML for the form using the widget editor and adding a text widget.

[Screenshot: the text widget in the widget editor]

Reading Query Data from JavaScript

To pass a search from the home page form to the search results page, we need to read the submitted values that are included in the query string.

I found a basic JavaScript function that parses the query string and looks for incoming search parameters.  I then load these into the AngularJS controller and execute a search on the initial page load.

    function getUrlParameters(parameter, staticURL, decode) {
        /*
           Function: getUrlParameters
           Description: Get the value of URL parameters either from current URL or static URL
           Author: Tirumal
           URL: www.code-tricks.com
        */
        // Fall back to the current page's query string when no static URL is given.
        var path = (staticURL && staticURL.length) ? staticURL : window.location.search;
        if (path.indexOf("?") < 0) return false;

        var parArr = path.split("?")[1].split("&");
        for (var i = 0; i < parArr.length; i++) {
            var parr = parArr[i].split("=");
            if (parr[0] == parameter) {
                return (decode) ? decodeURIComponent(parr[1]) : parr[1];
            }
        }
        // Parameter not present.
        return false;
    }
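In current browsers, the built-in URLSearchParams class can do the same job and also turns the + signs that a form submission uses for spaces back into spaces.  It was not widely available when this code was written, so treat the sketch below as an alternative rather than what the blog uses.  The parameter names keywords and orderby are assumptions.

```javascript
// Read a named parameter from a query string such as window.location.search.
// Returns null when the parameter is absent.
function getQueryParameter(queryString, name) {
    return new URLSearchParams(queryString).get(name);
}

getQueryParameter("?keywords=azure+search&orderby=Relevance", "keywords"); // "azure search"
```

On page load, the controller could read the keywords this way and, if a value is present, assign it to vm.keywords and call vm.submit().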

The Final Result – Search Results!

Here is the final result – a fully functioning search page that pulls WordPress posts from Azure Search and searches against keywords with the results sorted by either relevance or create date.

[Screenshot: the finished search results page]