Azure Search Introduces New Service Tiers

Microsoft has just announced a series of new service tiers for Azure Search.  The previously announced Basic tier has moved from preview to general availability, and its price has increased from the preview price of US$38 per month to US$75 per month.  Basic supports up to 1 million documents per partition and 5 indexes per service, which makes it a good option for many small to medium-sized web sites.


The new S3 tier is available in preview.  It supports up to 2.4 TB of storage and 1.4 billion documents across 36 scale-out units, and is targeted at customers with massive search needs.

Microsoft has also created a tier called S3 HD, which is targeted at ISVs and SaaS vendors with many customers who each have small indexes (e.g. fewer than 1 million documents).  S3 HD allows you to pack up to 1,000 indexes into a single search service, making it ideal for vendors who need to spin up search instances for many customers.


Azure Search Basic Tier Now Available

When Azure Search launched, I posted that a basic tier would be very helpful as the standard tier started at $250 per month.  This seemed like a lot of money for a basic search engine.


Microsoft has just launched a new Basic tier in preview that costs $38 / month.  The Basic tier has limits on overall storage (2 GB), queries per second (3), indexes per service (5), number of documents (1 million) and number of scale-out units (3), but many small web sites are unlikely to exceed these limits.  For example, this blog uses Azure Search and barely uses the available capacity of the free tier.


Azure Search Now Supports Lucene Fuzzy Searching in Preview

Azure Search provides a cloud-based text indexing engine; this blog uses Azure Search for its search engine.  The underlying indexing engine is based on Lucene, wrapped in a cloud service.  The search query API is provided through a REST API and a .NET SDK.

In the latest preview version of the Azure Search API, you can now execute Lucene queries that enable some new features for querying the search index:

  • Fuzzy Search: finds terms with similar spellings.  For example, a fuzzy search for "blue" can also match "blues" and "glue".
  • Proximity Search: finds terms that are near each other but not necessarily adjacent.  For example, a proximity search for "azure search" could return "azure search" but also "azure uses search" or "azure indexed by search", because "azure" and "search" are close to each other.
  • Term Boosting: allows you to boost one term over another in your results.  For example, the search query "rock^2 music" will rank results containing the word rock higher than results containing the word music.
  • Regular Expression Search: allows searching for terms that match a regular expression.
  • Wildcard Search: finds terms that start with a given prefix followed by any number of additional characters.  For example, "star*" would find documents containing "starlight", "starship", "starman", etc.

To use these new features, you have to call the 2015-02-28 Preview API.  In addition, the .NET SDK doesn't support this release yet, so you'll have to construct your own REST calls.
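Since these calls have to be built by hand, here is a minimal JavaScript sketch that builds a REST query URL for each feature in the list above. It assumes the preview version string is 2015-02-28-Preview and that full Lucene syntax is switched on with a queryType=full parameter; the service name myservice and index name posts are placeholders, and the request still needs your api-key header.

```javascript
// Builds a GET URL for a full Lucene query. Assumed parameters: queryType=full and
// api-version=2015-02-28-Preview. serviceName and indexName are placeholders.
function buildLuceneSearchUrl(serviceName, indexName, luceneQuery) {
  const query = [
    "search=" + encodeURIComponent(luceneQuery),
    "queryType=full",
    "api-version=2015-02-28-Preview",
  ].join("&");
  return `https://${serviceName}.search.windows.net/indexes/${indexName}/docs?${query}`;
}

// One example query per feature, written in Lucene syntax.
const luceneExamples = {
  fuzzy: "blue~",                // similar spellings such as blues or glue
  proximity: '"azure search"~2', // both terms within two positions of each other
  boosting: "rock^2 music",      // boosts matches on "rock" by a factor of 2
  regex: "/[bg]lues?/",          // blue, blues, glue or glues
  wildcard: "star*",             // starlight, starship, starman, ...
};

for (const [feature, q] of Object.entries(luceneExamples)) {
  console.log(feature, buildLuceneSearchUrl("myservice", "posts", q));
}
```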


Integrating WordPress and Azure Search with new Microsoft Azure Search SDK

As previously posted, Azure Search has been promoted to General Availability.  In February, I posted a detailed article on how to integrate WordPress with Azure Search using the Azure Search Preview APIs.  This article describes the same approach, with updated code using the new Azure Search SDK.  The latest code is committed to GitHub here.

In addition, I have now fully deployed the code to this blog so you can try it out…let me know what you think!

Getting Started

In order to integrate WordPress and Azure Search, the basic data flow is: a custom Azure WebJob pulls posts from WordPress through the JSON REST API plugin and pushes them into Azure Search.

In order to pull posts from WordPress, install the JSON REST API plugin found here (or in the plugin gallery). 

To create a custom WebJob, use the latest Azure SDK and Visual Studio 2013.  Once you have installed the Azure SDK, you’ll see a project template for Azure WebJobs. 

To use the Azure Search service, you need to create a search service in Azure.  See this article for directions on how to do this through the Azure Portal.

To access the Azure Search API, you can call the REST API directly or use the Microsoft Azure Search SDK.  To install the SDK into your WebJob, open the NuGet Package Manager Console and enter "Install-Package Microsoft.Azure.Search -Pre".  This also installs the Newtonsoft JSON.NET library, which we can also use to interact with the WordPress REST API.

WebJobs Architecture

When you create a WebJob in Visual Studio, it provides the ability to deploy straight to your Azure Web Site.  This works really well.  Alternatively, you can upload it manually as an .exe through the portal.  You can also run your WebJob locally in debug mode which in this case works perfectly because we have no real dependencies on Azure Web Sites to run the job.

The basic components of the architecture are:

  • Program: the main WebJob console app.
  • WordPressJSONLoader: service class responsible for pulling posts from WordPress.
  • WordPressPosts and WordPressPost: value objects representing the loaded collection of WordPress posts and each individual post.
  • AzureSearchIndexer: service class responsible for pushing posts into Azure Search.

Runtime configuration is done through the App.config and/or the Azure Web Sites configuration.  The Azure SDK provides the CloudConfigurationManager class for reading environment settings, and it gives values in the Azure Web Sites configuration priority over any settings found locally in App.config.  When you run locally, it automatically falls back to the values in your App.config.

// load configuration attributes
webSiteURL = CloudConfigurationManager.GetSetting("WebSiteURL");
searchServiceName = CloudConfigurationManager.GetSetting("ServiceName");
searchServiceKey = CloudConfigurationManager.GetSetting("ServiceKey");
indexName = CloudConfigurationManager.GetSetting("IndexName");
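CloudConfigurationManager is a .NET class, but the precedence it applies is easy to illustrate. The JavaScript below is an analogous sketch, not the SDK itself: it assumes the hosting environment exposes settings as environment variables, and it uses a plain object as a hypothetical stand-in for App.config.

```javascript
// Analogous sketch, not CloudConfigurationManager: prefer a non-empty value from the
// hosting environment, otherwise fall back to local settings (stand-in for App.config).
function getSetting(name, localSettings, env = process.env) {
  const hosted = env[name];
  if (hosted !== undefined && hosted !== "") return hosted;
  return Object.prototype.hasOwnProperty.call(localSettings, name) ? localSettings[name] : null;
}

// The same settings the WebJob reads; the local values are placeholders.
const localSettings = {
  WebSiteURL: "http://localhost/",
  ServiceName: "myservice",
  ServiceKey: "local-admin-key",
  IndexName: "posts",
};
const indexName = getSetting("IndexName", localSettings);
```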

Retrieving Posts from WordPress

With the JSON REST API plugin installed, retrieving posts from WordPress is easy – just call the URL www.yourwebsite.com/?json=get_posts.  This will by default retrieve the last 10 posts but you can use filtering parameters and paging to change how many posts you retrieve.

Using the JSON.NET library, you can deserialize the JSON into a JObject, which provides an easy way to pull entities such as posts, comments, etc. out of the returned JSON.

When the JSON REST API is called, it provides 10 posts and the number of “pages”.  Based on this number of pages, we can pull all the posts 10 posts at a time.

In this method, we simply pull out the posts and deserialize these to a collection of WordPressPost objects. 

One of the key changes in the Microsoft Azure Search SDK, compared with the RedDog.Search client that was previously available, is that both async and synchronous methods are provided, which makes the code a little simpler in a console application.

Note: one bug I found in the JSON API is that the excerpt field contains the Jetpack plugin's share button HTML if you have it activated.  In my code, I strip this out and take only the first paragraph, which holds the excerpt text.

using System.Net;
using Newtonsoft.Json.Linq;

/// <summary>
/// Loads all posts from any WordPress blog running the JSON API plugin.
/// </summary>
/// <param name="URL">WordPress blog URL</param>
/// <returns>A WordPressPosts collection containing every post</returns>
public static WordPressPosts LoadAllPosts(string URL)
{
    WordPressPosts wordPressPosts = new WordPressPosts();
    using (WebClient client = new WebClient())
    {
        // The first page also reports how many pages of posts there are.
        // DownloadString reads the whole response (ReadLine would stop at the first line break).
        JObject results = JObject.Parse(client.DownloadString(URL + "?json=get_posts"));
        int pages = results["pages"] != null ? (int)results["pages"] : 1;

        for (int page = 1; page <= pages; page++)
        {
            // Page 1 is already loaded; fetch the remaining pages 10 posts at a time.
            if (page > 1)
                results = JObject.Parse(client.DownloadString(URL + "?json=get_posts&page=" + page));

            var jsonPosts = results["posts"];
            if (jsonPosts == null) continue;
            foreach (var jsonPost in jsonPosts)
                wordPressPosts.Posts.Add(loadPostFromJToken(jsonPost));
        }
    }
    return wordPressPosts;
}
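The excerpt clean-up described in the note above can be sketched in JavaScript as well. This assumes the real excerpt text is the first <p> element and that any share-button markup comes after it; the sample markup is made up. A regular expression is fine for this narrow case but is not a general HTML parser.

```javascript
// Keep only the text of the first <p>...</p> in an excerpt, dropping anything after it
// (for example, share-button markup). Falls back to the trimmed input if no <p> is found.
function firstParagraph(excerptHtml) {
  const match = /<p\b[^>]*>([\s\S]*?)<\/p>/i.exec(excerptHtml);
  return match ? match[1].trim() : excerptHtml.trim();
}

// Made-up sample: excerpt text followed by appended share markup.
const sample =
  '<p>Azure Search is now generally available.</p>\n<div class="share-buttons"><a href="#">Share</a></div>';
console.log(firstParagraph(sample)); // Azure Search is now generally available.
```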

Creating an Index

Creating an index is reasonably easy but I found a few gotchas along the way:

  • The key field MUST be a string (I originally tried to use an integer field).
  • Searchable fields MUST be of type string (I originally tried to make a date field searchable).

If you violate either rule, index creation fails and returns an error.
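Both rules can be checked locally before an index definition is sent. The sketch below uses the REST API's JSON shape (type names such as Edm.String and Edm.DateTimeOffset, plus per-field flags such as key and searchable) rather than the .NET SDK; the index name posts and the field list are illustrative. The REST API also allows Collection(Edm.String) fields to be searchable, so the check accepts that type too.

```javascript
// Illustrative index definition mirroring some WordPressPost properties.
const indexDefinition = {
  name: "posts",
  fields: [
    { name: "Id", type: "Edm.String", key: true, retrievable: true },
    { name: "Title", type: "Edm.String", searchable: true, retrievable: true },
    { name: "Content", type: "Edm.String", searchable: true, retrievable: true },
    { name: "CreateDate", type: "Edm.DateTimeOffset", sortable: true, filterable: true, retrievable: true },
    { name: "CreateDateAsString", type: "Edm.String", searchable: true, retrievable: true },
  ],
};

const SEARCHABLE_TYPES = ["Edm.String", "Collection(Edm.String)"];

// Returns a list of rule violations; an empty list means the definition passes.
function validateIndex(def) {
  const errors = [];
  const keys = def.fields.filter((f) => f.key);
  if (keys.length !== 1) errors.push("exactly one key field is required");
  for (const f of keys) {
    if (f.type !== "Edm.String") errors.push(`key field ${f.name} must be Edm.String`);
  }
  for (const f of def.fields) {
    if (f.searchable && !SEARCHABLE_TYPES.includes(f.type)) {
      errors.push(`searchable field ${f.name} must be a string type`);
    }
  }
  return errors;
}
```

Reproducing the two mistakes described above (an integer key and a searchable date) yields two errors instead of a failed request.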


Adding Posts to an Index

Now that we have our index, we can push posts into the index.  One of the new features of the Azure Search SDK is that you can pass rows in as objects and it will use reflection to convert the properties into field values. 

We have a class called WordPressPost that represents each post with its appropriate fields.

/// <summary>
/// Value object representing a single WordPress post.
/// </summary>
public class WordPressPost
{
    public string Id { get; set; }
    public string Status { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public string Excerpt { get; set; }
    public DateTime CreateDate { get; set; }
    public DateTime ModifiedDate { get; set; }
    public string CreateDateAsString { get; set; }
    public string ModifiedDateAsString { get; set; }
    public string Author { get; set; }
    public string Categories { get; set; }
    public string Slug { get; set; }
    public string Tags { get; set; }
}

To add the posts, we wrap each WordPressPost object in an IndexAction and submit the collection as an IndexBatch:

try
{
    DocumentIndexResponse response = indexClient.Documents.Index(
        IndexBatch.Create(BatchOfWordPressPosts.Select(doc => IndexAction.Create(doc))));
}
catch (IndexBatchException e)
{
    // Some documents in the batch failed; report their keys.
    Console.WriteLine(
        "Failed to index some of the documents: {0}",
        String.Join(", ", e.IndexResponse.Results.Where(r => !r.Succeeded).Select(r => r.Key)));
}

In the previous RedDog.Search library, there was a maximum of 1,000 items per batch.  I haven't found a per-batch limit for the new SDK yet, but I left in the code that limits each batch to 100 items.
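The batching itself doesn't depend on the client library. Here is a minimal JavaScript sketch that splits documents into batches of at most 100 and shapes each one as a REST indexing payload; using mergeOrUpload as the @search.action is an assumption that mirrors the MergeOrUpload operation used with RedDog.Search.

```javascript
// Split documents into consecutive batches of at most batchSize (100 by default).
function toBatches(docs, batchSize = 100) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Shape one batch as a REST indexing payload; every document gets the same action.
function toIndexPayload(batch, action = "mergeOrUpload") {
  return { value: batch.map((doc) => ({ "@search.action": action, ...doc })) };
}

// Example: 285 placeholder posts become batches of 100, 100 and 85.
const posts = Array.from({ length: 285 }, (_, i) => ({ Id: String(i + 1), Title: `Post ${i + 1}` }));
const payloads = toBatches(posts).map((batch) => toIndexPayload(batch));
console.log(payloads.map((p) => p.value.length)); // [ 100, 100, 85 ]
```

Because the loop steps by batchSize, the last batch simply holds whatever is left over, and an empty input produces no batches at all rather than an empty request.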

Checking our Index in the Portal

We can verify that the index has content by opening it in the Azure portal.


In our case, the newly created index contained 285 items.

Building a Search Portal

Now that we have some content, let's build a simple search interface using just HTML and JavaScript.  We'll use the REST API to fetch data from the index and AngularJS to display the search results.

Publishing to Azure Web Sites into a Virtual Application

Our WordPress site has been installed into the root of the Azure Web Site.  When we publish our search pages and JavaScript code, we don’t want them clobbering our existing WordPress site or getting deleted or mangled by mistake if there is an upgrade to WordPress.

Azure Web Sites supports adding virtual applications that run in their own subdirectory.  To create one, open the Configure tab of the Azure Web Site and scroll to the bottom of the page, where you will see a section called "virtual applications and directories".  Here, we can create a completely separate application that runs in its own directory, with its own web.config and publishing profile.


In Visual Studio, you can configure the publishing profile to publish to this new virtual application.


Specify the subdirectory in both the Site Name and Destination URL fields.

Fetching the Search Results With AngularJS

Building a search form using AngularJS is ideal for pulling in data from Azure Search because Azure Search returns JSON data by default.  We can simply assign the results to an AngularJS variable and then use the AngularJS framework to display the results dynamically.

We start with a basic search form styled using Bootstrap.  I use the Sparkling theme for my WordPress blog, and this theme already uses Bootstrap as its core CSS framework, so adding custom HTML that uses the same Bootstrap CSS classes works really well.


The nice thing with using Bootstrap is that if you switch your WordPress theme, as long as it uses Bootstrap (most of them do these days) your search form and results will take on the style of your blog.

If you perform a search with no keywords, Azure Search returns ALL documents.  We don't want that, so we made the keyword field required and check that it isn't blank before submitting.

The submit method for fetching the Azure Search results is the key for pulling in the results from Azure Search.  In building this method, I found a few gotchas to share:

  • Make sure you include the api-version in the request, or Azure Search will return an error.
  • The default sort order is relevance.  In our case, we have also added an option to sort by create date (e.g. $orderby=CreateDate desc).
  • You have to include the api-key in the HTTP header of the request.  Create a query key in the Azure portal rather than exposing your admin key publicly.
  • The search results are returned in the "value" property of the JSON response; assign that to your results variable.

vm.submit = function (item, event) {
    // Don't query with blank keywords: Azure Search would return every document.
    if (isEmpty(vm.keywords)) {
        vm.showSearchResults = false;
        vm.results = [];
        return;
    }

    // Relevance is the default order; otherwise sort newest first.
    // Keywords are URL-encoded so spaces and symbols survive the query string.
    var URLstring = vm.URL + "?search=" + encodeURIComponent(vm.keywords);
    if (vm.orderby != "Relevance")
        URLstring += "&$orderby=" + encodeURIComponent("CreateDate desc");
    URLstring += "&api-version=" + vm.APIVersion;

    // config is defined elsewhere and is assumed to carry the api-key header.
    $http.get(URLstring, config).then(function (response) {
        vm.results = response.data.value;
        vm.showSearchResults = true;
    }, function () {
        alert("Submitting form failed!");
    });
};
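The gotchas above can also be captured in one small, framework-free helper. This is a sketch rather than the code behind the search page: serviceUrl, apiVersion and queryKey are placeholders, and the caller is expected to send the returned headers with whatever HTTP client it uses (such as AngularJS's $http).

```javascript
// Builds the GET URL and headers for a keyword search, applying the gotchas above.
// serviceUrl, apiVersion and queryKey are placeholders supplied by the caller.
function buildSearchRequest(serviceUrl, apiVersion, queryKey, keywords, sortByCreateDate) {
  const trimmed = (keywords || "").trim();
  if (trimmed === "") return null; // a blank search would return every document
  let url = serviceUrl + "?search=" + encodeURIComponent(trimmed);
  if (sortByCreateDate) url += "&$orderby=" + encodeURIComponent("CreateDate desc");
  url += "&api-version=" + encodeURIComponent(apiVersion); // required on every request
  return { url, headers: { "api-key": queryKey } };       // use a query key, not the admin key
}

// The matching documents are in the "value" array of the JSON response.
function extractResults(responseBody) {
  return Array.isArray(responseBody && responseBody.value) ? responseBody.value : [];
}
```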

Displaying the Results

Once we have a JSON object with the search results, displaying them is pretty easy – just use the AngularJS ng-repeat attribute to iterate through the results returned.

One key note is the use of a filter to render the returned HTML as HTML: by default, AngularJS escapes bound HTML instead of letting it through raw.  To change this behaviour, you can add this filter:

// Mark HTML returned by Azure Search as trusted so AngularJS renders it instead of escaping it.
angular.module('app').filter('unsafe', function ($sce) {
    return function (val) {
        return $sce.trustAsHtml(val);
    };
});

With this filter in place, you can pipe a value through unsafe in an ng-bind-html binding and it will be rendered as raw HTML.

Adding a link to the original post is easy: just create an anchor link using the ID of the post.  (If permalinks are turned on, you could also use the indexed slug for friendlier URLs.)
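Both link styles fit in one small helper. This is a sketch with a hypothetical name: it relies on WordPress resolving ?p=<id> for any post, and the slug form assumes a /%postname%/ permalink structure, which may differ on your blog.

```javascript
// Link to a post by ID using WordPress's ?p=<id> query, or by slug when pretty
// permalinks are on (assumes a /%postname%/ permalink structure).
function postUrl(blogUrl, post, usePermalinks) {
  const base = blogUrl.replace(/\/+$/, ""); // drop trailing slashes
  if (usePermalinks && post.Slug) {
    return `${base}/${encodeURIComponent(post.Slug)}/`;
  }
  return `${base}/?p=${encodeURIComponent(post.Id)}`;
}
```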

Integrating into WordPress

With the solution published to Azure Web Sites into a Search subdirectory, we can use the published JavaScript files and embed them into our WordPress site.  While a proper WordPress plugin would be ideal, we just added the search.html code into a WordPress page using the out of the box content editor.

Note: when you add HTML to a page using the WordPress text editor, WordPress converts any line feeds you leave into <p> tags.  This isn't what we want in our JavaScript and AngularJS code.  Deleting all the line feeds and keeping the HTML together avoids this problem.


Adding a Search Form on the Home Page

In addition to the search results page, we can add a widget to include a basic search form on the home page.  You can embed the HTML for the form using the widget editor and adding a text widget.


Reading Query Data from JavaScript

To pass a search submitted from the home page form to the search results page, we need to read the submitted values, which are included in the query string.

I found a basic JavaScript function that parses the query string and looks for incoming search parameters.  I then load these into the AngularJS controller and execute a search on the initial page load.

function getUrlParameters(parameter, staticURL, decode) {
    /*
       Function: getUrlParameters
       Description: Get the value of URL parameters either from current URL or static URL
       Author: Tirumal
       URL: www.code-tricks.com
    */
    // Use the static URL if one was given, otherwise the current page's query string.
    var path = (staticURL && staticURL.length) ? staticURL : window.location.search;
    if (path.indexOf("?") < 0)
        return false;

    var parArr = path.split("?")[1].split("&");
    for (var i = 0; i < parArr.length; i++) {
        var parr = parArr[i].split("=");
        if (parr[0] == parameter) {
            var value = parr[1] || "";
            // Form submissions encode spaces as "+", which decodeURIComponent leaves alone.
            return decode ? decodeURIComponent(value.replace(/\+/g, " ")) : value;
        }
    }
    // Parameter not present.
    return false;
}
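Where the standard URLSearchParams API is available (modern browsers and Node.js), the same lookup is shorter and handles percent-decoding and "+" as a space for you. This is a swapped-in alternative, not the function above; the parameter name keywords used when testing it is hypothetical. In the page you would call it as getQueryParameter("keywords", window.location.search).

```javascript
// Reads one query string parameter with the standard URLSearchParams API.
// Returns false when the parameter is absent, mirroring getUrlParameters.
function getQueryParameter(name, search) {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  return params.has(name) ? params.get(name) : false;
}
```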

The Final Result – Search Results!

Here is the final result – a fully functioning search page that pulls WordPress posts from Azure Search and searches against keywords with the results sorted by either relevance or create date.



Azure Search Needs a Basic Tier

One of the great things about Azure is the ability to scale a service from very small to very large.  For example, I can create an Azure DB for as little as $6 a month on Basic.

Azure Search, which has just become generally available, has two pricing options: Free and Standard.

Free supports up to 10,000 documents but you can only have one Free index per subscription.

Standard starts at $250.00 per "search unit" (more units = higher performance guarantees) and supports up to 15 million documents.  You can scale it up to 36 search units as needed.

However, there is no “Basic” tier option – the minimum cost is $250 a month, which seems pretty steep for a hosted search engine, especially for a small web site or application. 

Compared to Amazon, the price seems pretty expensive: $0.355 / hr versus as little as $0.059 / hr for Amazon CloudSearch.  However, Amazon also charges $0.10 per 1,000 batch upload requests and $0.98 per GB of data every time you re-index.  If you re-index your documents once an hour and have 1-2 GB of data, that's up to 24 × 2 GB × $0.98 ≈ $47 per day in potential charges!


Azure Search is now Generally Available


Azure Search has just gone from preview to general availability.  In addition to the service now being production ready, the Microsoft team has added a number of new features.

New Azure SDK

There is a new Azure Search .NET SDK available that wraps the REST API in .NET classes.  One of the neat features of the new SDK is the ability to serialize objects to the index.  When you retrieve the results of a search, you can also automatically deserialize the returned rows back into your objects.

The Azure Search .NET SDK is available through the NuGet Package manager.

Enhanced Multilanguage Support

The preview version of Azure Search supported indexing only in English.  Now there is support for more than 50 additional languages.

New Indexers for SQL Server, Azure SQL and Azure DocumentDB

Using the Azure Search .NET SDK or the REST API, you can push any text data into your index with code. 

Microsoft has now included built-in indexers for SQL Server running in an Azure VM, Azure SQL, and Azure DocumentDB.  You can simply point an indexer at your existing repository and have it pull content on a schedule with no code required.
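Setting up an indexer is configuration rather than code, but the definition is small. The sketch below assumes the REST indexer definition has this shape (name, dataSourceName, targetIndexName, and a schedule whose interval is an ISO 8601 duration) and that a data source pointing at your SQL or DocumentDB repository already exists; every name in it is hypothetical.

```javascript
// Builds an indexer definition that pulls from an existing data source on a schedule.
// All names are hypothetical; the interval is an ISO 8601 duration such as "PT2H".
function buildIndexerDefinition(name, dataSourceName, targetIndexName, intervalHours) {
  if (!Number.isInteger(intervalHours) || intervalHours < 1) {
    throw new RangeError("intervalHours must be a positive integer");
  }
  return {
    name,
    dataSourceName,
    targetIndexName,
    schedule: { interval: `PT${intervalHours}H` },
  };
}

// JSON body for the request that creates the indexer (sent with the admin api-key).
const body = JSON.stringify(buildIndexerDefinition("posts-indexer", "posts-sql", "posts", 2));
```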

Enhanced Search Suggestions

Search suggestions is the ability to suggest search terms as you type into the search box.  Search suggestions have been enhanced significantly in this release from the preview version:

  • Matching for terms within a phrase instead of the start of a phrase
  • Fuzzy matching to accommodate spelling mistakes and near misses
  • Increase to 100 suggestions per result
  • No more 3 character minimum to return a result
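Requesting suggestions is a single GET per keystroke. This sketch assumes the GA suggest endpoint with suggesterName, fuzzy and $top parameters and the 2015-02-28 api-version; the service, index and suggester names are placeholders.

```javascript
// Builds a suggestions URL for the text typed so far. Assumed endpoint: /docs/suggest
// with suggesterName, fuzzy and $top parameters. All names are placeholders.
function buildSuggestUrl(serviceName, indexName, partialText, suggesterName, top = 10) {
  if (!Number.isInteger(top) || top < 1 || top > 100) {
    throw new RangeError("top must be an integer from 1 to 100");
  }
  const query = [
    "search=" + encodeURIComponent(partialText),
    "suggesterName=" + encodeURIComponent(suggesterName),
    "fuzzy=true",  // tolerate spelling mistakes and near misses
    "$top=" + top, // up to 100 suggestions per request
    "api-version=2015-02-28",
  ].join("&");
  return `https://${serviceName}.search.windows.net/indexes/${indexName}/docs/suggest?${query}`;
}
```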


Integrating WordPress with Azure Search Service

This blog runs on WordPress using the Brandoo WordPress Plugin.  One of the key challenges with the Brandoo plugin is that the default search service doesn’t work.  I decided to build my own using Azure WebJobs, Azure Search Service and the WordPress REST JSON API.  Here are my lessons learned from developing an Azure Search Solution. 

Note: you can find all the code from the sample solution in GitHub here.

Getting Started

In order to integrate WordPress and Azure Search, the basic data flow is: a custom Azure WebJob pulls posts from WordPress through the JSON REST API plugin and pushes them into Azure Search.

In order to pull posts from WordPress, install the JSON REST API plugin found here (or in the plugin gallery). 

To create a custom WebJob, use the latest Azure SDK and Visual Studio 2013.  Once you have installed the Azure SDK, you’ll see a project template for Azure WebJobs. 

To use the Azure Search service, you need to create a search service in Azure.  See this article for directions on how to do this through the Azure Portal.

To access the Azure Search API, you can call the REST API directly or use the RedDog.Search C# client.  To install the client into your WebJob, open the NuGet Package Manager Console and enter "Install-Package RedDog.Search".  This also installs the Newtonsoft JSON.NET library, which we can also use to interact with the WordPress REST API.

WebJobs Architecture

When you create a WebJob in Visual Studio, it provides the ability to deploy straight to your Azure Web Site.  This works really well.  Alternatively, you can upload it manually as an .exe through the portal.  You can also run your WebJob locally in debug mode which in this case works perfectly because we have no real dependencies on Azure Web Sites to run the job.

The basic components of the architecture are:

  • Program: the main web job console app.
  • WordPressJSONLoader: service class responsible for pulling posts from WordPress.
  • WordPressPosts and WordPressPost: value objects representing the loaded collection of WordPress posts and each individual post.
  • AzureSearchIndexer: service class responsible for pushing posts into Azure Search.

Runtime configuration is done through the App.config and/or the Azure Web Sites configuration.  The Azure SDK provides the CloudConfigurationManager class for reading environment settings, and it gives values in the Azure Web Sites configuration priority over any settings found locally in App.config.  When you run locally, it automatically falls back to the values in your App.config.

// load configuration attributes
webSiteURL = CloudConfigurationManager.GetSetting("WebSiteURL");
searchServiceName = CloudConfigurationManager.GetSetting("ServiceName");
searchServiceKey = CloudConfigurationManager.GetSetting("ServiceKey");
indexName = CloudConfigurationManager.GetSetting("IndexName");

Retrieving Posts from WordPress

With the JSON REST API plugin installed, retrieving posts from WordPress is easy – just call the URL www.yourwebsite.com/?json=get_posts.  This will by default retrieve the last 10 posts but you can use filtering parameters and paging to change how many posts you retrieve.

Using the JSON.NET library, you can deserialize the JSON into a JObject, which provides an easy way to pull entities such as posts, comments, etc. out of the returned JSON.

When the JSON REST API is called, it provides 10 posts and the number of “pages”.  Based on this number of pages, we can pull all the posts 10 posts at a time.

using System.Net;
using Newtonsoft.Json.Linq;

public static WordPressPosts LoadAllPosts(string URL)
{
    WordPressPosts wordPressPosts = new WordPressPosts();
    using (WebClient client = new WebClient())
    {
        // The first page also reports how many pages of posts there are.
        // DownloadString reads the whole response (ReadLine would stop at the first line break).
        JObject results = JObject.Parse(client.DownloadString(URL + "?json=get_posts"));
        int pages = results["pages"] != null ? (int)results["pages"] : 1;

        for (int page = 1; page <= pages; page++)
        {
            // Page 1 is already loaded; fetch the remaining pages 10 posts at a time.
            if (page > 1)
                results = JObject.Parse(client.DownloadString(URL + "?json=get_posts&page=" + page));

            var jsonPosts = results["posts"];
            if (jsonPosts == null) continue;
            foreach (var jsonPost in jsonPosts)
                wordPressPosts.Posts.Add(loadPostFromJToken(jsonPost));
        }
    }
    return wordPressPosts;
}

In this method, we simply pull out the posts and deserialize these to a collection of WordPressPost objects. 

Running Async Tasks in Console Apps

The RedDog.Search library contains only the new .NET 4.5 async methods.  You need to be careful to wrap these methods so that your console app doesn't hand off to them and then exit before they finish.  The way to achieve this is to create an async method containing the work and call it from Main(), blocking on the returned Task with its Wait() method.

In addition, make sure that all your async methods return Task instead of void: an async void method can't be awaited, so your console app may exit before the work completes.

Checking for Errors

In the RedDog.Search library, every call returns a result object that you should check, as in this CreateIndex method:

public async Task CreateIndex()
{
    // check to see if index exists. If not, then create it.
    var result = await managementClient.GetIndexAsync(Index);
    if (!result.IsSuccess)
    {
        result = await managementClient.CreateIndexAsync(new Index(Index)
            .WithStringField("Id", f => f.IsKey().IsRetrievable())
            .WithStringField("Title", f => f.IsRetrievable().IsSearchable())
            .WithStringField("Content", f => f.IsSearchable().IsRetrievable())
            .WithStringField("Excerpt", f => f.IsRetrievable())
            .WithDateTimeField("CreateDate", f => f.IsRetrievable().IsSortable().IsFilterable().IsFacetable())
            .WithDateTimeField("ModifiedDate", f => f.IsRetrievable().IsSortable().IsFilterable().IsFacetable())
            .WithStringField("CreateDateAsString", f => f.IsSearchable().IsRetrievable().IsFilterable())
            .WithStringField("ModifiedDateAsString", f => f.IsSearchable().IsRetrievable().IsFilterable())
            .WithStringField("Author", f => f.IsSearchable().IsRetrievable().IsFilterable())
            .WithStringField("Categories", f => f.IsSearchable().IsRetrievable())
            .WithStringField("Tags", f => f.IsSearchable().IsRetrievable())
            .WithStringField("Slug", f => f.IsRetrievable())
            .WithIntegerField("CommentCount", f => f.IsRetrievable())
            .WithStringField("CommentContent", f => f.IsSearchable().IsRetrievable())
        );
        if (!result.IsSuccess)
        {
            Console.Out.WriteLine(result.Error.Message);
        }
    }
}

Each result reports whether the call succeeded (IsSuccess) and, in the case of an error, includes the error details (Error.Message).  Anything written to the Console is redirected into the Azure Web Sites log for the WebJob.

Creating an Index

Creating an index is reasonably easy but I found a few gotchas along the way:

  • The key field MUST be a string (I originally tried to use an integer field).
  • Searchable fields MUST be of type string (I originally tried to make a date field searchable). 

If you violate either rule, index creation fails and returns an error.

Adding Posts to an Index

Now that we have our index, we can push posts into the index.

foreach (WordPressPost post in WordPressPosts.Posts)
{
    IndexOperation indexOperation = new IndexOperation(IndexOperationType.MergeOrUpload, "Id", post.Id.ToString())
        .WithProperty("Title", post.Title)
        .WithProperty("Content", post.Content)
        .WithProperty("Excerpt", post.Excerpt)
        .WithProperty("CreateDate", post.CreateDate.ToUniversalTime())
        .WithProperty("ModifiedDate", post.ModifiedDate.ToUniversalTime())
        .WithProperty("CreateDateAsString", post.CreateDate.ToLongDateString())
        .WithProperty("ModifiedDateAsString", post.ModifiedDate.ToLongDateString());
    IndexOperationList.Add(indexOperation);
}
var result = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray());
if (!result.IsSuccess)
    Console.Out.WriteLine(result.Error.Message);

One key gotcha when adding items to the index: date fields must be in UTC or you'll get an error message.  For example, instead of supplying post.ModifiedDate as a DateTime value, you need to call post.ModifiedDate.ToUniversalTime() or the index operation will generate an error.
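The same conversion matters if you push dates through the REST API from JavaScript. This sketch assumes the post date arrives as "YYYY-MM-DD HH:MM:SS" in the blog's local time and that you know the blog's UTC offset in minutes (both assumptions); toISOString then always renders the instant in UTC with a "Z" suffix.

```javascript
// Convert a WordPress-style local date string to a UTC ISO 8601 string.
// utcOffsetMinutes is the blog's offset from UTC, e.g. -300 for UTC-5.
function wordPressDateToUtcIso(localDate, utcOffsetMinutes) {
  const m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(localDate);
  if (!m) throw new Error(`Unrecognized date: ${localDate}`);
  const [, y, mo, d, h, mi, s] = m.map(Number);
  // Read the fields as if they were UTC, then subtract the offset to get the true UTC instant.
  const utcMs = Date.UTC(y, mo - 1, d, h, mi, s) - utcOffsetMinutes * 60000;
  return new Date(utcMs).toISOString();
}

console.log(wordPressDateToUtcIso("2015-03-05 09:30:00", -300)); // 2015-03-05T14:30:00.000Z
```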

The RedDog.Search PopulateAsync method lets you submit multiple IndexOperation objects as a single batch.  The library supports a maximum of 1,000 IndexOperations or 16 MB per batch.  In our method, we limit each batch to 100 posts to stay well under this limit.

public async Task AddPosts()
{
    // if not previously connected, make a connection
    if (!connected) Connect();

    // create the index if it hasn't already been created
    await CreateIndex();

    // Run index population in batches. A batch maxes out at 1,000 operations or
    // about 16 MB of data, so we cap each batch at 100 posts to be conservative.
    List<IndexOperation> indexOperationList = new List<IndexOperation>(maximumNumberOfDocumentsPerBatch);

    foreach (WordPressPost post in WordPressPosts.Posts)
    {
        // create an IndexOperation with the appropriate metadata for the incoming WordPress post;
        // DateTime values are converted to UTC, otherwise the index operation fails
        IndexOperation indexOperation = new IndexOperation(IndexOperationType.MergeOrUpload, "Id", post.Id.ToString())
            .WithProperty("Title", post.Title)
            .WithProperty("Content", post.Content)
            .WithProperty("Excerpt", post.Excerpt)
            .WithProperty("CreateDate", post.CreateDate.ToUniversalTime())
            .WithProperty("ModifiedDate", post.ModifiedDate.ToUniversalTime())
            .WithProperty("CreateDateAsString", post.CreateDate.ToLongDateString())
            .WithProperty("ModifiedDateAsString", post.ModifiedDate.ToLongDateString())
            .WithProperty("Author", post.Author)
            .WithProperty("Categories", post.Categories)
            .WithProperty("Tags", post.Tags)
            .WithProperty("Slug", post.Slug)
            .WithProperty("CommentCount", post.CommentCount)
            .WithProperty("CommentContent", post.CommentContent);

        // add the index operation to the current batch
        indexOperationList.Add(indexOperation);

        // once the batch is full, send it to the index and start a new batch
        if (indexOperationList.Count >= maximumNumberOfDocumentsPerBatch)
        {
            var result = await managementClient.PopulateAsync(Index, indexOperationList.ToArray());
            if (!result.IsSuccess)
                Console.Out.WriteLine(result.Error.Message);
            indexOperationList = new List<IndexOperation>(maximumNumberOfDocumentsPerBatch);
        }
    }

    // send any remaining posts; skip the call when the last batch was sent exactly full
    if (indexOperationList.Count > 0)
    {
        var remainingResult = await managementClient.PopulateAsync(Index, indexOperationList.ToArray());
        if (!remainingResult.IsSuccess)
            Console.Out.WriteLine(remainingResult.Error.Message);
    }
}
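The batching logic does not depend on the client library. Below is a minimal JavaScript sketch that splits documents into batches of at most 100 and wraps each batch in the { value: [...] } body used by the Azure Search REST indexing endpoint. The helper names are hypothetical. The sketch only counts documents and does not measure the 16 MB payload limit, which is why a conservative batch size is a sensible default. It also never produces an empty final batch when the document count is an exact multiple of the batch size.

```javascript
// Hypothetical helper: split an array into batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches; // no empty batches, even when items.length % size === 0
}

// Wrap each batch in the request body shape used by the REST indexing API.
function toIndexRequestBodies(documents, size = 100) {
  return toBatches(documents, size).map(batch => ({ value: batch }));
}
```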

With the index created, calling AddPosts() pushes the WordPress posts into it.

Checking our Index in the Portal

We can verify that we have content in the index by going to the portal and checking out our index:

[Screenshot: the new index in the Azure portal]

As shown, we have a newly created index with 291 items in it.

Building a Search Portal

Now that we have some content, let’s build a simple search interface using just HTML and JavaScript. We’ll call the Azure Search REST API to fetch data from the index and use AngularJS to display the search results.

Publishing to Azure Web Sites into a Virtual Application

Our WordPress site has been installed into the root of the Azure Web Site.  When we publish our search pages and JavaScript code, we don’t want them clobbering our existing WordPress site or getting deleted or mangled by mistake if there is an upgrade to WordPress.

Azure Web Sites supports virtual applications that run in their own subdirectory. To create one, open the Configure tab of the Azure Web Site and scroll to the bottom of the page, where you’ll find a section called “virtual applications and directories”. Here you can create a completely separate application that runs in its own directory, with its own web.config and publishing profile.

[Screenshot: the “virtual applications and directories” section of the Configure tab]

In Visual Studio, you can configure the publishing profile to publish to this new virtual application.

[Screenshot: the Visual Studio publishing profile for the virtual application]

Specify the subdirectory in both the Site Name and Destination URL fields.

Fetching the Search Results With AngularJS

Building a search form using AngularJS is ideal for pulling in data from Azure Search because Azure Search returns JSON data by default.  We can simply assign the results to an AngularJS variable and then use the AngularJS framework to display the results dynamically.
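To illustrate, a search response is a JSON object whose value array holds the matching documents, each carrying an @search.score relevance score alongside the indexed fields. The sample below is made-up illustrative data, not real output from this blog's index. summarize is a hypothetical helper showing how little work is needed before handing the array to AngularJS.

```javascript
// Illustrative (made-up) response body from an Azure Search query.
const sampleResponse = JSON.parse(`{
  "value": [
    { "@search.score": 1.52, "Id": "101", "Title": "Azure Search Basic Tier Now Available" },
    { "@search.score": 0.87, "Id": "95", "Title": "Azure Search Introduces New Service Tiers" }
  ]
}`);

// Hypothetical helper: pull out just what the results list needs.
// Falls back to an empty array if the response has no "value" property.
function summarize(response) {
  return (response.value || []).map(doc => ({
    id: doc.Id,
    title: doc.Title,
    score: doc["@search.score"]
  }));
}
```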

We start with a basic search form styled using Bootstrap. My WordPress blog uses the Sparkling theme, which already uses Bootstrap as its core CSS framework, so custom HTML built from the same Bootstrap CSS classes blends in well.

[Screenshot: the Bootstrap-styled search form]

The nice thing with using Bootstrap is that if you switch your WordPress theme, as long as it uses Bootstrap (most of them do these days) your search form and results will take on the style of your blog.

If you perform a search with no keywords, Azure Search returns ALL documents. We don’t want that, so we made the keyword field required and check that it isn’t blank before submitting.
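The submit method relies on an isEmpty helper whose definition isn’t shown in this post. One plausible version, an assumption rather than the original code, treats undefined, null, and whitespace-only strings as blank so that a search box containing only spaces doesn’t trigger a return-everything query:

```javascript
// Hypothetical isEmpty helper: true for undefined, null, or a string that
// contains only whitespace. Non-string values are converted with String().
function isEmpty(value) {
  return value === undefined || value === null || String(value).trim() === "";
}
```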

The submit method does the work of fetching results from Azure Search. While building it, I ran into a few gotchas:

  • Include the api-version parameter in the request, or Azure Search will return an error.
  • The default sort order is relevance. We have also added an option to sort by create date (i.e. $orderby=CreateDate desc).
  • Send the api-key in the HTTP header of the request. Create a query key in the Azure portal and use it here instead of the admin key, so the admin key is never exposed publicly.
  • The search results are in the “value” property of the returned JSON object; assign that to your results variable.
vm.submit = function (item, event) {
    // Don't query with blank keywords; an empty search returns every document.
    if (isEmpty(vm.keywords)) {
        vm.showSearchResults = false;
        vm.results = [];
        return;
    }
    // api-version is required; URL-encode the user's keywords.
    var URLstring = vm.URL + "?search=" + encodeURIComponent(vm.keywords) + "&api-version=" + vm.APIVersion;
    if (vm.orderby != "Relevance") {
        // sort newest posts first instead of by relevance
        URLstring += "&$orderby=" + encodeURIComponent("CreateDate desc");
    }
    // config carries the api-key header (defined elsewhere in the controller)
    $http.get(URLstring, config).then(
        function (response) {
            // the search results are in the "value" array of the JSON response
            vm.results = response.data.value;
            vm.showSearchResults = true;
        },
        function (response) {
            alert("Submitting form failed!");
        });
};
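The same gotchas can be collected into a framework-free helper. This is a sketch, not the code used on this blog: buildSearchRequest is a hypothetical name, and the caller supplies the index’s docs endpoint URL, the API version, and a query key. The keywords are URL-encoded so characters such as & or # can’t break the query string, and the api-key travels in a header rather than in the URL.

```javascript
// Hypothetical helper: build the URL and headers for an Azure Search query.
// baseUrl is the index's docs endpoint, e.g.
// "https://<service>.search.windows.net/indexes/<index>/docs" (placeholders).
function buildSearchRequest(baseUrl, keywords, orderBy, apiVersion, queryKey) {
  const params = [
    "search=" + encodeURIComponent(keywords),
    "api-version=" + encodeURIComponent(apiVersion) // required on every request
  ];
  if (orderBy === "CreateDate") {
    // default ordering is relevance; this sorts newest posts first instead
    params.push("$orderby=" + encodeURIComponent("CreateDate desc"));
  }
  return {
    url: baseUrl + "?" + params.join("&"),
    // a query key (not the admin key) is safe to ship to the browser
    headers: { "api-key": queryKey }
  };
}
```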

Displaying the Results

Once we have a JSON object with the search results, displaying them is pretty easy – just use the AngularJS ng-repeat attribute to iterate through the results returned.

<div ng-repeat="result in search.results">
    <a class="h1" ng-href="http://wordpressazuresearchintegration.azurewebsites.net/?p={{result.Id}}">{{result.Title}}</a>
    <div class="h6" ng-bind-html="result.CreateDateAsString | unsafe"></div>
    <div ng-bind-html="result.Excerpt | unsafe"></div>
</div>

One key detail is the unsafe filter, which lets the HTML stored in the index render as HTML. By default, AngularJS escapes HTML in bindings instead of letting it through raw, and ng-bind-html rejects values it doesn’t trust unless ngSanitize is loaded. To change this behaviour, add this filter:

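// the array annotation keeps the $sce injection working after minification
angular.module('app').filter('unsafe', ['$sce', function ($sce) {
    // mark the value as trusted HTML so ng-bind-html renders it as-is
    return function (val) {
        return $sce.trustAsHtml(val);
    };
}]);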

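With this filter in place, piping a value through unsafe (for example, result.Excerpt | unsafe) lets it through as raw HTML. Because this bypasses AngularJS’s sanitization, only use it for content you trust, such as your own posts.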

Adding a link to the original post is easy: just create an anchor link with the ID of the post. (If permalinks are turned on, you could also use the indexed slug field for friendlier URLs.)
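A minimal sketch of both link styles follows. It assumes the default ?p=<id> links when permalinks are off and a /%postname%/ permalink structure when they are on. postUrl and its parameters are hypothetical, and other permalink structures would need a different format.

```javascript
// Hypothetical helper: link a search result back to its WordPress post.
// With permalinks off, WordPress resolves "?p=<id>"; with a "/%postname%/"
// permalink structure (an assumption), the slug gives a friendlier URL.
function postUrl(siteUrl, result, usePermalinks) {
  const base = siteUrl.replace(/\/+$/, ""); // drop trailing slashes
  if (usePermalinks && result.Slug) {
    return base + "/" + encodeURIComponent(result.Slug) + "/";
  }
  return base + "/?p=" + encodeURIComponent(result.Id);
}
```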

Integrating into WordPress

With the solution published to a Search subdirectory on Azure Web Sites, we can reference the published JavaScript files from our WordPress site. While a proper WordPress plugin would be ideal, we simply pasted the search.html code into a WordPress page using the out-of-the-box content editor.

Note: when adding HTML to a page with the WordPress text editor, any line feeds you leave in are converted into <p> tags. That isn’t what we want in our JavaScript and AngularJS code. Deleting all the line feeds and keeping the HTML together on one line avoids the problem.
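If you’d rather keep the source readable and strip the line feeds just before pasting, a small script can do it. This is a hypothetical helper, not part of WordPress: it replaces each line feed, plus the indentation after it, with a single space. It is only safe if the embedded JavaScript contains no // line comments and doesn’t rely on automatic semicolon insertion, because joining lines would change the meaning of such code.

```javascript
// Hypothetical helper: collapse HTML/JavaScript onto one line so the WordPress
// editor has no line feeds to turn into <p> tags. Each line break, plus any
// indentation that follows it, becomes a single space.
function collapseLineFeeds(source) {
  return source
    .replace(/\r?\n[ \t]*/g, " ")
    .trim();
}
```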

[Screenshot: the search page code in the WordPress text editor]

The Final Result – Search Results!

Here is the final result: a fully functioning search page that queries Azure Search for WordPress posts matching the keywords and sorts the results by either relevance or create date.

[Screenshot: the final search results page]
