Using Power BI to Visualize WordPress Blog Posts

Microsofttrends.com runs on WordPress, and I have been working with Power BI to show some analytics on my posting behavior.  Fetching the data can be done using a JSON plugin (I use the JSON API plugin but there is also now a new WordPress REST API plugin) which will provide all your blog posts as a JSON data source.  

Accessing WordPress Blog Posts

The first step is accessing the JSON data – in the latest version of Power BI desktop there is a dedicated JSON data source that you can provide the URL for your JSON feed and it will fetch the data into PowerQuery for further manipulation.  Here is what the results of my query look like in Power BI Desktop.

image

As a table, this just provides the count of the posts but not the posts themselves – not very interesting.  However, if I click on the List of posts then I can drill down into the JSON and fetch the list of posts themselves.

Each post is represented as a record.  You can convert this now to a table by clicking on the “To Table” function and then clicking on the expand icon to the right of column 1.  When you do this you are prompted to select the fields that you want to select as part of your table.

image

If you click Close and Apply this will load all your blog posts into a table. 

Adding Date Filtering and Drill Down

Each post as a posting date.  We can create a calculated table that represents all the possible dates and add in year, month, and day columns that are nicely formatted to support drill down by year, month or day.

Start by creating a calculated table by clicking on “New Table” with the following formula:

DateTable = CALENDAR (MINX(‘Blog Posts’, [date]), NOW())

This will create a table of all the possible dates from the earliest blog post to today.

Now we need to add a bunch of calculated columns to represent year, month, day, day of the week, etc.  Here are some formulas that you can use – each line is a calculated column that you add to the table.

DateTable = CALENDAR (MINX(‘Blog Posts’, [date]), NOW())

Day = Format([Date],”DDDD”)

DayofMonth = DAY([Date])

DayofWeek = WEEKDAY([Date],2)

ISOWeekofYear = IF ((([OrdinalDate] – [DayofWeek] + 10)/7) >= 53, 1, TRUNC(([OrdinalDate] – [DayofWeek] + 10)/7))

LongDay = Format(DateTable[Date], “dddd”)

Month = FORMAT([Date], “MMM”) & ” ” & [Year]

MonthDate = Date(DateTable[Year], DateTable[MonthofYear], 1)

MonthofYear = MONTH([Date])

OrdinalDate = DATEDIFF([Year]&”,1,1″,[Date],DAY)+1

QuarterofYear = ROUNDUP (MONTH([Date])/3,0)

Year = YEAR([Date])

In addition, you will need to change the sort column for columns such as LongDay to the integer or date representations so that your columns sort in the appropriate order and not in alphabetical order.

Once you have your date table set up, you can create a relationship between the date column in the blog posts table and the DateValue column.  Depending on the format of your original date column, you may need a calculated column that reflects the identical format so that the ids match.  In my case, I created a calculated column called “DateOnly” that has the same formula as the DateValue column and linked these together through the relationship management dialog.

image

Once you have this set up, you can now create date driven visualizations such as the number of posts per month or per day of the week!

image

image

Adding Metadata Filtering

One of the key features of a blog are tags and categories that act as metadata for filtering.  The JSON provides these within its structure so can we extract the data and use it for creating some dashboards by category or tag?

If we go back to our original Power Query, we can create a new table that contains a denormalized set of keys that provide the link from the blog post ID to the tag values (the same approach will work for categories, authors, etc.).   When we get the list of posts, we do the same as above and expand the records but in this case we only need two columns – the blog post ID and the tags.

image

Then, we need to expand the list of tags so that we can fetch the tag ID and the title of the tag for each record.

image

Our new table should look something like this:

image

Once we load up this as a table, we can now create some interesting dashboards based on the tag frequency.  For example, here is the frequency of posts per tag:

image

Here is the frequency of tag use by month as a visualization.

image

Here is a comparison of the use of the tag “Azure” with the use of the tag “Power BI” over time.

image

Once you have the data there are lots of possible ways to explore and analyze the data!

Read More

WordPress on Azure Simplified through Azure Resource Manager Template

As those of you who read this blog regularly know, this web site runs WordPress on Microsoft Azure.  WordPress runs within an Azure Web App and MySQL runs within an Azure VM running MySQL on Linux.  To date, this solution has been rock solid, even running on a A0 shared CPU virtual machine. 

For those wanting to build a more robust enterprise scale WordPress cluster, Microsoft has released a new Azure Resource Manager template that provisions a WordPress Azure Web Application backed by a cluster of MySQL servers through a load balancer. 

WordPress-Azure-Template Microsoft releases Azure template for WordPress/MySQL

The template is available in the Azure Marketplace.

createcontent

Read More

MySQL Running on Azure VM A0 Rock Solid So Far…

This blog runs on WordPress using Azure Web Sites.  When I first launched the site, I used the Clear DB free MySQL tier and it worked quite well until about 300-400 page views per day.  The free tier has a connection limit of 4 connections and when it exceeds that limit you start getting errors.

I then re-launched the site with an Azure SQL backend using the Brandoo plugin.  At the time, this was an option in Azure Web Sites template selection (it’s no longer available).  This worked well until I upgraded to WordPress 4.2 and then it broke completely.  I had been warned by the original developer that this would happen eventually because it was impossible to keep up with all the WordPress changes to the MySQL schema. 

I then went back to MySQL using a dedicated Azure VM running Ubuntu and MySQL.  The VM is an A0 VM which means it runs on a shared core (e.g. not even a full core) and 768 MB of memory. 

The results so far have been very good, with no downtime and no errors.  If I look at the VM performance, it’s rock solid and barely uses up the available CPU or memory available:

image

image

As you can see by the charts, the VM is running at about 2-3% of CPU and MySQL runs at about 14% of available memory.  Given this is a shared core and 768 MB only of memory, that’s pretty remarkable and a better deal than running through Clear DB at least at my current scale (peak load is about 1500 page views per day).

Read More

Brandoo and WordPress Now Completely Broken: Moved Back to MySQL on Ubuntu

Brandoo is an open source plugin for WordPress that translates MySQL to SQL Server.  As performance increased on this blog, the free database provided by ClearDB wasn’t cutting it and I moved to Azure SQL using the Brandoo plugin.

I received a note from the author of the plugin saying it was no longer supported and continued to use the plugin because it was working. 

As of WordPress 4.2, Brandoo finally broke altogether.  The blog continued to work but the upgrade process failed and I couldn’t get access to my blog at all from an administration perspective.  I couldn’t also post any new blog posts.

So over the past week, I have re-architected my blog to go back to MySQL as this seems to be the only reliable way to run WordPress.  Instead of using the free MySQL database from ClearDB, I have set up a dedicated linux VM for running MySQL in Azure.  This seems to work remarkably well, even with the smallest VM – MySQL is very efficient at running in a small VM footprint.

Capture2

For those looking to go this route, have a read through this article – I found it very helpful in setting up Ubuntu, MySQL and then connecting it to Azure Web Sites.

Read More

Integrating WordPress and Azure Search with new Microsoft Azure Search SDK

As previously posted, Azure Search has been promoted to General Availability.  In February, I posted a detailed article on how to integrate Word Press with Azure Search using the Azure Search Preview APIs.  This article describes the same approach but with updated code using the new Azure Search SDK.  The latest code is committed to GitHub here.

In addition, I have now fully deployed the code to this blog so you can try it out…let me know what you think!

Getting Started

In order to integrate WordPress and Azure Search, the basic flow for data is:

clip_image001

In order to pull posts from WordPress, install the JSON REST API plugin found here (or in the plugin gallery). 

To create a custom WebJob, use the latest Azure SDK and Visual Studio 2013.  Once you have installed the Azure SDK, you’ll see a project template for Azure WebJobs. 

To use the Azure Search service, you need to create a search service in Azure.  See this article for directions on how to do this through the Azure Portal.

To access the Azure Search API, you can go through the REST API directly, or you can use the Microsoft Azure Search SDK.  To install the client into your WebJob, you run the NuGet package console and enter “Install-Package Microsoft.Azure.Search -Pre”.  This also installs the NewtonSoft JSON.NET library which we can also use for interacting with the WordPress REST API.

WebJobs Architecture

When you create a WebJob in Visual Studio, it provides the ability to deploy straight to your Azure Web Site.  This works really well.  Alternatively, you can upload it manually as an .exe through the portal.  You can also run your WebJob locally in debug mode which in this case works perfectly because we have no real dependencies on Azure Web Sites to run the job.

The basic components of the architecture are:

  • Program: the main web job console app.

  • WordPressJSONLoader: service class responsible for pulling posts from WordPress
  • WordPressPosts and WordPressPost: value objects representing the loaded collection of wordpress posts and each individual post.
  • AzureSearchIndexer: service class responsible for pushing posts into Azure Search.

Runtime configuration is done through the App.config and/or the Azure Web Sites configuration.  As part of the Azure SDK you can use the CloudConfigurationManager to get environment settings and it is smart enough to use values in the Azure Web Sites configuration as priority over any settings found locally in the App.Config.  If you are running locally, it degrades automatically to looking in your App.Config for configuration values. 

// load configuration attributes webSiteURL = CloudConfigurationManager.GetSetting("WebSiteURL"); searchServiceName = CloudConfigurationManager.GetSetting("ServiceName"); searchServiceKey = CloudConfigurationManager.GetSetting("ServiceKey"); indexName = CloudConfigurationManager.GetSetting("IndexName");

Retrieving Posts from WordPress

With the JSON REST API plugin installed, retrieving posts from WordPress is easy – just call the URL www.yourwebsite.com/?json=get_posts.  This will by default retrieve the last 10 posts but you can use filtering parameters and paging to change how many posts you retrieve.

Using the JSON.API library, you can deserialize your JSON into a JObject which provides you an easy way to pull entities such as posts, comments, etc. out of the returned JSON.

When the JSON REST API is called, it provides 10 posts and the number of “pages”.  Based on this number of pages, we can pull all the posts 10 posts at a time.

In this method, we simply pull out the posts and deserialize these to a collection of WordPressPost objects. 

One of the key changes to the Microsoft Azure Search SDK from the RedDog.Search client that was previously available is both async and regular methods are provided which makes the code a little bit simpler in a console application.

Note: One bug in the JSON API I found is that the excerpt field contains the JetPack plugin’s share button HTML if you have it activated.  In my code, I strip these out to only take the first paragraph representing the excerpt text.

/// <summary> /// Loads WordPress posts from any WordPress blog. /// </summary> /// <param name="URL">WordPress blog URL</param> /// <returns></returns> public static WordPressPosts LoadAllPosts(string URL) { try { WordPressPosts wordPressPosts = new WordPressPosts(); string query = "?json=get_posts"; WebClient client = new WebClient(); Stream stream = client.OpenRead(URL + query); StreamReader reader = new StreamReader(stream); var results = JObject.Parse(reader.ReadLine()); var JsonPosts = results["posts"]; if (JsonPosts != null) { foreach (var JsonPost in JsonPosts) { wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost)); } } if (results["pages"] != null) { int pages = (int)results["pages"]; if (pages > 1) { for (int i = 2; i <= pages; i++) { query = "?json=get_posts&page=" + i; stream = client.OpenRead(URL + query); reader = new StreamReader(stream); results = JObject.Parse(reader.ReadLine()); JsonPosts = results["posts"]; foreach (var JsonPost in JsonPosts) { wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost)); } } } } return wordPressPosts; } catch (Exception e) { throw; } }

Creating an Index

Creating an index is reasonably easy but I found a few gotchas along the way:

  • The key field MUST be a string (I originally tried to use an integer field).

  • Searchable fields MUST be of type string (I originally tried to make a date field searchable). 

If you try to violate the rules, the Index creation process fails and the result returned will be an error.

The new create index method looks like this:

/// <summary> /// Loads WordPress posts from any WordPress blog. /// </summary> /// <param name="URL">WordPress blog URL</param> /// <returns></returns> public static WordPressPosts LoadAllPosts(string URL) { try { WordPressPosts wordPressPosts = new WordPressPosts(); string query = "?json=get_posts"; WebClient client = new WebClient(); Stream stream = client.OpenRead(URL + query); StreamReader reader = new StreamReader(stream); var results = JObject.Parse(reader.ReadLine()); var JsonPosts = results["posts"]; if (JsonPosts != null) { foreach (var JsonPost in JsonPosts) { wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost)); } } if (results["pages"] != null) { int pages = (int)results["pages"]; if (pages > 1) { for (int i = 2; i <= pages; i++) { query = "?json=get_posts&page=" + i; stream = client.OpenRead(URL + query); reader = new StreamReader(stream); results = JObject.Parse(reader.ReadLine()); JsonPosts = results["posts"]; foreach (var JsonPost in JsonPosts) { wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost)); } } } } return wordPressPosts; } catch (Exception e) { throw; } }

Adding Posts to an Index

Now that we have our index, we can push posts into the index.  One of the new features of the Azure Search SDK is that you can pass rows in as objects and it will use reflection to convert the properties into field values. 

We have a class called WordPressPost that represents each post with its appropriate fields.

/// <summary> /// Value object representing a single WordPress post. /// </summary> public class WordPressPost { public string Id { get; set; } public string Status { get; set; } public string Title { get; set; } public string Content { get; set; } public string Excerpt { get; set; } public DateTime CreateDate { get; set; } public DateTime ModifiedDate { get; set; } public string CreateDateAsString { get; set; } public string ModifiedDateAsString { get; set; } public string Author { get; set; } public string Categories { get; set; } public string Slug { get; set; } public string Tags { get; set; } }

To add the post, we add the objects as an array and create an IndexBatch object like this:

try { DocumentIndexResponse response = indexClient.Documents.Index(IndexBatch.Create(BatchOfWordPressPosts.Select(doc => IndexAction.Create(doc)))); } catch (IndexBatchException e) { Console.WriteLine( "Failed to index some of the documents: {0}", String.Join(", ", e.IndexResponse.Results.Where(r => !r.Succeeded).Select(r => r.Key))); }

In the previous RedDog Azure Search library, there was a maximum of 1000 items per batch.  I haven’t found any maximum number of items per batch limitation yet for the new SDK, but I left in the code that limits the number of items to a 100 items per batch. 

Checking our Index in the Portal

We can verify that we have content in the index by going to the portal and checking out our index:

image_thumb3

As shown, we have a newly created index with 285 items.

Building a Search Portal

Now that we have some content, let’s build a simple search interface using just HTML and JavaScript.  We’ll use the REST APIs to fetch data from the index and display the search results using Angular.JS as a framework.

Publishing to Azure Web Sites into a Virtual Application

Our WordPress site has been installed into the root of the Azure Web Site.  When we publish our search pages and JavaScript code, we don’t want them clobbering our existing WordPress site or getting deleted or mangled by mistake if there is an upgrade to WordPress.

Azure Web Sites supports the addition of virtual applications that run in their own sub-directory.  To create one, go into the Configure tab of the Azure Web Site and go to the bottom of the page.  You will see a section called “virtual applications and directories”.  In here, we can create a completely separate application that runs in its own directory, with its own web.config and publishing profile.

clip_image001[6]

In Visual Studio, you can configure the publishing profile to publish to this new virtual application.

image_thumb4

Specify the subdirectory in both the Site Name and Destination URL fields.

Fetching the Search Results With AngularJS

Building a search form using AngularJS is ideal for pulling in data from Azure Search because Azure Search returns JSON data by default.  We can simply assign the results to an AngularJS variable and then use the AngularJS framework to display the results dynamically.

We start with a basic Search form styled using Bootstrap.  I use the Sparkling Theme for my WordPress blog and this them already uses Bootstrap as its core CSS framework so adding in some custom HTML using the same Bootstrap CSS elements works really well.

image_thumb5

The nice thing with using Bootstrap is that if you switch your WordPress theme, as long as it uses Bootstrap (most of them do these days) your search form and results will take on the style of your blog.

If you perform a search with no keywords specified, Azure Search will return ALL documents.  This isn’t something we would want so we have made keyword a required field and check to ensure it isn’t blank before submitting.

The submit method for fetching the Azure Search results is the key for pulling in the results from Azure Search.  In building this method, I found a few gotchas to share:

  • Make sure you include the api-version in the request or Azure Request will return an error.

  • The default order by is relevance.  In our case, we have also added an additional option to sort by Create Date (e.g. $orderby=CreateDate desc.
  • You have to include the api-key in the HTTP header when you send in the request.  You can create a Query key in the azure portal instead of using the admin key and having it public.
  • You assign the JSON object “value” – this contains the search results.

vm.submit = function (item, event) { if (vm.orderby == "Relevance") var URLstring = vm.URL + "?search=" + vm.keywords + "&api-version=" + vm.APIVersion; else var URLstring = vm.URL + "?search=" + vm.keywords + "&$orderby=CreateDate desc" + "&api-version=" + vm.APIVersion; if (!isEmpty(vm.keywords)) { var responsePromise = $http.get(URLstring, config, {}); responsePromise.success(function (dataFromServer, status, headers, config) { vm.results = dataFromServer.value; vm.showSearchResults = true; }); responsePromise.error(function (data, status, headers, config) { alert("Submitting form failed!"); }); } else { vm.showSearchResults = false; vm.results = []; } }

Displaying the Results

Once we have a JSON object with the search results, displaying them is pretty easy – just use the AngularJS ng-repeat attribute to iterate through the results returned.

One key note is the use of a filter to treat the HTML returned as HTML – by default AngularJS will HTML encode the HTML instead of letting it through raw.  In order to change this behaviour, you can add this function:

angular.module('app').filter('unsafe', function ($sce) { return function (val) { return $sce.trustAsHtml(val); }; });

Using this filter you can then declare the variable as unsafe and it will be allowed through as raw HTML.

Adding a link to the original post is easy – just create an anchor link with the ID of the post.  (You could also use the slug variable that is indexed if permalinks are turned on for more friendly URL’s).

Integrating into WordPress

With the solution published to Azure Web Sites into a Search subdirectory, we can use the published JavaScript files and embed them into our WordPress site.  While a proper WordPress plugin would be ideal, we just added the search.html code into a WordPress page using the out of the box content editor.

Note: when adding HTML into a page using the text editor in WordPress, if you lead any line feeds WordPress converts them into <p> tags.  This isn’t what we want with all our javascript and AngularJS code.  If you delete all the line feeds and keep all the HTML together, you can mitigate this problem.


Adding a Search Form on the Home Page

In addition to the search results page, we can add a widget to include a basic search form on the home page.  You can embed the HTML for the form using the widget editor and adding a text widget.

image_thumb6

Reading Query Data from JavaScript

In order to read the submitted form from the home page to the search results page, we need to read the posted values that are included in the query string.

I found a basic JavaScript function that parses the query string and looks for incoming search parameters.  I then load these into the AngularJS controller and execute a search on the initial page load.

function getUrlParameters(parameter, staticURL, decode) { /* Function: getUrlParameters Description: Get the value of URL parameters either from current URL or static URL Author: Tirumal URL: www.code-tricks.com */ var path = (staticURL.length) ? staticURL : window.location.search; if (path.indexOf("?") >= 0) { var currLocation = path, parArr = currLocation.split("?")[1].split("&"), returnBool = true; for (var i = 0; i < parArr.length; i++) { var parr = parArr[i].split("="); if (parr[0] == parameter) { return (decode) ? decodeURIComponent(parr[1]) : parr[1]; returnBool = true; } else { returnBool = false; } } } else returnBool = false; if (!returnBool) return false; }

The Final Result – Search Results!

Here is the final result – a fully functioning search page that pulls WordPress posts from Azure Search and searches against keywords with the results sorted by either relevance or create date.

image

Read More

Shrink Your WordPress Azure Web Site Memory Usage by Switching Comments Form to Disqus

As traffic on this web site has increased, the memory usage on Azure Web Sites has been creeping up, overwhelming the quotas on my Shared plan.  A Shared Instance of Azure Web Sites is allocated 512 MB of “memory usage” per hour which can be overwhelmed quickly if you get a reasonable amount of traffic.

image

One simple way to increase your quota is to scale up the number of instances – each new instance also increases your quota.  In my current configuration, www.microsofttrends.com runs on 2 shared instances which provides me a quota of 1024 MB / hour.   I have scaled this up to 3 instances when there has been noticeably higher traffic. 

One of the ways I have found to keep this usage contained is a pretty simple one – I switched my comments form to Disqus.  I was receiving approximately 500-1000 comments a day that were all spam which I’m convinced was increasing the load on my web site by at least 50-100%.  Switching to Disqus has killed the spam entirely and more importantly completely offloads comments to another service. 

Turning on Disqus is pretty easy in WordPress – just sign up for an account and then install the disqus plugin and it will replace your default WordPress comment form.

Read More

Integrating WordPress with Azure Search Service

This blog runs on WordPress using the Brandoo WordPress Plugin.  One of the key challenges with the Brandoo plugin is that the default search service doesn’t work.  I decided to build my own using Azure WebJobs, Azure Search Service and the WordPress REST JSON API.  Here are my lessons learned from developing an Azure Search Solution. 

Note: you can find all the code from the sample solution in GitHub here.

Getting Started

In order to integrate WordPress and Azure Search, the basic flow for data is:

clip_image001

In order to pull posts from WordPress, install the JSON REST API plugin found here (or in the plugin gallery). 

To create a custom WebJob, use the latest Azure SDK and Visual Studio 2013.  Once you have installed the Azure SDK, you’ll see a project template for Azure WebJobs. 

To use the Azure Search service, you need to create a search service in Azure.  See this article for directions on how to do this through the Azure Portal.

To access the Azure Search API, you can go through the REST API directly, or you can use the RedDog.Search C# client.  To install the client into your WebJob, you run the NuGet package console and enter “Install-Package RedDog.Search”.  This also installs the NewtonSoft JSON.NET library which we can also use for interacting with the WordPress REST API.

WebJobs Architecture

When you create a WebJob in Visual Studio, it provides the ability to deploy straight to your Azure Web Site.  This works really well.  Alternatively, you can upload it manually as an .exe through the portal.  You can also run your WebJob locally in debug mode which in this case works perfectly because we have no real dependencies on Azure Web Sites to run the job.

The basic components of the architecture are:

  • Program: the main web job console app.
  • WordPressJSONLoader: service class responsible for pulling posts from WordPress
  • WordPressPosts and WordPressPost: value objects representing the loaded collection of wordpress posts and each individual post.
  • AzureSearchIndexer: service class responsible for pushing posts into Azure Search.

Runtime configuration is done through the App.config and/or the Azure Web Sites configuration.  As part of the Azure SDK you can use the CloudConfigurationManager to get environment settings and it is smart enough to use values in the Azure Web Sites configuration as priority over any settings found locally in the App.Config.  If you are running locally, it degrades automatically to looking in your App.Config for configuration values. 

// load configuration attributes webSiteURL = CloudConfigurationManager.GetSetting("WebSiteURL"); searchServiceName = CloudConfigurationManager.GetSetting("ServiceName"); searchServiceKey = CloudConfigurationManager.GetSetting("ServiceKey"); indexName = CloudConfigurationManager.GetSetting("IndexName");

Retrieving Posts from WordPress

With the JSON REST API plugin installed, retrieving posts from WordPress is easy – just call the URL www.yourwebsite.com/?json=get_posts.  This will by default retrieve the last 10 posts but you can use filtering parameters and paging to change how many posts you retrieve.

Using the JSON.API library, you can deserialize your JSON into a JObject which provides you an easy way to pull entities such as posts, comments, etc. out of the returned JSON.

When the JSON REST API is called, it provides 10 posts and the number of “pages”.  Based on this number of pages, we can pull all the posts 10 posts at a time.

public static WordPressPosts LoadAllPosts(string URL) { try { WordPressPosts wordPressPosts = new WordPressPosts(); string query = "?json=get_posts"; WebClient client = new WebClient(); Stream stream = client.OpenRead(URL + query); StreamReader reader = new StreamReader(stream); var results = JObject.Parse(reader.ReadLine()); var JsonPosts = results["posts"]; if (JsonPosts != null) { foreach (var JsonPost in JsonPosts) { wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost)); } } if (results["pages"] != null) { int pages = (int)results["pages"]; if (pages > 1) { for (int i = 2; i <= pages; i++) { query = "?json=get_posts&page=" + i; stream = client.OpenRead(URL + query); reader = new StreamReader(stream); results = JObject.Parse(reader.ReadLine()); JsonPosts = results["posts"]; foreach (var JsonPost in JsonPosts) { wordPressPosts.Posts.Add(loadPostFromJToken(JsonPost)); } } } } return wordPressPosts; } catch (Exception e) { throw; } }

In this method, we simply pull out the posts and deserialize these to a collection of WordPressPost objects. 

Running Async Tasks in Console Apps

The RedDog.search library contains only the new .NET 4.5 async methods.  You need to be careful to wrap these methods so that your console app doesn’t delegate out to these methods and then end the program prematurely.  The way to achieve this is to create an async method that you execute from your main program and wait for it using the Wait() method.

You can then call this method from Main() like this:

In addition, make sure that all your async methods return Task instead of void as this will cause your console app to prematurely exit.

Checking for Errors

In the RedDog.Search library, you call all its methods like this:

public async Task CreateIndex() { // check to see if index exists. If not, then create it. var result = await managementClient.GetIndexAsync(Index); if (!result.IsSuccess) { result = await managementClient.CreateIndexAsync(new Index(Index) .WithStringField("Id", f => f.IsKey().IsRetrievable()) .WithStringField("Title", f => f.IsRetrievable().IsSearchable()) .WithStringField("Content", f => f.IsSearchable().IsRetrievable()) .WithStringField("Excerpt", f => f.IsRetrievable()) .WithDateTimeField("CreateDate", f => f.IsRetrievable().IsSortable().IsFilterable().IsFacetable()) .WithDateTimeField("ModifiedDate", f => f.IsRetrievable().IsSortable().IsFilterable().IsFacetable()) .WithStringField("CreateDateAsString", f => f.IsSearchable().IsRetrievable().IsFilterable()) .WithStringField("ModifiedDateAsString", f => f.IsSearchable().IsRetrievable().IsFilterable()) .WithStringField("Author", f=>f.IsSearchable().IsRetrievable().IsFilterable()) .WithStringField("Categories", f => f.IsSearchable().IsRetrievable()) .WithStringField("Tags", f => f.IsSearchable().IsRetrievable()) .WithStringField("Slug", f => f.IsRetrievable()) .WithIntegerField("CommentCount", f => f.IsRetrievable()) .WithStringField("CommentContent", f=>f.IsSearchable().IsRetrievable()) ); if (!result.IsSuccess) { Console.Out.WriteLine(result.Error.Message); } } }

The result will provide a status of success and in the case of an error, some important error details.   Anything that is written to the Console is redirected into the Azure Web Sites log for the WebJob.

Creating an Index

Creating an index is reasonably easy but I found a few gotchas along the way:

  • The key field MUST be a string (I originally tried to use an integer field).
  • Searchable fields MUST be of type string (I originally tried to make a date field searchable). 

If you try to violate the rules, the Index creation process fails and the result returned will be an error.

Adding Posts to an Index

Now that we have our index, we can push posts into the index.

foreach (WordPressPost post in WordPressPosts.Posts) { IndexOperation indexOperation = new IndexOperation(IndexOperationType.MergeOrUpload, "Id", post.Id.ToString()) .WithProperty("Title", post.Title) .WithProperty("Content", post.Content) .WithProperty("Excerpt", post.Excerpt) .WithProperty("CreateDate", post.CreateDate.ToUniversalTime()) .WithProperty("ModifiedDate", post.ModifiedDate.ToUniversalTime()) .WithProperty("CreateDateAsString", post.CreateDate.ToLongDateString()) .WithProperty("ModifiedDateAsString", post.ModifiedDate.ToLongDateString()); IndexOperationList.Add(indexOperation); } var result = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray() ); if (!result.IsSuccess) Console.Out.WriteLine(result.Error.Message); foreach (WordPressPost post in WordPressPosts.Posts) { IndexOperation indexOperation = new IndexOperation(IndexOperationType.MergeOrUpload, "Id", post.Id.ToString()) .WithProperty("Title", post.Title) .WithProperty("Content", post.Content) .WithProperty("Excerpt", post.Excerpt) .WithProperty("CreateDate", post.CreateDate.ToUniversalTime()) .WithProperty("ModifiedDate", post.ModifiedDate.ToUniversalTime()) .WithProperty("CreateDateAsString", post.CreateDate.ToLongDateString()) .WithProperty("ModifiedDateAsString", post.ModifiedDate.ToLongDateString()); IndexOperationList.Add(indexOperation); } var result = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray() ); if (!result.IsSuccess) Console.Out.WriteLine(result.Error.Message);

One key gotcha on adding items to the index – the date field must be in UniversalTime or you’ll get an error message.   For example, instead of supplying post.ModifiedDate as a DateTime attribute you need to call post.ModifiedDate.ToUniversalTime() or the index operation will generate an error.

The RedDog.Search PopulateAsync method allows you to add multiple IndexOperations objects that store up your document post requests into a batch.  The maximum number of IndexOperations the library supports is 1,000 or 16 MB.  In our method, we limit the number of posts per batch to 100 posts to be well under this limit.

public async Task AddPosts() { // if not previously connected, make a connection if (!connected) Connect(); // create the index if it hasn't already been created. await CreateIndex(); // run index population in batches. The Reddog.Search client maxes out at 1000 operations or about 16 MB of data transfer, so we have set the maximum to 100 posts in a batch to be conservative. int batchCount = 0; List<IndexOperation> IndexOperationList = new List<IndexOperation>(maximumNumberOfDocumentsPerBatch); foreach (WordPressPost post in WordPressPosts.Posts) { batchCount++; // create an indexoperation with the appropriate metadata and supply it with the incoming WordPress post IndexOperation indexOperation = new IndexOperation(IndexOperationType.MergeOrUpload, "Id", post.Id.ToString()) .WithProperty("Title", post.Title) .WithProperty("Content", post.Content) .WithProperty("Excerpt", post.Excerpt) .WithProperty("CreateDate", post.CreateDate.ToUniversalTime()) .WithProperty("ModifiedDate", post.ModifiedDate.ToUniversalTime()) .WithProperty("CreateDateAsString", post.CreateDate.ToLongDateString()) .WithProperty("ModifiedDateAsString", post.ModifiedDate.ToLongDateString()) .WithProperty("Author", post.Author) .WithProperty("Categories", post.Categories) .WithProperty("Tags", post.Tags) .WithProperty("Slug", post.Slug) .WithProperty("CommentCount", post.CommentCount) .WithProperty("CommentContent", post.CommentContent); // add the index operation to the collection IndexOperationList.Add(indexOperation); // if we have added maximum number of documents per batch, add the collection of operations to the index and then reset the collection to add a new batch. if (batchCount >= maximumNumberOfDocumentsPerBatch) { var result = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray()); if (!result.IsSuccess) Console.Out.WriteLine(result.Error.Message); batchCount = 0; IndexOperationList = new List<IndexOperation>(maximumNumberOfDocumentsPerBatch); } } // look for any remaining items that have not yet been added to the index. var remainingResult = await managementClient.PopulateAsync(Index, IndexOperationList.ToArray() ); if (!remainingResult.IsSuccess) Console.Out.WriteLine(remainingResult.Error.Message); }

Now that we have our index, we can push posts into the index.

Checking our Index in the Portal

We can verify that we have content in the index by going to the portal and checking out our index:

image

As shown, we have a newly created index with 291 items in it.

Building a Search Portal

Now that we have some content, let’s build a simple search interface using just HTML and JavaScript.  We’ll use the REST APIs to fetch data from the index and display the search results using Angular.JS as a framework.

Publishing to Azure Web Sites into a Virtual Application

Our WordPress site has been installed into the root of the Azure Web Site.  When we publish our search pages and JavaScript code, we don’t want them clobbering our existing WordPress site or getting deleted or mangled by mistake if there is an upgrade to WordPress.

Azure Web Sites supports the addition of virtual applications that run in their own sub-directory.  To create one, go into the Configure tab of the Azure Web Site and go to the bottom of the page.  You will see a section called “virtual applications and directories”.  In here, we can create a completely separate application that runs in its own directory, with its own web.config and publishing profile.

clip_image001[6]

In Visual Studio, you can configure the publishing profile to publish to this new virtual application.

clip_image002

Specify the subdirectory in both the Site Name and Destination URL fields.

Fetching the Search Results With AngularJS

Building a search form using AngularJS is ideal for pulling in data from Azure Search because Azure Search returns JSON data by default.  We can simply assign the results to an AngularJS variable and then use the AngularJS framework to display the results dynamically.

We start with a basic Search form styled using Bootstrap.  I use the Sparkling Theme for my WordPress blog and this them already uses Bootstrap as its core CSS framework so adding in some custom HTML using the same Bootstrap CSS elements works really well.

clip_image003

The nice thing with using Bootstrap is that if you switch your WordPress theme, as long as it uses Bootstrap (most of them do these days) your search form and results will take on the style of your blog.

If you perform a search with no keywords specified, Azure Search will return ALL documents.  This isn’t something we would want so we have made keyword a required field and check to ensure it isn’t blank before submitting.

The submit method for fetching the Azure Search results is the key for pulling in the results from Azure Search.  In building this method, I found a few gotchas to share:

  • Make sure you include the api-version in the request or Azure Request will return an error.
  • The default order by is relevance.  In our case, we have also added an additional option to sort by Create Date (e.g. $orderby=CreateDate desc.
  • You have to include the api-key in the HTTP header when you send in the request.  You can create a Query key in the azure portal instead of using the admin key and having it public.
  • You assign the JSON object “value” – this contains the search results.
vm.submit = function (item, event) { if (vm.orderby == "Relevance") var URLstring = vm.URL + "?search=" + vm.keywords + "&api-version=" + vm.APIVersion; else var URLstring = vm.URL + "?search=" + vm.keywords + "&$orderby=CreateDate desc" + "&api-version=" + vm.APIVersion; if (!isEmpty(vm.keywords)) { var responsePromise = $http.get(URLstring, config, {}); responsePromise.success(function (dataFromServer, status, headers, config) { vm.results = dataFromServer.value; vm.showSearchResults = true; }); responsePromise.error(function (data, status, headers, config) { alert("Submitting form failed!"); }); } else { vm.showSearchResults = false; vm.results = []; } }

Displaying the Results

Once we have a JSON object with the search results, displaying them is pretty easy – just use the AngularJS ng-repeat attribute to iterate through the results returned.

<div ng-repeat="result in search.results"> <a class="h1" href="http://wordpressazuresearchintegration.azurewebsites.net/?p={{result.Id}}">{{result.Title}}</a> <div class="h6" ng-bind-html="result.CreateDateAsString | unsafe"></div> <div ng-bind-html="result.Excerpt | unsafe"></div> </div>

One key note is the use of a filter to treat the HTML returned as HTML – by default AngularJS will HTML encode the HTML instead of letting it through raw.  In order to change this behaviour, you can add this function:

angular.module('app').filter('unsafe', function ($sce) { return function (val) { return $sce.trustAsHtml(val); }; });

Using this filter you can then declare the variable as unsafe and it will be allowed through as raw HTML.

Adding a link to the original post is easy – just create an anchor link with the ID of the post.  (You could also use the slug variable that is indexed if permalinks are turned on for more friendly URL’s).

Integrating into WordPress

With the solution published to Azure Web Sites into a Search subdirectory, we can use the published JavaScript files and embed them into our WordPress site.  While a proper WordPress plugin would be ideal, we just added the search.html code into a WordPress page using the out of the box content editor.

Note: when adding HTML into a page using the text editor in WordPress, if you lead any line feeds WordPress converts them into <p> tags.  This isn’t what we want with all our javascript and AngularJS code.  If you delete all the line feeds and keep all the HTML together, you can mitigate this problem.

clip_image001[8]

The Final Result – Search Results!

Here is the final result – a fully functioning search page that pulls WordPress posts from Azure Search and searches against keywords with the results sorted by either relevance or create date.

clip_image002[6]

Read More

Brandoo Plugin for WordPress NOT Recommended on Azure Web Sites

As posted previously, this blog currently runs on WordPress and recently I migrated the blog to Azure SQL using the Brandoo Plugin.

The primary reason to migrate was that I don’t like the ClearDB MySQL database option as the free version is limited to 4 connections and the paid version is not cheap and/or scalable as running Azure SQL.  I really want to have a site that can run as light weight as possible to recommend to people looking for low cost approaches for running sites on Azure Web Sites.

Unfortunately, it seems that the Brandoo project has run into a lot of challenges with the ongoing changes in WordPress.  I heard from the Brandoo project owner himself:

Hello Christopher,
I’m the maintainer of Brandoo WordPress. Changes in WordPress upstream is making debugging of Brandoo WordPress almost impossible today. There is many “strange” changes in WordPress, making translation to AzureSQL/MS SQL impossible at all. We do not want to change WordPress core and build fork. Without community we will be not able to work on it. There is no support form developers community so we are fighting alone. Second thing is that today it is possible to connect Azure Web Site with Your own MySQL cluster or single VM via VPN. So You can create HA WordPress on Azure without AzureSQL – I think that this will be a new direction for Brandoo WordPress. I’m father of this project and it is not so simple to tell You, that we are slowly going to the end of Brandoo WordPress in form You know and use.
Best Regards,
Michael.

What this means is that if running Brandoo is a use at your own risk option.  While my experience has been positive so far (except for search not working), it means that it’s not a well supported option and one that I couldn’t recommend as a solution to those wanting to run a low cost web site or blog on Azure Web Sites.

My next step for this blog is to look at alternatives to WordPress that will work on a standard Azure Web Site with an Azure SQL database (or even better, no database requirement at all).  I’ll keep everyone posted as I look at various options – currently I’m looking at C1 Composite, Ghost, and a few others (I’ve ruled out both Joomla an Drupal as they don’t seem to deploy properly to Azure SQL).

Read More

Running WordPress on Azure SQL – Search Doesn’t Work

As posted last week, this blog now runs on WordPress with Azure SQL as its database.  The Brandoo plugin is used to translate the original MySQL database layer and maps it to SQL Server. 

One challenge with the plugin – WordPress native search does not work using the Brandoo plugin.  I created a brand new site just to test it out and verified that this is a basic defect in the plugin.  There is also a documented conversation here that documents this as a bug.

image

I’ll be continue to dig into this issue to see if there are alternative solutions to the out of the box search capabilities.

Read More

Migrating WordPress to Azure SQL from MySQL

As www.microsoftTrends.com traffic grows, we have been making some changes to the underlying infrastructure in Microsoft Azure and doing some performance tests against the environment.  The goal in all cases is to run this blog as cheaply as possible by limiting the footprint while allowing for lots of increased traffic.

One of the key changes we have decided was to migrate the database from MySQL to Azure SQL.  The primary reason for this is that the bundled free version of MySQL is limited to 4 connections.  As my blog was increasing traffic, I was finding that the connection limit was being overwhelmed and then the blog was generating errors connecting to the database.

image

There is also a limitation on how many MySQL databases you can have – with Azure Web Sites you are limited to a single free database. 

If you can run using Azure SQL, you can have a free 20 MB database with any Azure Web Sites.  This option is technically a “retired” plan (e.g. it will be unavailable in September, 2015) but at that time I can upgrade to Basic which provides me a 2 GB database for $6 / month or go crazy and get a 250 GB standard database starting at $16 / month.

In comparison, ClearDB’s “Bronze” package only gives me a 1 GB database and maxes out at 15 connections.  SQL Server in comparison has no limitations on connections.

Migration of the site was pretty simple – I created a new WordPress site using the Brandoo WordPress plugin.  The Brandoo WordPress plugin allows you to run WordPress through an adapter that translates the calls to a standard SQL database instead of MySQL. 

image

Once created, I had a standard WordPress site.  I reinstalled all the plugins, themes, and settings from the existing site manually and then did an import of all the content from the existing site.  I then ran some performance tests against the site to see how the site would perform under load (the results are posted here).

Once up and running, I simply switched my domain to the new site and we were up and running!

So far, I haven’t seen any issues other than the web site seems to consume more memory than the old WordPress site.

image

This means that I cannot run the site under a single shared instance – I need at least two just to have enough memory quota to keep the site running.  I had moved to two shared instances with the previous site because it was just under the quota limit and occasionally would go over the 512 MB instance quota and kill the site.  So far, the site performs well and runs comfortably within two shared instances using the 1024 MB memory usage limit.

Tweet me your feedback at @microsoftrend – let me know what your experience is as a consumer of the web site!

Read More