Monthly Archives: November 2009

Measuring Website Usage With Google Analytics, Part I

Knowing where to get started with reporting website statistics can be something of a challenge for new webmasters. In this post, I’ll quickly review the guidance on Measuring Website Usage provided by the Central Office of Information (COI), which:

describes a common approach to measuring website traffic [for central government]. This enables departments to answer Parliamentary Questions and Freedom of Information Requests about website usage consistently and reliably

I’ll also start to explore how to generate reports that satisfy those guidelines using Google Analytics.

The proposed metrics “are defined according to industry standards set by the Joint Industry Committee for Web Standards (JICWEBS)” and specify the following minimal level of reporting (Measuring Website Usage – Reporting requirements):

  1. The following web metrics, as defined by the Joint Industry Committee for Web Standards (JICWEBS), must be measured for each and every publicly accessible website operated by an organisation:
    • Unique User/Browsers
    • Page Impressions
    • Visits
    • Visit Duration
  2. Central government departments must measure Unique User/Browsers, Page Impressions, Visits and Visit Duration starting from 1 April 2009 for every website open on 1 April 2010.
  3. Executive agencies and non-departmental public bodies (NDPBs) must measure Unique User/Browsers, Page Impressions, Visits and Visit Duration starting from 1 April 2010 for every website open on 1 April 2011.
  4. The following information must be provided to COI at the end of each quarter:
    • Number of monthly Unique User/Browsers
    • Number of monthly Page Impressions
    • Number of monthly Visits
    • Number of Visits of at least two Page Impressions
    • Total time in seconds for all Visits of at least two Page Impressions
  5. Each report should contain figures for each of the previous three months. This information should be provided in the format shown in the reporting template in Appendix A. (COI Website usage reporting template: http://coi.gov.uk/guidance.php?page=237)
  6. All figures should exclude internal web development activity, performance monitoring, automated broken link detection and other types of non-human activity (e.g. robots and spiders). Further details on what to exclude are found in the Page Impressions section.
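Here is a minimal Python sketch of how the five quarterly figures might be computed for each month from raw visit records. The `Visit` record, its `browser_id` field and the `is_robot` flag are hypothetical stand-ins for whatever your logs or analytics export actually provide. I've also assumed that visit duration runs from the first to the last page load in a visit, which is how Google Analytics appears to measure it. The COI guidance itself does not specify this.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Visit:
    """One visit; a hypothetical record format, for illustration only."""
    browser_id: str          # stand-in for a cookie-based browser identifier
    hits: list[datetime]     # page-load timestamps, in order
    is_robot: bool = False   # set by whatever bot filtering is in place


def coi_monthly_metrics(visits):
    """Per-month COI figures, keyed by 'YYYY-MM' of each visit's first hit."""
    months = defaultdict(lambda: {"browsers": set(), "page_impressions": 0,
                                  "visits": 0, "visits_2plus": 0,
                                  "seconds_2plus": 0.0})
    for v in visits:
        if v.is_robot or not v.hits:
            continue  # exclude non-human activity and empty records
        m = months[v.hits[0].strftime("%Y-%m")]
        m["browsers"].add(v.browser_id)
        m["page_impressions"] += len(v.hits)
        m["visits"] += 1
        if len(v.hits) >= 2:
            m["visits_2plus"] += 1
            # Duration = last page load minus first (exit-page time unknown).
            m["seconds_2plus"] += (v.hits[-1] - v.hits[0]).total_seconds()
    return {
        month: {
            "unique_browsers": len(m["browsers"]),
            "page_impressions": m["page_impressions"],
            "visits": m["visits"],
            "visits_2plus": m["visits_2plus"],
            "seconds_2plus": m["seconds_2plus"],
        }
        for month, m in sorted(months.items())
    }
```

Visits are assigned to the month in which they start. How visits that straddle a month boundary should be counted is another detail to check against the full COI definitions.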

So what does Google Analytics offer “out of the box”?

[Screenshot: Headline report - Google Analytics]

The Visitors Overview repeats these figures and additionally provides an indication of the number of ‘unique’ visitors:

[Screenshot: Visitors Overview]

At face value, then, it would appear that Google Analytics is providing at least some of the required stats. We still need to check that the numbers recorded by Google Analytics conform to what the COI has in mind for those reports, as described in its guidance on the Minimum standard for web metrics! But what does the requirement relating to visits of “at least two Page Impressions” mean?

To understand the emphasis on “at least two pages”, it’s worth reflecting on the notion of bounces and the bounce rate. The bounce rate is the proportion of visits to a site in which the visitor views only one page before leaving. Such single-page visits tend to leave no meaningful analytics behind.

According to the ClickTale blog (What Google Analytics Can’t Tell You – Part 1), Google Analytics “has no way of knowing how long a bounced visitor, who only visits one page, spent on your website”. The time spent looking at a page appears not to be based on the difference between the time when the page has fully loaded (and generated a trackable onload event) and the time of its unload event. Instead, it is calculated as the time between loading one page and clicking through to, and loading, a second page on the same site.

This is why there is an emphasis on collecting stats from at least two pages. The current crop of analytics tools struggles to do anything meaningful with single-page visits, so specifying a two-page visit does two things: only the visits likely to be meaningful are reported, and the reports are more likely to contain meaningful data. (There is an obvious problem here: if visitors click quickly from the first page to the second and then exit the site from the second page, won’t the time spent on the second page go uncaptured? See for example Time on Site & Time on Page – Google Analytics metric mystery.)
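If the ClickTale description is right, the calculation amounts to something like the following Python sketch. It assumes each visit is just an ordered list of page-load timestamps, which is a simplification of what a tracker actually records. It infers the time on each page from the next page load and leaves the final page of the visit unmeasured. That is why a bounce contributes no time at all.

```python
from datetime import datetime, timedelta


def time_on_pages(hits):
    """Seconds spent on each page of a visit, inferred from the next page load.

    `hits` is an ordered list of page-load datetimes. The last page has no
    later hit to measure against, so its time is unknown (None); a bounce
    therefore has no measurable time at all.
    """
    durations = [(nxt - cur).total_seconds() for cur, nxt in zip(hits, hits[1:])]
    return durations + [None] if hits else []


def time_on_site(hits):
    """Sum of the measurable page times, i.e. last page load minus first."""
    return sum(d for d in time_on_pages(hits) if d is not None)


if __name__ == "__main__":
    t0 = datetime(2009, 11, 2, 10, 0, 0)
    hits = [t0, t0 + timedelta(seconds=45), t0 + timedelta(seconds=50)]
    print(time_on_pages(hits), time_on_site(hits))
```

Take page loads at 0, 45 and 50 seconds. The per-page times come out as 45, 5 and unknown, and the time on site as 50 seconds, however long the visitor then spends reading the third page.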

One of the nice things about Google Analytics is that it lets you create custom views, or “segments” of the data in which you can specify things such as the minimum number of pages visited when generating a particular report. In order to do this, you specify an “Advanced Segment”. Here’s what an Advanced Segment for a “minimum of two pages visited report” might look like:

[Screenshot: GA Advanced Segment - visited at least two pages]

Applying this segment to the same data charted above gives these results:

[Screenshot: Segmented Google Analytics stats]

[Screenshot: GA segmented view]

So, for example, in this version of the report we see that the average number of page views and the average time on site have both gone up.
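The effect of the segment is easy to reproduce on toy data. In this Python sketch, each visit is a hypothetical list of page-load offsets in seconds, and `at_least_two_pages` plays the role of the Advanced Segment. Time on site is again measured from the first to the last page load. Because bounces contribute one page and zero seconds, filtering them out pushes both averages up.

```python
def summarise(visits):
    """Visits, pages per visit and average time on site (seconds).

    Each visit is a list of page-load offsets in seconds from the start of
    the visit; time on site is last load minus first, so bounces count as 0.
    """
    if not visits:
        return {"visits": 0, "pages_per_visit": 0.0, "avg_time_on_site": 0.0}
    n = len(visits)
    pages = sum(len(v) for v in visits)
    seconds = sum(v[-1] - v[0] for v in visits)
    return {"visits": n, "pages_per_visit": pages / n,
            "avg_time_on_site": seconds / n}


def at_least_two_pages(visits):
    """Rough stand-in for the 'visited at least two pages' segment."""
    return [v for v in visits if len(v) >= 2]


toy = [[0], [0], [0, 30, 90], [0, 60], [0]]  # made-up visits
print(summarise(toy))
print(summarise(at_least_two_pages(toy)))
```

With these made-up visits, the unsegmented summary gives 1.6 pages per visit and an average of 30 seconds. The segmented one gives 2.5 pages per visit and 75 seconds.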

Something I don’t think Google Analytics reports is the total time on site. Bearing in mind the lack of data on the time spent on exit pages, the best we can do is multiply the number of visits by the average time on site to get an estimate of the total time on site.
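Here is that calculation as a Python sketch. Google Analytics displays average time on site as hh:mm:ss, so the helper converts that to seconds before multiplying by the number of visits. The figures in the usage line are illustrative rather than taken from the reports above. Since the displayed average is rounded to the second, the result is only an estimate.

```python
def hms_to_seconds(hms):
    """Convert a duration shown as 'hh:mm:ss' into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimate_total_time(visits, avg_time_on_site):
    """Estimated total time on site: visits x average time on site."""
    return visits * hms_to_seconds(avg_time_on_site)


print(estimate_total_time(104, "00:03:20"))  # illustrative figures -> 20800
```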

With just this single advanced segment, a simple calculation, and the out-of-the-box reports from Google Analytics, I think we can deliver the suggested stats based on a literal reading of the headings. In a follow-up post, I’ll check whether the more detailed spec for the metrics matches the way that Google Analytics defines its metrics.

PS Unfortunately, the segmented report appears to have lost the number of absolute unique visitors (although I think the recommended report wanted the number of uniques to the site, including bounces?). Anyway, let’s play. The number of visits in the segmented report gives an upper bound on the number of non-bouncing unique visitors, but can we also estimate a lower bound? One heuristic might go like this:

- Look at the number of visits and uniques in the original report (176 uniques, 245 visits).
- See how many visits were lost in discounting the bounces (245 - 104 = 141).
- Assume each of these came from a different visitor who didn’t also make a longer visit, and subtract them from the original number of uniques (176 - 141 = 35).

I think this gives a lower bound on the number of uniques recorded by Google Analytics for non-bouncing visitors?
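Here is that heuristic as a small Python function, using the figures above. The lower bound follows the reasoning above: every discarded bounce is assumed to have come from a different visitor who never made a longer visit. Capping the upper bound at the overall unique count is my own addition.

```python
def unique_visitor_bounds(all_uniques, all_visits, segment_visits):
    """(lower, upper) bounds on unique visitors among non-bouncing visits."""
    # Visits dropped by the 'at least two pages' segment, i.e. the bounces.
    bounced_visits = all_visits - segment_visits
    # Worst case: every bounce came from a different visitor who never made
    # a longer visit, so all of them drop out of the unique count.
    lower = max(0, all_uniques - bounced_visits)
    # Best case: every remaining visit came from a different visitor (but
    # there can't be more of them than there were uniques overall).
    upper = min(segment_visits, all_uniques)
    return lower, upper


print(unique_visitor_bounds(176, 245, 104))  # -> (35, 104)
```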

“Campaign” Tracking With Google Analytics

Of the very many things it’s possible to report on in webstats, such as visitors arriving from organisational websites, one of the most useful is tracking how much traffic has been driven back to your website from a particular link. That might be a link included in a particular tweet, or in a particular email announcement, and so on.

Suppose a link to a JISCPress document appears on a third-party web page, and somebody clicks on that link and lands on the corresponding JISCPress page. Google Analytics will capture where that incoming visitor came from via the Referring Sites report. At the top level this is organised by domain:

[Screenshot: Google Analytics - Referring sites]

We can then tunnel down to the page level:

[Screenshot: More referrers]

This is all well and good, but sometimes we might also want to know where the person who posted the referring link on their web page got hold of it. Did they capture it from a tweet, for example, or via an email list? When we release a URI into the wild via some sort of marketing campaign, what sort of life does that URI have, and where will it end up sending traffic back from?

The Google Analytics FAQ answer How do I tag my links? describes a method for adding extra tags to a referral URL, that is, a URL that you publish and/or distribute more widely that refers back to your website. Google Analytics can use these tags to segment traffic referred from that URL. Five tags are available (as described in Understanding campaign variables: The five dimensions of campaign tracking):

Source: Every referral to a web site has an origin, or source. Examples of sources are the Google search engine, the AOL search engine, the name of a newsletter, or the name of a referring web site.
Medium: The medium helps to qualify the source; together, the source and medium provide specific information about the origin of a referral. For example, in the case of a Google search engine source, the medium might be “cost-per-click”, indicating a sponsored link for which the advertiser paid, or “organic”, indicating a link in the unpaid search engine results. In the case of a newsletter source, examples of medium include “email” and “print”.
Term: The term or keyword is the word or phrase that a user types into a search engine.
Content: The content dimension describes the version of an advertisement on which a visitor clicked. It is used in content-targeted advertising and Content (A/B) Testing to determine which version of an advertisement is most effective at attracting profitable leads.
Campaign: The campaign dimension differentiates product promotions such as “Spring Ski Sale” or slogan campaigns such as “Get Fit For Summer”.

(For an alternative description, see Google Analytics Campaign Tracking Pt. 1: Link Tagging.)

The recommendation is that campaign source, campaign medium, and campaign name should always be used.

Elsewhere (Library Analytics (Part 7), from which elements of this post have been taken), I considered how these codes might be used to track course referrals to Library resources from a VLE. That is something I need to revisit, now I’ve had a little more time to consider the possible role(s) of these tracking codes. It also seems reasonable to ask how we might use these tracking codes with a document on JISCPress or WriteToReply, to track referrals back to the site from social media campaigns highlighting a particular document or section of a document.

So, what are sensible mappings/interpretations for the campaign variables? Remember, these tracking variables are parameters that we might add to a link, posted somewhere, that is intended to drive traffic back to the site. The tracking variables are there to allow us to see how different links are performing. It is worth thinking about how we might use these five tracking dimensions, whether or not we use them in the “intended” Google Analytics way. Doing so may also give us some ideas about how to use links to drive traffic back to our site.

To try and ground the exercise, consider this example: a new document is published on JISCPress and we want to compare how well links posted on Facebook drive traffic back, compared with links posted on Twitter. For tracking to be most effective, we hope that if a link is rebroadcast or shared, the tracking variables are carried along with it. Suppose a link is posted to Twitter and then gets shared onto Facebook and onto a blog. We can then look at the traffic that comes back, and where it comes from (via the Referral tracking described at the start of this post), for each of the separately released URIs.

A second example might relate to a campaign intended to drive traffic back to a particular section or paragraph of a document. This campaign might involve publishing a link back to the same paragraph in a series of separate posts or status updates, each with a different slug or call-to-action message. That is, each link+message may be published in the same place (and hence have the same referrer information), but at different times and with different link text or contextual information.

A third example might be where a web page contains more than one link back to the same document, and we want to track how effective each link is compared with the others.

Here are the supported variables again:

  • source: the obvious thing to use this variable for is the domain or URI of the page where the link is published. So if we tweet a link, twitter.com might be sensible. If we blog it, actually might be best?
  • medium: this is intended to refer to the sort of link that has generated the traffic, such as a banner ad. In our case, we might clarify the intent with which the link was posted, such as announcement, or question;
  • term: this is an optional parameter, and I’m not sure how it should be used or whether it conflicts with other Google services. If we post something with a hashtag on Twitter, or a set of tags on delicious, might we use those tags as terms?
  • content: the second optional variable, this is often used to distinguish between A/B test ads. If we tweet the same link with different calls to action/prompting questions, maybe this differential content should be uniquely identified with the content field?
  • campaign: typically used for tracking a promotion or campaign, this field might be used to identify a different document when, for example, a link to the top-level JISCPress site is referred to in an announcement about a particular document?
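Whichever mapping we settle on, the mechanics of tagging are the same. The values go into the link’s query string as `utm_source`, `utm_medium`, `utm_campaign`, `utm_term` and `utm_content`. The following Python sketch uses a `tag_url` helper of my own, not a Google tool. It requires the three recommended parameters and makes the other two optional. It also puts the parameters before any `#` fragment, because anything written after the `#` is treated as part of the fragment rather than the query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, source, medium, campaign, term=None, content=None):
    """Add Google Analytics campaign (utm_*) parameters to a URL.

    Source, medium and campaign are always set; term and content only if
    given. Existing query parameters are kept, and the fragment (e.g. a
    paragraph anchor such as #3) stays at the end of the URL.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [("utm_source", source), ("utm_medium", medium),
               ("utm_campaign", campaign)]
    if term:
        params.append(("utm_term", term))
    if content:
        params.append(("utm_content", content))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(tag_url("http://writetoreply.org/jiscri/2009/03/11/rapid-innovation-projects/#3",
              "twitter.com", "question", "jiscri", term="JISCRI", content="slug3"))
```

This prints the paragraph link with the query string inserted before the `#3`: `http://writetoreply.org/jiscri/2009/03/11/rapid-innovation-projects/?utm_source=twitter.com&utm_medium=question&utm_campaign=jiscri&utm_term=JISCRI&utm_content=slug3#3`.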

So for example, we might have something like:
http://writetoreply.org/?utm_campaign=ukgovurisets&utm_medium=announcement&utm_source=actually
appearing as the link for WriteToReply in an announcement about the hosting of the UK Government URI Sets document.

Or maybe a call to action on twitter relating to a particular part of a document:
What benefits would you like to see from #JISCRI calls? http://writetoreply.org/jiscri/2009/03/11/rapid-innovation-projects/?utm_campaign=jiscri&utm_medium=question&utm_term=JISCRI&utm_source=twitter.com&utm_content=slug3#3

To support the generation of tracking URIs, a URL Generator Tool (like the official Tool: URL Builder) that will accept a tweet, for example, along with a JISCPress/WriteToReply URL and then automatically create tracking variable values might be worth considering?
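As a rough idea of what such a generator might do, here is a Python sketch. The rules are hypothetical choices that follow the mappings suggested above:

- The source is fixed to `twitter.com`.
- Hashtags in the tweet become the `utm_term` value.
- A short hash of the tweet text is used as `utm_content`, so each differently worded call to action gets its own identifier without anyone having to invent a slug by hand.

```python
import hashlib
import re
from urllib.parse import urlencode, urlsplit, urlunsplit


def tracking_link_for_tweet(tweet, url, campaign, medium="announcement"):
    """Hypothetical helper: derive utm_* values from a tweet's text.

    - utm_source is fixed to twitter.com
    - utm_content is the first 8 hex digits of a SHA-1 of the tweet text
    - utm_term is the tweet's hashtags (without '#'), comma-separated
    """
    hashtags = re.findall(r"#(\w+)", tweet)
    params = {"utm_source": "twitter.com", "utm_medium": medium,
              "utm_campaign": campaign,
              "utm_content": hashlib.sha1(tweet.encode("utf-8")).hexdigest()[:8]}
    if hashtags:
        params["utm_term"] = ",".join(hashtags)
    parts = urlsplit(url)
    # Keep any existing query string and append the tracking parameters.
    query = "&".join(q for q in (parts.query, urlencode(params)) if q)
    return urlunsplit(parts._replace(query=query))


print(tracking_link_for_tweet("What benefits would you like to see from #JISCRI calls?",
                              "http://writetoreply.org/jiscri/", "jiscri",
                              medium="question"))
```

The generated link would then be pasted into the tweet in place of the bare URL. A hash is opaque in the reports, of course, so a human-readable slug may be preferable if someone is going to read the Campaign reports.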

Thoughts on JISCPress

As we come to the final month of the JISCPress project, we had some great news over on WriteToReply last week where we were able to announce that Eduserv would be covering our hosting costs for the immediate future (Eduserv funds hosting for WriteToReply, eFoundations: Write To Reply).

So what exactly does the platform we’ve been working on have to offer? Here’s one of the ways I think of it…

A document publishing platform that automatically atomises documents to the paragraph level, allows aggregated commenting at the paragraph and ‘user’ level, and supports the republication and re-presentation of documents in a variety of standard formats at the document level.

The first part of the process is the (manually assisted) ingress stage. Here, documents are imported into the WordPress environment such that each substantive document section ideally maps onto a single WordPress “blog post”.

An RSS feed for the document as a whole, with one item per section, is generated automatically by the WordPress platform. A single-item RSS feed is also generated for each page, so the content of each page can be easily transported around the web.

The second part of the process is the atomisation of each post, carried out automatically by the Digress.It theme. Each paragraph in the document is given its own unique URI, derived from the URI of the web page (“blog post”) the paragraph appears on.
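Here is a minimal Python sketch of the atomisation idea. It is not the Digress.It code itself. It assumes paragraphs are separated by blank lines, and that paragraph n is addressed by appending `#n` to the page URI, numbering from 1 (as the `#3` paragraph link in the campaign tracking post suggests).

```python
def atomise(page_uri, text):
    """Split a page's text into paragraphs, each with its own URI.

    Assumes blank-line-separated paragraphs and '#n' fragment identifiers
    numbered from 1; both are illustrative assumptions.
    """
    base = page_uri.split("#", 1)[0]  # drop any existing fragment
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [(f"{base}#{n}", p) for n, p in enumerate(paragraphs, start=1)]


page = "http://writetoreply.org/jiscri/2009/03/11/rapid-innovation-projects/"
for uri, para in atomise(page, "First paragraph.\n\nSecond paragraph."):
    print(uri, "->", para)
```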

Potentially, an RSS feed can also be produced for each page in which each paragraph is a separate feed item, thus allowing a page/section to be transported around the web via a single feed, but in atomised form.
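Here is a Python sketch of what such a per-page paragraph feed might look like. It builds an RSS 2.0 document with one item per paragraph, using each paragraph’s URI as both the item link and its guid. The input format (a list of paragraph URI and text pairs) and the 60-character item titles are assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET


def paragraph_feed(page_title, page_uri, paragraphs):
    """Build an RSS 2.0 feed with one <item> per paragraph of a page.

    `paragraphs` is a list of (paragraph_uri, text) pairs; the paragraph URI
    is used as both the item's link and its guid.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = page_title
    ET.SubElement(channel, "link").text = page_uri
    ET.SubElement(channel, "description").text = f"Paragraphs of {page_title}"
    for uri, text in paragraphs:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text[:60]  # short, truncated title
        ET.SubElement(item, "link").text = uri
        ET.SubElement(item, "guid").text = uri
        ET.SubElement(item, "description").text = text
    return ET.tostring(rss, encoding="unicode")


print(paragraph_feed("Rapid Innovation Projects", "http://example.org/p/",
                     [("http://example.org/p/#1", "First paragraph.")]))
```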

The paragraph-level chunks produced by the atomisation process can be transcluded as independent elements in other web documents by a variety of means (as an embeddable object, or via XML, plain text, JSON, etc).

By default, the WordPress platform allows comments to be made at the level of each web page, with an RSS feed of comments for each page being published ‘for free’. JISCPress extends this functionality by allowing comments to be associated with discrete paragraphs. Views over the comments are also available at the user level (that is, grouped according to the user who made the comments, wheresoever they are made in the document). An additional RSS feed of comments by user is also available. This means that a document on the platform can actually be used as a scaffold for a particular user’s critical response to the document.
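The user-level view is essentially a regrouping of paragraph-level comments by author. In this Python sketch, the `Comment` record is hypothetical. Its `position` field gives the paragraph’s place in the whole document, so each user’s comments come out in document order, ready to be read as a running response.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    """A paragraph-level comment; hypothetical record for illustration."""
    user: str
    paragraph_uri: str   # e.g. page URI + '#n'
    position: int        # paragraph's position in the whole document
    text: str


def comments_by_user(comments):
    """Group paragraph-level comments by author, in document order."""
    grouped = defaultdict(list)
    for c in sorted(comments, key=lambda c: c.position):
        grouped[c.user].append(c)
    return dict(grouped)
```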

A further level of innovation is based on the automated generation of ‘semantic tags’ at the page level. Once generated, tag-based collections of posts can be syndicated in the normal way via WordPress-generated, tag-based RSS feeds.

JISCPress also benefits from the Trackback mechanism implemented by WordPress. When a page or paragraph URI is linked to from a third-party web page, a trackback to the originating page may be captured. We interpret this as the automated capture of links to remote annotations of, or comments about, the document.
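A TrackBack ping is just an HTTP POST of form-encoded fields to the target post’s trackback URL. According to the TrackBack technical specification, the fields are `url` (the only required field) plus optional `title`, `excerpt` and `blog_name`. This Python sketch builds such a request but does not send it. The trackback URL in the test assumes WordPress’s usual convention of appending `trackback/` to a post’s permalink, which is worth checking against a particular installation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(trackback_url, linking_url, title="", excerpt="",
                         blog_name=""):
    """Build (but don't send) a TrackBack ping as a form-encoded POST.

    `linking_url` is the page doing the linking; it is the only required
    field. Empty optional fields are simply left out of the body.
    """
    fields = {"url": linking_url, "title": title, "excerpt": excerpt,
              "blog_name": blog_name}
    body = urlencode({k: v for k, v in fields.items() if v}).encode("utf-8")
    return Request(
        trackback_url,
        data=body,
        method="POST",
        headers={"Content-Type":
                 "application/x-www-form-urlencoded; charset=utf-8"},
    )
```

Sending the request with `urllib.request.urlopen(req)` should return a small XML response whose `<error>` element is `0` on success.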

When considered in these terms, the JISCPress/WriteToReply platform is seen to provide a powerful means of publishing documents in which individual sections may carry their own unique URI, and individual paragraphs within a section also contain their own unique URI (which in many situations may be rooted on the section URI).

The platform can also be regarded as republishing – or re-presenting – each section (i.e. page) and each paragraph as an independent entity. That is, whenever a document is published via the platform, each separate paragraph may also be thought of as being independently published “for free”, in the sense that:

– each paragraph is independently addressable,
– each paragraph is independently commentable, and
– each paragraph is independently republishable/syndicatable.

So, given that, can you think of any ways in which the JISCPress/WriteToReply platform can support your document publishing and comment gathering strategy?

Eduserv funds hosting for WriteToReply

Loyal readers might recall that when we set up WriteToReply for the Digital Britain – Interim Report, in February, there was no business plan and no idea, really, about where this might take us. WriteToReply has so far cost us relatively little to run. We started off on cheap, shared hosting, quickly moved to a dedicated host and everything was fine for a while. More recently, as we added new documents, the site was beginning to groan a bit. It became apparent that we’d need to find support from someone, somewhere, if we were to maintain a decent service and, significantly, ensure that the documents we’d already hosted didn’t disappear from the web.

In June, we set up Public Platforms Limited, a not-for-profit company, limited by guarantee, to represent our work on WriteToReply and any other related activities we might do. Public Platforms allows us to legitimately receive financial support for what we’re doing. We decided upon the following main objective for the company:

To conduct and promote research into the use and effects of information and communication technologies in the context of the publication and dissemination of electronic documents and to disseminate the useful results of such research for the benefit of the public.

So Public Platforms won’t be opening a bar near you or paying for our vacations in Hawaii, but hopefully it will sustain our side-work around public engagement with documents on the web. We continue to work full-time at our respective universities and don’t see WriteToReply becoming a full-time job for either of us. However, if you think you can create a job for yourself out of what we’ve started, let us know.

Anyway, we’d noticed that Andy Powell, Research Programme Director at Eduserv, always said nice things about WriteToReply, so we thought we’d ask whether his organisation would be interested in supporting our work by covering the hosting costs. Eduserv have lots of experience with web hosting, and provide services to various organisations across the public sector, including government.

Well, we’re really pleased to announce that Eduserv have offered to support the hosting of WriteToReply for the next two years (10/2009-09/2011). Initially, Eduserv will pay for a six-month upgrade to the hosting we currently have, doubling the server resources available to us. By April 2010, we’ll see where we are and maybe move to Eduserv’s infrastructure or continue with our current hosting arrangement. We regard this as fantastic news. Not only does it help ensure that WriteToReply remains a reliable service to you, but it also ensures the availability of hosted documents, and your comments, for the next two years.

Our own interests in WriteToReply are largely in the area of Research and Development. Most notably, we’ve been working on the JISC-funded JISCPress project, which will be completed at the end of this month and will demonstrate further ways in which the platform can be used.

Remember that WriteToReply was always intended to be a community platform for anyone who wants to republish a report, consultation or think-piece for comment and discussion. If you want to see more documents on WriteToReply, contact us and we’ll be happy to help you publish them yourself. Additionally, all the software we use and have developed is open source and freely available for you to use. We encourage this, and we’ll help you set up a WriteToReply-like service if that’s what you want to do. We’re not looking to become the next leading consultation platform, although we’d like to help you create it! We’re interested in thinking about (research) and testing (development) how public engagement with online documents might be improved. It’s an exciting area to be working in. As well as the earlier work of others, there have been a few significant developments since we launched the Digital Britain – Interim Report.

Now, thanks to Eduserv, we can continue to contribute to this area of public service for at least another two years.