Getting Started With BigQuery

As an analyst, you work with data. Sometimes the data is already stored somewhere… acceptable. Sometimes it’s not. Sometimes it might be good enough to use a text file or an Excel document. But often that’s impractical for a whole host of reasons.

If the data is large, it will take a long time to load and query. If you want to share reproducible results, it may be better if the data is stored remotely, somewhere that is accessible through code. Also, the data should be accessible only to those who should have access to it (security!).

This is generally solved by using a database, like MySQL, SQL Server or PostgreSQL. However, that requires a server to be available and a secure (read: encrypted) connection to it, which is not always practical. I have often used a remote MySQL or SQL Server database as a solution (sometimes local, but… reproducibility). It has not always been as secure as I would like, and scalability could sometimes (read: often) be an issue. Additionally, for data to be accessed efficiently, tables need to be optimized through indexes and server configuration.

If you are looking for data storage that is secure, fast, scalable and self-optimizing, BigQuery might just be the solution you need (I know I sound like a salesperson, but I really like BigQuery). BigQuery is part of the Google Cloud Platform (GCP). Once your data is loaded, you can access it securely; in fact, there is no other option. It is also hugely scalable, practically limitless. It works a lot like a database, but Google takes care of most of the optimization. You only need to worry about how you want to structure and query the data. Continue reading

Something About Data

With this post I am going to briefly walk through some things about data you may not have thought about. I think this might give you a different and useful perspective on the nature of data. I’ll start with the various types of data and data elements. Then I’ll get into some data issues and finish up with data classification and standardization.

Types of Data

Generally, data falls into one of three types:

  • Quantitative
  • Qualitative
  • Descriptive

Continue reading

Strategies For Measuring Digital Branding Campaigns

Digital branding campaigns are campaigns designed to boost positive awareness and recall of your brand.  They are not generally tied directly to revenue.

The challenge with digital branding campaigns is understanding what constitutes a successful campaign.  Since revenue is not the goal, creating measures of success is more difficult, but not impossible. Continue reading

Graphing With R

There are a number of ways in ggplot to express your data. The first are the geometric objects, or geoms:

  • geom_bar(): Bar charts
  • geom_point(): Points, great for scatter plots
  • geom_line(): Line charts
  • geom_boxplot(): A box-and-whisker plot
  • geom_smooth(): Smoothed conditional means with a confidence interval
  • geom_histogram(): Histogram
  • geom_density(): Smoothed density plot
  • geom_qq(): A quantile-quantile plot
  • geom_errorbar(): Error bars

All of these geoms (and more) are highly customizable.  But first we must set up the data. Continue reading
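As a minimal sketch of how a few of the geoms listed above fit together (not the data setup from the full post), the example below layers points and a smoothed mean on one plot and draws a histogram on another. It assumes the ggplot2 package is installed and uses its bundled `mpg` dataset purely for illustration; the `binwidth` and `method` values are arbitrary choices.

```r
library(ggplot2)

# mpg ships with ggplot2: fuel economy data for a set of car models.
# Map engine displacement (litres) to x and highway miles per gallon to y.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                                   # one point per car: a scatter plot
  geom_smooth(method = "loess", formula = y ~ x)   # smoothed mean with a confidence band

print(p)

# A single-variable geom: the distribution of highway mpg as a histogram
h <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)  # each bar covers 2 mpg

print(h)
```

Each `geom_*()` call adds a layer to the plot, so geoms can be stacked with `+` on the same data and aesthetics.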