As an analyst, you work with data. Sometimes the data is already stored somewhere…acceptable. Sometimes it’s not. Sometimes a text file or Excel document is good enough. But often that is impractical, for a whole host of reasons.
If the data is large, it will take a long time to load and query. If you want to share reproducible results, it may be better if the data is stored remotely, somewhere that is accessible through code. Also, the data should only be accessible to those who should have access to it (security!).
This is generally solved by using a database, like MySQL, SQL Server or PostgreSQL. However, that requires that you have a server available and a secure (read: encrypted) connection to it. That is not always practical. I have often used a remote MySQL or SQL Server instance as a solution (sometimes local, but…reproducibility). It has not always been as secure as I would like, and scalability could sometimes (read: often) be an issue. Additionally, for data to be accessed efficiently, tables need to be optimized through indexes and server configuration.
If you are looking for data storage that is secure, fast, scalable and self-optimizing, BigQuery might just be the solution you need (I know I sound like a salesperson, but I really like BigQuery). BigQuery is part of the Google Cloud Platform (GCP). Once your data is loaded, you access it securely. In fact, there is no other way to access it. It is hugely scalable as well, practically limitless. It works a lot like a database, but most of the optimization is taken care of by Google. You only need to worry about how you want to structure and query the data.