AWS Data Storage Options

Economists often deal with large and complicated data. However, the degree to which data is expanding in business, academia, and government has in many instances outpaced the storage and computing capacity of individual personal computers, powerful as they may be. Economists must therefore continue to evolve and adapt to new technologies for storing and modeling ever more expansive data. One such technology is Amazon Web Services (AWS).

The goal of this post is to describe how to create new data storage capabilities using Amazon Web Services. The first step is to set up a new AWS account; after that, explore the different storage options below.

Elastic Block Storage (EBS)

How is it used?

EBS is volume-based storage that isn’t associated with any particular instance; rather, it is attached to instances to provide additional storage. A volume can be attached to an instance and is detached when that instance terminates.

How do I create it?

  1. Click on the EC2 link in the Management Console of your AWS account.
  2. Next click the Create Volume button, choose Standard from the Volume Type drop-down menu, set the size to 2 GB, choose us-west-1a from the Availability Zone drop-down, and specify No Snapshot from the Snapshot drop-down menu.


Glacier Storage Service

How is it used?

Glacier, released in August 2012, is a storage service targeted at a critical IT requirement: archival storage. The best-known use of archival storage involves server backups – complete dumps of all data on a server’s drive.

How do I create it?

  1. Click on the Glacier link as shown in the picture below.



  2. Click on the Create Vault button and a new Glacier vault is created – it’s that simple.



DynamoDB

How is it used?

DynamoDB is Amazon’s fully managed NoSQL database service, built around a highly scalable key-value model.

How do I create it?

  1. Click on the DynamoDB icon.
  2. Set capacity limits and alarms for the database to avoid being overcharged.




To summarize:

S3, the highly scalable object storage service

EBS, the volume storage offering

DynamoDB, the highly scalable key-value service

Glacier, the inexpensive and highly durable archiving service

Guidelines when choosing among AWS storage services:

  1. Become knowledgeable about all the services
  2. Leverage the strengths of the services and let Amazon do the heavy lifting so you don’t have to
  3. Choose services that are appropriate and necessary
  4. Use what-if approaches to make service choices (e.g. what if our users grow by 10 times?)

Getting Started with Amazon Web Services


The objective of this post is to briefly introduce Amazon Web Services (AWS) as a way for economists to do large-scale data storage. Future posts will expand on this trivial example as well as introduce other AWS services, especially those that would be most helpful to economists (e.g. renting massive computing power).

Creating an S3 Bucket

S3 is one of the most widely used cloud data storage services on Amazon. Here is a step-by-step guide on how to create, store, and share a file on S3.

  1. Create an Amazon Web Services account.
  2. Go to the S3 service, at the bottom of the services list.
  3. On the S3 home page, click the Create Bucket button. Congratulations! Now that you have done your first bit of cloud computing, it’s time to put some data in there.
  4. Click on the bucket that was created, then click the Upload button. Click the Add Files button and upload a file from your computer, such as a picture.
  5. Once you’ve uploaded the file, you need to set permissions to make it available to anyone over the web. Go to the Properties for the uploaded file, click the Permissions link, then click Add More Permissions. Choose Everyone from the second drop-down and select the associated Open/Download check box.
  6. Now you can click on the URL and the file will be accessible via the web.

In later posts we will take this very simple example and expand upon it to develop ever more complex data storage and computationally intensive algorithms.

Optimizing Marketing Investment to Reach Communication Goals



The code and data for the marketing optimization below can be found on my GitHub account by clicking here:


The problem of optimally spending marketing dollars can be formulated in many ways. The goal of this post is to explain how to minimize advertising investment given a minimum communication goal for a given set of target populations. This post will leverage a constrained optimization framework to answer a simplified marketing problem, namely: how can we minimize the marketing investment required and still reach our communication goals? The solution to the marketing problem will be obtained via Linear Programming and the Simplex Algorithm.


[Table: media channel reach per advertising unit, unit cost per channel, and campaign reach targets]

The data above represent the media channels available for the marketing campaign: Television and Magazines. They show the reach of one unit of advertising per media channel (e.g. one unit of TV reaches 5 million Boys, 1 million Women, and 3 million Men), the unit cost of each media channel (e.g. TV 600 and Magazine 500), and finally the marketing targets for the product being advertised, in millions (e.g. 24 million Boys).

I’ve saved these data to a Google Sheet, from which they are imported into R in the next section.

Optimization Model

The following equations represent a standard linear programming model specification, similar to the specification used in the empirical calculations in this post:


[Equation: linear programming model specification]
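In general form, the cost-minimization problem can be written as follows (standard LP notation: $x$ is the vector of advertising units to purchase, $c$ the unit costs, $A$ the reach matrix, and $b$ the minimum reach targets):

```latex
\begin{aligned}
\min_{x}\;& c^{\top} x \\
\text{s.t.}\;& A x \ge b, \qquad x \ge 0
\end{aligned}
```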


All the code for this analysis can be found on my GitHub account, click here. What follows is an explanation of the code, which solves the marketing problem described above.

  • Importing the data into R using the RCurl library and processing it with the foreign library.


  • Defining the objective function and the left- and right-hand side constraints.


  • Using the linprog package to solve the linear programming problem.

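The R workflow in the bullets above can also be sketched with an open-source solver in Python. The TV reach, unit costs, and targets below are the figures quoted in this post; the Magazine reach coefficients are not stated in the text, so the values used here are illustrative assumptions chosen to be consistent with the reported solution:

```python
import numpy as np
from scipy.optimize import linprog

# Reach per advertising unit, in millions (rows: Boys, Women, Men).
# The TV column is from the post; the Magazine column is assumed.
reach = np.array([[5.0, 2.0],   # Boys
                  [1.0, 6.0],   # Women
                  [3.0, 3.0]])  # Men
targets = np.array([24.0, 18.0, 24.0])  # minimum reach goals (millions)
cost = np.array([600.0, 500.0])         # unit cost of TV, Magazine

# linprog minimizes cost @ x subject to A_ub @ x <= b_ub, so the
# ">= target" reach constraints are negated.
res = linprog(cost, A_ub=-reach, b_ub=-targets,
              bounds=[(0, None), (0, None)])

print(res.x)    # units of TV and Magazine, ~2.67 and ~5.33
print(res.fun)  # minimum campaign cost, ~4266.67
```

With these coefficients the solver lands on roughly 2.7 units of Television and 5.3 units of Magazine at a total cost of about $4,266.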

Marketing Recommendations

The optimal solution is the one that hits the target audiences at the lowest cost. In this case the algorithm says this can be achieved by investing in 2.7 units of Television and 5.3 units of Magazine advertising to hit the marketing goals of reaching at least 24 million Boys, 18 million Women, and 24 million Men.


The total cost of this marketing campaign is $4,266; note that this is the minimum cost associated with the cost-minimizing allocation described above:


How many people were reached in this marketing campaign? Recall that the targets were a minimum requirement per target audience, so what was the real reach?


The first line shows that exactly 24 million Boys and 24 million Men were reached, which is the minimum level required by the communication goals of the campaign. However, 34 million Women were reached with this marketing plan, when the minimum communication goal was only 18 million Women. The reason is that the optimization has to reach all the targets simultaneously and do so at minimum cost; having said that, one can rest assured this is the cheapest way of reaching the communication goals.

Ideas for Extending this Analysis

Clearly many more women were reached than the communication goal intended. Adding marketing tactics besides Television and Magazines, especially ones that are particularly efficient at targeting women, would most likely hit all the targets at a lower price.

Time is certainly a factor when it comes to marketing effectiveness; here is a previous post on measuring marketing effectiveness. Understanding not only how many people were reached but also how effective Television and Magazines are at different time horizons would likely improve this analysis.

This optimization does not take into account non-linear or synergistic effects of marketing, which again adds a bit more complexity but would certainly be worth exploring. This would likely require a different type of optimization algorithm.

The marketing goals could also be expanded to include not only reach but frequency, as one is likely to hit the same people multiple times via Television and Magazines. The number of times a person is exposed to a message can make it resonate and increase conversion (sales, revenue, web hits, etc.), but this analysis does not account for frequency of exposure to the marketing message.

Despite all the limitations of this approach, it still provides a mathematically precise way of creating a marketing budget that meets a set of specific goals. It is a good place to start introducing rigorous and proven algorithms to answer some very basic marketing questions.



Screening Stocks Based on Value & Optimizing Portfolio to Minimize Variance


The goal of this post is to introduce Fundamental Stock Analysis. Specifically, this post will focus on introducing key financial, operational, and equity-based measures to select a handful of stocks out of thousands. The selection process aims to find a small group of stocks that should be considered investable based on their fundamental performance.

We identify healthy companies whose stock price is consistent and offers potential for security and growth by using the rules outlined in the book “Computational Finance” by Argimiro Arratia, which builds on Graham’s work from 1973. Graham’s rules have been adjusted for today’s financial climate (e.g. adjusted for inflation).

1) Adequate size of enterprise: The recommendation is to exclude companies with low revenues, consider only companies with more than $1.5 billion in revenue.

2) Strong financial condition: Use the current ratio (current assets/current liabilities) to eliminate companies that are in a weak short-term financial condition; consider only companies with a current ratio of 2 or greater.

3) Earnings stability: Consider only companies with positive earnings in each of the past 10 years.

4) Dividend record: Consider only companies with uninterrupted payments of dividends for at least the past 20 years.

5) Earnings growth: Invest in companies that have growth rates of  3% or higher in earnings per share (EPS) over the past 10 years.

6) Price-to-Earnings ratio: Purchase the stock only if it is adequately priced; a good range for a P/E ratio is 10-15. Beware of stocks priced too cheap or too expensive relative to earnings.

7) Price-to-Book ratio: The price-to-book ratio should be no more than 1.5.
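Taken together, the seven rules translate into a simple screening filter. The sketch below applies them to a few rows of made-up fundamentals data (the post itself used Google Stock Screener; the tickers, column names, and figures here are hypothetical):

```python
import pandas as pd

# Hypothetical fundamentals for a few tickers; in practice these would
# come from a stock screener or financial data provider.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "revenue_bn": [2.1, 0.9, 5.0],          # annual revenue, $ billions
    "current_ratio": [2.5, 3.0, 1.2],
    "years_positive_eps": [10, 10, 8],
    "years_dividends": [25, 10, 30],
    "eps_growth_10y": [0.05, 0.02, 0.04],   # annualized EPS growth
    "pe_ratio": [12.0, 9.0, 14.0],
    "pb_ratio": [1.2, 1.4, 2.0],
})

screened = df[
    (df.revenue_bn > 1.5)               # 1) adequate size
    & (df.current_ratio >= 2)           # 2) strong financial condition
    & (df.years_positive_eps >= 10)     # 3) earnings stability
    & (df.years_dividends >= 20)        # 4) dividend record
    & (df.eps_growth_10y >= 0.03)       # 5) earnings growth
    & (df.pe_ratio.between(10, 15))     # 6) adequately priced
    & (df.pb_ratio <= 1.5)              # 7) price-to-book
]
print(screened.ticker.tolist())
```

Only the ticker passing all seven tests survives the screen, which mirrors how strict these criteria are in practice.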

Using these criteria at the time of this post and leveraging Google Stock Screener as the filtering mechanism, we have only 5 stocks that meet these strict criteria for investable equities:

FCX, HP, HFC, RS, and TS

Once we have narrowed down our choices to these strong companies, we must allocate our funds in a way that makes the most sense. One way to allocate funds across these stocks is to purchase the portfolio that minimizes variance (risk); this is called the Minimum Variance Portfolio and was the subject of a previous post:

Updating the code in the post above to include the ticker symbols for the 5 strong companies and running the algorithm yields the optimal allocation if one wants to minimize risk while investing in a strong portfolio of stocks:

[R output: minimum variance portfolio weights for FCX, HP, HFC, RS, and TS]

Minimizing Risk in a Portfolio of Assets



There are many instances in business where a portfolio of assets must be evaluated in terms of risk and reward. The key questions may be:

“How much should we invest?”

“What should we not invest in?”

“What is the risk of different budget allocations and what are the expected rewards?”

“What is the optimum allocation if we want to minimize risk?”


Similarly the concept of an asset portfolio can take the form of:

Assortment of clothing for a retailer

Chargebacks for a credit card processor

Scholarship recipients

Movies for a Hollywood studio

Collection of stocks


The objective of this post is to introduce the concept of the Minimum Variance Portfolio. The Minimum Variance Portfolio is an optimum allocation of funds across risky assets where the risk (variance) is minimized in the optimization. The simplest example would be a 2-asset portfolio, such as a portfolio consisting of an ice cream shop business and a coffee shop business. In this scenario, during the summer people will buy more ice cream while coffee sales will be lower, but during winter the opposite will be true. If the mix of stores in this portfolio is chosen in a way that reduces the variance in revenue due to weather, it is theoretically possible to hedge against the risk of weather. Basically, if one chooses the right number of coffee and ice cream stores to minimize revenue risk, the weather risk is also minimized or reduced, so revenue is roughly the same regardless of weather.

In order to understand the risks of a portfolio of ice cream and coffee stores, we need to understand the variability of each business individually. In a portfolio situation, however, understanding the individual risks isn’t enough; we also need to understand how ice cream sales and coffee sales are correlated with each other. This is the concept of the variance of a portfolio: it is the big-picture view of the variability of a collection of assets. Once the portfolio variance is understood and quantified, the next step may be to minimize this portfolio variance to hedge against risk. One can extend the concept of portfolio variance beyond 2 assets to any number of assets.


The following example consists of 2 assets to illustrate the concept of minimizing the variance of a portfolio of assets. Here are the formulas that describe the expected portfolio return and the portfolio variance that we are trying to minimize.

[Equations: expected portfolio return and portfolio variance]

To minimize the portfolio variance one needs to set up a Lagrangian optimization problem with the constraint that the investment weights sum to 1, so the result is a budget in which the % of funds allocated is meaningful. This is a normalization technique; if necessary one could scale the weights to represent a budget constraint of, say, $1,000, but the results will be the same in terms of % allocated, so we will stick to this convention.

[Equation: Lagrangian optimization problem for the minimum variance portfolio]
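In standard notation ($w$ the vector of portfolio weights, $\Sigma$ the covariance matrix of asset returns, $\mathbf{1}$ a vector of ones), the problem and its well-known closed-form solution are:

```latex
\begin{aligned}
&\min_{w}\; \sigma_{p}^{2} = w^{\top} \Sigma w
\quad \text{s.t.} \quad w^{\top} \mathbf{1} = 1, \\[4pt]
&\mathcal{L} = w^{\top} \Sigma w - \lambda \left( w^{\top} \mathbf{1} - 1 \right)
\;\;\Longrightarrow\;\;
w^{*} = \frac{\Sigma^{-1} \mathbf{1}}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}}
\end{aligned}
```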


The following R code solves the problem above by downloading data from the web and running a quadratic programming problem that solves the Lagrangian optimization problem to return the Minimum Variance Portfolio; click here to download the code.

[R code: downloading return data and solving the quadratic programming problem]
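The R code linked above uses quadratic programming; the same minimum variance weights can be sketched directly from the closed-form solution (the covariance matrix below is illustrative, not the post’s downloaded market data):

```python
import numpy as np

# Illustrative covariance matrix of returns for two assets.
sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

ones = np.ones(len(sigma))
inv = np.linalg.inv(sigma)

# Closed-form minimum variance weights:
# w* = inv(Sigma) @ 1 / (1' @ inv(Sigma) @ 1)
w = inv @ ones / (ones @ inv @ ones)

print(w)              # weights sum to 1; the safer asset gets more
print(w @ sigma @ w)  # minimized portfolio variance
```

Note that the lower-variance asset receives the larger weight, and the portfolio variance comes out below either asset’s own variance – the hedging effect described above.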


The optimum allocation based on the minimum variance portfolio is:

[R output: minimum variance portfolio allocation and weights]

The expected annual return of this allocation is 22%.  Note that past performance does not guarantee future performance.

Measuring Marketing Effectiveness: Cobb-Douglas Production Functions


Introduction, Data, and Program

Measuring the effectiveness of a marketing channel is difficult due to the large number of variables and other confounding factors. The field of Marketing Mix Modelling was first developed by econometricians to accurately estimate the impact of marketing on consumer packaged goods, since manufacturers of those goods had access to good data on sales and marketing support.

This post is going to use concepts from microeconomics and econometrics to understand the effectiveness of Television (TV), Newspaper, and Radio on the sales of a good. These data come from the textbook “An Introduction to Statistical Learning with Applications in R”. I have provided these data along with the R program used to derive the marketing estimates in this post; please see the links below:


Market Mix Modelling R Program


Marketing Production Function

Production functions are used in economics to model the relationship between inputs and outputs. Production functions are very flexible and have been used in various branches of economics. Agricultural economists use production functions to model how different inputs affect crop yields, educational production functions have been used to model how different classroom inputs affect children’s learning, and macroeconomists have used production functions to understand how labor and capital inputs affect total national output. I’m going to use a production function to model how different marketing inputs affect sales, per the following equation:


[Equation: multiplicative production function of marketing inputs]


The majority of inputs that go into production experience diminishing marginal returns, so I take the multiplicative form of the production function and apply natural logarithms to both sides of the equation. This is the famous translog equation. The translog equation has the nice property of converting the multiplicative form of the production function into a linear model that can be estimated using Ordinary Least Squares (OLS) regression. Another nice property of the translog equation is that the coefficients (betas) of the regression can be interpreted as elasticities. Elasticities are measures of the % change in the outcome variable (sales) as a result of a % change in one of the marketing input variables. The stand-alone term ‘alpha’ captures all non-marketing variables that affect sales; this is called the baseline in Marketing Mix Modelling (MMM). In this post I will not use other explanatory variables (store traffic, seasonality, other promotions, etc.) to keep things simple, but a robust analysis of the effectiveness of marketing should include additional variables to control for these factors.
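Written out, the multiplicative production function and its log-linear form are (a sketch of the standard Cobb-Douglas specification described above, with $S$ for sales and $\varepsilon$ the error term):

```latex
S = \alpha \, TV^{\beta_{1}} \, News^{\beta_{2}} \, Radio^{\beta_{3}}
\;\;\Longrightarrow\;\;
\ln S = \ln \alpha + \beta_{1} \ln TV + \beta_{2} \ln News + \beta_{3} \ln Radio + \varepsilon
```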


Statistical Estimates


The simple set of scatter plots shows that television appears to have the strongest impact on sales. Radio has a modest effect on sales, but newspaper appears to be only weakly correlated with sales. The data also support the notion of diminishing marginal returns, which further motivates the logarithmic transformation of the production function.


[Scatter plots: Sales vs. Newspaper, Radio, and TV advertising]


Scatter plots can only reveal so much; to do a proper analysis, economists use econometric estimates of the translog function described above. This ensures that we are controlling for other factors when measuring the impact of each of the marketing variables, as shown below:

[R output: OLS estimates of the log-log marketing model]


The results show that for every 1% increase in TV advertising you’d expect to get a .34% increase in sales. A 1% increase in the Newspaper budget increases sales by only .01%, the smallest of all the elasticity estimates. A 1% increase in the Radio budget accounts for a modest .17% increase in sales.
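The estimation the post runs in R can be sketched in Python: simulate sales from a log-log production function with known elasticities, then recover them with OLS (synthetic data, not the ISLR Advertising dataset; the “true” elasticities are set to mirror the estimates quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic media spend (strictly positive so logs are defined).
tv, news, radio = rng.uniform(1, 100, (3, n))

# True elasticities chosen to mirror the estimates quoted in the post.
alpha, b_tv, b_news, b_radio = 1.0, 0.34, 0.01, 0.17
log_sales = (alpha + b_tv * np.log(tv) + b_news * np.log(news)
             + b_radio * np.log(radio) + rng.normal(0, 0.05, n))

# OLS on the log-log model: design matrix with an intercept column.
X = np.column_stack([np.ones(n), np.log(tv), np.log(news), np.log(radio)])
coef, *_ = np.linalg.lstsq(X, log_sales, rcond=None)

print(coef)  # intercept and elasticities, close to the true values
```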

How much should the company spend on each form of advertising? The data in this example don’t show how many impressions or people were reached with the money spent. In order to derive a proper optimal allocation one would need to know the cost per impression, but assuming the cost per impression (CPM) is constant across channels, one can simply take the ratio of each elasticity relative to the sum of all elasticities to come up with the optimal marketing mix.
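Using the elasticities estimated above, the ratio rule works out as follows (a quick sketch of the arithmetic):

```python
elasticities = {"TV": 0.34, "Newspaper": 0.01, "Radio": 0.17}
total = sum(elasticities.values())  # 0.52

# Budget share per channel under a constant cost per impression.
shares = {ch: e / total for ch, e in elasticities.items()}
for ch, s in shares.items():
    print(f"{ch}: {s:.1%}")  # TV ~65%, Newspaper ~2%, Radio ~33%
```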


[Table: optimal marketing mix shares implied by the elasticities]

A following post will do a proper optimization using Lagrangian optimization of the production function, which will take into account the total cost of advertising on each channel.