Data Mining for Arts & Cultural Organizations – How Much Data Do I Need?

You Don’t Need To Test All The Water In The Well – Just A Sample And A Clean Bucket

This is the second article in a series on Market Research and Data Mining for Arts and Cultural Organizations. In this article we will explore how much data (completed surveys, in this case) we need to conduct robust analyses whose results we can have confidence in. We need enough responses to perform a traditional market research analysis and also enough data to conduct data mining and predictive analytics. Luckily, we should be able to accomplish both goals with the same data set.

A Few Terms You Need To Know

Here are a few terms that will be helpful as we start our journey exploring market research and data mining:

Population

The population is the entire group we want to draw conclusions about. For example:


  • If we are conducting a general market survey of the entire US population then our population would be about 320 million.
  • If we are conducting a survey of people in our town then the population would be the population of our town.
  • If we are conducting a survey of our employees then the population would be the number of employees in our organization.
  • In the case of our Art Center example the population for their study was a bit over 12,000 email addresses they had collected in their database. This is the population to which we will project the results from our survey sample.

Ok, now you should get the idea. We will need a good estimate of the population that we are going to study in order to determine the proper survey sample size.

Random Sample

The idea behind a random sample is to choose a subset of records or observations from the population that are representative of the population as a whole and are not biased. You can think of this process as choosing balls from a lottery ball machine. Each numbered ball has an equal chance of being chosen – we do basically the same thing with records contained in our population database.

Typically you would sequentially number each record in your population database and use a spreadsheet or other program to assign a random number to each record. The database or spreadsheet is then re-ordered by the random numbers, and the records for your sample are chosen from the re-ordered file. If, for example, you need a sample of 200 records for your survey, you would choose 200 consecutive records starting anywhere in the re-ordered file.
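If you would rather script this than use a spreadsheet, the steps above can be sketched in a few lines of Python. This is only an illustration: the function name `draw_random_sample`, the stand-in population of record numbers and the fixed `seed` (used so the draw can be repeated) are all assumptions, not part of the Art Center project.

```python
import random


def draw_random_sample(records, sample_size, seed=None):
    """Return `sample_size` records chosen at random, without repeats."""
    if sample_size > len(records):
        raise ValueError("sample size cannot exceed the population size")
    rng = random.Random(seed)
    # Step 1: attach a random number to every record.
    keyed = [(rng.random(), record) for record in records]
    # Step 2: re-order the records by their random numbers.
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: take the first `sample_size` records from the re-ordered list.
    return [record for _, record in keyed[:sample_size]]


# Hypothetical population: record numbers 1 through 12,000.
population = list(range(1, 12001))
sample = draw_random_sample(population, 200, seed=42)
print(len(sample), "records selected, for example:", sample[:5])
```

Python's built-in `random.sample(population, 200)` does the same job in one call; the longer version is shown only because it mirrors the spreadsheet steps.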

For our Art Center example the marketing manager has decided to email the survey to the entire email list or population. In this case we do not need to prepare a random sample since we are dealing with the entire population.

Confidence Level & Confidence Interval (or Margin of Error)

When we determine the required sample size for our study, we need to decide how confident we want to be in the results. Market researchers typically choose a confidence level of 95%. Loosely speaking, this means that if we repeated the survey many times with fresh random samples, about 95% of the results would fall within the margin of error of the true value for the whole population. If the confidence level is increased to, say, 99%, the required sample size grows considerably.

The Confidence Interval or Margin of Error is the +/- figure you may have seen reported with survey results in the media. It gives the range within which the true population value is expected to fall at the confidence level we have established. As we narrow the Margin of Error, e.g. going from +/- 5% to +/- 2%, precision increases and so does the required sample size.

The following table shows the effect of various population sizes, confidence levels and margins of error on the required minimum sample size. Bottom Line: the more confident and precise you want to be, the larger your sample will have to be.

Survey Sample Sizes Required For Various Population Sizes, Confidence Levels and Margins of Error
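A table of this kind can also be generated with a short script. The sketch below assumes the standard sample size formula for a proportion with a finite population correction, which appears to be what many online calculators use, and it assumes the most cautious response split of 50/50 (`proportion=0.5`). The function name and the population sizes listed are illustrative choices, not figures taken from the original table.

```python
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Minimum number of completed surveys for a simple random sample."""
    # z-score for a two-sided interval, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size needed for a very large population.
    n0 = z**2 * proportion * (1 - proportion) / margin**2
    # Finite population correction: smaller populations need fewer completes.
    n = n0 / (1 + (n0 - 1) / population)
    # Rounded to the nearest whole number; rounding up is the more cautious choice.
    return round(n)


# Illustrative inputs only (assumed, not the values from the original table).
populations = [500, 1_000, 12_000, 100_000, 320_000_000]
settings = [(0.95, 0.05), (0.95, 0.02), (0.99, 0.05), (0.99, 0.02)]

labels = [f"{c:.0%} +/- {m:.0%}" for c, m in settings]
print(f"{'Population':>12}" + "".join(f"{label:>12}" for label in labels))
for population in populations:
    sizes = [required_sample_size(population, c, m) for c, m in settings]
    print(f"{population:>12,}" + "".join(f"{size:>12,}" for size in sizes))
```

Each row is a population size and each column a confidence level and margin of error pair, so you can see how the required number of completes changes as you tighten either setting.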

Response Rate

This is the percentage of people who responded to an attempt to conduct a survey. Survey response rates vary depending on many factors. Some of these include:

  • The survey method used such as mail, email, landline phone, cell phone, web/internet or in person – it’s getting harder to reach people and the old landline standby is a thing of the past!
  • Respondents' access to or usage of email, phones and web/internet
  • Time of day, week or month when the survey is conducted – e.g. around holidays
  • How willing respondents are to take the time to complete a survey or even willing to consider it – an incentive might help here.
  • How well they know you and how engaged and loyal they are to your organization.

Bottom Line: Response rates depend on a variety of factors and can range from a few percent to well above 40%. So if you need 300 completed surveys you are going to have to contact or “attempt” many more survey prospects. Build this into your project’s budget and time estimates!
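To get a feel for the numbers, here is a rough sketch of the arithmetic: divide the completes you need by the response rate you expect. The helper name `invitations_needed` and the response rates tried are assumptions for illustration; your own rate will depend on the factors listed above.

```python
import math


def invitations_needed(completes_needed, response_rate_percent):
    """People to contact so that the expected completes reach the target."""
    if not 0 < response_rate_percent <= 100:
        raise ValueError("response rate must be between 0 and 100 percent")
    # Round up: a fraction of an invitation still means one more person.
    return math.ceil(completes_needed * 100 / response_rate_percent)


# Assumed response rates, from pessimistic to optimistic.
for rate in (5, 10, 20, 40):
    invites = invitations_needed(300, rate)
    print(f"{rate}% response rate: invite at least {invites:,} people")
```

This counts every response as usable, so treat the result as a floor and pad it if you expect to reject some surveys.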

Completed Surveys or “Completes”

In order to have confidence in our results we need a certain number of completed surveys. These surveys are called completes because they are not missing any information and the responses are within the boundaries we have established – e.g. someone listing their age as 150 years old might be rejected if age is an important factor in our study. Not every survey we mail or email out will be answered completely, and we may have to reject those that are not.
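A simple screening step along these lines might look like the sketch below. Everything specific in it is hypothetical: the field names, the accepted age range and the three example responses are made up for illustration, and your own rules will depend on your questionnaire.

```python
# Hypothetical survey fields that must all be answered.
REQUIRED_FIELDS = ("age", "zip_code", "visits_last_year")
# Hypothetical boundaries for an acceptable age.
AGE_RANGE = (18, 110)


def is_complete(response):
    """True if no required answer is missing and the age is within bounds."""
    for field in REQUIRED_FIELDS:
        value = response.get(field)
        if value is None or value == "":
            return False
    low, high = AGE_RANGE
    return low <= response["age"] <= high


# Made-up responses: one usable, one out of range, one with a blank answer.
responses = [
    {"age": 34, "zip_code": "80202", "visits_last_year": 3},
    {"age": 150, "zip_code": "80203", "visits_last_year": 1},
    {"age": 52, "zip_code": "", "visits_last_year": 0},
]

completes = [r for r in responses if is_complete(r)]
print(f"{len(completes)} of {len(responses)} responses count as completes")
```

Note that an answer of 0 visits still counts as answered; only blank or missing answers are rejected.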

Why Do We Need To Be Concerned?

– We need enough data to conduct a robust analysis whose results we can have confidence in. In other words our results should be statistically significant, not biased and not just a result of chance.
– Collecting data is not without cost. There are costs in terms of money, resources and time and these costs need to be considered when embarking on any market research or data mining project. Ideally we should strive to have at least a minimum amount of data required to do the task at hand. If you talk to a data scientist or market researcher they might tell you that you need all the data you can get!

How Many Completed Surveys Do I Need?

Once we have determined the population size, confidence level and margin of error we can calculate the required number of completed surveys we need. Don’t forget that this is the number of “completed” surveys not the number of people we need to invite to participate in the survey – this will likely be many more!

There is a formula to determine sample size but thankfully we can use one of the many free Online Survey Calculators that are available. Simply Google “survey sample size calculator” and check out the many offerings available. In these calculators you will be required to enter the Population Size you are studying, your desired Confidence Level (usually 95%) and your desired Confidence Interval or Margin of Error (5% is a good number to start with), and the calculator will tell you how big your study sample needs to be.
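For the curious, the formula behind most of these calculators is probably some version of the one below. This is the standard textbook formula for estimating a proportion, with a correction for a finite population; individual calculators may differ in the details. Here N is the population size, z is the z-score for the chosen confidence level (about 1.96 for 95%), e is the margin of error written as a decimal (0.05 for +/- 5%), and p is the expected share giving a particular answer, usually set to 0.5 as the most cautious assumption.

```latex
n_0 = \frac{z^2 \, p \,(1 - p)}{e^2},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}
```

The first part, n_0, is the sample size for a very large population. The second part shrinks it for smaller populations, which is why a town or a mailing list needs fewer completes than a national survey.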

In our Arts Center example there are 12,000 records in the email database. This is both the population of the study and the number of email invitations that will be sent out. The marketing manager has determined that she wants a Confidence Level of 95% with a Margin of Error of +/- 5%. Entering these figures into an online survey sample size calculator showed that she would need 372 completed surveys. After the survey was conducted, the Arts Center had received over 3,400 completed surveys – well above the required minimum. In fact, the 3,400 responses were more than the number needed for a 99% confidence level with a margin of error of +/- 2%. The marketing manager can be very confident that her sample reflects the population of her email database.
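We can also turn the question around and ask what margin of error 3,400 completes actually delivers. The sketch below assumes the usual margin of error formula for a proportion with a finite population correction, a 50/50 response split, and exactly 3,400 completes out of 12,000 (the article says "over 3,400", so the real figure would be slightly better). The function name is an illustrative choice.

```python
import math
from statistics import NormalDist


def achieved_margin_of_error(completes, population, confidence=0.95, proportion=0.5):
    """Margin of error (as a decimal) delivered by a given number of completes."""
    # z-score for a two-sided interval at the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of a proportion for a simple random sample.
    standard_error = math.sqrt(proportion * (1 - proportion) / completes)
    # Finite population correction: sampling a large share of the list helps.
    correction = math.sqrt((population - completes) / (population - 1))
    return z * standard_error * correction


for confidence in (0.95, 0.99):
    margin = achieved_margin_of_error(3_400, 12_000, confidence)
    print(f"{confidence:.0%} confidence: +/- {margin:.2%}")
```

Under these assumptions the margin at 99% confidence comes out a little under +/- 2%, which agrees with the statement above.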


In the next article we will take a look at some tips on designing the survey and preparing it for analysis and data mining. We will also take a quick look at some of the survey gathering methods we can use to make this step easier and go faster.

Data Mining for Arts & Cultural Organizations – Survey Design and Data Preparation


Neil McKenzie is the author of The Artist’s Business and Marketing ToolBox – How to Start, Run and Market a Successful Arts or Creative Business available in softcover from Barnes & Noble and Amazon and as an eBook from iTunes, Amazon and Barnes & Noble. He developed and taught the course “Artrepreneurship” at the Center for Innovation at Metropolitan State University of Denver, and was also a visiting professor at University College at the University of Denver where he developed and taught “Marketing the Arts”.

Neil has over 30 years’ experience as a management consultant and marketing research executive, working with some of the world’s top brands. Neil is a frequent lecturer to artists and arts organizations, was a guest columnist for Colorado Biz Magazine, re:sculpt for the International Sculpture Center, and the author of several articles for Americans for the Arts, a national arts organization. Follow Neil on Twitter: @neilmckenzphoto