Data Mining for Arts & Cultural Organizations – Survey Design and Data Preparation

This is the third article in a series on Market Research and Data Mining for Arts and Cultural Organizations. The marketing manager for the local Arts Center wants to learn more about the usage habits of its members and visitors; more specifically, she is looking to increase participation in the Center’s Special Exhibits. In this article we will explore some tips on designing surveys and preparing the survey responses for analysis and data mining.

Designing the Survey

This is perhaps the most important step in the market research / data mining process. It takes a bit of time and effort to create a survey that provides you with the information you need to make better decisions.

Here are a few tips on designing your survey:

  • Collect only the information you need – Think carefully about the objective of your study and stay focused. It is easy to get sidetracked and add “just one more thing”.
  • Make it easy for the respondent to understand and complete – There is a difference between you understanding the question and the respondent understanding the question. Make your questions clear and concise and try to avoid unnecessary jargon or buzzwords (unless it’s language your audience uses and understands). Your survey results won’t be useful if you and the survey respondents are not on the same wavelength! Be very mindful of the length of time required to complete the survey – shorter is better!
  • Make it easy for the researcher to collect – The survey should be designed so that the results can be collected efficiently. If possible, avoid too many open-ended questions (i.e. questions that require a written response), as these will require more effort on your part to prepare and analyze.
  • Minimize the amount of cleanup/data preparation required – A carefully designed survey should minimize the effort required to clean up the responses once you have collected them. Give the respondent a series of choices rather than having them write in a value. When asking the respondent’s age, for example, what if somebody enters 105 or you can’t read their writing? Do you really need their exact age, or will a range of ages (e.g. 15-24, 25-36, etc.) serve your purposes? Simple No/Yes responses or multiple choice answers are best and can make the data preparation stage go a lot faster.
  • Make it easy to analyze the results – Again it is best to have simple No/Yes or multiple choice questions. Open ended questions requiring a written response will have to be analyzed and possibly coded to make sense of them and determine if there is a pattern in the responses. This should not discourage you from using open ended questions as they can provide valuable insight, just be aware they will require more effort on your part.
  • Pre-test your survey – This is a step you should never skip in developing your survey. You should pre-test your survey with a small group of people to make sure that your questions are stated clearly and that the recipients understand the question you asked. You may need to go back and rework the language to get it just right. A pre-test will also give you some insight into how long the survey takes to complete and if you are connecting with your audience.

Types of Survey Questions

There are a few types of survey questions you should be aware of when developing your survey. You have probably seen most of them if you have ever participated in a survey. Choose the type of question that matches your research needs.

  • Dichotomous – No/Yes, False/True. These types of questions are most common and lead to little ambiguity for the respondent. They are also easy to code for analysis and can be converted easily to numeric values for data mining.
  • Likert Scales – Rating on a scale (e.g. 1 – 5). These types of questions are very common for measuring customer satisfaction or how someone feels about a particular topic. Example responses could include: Very Dissatisfied, Dissatisfied, Neither Satisfied nor Dissatisfied, Satisfied, Very Satisfied; or Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, Strongly Agree. When choosing your response scale it is a good idea to use an odd number of choices so that you have a neutral midpoint.
  • Multiple Choice – Multiple choice questions are a great way to limit respondents to a fixed set of choices, from which they select none, one or more. An example of a multiple choice question might be: Which of the following activities/services would you like the Art Center to offer? (Check those that apply) a) Children’s activities, b) Espresso Bar, c) Light Fare Cafe, d) Gift Shop.
  • Semantic Differential – Semantic differential questions are similar to Likert Scale questions but differ in the way they are presented. Likert Scale questions present a series of fixed, labeled choices (e.g. Very Satisfied, Satisfied…), while Semantic Differential questions typically give the respondent a numeric scale, such as 1 – 5, and leave more up to the respondent in how they rate a particular item. For example, the Arts Center could ask: “On a scale of 1-7, how would you rate the variety of our special exhibits?” With Semantic Differential questions the respondent determines the degree of the rating, rather than picking from the labeled categories a Likert Scale presents.
  • Rank Order Questions – The respondent is asked to choose between two alternatives i.e. “Which is more important to you in choosing to go to a special exhibit – The artist is a local artist or the artist is not local?” These can be useful to help you determine which factors are most important.
  • Ratio Scale Questions – These questions ask for an exact number, such as age or the number of times the respondent visited the Art Center in the last year. If it is not important to know an exact number, consider using a question that presents ranges or bins instead (age ranges, for example). Data mining programs work very well with ratio scale numbers that can be compared (i.e. 4 is twice as large as 2). In some instances respondents may feel uncomfortable giving an exact number in questions dealing with age, income or other personal information. Use Ratio Scale Questions where appropriate. For example, if the Art Center was researching the idea of offering a senior discount to persons 65 years of age or over, it might be important to know how many members were approaching the discount eligibility age, e.g. 64 years.
  • Open Ended Questions – The respondent is asked to supply a written response to a question, e.g. “What do you like best about our special exhibits? ____________” Open Ended Questions can provide a great deal of insight if used properly. For example, they may reveal things you had not thought about or ways to improve your survey. If you get a lot of similar responses, you may want to think about creating a new question that asks about that response directly. The downside to Open Ended Questions is that they require someone to read them and categorize them into a format (categories or bins) that can be used by survey and data mining software. Some survey software can perform what is known as text or sentiment analysis to categorize the responses for you, but it is best to have a human check the results to make sure they are accurate and adequate for your intended uses.
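To make the Likert Scale item above a little more concrete, here is a minimal Python sketch of how the labeled choices could be turned into 1 – 5 scores. The function name, the mapping and the sample answers are made up for illustration; your survey software may handle this step for you.

```python
# Hypothetical 5-point satisfaction scale; the labels follow the example wording above.
LIKERT_SCALE = {
    "Very Dissatisfied": 1,
    "Dissatisfied": 2,
    "Neither Satisfied nor Dissatisfied": 3,
    "Satisfied": 4,
    "Very Satisfied": 5,
}

def code_likert(responses):
    """Convert text answers to scores; blank or unrecognized answers become None."""
    return [LIKERT_SCALE.get(r.strip()) if r else None for r in responses]

# Made-up answers, for illustration only
answers = ["Satisfied", "Very Satisfied", "", "Dissatisfied"]
print(code_likert(answers))  # [4, 5, None, 2]
```

Keeping blank answers as None, rather than guessing a score, makes it easier to see how many respondents skipped the question.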

The Arts Center Survey

After careful design and collaboration with her Arts Center colleagues the marketing manager has developed the following short survey.

Preparation for analysis

Data mining software likes numerical data so responses such as False/True or No/Yes will need to be converted to numbers that the analysis software can understand and use. A good method is to convert the data to a dichotomous or binominal format such as 0/1 which represent a No/Yes or False/True response. Typically results would be coded so that No/False = 0 and Yes/True = 1. Luckily most survey analysis and data mining software can perform these tasks with a little help from you.
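If your software does not do this conversion for you, the coding rule is simple enough to sketch in a few lines of Python. This is only an illustration under the assumption that answers arrive as text; the names BINARY_CODES and code_binary are hypothetical.

```python
# Assumed mapping, following the rule above: No/False = 0 and Yes/True = 1.
BINARY_CODES = {"no": 0, "false": 0, "yes": 1, "true": 1}

def code_binary(answer):
    """Return 0 or 1 for a No/Yes or False/True answer, or None if blank or unrecognized."""
    if answer is None:
        return None
    return BINARY_CODES.get(answer.strip().lower())

# Made-up raw answers, for illustration only
raw = ["Yes", "no", "TRUE", "", "maybe"]
print([code_binary(a) for a in raw])  # [1, 0, 1, None, None]
```

Anything that is not a clear No/Yes comes back as None so that a person can review it instead of having it silently counted as a No.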

For example, in the gender question (Gender: Female / Male) the respondent was asked to check one. In preparing the data, the response would be coded as 0 = Female and 1 = Male. Questions with three responses could be converted as follows: “Which one do you like best? A or B or C?” would be prepared for analysis as three columns, A – 0/1, B – 0/1, C – 0/1. We now have the data in a format that survey analysis and data mining programs can understand and use.
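The three-response conversion described above can be sketched in Python as follows. The function name one_hot and the sample answer are assumptions for illustration, not part of any particular survey package.

```python
def one_hot(answer, choices):
    """Return one 0/1 value per choice: 1 for the choice picked, 0 for the others."""
    return {choice: int(answer == choice) for choice in choices}

choices = ["A", "B", "C"]

# A respondent who liked B best
print(one_hot("B", choices))  # {'A': 0, 'B': 1, 'C': 0}
```

A respondent who skipped the question ends up with 0 in every column, which keeps the row usable without inventing an answer.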

Continuous data such as age could be coded in a similar fashion by dividing the ages into non-overlapping categories or bins such as Age 16-24, 25-44, 45+, as was used in the Arts Center survey. During the data prep stage each bin would be coded in a binominal fashion with 0 = No/False, 1 = Yes/True. Likert Scale questions and Semantic Differential questions can be converted or coded in a similar way.
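As a rough sketch of this binning step, the Python below turns an exact age into one 0/1 value per bin. The bin edges are an assumption chosen so that the ranges do not overlap, and the names AGE_BINS and code_age are hypothetical.

```python
# Assumed non-overlapping bins: (label, lowest age, highest age); None means no upper limit.
AGE_BINS = [
    ("Age 16-24", 16, 24),
    ("Age 25-44", 25, 44),
    ("Age 45+", 45, None),
]

def code_age(age):
    """Return one 0/1 indicator per age bin for a single respondent."""
    return {
        label: int(age >= low and (high is None or age <= high))
        for label, low, high in AGE_BINS
    }

print(code_age(30))  # {'Age 16-24': 0, 'Age 25-44': 1, 'Age 45+': 0}
```

An age that falls outside every bin (under 16 here) gets 0 in all columns, which is a useful flag that the response needs a second look.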

  • Open Ended responses that contain text data should be converted as well. This step may require a bit of work for the market researcher to organize and categorize the responses. For example, if the Arts Center asked the question “What do you like best about the Art Center?”, it might receive responses such as:
    – “A great place to take the family”
    – “Family programs”
    – “Children like the make art events”
    – “Special exhibits”
    – “Like to meet the artists at the special exhibits”

These responses could be coded as binominal responses such as Family Programs and Special Exhibits. If you plan on sending your surveys out over a period of time, it is common practice to review the first batch of responses to see whether you can redesign the question around the most common answers, e.g. What do you like best…. Special Exhibits, Family Programs. If you go this route, don’t forget to leave a line for Other _______ to make sure you are not missing anything. If enough of the Other responses are similar, then these new choices should be added to the survey.
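A first pass at this kind of coding can be sketched with simple keyword matching in Python. The keyword lists below are assumptions picked to fit the five example responses, and the names are hypothetical; a person should still read the responses and adjust the categories.

```python
# Assumed keywords per category, chosen by reading the sample responses.
CATEGORY_KEYWORDS = {
    "Family Programs": ["family", "children", "kids"],
    "Special Exhibits": ["exhibit"],
}

def code_open_ended(response):
    """Return a 0/1 flag per category based on whether any keyword appears in the text."""
    text = response.lower()
    return {
        category: int(any(word in text for word in keywords))
        for category, keywords in CATEGORY_KEYWORDS.items()
    }

responses = [
    "A great place to take the family",
    "Family programs",
    "Children like the make art events",
    "Special exhibits",
    "Like to meet the artists at the special exhibits",
]

for response in responses:
    print(response, code_open_ended(response))
```

With these keywords the first three responses are flagged as Family Programs and the last two as Special Exhibits. A response that matches no keyword gets 0 in every category and is a candidate for the Other line.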

Below is the dataset from the Arts Center survey with data conversions and preparation completed. This data is now ready for market research and data mining analysis.

The Arts Center survey data is ready for market research and data mining analysis

The Bottom Line:

Take care in designing your survey so that it provides you with the information you need in a format you can easily use. Make sure that the survey is understandable by your audience and does not take too long to complete. And most importantly – pre-test your survey. Good luck!

In the next article we will take a look at the common survey methods used to collect the survey responses and which one(s) might work best for you.


Neil McKenzie is the author of The Artist’s Business and Marketing ToolBox – How to Start, Run and Market a Successful Arts or Creative Business, available in softcover from Barnes & Noble and Amazon and as an eBook from iTunes, Amazon and Barnes & Noble. He developed and taught the course “Artrepreneurship” at the Center for Innovation at Metropolitan State University of Denver, and was also a visiting professor at University College at the University of Denver where he developed and taught “Marketing the Arts”.

Neil has over 30 years’ experience as a management consultant and marketing research executive, working with some of the world’s top brands. Neil is a frequent lecturer to artists and arts organizations, was a guest columnist for Colorado Biz Magazine, re:sculpt for the International Sculpture Center, and the author of several articles for Americans for the Arts, a national arts organization. Follow Neil on Twitter: @neilmckenzphoto