Sample Weighting

This article explains how to weight the sample collected in a SightX project.

What is Sample Weighting?

In research studies, survey data can become skewed when certain segments of a population are over- or under-represented in the sample. Even the most well-planned projects can end up with too many respondents from a single gender, age group, or ethnicity. Weighting is a statistical technique applied to a sample to adjust for these biases so that it better represents your target audience.

When to Weight Sample in Data Analysis

You can use weighting when your sample skews considerably from the actual population or target audience. For example, suppose you're conducting research for a meal kit subscription box, and the sample composition ended up with 30% plant-based eaters and 70% meat eaters. You know that plant-based eaters only make up 10% of the population, so you would apply weighting so that the responses from plant-based eaters aren't over-represented in your data.
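To see why the skew matters, here is a minimal Python sketch of the meal kit example. The 30/70 sample split and the 10/90 population split come from the example above; the "interested" rates are hypothetical numbers invented purely for illustration.

```python
# Shares from the meal kit example above.
sample_share = {"plant_based": 0.30, "meat_eater": 0.70}  # observed in the sample
target_share = {"plant_based": 0.10, "meat_eater": 0.90}  # known population split

# Hypothetical result: share of each group interested in the product.
interest_rate = {"plant_based": 0.80, "meat_eater": 0.40}


def overall_rate(shares, rates):
    """Blend the per-group rates using the given group shares."""
    return sum(shares[group] * rates[group] for group in shares)


unweighted = overall_rate(sample_share, interest_rate)
weighted = overall_rate(target_share, interest_rate)
print(f"unweighted: {unweighted:.0%}, weighted: {weighted:.0%}")
```

Under these assumed rates, the unweighted figure overstates overall interest because plant-based eaters are over-represented in the sample.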

Applying Weighting to Data in SightX

To weight data in a SightX project, navigate to the Basic Stats or Pivot Tables dashboards in the Analysis module. Click the magic toolbox icon in the right corner of the dashboard, and then click on the purple scales icon.

Click the "Next" button in the toolbox window. Then, select a survey question to weight, and type in a name for the weight. For example, if you select the question "What is your gender?", the weight name could be "Gender". Naming the weights is particularly useful when you want to nest two variables together in a weight (like Age and Gender).

To add another question, click "Add item". Repeat this process for every variable whose distribution you want to adjust. When you're done adding variables, click "Next".

Next, a modal will pop up where you can specify your desired population distribution for each variable in the "Weighted Sample" column. The "Unweighted Sample" column shows the current distribution of each variable in your sample.

Input your desired sample percentages, and then click the "Calculate" button in the bottom right corner of the modal to calculate the weights.

If you wish to adjust the weighted sample percentages or the weights, do so and then click "Update calculation" at the bottom of the modal. When you're satisfied with the weight calculation, click the "Apply" button in the bottom right corner of the modal.

The modal will close and you'll be able to view the dashboard with weighting applied to every question. A red dot on the weighting scales icon below the toolbox indicates that weighting is currently applied to the dashboard.

Weighting Multiple Variables

You may want to weight responses based on more than one variable. You can do this in two ways: by nesting both variables in a single weight, or by creating two separate weights.

Nesting variables is appropriate when you know the exact proportions you want of each nested group. For example, in our case of the meal kit subscription survey, let's say we know that in the general population the breakdowns of each dietary type by each gender are as follows:

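| Gender | Diet Type | Distribution in Population |
|---|---|---|
| Female | Vegetarian | 3% |
| Female | Vegan | 1.5% |
| Female | Pescatarian | 6% |
| Female | Omnivore | 41% |
| Male | Vegetarian | 1% |
| Male | Vegan | 0.5% |
| Male | Pescatarian | 1% |
| Male | Omnivore | 46% |
| Total | | 100% |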

Females both make up a greater portion of the population (about 51% vs. 49% for males) and are more likely not to eat meat. Therefore, it makes sense to nest the gender and diet variables into one weight so that the data reflects the real-world distribution.
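A nested weight pins down every gender-by-diet cell, so the totals for each variable on its own follow automatically. The sketch below sums the nested table's percentages to show the gender and diet totals they imply; the variable names are illustrative only.

```python
from collections import defaultdict

# Nested (gender, diet) targets from the table above, in percent.
nested_targets = {
    ("Female", "Vegetarian"): 3.0,
    ("Female", "Vegan"): 1.5,
    ("Female", "Pescatarian"): 6.0,
    ("Female", "Omnivore"): 41.0,
    ("Male", "Vegetarian"): 1.0,
    ("Male", "Vegan"): 0.5,
    ("Male", "Pescatarian"): 1.0,
    ("Male", "Omnivore"): 46.0,
}

gender_totals = defaultdict(float)
diet_totals = defaultdict(float)
for (gender, diet), pct in nested_targets.items():
    gender_totals[gender] += pct  # sum across diet types
    diet_totals[diet] += pct      # sum across genders

print(dict(gender_totals))  # {'Female': 51.5, 'Male': 48.5}
print(dict(diet_totals))    # {'Vegetarian': 4.0, 'Vegan': 2.0, 'Pescatarian': 7.0, 'Omnivore': 87.0}
```

The reverse does not hold: per-variable totals alone do not determine the individual cells, which is why separate weights are the fallback when only those totals are known.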

However, if you don't know the dietary breakdowns by gender and only know the separate gender and diet breakdowns (as shown below), you should create two separate weights.

| Gender | Distribution in Population |
|---|---|
| Female | 51% |
| Male | 49% |
| Total | 100% |

| Diet Type | Distribution in Population |
|---|---|
| Vegetarian | 3% |
| Vegan | 2% |
| Pescatarian | 5% |
| Omnivore | 90% |
| Total | 100% |

Creating Nested Weights in a Weighting Schema

To create a nested weight in SightX, begin creating a weight in the toolbox following the steps above. Choose the first variable you want to weight, then click the "Nest an item" button underneath the variable.

Select the variable you want to nest, and name the weight. It's best to name the weight something that describes both the variables, like "Gender & diet".

Click the "Next" button to open the weighting schema, and enter the distributions you want for each nested group.

Creating Multiple Weights in a Weighting Schema

If you want to weight multiple variables, but don't know the exact proportions of each nested population segment, you can simply create multiple weights in the toolbox.

You can add additional weights by clicking the "Add item" button in the weighting toolbox, and adding as many variables as you want to weight.

Once you've added all of the variables, click the "Next" button to proceed to the Weighting Schema modal. Here, you'll input the desired distribution for each variable separately. SightX's Raked Weighting algorithm will then work out the interlocked percentages so that each variable matches its target distribution.

Viewing and Editing Weights

To view or edit the weights you've created, click on the purple "View" button with the weighting icon below the toolbox.

From here, you can add more variables to your weighting schema by clicking the "Add item" button, or click the "View weights" button to see (or edit weights in) the weighting modal.

If you add additional variables to weight, click the "Next" button to revise the weighting schema. If you want to edit the distributions for the variables you've already selected, click the "View weights" button.

If you've made any edits to the variables or the percentages, click the "Update calculation" button at the bottom of the weighting schema to recalculate the weights, and then click "Apply".

In addition, you can turn the weighting schema off and on using the toggle at the top of the toolbox. Turning it off removes the weights from the data set without deleting the weighting schema.

Recalculating Weights

If your project receives more responses, or you scrub any responses from the project, you'll need to recalculate the weights since the number of responses has changed. SightX will remind you to recalculate the weights by adding the "recalculate" icon on top of the purple weighting icon underneath the toolbox.

Click on the purple icon to open weighting in the toolbox, and then click "Recalculate" to recalculate the weights based on the new number of responses.

Types of Weighting Used in SightX

Cell-based Weighting

Cell-based weighting is used when you wish to weight one variable, or multiple variables that do not overlap. You specify the exact desired percentage for each subgroup or interlocked group, and the weight for each group is calculated by dividing its desired weighted sample size by its unweighted sample size.

For example, suppose you're conducting research for a meal kit subscription box, and your sample of N = 1,000 ended up with 300 plant-based eaters and 700 meat eaters. You know that plant-based eaters only make up 10% of the population, while meat eaters make up 90%. You would input your desired "Weighted Sample" percentages as 10% and 90% respectively.

The weights would be calculated as:

Plant-based: (10% x 1,000)/300 = 0.333

Meat-eaters: (90% x 1,000)/700 = 1.286
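The same arithmetic can be sketched in a few lines of Python. The function name `cell_weights` is hypothetical and is not part of SightX; it simply divides each group's desired weighted count by its unweighted count, as described above.

```python
def cell_weights(sample_counts, target_shares):
    """Return one weight per group: desired weighted count / unweighted count."""
    total = sum(sample_counts.values())
    return {
        group: target_shares[group] * total / count
        for group, count in sample_counts.items()
    }


counts = {"plant_based": 300, "meat_eater": 700}
targets = {"plant_based": 0.10, "meat_eater": 0.90}

weights = cell_weights(counts, targets)
for group, weight in weights.items():
    print(f"{group}: {weight:.3f}")
# plant_based: 0.333
# meat_eater: 1.286
```

Multiplying each weight by its group's count gives 100 and 900 weighted respondents, which matches the 10% and 90% targets.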

Raked/RIM Weighting

The Raking (also called RIM) method is used when you are weighting multiple variables that overlap in the sample. Raking is iterative proportional fitting. With raking, a researcher chooses a set of variables where the population distribution is known, and the procedure iteratively adjusts the weight for each case until the sample distribution aligns with the population for those variables.

For example, a researcher might specify that the sample should be 48% male and 52% female, 40% with a high school education or less, 31% who have completed some college, and 29% college graduates. Since all of the respondents answered both the gender and education questions, cell-based weighting can't be used because the variables overlap.

First, the Raking process will calculate the weights for the first variable using the same process as cell-based weighting, so that the distribution of the first variable matches the desired weighted sample outcome. In this example, the weights would be calculated for the gender variable so that the distribution was the desired 48% male and 52% female.

Next, the weights are adjusted so that the education groups are also weighted to the desired education sample distribution. If the adjustment pushes the gender distribution out of alignment with the desired distribution, then the weights are adjusted again so that the gender distribution is correct again. This process is repeated until the weighted distribution of all of the weighting variables matches their specified targets.
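The loop described above can be sketched in plain Python. This is a minimal illustration of iterative proportional fitting, not SightX's implementation: the `rake` and `weighted_share` functions, the stopping tolerance, and the sample cell counts are all assumptions made for the example. Only the target percentages come from the text. The sketch also assumes every target level has at least one respondent.

```python
def rake(respondents, targets, max_iter=100, tol=1e-6):
    """Adjust one weight per respondent until every variable matches its targets."""
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_gap = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each level of this variable.
            current = {level: 0.0 for level in shares}
            for person, w in zip(respondents, weights):
                current[person[var]] += w
            # Track how far this variable was from its targets before adjusting.
            for level, share in shares.items():
                max_gap = max(max_gap, abs(current[level] / total - share))
            # Scale weights so this variable hits its targets exactly.
            factors = {lvl: share * total / current[lvl] for lvl, share in shares.items()}
            weights = [w * factors[p[var]] for p, w in zip(respondents, weights)]
        if max_gap < tol:  # every variable was already aligned in this pass
            break
    return weights


def weighted_share(respondents, weights, var, level):
    """Weighted proportion of respondents whose `var` equals `level`."""
    hit = sum(w for p, w in zip(respondents, weights) if p[var] == level)
    return hit / sum(weights)


# Hypothetical sample of 1,000 respondents, given as (gender, education) cell counts.
cell_counts = {
    ("male", "hs_or_less"): 150, ("male", "some_college"): 150, ("male", "college_grad"): 250,
    ("female", "hs_or_less"): 120, ("female", "some_college"): 130, ("female", "college_grad"): 200,
}
respondents = [
    {"gender": g, "education": e}
    for (g, e), n in cell_counts.items()
    for _ in range(n)
]

# Target distributions from the example above.
targets = {
    "gender": {"male": 0.48, "female": 0.52},
    "education": {"hs_or_less": 0.40, "some_college": 0.31, "college_grad": 0.29},
}

weights = rake(respondents, targets)
for var, shares in targets.items():
    for level in shares:
        print(var, level, round(weighted_share(respondents, weights, var, level), 3))
```

Each pass fixes one variable at a time, which can nudge the other out of alignment; repeating the passes shrinks that drift until both variables match their targets.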

Weighting Efficiency Score (WES)

In Raked weighting, a Weighting Efficiency Score (WES) is calculated to measure the overall efficiency of the weighting schema. It is a numeric score on a scale from 0 to 100; the higher the WES, the more efficient the weighting schema. Generally speaking, a WES above 80 is considered acceptable.
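This article does not state the formula SightX uses for the WES. One widely used definition of weighting efficiency is Kish's: the squared sum of the weights divided by the number of respondents times the sum of squared weights, expressed as a percentage. The sketch below assumes that definition; the function name is hypothetical, and the weights are the cell-based ones from the meal kit example, used only as convenient input.

```python
def weighting_efficiency(weights):
    """Kish-style efficiency in percent (assumed definition, not confirmed for SightX)."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return 100.0 * total * total / (n * total_sq)


# 300 plant-based respondents at weight 1/3 and 700 meat eaters at weight 9/7.
example_weights = [1 / 3] * 300 + [9 / 7] * 700
print(round(weighting_efficiency(example_weights), 1))  # 84.0

# Equal weights lose nothing, so the score is 100.
print(round(weighting_efficiency([1.0] * 1000), 1))  # 100.0
```

Under this definition, the score drops as the weights become more uneven, which is why heavily adjusted samples score lower.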