How to Convert Categorical Variables Into Dummy Variables

How to change nominal or categorical variables into dummy variables to facilitate statistical analysis of data by the computer.

Students of statistics often need to convert nominal or categorical variables into dummy variables. This step is required before the data are entered into the computer, so that statistical software can 'understand' and analyze the data set. Statistical analysis is used to determine whether there are relationships or differences between samples obtained in the course of research.

What is a variable?

A variable is a quantity that can assume any of a set of values. The term is used in research and statistics to simplify otherwise complex phenomena observed in nature. A variable should be measurable; that is, it must be expressed in numbers that can then be subjected to statistical analysis.

Age, for example, can easily be entered into the computer because it is a number representing how long someone or something has existed. The same is true of height, which can be measured in meters or feet, again as a number. But what about variables that come in categories, such as gender? Gender is composed of males and females, and these categories are not numbers. Variables like this are called nominal or categorical variables.

A nominal or categorical variable like gender therefore needs to be converted into something the computer can work with. Computers basically work in binary mode; that is, they 'think' in base two: ones and zeros, on and off. Categorical data must therefore be converted into this binary form.

Dummy Variables

At this point, dummy variables are needed to allow analysis of nominal or categorical variables like gender. The two categories of gender, male and female, can be represented by the numbers "1" and "0": the male category may be represented by "1" and the female category by "0". Entering "1" into a spreadsheet therefore means the case is male; when the case is female, "0" must be entered. Gender is thus represented as the dummy variable 'X1' in the matrix below.
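
The coding described above can be sketched in a few lines of Python. This is only an illustration: the sample values and the helper name encode_gender are hypothetical and do not come from the original matrix. It follows the text's convention of 1 for male and 0 for female.

```python
# Hypothetical sample; the values are made up for illustration.
genders = ["male", "female", "female", "male"]


def encode_gender(value):
    """Return 1 for male and 0 for female, following the coding in the text."""
    codes = {"male": 1, "female": 0}
    return codes[value.strip().lower()]


# X1 is the dummy variable for gender.
x1 = [encode_gender(g) for g in genders]

# Print a small two-column table: the category and its dummy code.
print("Gender  X1")
for g, code in zip(genders, x1):
    print(f"{g:<7} {code}")
```

Which category receives the "1" is arbitrary; what matters is that the same coding is used consistently throughout the data set.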


But what if there are more than two categories, as with eye color, for example? How can these be represented as dummy variables?

The principle is the same, and it is again most easily understood using a matrix. A set of dummy variables is generated below to represent eye color: 'X1' and 'X2'.

[Figure: dummy variables X1 and X2 representing eye color]
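
As a rough sketch of how two dummy variables can represent eye color, the Python example below assumes three categories. The colors (brown, blue, green), the choice of brown as the category coded 0 on both variables, and the helper name encode_eye_color are all assumptions made for illustration; they are not taken from the original figure.

```python
# Hypothetical categories: the colors and the choice of "brown" as the
# reference category (coded 0 on both dummies) are assumptions.
REFERENCE = "brown"
DUMMY_FOR = {"blue": "X1", "green": "X2"}


def encode_eye_color(color):
    """Return {"X1": 0 or 1, "X2": 0 or 1} for one eye color."""
    color = color.strip().lower()
    if color != REFERENCE and color not in DUMMY_FOR:
        raise ValueError(f"unknown eye color: {color!r}")
    # A dummy is 1 only when it is the one assigned to this color.
    return {name: int(DUMMY_FOR.get(color) == name) for name in ("X1", "X2")}


sample = ["brown", "blue", "green", "blue"]
rows = [encode_eye_color(c) for c in sample]

for c, row in zip(sample, rows):
    print(f"{c:<6} X1={row['X1']} X2={row['X2']}")
```

Under this assumed coding, each case has at most one dummy set to 1, and a case with 0 on both X1 and X2 belongs to the reference category.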

It is then possible to analyze the data using the set of numbers that represents the nominal or categorical variables. In short, nominal or categorical variables are converted into dummy variables in binary form to facilitate statistical analysis by computer.
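
To illustrate what becomes possible once a category is coded as 0 and 1, the short Python sketch below uses made-up data. The variable names and values are hypothetical. It shows two simple calculations: the mean of a dummy variable is the proportion of cases coded 1, and the dummy can be used to split a numeric variable into groups for comparison.

```python
from statistics import mean

# Hypothetical data: x1 is the gender dummy (1 = male, 0 = female) and
# height_cm is a made-up numeric variable for the same five cases.
x1 = [1, 0, 0, 1, 0]
height_cm = [175.0, 160.0, 164.0, 181.0, 162.0]

# The mean of a 0/1 variable is the share of cases coded 1.
proportion_male = mean(x1)

# Use the dummy to separate the numeric variable into two groups.
male_heights = [h for h, d in zip(height_cm, x1) if d == 1]
female_heights = [h for h, d in zip(height_cm, x1) if d == 0]

# Difference between the two group means.
mean_difference = mean(male_heights) - mean(female_heights)

print("Proportion coded 1:", proportion_male)
print("Difference in mean height (cm):", mean_difference)
```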
