This post introduces dummy coding for categorical variables.
In most situations, a categorical variable cannot be entered directly into a regression model and interpreted meaningfully. A common way to handle categorical variables in regression is therefore dummy coding: recoding a categorical variable into a set of dichotomous (0/1) variables (Wikiversity).
For example, consider a categorical variable with three classes: “faculty”, “staff”, and “student”. Dummy variables are created as follows:
| | dv_1 | dv_2 | dv_3 |
|---|---|---|---|
| faculty | 1 | 0 | 0 |
| staff | 0 | 1 | 0 |
| student | 0 | 0 | 1 |
The categorical variable is dummy coded as three dummy variables: dv_1, dv_2, and dv_3.
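The table above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper name `dummy_code` is made up for illustration and is not part of any library.

```python
def dummy_code(values, categories):
    """Return one row of 0/1 indicators per value, one column per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

# Column order corresponds to dv_1, dv_2, dv_3 in the table.
categories = ["faculty", "staff", "student"]
rows = dummy_code(["faculty", "staff", "student"], categories)

for label, row in zip(categories, rows):
    print(label, row)
```

Each observation gets exactly one 1, in the column of its own category.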
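Usually, one category is selected as the reference category in the regression to avoid rank deficiency: when the model includes an intercept, the full set of dummy variables always sums to one, so the design matrix columns are linearly dependent. For example, if “faculty” is chosen as the reference category, the new dummy-coded variables become: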
| | dv_1 | dv_2 |
|---|---|---|
| faculty | 0 | 0 |
| staff | 1 | 0 |
| student | 0 | 1 |
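The rank deficiency can be checked numerically. The sketch below assumes NumPy is available and uses six made-up observations (two per category). It compares a design matrix holding an intercept plus all three dummies with one where the “faculty” column is dropped.

```python
import numpy as np

# Made-up sample: category index per observation (0=faculty, 1=staff, 2=student).
codes = np.array([0, 1, 2, 0, 1, 2])
full = np.eye(3)[codes]                      # all three dummy columns
intercept = np.ones((len(codes), 1))

X_full = np.hstack([intercept, full])        # intercept + 3 dummies (4 columns)
X_ref = np.hstack([intercept, full[:, 1:]])  # "faculty" dropped as reference (3 columns)

rank_full = np.linalg.matrix_rank(X_full)
rank_ref = np.linalg.matrix_rank(X_ref)
print(X_full.shape[1], rank_full)  # more columns than rank: deficient
print(X_ref.shape[1], rank_ref)    # columns equal rank: full column rank
```

With the reference category removed, the intercept takes the role of the “faculty” mean and the remaining coefficients measure differences from it.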
pandas
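In pandas, `get_dummies` builds the indicator columns. The following is a sketch rather than the original code of this post: the `gender` values are made up for illustration, and `dtype=int` is passed so the result is printed as 0/1 instead of booleans.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["male", "female", "male"],        # made-up values
    "status": ["faculty", "staff", "student"],
})

# One indicator column per status level.
dv_status = pd.get_dummies(df["status"], dtype=int)

# Two categorical variables at once: 2 gender columns followed by 3 status columns.
dv_gender_status = pd.get_dummies(df[["gender", "status"]], dtype=int)

# drop_first=True removes the first level ("faculty"), making it the reference category.
dv_ref = pd.get_dummies(df["status"], drop_first=True, dtype=int)

print(dv_status)
print(dv_gender_status)
print(dv_ref)
```

Levels are ordered alphabetically here, so `drop_first=True` drops “faculty”; a different reference category would need the column to be dropped explicitly.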
    dv_status =
         1     0     0
         0     1     0
         0     0     1

    dv_gender_status =
         0     1     1     0     0
         1     0     0     1     0
         0     1     0     0     1
Categorical regression with dummy coding can be done either manually or automatically in Matlab. The outputs of the two approaches are shown below in that order; both give the same fit.
    fit =

    Linear regression model:
        MPG ~ 1 + Weight*Model_Year2 + Weight*Model_Year3

    Estimated Coefficients:
                               Estimate         SE          tStat       pValue
                              ___________    __________    ________    __________
        (Intercept)                37.399        2.1466      17.423    2.8607e-30
        Weight                 -0.0058437    0.00061765     -9.4612    4.6077e-15
        Model_Year2                4.6903        2.8538      1.6435       0.10384
        Model_Year3                21.051         4.157      5.0641    2.2364e-06
        Weight:Model_Year2    -0.00082009    0.00085468    -0.95953       0.33992
        Weight:Model_Year3     -0.0050551     0.0015636     -3.2329     0.0017256
    fit =

    Linear regression model:
        MPG ~ 1 + Weight*Model_Year

    Estimated Coefficients:
                                 Estimate         SE          tStat       pValue
                                ___________    __________    ________    __________
        (Intercept)                  37.399        2.1466      17.423    2.8607e-30
        Weight                   -0.0058437    0.00061765     -9.4612    4.6077e-15
        Model_Year_76                4.6903        2.8538      1.6435       0.10384
        Model_Year_82                21.051         4.157      5.0641    2.2364e-06
        Weight:Model_Year_76    -0.00082009    0.00085468    -0.95953       0.33992
        Weight:Model_Year_82     -0.0050551     0.0015636     -3.2329     0.0017256
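
The same comparison can be sketched in Python. This is not the Matlab code behind the outputs above. The data are synthetic (generated with made-up coefficients), the reference year 70 is an assumption, and the model is fitted with plain least squares via NumPy. The point is only that hand-built dummies and automatically generated ones lead to the same design matrix, and hence the same coefficients, for a model of the form MPG ~ 1 + Weight*Model_Year.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 90
year = np.repeat([70, 76, 82], n // 3)        # categorical predictor (70 assumed as reference)
weight = rng.uniform(1800, 4500, size=n)      # synthetic weights
mpg = (40 - 0.006 * weight + 3 * (year == 76) + 8 * (year == 82)
       + rng.normal(0, 1, size=n))            # synthetic response

def fit(dummies):
    """Least-squares fit with intercept, weight, dummies, and weight:dummy interactions."""
    X = np.column_stack([np.ones(n), weight, dummies, dummies * weight[:, None]])
    coef, *_ = np.linalg.lstsq(X, mpg, rcond=None)
    return coef

# Manual: build the two indicator columns by hand.
manual = np.column_stack([year == 76, year == 82]).astype(float)

# Automatic: let pandas create the indicators and drop the first level.
auto = pd.get_dummies(pd.Series(year), drop_first=True, dtype=float).to_numpy()

coef_manual = fit(manual)
coef_auto = fit(auto)
print(np.allclose(coef_manual, coef_auto))
```

Each fit returns six coefficients in the same order as the rows of the tables above: intercept, weight, two year effects, and two weight-by-year interactions.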