Methodology
The method

The survey was a slightly modified version of a survey administered by Pinkerton Academy. Questions 6 and 7 measure how frequently the student uses a computer, while the other questions measure factors that we thought might affect that frequency. We included these factors because they can be measured easily by asking students simple questions. Other factors, such as teachers' IT literacy, are harder to measure with a questionnaire.

The sample

In this survey we used a convenience sample. In total, 2600 students at Pinkerton Academy and 151 students at Kim Lien High School completed the survey. However, all forms with missing answers were excluded from analysis, so the sample size for analysis was 1909 US students and 136 Vietnamese students.
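
The exclusion rule described above (drop any form with a missing answer) can be sketched in a few lines of Python. This is only an illustration: the toy forms and their answers are hypothetical, and only the four totals come from the text.

```python
def complete_cases(forms):
    """Keep only the forms in which every question has an answer."""
    return [form for form in forms
            if all(answer is not None for answer in form.values())]


def retention_percent(completed, analysed):
    """Share of completed forms that remained in the analysis, in percent."""
    return round(100.0 * analysed / completed, 1)


# Hypothetical forms; None marks a missing answer.
toy_forms = [
    {"q6": "Daily", "q7": "Weekly"},
    {"q6": None, "q7": "Never"},
    {"q6": "Monthly", "q7": "Monthly"},
]

print(len(complete_cases(toy_forms)))   # forms kept after exclusion
print(retention_percent(2600, 1909))    # Pinkerton Academy
print(retention_percent(151, 136))      # Kim Lien High School
```

On the reported totals, roughly 73% of the Pinkerton Academy forms and 90% of the Kim Lien High School forms were retained.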

Data collection

At Pinkerton Academy, students in all grades were given a printed questionnaire during their first-period class. The survey was mandatory, and was approved and administered by the Pinkerton Academy administration. All students in attendance on the day of the survey took it. The data were then entered into Microsoft Access for further analysis. A printed questionnaire was also given to four classes chosen at random at Kim Lien High School.

Statistical analysis

We model the effect of the factors using the proportional odds ordered logistic model, a type of mathematical model (David W. Hosmer 2000). This model is used when the outcome variable is categorical, polytomous and ordered. In this case, our outcome variable is the frequency of computer usage at home, which has four levels: “Never”, “Monthly”, “Weekly” and “Daily”, and these levels can be ranked in magnitude. To select the factors that most influence the frequency of computer usage, we used a procedure called backward elimination (Kutner 2005). This involves fitting the model with all the potential factors and identifying the one with the largest p-value. If this p-value is over a threshold (e.g. p > 0.05), the factor is dropped. The model is then refitted with the remaining factors, and the process continues until no more factors can be dropped.

Frequency of computer usage at home and at school was considered separately.
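
The backward elimination loop described above can be sketched in Python as follows. This is a minimal sketch and not the code used in the analysis, which was run in Stata. The function `fit_pvalues` is a hypothetical stand-in for fitting the ordered logistic model and returning one p-value per factor, and the factor names and p-values are invented for illustration.

```python
def backward_elimination(factors, fit_pvalues, threshold=0.05):
    """Drop the least significant factor until all p-values are <= threshold.

    fit_pvalues(factors) must return a dict mapping each factor to its p-value
    in a model fitted with exactly those factors.
    """
    remaining = list(factors)
    while remaining:
        pvalues = fit_pvalues(remaining)               # refit with current factors
        worst = max(remaining, key=lambda f: pvalues[f])
        if pvalues[worst] <= threshold:                # nothing left to drop
            break
        remaining.remove(worst)                        # drop and refit
    return remaining


def toy_fit_pvalues(factors):
    """Hypothetical p-values; a real version would fit the model to the data."""
    base = {"has_home_computer": 0.001, "internet_access": 0.03,
            "gender": 0.40, "grade": 0.20}
    pvalues = {f: base[f] for f in factors}
    # P-values change when the model is refitted: here 'grade' becomes
    # significant once 'gender' has been dropped.
    if "gender" not in factors and "grade" in pvalues:
        pvalues["grade"] = 0.04
    return pvalues


selected = backward_elimination(
    ["has_home_computer", "internet_access", "gender", "grade"], toy_fit_pvalues)
print(selected)
```

Refitting after every drop matters because a factor's p-value depends on which other factors are in the model, which is why the toy example keeps `grade` even though its first p-value was above the threshold.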

 
The mathematics

The mathematical formulae are presented below. We try to give enough background that readers with sufficient training in statistics can follow them. To be consistent, we use the same notation as the book “Applied Logistic Regression” (David W. Hosmer 2000). The derivations of all the equations presented here are available in that book.

The simple logistic regression model is Forumula01. The logit transformation of this equation is expressed as:
Forumula02
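
For reference, the simple logistic regression model and its logit are usually written as below. This is a sketch of the standard textbook forms, on the assumption that they correspond to the two formulae above; the symbols π(x), β0 and β1 are the usual textbook notation and are not taken from this page.

```latex
\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
\qquad\qquad
g(x) = \ln\!\left[\frac{\pi(x)}{1 - \pi(x)}\right] = \beta_0 + \beta_1 x
```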

Note that this is the simple logistic regression model, that is, we only have one predictor and one response variable. When we have more than one predictor, for example p predictors, the model is referred to as the multiple logistic regression model (Kutner 2005). The expressions above remain the same, except that Forumula03 is replaced with

Forumula04.
In terms of matrix notation, Beta would be a Forumula05 matrix (denoted Forumula06) and X would be a Forumula05 matrix (denoted Forumula07).
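
As an illustration of the multiple logistic regression model, the Python sketch below turns a linear combination of predictors into a probability and checks that the logit transformation recovers the linear combination. The coefficient and predictor values are hypothetical, and the function names are ours rather than anything from the analysis.

```python
import math


def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] pair up with the predictor values x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def logistic(g):
    """Probability e^g / (1 + e^g), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-g))


def logit(p):
    """Log-odds ln[p / (1 - p)], the inverse of logistic()."""
    return math.log(p / (1.0 - p))


beta = [-1.0, 0.5, 0.25]     # hypothetical intercept and two coefficients
x = [2.0, 4.0]               # hypothetical predictor values

g = linear_predictor(beta, x)    # -1.0 + 0.5*2.0 + 0.25*4.0
pi = logistic(g)                 # modelled probability of the outcome
print(g, pi, logit(pi))          # logit(pi) matches g up to rounding error
```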

In cases where the response variable has more than two categories, the model fitted is called the multinomial logistic regression model. Assume that the response variable can take on K+1 categories coded 0, 1, 2, …, K, and assume that we have p-1 predictors (X1, …, Xp-1). Denote the probability that the response at the ith observation takes the value k, given that the predictor vector X at the ith observation equals the vector x, as
Forumula08.

If the categories of Y are unordered, then we have a nominal logistic regression model. We would then have:
Forumula09 (where Forumula10 is the same as in equation 1).

Taking Forumula11 as the “baseline” category, the logits under this model are:
Forumula12
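
For orientation, a common textbook way of writing the nominal (baseline-category) model is sketched below. It assumes that category 0 is the baseline, which is our assumption here, and the symbols g_k and β_kj are illustrative notation rather than a transcription of the formulae above.

```latex
P(Y = k \mid x) = \frac{e^{g_k(x)}}{\sum_{j=0}^{K} e^{g_j(x)}},
\qquad g_0(x) = 0

g_k(x) = \ln\!\left[\frac{P(Y = k \mid x)}{P(Y = 0 \mid x)}\right]
       = \beta_{k0} + \beta_{k1} x_1 + \dots + \beta_{k,p-1} x_{p-1},
\qquad k = 1, \dots, K
```

Each non-baseline category gets its own coefficient vector, which is the main difference from the proportional odds model used in the report.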

If the categories of Y are ordered, then we have an ordinal logistic regression model. The model we fit in our report is the proportional odds model, which is a special type of ordinal logistic regression model. In the proportional odds model, we compare the probability of an equal or smaller response, Y ≤ k, to the probability of a larger response, Y > k.

Forumula13

So we have:
Forumula14 where k = 0, 1, ..., K, and Forumula14 k is just a different notation for the constant vector. The coefficient vector Beta is negated to be consistent with Stata, the software package used in this analysis.
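
In the negated-coefficient form described above, the proportional odds model is commonly written as below. This is a sketch of the standard form, with τ_k as an assumed symbol for the constant at cut point k and x'β for the linear combination of the predictors; it is not a transcription of the formula above.

```latex
c_k(x) = \ln\!\left[\frac{P(Y \le k \mid x)}{P(Y > k \mid x)}\right]
       = \tau_k - x'\beta
```

Only the constant τ_k changes with the cut point k; the coefficient vector β is shared by all cut points, which is what makes the odds proportional.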

The last equation above is the one fitted to our data. The odds ratios follow from a simple manipulation of that equation. It can be shown that the odds ratio between x1 and x0 is equal to Forumula15 (refer to David W. Hosmer 2000, 8.24). All calculations were done in Stata, a statistical software package.
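
The Python sketch below illustrates the odds ratio numerically. It assumes the cumulative logit has the form "constant for cut point k, minus the linear combination of the predictors", in which case the odds ratio of Y ≤ k between x1 and x0 works out to exp(-(x1 - x0)'Beta) at every cut point. The cut-point constants, the coefficient and the function names are hypothetical.

```python
import math


def cumulative_probs(taus, beta, x):
    """P(Y <= k | x) at each cut point, assuming logit = tau_k - x.beta."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return [1.0 / (1.0 + math.exp(-(tau - eta))) for tau in taus]


def category_probs(taus, beta, x):
    """Probability of each category, from differences of cumulative probabilities."""
    cum = cumulative_probs(taus, beta, x) + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def cumulative_odds_ratios(taus, beta, x1, x0):
    """Odds of Y <= k at x1 divided by the odds at x0, for each cut point."""
    ratios = []
    for p1, p0 in zip(cumulative_probs(taus, beta, x1),
                      cumulative_probs(taus, beta, x0)):
        ratios.append((p1 / (1.0 - p1)) / (p0 / (1.0 - p0)))
    return ratios


taus = [-2.0, -0.5, 1.0]   # hypothetical constants: 3 cut points for 4 levels
beta = [0.8]               # hypothetical coefficient of one binary factor

probs = category_probs(taus, beta, [1.0])
ratios = cumulative_odds_ratios(taus, beta, [1.0], [0.0])
print(probs)    # four probabilities: Never, Monthly, Weekly, Daily
print(ratios)   # the same value at every cut point
```

Because the coefficient is negated, a positive coefficient lowers the odds of an equal or smaller response, which means the factor is associated with more frequent computer usage.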

 
References:
  • Hosmer, D. W. and Lemeshow, S. (2000), Applied Logistic Regression (Second ed.), Wiley-Interscience.
  • Kutner, M. H., Nachtsheim, C. J., Neter, J. and Li, W. (2005), Applied Linear Statistical Models (Fifth ed.), McGraw-Hill Irwin.
 
 
 
 

Copyright 2005 – 2006 e-Divide Team. All rights reserved.