basic data analysis

screening the dataset

Two purposes: 1. check for missing data

2. check for odd or erroneous data

What counts as odd data?

consistency check: answers that contradict each other

filler questions: what are they??

extreme values: what counts as extreme??

How to do it?

1. analyze frequencies: check for missing data and extreme values

2. scatter plot: check consistency

* Still need to learn in SPSS: scatter plot, select cases
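
The frequency check in step 1 can be sketched outside SPSS as well. The snippet below is only an illustration: the function name `screen`, the 1-5 answer scale, and the sample answers are assumptions made up for this example, not part of the course material.

```python
from collections import Counter

def screen(values, low, high):
    """Frequency table plus the count of missing answers and any out-of-range values."""
    freq = Counter(values)                 # None (a skipped answer) is counted as its own category
    missing = freq.get(None, 0)
    extreme = sorted(v for v in freq if v is not None and not (low <= v <= high))
    return freq, missing, extreme

# Hypothetical answers on a 1-5 scale; None marks a skipped question.
answers = [3, 4, None, 5, 2, 99, 4, None, 1]
freq, missing, extreme = screen(answers, low=1, high=5)
print(dict(freq))
print("missing:", missing, "out of range:", extreme)
```

Here "extreme" simply means outside the valid answer range; other definitions (for example, far from the mean) are possible.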

What to do with 'bad data'?

do nothing

collect more data

assign a missing value

for non-key variables, substitute neutral values, usually the mean

impute values (fill in based on nearby values)

delete the case

The decision mainly depends on how many good respondents there are
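
As a rough sketch of the "substitute the mean" option, assuming missing answers are stored as `None` (the helper name `fill_with_mean` and the data are hypothetical):

```python
from statistics import mean

def fill_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

# Hypothetical non-key variable with two missing answers.
incomes = [20, None, 30, 40, None]
filled = fill_with_mean(incomes)
print(filled)
```

Mean substitution keeps the mean unchanged but shrinks the variance, which is one reason to reserve it for non-key variables.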


analyzing dataset

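levels of measurement

assigning numbers to answers (SPSS: Values)

In SPSS, "Scale" means metric data, which covers both interval and ratio.

nominal: categories

ordinal: rank order

interval: ratings and the like, e.g. 1 to 10?

ratio: numbers that are meaningful in themselves (with a true zero)

Which statistical tests can be used depends on the level of measurement of a variable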

types of statistical analyses

1. descriptive analysis: summarize the sample; frequency analysis

2. inferential analysis: generalize from the sample to the population; hypothesis tests and confidence intervals (possibly with some underlying model); one-sample tests

3. differences analysis: compare the means of two or more groups (differences among means)

4. associative analysis: examine the strength and direction of a relationship; cross-tabulations and correlations

5. predictive analysis: regressions

descriptive analysis

summarize the sample data

HOW to summarize (and what)? (In general: are these numbers meaningful?)

- the usual descriptive statistics:

1. location: mode, median, mean

2. variability: (interquartile) range, variance, standard deviation (why both? the standard deviation is in the same units as the data), coefficient of variation = standard deviation / mean

3. shape: skewness, kurtosis

* Note: which descriptive statistics are meaningful depends on the level of measurement
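
A minimal sketch of these statistics for a metric variable, using only the Python standard library. The data are invented, and the skewness shown is the simple moment-based version, which may differ slightly from the adjusted value SPSS reports.

```python
from statistics import mean, median, mode, stdev, variance

data = [1, 2, 2, 3, 4, 5, 11]            # hypothetical ratio-level answers

# 1. location
loc = {"mode": mode(data), "median": median(data), "mean": mean(data)}

# 2. variability (sample variance and standard deviation, n - 1 in the denominator)
var = variance(data)
sd = stdev(data)
spread = {"range": max(data) - min(data), "variance": var, "sd": sd,
          "cv": sd / mean(data)}          # coefficient of variation

# 3. shape: moment-based skewness m3 / m2**1.5 (positive = long right tail)
m = mean(data)
m2 = sum((x - m) ** 2 for x in data) / len(data)
m3 = sum((x - m) ** 3 for x in data) / len(data)
skewness = m3 / m2 ** 1.5

print(loc, spread, round(skewness, 3))
```

For a nominal variable only the mode would be meaningful, and for an ordinal one the mode and median.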

adjusting data

re-specifying variables: what does this mean??

transforming scales: standardizing to z-scores

weighting cases/respondents (not used often; what does it mean?): to account for representativeness.
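
Standardizing means subtracting the mean and dividing by the standard deviation, z = (x - mean) / sd. A small sketch with made-up values (the function name `z_scores` is just for illustration):

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

scores = [2, 4, 6]                        # hypothetical answers
z = z_scores(scores)
print(z)
```

After standardizing, the variable has mean 0 and standard deviation 1, so variables measured on different scales become comparable.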

hypothesis testing

1. two-sided tests (equal or not equal)

H0: the parameter (mean, proportion) of the variable is equal to some value

H1: the parameter of the variable is different from that value

2. one-sided tests (greater than or less than)

H0: ≥ or ≤

H1: < or >

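The result can be read in two ways: from the test statistic or from the p-value. (The larger the test statistic in absolute value, the smaller the p-value, and the less compatible the data are with H0.) See figure.

So: test statistic > critical value → reject H0

p-value < 0.05 → reject H0

In SPSS, the p-value is displayed as "Sig."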

p ≤ 0.05: H0 is rejected → the parameter is significantly different from xx.

0.05 < p ≤ 0.1: H0 is rejected only at the 10% level → the parameter is marginally significantly different from xx.

p > 0.1: H0 is not rejected → the parameter is not statistically different from xx.

test statistic:

test statistic > critical value (in absolute value for a two-sided test) → H0 is rejected
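
To see how the test statistic and the p-value tell the same story, here is a sketch for a two-sided test whose statistic follows a standard normal distribution (an assumption for this example; t-tests use the t distribution instead). The function names are made up, and the cut-offs are the 0.05 and 0.1 thresholds from these notes.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def verdict(p):
    """Apply the 0.05 / 0.1 reading rules used in these notes."""
    if p <= 0.05:
        return "significant"
    if p <= 0.10:
        return "marginally significant"
    return "not significant"

critical_5pct = NormalDist().inv_cdf(0.975)    # two-sided 5% critical value, about 1.96
for z in (1.0, 1.8, 2.5):
    p = two_sided_p(z)
    print(z, round(p, 4), abs(z) > critical_5pct, verdict(p))
```

Comparing |z| with the critical value and comparing p with 0.05 always lead to the same decision.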

diagram 'when to use which test?'

(figure: diagram)

How to use this diagram? Ask 3 questions:

1. what is the dependent variable?

2. what is the measurement level of the dependent variable?

3. what and how many samples does the hypothesis involve?

- one sample: compare a parameter of a given group with some fixed value

- independent samples: compare a parameter between two groups, e.g. men/women, branded/unbranded

- related samples: compare the responses of the same individuals with each other, i.e. the same sample answering different questions

inferential analysis: one-sample tests. representativeness

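Check whether the sample is representative by comparing it with a given value

H0: mean in the population where the sample came from = 2.28

First (a required step): DV = household size, DV measurement = ratio, sample: one sample

So (looking at the diagram), use a one-sample t-test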
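
The one-sample t statistic is t = (sample mean - hypothesized mean) / (sd / √n), with n - 1 degrees of freedom. A sketch with invented household sizes (only the value 2.28 comes from the notes; `one_sample_t` is a hypothetical helper):

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

household_sizes = [1, 2, 3, 4, 5]          # hypothetical sample
t, df = one_sample_t(household_sizes, mu0=2.28)
print(round(t, 3), df)
```

The p-value (SPSS "Sig.") then comes from the t distribution with `df` degrees of freedom; SPSS reports it directly.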

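e.g. 2: test whether the household distribution in the sample matches the official statistics

First: DV = sample household proportion, DV measurement = ordinal, sample = one sample

So use the one-sample Kolmogorov-Smirnov test (by hand or in Excel)

Compute the absolute difference between the cumulative percentage in the total population and the observed cumulative percentage in the sample, category by category

test statistic = the largest of these differences → K = xx

critical value at 5% = 1.36 / √n, where n is the sample size → aa

K > aa → H0 is rejected: significantly different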
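
The by-hand procedure above can be sketched as follows. The category counts and population shares are invented for illustration; the 1.36 / √n critical value is the one given in the notes.

```python
from itertools import accumulate
from math import sqrt

def ks_one_sample(sample_counts, population_shares):
    """One-sample Kolmogorov-Smirnov test on ordered categories, 5% level."""
    n = sum(sample_counts)
    sample_cum = [c / n for c in accumulate(sample_counts)]      # observed cumulative %
    population_cum = list(accumulate(population_shares))         # population cumulative %
    k = max(abs(s - p) for s, p in zip(sample_cum, population_cum))
    critical = 1.36 / sqrt(n)
    return k, critical, k > critical

# Hypothetical: three ordered household-size classes, 100 respondents.
k, critical, reject = ks_one_sample([30, 40, 30], [0.2, 0.5, 0.3])
print(round(k, 3), round(critical, 3), reject)
```

The categories must be listed in their natural order, because the test works on cumulative percentages.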

Testing the proportion of a dichotomous variable (yes/no)

use a Z-test (by hand)
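
The notes do not give the formula, so the sketch below assumes the standard large-sample form z = (p̂ - p0) / √(p0(1 - p0) / n). The counts and the hypothesized proportion are made up.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion (large-sample approximation)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: 60 of 100 respondents answered "yes"; H0 says the proportion is 0.5.
z_stat, p_value = proportion_z_test(60, 100, p0=0.5)
print(round(z_stat, 3), round(p_value, 4))
```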

differences analysis: two or more independent or related samples

For how to apply the diagram, see OneNote

associative analysis: correlations

relationships between variables

when there are 2 variables:

both are metric (interval/ratio) with a linear relationship: use the Pearson correlation coefficient

one or both are ordinal: use the Spearman rank correlation coefficient

r lies in [-1, 1]
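
A sketch of both coefficients, written out by hand so the relationship is visible: Spearman is simply Pearson computed on the ranks. The data and function names are illustrative only.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]                      # increases with x, but not linearly
print(round(pearson(x, y), 4), round(spearman(x, y), 4))
```

Because y always rises with x, the rank correlation is perfect, while Pearson falls slightly short of 1 since the relationship is not a straight line.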

significant vs. substantive results.

Whether a result is significant depends on 1. the strength/magnitude of the difference or correlation, and 2. the sample size

Significance is the first step; relevance is a subjective judgment

A significant difference or correlation does not by itself imply a substantive or relevant one

magnitude of the difference = % change in the response of one group from that of the comparison group
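
As a tiny worked example of that magnitude formula (the group means are invented):

```python
def percent_change(group_mean, comparison_mean):
    """% change of one group's response relative to the comparison group."""
    return (group_mean - comparison_mean) / comparison_mean * 100

# Hypothetical mean ratings: branded 4.4 vs. unbranded (comparison) 4.0.
change = percent_change(4.4, 4.0)
print(round(change, 1))
```

Whether a change of this size is relevant is the subjective judgment mentioned above; the significance test cannot answer it.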
