abandon/ ə’bændən/ vt.丢弃;放弃,抛弃
aboard/ ə’bɔ:d/ ad.在船(车)上;上船
absolute/ ‘æbsəlu:t/ a.绝对的;纯粹的
absolutely/ ‘æbsəlu:tli/ ad.完全地;绝对地
absorb/ əb’sɔ:b/ vt.吸收;使专心
abstract/ ’æbstrækt/ n.摘要
abundant/ ə’bʌndənt/ a.丰富的;大量的
abuse/ ə’bju:z, ə’bju:s/ vt.滥用;虐待 n.滥用
academic/ ækə’demik/ a.学院的;学术的
accelerate/ æk’seləreit/ vt.(使)加快;促进
access/ ‘ækses/ n.接近;通道,入口
accidental/ æksi’dentl/ a.偶然的;非本质的
accommodate/ ə’kɔmədeit/ vt.容纳;供应,供给
accommodation/ ə,kɔmə’deiʃən/ n.招待设备;预定铺位
accompany/ ə’kʌmpəni/ vt.陪伴,陪同;伴随
accomplish/ ə’kɔmpliʃ/ vt.达到(目的);完成
accordance/ ə’kɔ:dəns/ n.一致;和谐;授予
accordingly/ ə’kɔ:diŋli/ ad.因此,所以;照着
account/ ə’kaunt/ n.记述;解释;帐目
accumulate/ ə’kju:mjuleit/ vt.积累 vi.堆积
accuracy/ ‘ækjurəsi/ n.准确(性);准确度
accurate/ ‘ækjurit/ a.准确的,正确无误的
accustomed/ ə’kʌstəmd/ a.惯常的;习惯的
acid/ ‘æsid/ n.酸;酸的,酸性的
acquaintance/ ə’kweintəns/ n.认识;了解;熟人
acquire/ ə’kwaiə/ vt.取得;获得;学到
acre/ ‘eikə/ n.英亩(=6.07亩)
adapt/ ə’dæpt/ vt.使适应;改编
addition/ ə’diʃən/ n.加,加法;附加物
additional/ ə’diʃənl/ a.附加的,追加的
address/ ə’dres/ n.地址;演说;谈吐
adequate/ ‘ædikwit/ a.足够的;可以胜任的
adjust/ ə’dʒʌst/ vt.调整,调节;校正
administration/ əd,minis’treiʃən/ n.管理;管理部门
admission/ əd’miʃən/ n.允许进入;承认
admit/ əd’mit/ vt.承认;准许…进入
advance/ əd’va:ns/ vi.前进;提高 n.进展
advanced/ əd’va:nst/ a.先进的;高级的
adventure/ əd’ventʃə/ n.冒险;惊险活动
advisable/ əd’vaizəbl/ a.明智的;可取的
affair/ ə’feə/ n.事情,事件;事务
affect/ ə’fekt/ vt.影响;感动
affection/ ə’fekʃən/ n.慈爱,爱;爱慕
afford/ ə’fɔ:d/ vt.担负得起…;提供
afterward/ ‘a:ftəwəd(z)/ ad.后来,以后
age/ eidʒ/ vi.变老
aggressive/ ə’gresiv/ a.侵略的;好斗的
aircraft/ ‘eəkra:ft/ n.飞机,飞行器
alarm/ ə’la:m/ n.惊恐,忧虑;警报
alcohol/ ‘ælkəhɔl/ n.酒精,乙醇
alike/ ə’laik/ a.同样的,相同的
alloy/ ‘ælɔi, ə’lɔi/ n.合金;(金属的)成色
alphabet/ ‘ælfəbit/ n.字母表,字母系统
alter/ ‘ɔ:ltə/ vt.改变,变更;改做
alternative/ ɔ:l’tə:nətiv/ n.替换物;取舍,抉择
altitude/ ‘æltitju:d/ n.高,高度;高处
aluminium/ ælju’minjəm/ n.铝
amaze/ ə’meiz/ vt.使惊奇,使惊愕
ambulance/ ‘æmbjuləns/ n.救护车;野战医院
amongst/ ə’mʌŋst/ prep.在…之中(=among)
amuse/ ə’mju:z/ vt.逗…乐;给…娱乐
analyse/ ‘ænəlaiz/ vt.分析,分解,解析
analysis/ ə’næləsis/ n.分析,分解,解析
ancestor/ ‘ænsistə/ n.祖宗,祖先
anchor/ ‘æŋkə/ n.锚 vi.抛锚,停泊
ancient/ ‘einʃənt/ a.古代的,古老的
ankle/ ‘æŋkl/ n.踝,踝节部
announce/ ə’nauns/ vt.宣布,宣告,发表
annoy/ ə’nɔi/ vt.使恼怒;打搅
annual/ ‘ænjuəl/ a.每年的 n.年报
anticipate/ æn’tisipeit/ vt.预料,预期,期望
anxiety/ æŋg’zaiəti/ n.焦虑,忧虑;渴望
anxious/ ‘æŋkʃəs/ a.忧虑的;渴望的
apart/ ə’pa:t/ ad.相隔;分开;除去
apologize/ ə’pɔlədʒaiz/ vi.道歉,谢罪,认错
apparatus/ ,æpə’reitəs/ n.器械,仪器;器官
appeal/ ə’pi:l/ vi.&n.呼吁;申述
appetite/ ‘æpitait/ n.食欲,胃口;欲望
appliance/ ə’plaiəns/ n.用具,器具,器械
applicable/ ‘æplikəbl/ a.能应用的;适当的
application/ æpli’keiʃən/ n.请求,申请;施用
appoint/ ə’pɔint/ vt.任命,委任;约定
appreciate/ ə’pri:ʃieit/ vt.欣赏;领会;感谢
approval/ ə’pru:vəl/ n.赞成,同意;批准
approve/ ə’pru:v/ vt.赞成,称许;批准
approximate/ ə’prɔksimit/ a.近似的 vt.近似
arbitrary/ ‘a:bitrəri/ a.随心所欲的;专断的
architecture/ ‘a:kitektʃə/ n.建筑学;建筑式样
argue/ ‘a:gju:/ vi.争论,争辩,辩论
argument/ ‘a:gju:mənt/ n.争论,辩论;理由
arise/ ə’raiz/ vi.出现;由…引起
arithmetic/ ə’riθmətik/ n.算术,四则运算
arouse/ ə’rauz/ vt.引起,唤起;唤醒
article/ ‘a:tikl/ n.条款;物品
artificial/ a:ti’fiʃəl/ a.人工的;矫揉造作的
artistic/ a:’tistik/ a.艺术的;艺术家的
ash/ æʃ/ n.灰,灰末;骨灰
ashamed/ ə’ʃeimd/ a.惭愧(的);羞耻(的)
aspect/ ‘æspekt/ n.方面;样子,外表
assemble/ ə’sembl/ vt.集合,召集;装配
assembly/ ə’sembli/ n.集合;集会;装配
assess/ ə’ses/ vt.对(财产等)估价
assign/ ə’sain/ vt.指派;分配;指定
assist/ ə’sist/ vt.援助,帮助;搀扶
assistance/ ə’sistəns/ n.协助,援助
associate/ ə’səuʃieit/ vi.交往 n.伙伴,同事
association/ əsəusi’eiʃən/ n.协会,团体;联合
assume/ ə’sju:m/ vt.假定;承担;呈现
assure/ ə’ʃuə/ vt.使确信;向…保证
astonish/ əs’tɔniʃ/ vt.使惊讶,使吃惊
astronaut/ ‘æstrənɔ:t/ n.宇宙航行员,宇航员
Atlantic/ ət’læntik/ a.大西洋的 n.大西洋
atom/ ‘ætəm/ n.原子;微粒;微量
atomic/ ə’tɔmik/ a.原子的;原子能的
attach/ ə’tætʃ/ vt.缚,系,贴;附加
attain/ ə’tein/ vt.达到,获得,完成
attempt/ ə’tempt/ vt.尝试,试图 n.企图
attend/ ə’tend/ vt.出席;照顾,护理
attribute/ ‘ætribju:t/ vt.把…归因于 n.属性
audience/ ‘ɔ:djəns/ n.听众,观众,读者
authority/ ɔ:’θɔriti/ n.当局,官方;权力
automatic/ ɔ:tə’mætik/ a.自动的;机械的
automobile/ ‘ɔ:təməbi:l/ n.汽车,机动车
auxiliary/ ɔ:g’ziljəri/ a.辅助的;附属的
available/ ə’veiləbl/ a.可利用的;通用的
avenue/ ‘ævinju:/ n.林荫道,道路;大街
await/ ə’weit/ vt.等候,期待
awake/ ə’weik/ a.醒着的 vt.唤醒
award/ ə’wɔ:d/ n.奖,奖品;判定
aware/ ə’weə/ a.知道的,意识到的
awful/ ‘ɔ:ful/ a.令人不愉快的
awkward/ ‘ɔ:kwəd/ a.笨拙的;尴尬的
ax/ æks/ n.斧子
baby/ ‘beibi/ n.婴儿;孩子气的人
back/ bæk/ ad.在后;回原处;回
background/ ‘bækgraund/ n.背景,后景,经历
backward/ ‘bækwəd/ a.向后的;倒的 ad.倒
bacteria/ bæk’tiəriə/ n.细菌
bad/ bæd/ a.坏的,恶的;严重的
badly/ ‘bædli/ ad.坏,差;严重地
bag/ bæg/ n.袋,包,钱包,背包
baggage/ ‘bægidʒ/ n.行李
bake/ beik/ vt.烤,烘,焙;烧硬
balance/ ‘bæləns/ vt.使平衡;称 n.天平
ball/ bɔ:l/ n.球,球状物;舞会
balloon/ bə’lu:n/ n.气球,玩具气球
banana/ bə’na:nə/ n.香蕉;芭蕉属植物
band/ bænd/ n.乐队;带;波段
bang/ bæŋ/ n.巨响,枪声;猛击
bank/ bæŋk/ n.银行;库;岩,堤
bar/ ba:/ n.酒吧间;条,杆;栅
barber/ ‘ba:bə/ n.理发师
bare/ beə/ a.赤裸的;仅仅的
bargain/ ‘ba:gin/ n.交易 vi.议价;成交
barrel/ ‘bærəl/ n.桶;圆筒;枪管
barrier/ ‘bæriə/ n.栅栏,屏障;障碍
base/ beis/ n.基础,底层;基地
basic/ ‘beisik/ a.基本的,基础的
basically/ ‘beisikəli/ ad.基本上
basin/ ‘beisn/ n.盆,洗脸盆;盆地
basis/ ‘beisis/ n.基础,根据
basket/ ‘ba:skit/ n.篮,篓,筐
basketball/ ‘ba:skitbɔ:l/ n.篮球;篮球运动
bath/ ba:θ/ n.浴,洗澡;浴缸
bathe/ beið/ vt.给…洗澡;弄湿
bathroom/ ‘ba:θrum/ n.浴室;盥洗室
battery/ ‘bætəri/ n.电池;一套,一组
battle/ ‘bætl/ n.战役;斗争 vi.作战
bay/ bei/ n.湾;山脉中的凹处
be/ bi:/ aux.v.&vi.是,在,做
beach/ bi:tʃ/ n.海滩,湖滩,河滩
beam/ bi:m/ n.梁;横梁;束,柱
bean/ bi:n/ n.豆,蚕豆
bear/ beə/ n.熊;粗鲁的人
bear/ beə/ vt.容忍;负担;生育
beard/ biəd/ n.胡须,络腮胡子
beast/ bi:st/ n.兽,野兽;牲畜
beat/ bi:t/ vt.&vi.打,敲;打败
beautiful/ ‘bju:tiful/ a.美的,美丽的
beauty/ ‘bju:ti/ n.美,美丽;美人
because/ bi’kɔz/ conj.由于,因为
become/ bi’kʌm/ vi.变成;成为,变得
bed/ bed/ n.床,床位;圃;河床
bee/ bi:/ n.蜂,蜜蜂;忙碌的人
beef/ bi:f/ n.牛肉;菜牛
beer/ biə/ n.啤酒
before/ bi’fɔ:/ prep.在…以前;向…
beg/ beg/ vt.&vi.乞求;请求
begin/ bi’gin/ vi.开始 vt.开始
beginning/ bi’giniŋ/ n.开始,开端;起源
behalf/ bi’ha:f/ n.利益,维护,支持
behave/ bi’heiv/ vi.表现,举止;运转
behavior/ bi’heivjə/ n.行为,举止,态度
behind/ bi’haind/ prep.在…后面
being/ ‘bi:iŋ/ n.存在;生物;生命
belief/ bi’li:f/ n.信任,相信;信念
believe/ bi’li:v/ vt.相信;认为
bell/ bel/ n.钟,铃,门铃;钟声
belong/ bi’lɔŋ/ vi.属于,附属
below/ bi’ləu/ prep.在…下面(以下)
belt/ belt/ n.带,腰带;皮带;区
bench/ bentʃ/ n.长凳,条凳;工作台
bend/ bend/ vt.使弯曲 vi.弯曲
beneath/ bi’ni:θ/ prep.在…下方
beneficial/ beni’fiʃəl/ a.有利的,有益的
benefit/ ‘benifit/ n.利益;恩惠;津贴
beside/ bi’said/ prep.在…旁边
besides/ bi’saidz/ ad.而且 prep.除…之外
best/ best/ a.最好的;最大的
bet/ bet/ vt.&vi.&n.打赌
better/ ‘betə/ a.较好的 ad.更好地
between/ bi’twi:n/ prep.在…中间
beyond/ bi’jɔnd/ prep.在…的那边
Bible/ ‘baibl/ n.基督教《圣经》
bicycle/ ‘baisikl/ n.自行车,脚踏车
big/ big/ a.大的,巨大的
bike/ baik/ n.自行车 vi.骑自行车
bill/ bil/ n.账单;招贴;票据
billion/ ‘biljən/ num.万亿(英)
bind/ baind/ vt.捆绑;包扎;装钉
biology/ bai’ɔlədʒi/ n.生物学;生态学
bird/ bə:d/ n.鸟,禽
birth/ bə:θ/ n.分娩,出生;出身
birthday/ ‘bə:θdi/ n.生日,诞生的日期
biscuit/ ‘biskit/ n.(英)饼干;(美)软饼
bit/ bit/ n.一点,一些,小片
bite/ bait/ vt.咬,叮,螫;刺穿
bitter/ ‘bitə/ a.痛苦的;严寒的
black/ blæk/ a.黑色的;黑暗的
blackboard/ ‘blækbɔ:d/ n.黑板
blade/ bleid/ n.刀刃,刀片;叶片
blame/ bleim/ vt.责备,把…归咎于
blank/ blæŋk/ a.空白的 n.空白
blanket/ ‘blæŋkit/ n.毛毯,毯子,羊毛毯
blast/ bla:st/ n.爆炸,冲击波 vt.炸
bleed/ bli:d/ vi.出血,流血;泌脂
blend/ blend/ vt.&vi.&n.混和
blind/ blaind/ a.瞎的;盲目的
block/ blɔk/ n.街区 vt.堵塞,拦阻
blood/ blʌd/ n.血,血液;血统
bloom/ blu:m/ n.花;开花,开花期
blow/ bləu/ vi.吹,吹动;吹响
blue/ blu:/ a.蓝色的 n.蓝色
board/ bɔ:d/ n.板 vt.上(船、车等)
boast/ bəust/ vi.自夸 vt.吹嘘
boat/ bəut/ n.小船,艇;渔船
body/ ‘bɔdi/ n.身体;主体;尸体
boil/ bɔil/ vi.沸腾;汽化 vt.煮沸
bold/ bəuld/ a.大胆的;冒失的
bolt/ bəult/ n.螺栓;插销 vt.闩门
bomb/ bɔm/ n.炸弹 vt.轰炸
bond/ bɔnd/ n.联结,联系;公债
bone/ bəun/ n.骨,骨骼
book/ buk/ n.书,书籍 vt.预定
boot/ bu:t/ n.靴子,长统靴
border/ ‘bɔ:də/ n.边,边缘;边界
bore/ bɔ:/ vt.使厌烦;钻,挖
born/ bɔ:n/ a.天生的;出生的
borrow/ ‘bɔrəu/ vt.借,借用,借入
boss/ bɔs/ n.老板,上司 vt.指挥
both/ bəuθ/ pron.两者(都)
bother/ ‘bɔðə/ vt.烦扰,迷惑 n.麻烦
bottle/ ‘bɔtl/ n.瓶,酒瓶;一瓶
bottom/ ‘bɔtəm/ n.底,底部,根基
bounce/ bauns/ vi.反跳,弹起;跳起
bound/ baund/ a.一定的;有义务的
boundary/ ‘baundəri/ n.分界线,边界
bow/ bau/ n.弓;蝴蝶结;鞠躬
bowl/ bəul/ n.碗,钵;碗状物
box/ bɔks/ n.箱,盒;包厢
box/ bɔks/ vi.拳击,打拳
boy/ bɔi/ n.男孩,少年;家伙
brain/ brein/ n.脑,脑髓;脑力
brake/ breik/ n.闸,刹车 vi.制动
branch/ bra:ntʃ/ n.树枝;分部;分科
brand/ brænd/ n.商标;烙印 vt.铭刻
Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B), with P(B) ≠ 0, where
P(A | B) is a conditional probability: the likelihood of event A occurring given that B is true.
P(B | A) is also a conditional probability: the likelihood of event B occurring given that A is true.
P(A) and P(B) are the probabilities of observing A and B independently of each other; these are known as the marginal probabilities.
Interpretation of Bayes' theorem:
Bayesian inference derives the posterior probability as a consequence of two antecedents: a prior probability and a "likelihood function" derived from a statistical model for the observed data. Bayesian inference computes the posterior probability according to Bayes' theorem:
P(H | E) = P(E | H) P(H) / P(E)
H: any hypothesis whose probability may be affected by data (called evidence below). Often there are competing hypotheses, and the task is to determine which is the most probable.
P(H): the prior probability, the estimate of the probability of the hypothesis H before the data E, the current evidence, is observed.
P(H | E): the posterior probability, the probability of H given E, i.e., after E is observed. This is what we want to know: the probability of a hypothesis given the observed evidence.
P(E | H): the probability of observing E given H, called the likelihood. As a function of E with H fixed, it indicates the compatibility of the evidence with the given hypothesis. The likelihood function is a function of the evidence, E, while the posterior probability is a function of the hypothesis, H.
P(E): sometimes termed the marginal likelihood or "model evidence". This factor is the same for all possible hypotheses being considered (as is evident from the fact that the hypothesis H does not appear anywhere in the symbol, unlike in all the other factors), so it does not enter into determining the relative probabilities of different hypotheses.
Sometimes Bayes' theorem is written as:
P(H | E) = [P(E | H) / P(E)] · P(H)
where the factor P(E | H) / P(E) can be interpreted as the impact of E on the probability of H.
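To make the "impact factor" reading concrete, here is a minimal Python sketch for a binary hypothesis (H versus not H). It assumes P(E) comes from the law of total probability. The helper name `posterior` and the numbers (1% prior, 99% probability of E under H, 5% under not-H) are hypothetical illustrations and do not come from the notes.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) for a binary hypothesis H vs. not-H (hypothetical helper)."""
    # Marginal likelihood P(E) via the law of total probability
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    # Impact factor P(E | H) / P(E) rescales the prior P(H)
    impact = p_e_given_h / p_e
    return impact * prior_h


# Hypothetical screening test: 1% prior, 99% true-positive rate, 5% false-positive rate
post = posterior(prior_h=0.01, p_e_given_h=0.99, p_e_given_not_h=0.05)
print(f"P(H | E) = {post:.4f}")  # prints "P(H | E) = 0.1667"
```

The evidence multiplies the prior by roughly 16.7. Because the prior is small, though, the posterior is still only about 1/6.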
The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes/no question and each with its own Boolean-valued outcome: a random variable containing a single bit of information, success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p).
In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0,1], we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:
P(X = k) = C(n, k) p^k (1 − p)^(n − k), for k = 0, 1, ..., n, where C(n, k) = n! / (k! (n − k)!)
The cumulative distribution function can be expressed as:
F(k; n, p) = P(X ≤ k) = Σ_{i=0}^{⌊k⌋} C(n, i) p^i (1 − p)^(n − i)
Mean: E(X) = np
Variance: Var(X) = npq = np(1 − p)
Mode: ⌊(n + 1)p⌋ when (n + 1)p is not an integer; when (n + 1)p is an integer between 1 and n, both (n + 1)p and (n + 1)p − 1 are modes.
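The sketch below checks the binomial formulas numerically. It computes the pmf P(X = k) = C(n, k) p^k (1 − p)^(n − k) with `math.comb`, sums it to get the CDF, and then recovers the mean np, the variance np(1 − p) and the mode directly from the pmf. The parameters n = 10 and p = 0.3 are arbitrary illustrative choices, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    # P(X <= k) = sum of the pmf over i = 0..k
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
mode = max(range(n + 1), key=lambda k: pmf[k])
# Compare with the closed forms: np = 3.0, np(1 - p) = 2.1, floor((n + 1)p) = 3
print(round(mean, 6), round(var, 6), mode)
```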
If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. The covariance is Cov(X,Y) = E(XY) − μX μY.
In the case n = 1 (the case of Bernoulli trials), XY is non-zero only when both X and Y are one, and μX and μY are equal to the two success probabilities. Defining pB as the probability of both happening at the same time, E(XY) = pB, which gives
Cov(X,Y) = pB − μX μY
In a bivariate setting involving random variables X and Y, there is a particular expectation that is often of interest. It is called the covariance and is given by Cov(X,Y) = E[(X − E(X))(Y − E(Y))], where the expectation is taken over the bivariate distribution of X and Y. Alternatively, Cov(X,Y) = E(XY) − E(X)E(Y).
Moreover, a scaled version of the covariance is the correlation ρ, which is given by
ρ = Corr(X,Y) = Cov(X,Y) / [sqrt(Var(X)) · sqrt(Var(Y))], where Var(X) = σX^2 and Var(Y) = σY^2.
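Here is a small Python sketch of Cov(X,Y) = E(XY) − E(X)E(Y) and ρ = Cov(X,Y) / [sqrt(Var(X)) · sqrt(Var(Y))], with expectations replaced by averages over the observed pairs (that is, dividing by n). It also checks the Bernoulli case, where Cov(X,Y) = pB − μX μY. The 0/1 data are hypothetical and were chosen so that μX = μY = 0.5 and pB = 3/8.

```python
from math import sqrt


def cov(xs, ys):
    # Cov(X, Y) = E(XY) - E(X)E(Y), with expectations replaced by averages (divide by n)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y


def corr(xs, ys):
    # rho = Cov(X, Y) / (sqrt(Var(X)) * sqrt(Var(Y))), using Var(X) = Cov(X, X)
    return cov(xs, ys) / (sqrt(cov(xs, xs)) * sqrt(cov(ys, ys)))


# Bernoulli (n = 1) case: Cov = pB - muX * muY
x = [1, 1, 0, 0, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0, 1, 0]
p_both = sum(a * b for a, b in zip(x, y)) / len(x)  # 3/8
print(cov(x, y), p_both - 0.5 * 0.5)  # both 0.125
print(corr([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0 up to floating-point rounding
```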
Assume that the total number of successes X ~ B(n, p) with np ≥ 5 and n(1 − p) ≥ 5, so that the normal approximation to the binomial is reasonable.
In practice, p is unknown. Under the normal approximation, X ~ N(np, np(1 − p)), and we define p^ = X/n as the proportion of successes. Since p^ is a linear function of a normal random variable, it follows that p^ ~ N(p, p(1 − p)/n). With Zα/2 denoting the (1 − α/2)·100th percentile of the standard normal distribution, the probability statement is
P(−Zα/2 ≤ (p^ − p) / sqrt(p(1 − p)/n) ≤ Zα/2) ≈ 1 − α
An approximate (1 − α)·100% confidence interval for p is therefore given by the following. It is approximate because we use the normal approximation to the binomial and substitute p^ for p in the standard error.
p^ ± Zα/2 · sqrt(p^(1 − p^)/n)
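The following is a minimal sketch of the approximate confidence interval p^ ± Zα/2 · sqrt(p^(1 − p^)/n), using `statistics.NormalDist().inv_cdf` from the standard library to get the percentile. The function name `proportion_ci` and the example data (40 successes in 100 trials) are hypothetical. The rule-of-thumb check uses p^ in place of the unknown p.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, alpha: float = 0.05):
    """Approximate (1 - alpha) CI for p: p_hat +/- z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    # Rule of thumb from the notes (np >= 5, n(1 - p) >= 5), checked with p_hat
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation may be poor")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lo, hi = proportion_ci(40, 100)  # hypothetical data: 40 successes in 100 trials
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")  # prints "95% CI for p: (0.304, 0.496)"
```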
The normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables whose distributions are not known.
If X ~ N(μ, σ^2), then E(X) = μ and Var(X) = σ^2.
Note that σ^2 is the variance, not the standard deviation.
A random variable Z ~ N(0,1) is referred to as standard normal, and it has the simplified pdf:
φ(z) = (1 / sqrt(2π)) · e^(−z^2/2), −∞ < z < ∞
The relationship between an arbitrary normal random variable X ~ N(μ, σ^2) and the standard normal distribution is expressed via (X-μ) / σ ~ N(0,1)
In this case, we assume X1, X2, ..., Xn are iid N(μ, σ^2), where μ is unknown and, for ease of development, σ is known. (In the real world, it is rare for σ to be known while μ is unknown.)
X-bar ~ N(μ, σ^2/n), so (X-bar − μ) / (σ/√n) ~ N(0,1) and
P(−1.96 ≤ (X-bar − μ) / (σ/√n) ≤ 1.96) = 0.95
Rearranging terms:
P(X-bar − 1.96 σ/√n ≤ μ ≤ X-bar + 1.96 σ/√n) = 0.95
Finally, we obtain a 95% confidence interval for μ:
x-bar ± 1.96 σ/√n
More generally, let Zα/2 denote the (1 − α/2)·100th percentile of the standard normal distribution. A (1 − α)·100% confidence interval for μ is given by
x-bar ± Zα/2 · σ/√n
We use the observed value x-bar: it is understood that confidence intervals are functions of observed statistics.
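As a sketch of the known-σ interval x-bar ± Zα/2 · σ/√n, the function below plugs the observed x-bar into the formula. The name `z_interval`, the sample values and σ = 2 are all hypothetical. As noted above, treating σ as known is a simplifying assumption.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(xs, sigma: float, alpha: float = 0.05):
    """(1 - alpha) CI for mu with sigma known: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    n = len(xs)
    x_bar = fmean(xs)  # observed value of X-bar
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample, with sigma = 2 assumed known
sample = [9.8, 10.4, 10.1, 9.6, 10.1]
print(z_interval(sample, sigma=2.0))  # roughly (8.247, 11.753)
```

A smaller α gives a larger Zα/2 and therefore a wider interval. For example, the 99% interval is wider than the 95% one.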
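Descriptive statistics concerns the presentation of data (either numerical or graphical) in a way that makes the data easier to digest.
outliers: values that are unusually large or small
centrality: values in the middle portion of the dotplot
dispersion: spread or variation in the data
modality: the number of humps (peaks); a histogram with two distinct humps is referred to as bimodal
skewness: asymmetry in which one tail of the distribution is longer than the other
symmetry: the distribution looks roughly the same on both sides of its center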
How to choose the intervals on the x-axis of a histogram: use a number of intervals roughly equal to sqrt(n), where n is the number of observations.
If the intervals are not of equal length, plot relative frequency divided by interval length on the vertical axis instead of frequency.
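The next sketch builds a histogram using both rules above: about sqrt(n) equal-width intervals by default, and bar heights equal to relative frequency divided by interval length, so that bars of unequal width stay comparable. The function name `histogram` is hypothetical. It assumes the data are not all identical and uses half-open bins [a, b), except for the last bin, which also includes its right edge. With custom edges, points outside the edges are ignored.

```python
from math import sqrt


def histogram(xs, edges=None):
    """Return (edges, heights) where height = relative frequency / interval length.

    If edges is None, use about sqrt(n) equal-width intervals (assumes xs is not constant).
    Bins are [a, b) except the last, which also includes its right edge.
    """
    n = len(xs)
    if edges is None:
        k = max(1, round(sqrt(n)))
        lo, hi = min(xs), max(xs)
        width = (hi - lo) / k
        edges = [lo + i * width for i in range(k)] + [hi]
    counts = [0] * (len(edges) - 1)
    for x in xs:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    # Density heights: relative frequency / width, so the total bar area is 1
    heights = [c / n / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]
    return edges, heights


data = list(range(16))                    # n = 16, so about 4 intervals
print(histogram(data))
print(histogram(data, edges=[0, 2, 15]))  # unequal intervals
```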
middle line: sample median (Q2); top edge: 3/4 quantile (Q3); bottom edge: 1/4 quantile (Q1)
interquartile range (IQR): Q3 − Q1, also written ΔQ
upper limit: Q3 + 1.5ΔQ, or the 90th percentile
lower limit: Q1 − 1.5ΔQ, or the 10th percentile
values outside the upper and lower limits are outliers.
whiskers (vertical dashed lines) extend to the outermost data points within these limits, and circles correspond to outliers.
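Here is a minimal sketch of the 1.5·ΔQ boxplot rule, using `statistics.quantiles` from the standard library. Its default quartile method ("exclusive") is an assumption here: other software may compute Q1 and Q3 slightly differently. The function name and the data are hypothetical.

```python
from statistics import quantiles


def boxplot_summary(xs):
    """Quartiles, 1.5 * IQR limits, whisker ends and outliers (hypothetical helper)."""
    q1, q2, q3 = quantiles(xs, n=4)  # default method='exclusive'
    iqr = q3 - q1                    # Delta Q
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lower <= x <= upper]
    return {
        "Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
        "limits": (lower, upper),
        "whiskers": (min(inside), max(inside)),  # outermost points within the limits
        "outliers": [x for x in xs if x < lower or x > upper],
    }


print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For this data, Q1 = 2.75 and Q3 = 8.25, so the limits are (−5.5, 16.5) and only 100 is flagged.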
extrapolated data: be cautious about predictions based on extrapolated data. A pairplot of two variables X and Y may show a positive, increasing trend within the observed range, but that does not mean X and Y keep the same relationship outside that range. (Data should be combined with real-world knowledge.)
A numerical descriptive statistic for investigating paired data is the sample correlation (also called the correlation or correlation coefficient) r, defined by
r = Σ(xi − x-bar)(yi − y-bar) / sqrt[Σ(xi − x-bar)^2 · Σ(yi − y-bar)^2]
-1 <= r <= 1
when r is close to 1, the points are clustered about a line with positive slope
when r is close to -1, the points are clustered about a line with negative slope
when r is close to 0, the points lack a linear relationship. However, there may be a nonlinear (for example, quadratic) relationship
when x and y are correlated (r not close to 0), this merely denotes the presence of a linear association. For example, weight and height are positively correlated, and it is obviously wrong to state that one causes the other.
To establish a cause-and-effect relationship, we should do a controlled study.
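The sketch below computes r = Sxy / sqrt(Sxx · Syy) for three hypothetical datasets. It illustrates the cases listed above: an exact line with positive slope gives 1, an exact line with negative slope gives −1, and a perfect quadratic y = x^2 on x values symmetric about 0 gives exactly 0 despite a perfect (nonlinear) relationship. The function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # 1.0: exact line with positive slope
print(pearson_r(xs, [-x for x in xs]))         # -1.0: exact line with negative slope
print(pearson_r(xs, [x * x for x in xs]))      # 0.0: perfect quadratic, no linear association
```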
Law of large numbers: the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
For the CLT, we assume that the random variables X1, X2, ..., Xn are iid from a population with mean μ and variance σ^2. The CLT states that as n → ∞, the distribution of (X-bar − μ)/(σ/sqrt(n)) converges to the distribution of a standard normal random variable.
Draw a random sample of n observations from a population with mean μ and standard deviation σ. When n is large enough, the sampling distribution of x-bar is approximately normal with mean μx-bar = μ and standard deviation σx-bar = σ/√n. The larger the sample size, the better the normal approximation to the sampling distribution of x-bar.
In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed.
1. The population distribution itself does not need to be normal.
2. Each sample must be large enough, but does not need to be very large: n ≥ 30.
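The CLT statements above can be checked by simulation. The sketch below draws many samples of size n = 30 from an exponential population, which is strongly skewed and therefore clearly not normal, and looks at the distribution of the sample means. The choices of population, n, number of repetitions and seed are illustrative assumptions. The average of the x-bar values should land near μ, and their spread should land near σ/√n.

```python
import random
from statistics import fmean, pstdev

random.seed(0)        # fixed seed so the run is reproducible
mu = sigma = 1.0      # Exponential(rate=1): mean 1, sd 1, right-skewed (not normal)
n, reps = 30, 5000    # sample size n >= 30, number of simulated samples

# Sampling distribution of x-bar: one mean per simulated sample
x_bars = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(fmean(x_bars), 3))     # close to mu = 1.0
print(round(pstdev(x_bars), 3))    # close to sigma / sqrt(n)
print(round(sigma / n ** 0.5, 3))  # 0.183
```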
Linear regression predicts the value of a variable Y (the dependent variable) from a variable X (the independent variable), provided there is a linear relationship between X and Y.
Y=b0 + b1X+e
(Recall that the regression equation without the error term, Y=b0 + b1X , is called the least squares line .)
SSTO, a.k.a. SST, total sum of squares: the sum of squared differences between each data point yi and the mean y-bar, SSTO = Σ(yi − y-bar)^2.
SSE, error sum of squares: the sum of squared differences between each data point yi and the estimated regression line ŷi, SSE = Σ(yi − ŷi)^2.
SSR, regression sum of squares: quantifies how far the estimated sloped regression line, ŷi, is from the horizontal "no relationship line", the sample mean y-bar, SSR = Σ(ŷi − y-bar)^2. For a least squares line, SSTO = SSR + SSE.
The example above shows that most of the variation in the response y (SSTO = 1827.6) is just due to random variation (SSE = 1708.5), not due to the regression of y on x (SSR = 119.1).
If r^2 = 1, all data points fall perfectly on the regression line. The predictor x accounts for all of the variation in y!
If r^2 = 0, the estimated regression line is perfectly horizontal. The predictor x accounts for none of the variation in y!
r^2 × 100 percent of the variation in y is 'explained by' the variation in predictor x, where r^2 = SSR/SSTO = 1 − SSE/SSTO.
SSE is the amount of variation that is left unexplained by the model.
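Here is a minimal sketch that fits the least squares line y = b0 + b1x and computes SSTO, SSE, SSR and r^2. It confirms that SSTO = SSR + SSE for a least squares fit. The five data points and the function name are hypothetical.

```python
def least_squares_ss(xs, ys):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, SSTO, SSE, SSR, r^2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]
    ssto = sum((y - y_bar) ** 2 for y in ys)             # total variation in y
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # left unexplained by the line
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the line
    return b0, b1, ssto, sse, ssr, ssr / ssto


b0, b1, ssto, sse, ssr, r2 = least_squares_ss([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b0={b0:.2f} b1={b1:.2f} SSTO={ssto:.2f} SSE={sse:.2f} SSR={ssr:.2f} r^2={r2:.2f}")
# prints "b0=2.20 b1=0.60 SSTO=6.00 SSE=2.40 SSR=3.60 r^2=0.60"
```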
1. The coefficient of determination r^2 and the correlation coefficient r quantify the strength of a linear relationship. It is possible that r^2 = 0% and r = 0, suggesting there is no linear relation between x and y, and yet a perfect curved (or "curvilinear") relationship exists.
[Most misinterpreted concept] 2. A large r^2 value should not be interpreted as meaning that the estimated regression line fits the data well.
For example, when US population is regressed on year, the r^2 value is 92%, so only 8% of the variation in US population is left to explain after taking year into account in a linear way. Yet the plot suggests that a curve would describe the relationship even better. (The large r^2 does suggest that taking year into account is better than not doing so. It just doesn't tell us that we could still do better.)
3. The coefficient of determination r^2 and the correlation coefficient r can both be greatly affected by just one data point (or a few data points).
4. Correlation (or association) does not imply causation.
VIF (variance inflation factor) checks for collinearity between explanatory variables; a VIF over 5 indicates problematic collinearity.
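In general, VIF_j = 1 / (1 − R_j^2), where R_j^2 comes from regressing predictor x_j on all the other predictors. The sketch below assumes the special case of exactly two explanatory variables, where R_j^2 reduces to r^2, the squared correlation between the two predictors. The function name and data are hypothetical.

```python
def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2) for a model with exactly two predictors x1 and x2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 ** 2 / (s11 * s22)  # squared correlation between the two predictors
    return 1.0 / (1.0 - r2)


print(vif_two_predictors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))            # about 2.5: acceptable
print(vif_two_predictors([1, 2, 3, 4, 5], [1.1, 2.0, 2.9, 4.2, 5.0]))  # far above 5: collinear
```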
H0: null hypothesis
H1: alternative hypothesis
Testing begins by assuming that H0 is true, and data are collected in an attempt to establish the truth of H1.
H0 is usually what you would typically expect (i.e., H0 represents the status quo).
In the inference step, we calculate a p-value, defined as the probability of observing data as extreme as or more extreme (in the direction of H1) than what we observed, given that H0 is true.
Significance level: α, usually 0.01 or 0.05
If the p-value is less than α, reject H0.
If the p-value is larger than α, fail to reject H0.
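The decision rule above can be illustrated with one concrete test. The notes do not name a specific test here, so the choice of a one-sample z-test (H0: μ = μ0, σ known) is an assumption, as are the function name `z_test` and all of the numbers. The p-value is computed in the direction of H1 and then compared with α.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alternative="two-sided"):
    """One-sample z-test of H0: mu = mu0 with sigma known. Returns (z, p-value)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "greater":  # H1: mu > mu0
        p = 1 - cdf(z)
    elif alternative == "less":   # H1: mu < mu0
        p = cdf(z)
    else:                         # H1: mu != mu0
        p = 2 * (1 - cdf(abs(z)))
    return z, p


z, p = z_test(x_bar=10.5, mu0=10, sigma=2, n=64)  # hypothetical numbers: z = 2.0, p ~ 0.0455
for alpha in (0.05, 0.01):
    print(alpha, "reject H0" if p < alpha else "fail to reject H0")
```

The same data lead to rejecting H0 at α = 0.05 but not at α = 0.01.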
When fitting models, it is possible to increase the likelihood by adding parameters, but doing so may result in overfitting . Both BIC and AIC attempt to resolve this problem by introducing a penalty term for the number of parameters in the model.
AIC (Akaike information criterion) = 2k − 2 ln(L), where k is the number of parameters in the model (or the number of degrees of freedom used up) and ln(L) is the log-likelihood, a measure of how well the model fits the data. Lower AIC is better; 2k is the penalty term.
AIC measures both goodness of fit and complexity (number of terms).
By comparison, the proportion of variance explained, R^2, only measures goodness of fit.
However, because of collinearity, one variable is sometimes 'stealing' the significance from another term. AIC doesn't care which terms are significant; it just looks at how well the model fits as a whole.
BIC (Bayesian information criterion) = k ln(n) − 2 ln(L), where n is the number of observations (the sample size) and k is the number of parameters (df).
BIC is similar to AIC, but imposes a larger penalty for complexity whenever ln(n) > 2, i.e., n ≥ 8. Lower BIC is better, and given a set of candidate models, BIC favors simpler ones. Because its penalty grows with n, BIC is less likely than AIC to keep unimportant variables in the model when n is large.
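Here is a small sketch of the two criteria, AIC = 2k − 2 ln(L) and BIC = k ln(n) − 2 ln(L). The two candidate models, their log-likelihoods and the sample size n = 100 are hypothetical numbers. They were chosen to show that the criteria can disagree: the extra fit of the larger model outweighs the AIC penalty but not the heavier BIC penalty.

```python
from math import log


def aic(k: int, log_lik: float) -> float:
    # AIC = 2k - 2 ln(L)
    return 2 * k - 2 * log_lik


def bic(k: int, n: int, log_lik: float) -> float:
    # BIC = k ln(n) - 2 ln(L)
    return k * log(n) - 2 * log_lik


n = 100  # hypothetical sample size
models = {"small (k=3)": (3, -120.0), "large (k=6)": (6, -116.0)}  # hypothetical ln(L) values
for name, (k, ll) in models.items():
    print(f"{name}: AIC={aic(k, ll):.1f}  BIC={bic(k, n, ll):.1f}")
# AIC prefers the large model (244.0 < 246.0); BIC prefers the small one (253.8 < 259.6)
```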
We also need to check for influential outliers, homoscedasticity (equal variance), and normality.
Residuals are used to check the properties mentioned above.
To check normality, use the Shapiro-Wilk test.
It is a hypothesis test whose null hypothesis is 'the data are normally distributed'.
Large p-value: fail to reject H0, so you have no evidence against normality.
Small p-value: reject H0, so you have evidence of non-normality.
To check homoscedasticity, use Levene's test.
It is also a hypothesis test, with null hypothesis: all input samples are from populations with equal variances.
Outlier detection: statistical methods only; approaches from data mining are not covered here.
noise: random error or variance in a measured variable
noise should be removed before outlier detection.
outlier: a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism. It violates the mechanism that generates the normal data.
Parametric Methods I: detection of univariate outliers based on the normal distribution
For normally distributed data, the region μ ± 3σ contains 99.7% of the data; points outside this region are treated as outliers.
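Here is a minimal sketch of the μ ± 3σ rule, with μ and σ estimated from the data themselves. The function name and the readings are hypothetical. One caveat: an extreme point inflates the estimated σ, and with very few observations this can hide the outlier. With 10 points or fewer, no point can lie more than 3 population standard deviations from the mean, which is why the example uses 21 readings.

```python
from statistics import fmean, pstdev


def three_sigma_outliers(xs, k=3.0):
    """Flag points outside mean +/- k * sd (assumes roughly normal data)."""
    mu, sd = fmean(xs), pstdev(xs)
    return [x for x in xs if abs(x - mu) > k * sd]


readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
            10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.1, 25.0]  # hypothetical
print(three_sigma_outliers(readings))  # [25.0]
```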
Parametric Methods II: detection of multivariate outliers
Bottom line: transform the multivariate outlier detection task into a univariate outlier detection problem.
Use the χ^2 (chi-square) statistic:
χ^2 = Σ_{i=1}^{n} (oi − Ei)^2 / Ei
where oi is the value of object O on the i-th dimension, Ei is the mean of the i-th dimension over all objects, and n is the number of dimensions.
If the χ^2 statistic is large, the object O is an outlier.
A low chi-square value means the observed values are close to the expected values. In theory, if your observed and expected values were equal ("no difference"), chi-square would be zero, an event that is unlikely to happen in real life. You could compare your calculated chi-square value to a critical value from a chi-square table: if the chi-square value is more than the critical value, there is a significant difference.
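The sketch below reduces each multivariate object to a single score, χ^2 = Σ (oi − Ei)^2 / Ei, where Ei is the mean of dimension i over all objects. It then ranks the objects by that score. It assumes every dimension has a positive mean (Ei > 0). The function name and the 2-D points are hypothetical.

```python
def chi_square_scores(objects):
    """chi^2(o) = sum_i (o_i - E_i)^2 / E_i, with E_i the mean of dimension i (assumes E_i > 0)."""
    dims = len(objects[0])
    means = [sum(o[i] for o in objects) / len(objects) for i in range(dims)]
    return [sum((o[i] - means[i]) ** 2 / means[i] for i in range(dims)) for o in objects]


points = [(10, 20), (11, 19), (9, 21), (10, 20), (30, 5)]  # hypothetical 2-D objects
scores = chi_square_scores(points)
for p, s in sorted(zip(points, scores), key=lambda t: t[1], reverse=True):
    print(p, round(s, 2))  # (30, 5) has by far the largest score (about 26.76)
```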
[ Omit ] Parametric Methods III: Using mixture of parametric distributions
Outlier detection is a big topic that could be expanded into another article; I will stop here for the statistics topic.
Note that statistics are quantities that we can calculate, and therefore do not depend on unknown parameters. Moreover, statistics have associated probability distributions, and we are sometimes interested in the distributions of statistics.
MLE: maximum likelihood estimate 最大似然估计
MSE: mean squared error 误差均方
RMSE: root mean squared error 误差均方根
r^2: coefficient of determination 确定系数
SE: standard error 标准误
SEM: standard error of the mean 均数的标准误
SS: sum of squares 平方和
SSE: sum of squared error of the prediction function
SSR: sum of squared residuals (elsewhere in these notes, SSR denotes the regression sum of squares)
SST: total sum of squares
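1 Southeast Asian timbers (12 species)
1.1 Scientific name: Intsia bijuga O. Ktze. (印茄)
Trade names: Merbau (Malaysia, Indonesia); Ipil (Philippines); Kwila (Papua New Guinea); Hintzy (Madagascar).
Uses: because the wood is heavy, hard, and strong, and has an attractive figure, it is mostly used where durability, high strength, and decorative value are required, such as structural building members, high-grade furniture, fine joinery, and flooring.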
1.2 Scientific name: Shorea laevis Ridl. (平滑(重黄)娑罗双)
Trade names: Balau (Malaysia); Selangan batu kumus (Sabah, Sarawak); AK, Teng, Ack (Thailand); Bangkirai (Indonesia); Yakal, Malagkal, Guiuo (Philippines).
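1.3 Scientific name: Shorea pauciflora King (疏花(深红)娑罗双)
Trade names: Red lauan, Tangile, Tiaong (Philippines); Dark red meranti, Nemusu (Malaysia); Obar Suluk (Sabah); Meranti merah, Meranti Ketuko (Indonesia).
Wood properties: heartwood red to dark reddish brown, sapwood pink, heartwood and sapwood fairly distinct; growth rings indistinct; luster weak; no distinctive odor; grain interlocked; texture somewhat coarse but even; shrinkage from green to oven-dry 2.2% radial and 0.7 tangential; moderately durable, sapwood susceptible to powder-post beetles and termites, not resistant to marine borers; medium weight, with an air-dry density of about 0.68 g/cm3 for Malaysian material; strength low to medium.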
1.4 Scientific name: Cantleya corniculata (Becc.) Howard (角香茶茱萸)
Trade names: Daru-Daru, Dedaru (Malaysia); Seranai (Indonesia); Bedaru (Sarawak, Indonesia); Samala (Sabah).
1.5 Scientific name: Xylia xylocarpa (Roxb.) Taub. (木荚豆)
Trade names: Cam-xe (Cambodia, Thailand, Vietnam); Pyinkado (Myanmar); Irul (Indonesia); Sokram (Cambodia); Deng (Thailand).
1.6 Scientific name: Koompassia malaccensis Maing. (马来甘巴豆)
Trade names: Kempas; Mengeris (Kalimantan); Empas (Sabah); Impas (Borneo, Indonesia, Sabah); Taulong (Malaysia); Upil (Indonesia); Bueng (Thailand).
1.7 Scientific name: Tectona grandis Linn. (teak, 柚木)
Trade names: Teak (Myanmar, Indonesia); Jati (Indonesia); Kyun (Myanmar); Maisak (Thailand).
1.8 Scientific name: Dipterocarpus grandiflorus Blanco (大花龙脑香)
Trade names: Keruing; Apitong, Hagakhak (Philippines); Gurjun (India); Keruing belimbing (Malaysia, North Borneo); Kanyinbyan (Myanmar).
1.9 Scientific name: Eusideroxylon zwageri Teijsm. & Binnend. (坤甸铁木)
Trade names: Belian (Sabah, Sarawak, Indonesia); Tambulian (Sabah, Philippines); Borneo ironwood (Europe); Onglen; Ulin; and others.
1.10 Scientific name: Dalbergia latifolia Roxb. (阔叶黄檀, Indian rosewood)
Trade names: Rosewood (India, Singapore, Myanmar); Indian rosewood, Bombay blackwood (India); Sonokeling, Angsana Keling, Sonobrits, Java palisander (Indonesia).
1.11 Scientific name: Dialium platysepalum Baker (阔萼摘亚木)
Trade names: Keranji (Sabah, Indonesia); Kerandji asap (Indonesia); Keranji Kuning besar (Malaysia); Yi thong bueng (Thailand).
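1.12 Scientific name: Pometia pinnata Forst. (番龙眼, 水黄皮)
Trade names: Malugay, Agupanga (Philippines); Kasai (Southeast Asia, Solomon Islands); Taun (Papua New Guinea); Truong (Vietnam); Lan doeng, Kasi besar daun, Matoa (Indonesia); Sibu (Sabah).
Uses: structural building members, flooring, interior decoration, etc.
2 African timbers (12 species)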
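2.1 Scientific name: Pterocarpus angolensis DC. (安哥拉紫檀)
Trade names: Muninga, Girassonde (Angola); Ambila (Mozambique); Mukwa, Muninga (Zambia, Zimbabwe); Kiaat, Kajat, Kajaatenhout (South Africa); Mninga (Tanzania).
Wood properties: semi-ring-porous to diffuse-porous; sapwood pale grey or yellow, 3-5 cm wide; heartwood color highly variable, from brown to purplish brown, sometimes with dark streaks, clearly distinct from the sapwood; growth rings fairly distinct; lustrous, with a faint fragrance; grain straight to slightly interlocked; texture fine and fairly even; moderately durable, with fairly good resistance to termites and marine borers; low shrinkage; medium weight, air-dry density about 0.64 g/cm3; strength and mechanical properties average.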
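2.2 Scientific name: Afzelia africana Smith (缅茄)
Trade names: … (…ia); Azodau, Lingue (Côte d'Ivoire); Chamfuta, Mussacossa (Mozambique); Mbembakofi, Mkora (Tanzania); M'banga, Lingue (Cameroon); Afzelia (Liberia); Bolengu (Zaire); Nkokongo (Congo).
Wood properties: diffuse-porous; sapwood pale yellowish white, 5 cm wide; heartwood reddish brown, often with spots, clearly distinct from the sapwood; growth rings fairly distinct; lustrous, no distinctive odor; grain interlocked; texture fine and even; highly wear-resistant and very resistant to insect attack; low shrinkage, 4.4% tangential and 3.0% radial from green to oven-dry; hard and heavy, air-dry density about 0.83 g/cm3; dimensionally fairly stable, with high strength.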
2.3 Scientific name: Autranella congolensis A. Chev. (刚果铁木, 奥特山榄)
Trade names: Mukulungu; Moabi; Djave (Nigeria); Elang, Elanzok (Cameroon); Mfua (Congo); Kungulu (Angola); Kabulungu, Kondo-fino (Zaire).
2.4 Scientific name: Apodytes dimidiata E. Mey. (白梨柴龙树)
Trade names: Mugonyone; White pear; Pearwood.
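2.5 Scientific name: Guibourtia tessmannii J. Leonard (特氏古夷黄木)
2.6 Scientific name: Entandrophragma cylindricum Sprague (筒状非洲楝)
Wood properties: diffuse-porous; sapwood pale yellow, 7-10 cm wide; heartwood pink on freshly cut surfaces, turning reddish brown over time, clearly distinct from the sapwood; growth rings indistinct; lustrous, with a cedar-like scent on fresh surfaces; grain interlocked, with dark ribbon stripes or plum-blossom figure on quartersawn surfaces; texture fine and even; high shrinkage, 4.6% radial and 7.4% tangential; fairly durable, but the sapwood is susceptible to powder-post beetles; fairly hard, medium weight, air-dry density about 0.67 g/cm3; strength and mechanical properties fairly high.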
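2.7 Scientific name: Tieghemella heckelii Pierre (猴子果)
Trade names: Makore; Aganokwa (Nigeria); Baku, Abako, Edumo (Ghana); Makorou, Dumori (Côte d'Ivoire); Douka, Okola (Gabon).
Wood properties: diffuse-porous; sapwood light colored, 5-6 cm wide; heartwood reddish brown, not clearly distinct from the sapwood; growth rings indistinct; strong luster, no distinctive odor; grain straight, partly interlocked; texture fine and even; shrinkage very high, yet dimensional stability is good; hard and heavy, air-dry density 0.62-0.72 g/cm3; extremely durable and termite-resistant, occasionally with blue stain; toughness, strength, and mechanical properties high.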
2.8 Scientific name: Pericopsis elata Van Meeuwen (大美木豆)
Trade names: Afrormosia; Assamela (France, Côte d'Ivoire); Ejen (Cameroon); Kokrodua, Awawai (Ghana); Obang (Gabon); Ole, Bahala, Mohole (Zaire, Netherlands); Ayin (Nigeria).
2.9 Scientific name: Nesogordonia papaverifera R. (罂粟尼索桐)
2.10 Scientific name: Copaifera salikounda Heck. (西非香脂树)
2.11 Scientific name: Chlorophora regia A. Chev. (高贵绿柄桑)
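Wood properties: diffuse-porous; sapwood yellowish white, 5 cm wide; heartwood yellow or light brown when freshly cut, turning golden brown after long exposure to air, somewhat distinct from the sapwood; growth rings indistinct; lustrous, no distinctive odor; good decay resistance, not readily attacked by bark beetles; shrinkage low to medium, green to oven-dry 2.1-4.0% radial and 3.6-6.5% tangential; grain inclined or interlocked, texture fairly fine and fairly even; cracks and grooves caused by mechanical damage may contain calcium carbonate deposits (called "stones"); medium weight, average air-dry density about 0.66 g/cm3; strength and mechanical properties fairly good.
2.12 Scientific name: Millettia stuhlmannii Taub. (斯图崖豆木)
3 American timbers (12 species)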