Correlation analysis: Difference between revisions

From Cybis Wiki
Revision as of 11:03, 4 July 2009

Correlation analysis is a type of scoring method in which you calculate a value for the covariance between two curves when they lie over each other at a certain position. That is, the calculated value, the correlation coefficient, is a measure of how well the two curves match each other at that position. When Curve sliding is used for the analysis during crossdating, the correlation coefficient is calculated at every possible overlapping position of the curves. Ideally, the highest value found then corresponds to the correct crossdating position.

The correlation coefficient

The correlation coefficient is cumbersome to calculate by hand, so in practice you need a computer for this type of scoring.

Within CDendro the correlation coefficient used is the Pearson product-moment correlation coefficient.[1]

A coefficient value of 1 means that both curves follow each other exactly. A value of -1 means that the curves behave exactly opposite to each other, i.e. when one curve goes up, the other goes down. Correlation coefficient values always lie within the limits -1 to +1.

Note that the statistical theory of the correlation coefficient is defined for relations between random variables, whereas ring width values are not independent random values: when a ring is thick, there is a high probability that the next year's ring will also be thick. The use of any correlation coefficient within dendrochronology is therefore best justified by practical observations of how efficiently it finds correct crossdatings and sorts out incorrect matches; see [2].

Note: When comparing ring width curves, we do the correlation coefficient mathematics on the normalized curves! When you document a best value from such a correlation calculation, you should also document the normalization method used, because the level of the coefficient required to ascertain a dating differs somewhat with the normalization method.[2]

Definition of the correlation coefficient

Define X and Y as paired curve values. There is one X and one Y for each year when the curves lie at a certain position. Define Mx and My as the mean values (or expected values) of each curve, i.e.:

:<math>Mx = E(X)</math> and <math>My = E(Y)</math>

Calculate the standard deviations as:

:<math>\sigma_x = \sqrt{E\left((X-Mx)^2\right)}</math> and <math>\sigma_y = \sqrt{E\left((Y-My)^2\right)}</math>

(The standard deviation is a measure of a "normal" (typical) distance from a point on a curve to the mean value of that curve.)

Calculate the correlation coefficient as:

:<math>r = \frac{E\left((X-Mx)(Y-My)\right)}{\sigma_x \, \sigma_y}</math>
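
As a minimal sketch of the definition in this section, the calculation could be written in Python roughly as follows. The function name `pearson_r` is only illustrative and is not taken from CDendro. The sketch assumes the "population" form of the standard deviation (division by the number of pairs), which matches the E(...) notation used here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two sequences of paired values.

    Illustrative helper only; the name and error handling are assumptions.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired values")

    # Mean values: Mx = E(X), My = E(Y)
    mx = sum(xs) / n
    my = sum(ys) / n

    # Standard deviations: sqrt(E((X - Mx)^2)) and sqrt(E((Y - My)^2))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant curve")

    # Covariance E((X - Mx)(Y - My)) divided by the product of the deviations
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / (sx * sy)
```

Two curves that rise and fall together, such as `[1, 2, 3, 4]` and `[2, 4, 6, 8]`, give a value very close to 1, while reversing one of them gives a value very close to -1.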

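See also the Wikipedia (English) articles about Standard deviation and Expected value, and the Wikipedia (Swedish) articles Standardavvikelse and Väntevärde.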

Overlapping

If we slide the curve of one sample so that it hangs out a bit on either side of the other curve, only a part of the first curve overlaps the other curve. It is usually not meaningful to test the curve fitting when the overlap is shorter than 30 years. For proper crossdating, overlaps shorter than 50-70 years should not be considered.
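
The sliding idea can be sketched in Python as follows. This is an illustration only, not CDendro's implementation: the function name `slide_correlations`, the `min_overlap` parameter, and the use of NumPy's `np.corrcoef` are assumptions made for the example, and the input curves are assumed to be already normalized.

```python
import numpy as np


def slide_correlations(sample, reference, min_overlap=30):
    """Correlate `sample` against `reference` at every overlapping position.

    Returns a list of (offset, overlap, r) tuples, where `offset` is the index
    in `reference` at which sample[0] is placed (negative when the sample
    starts before the reference). Positions with fewer than `min_overlap`
    overlapping values are skipped.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    results = []
    for offset in range(-(len(sample) - 1), len(reference)):
        start = max(offset, 0)                           # first shared index in reference
        end = min(offset + len(sample), len(reference))  # one past the last shared index
        overlap = end - start
        if overlap < min_overlap:
            continue
        ref_part = reference[start:end]
        smp_part = sample[start - offset:end - offset]
        if ref_part.std() == 0 or smp_part.std() == 0:
            continue  # correlation is undefined for a flat segment
        r = np.corrcoef(smp_part, ref_part)[0, 1]
        results.append((offset, overlap, float(r)))
    return results
```

Taking `max(results, key=lambda t: t[2])` then gives the position with the highest correlation coefficient. Raising `min_overlap` to 50 or more restricts the search to the overlaps recommended for proper crossdating.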

TTest value

The TTest value, also called T-score or T-value, is based on the correlation value, but it also takes into account that, for equal correlation values, a match with a short overlap is worth less than a match with a longer overlap.

TTest values are calculated according to the formula below, where n is the number of overlapping years and r is the correlation coefficient value.

:<math>TTest = r \sqrt{ \frac{n-2}{1 - r^2} }</math>
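
A minimal Python sketch of this formula, with the hypothetical function name `t_value`, shows how the same correlation coefficient yields a higher TTest value for a longer overlap. For example, r = 0.5 with n = 50 gives 0.5 * sqrt(48 / 0.75) = 4.0, and the same r with a longer overlap gives a larger value.

```python
import math


def t_value(r, n):
    """TTest value for correlation coefficient r over n overlapping years.

    Illustrative helper; the name and the error handling are assumptions.
    """
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if abs(r) >= 1:
        raise ValueError("the formula is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Same correlation value, different overlap lengths
    for n in (30, 50, 100):
        print(n, round(t_value(0.5, n), 2))
```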

See also the Wikipedia (English) article about the t-test.

Notes