Correlation analysis

From Cybis Wiki
Revision as of 17:10, 1 July 2009

Correlation analysis is a scoring method in which you calculate a normalized measure of the covariance between two curves when they lie on top of each other at a certain position. The calculated value, the correlation coefficient, is a measure of how well the two curves match each other at that position. When Curve sliding is used for the analysis during crossdating, the correlation coefficient is calculated at every possible overlapping position of the curves. Ideally, the highest value found then corresponds to the correct crossdating position.

The correlation coefficient

The correlation coefficient is cumbersome to calculate by hand, so in practice you need a computer for this type of scoring.

Within CDendro the correlation coefficient used is the Pearson product-moment correlation coefficient.[1]

A coefficient value of 1 means that the two curves follow each other exactly. A value of -1 means that the curves behave exactly contrary to each other, i.e. whenever one curve goes up, the other goes down. Correlation coefficient values always lie between -1 and +1.

It should be noted that the statistical theory behind the correlation coefficient assumes that the values are independent random variables. Ring-width values are not independent in this sense: when a ring is thick, there is a high probability that the next year's ring will also be thick. The use of any correlation coefficient within dendrochronology should therefore be motivated by practical observations of how efficiently it finds correct crossdatings and how efficiently it rejects incorrect matches.[2]
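The dependence between neighbouring rings can be made visible by correlating a series with itself shifted one year. The sketch below is an illustration only: it estimates the lag-1 autocorrelation as the Pearson correlation between each year's value and the following year's value (one of several common estimators). The function name `lag1_autocorrelation` and the ring widths are hypothetical, made up for this example.

```python
def lag1_autocorrelation(values):
    """Estimate lag-1 autocorrelation as the Pearson correlation between
    each year's value and the next year's value (hypothetical helper)."""
    if len(values) < 3:
        raise ValueError("need at least 3 values")
    xs, ys = values[:-1], values[1:]  # year t paired with year t+1
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical ring widths in mm, with thick and thin rings coming in runs.
widths = [2.1, 2.3, 2.0, 1.2, 1.1, 0.9, 1.0, 1.8, 2.2, 2.4]
print(round(lag1_autocorrelation(widths), 2))
```

For this made-up series the estimate is clearly positive (about 0.7): a thick ring tends to be followed by a thick ring, so consecutive values are not independent.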

Note: When comparing ring-width curves, the correlation coefficient is calculated on the normalized curves! When you document a best value from such a correlation calculation, you should also document the normalization method used, since the coefficient level required to ascertain a dating differs somewhat between normalization methods.[2]

Definition of the correlation coefficient

Define X and Y as the paired curve values: there is one X and one Y for each overlapping year when the curves lie at a certain position. Define Mx and My as the mean values of each curve over the overlap, i.e. Mx = E(X) and My = E(Y), where E(...) denotes the mean over the overlapping years. Calculate the standard deviations as Sx = Sqr( E( (X-Mx)² ) ) and Sy = Sqr( E( (Y-My)² ) ), where Sqr denotes the square root. (The standard deviation is a measure of the typical distance from a point on a curve to the mean value of that curve.)

Calculate the correlation coefficient as r = E( (X-Mx)*(Y-My)) / (Sx * Sy )
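The definition above translates directly into code. This is a minimal Python sketch, not CDendro's implementation; the function names `mean` and `correlation` are hypothetical, and the code assumes the two lists already hold the paired (normalized) values for one overlap position.

```python
from math import sqrt


def mean(values):
    """E(...) in the text: the arithmetic mean over the overlapping years."""
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson r for paired curve values X and Y at one overlap position."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long curves with at least 2 values")
    mx, my = mean(xs), mean(ys)                   # Mx = E(X), My = E(Y)
    sx = sqrt(mean([(x - mx) ** 2 for x in xs]))  # Sx = Sqr(E((X-Mx)**2))
    sy = sqrt(mean([(y - my) ** 2 for y in ys]))  # Sy = Sqr(E((Y-My)**2))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a flat curve")
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])  # E((X-Mx)*(Y-My))
    return cov / (sx * sy)


print(correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))  # close to 1
print(correlation([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]))  # close to -1
```

Because the covariance and both standard deviations are averaged over the same n values, it makes no difference to r whether one divides by n or by n - 1.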

Overlapping

If we slide the curve of one sample so that it extends beyond either end of the other curve, only part of the first curve overlaps the other curve. It is usually not meaningful to test the curve fit when the overlap is less than 30 years. For reliable crossdating, overlaps shorter than 50 to 70 years should not be considered.
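Curve sliding with a minimum overlap can be sketched as follows. This is an illustration under stated assumptions, not CDendro's implementation: the names `slide` and `min_overlap` are hypothetical, `offset` is taken to be the index in curve `a` where the first value of curve `b` lands, and positions where either overlapping segment is completely flat are skipped because r is undefined there. The default of 30 years follows the lower limit mentioned above; for crossdating work, a value in the 50 to 70 range is the stricter choice.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson r of two equally long lists; None if either list is flat."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # r is undefined when a segment has no variation
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def slide(a, b, min_overlap=30):
    """Correlate curve b against curve a at every position with enough overlap.

    offset is the index in a where b[0] lands; it is negative when b hangs
    out on the left of a. Returns (offset, overlap, r) tuples, best r first.
    """
    results = []
    for offset in range(-(len(b) - 1), len(a)):
        start = max(0, offset)              # first overlapping index in a
        end = min(len(a), offset + len(b))  # one past the last overlapping index in a
        overlap = end - start
        if overlap < min_overlap:
            continue  # too short an overlap to be meaningful
        r = correlation(a[start:end], b[start - offset:end - offset])
        if r is not None:
            results.append((offset, overlap, r))
    results.sort(key=lambda item: item[2], reverse=True)
    return results


# Synthetic example: b is a 60-year piece cut out of a, starting at index 20.
rng = random.Random(1)
a = [rng.gauss(1.0, 0.3) for _ in range(120)]
b = a[20:80]
print(slide(a, b)[0])
```

Sorting by r reproduces the "highest value wins" rule from the introduction; the TTest value below is one way to also reward longer overlaps.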

TTest value

The TTest value, also called T-score or T-value, is based on the correlation value, but it also takes into account that, for the same correlation value, a match with a short overlap is worth less than a match with a longer overlap.

TTest values are calculated according to the formula below, where n is the number of overlapping years and r is the correlation coefficient value.

$$\mathrm{TTest} = r \sqrt{\frac{n-2}{1-r^{2}}}$$
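The formula is simple to evaluate once r and n are known. A minimal sketch with the hypothetical function name `ttest_value`; it requires -1 < r < 1 because the formula divides by 1 - r².

```python
from math import sqrt


def ttest_value(r, n):
    """TTest = r * sqrt((n - 2) / (1 - r**2)) for correlation r over n overlapping years."""
    if n < 3:
        raise ValueError("need at least 3 overlapping years")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1.0 - r * r))


print(ttest_value(0.5, 30))
print(ttest_value(0.5, 100))
```

With r = 0.5, an overlap of 30 years gives a TTest value of about 3.1, while an overlap of 100 years gives about 5.7: the same correlation scores higher when it holds over a longer overlap.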

See also the English Wikipedia article about the t-test.

Notes