Correlation analysis: Difference between revisions

Revision as of 12:03, 4 July 2009

Correlation analysis is a scoring method in which you calculate a value for the covariance between two curves when they lie over each other at a certain position. The calculated value, the correlation coefficient, is a measure of how well the two curves match each other at that position. When Curve sliding is used for the analysis during crossdating, the correlation coefficient is calculated at every possible overlapping position of the curves. The highest value found will then hopefully correspond to the correct crossdating position.

The correlation coefficient

The correlation coefficient is cumbersome to calculate by hand, so in practice you need a computer for this type of scoring.

Within CDendro the correlation coefficient used is the Pearson product-moment correlation coefficient.^[1]

A coefficient value of 1 means that both curves follow each other exactly. A value of -1 means that the curves behave exactly contrary to each other, i.e. when one curve goes up, the other goes down. Correlation coefficient values are always within the limits -1 to +1.

It should be noted that the statistical mathematics of the correlation coefficient is defined on relations between random variables, and that ring width values are not independent random values: when a ring is thick, there is a high probability that the next year's ring will also be thick. So the use of any correlation coefficient within dendrochronology is best motivated by practical observations of its efficiency in finding correct crossdatings and in sorting out incorrect matches; see ^[2].

Note: When comparing ring width curves, we do the correlation coefficient mathematics on the normalized curves. When you document a best value from such a correlation calculation, you should also document the normalization method used, as the level the coefficient must reach to ascertain a dating differs somewhat with the normalization method used.^[2]

Definition of the correlation coefficient

Define X and Y as paired curve values. There is one X and one Y for each year when the curves lie at a certain position. Define Mx and My as the mean values (or expected values) of each curve, i.e.:

Mx=E(X)

and

My=E(Y)

Calculate the standard deviations as:

\sigma x={\sqrt {E((X-Mx)^{2})}}

and

\sigma y={\sqrt {E((Y-My)^{2})}}

(The standard deviation is a measure of a "normal" (typical) distance from a point on a curve to the mean value of that curve.)

Calculate the correlation coefficient as:

r={\frac {E((X-Mx)(Y-My))}{(\sigma x)(\sigma y)}}
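The definition above can be followed step by step in a few lines of code. This is only a minimal sketch in Python, not CDendro's actual implementation; the function name `pearson_r` is made up for illustration, and the expected value E is taken as the plain average over the paired years.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient of two equally long lists of paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    n = len(xs)
    mx = sum(xs) / n  # Mx = E(X)
    my = sum(ys) / n  # My = E(Y)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)  # sigma x
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)  # sigma y
    if sx == 0 or sy == 0:
        # A completely flat curve has no defined correlation (assumed handling).
        raise ValueError("a constant curve has no defined correlation")
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n  # E((X-Mx)(Y-My))
    return cov / (sx * sy)
```

Two curves that rise and fall together in exact proportion, e.g. `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])`, give a value of 1 (up to rounding), while reversing one of them gives -1.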

See also

Wikipedia (English) article about Standard_deviation and Wikipedia (English) article about Expected_value
Wikipedia (Swedish) article about Standardavvikelse and Wikipedia (Swedish) article about Väntevärde

Overlapping

If we slide the curve of one sample so that it hangs out a bit on either side of the other curve, only a part of the first curve overlaps the other curve. It is usually not meaningful to test the curve fitting when the overlap is less than 30 years. For proper crossdating, overlaps shorter than 50 to 70 years should not be considered.
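Sliding one curve along another while skipping positions with too short an overlap could be sketched as follows. This is an illustration under stated assumptions, not CDendro's code: the names `slide` and `min_overlap` are hypothetical, both curves are assumed to be already normalized lists with one value per year, and the demo data are random numbers rather than real ring widths.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r for paired values; None if either part is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def slide(sample, reference, min_overlap=30):
    """Return (offset, overlap, r) for every position with enough overlap.

    offset is the index in reference where sample[0] lies; it is negative
    when the sample hangs out before the start of the reference.
    """
    results = []
    for offset in range(-(len(sample) - 1), len(reference)):
        start = max(0, offset)                           # first shared year
        end = min(len(reference), offset + len(sample))  # one past the last
        overlap = end - start
        if overlap < min_overlap:
            continue  # too few shared years to be meaningful
        r = pearson_r(reference[start:end], sample[start - offset:end - offset])
        if r is not None:
            results.append((offset, overlap, r))
    return results

# Demo with made-up data: the sample is cut out of the reference at index 20.
rng = random.Random(0)
reference = [rng.random() for _ in range(100)]
sample = reference[20:60]
results = slide(sample, reference)
best = max(results, key=lambda item: item[2])  # position with the highest r
```

In this demo `best[0]` is 20, the position the sample was cut from. The default of 30 mirrors the lower limit mentioned above; for proper crossdating a larger `min_overlap` would be passed.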

TTest value

The TTest value, also called T-score or T-value, is based on the correlation value, but it also takes into account that a match with a short overlap is worth less than a match with a longer overlap when the correlation values are the same.

TTest values are calculated according to the formula below, where n is the number of overlapping years and r is the correlation coefficient value.

TTest=r{\sqrt {\frac {n-2}{1-r^{2}}}}

See also Wikipedia (English) article about ttest
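The effect of the overlap length on the TTest value can be seen with a small calculation. The sketch below is a direct transcription of the formula above; the function name `ttest_value` is hypothetical, and returning infinity for a perfect correlation (r = 1 or r = -1) is an assumption made here to avoid dividing by zero.

```python
import math

def ttest_value(r, n):
    """TTest value for correlation coefficient r over n overlapping years."""
    if n <= 2:
        raise ValueError("need more than 2 overlapping years")
    if abs(r) >= 1:
        # Assumed handling: a perfect match gives an unbounded value.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))

short = ttest_value(0.5, 50)   # 0.5 * sqrt(48 / 0.75) = 0.5 * 8 = 4.0
long = ttest_value(0.5, 102)   # same r, longer overlap: a higher value
```

With the same correlation of 0.5, the 50-year overlap gives 4.0 while the 102-year overlap gives about 5.77, which is how the longer match gets more weight.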

Notes

[1] See Wikipedia (English) article about Pearson_product-moment_correlation_coefficient
[2] Torbjörn Axelson and Lars-Åke Larsson: What is a good TTest value

