
Table 3 Selection strategy for the subset based on information without taking into account correlations between haplotype frequency estimates.

From: Assessment of global phase uncertainty in case-control studies

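| hipROA data | genotype | nr of individuals | 111 | 112 | 121 | 122 | 211 | 212 | 221 | loss per genotype | total loss |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cases (n = 61) | 1HH | 10 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 1.00 | |
| | HHH | 7 | 0 | 0.03 | 0.18 | 0.19 | 0.19 | 0.18 | 0.03 | 0.79 | |
| | H1H | 2 | 0.19 | 0.19 | 0 | 0 | 0.19 | 0.19 | 0 | 0.77 | |
| | HH1 | 3 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0.16 | |
| | no ambiguity | 39 | | | | | | | | | |
| | loss per haplotype | | 3.00 | 3.07 | 3.85 | 3.83 | 1.85 | 1.62 | 0.31 | | 17.52 |
| Controls (n = 653) | H1H | 28 | 0.21 | 0.21 | 0 | 0 | 0.21 | 0.21 | 0 | 0.83 | |
| | HH1 | 46 | 0.18 | 0 | 0.18 | 0 | 0.18 | 0 | 0.18 | 0.72 | |
| | 1HH | 91 | 0.12 | 0.12 | 0.12 | 0.12 | 0 | 0 | 0 | 0.49 | |
| | HHH | 47 | 0 | 0.04 | 0.06 | 0.10 | 0.10 | 0.06 | 0.04 | 0.40 | |
| | no ambiguity | 441 | | | | | | | | | |
| | loss per haplotype | | 25.29 | 19.09 | 22.23 | 15.76 | 18.59 | 8.52 | 10.34 | | 119.81 |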

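| Simulated data | genotype | nr of individuals | 111 | 112 | 121 | 122 | 211 | 212 | 221 | loss per genotype | total loss |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cases (n = 500) | 1HH | 83 | 0.25 | 0.25 | 0.25 | 0.25 | 0 | 0 | 0 | 1.00 | |
| | HHH | 40 | 0 | 0.03 | 0.11 | 0.13 | 0.13 | 0.11 | 0.03 | 0.55 | |
| | H1H | 26 | 0.11 | 0.11 | 0 | 0 | 0.11 | 0.11 | 0 | 0.15 | |
| | HH1 | 32 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0 | 0.04 | 0.15 | |
| | no ambiguity | 319 | | | | | | | | | |
| | loss per haplotype | | 24.80 | 24.94 | 26.26 | 26.07 | 9.36 | 7.17 | 2.52 | | 121.12 |
| Controls (n = 500) | H1H | 25 | 0.23 | 0.23 | 0 | 0 | 0.23 | 0.23 | 0 | 0.93 | |
| | HH1 | 36 | 0.21 | 0 | 0.21 | 0 | 0.21 | 0 | 0.21 | 0.83 | |
| | HHH | 43 | 0 | 0.05 | 0.06 | 0.11 | 0.11 | 0.06 | 0.05 | 0.44 | |
| | 1HH | 70 | 0 | 0.11 | 0.11 | 0.11 | 0.11 | 0 | 0 | 0.42 | |
| | no ambiguity | 326 | | | | | | | | | |
| | loss per haplotype | | 20.68 | 15.23 | 17.65 | 11.89 | 17.86 | 8.61 | 9.55 | | 101.47 |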

  1. The group identifiers denote the genotype at the SNPs, where 1 and 2 stand for the homozygotes 1/1 and 2/2, and H denotes a heterozygote. The order of the groups is determined by the sum of the diagonal elements of the loss matrix ℒ_i in (3), shown in the column "loss per genotype". Individuals with higher loss will result in higher information gain when their ambiguity can be resolved. The values in the last row, "loss per haplotype", show the information loss per haplotype. The simulated data set is the same sample data set as in Table 2.
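The bookkeeping behind the table can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the names `HAPLOTYPES`, `CASES`, `loss_per_genotype`, `loss_per_haplotype` and `selection_order` are hypothetical. The footnote states that "loss per genotype" is the sum of a group's loss entries. Reading "loss per haplotype" as the sum of those entries weighted by the number of individuals in each group is an assumption inferred from the printed numbers. Because the entries below are the rounded values for the hipROA cases, the results agree with the table only up to rounding.

```python
# Hypothetical names; entries are the rounded values printed for the hipROA cases.
HAPLOTYPES = ["111", "112", "121", "122", "211", "212", "221"]

# genotype group -> (number of individuals, loss entry for each haplotype)
CASES = {
    "1HH": (10, [0.25, 0.25, 0.25, 0.25, 0, 0, 0]),
    "HHH": (7, [0, 0.03, 0.18, 0.19, 0.19, 0.18, 0.03]),
    "H1H": (2, [0.19, 0.19, 0, 0, 0.19, 0.19, 0]),
    "HH1": (3, [0.04, 0, 0.04, 0, 0.04, 0, 0.04]),
}


def loss_per_genotype(groups):
    """Sum the loss entries of each genotype group (the row sums)."""
    return {g: sum(entries) for g, (_, entries) in groups.items()}


def loss_per_haplotype(groups):
    """Assumed reading: column sums weighted by the group sizes."""
    totals = [0.0] * len(HAPLOTYPES)
    for count, entries in groups.values():
        for k, entry in enumerate(entries):
            totals[k] += count * entry
    return dict(zip(HAPLOTYPES, totals))


def selection_order(groups):
    """Order the groups by decreasing loss per genotype."""
    per_genotype = loss_per_genotype(groups)
    return sorted(per_genotype, key=per_genotype.get, reverse=True)


if __name__ == "__main__":
    print(selection_order(CASES))
    print(loss_per_haplotype(CASES))
    print(sum(loss_per_haplotype(CASES).values()))
```

Under these assumptions, the groups with the largest loss per genotype come first in the ordering, which matches the footnote's statement that resolving their ambiguity yields the largest information gain.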