Minitab Assignment Help

The analysis illustrated here is based on the frequency of cancer in developed and developing countries. Our experts who provide Minitab assignment help have captured prevalence data on cancer survivors, which clearly show the difference between developed and developing countries. A partial window view of the data is also provided in the Minitab homework help solution below. We have also applied logistic regression to these data and interpreted the results given by the software.

In this assignment we have attempted to understand and predict the prevalence of cancer in Developed versus Developing countries based on various predictors, grouped by country, country status, region, cancer type and gender. We ensured that the data are free of missing values and that every predictor is either coded as a numeric category or measured on a continuous ratio scale. The aim is to determine whether there is any statistically significant difference in the prevalence of any type of cancer between Developed and Developing countries. Data were collected for nearly all countries, covering the respective region and type of cancer. Our response and predictor variables are as follows:

Dependent Variable: Country Status (1-Developed, 0-Developing)

Independent Variables:

Region (1- Northern America, 2- Oceania, 3- Latin America/Caribbean, 4- Europe, 5-Asia, 6-Africa),

Type of Cancer (1- Esophagus, 2- Thyroid, 3- Leukemia, 4- Non-Hodgkin lymphoma, 5- Lip, oral cavity, 6- Kaposi sarcoma, 7-Colorectal, 8- Stomach, 9- Liver, 10- Cervix uteri, 11-Lung, 12- Prostate and 13-Breast),

Smoking prevalence, 2013,

All cancers incidence (excluding non-melanoma skin cancers) per 100,000 lives, 2012,

All cancers mortality (excluding non-melanoma skin cancers) per 100,000 lives, 2012,

Overweight prevalence, adult lives: percentage of overweight and obese people aged 20 and older, 2008,

Prevalence of cancer survivors: number of survivors diagnosed with cancer within the past five years per 100,000 adults (15 years and older), 2012,

Risk of cancer by age 75: cumulative risk percentage, 2012

We have analyzed two categorical predictors and six numeric continuous predictors.

Here is a partial window view of the data:

[Image: partial view of the Minitab worksheet]

To get a feeling for the predictive power of the individual variables, and to see whether the two groups are separated on each variable, we constructed side-by-side boxplots as follows:

[Image: side-by-side boxplots of the predictors by country status]

All cancers incidence and Prevalence of cancer survivors show clear separation between Developed and Developing countries, but All cancers mortality and Overweight prevalence show little or no separation.

[Image: side-by-side boxplots of the predictors by country status, continued]

Type of cancer and smoking prevalence show some separation between Developed and Developing countries, and the separation is clearer for risk of cancer by age 75. Boxplots do not take into account joint effects of the variables, and separation in a boxplot does not necessarily imply that a linear logistic model is appropriate, but they are still helpful.

We attempted to fit a logistic regression model to these data as follows:

[Image: Minitab screenshot]

Minitab refused to fit the model because of its stringent criteria related to (quasi-complete) separation: the maximum likelihood estimates are potentially unstable because of the configuration of the data. When the predictors are (all) numerical, this is often because the model fits too well, that is, the predictors separate the data into the two groups (almost) perfectly, and a simpler model that does (almost) as well is preferable. To check this, we turn to the old algorithm from earlier versions of Minitab and see what it can tell us. The relevant command code is as follows:

Blogistic 'Country_Code' = 'Region_Code' 'Cancer_Code' 'Smoking prevalence 2013' 'All cancers incidence (excluding' 'All cancers mortality (excluding' 'Overweight prevalence, adult li' 'Prevalence of cancer survivors' 'Risk of Cancer by Age 75';
  Logit;
  Brief 3.

Note that in this situation, where the response is a simple 0/1 variable, the ST subcommand is not needed, and the Brief 3 rather than Brief 2 subcommand is used because there are two categorical predictors. The resulting output is shown below:

[Image: Minitab output]

As expected, the fit is extremely strong and we need to simplify our model. First we tried the logistic regression after removing the predictors that showed little separation in the boxplots, i.e. Cancer_Code, Region_Code and All cancers mortality (excluding non-melanoma skin cancers) per 100,000 lives, as follows:

[Image: Minitab output]

Minitab now fits the model directly, and the quasi-complete separation issue no longer arises, so we need not rerun the model using the old algorithm as before.
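To make the idea of separation concrete, here is a minimal Python sketch. It is not Minitab's own check. It looks at a single numeric predictor and reports whether some cutoff splits the 0/1 groups completely or quasi-completely. The function name `separation_type` and the toy values are hypothetical, and separation in a fitted model can also come from a combination of predictors, which this one-variable check would not detect.

```python
def separation_type(x, y):
    """Classify separation of a 0/1 response y by one numeric predictor x.

    Returns "complete" if the two groups do not overlap at all,
    "quasi-complete" if they only touch at a single boundary value,
    and "none" otherwise.
    """
    group0 = [xi for xi, yi in zip(x, y) if yi == 0]
    group1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not group0 or not group1:
        raise ValueError("both response levels must be present")

    # Complete separation: every value in one group is strictly below
    # every value in the other group.
    if max(group0) < min(group1) or max(group1) < min(group0):
        return "complete"

    # Quasi-complete separation: the groups share only the boundary value.
    if max(group0) == min(group1) or max(group1) == min(group0):
        return "quasi-complete"

    return "none"


# Hypothetical toy data, for illustration only.
print(separation_type([1, 2, 3, 10, 11], [0, 0, 0, 1, 1]))
print(separation_type([1, 2, 3, 3, 11], [0, 0, 0, 1, 1]))
print(separation_type([1, 5, 3, 2, 11], [0, 0, 0, 1, 1]))
```

When the check reports complete or quasi-complete separation, the maximum likelihood estimate of the slope is not finite, which is the kind of instability the text describes.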

[Image: Minitab output]

As expected, the fit is still quite strong by Somers' D in the reduced model above, and we need to simplify the model further to see the finer separations, because three of the predictors are still individually not significant and may be removed: Smoking prevalence 2013; All cancers incidence (excluding non-melanoma skin cancers) per 100,000 lives, 2012; and Risk of cancer by age 75, cumulative risk percentage, 2012. This time we ran the logistic regression after removing Smoking prevalence 2013 from the model and keeping All cancers incidence and Risk of cancer by age 75, as they were clearly separable in the boxplots. Minitab did not report any quasi-complete separation issue. The output is as follows:

Simplified Logit Model 1:

[Image: Minitab output for Simplified Logit Model 1]

The simplified model shows some improvement, but the predictors All cancers incidence (excluding non-melanoma skin cancers) per 100,000 lives, 2012 and Risk of cancer by age 75, cumulative risk percentage, 2012 are still individually insignificant (p > 0.05). Hence we decided to run a stepwise regression on the remaining predictors as follows:

Stepwise Logit Model 2:

[Image: Minitab output for Stepwise Logit Model 2]

The algorithm selected two numeric continuous predictors, each of which is individually significant at the 5% level. On the face of it, the results look slightly less than ideal, even though the Hosmer-Lemeshow test indicates a good fit (the Pearson and deviance goodness-of-fit tests are not valid here), both predictors are highly statistically significant, Somers' D is a robust 0.84 and the AIC is reduced to about 212. These results are even better when we perform stepwise binary logistic regression using all six continuous predictors as candidates; the results are shown below:

[Image: Minitab output for stepwise regression on all six continuous predictors]

But the same problem remains: stepwise regression does not account for how predictors might work together in a very effective way, the way least squares best subsets regression can. We can't easily get best subsets here, but a quick-and-dirty approach is to use ordinary (least squares) best subsets regression (with the 0/1 Country Status variable as the response) to sort through the models. Here is the resulting best subsets output:

Best Subsets Regression: Country_Stat versus Region_Code, Cancer_Code, …

Response is Country_Status

The following variables are included in all models: Overweight prevalence, adult li; Prevalence of cancer survivors

Candidate predictors, in the column order of the X marks: Region_Code, Cancer_Code, All cancers incidence (excludin, All cancers mortality (excludin, Risk of Cancer by AGe 75

Vars  R-Sq  R-Sq(adj)  R-Sq(pred)  Mallows Cp        S  Included
   1  57.4       57.0        56.4        21.3  0.29523  X
   1  55.8       55.4        54.6        34.9  0.30077        X
   2  58.3       57.9        57.0        15.4  0.29240  X     X
   2  57.8       57.3        56.6        20.3  0.29446  X X
   3  59.2       58.6        57.5        10.2  0.28986  X   XX
   3  58.9       58.3        57.4        12.5  0.29082  X XX
   4  59.9       59.2        58.0         6.0  0.28766  X XXX
   4  59.2       58.5        57.2        12.1  0.29025  X   XXX
   5  59.9       59.1        57.7         8.0  0.28809  X XXXX

The best subsets output points to a model with the four predictors Region_Code, Cancer_Code, All cancers incidence and All cancers mortality (of course, all of those four predictors were in the stepwise-generated Model 2 above, and Region_Code and Cancer_Code were rejected by it).

The Hosmer-Lemeshow test indicates a good fit to the data for stepwise logit Model 2, which has the two predictors that are included in every best subsets model (Overweight prevalence and Prevalence of cancer survivors) and excludes Region_Code and Cancer_Code. More interestingly, this model is clearly better than the one generated by the simplified logistic regression (Model 1, using four predictors). Its Somers' D value is slightly lower (0.84 versus 0.88), but its AIC value is smaller (212 versus 215). With 92% concordant pairs and only 8% discordant ones, we've identified our groups well.

Each of the coefficients is statistically significant, and the overall test on both coefficients is very significant. The Overweight prevalence coefficient says that an increase of one percentage point in Overweight prevalence is associated with a 3.75% increase in the odds that the country is Developed, holding Prevalence of cancer survivors constant. Similarly, the Prevalence of cancer survivors coefficient says that an increase of one unit (one survivor per 100,000 adults) in Prevalence of cancer survivors is associated with a 0.351% increase in the odds that the country is Developed, holding Overweight prevalence constant. On inspecting the regression diagnostics, we see some outliers with large residuals and some unusual values.
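The percentage statements about odds come from exponentiating a logit coefficient. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and the example coefficients are the ones printed in the three-predictor Minitab output further down in this solution; the coefficients of Model 2 itself are not reproduced in the text, so only the implied value is shown for it.

```python
import math


def odds_ratio(coef, delta=1.0):
    """Odds ratio for a change of `delta` units in a predictor."""
    return math.exp(coef * delta)


def percent_change_in_odds(coef, delta=1.0):
    """Percentage change in the odds for a change of `delta` units."""
    return (odds_ratio(coef, delta) - 1.0) * 100.0


# Coefficients as printed in the three-predictor output in this solution.
for name, coef in [
    ("All cancers incidence", 0.00115),
    ("Prevalence of cancer survivors", 0.003862),
    ("Risk of Cancer by Age 75", -0.0224),
]:
    print(f"{name}: OR = {odds_ratio(coef):.4f}, "
          f"change in odds = {percent_change_in_odds(coef):+.3f}%")

# Going the other way: the coefficient implied by a stated 3.75% increase.
implied_coef = math.log(1 + 3.75 / 100)
print(f"coefficient implied by a 3.75% increase: {implied_coef:.4f}")
```

Because the prevalence variable is measured per 100,000 adults, a larger step such as `delta=100` often gives a more readable effect size than a one-unit change.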

[Image: histograms of the predictors by country status]

The histogram above may help in visually understanding the selection of the two predictor variables by the stepwise logistic regression process.

Descriptive Statistics: Smoking prev, All cancers , All cancers , Overweight p, …

                                          Total
Variable                  Country_Status  Count    Mean   StDev  Minimum  Median  Maximum
Smoking prevalence 2013   0                 250  15.863  14.315    0.600  12.050   61.100
                          1                  98   22.43   10.13     1.40   22.10    48.10
All cancers incidence (e  0                 250  139.93   51.42    56.70  132.75   305.60
                          1                  98  251.07   74.01    83.80  259.25   385.30
All cancers mortality (e  0                 250   94.19   31.06    42.30   88.50   209.60
                          1                  98  106.60   35.87    46.40   98.30   208.20
Overweight prevalence, a  0                 250   39.73   19.84     6.10   41.35    88.90
                          1                  98   55.09   13.90    15.30   57.65    81.30
Prevalence of cancer sur  0                 250   399.6   267.2     82.8   306.4   1409.7
                          1                  98  1342.3   603.7    131.1  1473.9   2165.5
Risk of Cancer by AGe75   0                 250  14.250   4.672    4.600  14.000   27.100
                          1                  98  24.565   6.083    9.800  25.900   32.900
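One way to put a number on how far apart the two country status groups are for each predictor is a standardized mean difference (difference in means divided by the pooled standard deviation). The Python sketch below uses only the counts, means and standard deviations from the table above. The choice of this particular measure is an assumption made for illustration, not something Minitab reports here.

```python
import math

# (count, mean, standard deviation) for Country_Status 0 and 1,
# copied from the descriptive statistics table.
stats = {
    "Smoking prevalence 2013": ((250, 15.863, 14.315), (98, 22.43, 10.13)),
    "All cancers incidence": ((250, 139.93, 51.42), (98, 251.07, 74.01)),
    "All cancers mortality": ((250, 94.19, 31.06), (98, 106.60, 35.87)),
    "Overweight prevalence": ((250, 39.73, 19.84), (98, 55.09, 13.90)),
    "Prevalence of cancer survivors": ((250, 399.6, 267.2), (98, 1342.3, 603.7)),
    "Risk of Cancer by Age 75": ((250, 14.250, 4.672), (98, 24.565, 6.083)),
}


def standardized_difference(group0, group1):
    """Difference in means divided by the pooled standard deviation."""
    n0, mean0, sd0 = group0
    n1, mean1, sd1 = group1
    pooled_var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / (n0 + n1 - 2)
    return (mean1 - mean0) / math.sqrt(pooled_var)


# Rank predictors from most to least separated.
ranked = sorted(
    stats,
    key=lambda name: abs(standardized_difference(*stats[name])),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {standardized_difference(*stats[name]):.2f}")
```

A ranking like this looks at each predictor separately, so it says nothing about how the predictors behave jointly in a model.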

Our logit Model 2 is better and simpler, with only two predictors, but it could still be improved. If we try to remove outliers, Minitab might again refuse to fit the model because of quasi-complete separation. If we look at the descriptive statistics of the six continuous predictors and compare the means by country status, we can see that separation is best defined for three predictors, Risk of Cancer by Age 75, Prevalence of cancer survivors and All cancers incidence, because their mean values differ substantially between the two country status levels, as shown above. If we fit the logistic regression model using these three predictors only, we get the following Minitab output:

Binary Logistic Regression: Country_Stat versus All cancers , Prevalence o, Risk of Canc

 * WARNING * When the data are in the Response/Frequency format, the Residuals versus fits

plot is unavailable.

Method

Link function  Logit

Rows used      348

Response Information

Variable        Value  Count

Country_Status  1         98  (Event)

0        250

Total    348

Deviance at Each Iterative Step

Step    Deviance

1  224.403002

2  216.166732

3  215.796647

4  215.795436

5  215.795436

Deviance Table

Source                              DF  Adj Dev  Adj Mean  Chi-Square  P-Value

Regression                           3  197.953   65.9845      197.95    0.000

All cancers incidence (excludin    1    0.032    0.0320        0.03    0.858

Prevalence of cancer survivors     1   22.684   22.6837       22.68    0.000

Risk of Cancer by AGe 75           1    0.051    0.0512        0.05    0.821

Error                              344  215.795    0.6273

Total                              347  413.749

Model Summary

Deviance   Deviance

R-Sq  R-Sq(adj)     AIC

47.84%     47.12%  223.80

Coefficients

Term                                 Coef   SE Coef    VIF

Constant                           -3.703     0.834

All cancers incidence (excludin   0.00115   0.00645   6.19

Prevalence of cancer survivors   0.003862  0.000941   6.08

Risk of Cancer by AGe 75          -0.0224    0.0993  10.88

Odds Ratios for Continuous Predictors

Odds Ratio       95% CI

All cancers incidence (excludin      1.0012  (0.9886, 1.0139)

Prevalence of cancer survivors       1.0039  (1.0020, 1.0057)

Risk of Cancer by AGe 75             0.9778  (0.8049, 1.1879)

Regression Equation

P(1)  =  exp(Y’)/(1 + exp(Y’))

Y’ = -3.703 + 0.00115 All cancers incidence (excludin

+ 0.003862 Prevalence of cancer survivors – 0.0224 Risk of Cancer by AGe 75

Goodness-of-Fit Tests

Test              DF  Chi-Square  P-Value

Deviance         344      215.80    1.000

Pearson          344      380.95    0.083

Hosmer-Lemeshow    8       35.35    0.000

Observed and Expected Frequencies for Hosmer-Lemeshow Test

Event

Probability   Country_Status = 1  Country_Status = 0

Group       Range      Observed  Expected  Observed  Expected

1  (0.000, 0.044)         7       1.4        27      32.6

2  (0.044, 0.048)         1       1.6        34      33.4

3  (0.048, 0.056)         2       1.8        33      33.2

4  (0.056, 0.064)         0       2.1        35      32.9

5  (0.064, 0.077)         0       2.5        35      32.5

6  (0.077, 0.125)         6       3.4        28      30.6

7  (0.125, 0.296)         4       6.5        31      28.5

8  (0.296, 0.652)        13      16.3        22      18.7

9  (0.652, 0.918)        30      28.7         5       6.3

10  (0.918, 0.988)        35      33.8         0       1.2

Measures of Association

Pairs       Number  Percent  Summary Measures       Value

Concordant   21426     87.5  Somers’ D               0.75

Discordant    2981     12.2  Goodman-Kruskal Gamma   0.76

Ties            93      0.4  Kendall’s Tau-a         0.31

Total        24500    100.0

Association is between the response variable and predicted probabilities

Fits and Diagnostics for Unusual Observations

Observed

Obs  Probability     Fit    Resid  Std Resid

6       0.0000  0.3605  -0.9457      -0.98     X

11       1.0000  0.0419   2.5190       2.52  R

23       1.0000  0.0783   2.2572       2.27  R

53       1.0000  0.7635   0.7347       0.75     X

63       1.0000  0.5511   1.0916       1.14     X

70       1.0000  0.7727   0.7181       0.74     X

82       0.0000  0.2091  -0.6850      -0.70     X

84       1.0000  0.1148   2.0806       2.09  R

86       1.0000  0.0500   2.4478       2.45  R

89       1.0000  0.7366   0.7820       0.80     X

94       1.0000  0.7043   0.8373       0.85     X

128       1.0000  0.8577   0.5542       0.56     X

129       1.0000  0.0397   2.5398       2.54  R

134       1.0000  0.0440   2.4996       2.51  R

139       1.0000  0.6745   0.8874       0.91     X

143       1.0000  0.1113   2.0953       2.11  R

158       0.0000  0.3095  -0.8607      -0.88     X

164       1.0000  0.0346   2.5938       2.60  R

180       0.0000  0.3398  -0.9112      -0.94     X

185       1.0000  0.0423   2.5150       2.52  R

197       1.0000  0.0808   2.2433       2.25  R

227       1.0000  0.7377   0.7799       0.80     X

237       1.0000  0.5349   1.1187       1.16     X

244       1.0000  0.7476   0.7628       0.78     X

256       0.0000  0.1969  -0.6622      -0.68     X

258       1.0000  0.1109   2.0970       2.11  R

260       1.0000  0.0519   2.4328       2.44  R

263       1.0000  0.7092   0.8290       0.85     X

302       1.0000  0.8417   0.5870       0.60     X

303       1.0000  0.0411   2.5265       2.53  R

308       1.0000  0.0448   2.4921       2.50  R

313       1.0000  0.6486   0.9304       0.95     X

317       1.0000  0.1052   2.1223       2.13  R

319       1.0000  0.8360   0.5985       0.61     X

334       1.0000  0.1356   1.9989       2.03  R

338       1.0000  0.0363   2.5752       2.58  R

R  Large residual

X  Unusual X

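Clearly, the three-predictor model above is not better than our earlier two-predictor binary logistic model: its Hosmer-Lemeshow test rejects the fit (p = 0.000), its AIC is higher (223.80 versus 212), its Somers' D is lower (0.75 versus 0.84), and two of its three predictors are individually insignificant. Hence the best fitted model is the one with two predictors.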
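Several of the summary figures in the three-predictor output can be reproduced by hand from other numbers in the same output, which is a useful sanity check when comparing models. The Python sketch below does this under one stated assumption: that the AIC is the residual deviance plus twice the number of estimated parameters, which holds when the deviance equals minus twice the log-likelihood, as for ungrouped 0/1 data. The variable names are illustrative.

```python
# Values copied from the Deviance Table and Measures of Association.
null_deviance = 413.749      # "Total" deviance
residual_deviance = 215.795  # "Error" deviance
n_params = 4                 # constant + three predictors

# Deviance R-squared: share of the null deviance explained by the model.
deviance_r_sq = 1 - residual_deviance / null_deviance

# AIC, assuming deviance = -2 * log-likelihood (see lead-in).
aic = residual_deviance + 2 * n_params

concordant, discordant, ties = 21426, 2981, 93
n_events, n_nonevents = 98, 250
n = n_events + n_nonevents

# Each event is paired with each non-event.
total_pairs = n_events * n_nonevents

somers_d = (concordant - discordant) / total_pairs
gamma = (concordant - discordant) / (concordant + discordant)
tau_a = (concordant - discordant) / (n * (n - 1) / 2)

print(f"Deviance R-Sq: {deviance_r_sq:.2%}")
print(f"AIC: {aic:.2f}")
print(f"Somers' D: {somers_d:.2f}")
print(f"Goodman-Kruskal Gamma: {gamma:.2f}")
print(f"Kendall's Tau-a: {tau_a:.2f}")
```

The same formulas can be applied to the pair counts of any of the earlier models to compare their Somers' D values on an equal footing.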

Tabulated Statistics: Country_Status, Predict 

Rows: Country_Status   Columns: Predict

0      1     All

0        240     10     250

68.97   2.87   71.84

1         24     74      98

6.90  21.26   28.16

All      264     84     348

75.86  24.14  100.00

Cell Contents:      Count

% of Total

Pearson Chi-Square = 196.607, DF = 1, P-Value = 0.000

Likelihood Ratio Chi-Square = 191.577, DF = 1, P-Value = 0.000

Fisher’s exact test: P-Value =  0.0000000

Cramer’s V-square  0.564962

Kappa              0.747568

Above is a classification matrix, in which each observation is classified according to whether its estimated probability is above or below 0.50.

Accuracy: Overall, how often is the classifier correct?

(TP+TN)/total = (240+74)/348 = 0.9023

Specificity: When it’s actually no, how often does it predict no?

TN/actual NO = 240/250= 0.96

Precision: When it predicts yes, how often is it correct?

TP/predicted yes = 74/84 = 0.881

Prevalence: How often does the yes condition actually occur in our sample?

actual yes/total = 98/348 = 0.2816
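The four measures above, together with the Kappa value reported under the classification matrix, can be recomputed from the four cell counts. The Python sketch below does so; sensitivity is included for completeness even though it is not quoted in the text, and the variable names are illustrative.

```python
# Cell counts from the classification matrix
# (rows: actual Country_Status, columns: predicted).
tn, fp = 240, 10   # actual 0: predicted 0, predicted 1
fn, tp = 24, 74    # actual 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
prevalence = (tp + fn) / total
sensitivity = tp / (tp + fn)  # not quoted in the text

# Cohen's kappa: agreement beyond what the row and column totals
# would produce by chance.
expected_agreement = (
    (tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)
) / total**2
kappa = (accuracy - expected_agreement) / (1 - expected_agreement)

print(f"Accuracy:    {accuracy:.4f}")
print(f"Specificity: {specificity:.4f}")
print(f"Precision:   {precision:.4f}")
print(f"Prevalence:  {prevalence:.4f}")
print(f"Sensitivity: {sensitivity:.4f}")
print(f"Kappa:       {kappa:.6f}")
```

Sensitivity is noticeably lower than specificity here, which reflects the 24 Developed-country observations that fall below the 0.50 cutoff.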

Overall, 90.2% of the observations were correctly classified: 88.1% of the predictions of Developed status were correct (precision), and 96% of the Developing-country observations were classified correctly (specificity). Thus, the two predictors did a good job of classifying the observations into the Developed and Developing country groups, with Developed-country observations making up 28.16% of the sample.