# Minitab Assignment Help

The analysis illustrated here is based on the frequency of cancer in developed and developing countries. Our experts who provide Minitab assignment help have captured prevalence data on cancer survivors, which clearly show the difference between developed and developing countries. A partial window view of the data is also provided in the solution below. We have also applied binary logistic regression to these data and interpreted the results given by the software.

In this assignment we attempt to understand and predict the prevalence of cancer in Developed versus Developing countries, based on predictors grouped by country, country status, region, cancer type and gender. We ensured that the data are free from missing values and that every predictor is either coded as a numeric category or measured on a continuous ratio scale. The aim is to determine whether there is any statistically significant difference in the prevalence of any type of cancer between Developed and Developing countries. Data were collected for almost all countries, covering the respective region and type of cancer. Our response and predictor variables are as follows:

__Dependent Variable__: Country Status (1-Developed, 0-Developing)

__Independent Variables__:

Region (1 - Northern America, 2 - Oceania, 3 - Latin America/Caribbean, 4 - Europe, 5 - Asia, 6 - Africa),

Type of Cancer (1 - Esophagus, 2 - Thyroid, 3 - Leukemia, 4 - Non-Hodgkin lymphoma, 5 - Lip, oral cavity, 6 - Kaposi sarcoma, 7 - Colorectal, 8 - Stomach, 9 - Liver, 10 - Cervix uteri, 11 - Lung, 12 - Prostate and 13 - Breast),

Smoking prevalence, 2013,

All cancers incidence (excluding non-melanoma skin cancers) per 100,000 lives, 2012,

All cancers mortality (excluding non-melanoma skin cancers) per 100,000 lives, 2012,

Overweight prevalence, adult lives: percentage of overweight and obese people aged 20 and older, 2008,

Prevalence of cancer survivors: number of survivors diagnosed with cancer within the past five years per 100,000 adults (15 years and older), 2012,

Risk of cancer by age 75: cumulative risk percentage, 2012

We have analyzed two categorical predictors and six numeric continuous predictors.

Here is a partial window view of the data:

To get a feeling for the predictive power of the individual variables, and to see whether the two groups are separated on those variables, we constructed side-by-side boxplots as follows:

All cancers incidence and Prevalence of cancer survivors clearly show separation between Developed and Developing countries, whereas All cancers mortality and Overweight prevalence show little or no separation.

Type of cancer and Smoking prevalence show some separation between Developed and Developing countries, and the separation is clearer still for Risk of cancer by age 75. Boxplots do not take joint effects of the variables into account, and separation in a boxplot does not necessarily imply that a linear logistic model is appropriate, but they are still helpful.

We attempted to fit a logistic regression model to these data as under:

Minitab refused to fit the model because of its stringent criteria for (quasi-complete) separation: the maximum likelihood estimates are potentially unstable given the configuration of the data. When the predictors are (all) numerical, this often happens because the model fits too well, that is, the predictors separate the data into the two groups (almost) perfectly, and a simpler model that does (almost) as well is preferable. To check this, we turn to the old algorithm from earlier versions of Minitab and see what it can tell us. The relevant command code is as follows:

Blogistic 'Country_Code' = 'Region_Code' 'Cancer_Code' 'Smoking prevalence 2013' 'All cancers incidence (excluding' 'All cancers mortality (excluding' 'Overweight prevalence, adult li' 'Prevalence of cancer survivors' 'Risk of Cancer by Age 75';

Logit;

Brief 3.

Note that in this situation, where the response is a simple 0/1 variable, the ST subcommand is not needed, and the Brief 3 subcommand is used rather than Brief 2 because there are two categorical predictors. The resulting output is shown below:

As expected, the fit is extremely strong and we need to simplify the model. First we tried the logistic regression after removing the predictors that the boxplots flagged as not separating the groups, i.e. Cancer_Code, Region_Code and All cancers mortality (excluding non-melanoma skin cancers) per 100,000 lives, as follows:

Minitab now fits the model directly and the issue of quasi-complete separation no longer arises, so we need not rerun the model with the old algorithm as before.
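To make the idea of (quasi-complete) separation concrete, here is a minimal Python sketch that checks a single numeric predictor. It is only an illustration under stated assumptions: the function name and the sample values are hypothetical, this is not how Minitab performs its check, and separation can also arise from a combination of predictors that no single-variable check would reveal.

```python
def separation_status(values_group0, values_group1):
    """Describe how one numeric predictor separates two response groups."""
    low0, high0 = min(values_group0), max(values_group0)
    low1, high1 = min(values_group1), max(values_group1)
    if high0 < low1 or high1 < low0:
        return "complete"        # the two ranges do not touch at all
    if high0 == low1 or high1 == low0:
        return "quasi-complete"  # the ranges meet only at a shared boundary value
    return "overlap"             # the ranges overlap, so no separation on this predictor


# Hypothetical predictor values, not taken from the cancer data set.
developing = [4.6, 9.8, 14.0]
developed = [14.0, 25.9, 32.9]
print(separation_status(developing, developed))
```

When a predictor (or a combination of predictors) separates the groups like this, the likelihood keeps increasing as the coefficient grows, which is why the maximum likelihood estimates become unstable.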

As expected, the fit of the reduced model above is still quite strong as measured by Somers’ D, and we need to simplify the model further to see the finer separations, because three of the predictors (Smoking prevalence 2013; All cancers incidence (excluding non-melanoma skin cancers) per 100,000 lives, 2012; and Risk of cancer by age 75, cumulative risk percentage, 2012) are individually not significant and may be removed. This time we ran the logistic regression after removing Smoking prevalence 2013 from the model, keeping All cancers incidence (excluding non-melanoma skin cancers) per 100,000 lives, 2012 and Risk of cancer by age 75, cumulative risk percentage, 2012, because they were quite separable in the boxplots. Minitab did not report any quasi-complete separation issue. The output is as follows:

__Simplified Logit Model 1__:

The simplified model shows some improvement, but the predictors All cancers incidence (excluding non-melanoma skin cancers) per 100,000 lives, 2012 and Risk of cancer by age 75, cumulative risk percentage, 2012 are still individually insignificant (p > 0.05). Hence we decided to perform stepwise regression on the remaining predictors, as follows:

__Stepwise Logit Model 2__:

The algorithm selected two numeric continuous predictors, each of which is individually significant at the 5% level. On the face of it, the results look slightly less than satisfactory, even though Hosmer-Lemeshow indicates a good fit (the Pearson and deviance goodness-of-fit tests are not valid here), both predictors are highly statistically significant, Somers’ D is a robust 0.84 and AIC is reduced to 212.33. The results are even better when we perform stepwise binary logistic regression with all six continuous predictors as candidates; the results are shown below:

The problem remains, however, that stepwise regression does not account for how predictors might work together in a very effective way, as least squares best subsets regression can. We can’t easily get best subsets here, but a quick-and-dirty approach is to use ordinary (least squares) best subsets regression (with the 0/1 Country Status variable as the response) to sort through the models. Here is the resulting best subsets output:

**Best Subsets Regression: Country_Stat versus Region_Code, Cancer_Code, …**

Response is Country_Status

The following variables are included in all models: Overweight prevalence, adult li; Prevalence of cancer survivors

Candidate predictor columns, in order: Region_Code; Cancer_Code; All cancers incidence (excludin; All cancers mortality (excludin; Risk of Cancer by AGe 75

| Vars | R-Sq | R-Sq (adj) | R-Sq (pred) | Mallows Cp | S | Candidate predictors included (X marks) |
|---|---|---|---|---|---|---|
| 1 | 57.4 | 57.0 | 56.4 | 21.3 | 0.29523 | X |
| 1 | 55.8 | 55.4 | 54.6 | 34.9 | 0.30077 | X |
| 2 | 58.3 | 57.9 | 57.0 | 15.4 | 0.29240 | X X |
| 2 | 57.8 | 57.3 | 56.6 | 20.3 | 0.29446 | X X |
| 3 | 59.2 | 58.6 | 57.5 | 10.2 | 0.28986 | X XX |
| 3 | 58.9 | 58.3 | 57.4 | 12.5 | 0.29082 | X XX |
| 4 | 59.9 | 59.2 | 58.0 | 6.0 | 0.28766 | X XXX |
| 4 | 59.2 | 58.5 | 57.2 | 12.1 | 0.29025 | X XXX |
| 5 | 59.9 | 59.1 | 57.7 | 8.0 | 0.28809 | X XXXX |
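The adjusted R-Sq column in the best subsets output can be checked against the usual least squares formula. The Python sketch below is a rough check under two assumptions that are not stated in the output itself: that the regression uses the same 348 rows reported later in the logistic output, and that "Vars" counts only the free predictors added on top of the two predictors forced into every model. Because the R-Sq inputs are rounded to one decimal, agreement is only approximate.

```python
def adjusted_r_squared(r_sq_percent, n_rows, n_predictors):
    """Adjusted R-squared (in percent) from R-squared, sample size and predictor count."""
    r_sq = r_sq_percent / 100.0
    adjusted = 1.0 - (1.0 - r_sq) * (n_rows - 1) / (n_rows - n_predictors - 1)
    return 100.0 * adjusted


N_ROWS = 348    # assumption: same rows as the logistic regression output
N_FORCED = 2    # Overweight prevalence and Prevalence of cancer survivors

# (free predictors, reported R-Sq, reported R-Sq(adj)) for three rows of the output
reported_rows = [(1, 57.4, 57.0), (4, 59.9, 59.2), (5, 59.9, 59.1)]

for free_vars, r_sq, reported_adj in reported_rows:
    computed = adjusted_r_squared(r_sq, N_ROWS, free_vars + N_FORCED)
    print(free_vars, round(computed, 2), reported_adj)
```

The adjustment penalizes every extra predictor, which is why the five-variable model has the same R-Sq as the best four-variable model but a lower adjusted value.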

The best subsets output points to a model with the four predictors Region_Code, Cancer_Code, All cancers incidence and All cancers mortality (all four of those predictors were considered for the stepwise-generated Model 2 above, which rejected Region_Code and Cancer_Code).

The Hosmer-Lemeshow test indicates a good fit to the data for stepwise logit Model 2, which has the same two predictors that best subsets includes in every model (Overweight prevalence and Prevalence of cancer survivors), without Region_Code and Cancer_Code. More interestingly, this model is clearly better than the one generated by the simplified logistic regression (Model 1, using four predictors). Its Somers’ D value is slightly lower (0.84 versus 0.88), but its AIC value is smaller (212 versus 215). With 92% concordant pairs and only 8% discordant ones, we’ve identified our groups well.

Each of the coefficients is statistically significant, and the overall test on both coefficients is very significant. The Overweight prevalence coefficient says that an increase of one percentage point in Overweight prevalence is associated with a 3.75% increase in the odds that a country is classified as Developed, holding Prevalence of cancer survivors constant. Similarly, the Prevalence of cancer survivors coefficient says that an increase of one unit (one survivor per 100,000 adults) in Prevalence of cancer survivors is associated with a 0.351% increase in those odds, holding Overweight prevalence constant. The regression diagnostics show some outliers with high residuals and some unusual values.

The histogram above may help in visually understanding why the stepwise logistic regression process selected these two predictor variables.

**Descriptive Statistics: Smoking prev, All cancers, All cancers, Overweight p, …**

| Variable | Country_Status | Total Count | Mean | StDev | Minimum | Median | Maximum |
|---|---|---|---|---|---|---|---|
| Smoking prevalence 2013 | 0 | 250 | 15.863 | 14.315 | 0.600 | 12.050 | 61.100 |
| | 1 | 98 | 22.43 | 10.13 | 1.40 | 22.10 | 48.10 |
| All cancers incidence (e | 0 | 250 | 139.93 | 51.42 | 56.70 | 132.75 | 305.60 |
| | 1 | 98 | 251.07 | 74.01 | 83.80 | 259.25 | 385.30 |
| All cancers mortality (e | 0 | 250 | 94.19 | 31.06 | 42.30 | 88.50 | 209.60 |
| | 1 | 98 | 106.60 | 35.87 | 46.40 | 98.30 | 208.20 |
| Overweight prevalence, a | 0 | 250 | 39.73 | 19.84 | 6.10 | 41.35 | 88.90 |
| | 1 | 98 | 55.09 | 13.90 | 15.30 | 57.65 | 81.30 |
| Prevalence of cancer sur | 0 | 250 | 399.6 | 267.2 | 82.8 | 306.4 | 1409.7 |
| | 1 | 98 | 1342.3 | 603.7 | 131.1 | 1473.9 | 2165.5 |
| Risk of Cancer by AGe75 | 0 | 250 | 14.250 | 4.672 | 4.600 | 14.000 | 27.100 |
| | 1 | 98 | 24.565 | 6.083 | 9.800 | 25.900 | 32.900 |
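One rough way to compare how strongly each predictor separates the two country status groups is a standardized mean difference (difference in group means divided by the pooled standard deviation). This measure is an illustrative addition, not part of the original Minitab analysis; the sketch below uses only the means, standard deviations and counts from the descriptive statistics table.

```python
import math

N_DEVELOPING, N_DEVELOPED = 250, 98

# (mean for status 0, StDev for status 0, mean for status 1, StDev for status 1)
group_stats = {
    "Smoking prevalence 2013": (15.863, 14.315, 22.43, 10.13),
    "All cancers incidence (e": (139.93, 51.42, 251.07, 74.01),
    "All cancers mortality (e": (94.19, 31.06, 106.60, 35.87),
    "Overweight prevalence, a": (39.73, 19.84, 55.09, 13.90),
    "Prevalence of cancer sur": (399.6, 267.2, 1342.3, 603.7),
    "Risk of Cancer by AGe75": (14.250, 4.672, 24.565, 6.083),
}


def standardized_difference(mean0, sd0, mean1, sd1, n0=N_DEVELOPING, n1=N_DEVELOPED):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_variance = ((n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2) / (n0 + n1 - 2)
    return (mean1 - mean0) / math.sqrt(pooled_variance)


# Rank predictors from the largest to the smallest standardized difference.
ranked = sorted(
    group_stats,
    key=lambda name: standardized_difference(*group_stats[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(standardized_difference(*group_stats[name]), 2))
```

On this measure the three predictors with the largest group differences are Prevalence of cancer survivors, Risk of Cancer by Age 75 and All cancers incidence.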

Our logit Model 2 is better and simpler because it has only two predictors, but it could still be improved. If we try to remove outliers, Minitab might again refuse to run the model because of quasi-complete separation. If we look at the descriptive statistics of the predictors and compare the mean of each predictor across the country status levels, we can see that separation is best defined by three predictors, Risk of Cancer by Age 75, Prevalence of cancer survivors and All cancers incidence, because their mean values differ markedly between the two country status levels. If we fit the logistic regression model using these three predictors only, we get the following Minitab output:

**Binary Logistic Regression: Country_Stat versus All cancers , Prevalence o, Risk of Canc**

* WARNING * When the data are in the Response/Frequency format, the Residuals versus fits plot is unavailable.

Method

Link function Logit

Rows used 348

Response Information

| Variable | Value | Count | |
|---|---|---|---|
| Country_Status | 1 | 98 | (Event) |
| | 0 | 250 | |
| | Total | 348 | |

Deviance at Each Iterative Step

| Step | Deviance |
|---|---|
| 1 | 224.403002 |
| 2 | 216.166732 |
| 3 | 215.796647 |
| 4 | 215.795436 |
| 5 | 215.795436 |

Deviance Table

| Source | DF | Adj Dev | Adj Mean | Chi-Square | P-Value |
|---|---|---|---|---|---|
| Regression | 3 | 197.953 | 65.9845 | 197.95 | 0.000 |
| All cancers incidence (excludin | 1 | 0.032 | 0.0320 | 0.03 | 0.858 |
| Prevalence of cancer survivors | 1 | 22.684 | 22.6837 | 22.68 | 0.000 |
| Risk of Cancer by AGe 75 | 1 | 0.051 | 0.0512 | 0.05 | 0.821 |
| Error | 344 | 215.795 | 0.6273 | | |
| Total | 347 | 413.749 | | | |

Model Summary

| Deviance R-Sq | Deviance R-Sq(adj) | AIC |
|---|---|---|
| 47.84% | 47.12% | 223.80 |
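The deviance R-Sq and AIC in the Model Summary can be reproduced from the Deviance Table. The sketch below assumes that each of the 348 rows is a single 0/1 observation, in which case the error deviance equals minus twice the log-likelihood and AIC is the error deviance plus twice the number of estimated coefficients. The adjusted R-Sq is left out because Minitab's exact adjustment is not shown in the output.

```python
ERROR_DEVIANCE = 215.795   # "Error" row of the Deviance Table
TOTAL_DEVIANCE = 413.749   # "Total" row of the Deviance Table
N_COEFFICIENTS = 4         # constant plus three predictors

# Share of the total deviance explained by the model.
deviance_r_sq = 1.0 - ERROR_DEVIANCE / TOTAL_DEVIANCE

# Assumption: ungrouped binary data, so deviance = -2 * log-likelihood.
aic = ERROR_DEVIANCE + 2 * N_COEFFICIENTS

print(f"Deviance R-Sq: {100 * deviance_r_sq:.2f}%")
print(f"AIC: {aic:.2f}")
```

Both values agree with the reported 47.84% and 223.80, so the AIC of about 212 quoted for the two-predictor model is directly comparable.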

Coefficients

| Term | Coef | SE Coef | VIF |
|---|---|---|---|
| Constant | -3.703 | 0.834 | |
| All cancers incidence (excludin | 0.00115 | 0.00645 | 6.19 |
| Prevalence of cancer survivors | 0.003862 | 0.000941 | 6.08 |
| Risk of Cancer by AGe 75 | -0.0224 | 0.0993 | 10.88 |

Odds Ratios for Continuous Predictors

| | Odds Ratio | 95% CI |
|---|---|---|
| All cancers incidence (excludin | 1.0012 | (0.9886, 1.0139) |
| Prevalence of cancer survivors | 1.0039 | (1.0020, 1.0057) |
| Risk of Cancer by AGe 75 | 0.9778 | (0.8049, 1.1879) |
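The odds ratios and their confidence intervals follow directly from the Coefficients table: the odds ratio is the exponential of the coefficient, and the interval is the exponential of the coefficient plus or minus about 1.96 standard errors. The Python sketch below assumes a normal-approximation (Wald) interval with z = 1.96, which reproduces the reported values up to rounding of the printed coefficients.

```python
import math

Z_95 = 1.96  # assumption: normal-approximation (Wald) 95% interval

# (Coef, SE Coef) copied from the Coefficients table
coefficients = {
    "All cancers incidence (excludin": (0.00115, 0.00645),
    "Prevalence of cancer survivors": (0.003862, 0.000941),
    "Risk of Cancer by AGe 75": (-0.0224, 0.0993),
}


def odds_ratio_with_ci(coef, se, z=Z_95):
    """Return (odds ratio, lower limit, upper limit) for a one-unit increase."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


for name, (coef, se) in coefficients.items():
    ratio, lower, upper = odds_ratio_with_ci(coef, se)
    percent_change = 100.0 * (ratio - 1.0)  # change in odds per one-unit increase
    print(f"{name}: OR={ratio:.4f} CI=({lower:.4f}, {upper:.4f}) change={percent_change:.2f}%")
```

Only the interval for Prevalence of cancer survivors excludes 1, which matches the p-values in the Deviance Table.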

Regression Equation

P(1) = exp(Y’)/(1 + exp(Y’))

Y’ = -3.703 + 0.00115 All cancers incidence (excludin

+ 0.003862 Prevalence of cancer survivors – 0.0224 Risk of Cancer by AGe 75
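As an illustration of how the regression equation is used, the Python sketch below evaluates P(1) for two hypothetical profiles. The inputs are simply the group means from the descriptive statistics table, chosen here for illustration only; the function and variable names are not part of the Minitab output.

```python
import math


def probability_developed(incidence, survivors, risk_by_75):
    """Evaluate P(1) = exp(Y') / (1 + exp(Y')) for the three-predictor model."""
    y = -3.703 + 0.00115 * incidence + 0.003862 * survivors - 0.0224 * risk_by_75
    return math.exp(y) / (1.0 + math.exp(y))


# Illustrative inputs: group means of each predictor for Country_Status 1 and 0.
p_at_developed_means = probability_developed(251.07, 1342.3, 24.565)
p_at_developing_means = probability_developed(139.93, 399.6, 14.250)

print(round(p_at_developed_means, 3))
print(round(p_at_developing_means, 3))
```

A profile at the Developed group means gets a fitted probability well above 0.5, while a profile at the Developing group means gets one well below it.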

Goodness-of-Fit Tests

| Test | DF | Chi-Square | P-Value |
|---|---|---|---|
| Deviance | 344 | 215.80 | 1.000 |
| Pearson | 344 | 380.95 | 0.083 |
| Hosmer-Lemeshow | 8 | 35.35 | 0.000 |

Observed and Expected Frequencies for Hosmer-Lemeshow Test

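| Group | Event Probability Range | Observed (Country_Status = 1) | Expected (Country_Status = 1) | Observed (Country_Status = 0) | Expected (Country_Status = 0) |
|---|---|---|---|---|---|
| 1 | (0.000, 0.044) | 7 | 1.4 | 27 | 32.6 |
| 2 | (0.044, 0.048) | 1 | 1.6 | 34 | 33.4 |
| 3 | (0.048, 0.056) | 2 | 1.8 | 33 | 33.2 |
| 4 | (0.056, 0.064) | 0 | 2.1 | 35 | 32.9 |
| 5 | (0.064, 0.077) | 0 | 2.5 | 35 | 32.5 |
| 6 | (0.077, 0.125) | 6 | 3.4 | 28 | 30.6 |
| 7 | (0.125, 0.296) | 4 | 6.5 | 31 | 28.5 |
| 8 | (0.296, 0.652) | 13 | 16.3 | 22 | 18.7 |
| 9 | (0.652, 0.918) | 30 | 28.7 | 5 | 6.3 |
| 10 | (0.918, 0.988) | 35 | 33.8 | 0 | 1.2 |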
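The Hosmer-Lemeshow statistic is the sum of (observed − expected)² / expected over both outcome columns of the table above, compared with a chi-square distribution on (number of groups − 2) degrees of freedom. The Python sketch below recomputes it from the printed table; because the expected counts are rounded to one decimal, the result only approximates the reported 35.35. The p-value helper uses the closed-form chi-square tail that is valid for even degrees of freedom.

```python
import math

# (observed events, expected events, observed non-events, expected non-events)
hl_groups = [
    (7, 1.4, 27, 32.6), (1, 1.6, 34, 33.4), (2, 1.8, 33, 33.2),
    (0, 2.1, 35, 32.9), (0, 2.5, 35, 32.5), (6, 3.4, 28, 30.6),
    (4, 6.5, 31, 28.5), (13, 16.3, 22, 18.7), (30, 28.7, 5, 6.3),
    (35, 33.8, 0, 1.2),
]

hl_statistic = sum(
    (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    for obs1, exp1, obs0, exp0 in hl_groups
)


def chi_square_upper_tail(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


hl_df = len(hl_groups) - 2
hl_p_value = chi_square_upper_tail(hl_statistic, hl_df)
print(round(hl_statistic, 2), hl_df, hl_p_value)
```

Most of the statistic comes from the first group, where 7 events were observed against 1.4 expected, which is why the test rejects the fit of this three-predictor model.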

Measures of Association

| Pairs | Number | Percent | Summary Measures | Value |
|---|---|---|---|---|
| Concordant | 21426 | 87.5 | Somers’ D | 0.75 |
| Discordant | 2981 | 12.2 | Goodman-Kruskal Gamma | 0.76 |
| Ties | 93 | 0.4 | Kendall’s Tau-a | 0.31 |
| Total | 24500 | 100.0 | | |

Association is between the response variable and predicted probabilities
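The summary measures can be recomputed from the pair counts. Each pair consists of one event (Country_Status = 1) and one non-event (Country_Status = 0), so there are 98 × 250 = 24,500 pairs. The Python sketch below uses the standard definitions of Somers' D, Goodman-Kruskal Gamma and Kendall's Tau-a; the variable names are illustrative.

```python
CONCORDANT, DISCORDANT, TIES = 21426, 2981, 93
N_EVENTS, N_NONEVENTS = 98, 250

total_pairs = N_EVENTS * N_NONEVENTS           # event/non-event pairs
n_observations = N_EVENTS + N_NONEVENTS
all_possible_pairs = n_observations * (n_observations - 1) / 2

# Somers' D: net concordance over all event/non-event pairs (ties included).
somers_d = (CONCORDANT - DISCORDANT) / total_pairs
# Goodman-Kruskal Gamma: net concordance over untied pairs only.
gamma = (CONCORDANT - DISCORDANT) / (CONCORDANT + DISCORDANT)
# Kendall's Tau-a: net concordance over every possible pair of observations.
tau_a = (CONCORDANT - DISCORDANT) / all_possible_pairs

print(total_pairs, round(somers_d, 2), round(gamma, 2), round(tau_a, 2))
```

The same arithmetic explains the earlier two-predictor result: 92% concordant minus 8% discordant pairs gives a Somers' D of 0.84.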

Fits and Diagnostics for Unusual Observations

| Obs | Observed Probability | Fit | Resid | Std Resid | |
|---|---|---|---|---|---|
| 6 | 0.0000 | 0.3605 | -0.9457 | -0.98 | X |
| 11 | 1.0000 | 0.0419 | 2.5190 | 2.52 | R |
| 23 | 1.0000 | 0.0783 | 2.2572 | 2.27 | R |
| 53 | 1.0000 | 0.7635 | 0.7347 | 0.75 | X |
| 63 | 1.0000 | 0.5511 | 1.0916 | 1.14 | X |
| 70 | 1.0000 | 0.7727 | 0.7181 | 0.74 | X |
| 82 | 0.0000 | 0.2091 | -0.6850 | -0.70 | X |
| 84 | 1.0000 | 0.1148 | 2.0806 | 2.09 | R |
| 86 | 1.0000 | 0.0500 | 2.4478 | 2.45 | R |
| 89 | 1.0000 | 0.7366 | 0.7820 | 0.80 | X |
| 94 | 1.0000 | 0.7043 | 0.8373 | 0.85 | X |
| 128 | 1.0000 | 0.8577 | 0.5542 | 0.56 | X |
| 129 | 1.0000 | 0.0397 | 2.5398 | 2.54 | R |
| 134 | 1.0000 | 0.0440 | 2.4996 | 2.51 | R |
| 139 | 1.0000 | 0.6745 | 0.8874 | 0.91 | X |
| 143 | 1.0000 | 0.1113 | 2.0953 | 2.11 | R |
| 158 | 0.0000 | 0.3095 | -0.8607 | -0.88 | X |
| 164 | 1.0000 | 0.0346 | 2.5938 | 2.60 | R |
| 180 | 0.0000 | 0.3398 | -0.9112 | -0.94 | X |
| 185 | 1.0000 | 0.0423 | 2.5150 | 2.52 | R |
| 197 | 1.0000 | 0.0808 | 2.2433 | 2.25 | R |
| 227 | 1.0000 | 0.7377 | 0.7799 | 0.80 | X |
| 237 | 1.0000 | 0.5349 | 1.1187 | 1.16 | X |
| 244 | 1.0000 | 0.7476 | 0.7628 | 0.78 | X |
| 256 | 0.0000 | 0.1969 | -0.6622 | -0.68 | X |
| 258 | 1.0000 | 0.1109 | 2.0970 | 2.11 | R |
| 260 | 1.0000 | 0.0519 | 2.4328 | 2.44 | R |
| 263 | 1.0000 | 0.7092 | 0.8290 | 0.85 | X |
| 302 | 1.0000 | 0.8417 | 0.5870 | 0.60 | X |
| 303 | 1.0000 | 0.0411 | 2.5265 | 2.53 | R |
| 308 | 1.0000 | 0.0448 | 2.4921 | 2.50 | R |
| 313 | 1.0000 | 0.6486 | 0.9304 | 0.95 | X |
| 317 | 1.0000 | 0.1052 | 2.1223 | 2.13 | R |
| 319 | 1.0000 | 0.8360 | 0.5985 | 0.61 | X |
| 334 | 1.0000 | 0.1356 | 1.9989 | 2.03 | R |
| 338 | 1.0000 | 0.0363 | 2.5752 | 2.58 | R |

R Large residual

X Unusual X

Clearly, the three-predictor model above is not better than our earlier two-predictor binary logistic model. Hence the best fitted model is the one with two predictors.

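**Tabulated Statistics: Country_Status, Predict**

Rows: Country_Status Columns: Predict

| Country_Status | Cell content | Predict 0 | Predict 1 | All |
|---|---|---|---|---|
| 0 | Count | 240 | 10 | 250 |
| | % of Total | 68.97 | 2.87 | 71.84 |
| 1 | Count | 24 | 74 | 98 |
| | % of Total | 6.90 | 21.26 | 28.16 |
| All | Count | 264 | 84 | 348 |
| | % of Total | 75.86 | 24.14 | 100.00 |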

Pearson Chi-Square = 196.607, DF = 1, P-Value = 0.000

Likelihood Ratio Chi-Square = 191.577, DF = 1, P-Value = 0.000

Fisher’s exact test: P-Value = 0.0000000

Cramer’s V-square 0.564962

Kappa 0.747568

The tabulated statistics above form a classification matrix, in which each observation is assigned to a group according to whether its estimated probability is above or below 0.50.
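The Pearson chi-square, Cramer's V-square and Kappa values reported with the classification matrix can be reproduced from the four cell counts. The Python sketch below uses the standard 2×2 formulas; the variable names are illustrative and not part of the Minitab output.

```python
# Cell counts from the classification matrix (rows: actual, columns: predicted).
actual0_pred0, actual0_pred1 = 240, 10
actual1_pred0, actual1_pred1 = 24, 74

row0 = actual0_pred0 + actual0_pred1   # actual Developing
row1 = actual1_pred0 + actual1_pred1   # actual Developed
col0 = actual0_pred0 + actual1_pred0   # predicted Developing
col1 = actual0_pred1 + actual1_pred1   # predicted Developed
n_total = row0 + row1

# Pearson chi-square for a 2x2 table (no continuity correction).
cross_difference = actual0_pred0 * actual1_pred1 - actual0_pred1 * actual1_pred0
pearson_chi_square = n_total * cross_difference ** 2 / (row0 * row1 * col0 * col1)

# Cramer's V-square: chi-square / (n * min(rows - 1, columns - 1)); the minimum is 1 here.
cramers_v_square = pearson_chi_square / n_total

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
observed_agreement = (actual0_pred0 + actual1_pred1) / n_total
chance_agreement = (row0 * col0 + row1 * col1) / n_total ** 2
kappa = (observed_agreement - chance_agreement) / (1.0 - chance_agreement)

print(round(pearson_chi_square, 3), round(cramers_v_square, 6), round(kappa, 6))
```

A kappa of about 0.75 means the agreement between predicted and actual status is far better than what the row and column totals alone would produce by chance.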

Accuracy: Overall, how often is the classifier correct?

(TP+TN)/total = (240+74)/348 = 0.9023

Specificity: When it’s actually no, how often does it predict no?

TN/actual NO = 240/250= 0.96

Precision: When it predicts yes, how often is it correct?

TP/predicted yes = 74/84 = 0.881

Prevalence: How often does the yes condition actually occur in our sample?

actual yes/total = 98/348 = 0.2816

Overall, 90.2% of the observations were correctly classified, with 88.1% of the Developed predictions being correct and 96% of the Developing countries correctly identified. Thus, the two predictors did a good job of classifying the cancer data into the Developed and Developing country groups, with Developed countries making up 28.16% of the sample.
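The classification measures above can be recomputed from the four cells of the classification matrix, treating Developed (Country_Status = 1) as the "yes" class. The Python sketch below is a minimal illustration; sensitivity is not reported in the text and is included here only for completeness.

```python
# Classification matrix cells, with Developed (1) as the positive class.
TRUE_POSITIVE = 74    # actual 1, predicted 1
TRUE_NEGATIVE = 240   # actual 0, predicted 0
FALSE_POSITIVE = 10   # actual 0, predicted 1
FALSE_NEGATIVE = 24   # actual 1, predicted 0

total = TRUE_POSITIVE + TRUE_NEGATIVE + FALSE_POSITIVE + FALSE_NEGATIVE

accuracy = (TRUE_POSITIVE + TRUE_NEGATIVE) / total
specificity = TRUE_NEGATIVE / (TRUE_NEGATIVE + FALSE_POSITIVE)
precision = TRUE_POSITIVE / (TRUE_POSITIVE + FALSE_POSITIVE)
prevalence = (TRUE_POSITIVE + FALSE_NEGATIVE) / total
# Not reported in the text: share of actual Developed cases predicted as Developed.
sensitivity = TRUE_POSITIVE / (TRUE_POSITIVE + FALSE_NEGATIVE)

for label, value in [
    ("Accuracy", accuracy),
    ("Specificity", specificity),
    ("Precision", precision),
    ("Prevalence", prevalence),
    ("Sensitivity", sensitivity),
]:
    print(f"{label}: {value:.4f}")
```

Sensitivity comes out lower than specificity because 24 of the 98 Developed observations fall below the 0.50 cut-off.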