R Programming Assignment Help

The four questions in this R programming assignment examine the causal relationship between dependent and independent variables. The assignment is based on the impact of various parameters on a baseball player’s performance, and here specifically on salary. More than 2,200 data points were given, and the parameters included team, play style, league and technical aspects of baseball such as homeruns, strikeouts and sacrifice hits. Our expert approached the assignment using regression analysis.

For the following analysis we used data from an Excel file with batting statistics of various players over two consecutive years. To answer the particular questions in this task, we treated players as individuals, without regard to their team or league.

All of the information in the data table I was provided shows the overall performance of batters and the distribution of their points over different ways of scoring. Some of the variables are not under the batter’s direct influence but result from a pitcher’s mistake. One might argue that a good player influences the pitcher in a variety of ways, but my analysis is not based on such factors; it relies on the actual statistics provided.

First, I removed all player entries with no information on the player’s salary. I also removed all entries where the player has a salary of zero. I do not know why some players have a zero salary, and I am sure these entries would negatively influence the overall regression results.

These types of entries make up about a fourth of all observations and I removed them.
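A minimal sketch of this filtering step in R. The data frame `batting` and its columns are hypothetical stand-ins for the Excel table, not the attached code:

```r
# Hypothetical miniature of the batting table; column names are assumptions.
batting <- data.frame(
  playerID = c("a01", "b02", "c03", "d04"),
  SALARY   = c(500000, NA, 0, 1200000),
  HR       = c(12, 3, 7, 25)
)

# Keep only rows with a known, strictly positive salary.
has_salary   <- !is.na(batting$SALARY) & batting$SALARY > 0
batting_paid <- batting[has_salary, ]

nrow(batting_paid)  # number of rows that survive the filter
```

Testing `!is.na()` first keeps the missing salaries from turning the comparison into `NA` and selecting empty rows.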

Afterwards, I dealt with the players who changed teams during a season. I added all their statistics together but averaged the salaries they received at the different teams. This way these players are represented in the same manner as all the other players. The summing and averaging was necessary so that the statistics account for all the games a player took part in during the season, and therefore measure his overall performance.
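The summing and averaging could be sketched as follows. The stint-level table `stints` and its column names are assumptions made for illustration, and only two counting statistics are shown:

```r
# Hypothetical stint-level rows: player "b02" played for two teams in 2015.
stints <- data.frame(
  playerID = c("a01", "b02", "b02"),
  yearID   = c(2015, 2015, 2015),
  H        = c(120, 40, 55),
  HR       = c(10, 3, 6),
  SALARY   = c(900000, 400000, 600000)
)

# Sum the counting statistics per player and season ...
totals <- aggregate(cbind(H, HR) ~ playerID + yearID, data = stints, FUN = sum)
# ... and average the salaries over the same groups.
pay <- aggregate(SALARY ~ playerID + yearID, data = stints, FUN = mean)

# One row per player and season.
season <- merge(totals, pay, by = c("playerID", "yearID"))
season
```

Players with a single stint pass through unchanged, so the result has the same shape for every player.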

In this process there was a significant loss of information since I have removed almost a fourth of all observations, but I find this was necessary in order to perform salary inspection properly. In the tasks where salary is not the important factor I have used the original table.

The R code for all the questions is attached in a text document.

Question 1

To check how salary is connected to the overall performance of a batter, I use the reduced table I created, from which the missing and zero salaries have been removed. I ran several regressions with different independent variables, but the dependent variable is always the salary.

Model 1

In this model I regressed the salary variable on all other variables except the player, year, league and team IDs, base on balls, intentional walks and hits by pitch. The ID variables were omitted because they identify an observation rather than measure performance. The other excluded variables, base on balls, intentional walks and hits by pitch, were omitted because they are not a consequence of a batter’s skill but of mistakes by the opposing team’s pitchers. These variables are part of a player’s statistics because they did contribute to the number of points the player and his team made.
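A model of this form could be specified roughly as below. The data are simulated purely so that the call runs, which means the output will not match the reported summary. Only the formula is taken from the assignment:

```r
set.seed(1)  # simulated data only; not the assignment's table
n <- 200
counts <- c("G", "AB", "R", "H", "X2B", "X3B", "HR", "RBI",
            "SB", "CS", "SO", "SH", "SF", "GIDP")

# Fill every counting statistic with arbitrary Poisson draws.
sim <- as.data.frame(matrix(rpois(n * length(counts), lambda = 20), nrow = n))
names(sim) <- counts
sim$SALARY <- rlnorm(n, meanlog = 14, sdlog = 1)

# Salary regressed on the batting statistics kept in Model 1.
model1 <- lm(SALARY ~ G + AB + R + H + X2B + X3B + HR + RBI +
               SB + CS + SO + SH + SF + GIDP, data = sim)
summary(model1)
```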

The summary of the regression is:

Call:
lm(formula = SALARY ~ G + AB + R + H + X2B + X3B + HR + RBI +
    SB + CS + SO + SH + SF + GIDP)

Residuals:
     Min       1Q   Median       3Q      Max
-6553725 -3469816 -2344400  1564556 28362287

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3754946.1   246705.1  15.220   <2e-16 ***
G             13209.1     7320.3   1.804   0.0714 .
AB             7745.8     8284.3   0.935   0.3499
R             66356.4    35384.2   1.875   0.0609 .
H            -74232.6    31439.0  -2.361   0.0183 *
X2B           16809.3    55569.3   0.302   0.7623
X3B           45549.9   157428.7   0.289   0.7724
HR          -197135.0    80704.9  -2.443   0.0147 *
RBI           48182.1    35249.4   1.367   0.1719
SB             7469.8    46152.3   0.162   0.8714
CS          -205368.9   131539.5  -1.561   0.1187
SO             6627.8    12463.7   0.532   0.5950
SH             -904.4    74752.1  -0.012   0.9903
SF           197626.7   140683.5   1.405   0.1603
GIDP         -61593.1    67039.9  -0.919   0.3584

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5390000 on 1547 degrees of freedom
Multiple R-squared:  0.01758,   Adjusted R-squared:  0.008685
F-statistic: 1.977 on 14 and 1547 DF,  p-value: 0.01638

As the summary suggests, the chosen variables are jointly relevant: the p-value of the F-test is 0.016. This means that I can reject the null hypothesis that salary does not depend on any of the chosen variables at the 0.1 significance level. This is a relief when compared to the previous regression.

However, the adjusted R-squared is 0.0087 (multiple R-squared 0.0176), meaning that the chosen variables account for less than 2% of the variance of salary.
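As a sanity check, the overall p-value and the adjusted R-squared can be recomputed from the figures printed in the summary. This is only a sketch, and the inputs are the rounded values from the output, so small differences in the last digits are expected:

```r
# Figures copied from the regression summary.
f_stat <- 1.977
df1    <- 14
df2    <- 1547
r2     <- 0.01758

# The upper-tail probability of the F distribution gives the overall p-value.
p_overall <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Adjusted R-squared penalises R-squared for the number of predictors.
n      <- df1 + df2 + 1          # observations used in the fit
adj_r2 <- 1 - (1 - r2) * (n - 1) / df2

c(p_value = p_overall, adj_r2 = adj_r2)
```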

Therefore, my conclusion is that players with better overall performance will get a higher salary. However, the differences in salary are too great and are surely unrealistic. Even though performance significantly influences salary, the differences in salary are large between two players with approximately the same seasonal statistics. This means that some other factors influence the players’ salaries. Since I have regressed almost all pertinent statistics, the remaining variables that would account for more of the variance in salary must be of a nature other than the players’ abilities.

Question 2

To find whether there are players who specialize in certain types of play, I divided the pertinent information into two groups.

The first group contains the variables that measure a player’s preference for individual play. By individual play I mean that the batter is fully capable of high-standard batting that allows him to advance at least one base and at most all three bases and home plate, which is a homerun. This type of play is characteristic of a batter who doesn’t take chances with tactics and is confident of making an excellent run. The variables showing this kind of play are: Hits, Doubles, Triples and Homeruns. I added the values of these variables and will work with their sum.

The second group contains the variables that measure a player’s preference for team play. In other words, high values of these variables show a player’s tendency toward tactical team play. This type of play is usually preferred by players who are less confident in their own batting than in the team’s reaction to their choice of strategy. The variables representing this type of play are: Runs Batted In, Stolen Bases, Sacrifice Hits and Sacrifice Flies. I added the values of these four variables to present them as a group of characteristics for a certain type of play.

The sums of these two groups are compared below.

The sum of the first group is: 61420

The sum of the second group is: 29328

It is apparent that the first type of play is preferred, mostly because it is much more reliable to hit the ball well and do your best. However, the sum of the other variables, which account for the team’s effort, is not negligible either. The ratio of these two sums is 2.094244. This is an indicator that the second type of play is preferred by around one third of the players.
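The ratio and the "around one third" figure follow directly from the two reported sums. A short check, with variable names chosen here for illustration:

```r
individual_sum <- 61420  # sum of the first group, as reported
team_sum       <- 29328  # sum of the second group, as reported

# Ratio of individual-play totals to team-play totals.
ratio <- individual_sum / team_sum

# Share of the combined total that comes from team-play statistics.
team_share <- team_sum / (individual_sum + team_sum)

round(ratio, 6)       # 2.094244
round(team_share, 3)  # 0.323
```

Note that the one-third figure is a share of the summed statistics, so it is only a rough proxy for a share of players.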

To find the different preferences of players more precisely, I counted the instances where one sort of play is preferred to the other. In other words, I counted all the players who have a higher sum for the first group of variables than for the second, and vice versa. The results are:

The number of players who prefer the first, individual type of play is: 811

The number of players who prefer the second, team type of play is: 141

This was as expected because a successful team does actually rely on excellent batters. However, it is clear at this point that there are different types of players.

In order to strictly identify players whose top specialty is the second (less favored) type of play, I searched for the number of players whose score for the second group of variables is higher than their score for the first group multiplied by the ratio of the two sums. This way I looked for extreme cases of tacticians, or simply bad batters as some would say.

The number of individual players whose second-group score is larger than their first-group score multiplied by the general ratio of the sums over all players is: 86. This number is not negligible, and I have thus shown that there are some players whose sole specialty is not to be a good batter.
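The grouping and counting described in this question could look roughly like this. The four players are invented for illustration, and the column names simply mirror those in the regression output:

```r
# Hypothetical per-player totals; the values are made up.
players <- data.frame(
  H   = c(150, 20, 5, 90), X2B = c(30, 4, 1, 20),
  X3B = c(3, 0, 0, 2),     HR  = c(25, 1, 0, 10),
  RBI = c(80, 15, 12, 60), SB  = c(5, 10, 6, 8),
  SH  = c(0, 6, 5, 1),     SF  = c(4, 2, 1, 3)
)

# Group scores per player.
individual <- with(players, H + X2B + X3B + HR)
team       <- with(players, RBI + SB + SH + SF)

# Overall ratio of the two group sums.
ratio <- sum(individual) / sum(team)

n_individual <- sum(individual > team)          # prefer individual play
n_team       <- sum(team > individual)          # prefer team play
n_extreme    <- sum(team > ratio * individual)  # the stricter "extreme" case

c(individual = n_individual, team = n_team, extreme = n_extreme)
```

Players with equal scores fall into neither of the first two counts, which is one reason the counts need not add up to the number of rows.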

From the quantitative analysis above I deduce that there are players with different specialties in this data.

Question 3

To find whether salary is a good predictor of hitting 10 or more homeruns, I converted all the observations with a number of homeruns greater than 10 to the factor level 1 and all the others to the factor level 0. Then I conducted a one-way ANOVA and the results are:

              Df    Sum Sq   Mean Sq F value Pr(>F)
hr             1 4.750e+11 4.750e+11   0.016  0.899
Residuals   1560 4.575e+16 2.933e+13

The p-value of this ANOVA is 0.899, which is far greater than 0.1.

Therefore I can conclude that salary cannot serve as a predictor that a player will hit 10 or more homeruns.
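A one-way ANOVA of this kind can be sketched as below. The data are simulated, so the numbers will differ from the reported table, and the cut-off is coded as "greater than 10" to follow the description in the text:

```r
set.seed(42)  # simulated data only; not the assignment's table
n <- 300
sim <- data.frame(
  HR     = rpois(n, lambda = 8),
  SALARY = rlnorm(n, meanlog = 14, sdlog = 1)
)

# 1 when the player hit more than 10 homeruns, 0 otherwise.
sim$hr <- factor(as.integer(sim$HR > 10), levels = c(0, 1))

# One-way ANOVA of salary across the two homerun groups.
fit <- aov(SALARY ~ hr, data = sim)
tab <- summary(fit)[[1]]

# p-value of the hr term (first row of the ANOVA table).
p_value <- tab[["Pr(>F)"]][1]
p_value
```

With a two-level factor this test is equivalent to a two-sample t-test with equal variances.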

This can be confirmed by running a regression on numeric variables to see whether the number of homeruns is significant at all in determining the salary:

Call:
lm(formula = SALARY ~ HR)

Residuals:
     Min       1Q   Median       3Q      Max
-4093038 -3738034 -2500534  1705403 28320466

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4250534     154273  27.552   <2e-16 ***
HR             11016      20577   0.535    0.592

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5415000 on 1560 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.0001837, Adjusted R-squared:  -0.0004572
F-statistic: 0.2866 on 1 and 1560 DF,  p-value: 0.5925

The p-value of this regression is greater than 0.1, so I can conclude that salary does not significantly influence the number of homeruns, and vice versa.
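The reported t value, F-statistic and p-value of the HR term are tied together and can be re-derived from the estimate and its standard error. A small check using the rounded figures from the summary:

```r
estimate  <- 11016   # HR coefficient from the summary
std_error <- 20577   # its standard error
df_resid  <- 1560    # residual degrees of freedom

# t value is the estimate divided by its standard error.
t_value <- estimate / std_error

# With a single predictor the F-statistic equals the squared t value.
f_value <- t_value^2

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

c(t = t_value, F = f_value, p = p_value)
```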

Question 4

I will try to determine, based on this data set, which characteristics are most significant in determining the quality of a player. To limit the discussion of what counts as quality, I will state that the two variables I treat as dependent, Runs and Home Runs, are good determinants of an excellent player. The dependent variable Runs shows the agility of a player and his ability to reach bases, while the dependent variable Home Runs best shows an excellent batter.

Therefore I conducted a multivariate regression of these two variables on all other available variables. I will consider the variables that prove to be significant to be important.
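In R, a regression with two responses can be fitted in one call by binding the responses with `cbind()`. The sketch below uses simulated data and only three predictors, so it illustrates the mechanics rather than reproducing the assignment's fit:

```r
set.seed(7)  # simulated data only; not the assignment's table
n <- 150
sim <- data.frame(
  H   = rpois(n, 60),
  RBI = rpois(n, 30),
  SB  = rpois(n, 5)
)
# Arbitrary relationships, chosen only so the responses depend on the predictors.
sim$R  <- rpois(n, lambda = 0.3 * sim$H + 0.2 * sim$RBI + 0.3 * sim$SB)
sim$HR <- rpois(n, lambda = 0.3 * sim$RBI)

# Both responses share the same right-hand side.
mfit <- lm(cbind(R, HR) ~ H + RBI + SB, data = sim)

# summary() returns one ordinary regression summary per response.
summaries <- summary(mfit)
names(summaries)
```

Each element of `summaries` is the same as the summary of a separate `lm()` fit for that response, which is why the output below is printed as two blocks.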

The results of the multivariate regression are:

Response R :

Call:
lm(formula = R ~ H + X2B + X3B + RBI + SB + CS + BB + SO + IBB +
    HBP + SH + SF + GIDP)

Residuals:
     Min       1Q   Median       3Q      Max
-20.1450  -0.5758   0.1059   0.3824  28.2250

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.105949   0.118981  -0.890    0.373
H            0.238401   0.011654  20.456  < 2e-16 ***
X2B          0.248202   0.036642   6.774 1.78e-11 ***
X3B          0.572109   0.106170   5.389 8.20e-08 ***
RBI          0.243837   0.016225  15.028  < 2e-16 ***
SB           0.357162   0.030095  11.868  < 2e-16 ***
CS          -0.072861   0.089550  -0.814    0.416
BB           0.263547   0.013724  19.204  < 2e-16 ***
SO          -0.009132   0.006275  -1.455    0.146
IBB         -0.332486   0.071131  -4.674 3.21e-06 ***
HBP          0.366355   0.050913   7.196 9.65e-13 ***
SH           0.063247   0.048690   1.299    0.194
SF          -0.117664   0.090739  -1.297    0.195
GIDP        -0.389364   0.043001  -9.055  < 2e-16 ***

The first regression summary shows that most of the variables are significant in determining the value of Runs. Out of the significant variables I will choose the ones with the highest and lowest coefficients.

The significant variables that most positively influence the variable Runs are Hits and Runs Batted In; they have modest coefficients but some of the largest t values in the table.

The variables with negative coefficients that most reduce the variable Runs are Ground Double Plays and Sacrifice Flies, although of these only Ground Double Plays is statistically significant in this regression (Sacrifice Flies has p = 0.195).
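One way to make this selection reproducible is to filter the coefficient table by p-value and sort by estimate. The sketch below retypes the Runs coefficients from the output, entering p-values printed as "< 2e-16" as 2e-16, which is an upper bound. The 0.05 threshold is an assumption, and because the coefficients are per-unit effects that are not standardized, ranking by raw size is only one possible reading:

```r
# Coefficients of the Runs regression, copied from the summary (intercept omitted).
runs_coef <- data.frame(
  term     = c("H", "X2B", "X3B", "RBI", "SB", "CS", "BB", "SO",
               "IBB", "HBP", "SH", "SF", "GIDP"),
  estimate = c(0.238401, 0.248202, 0.572109, 0.243837, 0.357162, -0.072861,
               0.263547, -0.009132, -0.332486, 0.366355, 0.063247,
               -0.117664, -0.389364),
  p_value  = c(2e-16, 1.78e-11, 8.20e-08, 2e-16, 2e-16, 0.416,
               2e-16, 0.146, 3.21e-06, 9.65e-13, 0.194, 0.195, 2e-16)
)

# Keep the terms significant at the (assumed) 5% level.
significant <- runs_coef[runs_coef$p_value < 0.05, ]

# Order from most positive to most negative coefficient.
ranked <- significant[order(significant$estimate, decreasing = TRUE), ]
ranked
```

For a fitted model, the same table is available from `coef(summary(fit))`, so the retyping step would not be needed.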

Now I will look at the second regression summary and do the same analysis there:

Response HR :

Call:
lm(formula = HR ~ H + X2B + X3B + RBI + SB + CS + BB + SO + IBB +
    HBP + SH + SF + GIDP)

Residuals:
     Min       1Q   Median       3Q      Max
-12.5692  -0.2232   0.0861   0.2204   8.8153

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.086056   0.059101  -1.456 0.145571
H           -0.041307   0.005789  -7.136 1.47e-12 ***
X2B         -0.070507   0.018201  -3.874 0.000112 ***
X3B         -0.180421   0.052737  -3.421 0.000640 ***
RBI          0.347715   0.008060  43.143  < 2e-16 ***
SB           0.008194   0.014949   0.548 0.583657
CS          -0.001684   0.044482  -0.038 0.969811
BB           0.024667   0.006817   3.619 0.000306 ***
SO           0.039233   0.003117  12.587  < 2e-16 ***
IBB          0.047134   0.035332   1.334 0.182390
HBP          0.036187   0.025290   1.431 0.152662
SH          -0.212915   0.024185  -8.803  < 2e-16 ***
SF          -0.512929   0.045072 -11.380  < 2e-16 ***
GIDP        -0.156360   0.021360  -7.320 3.96e-13 ***

The variables singled out as positively influencing the dependent variable Homeruns are Base on Balls and Hits by Pitch; of these, Base on Balls is significant (p = 0.000306), while Hits by Pitch is not (p = 0.153).

The variables which have the most direct and negative influence on the dependent variable Homeruns are: Hits and Doubles.

In order to properly understand which information should be pertinent to coaches, one needs to employ the logic of the problem rather than statistics alone.

It is interesting to notice how pitcher mistakes such as Hits by Pitch and Base on Balls tend to positively influence the number of homeruns of a player. There surely must be a tendency for a pitcher to throw an unhittable ball to a batter who has shown himself to be exceptional. This is a good sign of how the reactions of people other than the player can carry information on the player’s abilities. A coach should definitely look at these statistics to notice a possibly exceptional batter.

Ground Double Plays have been shown to negatively influence both Homeruns and Runs. This should also be a good indicator to coaches that this statistic points to a low-scoring player.

Stolen Bases have proved to be a good positive indicator of a player’s ability to run, as expected, and at the same time insignificant for a player’s ability to hit a homerun.

This analysis has shown some important variables that might not be obvious but that the statistics have proven significant. The analysis has also shown that positive indicators for Homeruns might be bad indicators for Runs. That is why a coach should look at the appropriate set of variables depending on what type of player he is looking for.